NVIDIA driver suddenly fails with "Failed to initialize NVML: Driver/library version mismatch": cause and fix
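Symptom
Recently, checking the GPUs on this machine has repeatedly failed with a driver version mismatch error. Each time I uninstalled and reinstalled the driver, the error came back about a day later.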
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
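Root cause analysis
Checking the installed library packages showed that none of them matched the driver version I had installed. I asked around, and nobody had manually updated the kernel or the libraries.
# Show the version of the NVIDIA kernel module that is currently loaded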
cat /proc/driver/nvidia/version
-----------------------------------------------------------------------------------------
NVRM version: NVIDIA UNIX x86_64 Kernel Module 515.105.01 Mon Feb 27 12:49:44 UTC 2023
GCC version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
-----------------------------------------------------------------------------------------
# List the installed NVIDIA library packages and their versions
dpkg --list | grep nvidia-
-----------------------------------------------------------------------------------------------------------------------------------------------------------
ii libnvidia-compute-495:amd64 510.108.03-0ubuntu0.22.04.1 amd64 Transitional package for libnvidia-compute-510
ii libnvidia-compute-510:amd64 525.147.05-0ubuntu2.22.04.1 amd64 Transitional package for libnvidia-compute-535
ii libnvidia-compute-535:amd64 535.183.01-0ubuntu0.22.04.1 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.15.0-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.15.0-1 amd64 NVIDIA container runtime library
ii libnvidia-ml-dev:amd64 11.5.50~11.5.1-1ubuntu1 amd64 NVIDIA Management Library (NVML) development files
ii nvidia-container-runtime 3.14.0-1 all NVIDIA Container Toolkit meta-package
ii nvidia-container-toolkit 1.15.0-1 amd64 NVIDIA Container toolkit
ii nvidia-container-toolkit-base 1.15.0-1 amd64 NVIDIA Container Toolkit Base
ii nvidia-cuda-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development files
ii nvidia-cuda-gdb 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA CUDA Debugger (GDB)
rc nvidia-cuda-toolkit 11.5.1-1ubuntu1 amd64 NVIDIA CUDA development toolkit
ii nvidia-cuda-toolkit-doc 11.5.1-1ubuntu1 all NVIDIA CUDA and OpenCL documentation
ii nvidia-docker2 2.14.0-1 all NVIDIA Container Toolkit meta-package
ii nvidia-fabricmanager-515 525.147.05-0ubuntu2.22.04.1 amd64 Fabric Manager for NVSwitch based systems. (transitional package)
ii nvidia-fabricmanager-535 535.183.01-0ubuntu0.22.04.1 amd64 Fabric Manager for NVSwitch based systems.
ii nvidia-opencl-dev:amd64 11.5.1-1ubuntu1 amd64 NVIDIA OpenCL development files
ii nvidia-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Profiler for CUDA and OpenCL
ii nvidia-visual-profiler 11.5.114~11.5.1-1ubuntu1 amd64 NVIDIA Visual Profiler for CUDA and OpenCL
-----------------------------------------------------------------------------------------------------------------------------------------------------------
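The by-eye comparison above (kernel module 515.105.01 versus 535.183.01 libraries) can be sketched as a few shell helpers. This is a minimal sketch rather than the author's procedure: the function names `kernel_module_version`, `package_upstream_version`, and `versions_match` are hypothetical, and it assumes the dpkg version string begins with the upstream driver version followed by `-`, as in the listing above.

```shell
# Hypothetical helper: read text in the format of /proc/driver/nvidia/version
# on stdin and print the first dotted version number found on the NVRM line.
kernel_module_version() {
    grep 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Hypothetical helper: strip the packaging suffix from a dpkg version string,
# e.g. 535.183.01-0ubuntu0.22.04.1 -> 535.183.01 (assumes no epoch prefix).
package_upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: succeed only when both version strings are identical.
versions_match() {
    [ "$1" = "$2" ]
}

# Possible usage on a real machine (assumes the NVIDIA module is loaded and
# that libnvidia-compute-535 is the installed library package):
#   kmod=$(kernel_module_version < /proc/driver/nvidia/version)
#   lib=$(package_upstream_version "$(dpkg-query -W -f='${Version}' libnvidia-compute-535)")
#   versions_match "$kmod" "$lib" || echo "mismatch: module $kmod, library $lib"
```

With the values shown above, the module reports 515.105.01 and the library package 535.183.01, so the check would report a mismatch.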
The system log shows a scheduled automatic update task. Right after this task started, the driver stopped working:
Jun 28 06:11:38 ubuntu-server-A203 systemd[1]: Starting Daily apt upgrade and clean activities...
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.007373] NVRM: API mismatch: the client has the version 535.183.01, but
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.007373] NVRM: this kernel module has the version 515.105.01. Please
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.007373] NVRM: make sure that this kernel module and all NVIDIA driver
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.007373] NVRM: components have the same version.
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.135308] NVRM: API mismatch: the client has the version 535.183.01, but
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.135308] NVRM: this kernel module has the version 515.105.01. Please
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.135308] NVRM: make sure that this kernel module and all NVIDIA driver
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.135308] NVRM: components have the same version.
Jun 28 06:11:45 ubuntu-server-A203 kernel: [2646161.264986] NVRM: API mismatch: the client has the version 535.183.01, but
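To pull the two conflicting versions out of log lines like these, something along the following lines may help. It is a sketch under the assumption that the kernel messages keep the exact wording shown above; `nvrm_mismatch_versions` is a hypothetical name, not part of any NVIDIA tool.

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "client=<version> module=<version>" for the first NVRM API mismatch found.
# Prints nothing if no complete mismatch report is present.
nvrm_mismatch_versions() {
    awk '
        # Line 1 of the report: "... the client has the version 535.183.01, but"
        /NVRM: API mismatch: the client has the version/ && client == "" {
            for (i = 1; i < NF; i++)
                if ($i == "version") { client = $(i + 1); sub(/,$/, "", client) }
        }
        # Line 2 of the report: "... this kernel module has the version 515.105.01. Please"
        /NVRM: this kernel module has the version/ && kmod == "" {
            for (i = 1; i < NF; i++)
                if ($i == "version") { kmod = $(i + 1); sub(/\.$/, "", kmod) }
        }
        END {
            if (client != "" && kmod != "")
                printf "client=%s module=%s\n", client, kmod
        }
    '
}

# Possible usage (either source works if it contains the kernel messages):
#   journalctl -k | nvrm_mismatch_versions
#   grep NVRM /var/log/syslog | nvrm_mismatch_versions
```

Here the client (the userspace library) is the side that was upgraded, while the kernel module still loaded in memory is the older one.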
Solution
Disable the scheduled automatic update, then reinstall the driver.
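# Show the scheduled update timers (apt-daily-upgrade.timer runs "Daily apt upgrade and clean activities")
systemctl list-timers apt-daily.timer apt-daily-upgrade.timer
# Stop the scheduled update timers
sudo systemctl stop apt-daily.timer apt-daily-upgrade.timer
# Disable them so they do not start again at boot
sudo systemctl disable apt-daily.timer apt-daily-upgrade.timer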
Verification
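The original post leaves this section empty. One way to check the result might look like the sketch below. The helper name `timers_disabled` and the `STATE_CMD` override are hypothetical (the override exists only so the check can be exercised on a machine without systemd). The sketch assumes the upgrade is driven by `apt-daily-upgrade.timer`, whose service description on Ubuntu matches the "Daily apt upgrade and clean activities" log line, and it checks `apt-daily.timer` as well.

```shell
# Hypothetical helper: succeed only if none of the given systemd units is
# reported as enabled. By default it asks `systemctl is-enabled`; setting
# STATE_CMD to another command name swaps in a different status source.
timers_disabled() {
    for unit in "$@"; do
        # `systemctl is-enabled` exits non-zero for disabled units, so the
        # exit status is ignored and only the printed state is inspected.
        state=$(${STATE_CMD:-systemctl is-enabled} "$unit" 2>/dev/null) || true
        case "$state" in
            enabled|enabled-runtime)
                echo "$unit is still enabled"
                return 1
                ;;
        esac
    done
    return 0
}

# Possible usage after reinstalling the driver:
#   timers_disabled apt-daily.timer apt-daily-upgrade.timer && nvidia-smi
```

If the fix has taken effect, `nvidia-smi` should print the usual GPU table instead of the NVML error, and it should still do so after the time of day at which the update task used to run.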
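Conclusion
The system's daily scheduled update automatically upgrades packages and cleans out old ones. It replaced the libraries that matched the originally installed driver (the log shows the client at 535.183.01 while the loaded kernel module was still 515.105.01), and that broke the driver on this machine. Disabling the daily automatic update resolves the problem.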