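Deep Learning (16): One way to troubleshoot print(torch.cuda.is_available()) returning False
I had not used CUDA for a long time. When I needed it today, PyTorch reported that it was unavailable: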
>>> print(torch.cuda.is_available())
False
The steps I took to track down and fix the problem are recorded below.
(1) Check the torch version and whether it matches the CUDA version:
>>> import torch
>>> print(torch.__version__)
1.7.1+cu110
I originally installed torch with the command from the official PyTorch site, so the torch and CUDA versions match: as shown above, this is torch 1.7.1 built for CUDA 11.0.
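The same check can be scripted. The sketch below is only an illustration: it assumes the `+cuXYZ` suffix that PyTorch's pip wheels put in the version string, and `cuda_tag_to_version` is a hypothetical helper name, not a PyTorch API.

```python
import re


def cuda_tag_to_version(torch_version):
    """Return the CUDA version encoded in a wheel tag such as '1.7.1+cu110'.

    Returns None when the string has no '+cuXYZ' suffix, for example a
    CPU-only build tagged '+cpu' or a build without a local version tag.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major.
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_tag_to_version("1.7.1+cu110"))

try:
    import torch
except ImportError:
    torch = None  # PyTorch is not installed in this interpreter.

if torch is not None:
    # torch.version.cuda is the CUDA version the build was compiled against
    # (None for CPU-only builds).
    print(torch.__version__, torch.version.cuda)
```

If the tag and `torch.version.cuda` agree, the installed wheel is at least a CUDA build, and the problem is more likely on the driver side.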
(2) Check whether CUDA is still installed, and which version it is
nvcc -V
As shown below, the CUDA version is indeed 11.0:
meng@meng:~/ideas/python_kit/pytorch$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:38_PDT_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.TC445_37.28540450_0
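For a scripted version of this check, the release number can be pulled out of the `nvcc -V` output. This is a minimal sketch that assumes the output contains a `release X.Y` fragment like the one above; the function names are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Extract 'X.Y' from a line such as '... release 11.0, V11.0.194'."""
    match = re.search(r"release (\d+\.\d+)", text)
    return match.group(1) if match else None


def nvcc_release():
    """Run `nvcc -V` if nvcc is on PATH and return its release, else None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


print(nvcc_release())
```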
(3) Check whether the GPU driver is working
nvidia-smi
meng@meng:~/ideas/python_kit/pytorch$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
No driver information was shown. My first thought was that the GPU driver had been removed or had broken.
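The failure above can also be detected from a script. The sketch below is an assumption-laden example: it matches on the error text shown above, and the status labels (`driver_unreachable`, `other_error`, and so on) are names chosen here, not anything `nvidia-smi` defines.

```python
import shutil
import subprocess

# Fragment of the error message printed when the kernel driver is unreachable.
DRIVER_ERROR = "couldn't communicate with the NVIDIA driver"


def classify_nvidia_smi(returncode, output):
    """Map an nvidia-smi exit code and output text to a coarse status label."""
    if DRIVER_ERROR in output:
        return "driver_unreachable"
    if returncode != 0:
        return "other_error"
    return "ok"


def check_nvidia_smi():
    """Run nvidia-smi if it is on PATH and classify the result."""
    if shutil.which("nvidia-smi") is None:
        return "not_installed"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return classify_nvidia_smi(result.returncode, result.stdout + result.stderr)


print(check_nvidia_smi())
```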
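(4) Install the GPU driver (this step may not be necessary)
(Note: I reinstalled the driver through the "system recommended" route.)
Look up the recommended version, which here is nvidia-driver-470. I checked the driver requirements for CUDA: CUDA 11.0 needs a 450-series or newer driver on Linux, so 470 is sufficient.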
ubuntu-drivers devices
meng@meng:~/ideas/python_kit/pytorch$ ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-510-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-510: package has invalid Support PBheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00002484sv00001462sd00003906bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-510-server - distro non-free
driver : nvidia-driver-510 - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:14.3 ==
modalias : pci:v00008086d000043F0sv00008086sd00000074bc02sc80i00
vendor : Intel Corporation
manual_install: True
driver : backport-iwlwifi-dkms - distro free
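Picking the recommended package out of this listing can be automated. The sketch below assumes the `driver : <package> - ... recommended` line layout shown above; `recommended_driver` is a hypothetical helper, and the sample text is copied from the output in this post.

```python
import re


def recommended_driver(text):
    """Return the package name on the line flagged 'recommended', or None."""
    for line in text.splitlines():
        match = re.match(r"\s*driver\s*:\s*(\S+)\s+-.*\brecommended\b", line)
        if match:
            return match.group(1)
    return None


sample = """\
driver : nvidia-driver-510 - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
"""
print(recommended_driver(sample))
```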
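Install the recommended version:
sudo ubuntu-drivers autoinstall
If the installation fails partway through, you may need to run
sudo apt-get update
or
sudo apt-get update --fix-missing
and then retry. Afterwards, test the result by going back to "(3) Check whether the GPU driver is working". In my case the error was still there: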
meng@meng:~/ideas/python_kit/pytorch$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
(5) Fix the GPU driver problem
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
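This is a common problem on Ubuntu. The usual cause is a kernel upgrade, after which the new kernel no longer matches the previously built GPU driver module.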
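One way to confirm this situation is to see whether any NVIDIA kernel module is loaded into the running kernel. The sketch below reads `/proc/modules` on Linux and assumes the NVIDIA modules have names starting with `nvidia`; the helper name is made up for illustration.

```python
import platform
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return module names starting with 'nvidia' from /proc/modules text."""
    names = [
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    ]
    return [name for name in names if name.startswith("nvidia")]


print("running kernel:", platform.release())

proc_modules = Path("/proc/modules")
if proc_modules.exists():
    # An empty list means no NVIDIA module is loaded for this kernel.
    print("loaded NVIDIA modules:", loaded_nvidia_modules(proc_modules.read_text()))
```

An empty list together with a recently upgraded kernel is consistent with the mismatch described above, although it does not prove it.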
List the NVIDIA driver versions present on the system. In my case two version numbers showed up:
ll /usr/src/
Install dkms:
sudo apt-get install dkms
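Match the kernel with the GPU driver. The listing above showed two NVIDIA versions, and the driver downloaded in "(4) Install the GPU driver" was 470.
I first tried to build the 470 version against the current kernel, which failed.
Then I tried the 495 version, and it worked (I do not know why):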
sudo dkms install -m nvidia -v 495.46
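The lookup in `/usr/src` and the matching `dkms install` command can be sketched together. This example assumes the driver sources sit in directories named `nvidia-<version>`; other naming schemes are not handled. It only prints the commands and does not run them. The helper names and the `470.103.01` value in the usage note are hypothetical.

```python
import os
import re


def nvidia_source_versions(entries):
    """Return driver versions found in names like 'nvidia-495.46', sorted."""
    versions = []
    for name in entries:
        match = re.fullmatch(r"nvidia-(\d+(?:\.\d+)+)", name)
        if match:
            versions.append(match.group(1))
    # Sort numerically by each dotted component, not as plain strings.
    return sorted(versions, key=lambda v: [int(p) for p in v.split(".")])


def dkms_install_command(version):
    """Build the dkms command used in this post for a given driver version."""
    return ["sudo", "dkms", "install", "-m", "nvidia", "-v", version]


if os.path.isdir("/usr/src"):
    for version in nvidia_source_versions(os.listdir("/usr/src")):
        print(" ".join(dkms_install_command(version)))
```

For instance, a directory listing containing `nvidia-470.103.01` and `nvidia-495.46` would yield one candidate command per version, which can then be tried in turn as described above.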
Check whether the GPU driver is working: it is now.
(6) Test whether CUDA is available
Open a new terminal and start Python: CUDA is available!
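A slightly fuller version of this final check is sketched below. It uses only standard `torch.cuda` calls, and the dictionary layout and the `cuda_report` name are choices made for this example.

```python
def cuda_report():
    """Collect basic facts about the PyTorch install and CUDA visibility."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


print(cuda_report())
```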
My program runs again as well.