👉 Original article: https://blog.hidavid.cn/cuda-cudnn-install-successful/

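I recently picked YOLOv3 back up for practice, this time for detection on medical B-mode ultrasound images.

Rebuilding the environment was painful because my network speed kept fluctuating, but I got it done in the end.

Below I share how to check whether CUDA is working properly.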

Main text

1. Checking the installation

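First, check whether CUDA was installed successfully.
The usual install path is /usr/local/cuda.
Running nvcc -V (capital V; nvcc --version is equivalent) prints the CUDA version. Lowercase -v is the verbose flag and does not print the version. If the shell cannot find nvcc, /usr/local/cuda/bin is probably not on your PATH.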
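
The same check can be scripted. The following is a minimal Python sketch, not part of the original post: the function names are made up for illustration, and the sample string only imitates the "release X.Y" line that nvcc prints (its build number is illustrative). It returns None when nvcc is not on PATH instead of failing.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the CUDA release (e.g. '10.0') found in nvcc's version text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc --version` if nvcc is on PATH and return the release, else None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # nvcc was not found; /usr/local/cuda/bin may be missing from PATH.
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Illustrative sample line only; the build number here is an assumption.
    sample = "Cuda compilation tools, release 10.0, V10.0.130"
    print(parse_nvcc_release(sample))
    print(installed_cuda_release())
```

A None result from installed_cuda_release() only means nvcc could not be found or its output did not contain a release line; it does not by itself prove that CUDA is missing.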

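Next, check cuDNN. Installing this library is simple: copy the files from cuDNN's include and lib64 directories into the matching directories under the CUDA install. So to check whether it is installed, go to CUDA's include and lib64 directories and run ls | grep cudnn in each to see whether any cuDNN files are there.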
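
As a rough equivalent of running ls | grep cudnn in both directories, here is a small Python sketch. The helper names are hypothetical, and the default /usr/local/cuda location is the usual path mentioned above, which may differ on your machine.

```python
import os


def find_cudnn_files(cuda_home="/usr/local/cuda"):
    """Map 'include' and 'lib64' to the sorted file names containing 'cudnn'."""
    found = {}
    for sub in ("include", "lib64"):
        directory = os.path.join(cuda_home, sub)
        if os.path.isdir(directory):
            names = sorted(n for n in os.listdir(directory) if "cudnn" in n)
        else:
            names = []  # directory missing: treat as "no cuDNN files here"
        found[sub] = names
    return found


def cudnn_looks_installed(cuda_home="/usr/local/cuda"):
    """True only if both the header directory and the library directory have cuDNN files."""
    found = find_cudnn_files(cuda_home)
    return bool(found["include"]) and bool(found["lib64"])


if __name__ == "__main__":
    print(find_cudnn_files())
    print(cudnn_looks_installed())
```

This only confirms that files with cudnn in their names exist; it does not verify that their version matches the CUDA version.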

2. Checking whether CUDA is actually being used

There are many ways to check; I'll use TensorFlow as the example.

When TensorFlow starts, it prints a log like the one below, which shows that the CUDA and cuDNN libraries were all loaded successfully.

2020-10-28 13:23:32.729663: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-10-28 13:23:32.732189: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-10-28 13:23:32.734612: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-10-28 13:23:32.735064: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-10-28 13:23:32.737413: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-10-28 13:23:32.739131: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-10-28 13:23:32.744086: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-10-28 13:23:32.754026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0, 1
2020-10-28 13:23:32.754078: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-10-28 13:23:32.759785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-28 13:23:32.759804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 1
2020-10-28 13:23:32.759842: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N Y
2020-10-28 13:23:32.759868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1:   Y N
2020-10-28 13:23:32.767512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30265 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8b:00.0, compute capability: 7.0)
2020-10-28 13:23:32.770834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30265 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8d:00.0, compute capability: 7.0)
2020-10-28 13:24:08.051764: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
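
If the log is long, the "Successfully opened dynamic library" lines can be picked out automatically. The sketch below is an illustration rather than the author's method: the helper names and the list of required library prefixes are assumptions, and it presumes you have saved TensorFlow's log output to a string or file yourself.

```python
import re

# Matches the dso_loader lines shown in the log above.
LOADED = re.compile(r"Successfully opened dynamic library (\S+)")


def loaded_libraries(log_text):
    """Return the set of library file names TensorFlow reported as opened."""
    return set(LOADED.findall(log_text))


def missing_libraries(log_text, required=("libcudart", "libcublas", "libcudnn")):
    """Return the required prefixes (an assumed list) that never appear in the log."""
    loaded = loaded_libraries(log_text)
    return [p for p in required if not any(name.startswith(p) for name in loaded)]


if __name__ == "__main__":
    # Three lines copied from the log above.
    log = (
        "2020-10-28 13:23:32.729663: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0\n"
        "2020-10-28 13:23:32.732189: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0\n"
        "2020-10-28 13:23:32.744086: I tensorflow/stream_executor/platform/default/"
        "dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7\n"
    )
    print(sorted(loaded_libraries(log)))
    print(missing_libraries(log))
```

An empty result from missing_libraries means every required prefix was seen in the log; anything it returns is a library worth investigating.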

During training, you can also judge by watching CPU and GPU usage.

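For example, run top to look at CPU and memory usage. CPU usage here is fairly high. It exceeds 100% because top reports per-process usage in which 100% means one fully busy core, so a multi-threaded process on a multi-core machine can go well above that. If I remember correctly, CPU usage was close to 800% when I trained without the GPU.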

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  5374 root      20   0 65.426g 6.152g 1.499g S 226.6  1.2  27:42.84 python
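
To make the percentage concrete, the following Python sketch reads the %CPU column from the top output above and converts it to an approximate number of busy cores. The helper names are hypothetical, and it assumes top's default mode, where 100% corresponds to one core.

```python
def cpu_percent_from_top_row(header, row):
    """Look up the %CPU value in a top process row by its position in the header."""
    columns = header.split()
    fields = row.split()
    return float(fields[columns.index("%CPU")])


def busy_cores(cpu_percent):
    """Convert top's per-process %CPU into an approximate count of fully busy cores."""
    return cpu_percent / 100.0


if __name__ == "__main__":
    # Header and process row copied from the top output above.
    header = "PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND"
    row = "5374 root      20   0 65.426g 6.152g 1.499g S 226.6  1.2  27:42.84 python"
    cpu = cpu_percent_from_top_row(header, row)
    print(cpu, "->", round(busy_cores(cpu), 2), "cores")
```

By the same arithmetic, the roughly 800% recalled for CPU-only training corresponds to about eight busy cores, which is why a much lower CPU figure suggests that the heavy work has moved to the GPU.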

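Then run nvidia-smi to check GPU usage. In the table below, GPU 0 shows a GPU-Util of 76% with 31361MiB of 32480MiB memory in use, which means the GPU is being used. Note that TensorFlow by default reserves most of a GPU's memory up front, so a high Memory-Usage figure alone does not indicate a problem. When training needs more GPU memory than is available, however, TensorFlow crashes with an out-of-memory error, and the usual fix is to reduce the batch size.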

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:8B:00.0 Off |                    0 |
| N/A   58C    P0   160W / 300W |  31361MiB / 32480MiB |     76%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:8D:00.0 Off |                    0 |
| N/A   45C    P0    59W / 300W |    625MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
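
For repeated checks, nvidia-smi can also emit the same numbers in CSV form through its --query-gpu option. The Python sketch below is an illustration under stated assumptions: the function names are made up, the sample text mirrors the values in the table above, and the parser expects plain integers in every field (it would fail on a GPU that reports a field as unsupported).

```python
import shutil
import subprocess

QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_rows(csv_text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total, util = (field.strip() for field in line.split(","))
        gpus.append({
            "index": int(index),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "memory_percent": round(100.0 * int(used) / int(total), 1),
            "gpu_util_percent": int(util),
        })
    return gpus


def query_gpus():
    """Query the local GPUs; returns an empty list if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    # Sample text built from the values in the table above.
    sample = "0, 31361, 32480, 76\n1, 625, 32480, 0\n"
    for gpu in parse_gpu_rows(sample):
        print(gpu)
    print(query_gpus())
```

Keeping memory_percent and gpu_util_percent as separate fields makes the distinction explicit: the first says how much memory is reserved, the second how busy the GPU's compute units are.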
