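Installing CUDA / cuDNN on Containerized Ubuntu 22.04

If you need specific versions of CUDA and cuDNN, we recommend installing the CUDA Toolkit yourself. This guide walks through an example installation.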

NOTE

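You can install the CUDA runtime and development tools yourself to write and run GPU applications, but you cannot install the NVIDIA driver separately, because the driver is managed by the host machine.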

Contents

  1. Environment preparation
  2. Installing CUDA 11.8
  3. Installing cuDNN
  4. Installing PyTorch
  5. Verifying the installation
  6. Troubleshooting
  7. References

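Version requirements

Suppose you need the following package versions. None of the platform's prebuilt images ships with this combination preinstalled, so you can install it yourself: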

  • cuda=11.8.0-1
  • libcudnn8=8.9.2.26-1+cuda11.8
  • libcudnn8-dev=8.9.2.26-1+cuda11.8

Environment information

  • The dev machine uses Ubuntu 22.04 as its base image: cr.infini-ai.com/infini-ai/ubuntu:22.04-20240429

Environment preparation

Check the installed NVIDIA driver version

shell
nvidia-smi

The output resembles the following. CUDA Version: 12.2 means the host driver supports CUDA versions up to 12.2.

shell
root@is-c76fxbatxfq26ehn-devmachine-0:~# nvidia-smi
Tue Oct  8 15:43:47 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        On  | 00000000:A1:00.0 Off |                  Off |
| 30%   28C    P8              12W / 450W |      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
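The comparison between the driver's supported CUDA version and the toolkit version you plan to install can be scripted. The following is a minimal sketch, not part of the original procedure: the helper names smi_cuda_version and version_le and the WANTED variable are hypothetical, and it assumes GNU sort (for sort -V), which Ubuntu provides.

```shell
# Extract the "CUDA Version: X.Y" value from nvidia-smi output read on stdin.
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Return success when version $1 is less than or equal to version $2.
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

WANTED=11.8   # toolkit version this guide installs (assumption: adjust as needed)

# Only run the live check when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    supported=$(nvidia-smi | smi_cuda_version)
    if [ -n "$supported" ] && version_le "$WANTED" "$supported"; then
        echo "OK: driver supports CUDA up to $supported, which covers $WANTED"
    else
        echo "Driver reports CUDA '$supported'; $WANTED may need a newer host driver"
    fi
fi
```

With the sample output shown above, the extracted value is 12.2, so CUDA 11.8 falls within the supported range.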

WARNING

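You cannot install or change the NVIDIA driver version from inside the container; you can only use the version already installed on the host machine.

Update the system package list: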

shell
apt update

Install Python

Install Python 3:

shell
apt install python3 python3-pip -y

Install kmod and dkms (optional)

If you need the Kernel Objects shipped with the CUDA Toolkit, install kmod (which provides lsmod) and dkms beforehand.

shell
apt-get update
apt-get install kmod dkms

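Installing CUDA 11.8

Install CUDA 11.8 using the runfile installer.

Download the CUDA Toolkit installer

CUDA Toolkit download page:

Because we need an archived release, go to the download page, select CUDA Toolkit 11.8.0, and then filter step by step to Linux, x86_64, Ubuntu 22.04 to get the download and install commands.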

shell
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

Install the CUDA Toolkit

Run the installer:

shell
sudo sh cuda_11.8.0_520.61.05_linux.run

After a short wait, you are prompted to accept the EULA. Type accept to accept it.

┌──────────────────────────────────────────────────────────────────────────────┐
│  End User License Agreement                                                  │
│  --------------------------                                                  │
│                                                                              │
│  NVIDIA Software License Agreement and CUDA Supplement to                    │
│  Software License Agreement. Last updated: October 8, 2021                   │
│                                                                              │
│  The CUDA Toolkit End User License Agreement applies to the                  │
│  NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA                    │
│  Display Driver, NVIDIA Nsight tools (Visual Studio Edition),                │
│  and the associated documentation on CUDA APIs, programming                  │
│  model and development tools. If you do not agree with the                   │
│  terms and conditions of the license agreement, then do not                  │
│  download or use the software.                                               │
│                                                                              │
│  Last updated: October 8, 2021.                                              │
│                                                                              │
│                                                                              │
│  Preface                                                                     │
│  -------                                                                     │
│                                                                              │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit):                         │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

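After accepting the agreement, follow the prompts. Choose a custom installation and select only the CUDA Toolkit and related libraries. Leave Driver unchecked, because the driver is managed by the host.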

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 520.61.05                                                           │
│ + [X] CUDA Toolkit 11.8                                                      │
│   [X] CUDA Demo Suite 11.8                                                   │
│   [X] CUDA Documentation 11.8                                                │
│ - [ ] Kernel Objects                                                         │
│      [ ] nvidia-fs                                                           │
│   Options                                                                    │
│   Install                                                                    │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘

When the installation finishes, the output looks like this:

shell
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.8/

Please make sure that
 -   PATH includes /usr/local/cuda-11.8/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.8/lib64, or, add /usr/local/cuda-11.8/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.8/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 520.00 is required for CUDA 11.8 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
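For scripted setups such as image builds, the interactive prompts can be skipped. The runfile installer accepts --silent (no menus; using it implies acceptance of the EULA) and --toolkit (install only the CUDA Toolkit, not the driver). The sketch below is an illustration under assumptions rather than the procedure this guide followed: install_toolkit and RUN_INSTALL are hypothetical names, and the components installed by --toolkit may differ slightly from the interactive selection shown earlier.

```shell
# Install only the CUDA Toolkit from a runfile, without prompts.
# By default this is a dry run that prints the command;
# set RUN_INSTALL=1 in the environment to actually execute the installer.
install_toolkit() {
    runfile=$1
    if [ "${RUN_INSTALL:-0}" = "1" ]; then
        sh "$runfile" --silent --toolkit
    else
        echo "Would run: sh $runfile --silent --toolkit"
    fi
}

install_toolkit cuda_11.8.0_520.61.05_linux.run
```

Keeping the dry run as the default makes it harder to start a multi-gigabyte installation by accident while testing a script.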

Configure the CUDA environment variables

Set PATH, LD_LIBRARY_PATH, and CUDA_HOME (covering both the generic path and the CUDA 11.8-specific path):

shell
echo 'export PATH=/usr/local/cuda/bin:/usr/local/cuda-11.8/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
echo 'export CUDA_HOME=/usr/local/cuda' >> ~/.bashrc
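The ${PATH:+:${PATH}} expansion in these commands appends ":" plus the old value only when the variable is already set and non-empty. This matters because an empty entry in PATH or LD_LIBRARY_PATH (for example, a trailing colon) is treated as the current directory. A small sketch of the same idiom, using a hypothetical helper named prepend_dir:

```shell
# Prepend directory $1 to the colon-separated list $2 without leaving
# a stray ":" when the list is empty.
prepend_dir() {
    list=$2
    printf '%s%s\n' "$1" "${list:+:${list}}"
}

prepend_dir /usr/local/cuda-11.8/lib64 ""          # prints /usr/local/cuda-11.8/lib64
prepend_dir /usr/local/cuda-11.8/lib64 /opt/lib    # prints /usr/local/cuda-11.8/lib64:/opt/lib
```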

Add the CUDA library paths to /etc/ld.so.conf:

shell
echo '/usr/local/cuda/lib64' | sudo tee -a /etc/ld.so.conf
echo '/usr/local/cuda-11.8/lib64' | sudo tee -a /etc/ld.so.conf
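Because tee -a appends unconditionally, rerunning these commands adds duplicate lines. If the setup may run more than once, an idempotent append is one option. The sketch below uses a hypothetical helper, append_once, and demonstrates it on a scratch file instead of the real /etc/ld.so.conf:

```shell
# Append line $1 to file $2 only if the file does not already contain
# exactly that line.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demonstrated on a scratch file; for real use, pass /etc/ld.so.conf as root.
CONF=$(mktemp)
append_once /usr/local/cuda-11.8/lib64 "$CONF"
append_once /usr/local/cuda-11.8/lib64 "$CONF"   # second call changes nothing
cat "$CONF"
```

The same helper also works for the export lines written to ~/.bashrc.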

Run ldconfig:

shell
sudo ldconfig

Apply the changes to the current session:

shell
source ~/.bashrc

Verify the settings:

shell
echo $PATH | grep -E "cuda|cuda-11.8"
echo $LD_LIBRARY_PATH | grep -E "cuda|cuda-11.8"
echo $CUDA_HOME
ldconfig -p | grep "libcudart"

NOTE

The generic path (/usr/local/cuda) is usually a symbolic link to the most recently installed version.
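To see which versioned directory the generic path currently resolves to, readlink -f can follow the link. A minimal sketch, where cuda_target is a hypothetical helper name:

```shell
# Print the real directory that a CUDA path (default: /usr/local/cuda) resolves to.
cuda_target() {
    readlink -f "${1:-/usr/local/cuda}"
}

# Only report when the generic path exists on this machine.
if [ -e /usr/local/cuda ]; then
    echo "/usr/local/cuda -> $(cuda_target)"
fi
```

If several toolkits are installed side by side, this shows which one tools that rely on CUDA_HOME=/usr/local/cuda will pick up.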

Verify that nvcc is available:

shell
nvcc --version

Output:

shell
root@is-c76fxbatxfq26ehn-devmachine-0:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
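In a setup script, the release number can be extracted from this output and compared with the expected version. This is a sketch under assumptions: nvcc_release and EXPECTED are hypothetical names, and the parsing relies on the "release X.Y," wording shown in the output above.

```shell
# Extract the "release X.Y" number from `nvcc --version` output read on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

EXPECTED=11.8   # assumption: the toolkit version installed in this guide

# Only run the live check when nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
    found=$(nvcc --version | nvcc_release)
    if [ "$found" = "$EXPECTED" ]; then
        echo "nvcc reports CUDA $found as expected"
    else
        echo "nvcc reports CUDA '$found', expected $EXPECTED; check the order of PATH entries"
    fi
fi
```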

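Installing cuDNN

cuDNN (CUDA Deep Neural Network library) is not distributed with the CUDA Toolkit, so it must be downloaded and installed separately.

Download the cuDNN installer

Downloading cuDNN requires an NVIDIA developer account; register first if you do not have one.

cuDNN download page:

Because we need an archived release, go to the download page and select cuDNN 8.x - 1.x, find Download cuDNN v8.9.2 (June 1st, 2023), for CUDA 11.x, and then choose Local Installer for Ubuntu22.04 x86_64 (Deb). A direct download link is not available here. After you click the download, you can copy the temporary link from your browser's downloads page, or download the file to your local computer and upload it to the dev machine.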

shell
wget -O cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb 'https://developer.download.nvidia.cn/compute/cudnn/secure/8.9.2/local_installers/11.x/cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb?qXsRCTioDTcdUcliWfUeVtaCLd1JPDBsrZ8se-9MIRoZvicekr7xz1khYQ53nsSJ-ljIhSjSOcNvpdNWFRhYoigdxk0_d1ho7ht99lt_jnhpMjX_eTNX3_KbkBcEg6bmK5pzPh1oklZcf_IZ9Tj9Q_uO6cqMxfYjYo-zQiHrakth4KMjq5ZGuXFwyEa12G81KtLV_pJv-W5FDvtT1dX2XzcizGo=&t=eyJscyI6IndlYnNpdGUiLCJsc2QiOiJkZXZlbG9wZXIubnZpZGlhLmNvbS9jdWRubiJ9'

Check the size of the downloaded file:

shell
ls -lh cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb

The output shows that the file is about 879 MB:

shell
-rw-r--r-- 1 root root 879M Jun  1  2023 cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb
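Because the download link is temporary, an expired link can leave you with an HTML error page saved under the .deb name. Besides the size, one quick sanity check is the file's first bytes: every .deb package is an ar archive and begins with the magic string "!<arch>". The helper name looks_like_deb below is hypothetical, and the check is only a rough filter, not an integrity verification.

```shell
# Return success if file $1 starts with the ar archive magic "!<arch>",
# which every .deb package begins with.
looks_like_deb() {
    [ "$(head -c 7 "$1" 2>/dev/null)" = '!<arch>' ]
}

DEB=cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb
if [ -f "$DEB" ]; then
    if looks_like_deb "$DEB"; then
        echo "$DEB looks like a Debian package"
    else
        echo "$DEB is not a Debian package; the download link may have expired"
    fi
fi
```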

Install cuDNN

Install the cuDNN local repository:

shell
dpkg -i cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb

The output is as follows:

shell
Selecting previously unselected package cudnn-local-repo-ubuntu2204-8.9.2.26.
(Reading database ... 44872 files and directories currently installed.)
Preparing to unpack cudnn-local-repo-ubuntu2204-8.9.2.26_1.0-1_amd64.deb ...
Unpacking cudnn-local-repo-ubuntu2204-8.9.2.26 (1.0-1) ...
Setting up cudnn-local-repo-ubuntu2204-8.9.2.26 (1.0-1) ...

The public cudnn-local-repo-ubuntu2204-8.9.2.26 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.2.26/cudnn-local-D7CBF0C2-keyring.gpg /usr/share/keyrings/

Following the prompt, run the command to install the key:

shell
sudo cp /var/cudnn-local-repo-ubuntu2204-8.9.2.26/cudnn-local-D7CBF0C2-keyring.gpg /usr/share/keyrings/

Update the package list:

shell
sudo apt update

Install the cuDNN libraries:

shell
sudo apt-get install -y libcudnn8=8.9.2.26-1+cuda11.8 libcudnn8-dev=8.9.2.26-1+cuda11.8

The output is as follows:

shell
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  libcudnn8 libcudnn8-dev
0 upgraded, 2 newly installed, 0 to remove and 84 not upgraded.
Need to get 0 B/919 MB of archives.
After this operation, 2,510 MB of additional disk space will be used.
Get:1 file:/var/cudnn-local-repo-ubuntu2204-8.9.2.26  libcudnn8 8.9.2.26-1+cuda11.8 [465 MB]
Get:2 file:/var/cudnn-local-repo-ubuntu2204-8.9.2.26  libcudnn8-dev 8.9.2.26-1+cuda11.8 [455 MB]
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libcudnn8.
(Reading database ... 44888 files and directories currently installed.)
Preparing to unpack .../libcudnn8_8.9.2.26-1+cuda11.8_amd64.deb ...
Unpacking libcudnn8 (8.9.2.26-1+cuda11.8) ...
Selecting previously unselected package libcudnn8-dev.
Preparing to unpack .../libcudnn8-dev_8.9.2.26-1+cuda11.8_amd64.deb ...
Unpacking libcudnn8-dev (8.9.2.26-1+cuda11.8) ...
Setting up libcudnn8 (8.9.2.26-1+cuda11.8) ...
Setting up libcudnn8-dev (8.9.2.26-1+cuda11.8) ...
update-alternatives: using /usr/include/x86_64-linux-gnu/cudnn_v8.h to provide /usr/include/cudnn.h (libcudnn) in auto mode

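Verifying the cuDNN installation

  1. Check the cuDNN header files:

    shell
    ls /usr/include/cudnn*.h

    Installing cuDNN with apt-get makes the header files available under /usr/include. As the install log above shows, update-alternatives links them from /usr/include/x86_64-linux-gnu.

  2. Check the cuDNN version information:

    shell
    grep -A 2 CUDNN_MAJOR /usr/include/cudnn_version.h

    If this command prints nothing, the version macros may be defined in another header (cuDNN releases before 8.0 defined them in cudnn.h).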

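  3. Verify that the cuDNN library files are installed:

    shell
    ls -l /usr/lib/x86_64-linux-gnu/libcudnn*

  4. If the version information cannot be found in the header files, check the installed libcudnn package versions:

    shell
    apt list --installed | grep libcudnn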
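The header check in step 2 can be turned into a single version string for use in scripts. The sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL as plain #define lines, as cuDNN 8 headers do; cudnn_header_version is a hypothetical helper name.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cudnn_version.h-style header file ($1).
# Prints nothing if the CUDNN_MAJOR macro is not found.
cudnn_header_version() {
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { if (major != "") print major "." minor "." patch }' "$1"
}

HEADER=/usr/include/cudnn_version.h
if [ -f "$HEADER" ]; then
    echo "cuDNN header version: $(cudnn_header_version "$HEADER")"
fi
```

For the packages installed in this guide, the result should correspond to the 8.9.2 in the package version 8.9.2.26.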

NOTE

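The location and layout of cuDNN files can vary with the installation method and version. If these methods do not reveal the version, consult the NVIDIA documentation or query it programmatically through the cuDNN API, for example with cudnnGetVersion().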

Update the environment variables

If needed, update ~/.bashrc:

shell
echo 'export CPATH=/usr/include${CPATH:+:${CPATH}}' >> ~/.bashrc
echo 'export LIBRARY_PATH=/usr/lib/x86_64-linux-gnu${LIBRARY_PATH:+:${LIBRARY_PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc

Refresh the ldconfig cache:

shell
sudo ldconfig

References