기존에 설치되어 있는 것들 제거. Ubuntu 20.04에는 기본으로 Nvidia Driver가 설치되어 있으므로 이를 지워주어야 함.
$ sudo apt-get purge nvidia*
$ sudo apt-get autoremove
$ sudo apt-get autoclean
CUDA 설치
1. 레포지토리 키 등록
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
2. apt에 레포지토리 주소 등록
$ sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /" >> /etc/apt/sources.list.d/cuda.list'
$ sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" >> /etc/apt/sources.list.d/cuda.list'
3. CUDA, Nvidia Driver 설치 (CUDA를 설치하면 자동으로 이에 의존된 드라이버 자동으로 설치)
$ sudo apt update
$ sudo apt install cuda
$ sudo apt install cuda-10-1
4. cuDNN 설치를 위한 repo 패키지 설치
$ wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
$ sudo dpkg -i nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb
5. cuDNN 레포지토리 추가
$ sudo bash -c 'echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 /" >> /etc/apt/sources.list.d/nvidia-machine-learning.list'
6. cuDNN 설치 (CUDA 버전에 맞는 버전으로 설치해야 함)
$ sudo apt install libcudnn7-dev=7.6.5.32-1+cuda10.1 libcudnn7=7.6.5.32-1+cuda10.1
추후 업데이트시 최신버전으로 업데이트 되는 것을 방지
$ sudo apt-mark hold libcudnn7 libcudnn7-dev
libcudnn7 set on hold.
libcudnn7-dev set on hold.
이와 같이 설치를 완료하고, 재부팅한 다음… 드라이버가 잘 설치되어 있는지 확인
$ nvidia-smi
TensorFlow2 설치
$ sudo apt install python3-pip
$ sudo pip3 install -U tensorflow-gpu
환경변수 셋업
$ echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64:/usr/local/cuda-10.2/lib64:/usr/local/cuda-10.1/lib64' >> ~/.bashrc
$ source ~/.bashrc
실행 및 설치 확인
$ python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
import tensorflow as tf
2020-10-08 15:02:50.803732: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
tf.config.list_physical_devices('GPU')
2020-10-08 15:03:01.717511: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-08 15:03:01.733336: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-08 15:03:01.733585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1650 computeCapability: 7.5
coreClock: 1.56GHz coreCount: 16 deviceMemorySize: 3.82GiB deviceMemoryBandwidth: 119.24GiB/s
2020-10-08 15:03:01.733603: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-08 15:03:01.734679: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-08 15:03:01.735901: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-08 15:03:01.736061: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-08 15:03:01.737163: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-08 15:03:01.737773: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-08 15:03:01.740221: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-08 15:03:01.740295: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-08 15:03:01.740563: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-08 15:03:01.740768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]