I installed the latest Ubuntu 20.04 to try it out and ran into quite a few problems installing TensorFlow locally. This post records how I solved them.

Problems:

  • CUDA 10.1 requires gcc <= 8
  • The default Python is 3.8
$ cat /var/log/cuda-installer.log
...
[ERROR]: unsupported compiler version: 9.3.0. Use --override to override this check.
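If you want to check the installer log for this failure from a script, something like the sketch below may help. It only looks for the error text quoted above; the function name `find_unsupported_compiler` is my own, and I am assuming the message format stays the same as in this log excerpt.

```python
import re
from pathlib import Path

# Pattern for the compiler check failure, taken from the log line quoted above.
_UNSUPPORTED = re.compile(r"unsupported compiler version: (\d+(?:\.\d+)*)")


def find_unsupported_compiler(log_text):
    """Return the rejected compiler version string, or None if the log has no such error."""
    match = _UNSUPPORTED.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    log_path = Path("/var/log/cuda-installer.log")
    try:
        text = log_path.read_text(errors="replace")
    except OSError:
        # The log is missing or not readable by this user.
        text = ""
    version = find_unsupported_compiler(text)
    if version:
        print("The CUDA installer rejected compiler version", version)
    else:
        print("No compiler version error found in", log_path)
```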

Fixing the Python version problem

See this post: use Conda or Docker to create environments with different Python versions.

Fixing the gcc version problem

See my earlier post, "Installing multiple gcc versions on Linux".
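Once several gcc versions are installed side by side, a small script can tell you which one satisfies the CUDA 10.1 limit. This is only a sketch: it assumes the compilers are on PATH under the Ubuntu naming scheme `gcc-N`, and the helper names `installed_gcc` and `pick_compiler` are made up for illustration.

```python
import shutil

MAX_GCC_MAJOR = 8  # CUDA 10.1 limit listed at the top of this post


def installed_gcc(candidates=range(5, 11)):
    """Map major version -> executable path for every gcc-N found on PATH."""
    found = {}
    for major in candidates:
        path = shutil.which(f"gcc-{major}")
        if path:
            found[major] = path
    return found


def pick_compiler(found, max_major=MAX_GCC_MAJOR):
    """Return (major, path) of the newest gcc that is <= max_major, or None."""
    usable = [major for major in found if major <= max_major]
    if not usable:
        return None
    best = max(usable)
    return best, found[best]


if __name__ == "__main__":
    choice = pick_compiler(installed_gcc())
    if choice is None:
        print(f"No gcc <= {MAX_GCC_MAJOR} found on PATH")
    else:
        print(f"Usable compiler: gcc {choice[0]} at {choice[1]}")
```

The script only reports what is available; switching the default compiler is still done the way the earlier post describes.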

Installing the CUDA Toolkit

Download the runfile (.run) version of CUDA.

$ chmod a+x cuda_10.1.243_418.87.00_linux.run
$ sudo ./cuda_10.1.243_418.87.00_linux.run --silent --toolkit --samples --librarypath=/usr/local/cuda

You can also run ./cuda_10.1.243_418.87.00_linux.run --help to see the other options.

Checking the CUDA version

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
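To do the same check from Python, for example in a setup script, you could parse the last line of that output. A minimal sketch, assuming `nvcc` prints the "release X.Y, VX.Y.Z" line in the format shown above; `parse_nvcc_release` is a name I picked.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (release, full_version) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+), V(\d+\.\d+\.\d+)", output)
    return match.groups() if match else None


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc is not on PATH")
    else:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
        print("Detected CUDA release:", parse_nvcc_release(result.stdout))
```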

cuDNN

The cuDNN download page is here: cudnnlib

$ tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
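After copying the files, it can be worth confirming which cuDNN version actually landed in /usr/local/cuda. The sketch below reads the version macros from cudnn.h. It assumes cuDNN 7.x, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined in that header; `parse_cudnn_version` is a hypothetical helper name.

```python
import re
from pathlib import Path

HEADER = Path("/usr/local/cuda/include/cudnn.h")  # destination used in the cp command above


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the #define lines in cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    try:
        version = parse_cudnn_version(HEADER.read_text(errors="replace"))
    except OSError:
        version = None  # header missing or unreadable
    print("cuDNN version:", version)
```

For the archive used here I would expect (7, 6, 5), matching the v7.6.5 in the file name.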

Environment variables

Add the following to .bashrc:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
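After opening a new shell (or sourcing .bashrc), a quick way to confirm the variables took effect is to inspect them from Python. This is just a sketch; `path_list_contains` is a helper I made up, and it assumes the colon-separated format used on Linux.

```python
import os

CUDA_BIN = "/usr/local/cuda/bin"
CUDA_LIB = "/usr/local/cuda/lib64"


def path_list_contains(value, directory):
    """True if `directory` is one of the colon-separated entries in `value`."""
    entries = [os.path.normpath(entry) for entry in value.split(":") if entry]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    checks = (("PATH", CUDA_BIN), ("LD_LIBRARY_PATH", CUDA_LIB))
    for variable, directory in checks:
        present = path_list_contains(os.environ.get(variable, ""), directory)
        print(f"{variable} contains {directory}: {present}")
```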

Installing TensorFlow

Create and activate a Conda environment first, then install TensorFlow with pip:

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorflow
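Before training anything, you can ask TensorFlow directly whether it sees the GPU. A minimal sketch, assuming TensorFlow 2.1 or later (where `tf.config.list_physical_devices` is available); `describe_tensorflow` is a name I chose, and it returns None when TensorFlow is not importable.

```python
def describe_tensorflow():
    """Return basic GPU-related facts about the installed TensorFlow, or None if absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return {
        "version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    info = describe_tensorflow()
    if info is None:
        print("TensorFlow is not installed in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

An empty `gpus` list usually means TensorFlow could not load the CUDA or cuDNN libraries, so the environment variables above are the first thing to recheck.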

Testing

Write a simple model to test the installation:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import backend as K

num_classes = 10
img_rows, img_cols = 28, 28

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# channel last
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=1024,
          epochs=20,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

While the model trains, run watch nvidia-smi to confirm that the GPU is being used:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0  On |                  N/A |
| 37%   59C    P2   212W / 255W |   3770MiB /  7974MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1232      G   /usr/lib/xorg/Xorg                            60MiB |
|    0      1943      G   /usr/lib/xorg/Xorg                           283MiB |
|    0      2145      G   /usr/bin/gnome-shell                         135MiB |
|    0      2551      G   ...AAAAAAAAAAAACAAAAAAAAAA= --shared-files   236MiB |
|    0      8349      G   ...quest-channel-token=1436292411171661387   296MiB |
|    0     27901      G   /usr/bin/totem                                18MiB |
|    0     56093      C   python3                                     2715MiB |
+-----------------------------------------------------------------------------+
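If you would rather log these numbers than watch the table, nvidia-smi can also print them as CSV via `--query-gpu`. The sketch below is one way to read that from Python; `parse_gpu_line` and `query_gpus` are names I made up, and I only query three fields (utilization, memory used, memory total).

```python
import shutil
import subprocess

QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '91, 3770, 7974' into a dict of integers."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_percent": util, "memory_used_mib": used, "memory_total_mib": total}


def query_gpus():
    """Return one dict per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            # Skip blank lines and fields the driver reports as unsupported.
            continue
    return gpus


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        print(index, gpu)
```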

You can also download the source code from https://github.com/tensorflow/benchmarks and use it for testing.

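Summary

That covers installing TensorFlow on Ubuntu 20.04. Ubuntu 20.04 was released only recently, though, so it is hard to say what other problems will turn up, and many tools do not support it yet. I recommend treating it as something to try out rather than migrating your development environment to it.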

When reposting, please include this post's URL: https://allenwind.github.io/blog/12238/
For more posts, see: https://allenwind.github.io/blog/archives/