How to install TensorFlow with an NVIDIA GPU, using the same GPU for both computing and display. The GPU in the example is a GTX 1080 and the system is Ubuntu 16 (updated for Linux Mint 19). TensorFlow is installed with Virtualenv. For a pip install of TensorFlow for CPU you can check here:
Installing tensorflow on Ubuntu google cloud platform
Steps described in this article:
- Prerequisite
- Install Required libraries
- Install Cuda Toolkit
- Install CUDNN
- Additional steps
- Install Tensorflow
- Create virtual environment
- Install Tensorflow
- Test Tensorflow
- Uninstall TensorFlow
- Script for activating TensorFlow
- TensorFlow Errors
Initial article setup:
- python 3.5
- Linux Mint 18
- CUDA 9.0
- CUDNN 7.1
Updated version (August 2018):
- python 3.6
- Linux Mint 19
- CUDA 9.2
- CUDNN 7.2
Why use a GPU instead of CPU-only TensorFlow? In my tests a single NVIDIA GTX 1080 was roughly 10 times faster than 24 CPUs on Google Cloud Platform. This is a rough measurement that leaves out many factors, but from a practical point of view the same neural network with the same training set takes 48 hours on 24 CPUs and 4 hours on a single 1080 (used in dual mode: display and compute), which is a factor of 12.
Prerequisite
In order to follow this article you need:
- Ubuntu or Linux Mint
- a CUDA-capable NVIDIA GPU
- Python installed
- basic knowledge of Linux terminal commands
Install Required libraries
Update: you can also install CUDA with:
sudo apt install cuda-9-0
Older versions of CUDA (like 7.0 and 8.0) can be found here:
CUDA Toolkit Archive
Install Cuda Toolkit
In order to use your CUDA GPU you need to install the CUDA Toolkit. At the time of writing the latest available version was 9.1, but it was not yet compatible with TensorFlow, so I had to downgrade to 9.0 in order to avoid this error:
libcublas.so.9.0: cannot open shared object file: No such file or directory
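A quick way to see whether the dynamic loader can find the CUDA libraries, before TensorFlow is involved at all, is to try opening them from Python. This is only a minimal sketch: the helper name can_load is made up for this example, and libcudart.so.9.0 is added as a second library on the assumption that CUDA 9.0 is the version you installed.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Same condition that produces "cannot open shared object file".
        return False
    return True


# libcublas.so.9.0 is the file named in the error message above;
# libcudart.so.9.0 is the CUDA 9.0 runtime (assumed version).
for name in ("libcublas.so.9.0", "libcudart.so.9.0"):
    print(name, "loadable" if can_load(name) else "NOT found by the loader")
```

If a library is reported as not found although CUDA is installed, the installed CUDA version or the library path is the first thing to check.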
The official documentation from NVIDIA is here: NVIDIA CUDA Installation Guide for Linux
In short:
1. Verify the kernel headers and gcc (more info at the link above). In my case everything was fine and no action was needed on Ubuntu 16 (also tested on Linux Mint 18).
2. Download the version for your system from: CUDA Toolkit 9.1 Download
2.1 Select version 9.0 (from legacy releases)
2.2 Operating System - Linux
2.3 Architecture - x86_64
2.4 Distribution - Ubuntu
2.5 Version - 16.04
2.6 Installer Type - deb (local) - Download the file
3. Run the following commands:
sudo dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
For 9.1:
sudo dpkg -i cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
Set environment variables:
Add the CUDA bin folder to your PATH by adding the line
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
or
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
to your ~/.bashrc with this command (or any other way of adding a single line to the file). sudo is not needed here, because the file belongs to your own user:
nano ~/.bashrc
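If you prefer not to edit the file by hand, adding the line can also be scripted. The sketch below appends a line only when it is not already present, so running it twice does not duplicate the export. The function name ensure_line is hypothetical, and the demo writes to a throwaway file rather than to your real ~/.bashrc.

```python
import os
import tempfile

# The export line from the step above (CUDA 9.0 variant).
CUDA_PATH_LINE = "export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}"


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Returns True if the file was changed, False if the line was already there.
    """
    path = os.path.expanduser(path)
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False
    with open(path, "a") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


if __name__ == "__main__":
    # Try it on a throwaway file first; pass "~/.bashrc" once you are happy.
    demo = os.path.join(tempfile.mkdtemp(), "bashrc")
    print(ensure_line(demo, CUDA_PATH_LINE))  # True: the line was added
    print(ensure_line(demo, CUDA_PATH_LINE))  # False: already present
```

Remember to open a new terminal (or run source ~/.bashrc) so that the change takes effect.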
Install CUDNN
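You also need to install cuDNN from this link: cuDNN 7.1
- Create an account
- Download cuDNN v7.1 (latest for 9.0 and 9.1)
- Select the cuDNN libraries for Linux: development, documentation and runtime
- Install them with the commands below (these file names are the CUDA 9.1 builds; use the names of the files you downloaded for your CUDA version):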
sudo dpkg -i libcudnn7_7.1.2.21-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.1.2.21-1+cuda9.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.1.2.21-1+cuda9.1_amd64.deb
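To confirm which cuDNN version ended up on the system, you can read the version macros from the cuDNN header. The sketch below parses the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines that cuDNN 7 keeps in cudnn.h. The header location /usr/include/cudnn.h is an assumption about where the deb packages place it, so adjust the path if yours differs. The function name is made up for this example.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) found in the text of cudnn.h, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#define\s+" + key + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample text in the same shape as the defines in the real header.
sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 2
"""
print(parse_cudnn_version(sample))  # (7, 1, 2)

# On a real system (path is an assumption, see above):
# with open("/usr/include/cudnn.h") as handle:
#     print(parse_cudnn_version(handle.read()))
```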
Install Nvidia Driver
You need to go to NVIDIA Driver Downloads and select the driver for your card. For the 1080 these are my filters:
- Product Type: GeForce
- Product Series: GeForce 10 Series
- Product: GeForce GTX 1080
- Operating System: Linux 64-bit
Install the driver by
sudo sh NVIDIA-Linux-x86_64-390.48.run
You may need to run it from a text-only terminal, without the graphical user interface running.
Additional steps
Run:
sudo apt-get install cuda-command-line-tools
Add the CUPTI folder to your library path by adding the line
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}/usr/local/cuda/extras/CUPTI/lib64
to your ~/.bashrc with:
nano ~/.bashrc
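The ${VAR:+...} parts in the two export lines can look cryptic. They only add the ":" separator when the variable already has a value, which avoids a stray leading or trailing colon. Here is a small Python sketch of the same logic; the function names are made up for illustration.

```python
def prepend_dir(current_value, new_dir):
    """Mirror  new_dir${VAR:+:${VAR}}  (the form used for PATH above)."""
    if current_value:
        return new_dir + ":" + current_value
    return new_dir


def append_dir(current_value, new_dir):
    """Mirror  ${VAR:+${VAR}:}new_dir  (the form used for LD_LIBRARY_PATH)."""
    if current_value:
        return current_value + ":" + new_dir
    return new_dir


CUPTI = "/usr/local/cuda/extras/CUPTI/lib64"

print(append_dir("", CUPTI))            # /usr/local/cuda/extras/CUPTI/lib64
print(append_dir("/opt/lib", CUPTI))    # /opt/lib:/usr/local/cuda/extras/CUPTI/lib64
print(prepend_dir("/usr/bin", "/usr/local/cuda-9.0/bin"))  # /usr/local/cuda-9.0/bin:/usr/bin
```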
You may need to install Java:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install default-jre
sudo apt-get install default-jdk
Check your GPU information
You may need to check your GPU information in order to avoid this error:
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
One solution is to make your GPU visible to CUDA by exporting this variable in the terminal where you run TensorFlow:
export CUDA_VISIBLE_DEVICES=0
To check your GPU information you can use any of the following commands:
lspci | grep -i nvidia
nvidia-smi
inxi -Fxz
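Not all of these tools are installed on every system. As a small sketch, the standard library function shutil.which can tell you which of them are available before you try to run them. The helper name find_tools is made up for this example.

```python
import shutil

# The three commands mentioned above.
TOOLS = ("lspci", "nvidia-smi", "inxi")


def find_tools(names=TOOLS):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, location in find_tools().items():
    print(name, "->", location or "not installed")
```

If nvidia-smi is missing, the NVIDIA driver is most likely not installed or not on your PATH.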
Another error that can be raised when you first test TensorFlow with GPU support is:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
In this case you need to check that the GPU driver is properly installed and working by running nvidia-smi. The output should look similar to this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48 Driver Version: 390.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:20:00.0 On | N/A |
| 0% 46C P8 15W / 200W | 528MiB / 8116MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1291 G /usr/lib/xorg/Xorg 255MiB |
| 0 1826 G cinnamon 93MiB |
| 0 2045 G ...-token=0131A404F88EDFB69CFBE87FE27CF2B5 8MiB |
| 0 2776 G ...-token=4213622D51C92E34E15CBCBCB028F7CA 60MiB |
| 0 25949 G ...-token=81FD9A54DC3DB7C8FBAA2380BC4090AB 68MiB |
| 0 26937 G ...-token=730AF28A86DCED16D77B4CD5AF0378A4 39MiB |
+-----------------------------------------------------------------------------+
If this is not the case you can reinstall the video card driver. To uninstall all graphics drivers related to NVIDIA run (the quotes keep the shell from expanding the pattern against files in the current folder):
sudo apt-get remove --purge 'nvidia*'
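If you want to check the driver version from a script instead of reading the table by eye, the header line of the nvidia-smi output shown above can be parsed. This is a minimal sketch that assumes the header keeps the "Driver Version: X.Y" wording from that output; the function name is made up.

```python
import re


def parse_driver_version(smi_output):
    """Return the driver version string from nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*([0-9.]+)", smi_output)
    return match.group(1) if match else None


# Header line copied from the output above.
header = "| NVIDIA-SMI 390.48 Driver Version: 390.48 |"
print(parse_driver_version(header))  # 390.48

# On a real system the text could come from the command itself:
# import subprocess
# text = subprocess.check_output(["nvidia-smi"]).decode()
# print(parse_driver_version(text))
```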
Install Tensorflow
I prefer to create a virtual environment for TensorFlow because:
- you can have several different versions of TensorFlow
- if something goes wrong you can easily fix it or build a new environment
- you have fewer conflicts between the modules and libraries required by different projects. For example, one project requires numpy 2.0 while another project requires a different version.
Create virtual environment
Before creating the environment you need to install several libraries:
sudo apt-get install -y python3-pip
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev
sudo apt-get install -y python3-venv
Create a folder for your environments. If you create the environment in your home folder you will be able to use the commands from the official documentation:
source ~/tensorflow/bin/activate
Otherwise you need to write a simple script and run it. You can check my script at the end of the post.
Creating the environment:
virtualenv --system-site-packages -p python3 tensorflow
Check the directory in which you are going to create the environment. In the example above the new environment is named tensorflow and it will be created in the current folder. If you want, you can choose a different name, like face:
virtualenv --system-site-packages -p python3 face
Then activate the environment with:
source ~/tensorflow/bin/activate # bash
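If the virtualenv command is not available on your system, Python's built-in venv module (from the python3-venv package installed above) can create a similar environment. Note that this is a different tool from the virtualenv command used in this article, so treat the sketch below as an alternative rather than the same method. The function name create_environment is made up for this example.

```python
import os
import tempfile
import venv


def create_environment(env_dir, with_pip=True):
    """Create a virtual environment that can also see system site packages.

    This is roughly what --system-site-packages does in the command above.
    """
    env_dir = os.path.expanduser(env_dir)
    venv.create(env_dir, system_site_packages=True, with_pip=with_pip)
    return env_dir


if __name__ == "__main__":
    # Demo in a throwaway folder. with_pip=False only keeps the demo quick;
    # for real use keep the default so that pip is available inside.
    demo_dir = os.path.join(tempfile.mkdtemp(), "tensorflow")
    create_environment(demo_dir, with_pip=False)
    print(sorted(os.listdir(demo_dir)))

# Real use, matching the folder from this article:
# create_environment("~/tensorflow")
```

The environment is activated the same way as before, with source ~/tensorflow/bin/activate.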
Make sure that pip is version 8.1 or later. You can upgrade it with:
easy_install -U pip
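To check the version without guessing, you can compare the output of pip --version against the 8.1 minimum. This is a small sketch; the function name is made up, and it assumes the output starts with "pip X.Y", which is the usual shape of that command's output.

```python
import re


def pip_is_at_least(version_line, minimum=(8, 1)):
    """Check the first line of `pip --version` against a (major, minor) minimum."""
    match = re.match(r"pip\s+(\d+)\.(\d+)", version_line)
    if match is None:
        raise ValueError("unexpected pip --version output: %r" % version_line)
    found = (int(match.group(1)), int(match.group(2)))
    return found >= tuple(minimum)


print(pip_is_at_least("pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.5)"))  # True
print(pip_is_at_least("pip 8.0.2 from /usr/lib/python3/dist-packages (python 3.5)"))  # False
```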
Install tensorflow
Installing tensorflow can be done by:
pip install --upgrade tensorflow-gpu # for Python 2.7 and GPU
pip3 install --upgrade tensorflow-gpu # for Python 3.n and GPU
Testing Tensorflow
You can activate your environment with the command below (to deactivate it, simply run the command deactivate):
source ~/tensorflow/bin/activate
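and test TensorFlow with this simple code:
import tensorflow as tf
# Build a constant string tensor (TensorFlow 1.x graph API).
hello = tf.constant('Hello, TensorFlow!')
# A session is needed to evaluate tensors in TensorFlow 1.x.
sess = tf.Session()
print(sess.run(hello))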
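The hello test only shows that TensorFlow imports and runs. It does not tell you whether the GPU is actually used. One way to check, as a sketch, is to list the devices that TensorFlow can see with device_lib.list_local_devices(). The wrapper function gpu_device_names is made up for this example and returns None when TensorFlow cannot be imported.

```python
def gpu_device_names():
    """Return the names of GPU devices visible to TensorFlow.

    Returns None if TensorFlow cannot be imported (for example when the
    environment is not activated or a CUDA library is missing).
    """
    try:
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    devices = device_lib.list_local_devices()
    return [d.name for d in devices if d.device_type == "GPU"]


names = gpu_device_names()
if names is None:
    print("TensorFlow could not be imported")
elif not names:
    print("TensorFlow works, but it sees no GPU")
else:
    print("GPUs visible to TensorFlow:", names)
```

An empty list usually means that the CPU-only package is installed or that the driver, CUDA or cuDNN setup is not being picked up.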
How to uninstall TensorFlow
Removing TensorFlow is simple: you only need to delete the folder of the environment that you created (myVirtEnv below stands for that folder):
rm -r myVirtEnv
Script for activating and running Tensorflow
This is a simple script that activates the TensorFlow virtual environment in the folder /home/user/Software/Tensorflow/environments/tensorflow and runs a training in the folder /home/user/Software/Tensorflow/myproject/faceRecognition/:
cd /home/user/Software/Tensorflow/environments/tensorflow
source ./bin/activate
cd /home/user/Software/Tensorflow/myproject/faceRecognition/
python3 train.py
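The same thing can be done from Python without the activate step, because running the environment's own interpreter (bin/python3 inside the environment folder) already uses the packages installed there. The sketch below uses the folders from the script above; the function names are made up for illustration.

```python
import os
import subprocess

ENV_DIR = "/home/user/Software/Tensorflow/environments/tensorflow"
PROJECT_DIR = "/home/user/Software/Tensorflow/myproject/faceRecognition/"


def training_command(env_dir, script="train.py"):
    """Build the command that runs `script` with the environment's Python."""
    return [os.path.join(env_dir, "bin", "python3"), script]


def run_training(env_dir, project_dir, script="train.py"):
    """Run the script from inside the project folder and return its exit code."""
    return subprocess.call(training_command(env_dir, script), cwd=project_dir)


print(training_command(ENV_DIR))

# Start the actual training (needs the folders above to exist):
# run_training(ENV_DIR, PROJECT_DIR)
```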
Errors
Error Failed to initialize NVML: Driver/library version mismatch
If you get the error:
Failed to initialize NVML: Driver/library version mismatch
you most probably need to restart your computer so that the video driver information is updated.
E cuda_driver.cc:466] failed call to cuInit: CUDA_ERROR_NO_DEVICE
The problem is most probably related to the environment variable CUDA_VISIBLE_DEVICES. Check what it is set to and, if needed, set it to 0 (note that there must be no spaces around the = sign):
export CUDA_VISIBLE_DEVICES=0
If this is not the cause, check the NVIDIA driver with:
nvidia-smi
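To see what the variable is set to from inside Python, you can read it from os.environ. The sketch below distinguishes three cases: the variable is not set at all, it is set but empty (which, as far as I know, hides every GPU from CUDA), or it lists device indexes. The function name visible_devices is made up for this example.

```python
import os


def visible_devices(environ=os.environ):
    """Interpret CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction),
    otherwise the list of device entries (an empty list hides all GPUs).
    """
    value = environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


print(visible_devices({}))                               # None
print(visible_devices({"CUDA_VISIBLE_DEVICES": "0"}))    # ['0']
print(visible_devices({"CUDA_VISIBLE_DEVICES": ""}))     # []
print(visible_devices())                                 # your current setting
```

If you set the variable from Python instead of the shell, do it with os.environ before importing tensorflow, otherwise it may be read too late.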
Not loading or using latest nvidia driver
If TensorFlow is not using the latest NVIDIA driver, the training of neural nets will take much longer. To verify which driver is in use, check:
nvidia-smi
and
dpkg --get-selections | grep nvidia
If you see several versions, like libnvidia-common-390 and libnvidia-common-396, it's better to remove one of them. In my case I removed 396 and this solved the issue after a restart.
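With many packages in the list, mixed driver versions are easy to miss. As a rough sketch, the output of dpkg --get-selections can be scanned for the version number at the end of the NVIDIA package names. The sample text below is made up to resemble that output, and the function name is hypothetical.

```python
import re


def driver_branches(selections_text):
    """Return the sorted driver version suffixes of installed nvidia packages."""
    branches = set()
    for line in selections_text.splitlines():
        fields = line.split()
        # Each line is "<package> <state>"; only count installed packages.
        if len(fields) != 2 or fields[1] != "install":
            continue
        # Match names like libnvidia-common-390 or libnvidia-gl-390:amd64.
        match = re.search(r"nvidia.*-(\d+)(?::\w+)?$", fields[0])
        if match:
            branches.add(match.group(1))
    return sorted(branches)


# Made-up sample in the shape of `dpkg --get-selections | grep nvidia`.
sample = """libnvidia-common-390\t\tinstall
libnvidia-common-396\t\tinstall
libnvidia-gl-390:amd64\t\tinstall
nvidia-settings\t\t\tinstall
nvidia-384\t\t\tdeinstall
"""
print(driver_branches(sample))  # ['390', '396']
```

More than one entry in the result means that packages from several driver versions are installed at the same time.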
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul_1'
You may need to check your GPU information in order to avoid this error:
InvalidArgumentError (see above for traceback): Cannot assign a device to node 'MatMul_1': Could not satisfy explicit device specification '/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:localhost/replica:0/task:0/cpu:0
One solution is to make your GPU visible to CUDA by exporting this variable in the terminal where you run TensorFlow:
export CUDA_VISIBLE_DEVICES=0
Resources and additional information
- official documentation - Installing TensorFlow
- useful guide for 1080 Ti computation only - Installing-Tensorflow-with-GPU
- short tutorial from DigitalOcean - How To Install and Use TensorFlow on Ubuntu 16.04
- NVIDIA CUDA Installation Guide for Linux