Install GPU TensorFlow from Sources with Ubuntu 16.04 and Cuda 8.0 RC (alliseesolutions.wordpress.com)
85 points by wagonhelm on Sept 8, 2016 | 32 comments


If using Docker is an option, the official Dockerfile works well, you just need to modify the FROM line to "nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04". Or "nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04", depending on which version of Ubuntu you want.

https://github.com/tensorflow/tensorflow/blob/master/tensorf...
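
A minimal sketch of that FROM-line change, in Python, in case it helps to script it. The function name `swap_base_image` and the sample Dockerfile text are made up for illustration; the real Dockerfile's contents are not reproduced here.

```python
import re

def swap_base_image(dockerfile_text, new_base):
    """Return dockerfile_text with its first FROM line pointing at new_base."""
    # A callable replacement avoids backslash/group interpretation in new_base.
    # count=1 rewrites only the first FROM line.
    return re.sub(
        r"(?im)^FROM\s+\S+",
        lambda match: "FROM " + new_base,
        dockerfile_text,
        count=1,
    )

# Placeholder Dockerfile text (hypothetical, not the official file).
original = "FROM some/base:tag\nRUN echo hello\n"
print(swap_base_image(original, "nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04"))
```

Swap in the `ubuntu14.04` tag instead if that is the release you want.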


What sort of performance impact would Docker have in this situation? Any at all?

Edit: spelling


No performance impact as long as your I/O is done in volumes, to avoid going through AUFS.


This seems to be a Frankenstein of CUDA 7.5 and CUDA 8.0 instructions, and similarly of Ubuntu 14.04 and 16.04. As far as I know, these instructions will fail with gcc 5.4 errors, amongst other issues.


GCC won't fail: the post includes a patch for GCC, and I tested on my system before posting. Running great!


One of your sections is named "Install Nvidia Toolkit 7.5", this is probably what confused parent @hughperkins.


Just found that one and fixed it, also someone confirmed working with a GTX 1080. I am so happy to finally ditch 14.04


Ah, I see what I was missing before: there's a patch 1 of cuda 8.0, which adds gcc 5.4 support, and is installed in the cuda section using:

    sudo sh cuda_8.0.27.1_linux.run --silent --accept-eula


Still doesn't work for me though: even on a new box, I get:

    ubuntu@somewhere:~/tensorflow$ python3 -c 'import tensorflow'             
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/ubuntu/tensorflow/tensorflow/__init__.py", line 23, in <module>
        from tensorflow.python import *
      File "/home/ubuntu/tensorflow/tensorflow/python/__init__.py", line 49, in <module>
        from tensorflow.python import pywrap_tensorflow
    ImportError: cannot import name 'pywrap_tensorflow'
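
One thing the traceback suggests: `tensorflow` is being imported from the source checkout (`/home/ubuntu/tensorflow/tensorflow/`), not from an installed wheel. Running Python from the top of the source tree puts that directory first on the import path, and the checkout has no built `pywrap_tensorflow` module. That may or may not be the whole story here. A rough sketch of a check, with the hypothetical helper name `source_tree_shadows`:

```python
import os

def source_tree_shadows(package, directory):
    """True if `directory` holds a `package/__init__.py` that Python would import first."""
    return os.path.isfile(os.path.join(directory, package, "__init__.py"))

if __name__ == "__main__":
    # With `python -c ...` the current directory is searched before site-packages.
    if source_tree_shadows("tensorflow", os.getcwd()):
        print("A local tensorflow/ directory shadows any installed wheel; try running from another directory.")
```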


Ok. Fixed this by two things:

- using branch r0.10, as suggested by https://news.ycombinator.com/item?id=12464835

- making sure to install the new r0.10 wheel, which has a different name than the r0 wheel built by master :-D


Thanks for sharing your solution!


Why does it just take following directions to make it to the front page of HN these days?


As stated elsewhere, this can actually be a very frustrating process. I lost a good chunk of my long weekend trying to build TF from source for CUDA 8.0 / cuDNN 5.1. Generally speaking the culprit is that the CUDA installers for Linux are highly dependent on your kernel and gcc versions. This is a huge headache for people who want to stay up-to-date on their distro packages. CentOS has no problem because hardly anything changes, but you're essentially handcuffed to whatever versions of Ubuntu or Fedora were out when NVIDIA decided to start packaging up the next release. Bumping gcc to 5.4 in Ubuntu 16.04.1 broke the 16.04 installer, which relied on gcc 5.3.
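
To make the gcc mismatch concrete, here is a small sketch that parses the first line of `gcc --version` and compares it with the 5.3 the comment says the installer relied on. The helper names and the sample version line are illustrative assumptions, not part of any NVIDIA tooling.

```python
import re

def parse_gcc_version(first_line):
    """Extract (major, minor, patch) from the first line of `gcc --version`."""
    # Drop the parenthesised distro tag, which contains version-like numbers of its own.
    stripped = re.sub(r"\(.*?\)", "", first_line)
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", stripped)
    if match is None:
        raise ValueError("no version number found in: " + first_line)
    return tuple(int(part) for part in match.groups())

def matches_installer(version, expected=(5, 3)):
    """True if the major.minor pair equals what the installer was built against."""
    return version[:2] == expected

# Illustrative version line in the style Ubuntu 16.04 prints.
sample = "gcc (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609"
print(parse_gcc_version(sample), matches_installer(parse_gcc_version(sample)))
```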


Because GPU-accelerated learning is exciting, and most of the directions you find for setting it up don't work. (Judging from other replies, this post may be no different.)

This probably has something to do with the fact that GPUs are flaky and idiosyncratic, and all the software that uses them depends on black-box libraries handed down by Nvidia, who is completely shit at maintaining software.



Thank you so much. I tried 3 times in the last week to get it working with Cuda 8.0


Have you actually succeeded yet in building a pip package linked with CUDA 8? I am doing basically the same process, and it only keeps failing.

https://github.com/tensorflow/tensorflow/issues/2559#issueco...


The pip package doesn't build for me either. And I even reinstalled Ubuntu to get a fresh Python installation


So strange, I wonder why it's working for me? Ubuntu 16.04.1? Fresh install? Following exactly? Python 3?


Which branch are you building?

For the first time I was able to complete a build last night, Ubuntu 16.04, CUDA 8.0 RC + compiler patch, cuDNN 5.1, nvidia-driver-370, python-2.7, and compute capability 6.1 (for Pascal GPU) - but only when I switched to the r0.10 branch.

With r0.10 I see none of the multiple failure modes that I always see with master. It just went straight ahead and compiled the whole thing.


I used master, but r0.10 seems to be solving the problem for most.


fwiw: twice now, I've successfully gotten a pip package linked with CUDA 8 & built Tensorflow from source — once for Python 2 and another for Python 3. Both on an Ubuntu 14.04 system


Many welcomes to you


I'm not sure what Tensorflow source you're compiling, but I've been trying many times recently and it fails in many, many different ways. It's a neverending maze of fail, basically. I've never seen the end of it yet. It failed today, too, so the code base is not getting better.

I'm using Ubuntu 16.04, CUDA 8.0RC + the gcc patch, cuDNN 5.1, nvidia-driver-[367|370], tensorflow-master, python-2.7. My process is basically identical to yours.

A few issues are listed here:

https://github.com/tensorflow/tensorflow/issues/2559#issueco...

In some cases, Bazel seems to be the culprit. In other cases, it's Tensorflow itself. I've also seen a "gcc: internal compiler error" https://github.com/tensorflow/tensorflow/issues/4214

Some issues with your howto:

There's a chapter title "Install Nvidia Toolkit 7.5 & CudNN" but the instructions below use 8.0RC

```

Configure TensorFlow Installation

$ cd ~/tensorflow
$ ./configure

Use defaults by pressing enter for all except:

Please specify the location of python. [Default is /usr/bin/python]:

```

No. If you do that it won't compile with GPU support. You have to hit Enter on every question except these ones:

- Do you wish to build TensorFlow with GPU support? (answer: y)

- Please specify a list of comma-separated Cuda compute capabilities you want to build with. (answer: 6.1, or less for older GPUs)

- Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: (answer 8.0)

You don't have to specify the cuDNN version, apparently it can detect the version automatically. It's only the CUDA version detection that fails. https://github.com/tensorflow/tensorflow/issues/3985
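
The three non-default answers above can be written down as a small lookup, which is handy as a checklist. This is only a sketch: the prompt fragments are copied from the list above, the helper `answer_for` is hypothetical, and nothing here actually drives `./configure`.

```python
# Non-default answers quoted in the list above; every other prompt gets Enter.
NON_DEFAULT_ANSWERS = {
    "Do you wish to build TensorFlow with GPU support?": "y",
    "Please specify the Cuda SDK version you want to use": "8.0",
    "Please specify a list of comma-separated Cuda compute capabilities": "6.1",
}

def answer_for(prompt):
    """Return the answer to type for a configure prompt ('' means just press Enter)."""
    for fragment, answer in NON_DEFAULT_ANSWERS.items():
        if fragment in prompt:
            return answer
    return ""

print(repr(answer_for("Please specify the location of python. [Default is /usr/bin/python]:")))
```

Use a lower compute capability than 6.1 for older GPUs, as noted above.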

"You must also have the 361.42 NVidia drivers installed"

No, that would not work with Pascal GPUs.

The only way I've seen it work is if you install CUDA 7.5 and cuDNN 4, and install Tensorflow from the binary package. But then you get weird errors if you run complex models on Pascal GPUs, because CUDA 7.5 doesn't work well with Pascal.

Seriously, if you made it work on Ubuntu 16.04 with CUDA 8 and it's GPU enabled, please upload the pip package somewhere. I'd love to give it a try.


Just follow the instructions friend.


I may go ahead and do a literal clone of your instructions. However, looking at your process, it's what I do, step by step, AFAICT without actually going ahead and doing it.

It's also the fact that it fails in so many different ways. Bazel bombs out after ./configure; the master branch today does not even begin to build at all, and the old Bazel workaround is not working anymore. Then there are the gcc issues.

You may have gotten lucky once, who knows why.

Again, do you still have the pip package you claim you've built using this procedure? If so, can you upload it somewhere? I would very much like to test it. Thank you.


https://www.dropbox.com/s/0cdoy7e8xh54wrx/tensorflow-0.10.0r...

Here is the pip3 wheel but I am skeptical given it was built for my system.


The pip package is building right now; I will share a Dropbox link soon.


Sorry, I missed that other 7.5 reference; replaced it with 8. I copied and pasted my 7.5 instructions, as they were mainly the same.


And yes, sorry, I thought hitting yes for GPU support was obvious and did not see the need to type that one out.

There is also a link to the list of CUDA compute capabilities in the post.


"No, that would not work with Pascal GPUs."

I cannot test this, but you could possibly just use whatever driver version works for your GPU and make sure the run file does not install different drivers.


fwiw, deeplearning4j is a much easier install: http://deeplearning4j.org/quickstart



