
GPU Headaches: Notes on Installing CUDA, CuDNN and Tensorflow on Manjaro - leblancfg
https://leblancfg.com/installing-cuda-cudnn-tensorflow-nvidia-gtx960.html
======
mippie_moe
My company (Lambda Labs) made it possible to install cuda, cudnn, tensorflow,
pytorch, and other deep learning frameworks with one line:

[https://lambdalabs.com/lambda-stack-deep-learning-software](https://lambdalabs.com/lambda-stack-deep-learning-software)

We wrote Debian packages for every framework, including cuda and cudnn. Using
our Debian repository, you can install all these frameworks using
apt/aptitude.

When a new version of a framework comes out, we usually have it available in
1-2 weeks.

~~~
alfalfasprout
Any chance you can get MxNet (and Keras MxNet) in there?

Installing this stuff is a huge nuisance for us and we have some pretty insane
Dockerfiles to handle all the different combinations. I might look into using
this for our ML images.

~~~
mippie_moe
I’ll talk to the team!

------
tebruno99
Yeah, not sure why you would be in GRUB for Nvidia graphics drivers. This
whole thing seems pretty odd to me since I've not seen an Nvidia driver problem
in several years. Was he trying to use something like Ubuntu 12.04?

Manjaro is a great choice and it does "just work" the best. All the others
work just fine though.

~~~
jamesblonde
I have had issues with CentOS 7.2 and Nvidia drivers. Nvidia assumes the Linux
kernel sources are installed, but CentOS 7.2 puts them in a directory that the
Nvidia driver can't find. I ended up having to scp sources from a working
machine and add symbolic links to get the drivers working (1080 Ti card; I
assume the Tesla drivers work better).
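
The check being described can be sketched in shell. This assumes the common
convention that `/lib/modules/<kernel release>/build` points at the kernel
build tree, which is where out-of-tree module builds usually look;
`kernel_build_dir_ok` is a hypothetical helper, not something shipped with the
Nvidia installer.

```shell
# Sketch: is the build tree for a given kernel release where driver builds
# conventionally expect it? (kernel_build_dir_ok is a hypothetical helper.)
kernel_build_dir_ok() {
    modules_root=$1      # normally /lib/modules
    release=$2           # normally "$(uname -r)"
    build_dir="$modules_root/$release/build"
    # -d follows symlinks, so a dangling link counts as missing.
    if [ -d "$build_dir" ]; then
        echo "found: $build_dir"
    else
        echo "missing: $build_dir"
        return 1
    fi
}

# Check the running kernel before starting a driver build.
kernel_build_dir_ok /lib/modules "$(uname -r)" \
    || echo "kernel sources for the running kernel are not where the build expects them"
```

If the path is reported missing, copying the sources over and symlinking them,
as described above, presumably amounts to making that path resolve.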

~~~
tebruno99
CentOS is based on a stack that is nearing 5 years old so that makes sense.
The driver world (even mesa itself) has matured greatly since then. Mesa and
Nvidia even have libraries now that help them coexist instead of
replacing/overwriting library files.

------
illumin8
So basically, your calculations for deciding whether purchasing physical GPUs
or using cloud GPUs is more cost effective should add $5,000 (50 hours at $100
an hour seems reasonable) onto the cost of any physical rig, and assume you
have a spare 50 hours of opportunity cost to set it up.

All the ML researchers lusting for bare metal need to decide if they want to
be "Gentoo Ricers" - [https://funroll-loops.teurasporsaat.org/](https://funroll-loops.teurasporsaat.org/) - or if
they just want to train models and get real work done.

~~~
_Wintermute
50 hours is definitely an outlier. I'm pretty clueless when it comes to linux
and drivers, and I managed to get an nvidia GPU working on ubuntu in about an
hour or two of swearing and a dozen reboots.

~~~
swebs
I don't know what you guys are doing, but for me installing proprietary Nvidia
drivers on Ubuntu was just a matter of clicking a checkbox.

------
perfinion
Wow that's quite a journey. I had a similar one when I started maintaining
TensorFlow on Gentoo Linux. It's working pretty well now, emerge tensorflow
will have everything installed and working with no fuss.

------
misterbrian88
I spent a long time trying to set this up as well. Ultimately, I was
successful using nvidia-docker. Here’s how I did it
[http://briancaffey.github.io/2017/11/19/tensorflow-gpu-setup...](http://briancaffey.github.io/2017/11/19/tensorflow-gpu-setup-with-docker-on-arch-linux.html)

------
gue5t

      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64
      
      Reboot and cross your fingers.
    

Rebooting will terminate the process tree in which the environment has been
modified, making this line a no-op. Please take the time to understand the
commands you run before suggesting them to others.
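
A small shell sketch of the point above: an `export` changes only the current
process and whatever it spawns afterwards. `DEMO_VAR` and `persist_export` are
made-up names for illustration, and `~/.profile` as the startup file is an
assumption that depends on your shell and login setup.

```shell
# Path taken from the quoted line.
CUPTI_DIR=/usr/local/cuda/extras/CUPTI/lib64

# 1. A child shell exports a variable and exits; the parent never sees it.
unset DEMO_VAR
sh -c 'export DEMO_VAR=set_in_child'
echo "after child exits: DEMO_VAR='${DEMO_VAR:-}'"   # value is empty here

# 2. To survive a new login (or a reboot), the line has to live in a file the
#    shell reads at startup. persist_export (hypothetical helper) appends the
#    line to such a file unless an identical line is already there.
persist_export() {
    profile_file=$1
    line=$2
    grep -qxF -- "$line" "$profile_file" 2>/dev/null \
        || printf '%s\n' "$line" >> "$profile_file"
}

# Example call, not run here (single quotes keep $LD_LIBRARY_PATH literal):
# persist_export "$HOME/.profile" 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:'"$CUPTI_DIR"
```

Only the second approach has any effect after a reboot, since the file is read
again by the new login shell.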

~~~
fooker
Now I see why the author had so much trouble.

------
nextos
The only two distributions I've tried where this worked with a one-liner were
NixOS and Arch. I'm surprised this doesn't work in Manjaro given that it's an
Arch derivative. But I don't see much point in Arch derivatives, as they do
away with Arch's selling point: no layers between you and upstream.

~~~
leblancfg
I see what you mean, and I agree with you in theory. The long learning curve
and installation time for Arch was a non-starter for me though.

Back to your point, Manjaro runs directly on pacman repos, not a layer between
it and upstream. It just comes with IMO sane defaults and an installer GUI.

~~~
nextos
When did you try Arch? With the adoption of systemd it's incredibly easy to
configure _if_ you aim for a simple setup with no desktop environment, or
something minimal.

If not, I think it doesn't make any sense to use it unless you are a seasoned
user. Rolling release will make running big frameworks like KDE tricky, as
things will change often. And unless you have set everything up yourself, you
won't know where to look in order to fix things.

I'd recommend you look into NixOS if you want to run a desktop environment.
It's radically different (totally functional and declarative, whereas Arch is
imperative). All Python and deep learning stuff is neatly packaged and tested.
And rollbacks are trivial. There's no state.

You can even run Nix in other distros or macOS.

~~~
beojan
> Rolling release will make running big frameworks like KDE tricky, as things
> will change often.

On the contrary, you get new releases (with bugfixes) much sooner.

------
jorgemf
Archlinux user here (Manjaro is based on Archlinux). TensorFlow with GPU
support is a supported package; you can install it with all its dependencies
in a single Pacman command. But you don't want to do that, the same way
you don't want to compile TensorFlow. It is better to use an official Docker
image because they include all the dependencies, even optional ones such as
NCCL 2.0. You forgot to install it, and it speeds things up if you use more
than one GPU. Just use the official Docker image and forget the burden of
configuring everything.

P.S. Well, you need to install Docker with Nvidia support, but you can use it
in the future for other things. This is also a couple of packages in Pacman
(maybe Nvidia Docker is in the AUR, I don't remember now).

~~~
leblancfg
That's a fair point! I tried installing the pre-packaged conda Tensorflow
package, but it was expecting CUDA 9.0 and CuDNN 7.0. The ones I had finally
managed to install were 9.2 and 7.2... hence compiling it from source.

I should add that as an addendum to the article, though, great catch!
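
For illustration, the mismatch can be written down as a quick shell check. This
is a rough sketch: `major_minor`, `versions_match`, and `report` are
hypothetical helpers, the version numbers are the ones quoted in this comment,
and comparing only major.minor is an assumption about how strict the prebuilt
package is.

```shell
# Keep only the major.minor part of a version string (9.0.176 becomes 9.0).
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeeds when both versions agree on major.minor.
versions_match() {
    [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# report NAME EXPECTED FOUND
report() {
    if versions_match "$2" "$3"; then
        echo "$1: ok ($3)"
    else
        echo "$1: prebuilt binary expects $2, found $3"
    fi
}

# Numbers from the comment: the conda package expected 9.0 / 7.0,
# while the machine had 9.2 / 7.2 installed.
report CUDA 9.0 9.2
report cuDNN 7.0 7.2
```

Both lines report a mismatch, which is the situation that led to building from
source.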

------
acoye
Do you want to know pain? Try to use OpenCL with a GCN1 GPU (AMD 7970) with
TF.

You can more or less do it with SYCL / ComputeCPP, which is great, but it is
not trivial either. It does require you to compile TF, and some operations
may fail to run.

Full disclosure: I have not retried it recently. Last time was 6 months ago,
and a great deal of work is being done on the SYCL / ComputeCPP front.

~~~
acoye
… Which is somewhat sad, given AMD's open drivers are terrific and a pleasure
to use on my machine, even for Steam games.

------
acd
It is usually a headache to install closed-source software on Linux. FFmpeg
with CUDA hardware-accelerated encoding support is also a pain in the... to
install.

If graphics card makers open-sourced their drivers, the Linux distros would
ship them by default, installable via apt-get, pacman, dnf, or yum.

------
romaniv
I wonder when we will get to the point where computers (end-user computers)
learn using ML techniques. Will we ever? Will it ever become accessible to
non-developers?

------
solomatov
The best way to set up TensorFlow for GPU is to use nvidia-docker and pre-built
Docker images. They mention it in their documentation.

~~~
leblancfg
Author here. You would still need working NVIDIA drivers and CUDA to use
nvidia-docker, which I couldn't manage to do.

------
innocenat
Add Bumblebee to the mix and you get even more of a headache. Argh.

