
Setting Up a Deep Learning Machine from Scratch - IamFermat
https://github.com/saiprashanths/dl-setup
======
minimaxir
The dependency hell required to run a good deep learning machine is one of the
reasons why using Docker/VM is not a bad idea. Even if you follow the
instructions in the OP to the letter, you can still run into issues where a)
an unexpected interaction with permissions/other package versions causes the
build to fail and b) building all the packages can take an hour+ to do even on
a good computer.

The Neural Doodle tool
([https://github.com/alexjc/neural-doodle](https://github.com/alexjc/neural-doodle)),
which appeared on HN a
couple months ago
([https://news.ycombinator.com/item?id=11257566](https://news.ycombinator.com/item?id=11257566)),
is very difficult to set up without errors. Meanwhile, the included Docker
container (for the CPU implementation) can get things running immediately
after a 311MB download, even on Windows which otherwise gets fussy with
machine learning libraries. (I haven't played with the GPU container yet,
though)

Nvidia also has an interesting implementation of Docker which allows
containers to use the GPU on the host:
[https://github.com/NVIDIA/nvidia-docker](https://github.com/NVIDIA/nvidia-docker)

~~~
jre
I can't speak for caffe, but I got a new machine up and running with keras in
about half an hour last week without having to compile anything:

- install the nvidia drivers, cuda and cudnn, which all come as ubuntu
  packages.

- use anaconda for python

- conda install theano

- pip install tensorflow as per the website

- pip install keras

And voilà, I got my keras models running on my shiny new GTX 980.
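
A quick way to confirm that an install like the one above actually worked is to try importing each framework. The following is only a minimal sketch: the module names are taken from the steps listed, and the `smoke_test` helper is a made-up name for illustration, not part of any of these libraries.

```python
import importlib

# Module names taken from the install steps above; adjust for your own stack.
FRAMEWORKS = ("theano", "tensorflow", "keras")


def smoke_test(names=FRAMEWORKS):
    """Try to import each module and report its version or the failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # a broken native library can raise more than ImportError
            report[name] = "FAILED: {}: {}".format(type(exc).__name__, exc)
        else:
            report[name] = getattr(module, "__version__", "unknown version")
    return report


if __name__ == "__main__":
    for name, status in smoke_test().items():
        print("{:<12} {}".format(name, status))
```

An import that succeeds does not prove the GPU is being used, only that the Python side of the stack loads.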

~~~
zenlikethat
Keep in mind that you also knew how to do this ahead of time (installing the
CUDA-related libraries is overwhelmingly difficult if you haven't done it
before), and didn't run into any issues with numpy / scipy version
compatibility (I've had quite a bit of "fun" having to install numpy etc. from
source in the past), and were presumably lucky enough to have a well supported
GPU.

Is there a motivation for anaconda other than "it includes the stuff we
usually need"? It strikes me as somewhat strange that anaconda is so often
recommended but it forces a divergence from using the mainline packages with
vanilla Python.

~~~
lqdc13
numpy/scipy are super easy to install (apt-get install python3-scipy) if you are
not trying to link it to a manually compiled ATLAS/MKL. Otherwise you have to
download and modify config files to point to the lib in case of MKL and update
alternatives for libblas/liblapack for both MKL/ATLAS.

Cuda on the other hand is annoying because you have to both make sure the
driver works with multiscreen setups and that cuda links against that driver
correctly and uses a specific gcc version.
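
Before debugging a CUDA install, it can help to see which of the pieces mentioned here are even on the PATH and what versions they report. This is a rough sketch, assuming the standard `nvidia-smi`, `nvcc --version` and `gcc --version` commands; the `probe` and `check_toolchain` helpers are hypothetical names, and the script does not verify that the versions are compatible with each other.

```python
import shutil
import subprocess

# Tools named in the comment above, with arguments that print version info.
TOOLS = {
    "nvidia-smi": ["nvidia-smi"],
    "nvcc": ["nvcc", "--version"],
    "gcc": ["gcc", "--version"],
}


def probe(command):
    """Return the tool's output, or None if it is missing or fails."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def check_toolchain(tools=TOOLS):
    """Map each tool name to its output (or None when unavailable)."""
    return {name: probe(command) for name, command in tools.items()}


if __name__ == "__main__":
    for name, output in check_toolchain().items():
        print("== {}: {}".format(name, "ok" if output else "missing or failing"))
        if output:
            print(output)
```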

~~~
zenlikethat
apt-get and/or pip have frequently given me versions of numpy and/or scipy
behind what TensorFlow, Theano, Keras etc. want resulting in cryptic errors
that don't show up until attempting to run a script.
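
One way to surface this kind of mismatch before a script gets far enough to fail cryptically is to check installed versions up front. The sketch below assumes Python 3.8+ for `importlib.metadata`; the minimum versions in `MINIMUMS` are made-up placeholders, not the real requirements of any framework, and the version parsing is deliberately simplistic.

```python
import re
from importlib import metadata  # Python 3.8+

# Hypothetical minimums for illustration only; take real values from the
# install notes of the framework release you are using.
MINIMUMS = {"numpy": "1.11.0", "scipy": "0.17.0"}


def as_tuple(version):
    """Turn '1.11.0rc1' into (1, 11, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def too_old(installed, minimum):
    """Compare numerically, padding so that '1.11' equals '1.11.0'."""
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def check(minimums=MINIMUMS):
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append("{} is not installed (need >= {})".format(name, minimum))
            continue
        if too_old(installed, minimum):
            problems.append("{} {} is older than {}".format(name, installed, minimum))
    return problems


if __name__ == "__main__":
    issues = check()
    print("\n".join(issues) if issues else "all version requirements met")
```

Calling `check()` at the top of a training script and aborting when it returns anything turns a late, cryptic failure into an early, readable one.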

~~~
pwang
Yes, that's why people don't use apt-get or pip, and rather install Anaconda.

Pip and wheels are still not really suitable for scientific Python work,
because the metadata facilities are not sufficiently rich to capture all the
information needed for proper linking of native libraries. By contrast, in
Anaconda, things like MKL linkage and Fortran compiler information can be used
in the package metadata for dependency solving, to minimize these kinds of
compatibility issues.

~~~
zenlikethat
Interesting and thanks for the summary, seems the motivations are a bit
clearer to me now. Is there intention in moving Anaconda's unique features
upstream?

~~~
pwang
Well, kind of. We've tried to work with the python core packaging folks to
improve the built-in metadata facilities. (There has been a checkered history
there in terms of reception to our ideas...)

In terms of making these packages easier to build, that's really not actually
where the problem is. The fact that numpy, scipy, cython, etc. need to have a
shared compiler and build toolchain is really a result of how operating systems
and the C language ABI work.

------
JackFr
Disappointed. Misread it -- I thought he was going to do deep learning _with_
[https://scratch.mit.edu/](https://scratch.mit.edu/), not _from_ scratch.

~~~
harryf
Likewise. Might be fascinating for kids to have some "ready to roll" machine
learning routines, e.g. watch the cat explore and learn how to get around a maze.

------
pilooch
Commoditizing deep learning is mandatory. After repeated production installs
at various companies, each hooked into existing pipelines, I've convinced some
of them to sponsor a commoditized open source deep learning server.

Code is here:
[https://github.com/beniz/deepdetect](https://github.com/beniz/deepdetect)

There are separate CPU and GPU Docker versions, and as mentioned elsewhere in
this thread, they are the easiest way to set up even a production system
without a critical impact on performance, thanks to nvidia-docker. They seem
to be more popular than AMIs within our little community.

------
mastazi
I'm sorry if this is only tangentially on topic:

I was reading the article and got to the part related to installing CUDA
drivers.

I am currently on the market for a laptop which will be used for self-learning
purposes and I am interested in trying GPU-based ML solutions.

In my search for the most cost-effective machine, some of the laptops that I
came across are equipped with AMD GPUs and it seems that support for them is
not as good as for their Nvidia counterparts: so far I know of Theano and
Caffe supporting OpenCL and I know support might come in the future from
TensorFlow [1], in addition I saw that there are solutions for Torch [2]
although they seem to be developed by single individuals.

I was wondering if someone with experience in ML could give me some advice: is
the AMD route viable?

[1]
[https://github.com/tensorflow/tensorflow/issues/22](https://github.com/tensorflow/tensorflow/issues/22)

[2]
[https://github.com/torch/torch7/wiki/Cheatsheet#opencl](https://github.com/torch/torch7/wiki/Cheatsheet#opencl)

~~~
SixSigma
We used to have a saying "friends don't make friends run closed software".

I guess those days have gone.

~~~
iaml
Nvidia is doing a pretty damn good job at staying the only viable choice for
machine learning. Or amd's lack of action is, anyway.

------
zacharyfmarion
I posted something similar on my blog
([http://zacharyfmarion.io/machine-learning-with-amazon-ec2/](http://zacharyfmarion.io/machine-learning-with-amazon-ec2/))
not too long ago. Would be nice if there was a tool that set all of this up
for you!

------
vonnik
I work on Deeplearning4j, and I'm told that the install process is not too
hellish. Feedback welcome there:

[http://deeplearning4j.org/quickstart](http://deeplearning4j.org/quickstart)

[http://deeplearning4j.org/gettingstarted](http://deeplearning4j.org/gettingstarted)

Someone in the community also Dockerized Spark + Hadoop + OpenBlas:

[https://github.com/crockpotveggies/docker-spark](https://github.com/crockpotveggies/docker-spark)

The GPU release is coming out Monday.

~~~
agibsonccc
For Scala folks, we're working on the corresponding Docker containers for a
Spark + CUDA setup as well.

We'll also make it so you can experiment with models etc. from a notebook:
[https://github.com/andypetrella/spark-notebook](https://github.com/andypetrella/spark-notebook)

Not exactly for the python crowd (we mainly try to be an alternative for the
jvm stack)

------
profen
The steps are pretty neat. Also agree on the driver and tools installation.
Just painful and long.

Looks like there are separate Torch and Caffe AMIs as well for Amazon. Going
to try later.

[https://aws.amazon.com/marketplace/pp/B01B52CMSO](https://aws.amazon.com/marketplace/pp/B01B52CMSO)

[https://aws.amazon.com/marketplace/pp/B01B4ZSX5S](https://aws.amazon.com/marketplace/pp/B01B4ZSX5S)

~~~
zenlikethat
The issue with using posted AMIs for this is the same as usual: they include
god knows what else in addition to the installed and configured software
(which is likely to also lag behind master / latest release quite a bit). Last
few AMIs I tried for this included some random public keys as authorized users
in a sudoer account! While they're likely benign (belonging to researchers
that created these images), that'd be a nasty surprise to find in your data
pipeline down the line.
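
Checking a freshly booted image for unexpected keys is easy to script. The following is a rough sketch only: it assumes the usual `/root` and `/home/*` locations for `authorized_keys`, uses a simplified parser that does not handle quoted options containing spaces, and the helper names are made up for illustration.

```python
import glob

# Key type names start with one of these prefixes (e.g. ssh-rsa, ecdsa-sha2-...).
KEY_PREFIXES = ("ssh-", "ecdsa-", "sk-")

# Assumed default locations; images may place keys elsewhere.
DEFAULT_PATTERNS = ("/root/.ssh/authorized_keys", "/home/*/.ssh/authorized_keys")


def parse_authorized_keys(text):
    """Return (key_type, comment) for each key line in authorized_keys text."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        for i, field in enumerate(fields):
            if field.startswith(KEY_PREFIXES):
                # Layout after any options: key type, base64 key, free-form comment.
                comment = " ".join(fields[i + 2:]) or "(no comment)"
                entries.append((field, comment))
                break
    return entries


def audit(patterns=DEFAULT_PATTERNS):
    """Map each authorized_keys file found to its parsed entries."""
    findings = {}
    for pattern in patterns:
        for path in glob.glob(pattern):
            try:
                with open(path) as handle:
                    findings[path] = parse_authorized_keys(handle.read())
            except OSError:
                findings[path] = None  # present but unreadable; rerun as root
    return findings


if __name__ == "__main__":
    for path, entries in audit().items():
        print(path)
        for key_type, comment in entries or []:
            print("  {} {}".format(key_type, comment))
```

Any comment you do not recognize is worth removing before the instance touches real data.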

~~~
mbajkowski
This is a valid concern, which is one of the reasons we publish these AMIs
through the AWS marketplace. Each of these AMIs had to go through the AWS
security checker script as well as a manual review by the AWS marketplace
team, please see the "Securing an AMI" section here.

[https://aws.amazon.com/marketplace/help/201231340](https://aws.amazon.com/marketplace/help/201231340)

Going through the AWS audit does take a few days to say the least and can be a
hassle at times, but usually we are pretty close to the latest master /
release.

~~~
zenlikethat
Cool, didn't know that about the Marketplace. Thanks for sharing.

------
visarga
Is there a host offering GPU systems preconfigured with ML frameworks and
models, for playing around? Something simple to use like Digital Ocean.

~~~
mbajkowski
I'm one of the devs for some of the AWS AMIs mentioned a few comments below
which have the frameworks and examples installed, and run on CPU as well as
GPU instances. We have several AMIs including one for TensorFlow:

[https://aws.amazon.com/marketplace/pp/B01EYKBEQ0/ref=_ptnr_h...](https://aws.amazon.com/marketplace/pp/B01EYKBEQ0/ref=_ptnr_hn)

Would love to get some feedback from anyone who gives them a spin about what
we could do better - or which AMIs we should add that people may find useful.

If you are not familiar with AWS, we have a quick-start blog post here as well:

[http://www.bitfusion.io/2016/05/09/easy-tensorflow-model-tra...](http://www.bitfusion.io/2016/05/09/easy-tensorflow-model-training-aws/)

~~~
dbcurtis
Pardon the n00b question, but I'm near the bottom of the learning curve on
this.

It looks like this runs on GPU-less instances, as well as Gx-instances. So,
how do you envision this being used? Would someone do prototyping on the cheap
instances and then move up to the Gx instances for production? Is that move
transparent?

~~~
mbajkowski
You could do precisely that. Get started on a small instance and play around
with one of the frameworks (this is one of the reasons we integrated Jupyter
into the AMIs, so that you can quickly write some Python code from the browser
without having to ssh into the instance). Then, when everything checks out,
migrate the image (by creating a snapshot) and boot it on a more powerful
instance.

For TensorFlow if an operation has both CPU and GPU implementations, the GPU
devices will be given priority (if present on the instance) when the operation
is assigned to a device. For Caffe we have both the GPU and CPU version
installed.
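
The placement rule described for TensorFlow can be modeled in a few lines, which also shows why the same image behaves sensibly on both instance types. This is a toy model in plain Python, not TensorFlow code: the op names and the `KERNELS` table are invented for illustration.

```python
# Toy model only: which device types each (made-up) op has an implementation for.
KERNELS = {
    "matmul": {"CPU", "GPU"},
    "decode_jpeg": {"CPU"},
}


def place(op, devices_present):
    """Pick a device type for op, preferring GPU when both are possible."""
    usable = KERNELS[op] & set(devices_present)
    if not usable:
        raise RuntimeError("no usable device for {!r}".format(op))
    return "GPU" if "GPU" in usable else "CPU"


if __name__ == "__main__":
    for devices in (["CPU"], ["CPU", "GPU"]):
        for op in KERNELS:
            print("{} on {} -> {}".format(op, devices, place(op, devices)))
```

On a CPU-only instance everything lands on the CPU; on a GPU instance only the ops that have a GPU implementation move over.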

------
profen
Have used this digits ami on aws in the past for caffe and torch.

[https://aws.amazon.com/marketplace/pp/B01DJ93C7Q/ref=srh_res...](https://aws.amazon.com/marketplace/pp/B01DJ93C7Q/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1463261339052)

------
amelius
Step 1: make sure that your machine has sufficient free PCI slots for the GPU
cards, and that you have sufficient physical space inside the machine.

Seriously... why can't there be a better way of adding coprocessors to a
machine? Like stacking some boxes, interconnected by parallel ribbon cable, or
something like that?

~~~
MikeTLive
Google for "external PCI card cage" these have been around for over 15 years.

------
tzz
If someone creates a Juju Charm
[https://jujucharms.com](https://jujucharms.com) for this, then you can use
the pre-configured service on any of the major public clouds.

~~~
agibsonccc
You mean this?
[https://github.com/SaMnCo/layer-skymind-dl4j](https://github.com/SaMnCo/layer-skymind-dl4j)

We work with the ubuntu team on juju.

~~~
nl
That doesn't do the CUDA/cuDNN stuff, does it? (I.e., the hard, error-prone
part.)

~~~
agibsonccc
Right. So we'll be adding cuda and all that as well.

We are working very closely with Canonical/IBM on the whole DL stack[1]. You
will also see some stuff from us and Nvidia here within the next month or so
on making CUDA a bit easier to set up in a normal data science "stack", e.g.
JVM/Python hybrid product stacks. cuDNN has tricky licensing, but it shouldn't
be that bad to automate setting up the CUDA part.

[1]: [https://insights.ubuntu.com/2016/04/25/making-deep-learning-...](https://insights.ubuntu.com/2016/04/25/making-deep-learning-accessible-on-openstack/)

------
tacos
I don't understand the fascination with these "make a list" style setup
instructions, as they're almost immediately outdated, and seldom updated.

We have AMIs, we have Docker, we have (gasp) shell scripts. It's 2016. Why am I
cutting and pasting between a web page and a console?

To my knowledge the only thing that does something like this well is
oh-my-zsh. And look at the success they've had! So either do it right, or
don't do it at all.

~~~
danso
> I don't understand the fascination with these "make a list" style setup
> instructions...

> ...as they're almost immediately outdated, and seldom updated.

Your second sentiment is the reason why I appreciate the "make a list"-style
tutorials. When something inevitably goes out of date, I can at least see some
of the narrative and reasoning for each step, instead of trying to debug
someone's shell script that they've left to obsolescence.

Even better is when I have 3 such tutorial-lists to compare, making it easier
to see which steps are integral and which steps were simply author-specific
conventions.

------
raverbashing
No, you don't need to restart your machine after you install CUDA.

Also you might not need to restart after you install the drivers, this is not
Windows. (But there might be some rmmod/modprobe needed)

> If your deep learning machine is not your primary work desktop, it helps to
> be able to access it remotely

Yes, use ssh.

~~~
modeless
Your snark is outdated. Windows can upgrade graphics drivers without a reboot
these days and even supports hotplugging GPUs to a limited extent. Meanwhile,
like many others I regularly have to perform console surgery on my Linux
machines when they fail to boot to X after fiddling with the graphics drivers.
(My latest discovery is that Ubuntu's auto-updates will auto-destroy your
NVIDIA drivers if you have an unexpected version of GCC set as default).
Graphics drivers are emphatically _not_ something Linux people should be
crowing about to Windows users.

~~~
raverbashing
> My latest discovery is that Ubuntu's auto-updates will auto-destroy your
> NVIDIA drivers if you have an unexpected version of GCC set as default).
> Graphics drivers are emphatically not something Linux people should be
> crowing about to Windows users.

I have to agree. Especially when the Ubuntu package drivers ruin your system.

~~~
hondaz54
Yeah but do not blame Ubuntu or other distributors or Linux, but NVidia, who
ship shitty proprietary drivers that do not integrate well in the *nix system.

~~~
PascLeRasc
I've had massive problems with my Nvidia GPU using anything but the
open-source driver. I'm not sure how much performance I'm losing but it's
worth not spending hours debugging.

