
Nvidia-Docker – Build and Run Docker Containers Leveraging Nvidia GPUs - dgellow
https://github.com/NVIDIA/nvidia-docker
======
klueska
If you're using DC/OS you can also run docker containers that leverage GPUs.
We worked with Nvidia to mimic the same functionality provided by nvidia-docker.
Here's a talk I gave at this year's Nvidia GTC conference explaining how it works:

[https://drive.google.com/file/d/0B7vZqCY-AJrpMEYyVmZkVGlyOEE...](https://drive.google.com/file/d/0B7vZqCY-AJrpMEYyVmZkVGlyOEE/view)

[https://docs.google.com/presentation/d/1q6brajoQkFZVtSocMPGg...](https://docs.google.com/presentation/d/1q6brajoQkFZVtSocMPGgbmqTLAcrZetcMrdve-WD6x4)

------
sargun
We use this in production every day. We've hit a couple bugs, but it's a
surprisingly stable piece of software.

We hit an issue with user namespaces not correctly chowning things when
"exporting" them into Docker containers. I honestly forget how we solved this,
but I believe it was just an upgrade.

We hit an issue with the first run of a Docker container on a system being
extremely slow, to the point where downstream systems would give up and retry.
The issue was fixed by starting a container that leverages the driver at boot
time, "warming" it up.
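
For reference, a minimal sketch of that warm-up trick, assuming the stock
`nvidia/cuda` image from the nvidia-docker quickstart (any small CUDA-capable
image would do):

    # Run once at boot (e.g. from an init script or systemd unit) so the
    # driver is initialized before real workloads arrive. The image choice
    # is an assumption; substitute whatever CUDA base image you already use.
    nvidia-docker run --rm nvidia/cuda nvidia-smi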

~~~
etaioinshrdlu
Is it possible that the slowness is caused by the disk I/O of loading a large
file from disk?

I've observed the same thing but it seemed to be IO related, not CPU or GPU
related.

~~~
tlb
It's mainly the speed of creating a huge number of small files on disk. When
unpacking a Docker image into an AUFS file system, it has to create every file
in the shadow system. A reasonably complete Ubuntu install has around 100,000
files. On AWS with default EBS storage, this can take a few minutes. At OpenAI
we pre-built AMIs with the file systems already unpacked.
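
A rough sketch of that idea, assuming the images are pulled while the AMI is
being baked rather than at instance boot (image names are placeholders):

    # Run during AMI creation (e.g. in a Packer provisioner or build script).
    # docker pull downloads and unpacks the layers under /var/lib/docker, so
    # the slow small-file creation is paid once at build time and the result
    # is baked into the AMI.
    docker pull nvidia/cuda
    docker pull my-org/training-image   # hypothetical application image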

------
krosaen
As an example, we used (nvidia-)docker images to make our object detection
results reproducible:

[https://github.com/umautobots/driving-in-the-matrix](https://github.com/umautobots/driving-in-the-matrix)

Docker has been great for this in the lab: only one person now goes through
the pain of getting the latest framework + hack + model tweak working, and
then the rest can reuse it.

------
random023987
Isn't AMD leaving a lot of potential revenue on the table without a comparable
amdgpu-docker?

If I'm building an ML cluster, I'm going to go with the vendor that's easiest
to containerize and deploy, and right now it looks like nvidia has a
commanding advantage in software.

Why doesn't AMD throw a few hundred thousand dollars at some developers to get
containerization parity?

------
ThePhysicist
Just for information, it's also possible to use NVIDIA GPUs inside Docker
containers without using `nvidia-docker`:

[https://stackoverflow.com/questions/25185405/using-gpu-from-...](https://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container)

In a nutshell, you can give the container access to the NVIDIA device files via
the `--device` flag, so all you need is a container with the NVIDIA drivers.
The added benefit of this is that you can use different versions of the
drivers side-by-side (in my understanding).
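
For instance, a hedged sketch of that approach (the device file names are the
ones the proprietary driver typically creates on the host; the image name is a
placeholder for an image that bundles user-space driver libraries matching the
host's kernel module):

    # Expose the NVIDIA device files directly, no nvidia-docker wrapper involved.
    docker run --rm \
      --device /dev/nvidiactl \
      --device /dev/nvidia-uvm \
      --device /dev/nvidia0 \
      image-with-matching-driver nvidia-smi   # placeholder image name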

I thought this might be relevant, as some people might not want to use the
`nvidia-docker` CLI to run containers (I'm not sure how you would use it via
the Docker API, for example).

~~~
flx42_
We document how this works on our wiki: [https://github.com/NVIDIA/nvidia-docker/wiki/Internals](https://github.com/NVIDIA/nvidia-docker/wiki/Internals)

> The added benefit of this is that you can use different versions of the
> drivers side-by-side (in my understanding).

No, you can only have one driver version, the one that corresponds to the
loaded kernel modules. Installing the driver inside a Docker image makes it
non-portable.
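
A quick way to see that constraint, as a sketch: the version reported by the
loaded kernel module on the host is the only one any container's user-space
libraries can usefully match.

    # On the host: the loaded kernel module pins the driver version.
    cat /proc/driver/nvidia/version
    # Inside a container started with nvidia-docker, the mounted user-space
    # tools should report the same version:
    nvidia-docker run --rm nvidia/cuda nvidia-smi --query-gpu=driver_version --format=csv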

~~~
ThePhysicist
Ah thanks for the clarification, I was not aware of this!

------
aub3bhat
I use nvidia-docker extensively in my open source project Deep Video Analytics
[1]. Combined with TensorFlow (which allows explicit GPU memory allocation),
it's unbeatable for reliably running multiple inference models on a single
GPU. Combining this setup with Docker volumes on AWS EFS allows simple
multi-machine deployments.

[1] [https://github.com/AKSHAYUBHAT/DeepVideoAnalytics](https://github.com/AKSHAYUBHAT/DeepVideoAnalytics)

~~~
sandGorgon
This is pretty cool! Why do you use multiple packages like Torch and
TensorFlow?

~~~
aub3bhat
Certain algorithms/models are implemented in PyTorch or Caffe, and it's
typically a huge amount of work to convert them to TensorFlow while ensuring
correctness/parity. Also, I personally like the design of PyTorch.

~~~
sandGorgon
Coming from the Facebook/ReactJS weaponized patent grant problem... Caffe also
has the same revocable patent grant.

TensorFlow is Apache licensed. I think in general the perception is that it is
far safer to stay away from Caffe.

~~~
aub3bhat
Caffe 1 is developed by Berkeley and I think Apache or BSD licensed. The
Patents.txt issue occurs with Caffe 2 which is developed by Facebook.

------
zitterbewegung
I wish they would add support for nvidia-docker to run on Windows, but I think
that is a tall order.

------
lmeyerov
One of our engineers recently shared an intro-level talk at the Docker meetup
about a ~year of (happy) experiences with this:
[https://www.meetup.com/Docker-Santa-Clara/events/240641246/](https://www.meetup.com/Docker-Santa-Clara/events/240641246/).
It should be recorded somewhere, and may be helpful if you're considering
production use, not just personal DL stuff.

------
etaioinshrdlu
A collection of deep learning nvidia-docker images to get you started:
[https://hub.docker.com/r/deepaiorg/](https://hub.docker.com/r/deepaiorg/)

FloydHub also curates nvidia-docker images for most frameworks:
[https://hub.docker.com/r/floydhub/](https://hub.docker.com/r/floydhub/)
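
As a usage sketch (the repository name and command are assumptions; check the
hub pages above for the actual image names and tags):

    # Pull a curated TensorFlow image and list the devices TensorFlow sees;
    # a GPU device should appear when run through nvidia-docker. The image
    # name and the one-liner are illustrative, not taken from the hubs above.
    nvidia-docker run --rm floydhub/tensorflow \
      python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"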

------
stephengillie
Is there a processing-specific use case for this - are we going to see mining
containers or game client containers?

~~~
vqc
Would this be useful for people who wanted to leverage GPUs for deep learning,
but didn't have the wherewithal or the willingness to set up all the
dependencies on the host machine?

~~~
tiangolo
It is used by this somewhat popular pre-built Docker image for Deep Learning:
[https://github.com/floydhub/dl-docker](https://github.com/floydhub/dl-docker)

They actually built a service on top of that (Deep Learning "as a service"):
[https://www.floydhub.com/](https://www.floydhub.com/)

~~~
narenst
At FloydHub, we use nvidia-docker in production for running DL jobs. It has
been very solid for the past 6 months or so. We have also built a collection
of open source DL docker images for various frameworks and we actively
maintain them.

[1]: [https://hub.docker.com/r/floydhub/](https://hub.docker.com/r/floydhub/)
[2]: [https://github.com/floydhub/dockerfiles](https://github.com/floydhub/dockerfiles)

------
unixhero
Can I run regular web apps with this on NVIDIA GPUs?

~~~
mastax
No.

~~~
neuronexmachina
I mean, technically I guess you could run it; it just wouldn't make use of the
GPU.

