
Nvidia-Docker: Build and Run Docker Containers Leveraging Nvidia GPUs - charlieegan3
https://github.com/NVIDIA/nvidia-docker
======
Patrick_Devine
I think leveraging GPUs inside of Docker containers is an absolutely fantastic
idea, but forking/wrapping Docker to make it happen is just silly. Were there
no attempts to talk to the core team to make this happen?

(Disclaimer: I work at Docker, but not on the core team)

~~~
flx42_
We are only wrapping the Docker CLI, not forking the full code (that would be
insane). The wrapper is provided for convenience since it should be enough for
most users. If you know what you're doing, you don't need to rely on the
wrapper, we explain how you can do that on our wiki:
[https://github.com/NVIDIA/nvidia-docker/wiki/Internals](https://github.com/NVIDIA/nvidia-docker/wiki/Internals)
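
For context, a minimal sketch of what the wrapper automates, based on the Internals wiki: expose the NVIDIA device nodes and mount the driver volume served by nvidia-docker-plugin. The device list and the volume name (here `nvidia_driver_361.48`) are system- and driver-specific assumptions, not fixed values:

```shell
# Roughly what `nvidia-docker run` expands to, per the Internals wiki.
# Device paths and the driver volume name depend on your machine; the
# driver version 361.48 below is only an example.
docker run --rm \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia0 \
  --volume-driver=nvidia-docker \
  --volume=nvidia_driver_361.48:/usr/local/nvidia:ro \
  nvidia/cuda nvidia-smi
```

The wrapper queries the plugin's REST API to discover these values at run time, which is why the plain `docker` invocation is machine-specific while `nvidia-docker run` is not.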

Yes, we have been in discussions with a few people at Docker; I hope attending
DockerCon and talking with the team will help us move forward.

~~~
shykes
This is a fantastic use case, we (Docker) would love to help in any way we can
to make this usable by all Docker users.

Let us know if there's anything more we can do to help.

------
joshuak
This seems like unnecessary additional structure. I was one of the first to
build dockerized apps with nvidia drivers[1] and it's as easy as building the
drivers and running modprobe in a privileged (driver only) container. All
other containers on the platform are then able to run GPU workloads without
modification.
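
The pattern described above might be sketched like this; the image and device names are hypothetical stand-ins, not taken from the repo in [1]:

```shell
# Hypothetical sketch of the "privileged driver container" pattern: one
# container carrying the driver bits loads the kernel modules, then
# regular containers just use the resulting device nodes.
docker run --rm --privileged my-nvidia-driver-image modprobe nvidia
# Once the modules are loaded, an ordinary container only needs the devices:
docker run --rm --device=/dev/nvidia0 --device=/dev/nvidiactl my-gpu-app
```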

We haven't found this to be particularly brittle or problematic in any way. It
just works(TM). It _feels_ like reaching deeper into the control stack by
'wrapping' the Docker CLI or other actions is likely to be an untenable
workflow for us.

We use a _huge_ amount of GPU. We do it on CoreOS/k8s et al. We're exploring
rkt. Doesn't NVIDIA's approach interfere with this?

However, I am always looking for better approaches to help reach as far into
the future with our platform as possible. So show me the magic, and I'm yours.

1: [https://github.com/Avalanche-io/coreos-nvidia](https://github.com/Avalanche-io/coreos-nvidia) (Old public repo. Should I push our latest? I don't think anyone is using this so I haven't.)

~~~
exxo_
I think you are missing the point; it has nothing to do with installing the
NVIDIA drivers through Docker.

What you are showing[1] is how to install NVIDIA drivers on CoreOS the hackish
way (not persistent, no driver libs, no DKMS, no UVM, no KMS...)

Regarding rkt, it's not supported at the moment but a similar approach could
be taken. As for the Docker CLI wrapper, you can avoid it if you really need
to.

~~~
joshuak
While my code here is definitely hackish, and I can't argue with that, I have
to say I'm hard-pressed to see how running a container to activate a driver is
hackish when the comparison at hand is modifying the Docker CLI and requiring
a Docker plugin.

I run the driver container at startup and never shut it down. How is this not
persistent? DKMS and other build/deployment choices are not obviated by my
approach, so I'm not sure that's relevant.
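
As a concrete illustration of the startup pattern described above, on CoreOS this could be a systemd unit along these lines; the unit and image names are hypothetical:

```ini
# Hypothetical CoreOS-style unit that starts the privileged driver
# container at boot and restarts it if it ever exits.
[Unit]
Description=NVIDIA driver container
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f nvidia-driver
ExecStart=/usr/bin/docker run --privileged --name nvidia-driver coreos-nvidia
Restart=always

[Install]
WantedBy=multi-user.target
```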

Looking more deeply at the "Why NVIDIA Docker" page in the repo wiki doesn't
provide any enlightenment either. In fact, it doesn't really explain why
Docker itself must be modified. The only explanation given is lack of
container portability, but driver containers are portable within the scope of
a given kernel version. Certainly a modified Docker CLI and plugin
requirements are much less portable.

It seems to me like someone at NVIDIA simply didn't realize that they could
run a container in privileged mode and effectively install the driver system
wide for all containers.

~~~
exxo_
If you want more insights, I suggest you read the section "Internals".

I'm not going to dwell on the details, but there are many reasons why doing so
can go horribly wrong. Believe me, we (NVIDIA) evaluated our options and know
the implications of running our drivers within containers.

Do you really know what --privileged does? If so, you know that there is no
such thing as "install the driver system wide". For that you would have to
circumvent the namespaces and a bunch of other things that Docker puts in
place.
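
To make the namespace point concrete, here is a small illustration (assuming a working Docker install; `busybox` is just a convenient tiny image). Even a --privileged container writes into its own mount namespace, so nothing it "installs" appears in other containers:

```shell
# Files created inside a privileged container live in that container's
# own filesystem, not on the host or in sibling containers.
docker run --rm --privileged busybox touch /usr/local/installed-here
docker run --rm busybox ls /usr/local/installed-here  # fails: no such file
```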

"portable within the scope of a given kernel" [and driver] "version"

Well, that's not what I call portable :) With nvidia-docker you can build a
CUDA image on your laptop and deploy it anywhere in the cloud or on-premises
without a single modification.

~~~
joshuak
Ok great, thanks! I'll check this out when we run into these issues.

------
WhatsName
This is brilliant; I wish I'd had this when I was implementing a large-scale
cloud rendering project.

Having a standardised and streamlined way of deploying the application code
with the CUDA toolkit would have saved me a lot of troubleshooting.

------
lmeyerov
We use this at Graphistry for wrapping our analytics & visuals for cloud & on-
premises, super appreciative :)

------
moondev
This looks very interesting. What advantages would there be to running
containers on the GPU?

~~~
fscherer
Using containers to easily deploy machine learning experiments; that's at
least how we use it.

~~~
flx42_
Yes, running containerized machine learning workflows is our primary use case
for nvidia-docker internally. That's why we provide pre-built images for cuDNN
and DIGITS on the DockerHub. Our base cuDNN image is now used by TensorFlow,
Caffe, CNTK, and Theano, to name a few.
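
As an example of consuming those base images, an application Dockerfile can start from one of them; the exact tag (e.g. `7.5-cudnn4-devel`) is an assumption here and depends on what is currently published on the DockerHub:

```dockerfile
# Illustrative only: build an application image on top of the cuDNN base.
FROM nvidia/cuda:7.5-cudnn4-devel
COPY ./app /opt/app
CMD ["/opt/app/run"]
```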

------
fapjacks
I've been using this for some time specifically for my ML projects.

