
New AWS Deep Learning AMIs for Machine Learning Practitioners - rbanffy
https://aws.amazon.com/blogs/ai/new-aws-deep-learning-amis-for-machine-learning-practitioners/
======
anothertraveler
They're probably similar software-wise... but the NVIDIA GPU Cloud is a
Docker container registry, so you can pull a Docker image and use
nvidia-docker to run it. Containers are easier to move across cloud
providers (AWS, Azure, Nimbix, etc.), or even to your local machine, so you
don't need to manually configure (and sometimes compile from scratch) a
whole slew of complex software packages. The NVIDIA GPU Cloud is maintained
by NVIDIA themselves, so you know they have the configuration just right
for optimal performance on their GPU hardware, wherever you choose to use
it.
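
For example, pulling and running one of the NGC images from Python (a
minimal sketch using the Python Docker SDK and the GPU device-request
mechanism that later replaced the original nvidia-docker wrapper; the
image tag is illustrative):

    import docker  # pip install docker

    client = docker.from_env()

    # Request all GPUs for the container -- the SDK equivalent of
    # `docker run --gpus all`.
    logs = client.containers.run(
        "nvcr.io/nvidia/tensorflow:17.10",  # illustrative NGC tag
        "nvidia-smi",
        device_requests=[
            docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
        ],
        remove=True,
    )
    print(logs.decode())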

~~~
eanzenberg
I've never played around with nvidia-docker. Is there much, if any,
additional overhead to the GPU compute when running through it vs. natively
with an AMI?

~~~
anothertraveler
nvidia-docker is just Docker plus a wrapper that configures the container
to see the physical GPU device and mounts the runtime libraries matching
the hardware driver installed on the host. It makes life way easier if you
need portability across different Linux hosts for GPU-driven software. The
NVIDIA container images also include some of the proprietary (but
free-of-charge) packages like cuDNN, which accelerate popular deep learning
frameworks with optimized CUDA implementations.
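
A quick way to confirm all of that is wired up inside the container (a
minimal sketch, assuming the image ships PyTorch; any framework with a
cuDNN check would do):

    import torch

    # True only if the container can see the GPU device nodes and the
    # mounted runtime libraries match the host's kernel driver.
    print("CUDA available:", torch.cuda.is_available())
    print("Device:", torch.cuda.get_device_name(0))

    # cuDNN ships inside the NGC images; PyTorch reports the version
    # it managed to load.
    print("cuDNN enabled:", torch.backends.cudnn.enabled)
    print("cuDNN version:", torch.backends.cudnn.version())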

If you run on a bare-metal platform, like your laptop booting Linux or a
bare-metal cloud, there's a very small amount of overhead for using
nvidia-docker, mainly just the same overhead as running a regular Docker
container (a container is essentially just chroot + Linux cgroups + kernel
namespaces).

If you're in an AMI on AWS, it's a virtual machine _anyway_, so there's
virtualization overhead, which is quite a bit higher than container
overhead, but there's other baggage as well, such as shared tenancy/noisy
neighbors and possibly oversubscription of hardware to the virtualized
environment.

If you're in a Docker container in an AMI, there's the slight container
overhead plus the virtualization overhead, with the baggage and benefits
that come with it. Virtualization overhead is probably an order of
magnitude higher than container overhead. "Natively" with an AMI is not so
native anyway (though Amazon is trying to improve that with their C5
instances).
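
If you want numbers for your own workload rather than a rule of thumb, the
simplest thing is to run the same GPU-bound script natively, in the
container, and in the VM, and compare (a minimal sketch; the matrix size
and iteration count are arbitrary):

    import time
    import torch

    x = torch.randn(4096, 4096, device="cuda")

    torch.cuda.synchronize()  # don't time queued-but-unfinished kernels
    start = time.time()
    for _ in range(100):
        x = x @ x
        x = x / x.norm()  # keep values from overflowing
    torch.cuda.synchronize()
    print("100 matmuls: %.3fs" % (time.time() - start))

For compute-bound work like this the container should be within noise of
native, since CUDA calls go straight to the device either way; the overhead
differences show up mostly in startup, I/O, and syscalls.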

~~~
bob_theslob646
>If you're in an AMI on AWS, it's a virtual machine anyway, so there's
virtualization overhead, which is quite a bit higher than container
overhead, but there's other baggage as well, such as shared tenancy/noisy
neighbors and possibly oversubscription of hardware to the virtualized
environment.

Can you post or cite something related to virtualization overhead being
"probably an order of magnitude higher than container overhead"?

Your other points are very fair.

Any mention of costs for the Docker approach versus the AMI?

------
praseodym
How do these AMIs stack up against the recently announced NVIDIA GPU Cloud
images ([https://www.nvidia.com/en-us/gpu-cloud/](https://www.nvidia.com/en-us/gpu-cloud/))?

~~~
JoeDaDude
I did not see Keras [1] on the NVIDIA images, though to be fair, I did not
search exhaustively. Keras is just too convenient for rapid prototyping to
go without.

[1] [https://keras.io/](https://keras.io/)
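
For anyone who hasn't used it, this is the kind of thing that makes Keras
hard to give up (a minimal sketch with random stand-in data, using the
classic standalone-Keras API):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # A small MLP classifier, defined and trained in a few lines.
    model = Sequential([
        Dense(64, activation="relu", input_shape=(20,)),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Random data just to show the workflow.
    x = np.random.rand(256, 20)
    y = np.random.randint(0, 2, size=(256, 1))
    model.fit(x, y, epochs=3, batch_size=32)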

------
eghri
If you want to use the p3 instances with Tesla V100s, you'll need CUDA 9
instead of CUDA 8, and unfortunately none of Amazon's AMIs ships a PyTorch
build that supports CUDA 9. The easiest way to use them is to get one of
the Conda Deep Learning AMIs and then follow the latest PyTorch install
instructions, which seems to work pretty seamlessly.
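
Once it's installed, a quick sanity check that you actually got a CUDA 9
build and that the V100 is visible (a minimal sketch):

    import torch

    # torch.version.cuda is the toolkit the build was compiled against;
    # it needs to be 9.x for the V100s in the p3 instances.
    print("PyTorch:", torch.__version__)
    print("Built against CUDA:", torch.version.cuda)
    assert torch.version.cuda.startswith("9"), "still on a CUDA 8 build"

    # Should report a Tesla V100 on a p3 instance.
    print("GPU:", torch.cuda.get_device_name(0))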

