
GPUs for Google Cloud Platform - hurrycane
https://cloudplatform.googleblog.com/2016/11/announcing-GPUs-for-Google-Cloud-Platform.html
======
dkobran
Kudos to Google for making moves here. Having spent the last year+ tackling
GPUs in the datacenter, super curious how custom sizing works. It's a huge
technical feat to get eight GPUs running (let alone in a virtualized
environment), but the real challenge is making sure the blocks/puzzle pieces
all fit together so there's no idle hardware sitting around. There's a reason
why Amazon's G/P instances require that you double the RAM/CPU if you double
the GPU. Another example would be Digital Ocean's linear scale-up of instance
types. In any case, we'll have to see what pricing comes out to.

Shameless plug, if you want raw access to a GPU in the cloud today, shoot me
an email at daniel at paperspace.com. We have people doing everything from
image analysis to genomics to a whole lot of ML/AI.

~~~
Veen
Paperspace looks pretty awesome. Given the exorbitant expense of the new
MacBook Pros, I'm thinking of getting an iPad Pro instead, except that you
still can't really do dev work conveniently on an iPad.

This looks like it might be a cool solution to that problem. Use iOS most of
the time (I'm a writer) and the Paperspace VM on the iPad for coding (I'm also
a wannabe developer).

The only fly in the ointment is that it's not available in Europe, and I'm
worried about latency over a 4G connection.

~~~
lima
For $25/month, just rent a dedicated server in Europe (or even put one in
your basement if your network connection is sufficiently sized). RDP with
Hyper-V has everything you need for that sort of usage (RemoteFX).

For coding, you don't even need a GPU. Just use RDP. Big companies use this
all the time (thin clients + central server farm, also called VDI).

~~~
Veen
I tend to prefer Linux on the server, and I've tried the iPad + tmux + vim
route on a remote server and the experience hasn't been great. It's usually
latency problems that trip me up. I travel a lot and rely on tethering to my
phone's 4G connection, which mostly works but can be quite frustrating.

~~~
adventureloop
Living permanently on 4G, I couldn't get by without mosh. I have managed to do
weeks of work with mosh+tmux+vim at times when ssh was too painful to use.

~~~
nwrk
On the subject of mosh: it's a great experience on slow connections. Totally
agree. There is a recently released iOS client with mosh support built in [1].
Happy user here.

[1] [http://www.blink.sh/](http://www.blink.sh/)

------
timdorr
> Google Cloud GPUs give you the flexibility to mix and match infrastructure.
> You’ll be able to attach up to 8 GPU dies to any non-shared-core machine...

Wow, that's impressive. One thing I've loved about GCE has been the custom
sizing. This takes it even further, so we don't have to buy what we don't
need.

Looking forward to seeing the pricing on this. Looks like they're going to
heavily compete with AWS on this stuff.
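
As a sketch of what that mix-and-match could look like on the API side, here's
a hypothetical Python call against the Compute Engine API. The GPU-specific
fields and accelerator type names are assumptions until this actually launches;
the project ID and zone are placeholders:

```python
# Hypothetical sketch: attach 8 GPU dies to a custom-sized GCE instance.
from googleapiclient import discovery

compute = discovery.build('compute', 'beta')
zone = 'us-east1-d'
project = 'my-project'          # hypothetical project ID

body = {
    'name': 'gpu-box',
    # Custom machine type: 8 vCPUs, 30 GB RAM -- buy only what you need.
    'machineType': f'zones/{zone}/machineTypes/custom-8-30720',
    # Assumed field shape for the announced GPUs (8 K80 dies here).
    'guestAccelerators': [{
        'acceleratorType': f'zones/{zone}/acceleratorTypes/nvidia-tesla-k80',
        'acceleratorCount': 8,
    }],
    # Passthrough devices can't live-migrate, so terminate on maintenance.
    'scheduling': {'onHostMaintenance': 'TERMINATE'},
    'disks': [{'boot': True, 'autoDelete': True, 'initializeParams': {
        'sourceImage': 'projects/debian-cloud/global/images/family/debian-8'}}],
    'networkInterfaces': [{'network': 'global/networks/default'}],
}

compute.instances().insert(project=project, zone=zone, body=body).execute()
```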

~~~
boulos
Yes, I foresee a lot of 8 vCPUs VMs with 8 giant GPU dies attached to them.
And that's just fine by us!

Disclosure: I work on Google Cloud (and want to sell you some GPUs!)

------
matt_wulfeck
One of the HUGE advantages of GCE/AWS is that they will gobble up 100% of the
unused resources for their own computation. Nothing is wasted, and the
machines basically pay for themselves.

Compare this with something like Oracle, which simply can't consume its unused
resources in order to discount the hardware effectively. They can't beat
GCE/AWS at the cloud game until this changes.

~~~
honkhonkpants
I'm curious to know what workloads you think Google is running on the same
hardware that cloud customers are using.

~~~
jfoutz
What an odd question. Seems like training a new voice recognition model, or an
image classifier, or any of a zillion other research projects Google is
working on would be perfect for _any_ unused capacity.

I've never worked at Google. Do they just let each team go with their own
crazy hardware setup? Or maybe two separate systems, one internal and one
external? That seems a little wasteful (they have the money, so whatever).

But wouldn't programming against a big unified api be a bit more, well, sane?
I can see really sensitive stuff locked away on its own hardware, but the
search front end? Why not serve some JS or provide Chrome downloads from a
giant pool of common hardware?

Things grow organically, and they might be technically locked into a specific
layout right now; that's totally understandable. But unused capacity is just
capital depreciating steadily away with no gain. Soaking up every spare cycle
doing something useful should probably be somewhere in the company goals.

~~~
honkhonkpants
Like I said, I'm just curious. Google has so much capacity that I'm not sure
it would be important for their bottom line to move it between Google and
customer workloads in quanta smaller than one physical machine. Your theory
also requires Google to have a substantial standby workload that is currently
not scheduled, doesn't it?

~~~
jfoutz
In my very limited experience, trying to answer research questions can soak up
all the computation you can throw at it. Consider something like the traveling
salesman problem. The only way to guarantee an optimal answer is, essentially,
to check every path. There are heuristics that get good answers, but you never
know if they're optimal.

You can get an answer on your laptop in an hour. You'll probably get a better
answer throwing 1000 hours at your algorithm though, and a better answer
throwing 10000 hours.

You can do a lot of neat things if P=NP. Since no one knows whether that's
true, we're stuck with bad big-O. More compute time is the only way out right
now.

The point is, I'd be disappointed in google researchers if they didn't have a
huge amount of unscheduled workload.
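
As a toy illustration of why the exhaustive version soaks up arbitrary compute
while the heuristic stays cheap (a minimal Python sketch; the instance size and
seed are arbitrary):

```python
import itertools, math, random

random.seed(0)
cities = [(random.random(), random.random()) for _ in range(9)]

def tour_length(order):
    # Total length of the closed tour visiting cities in this order.
    return sum(math.dist(cities[order[i]], cities[order[i - 1]])
               for i in range(len(order)))

# Exact answer: check every permutation. O(n!) work -- 9 cities is trivial,
# 20 cities would already take years on a laptop.
best = min(itertools.permutations(range(len(cities))), key=tour_length)

# Heuristic: greedy nearest-neighbor. Fast, but without the exhaustive
# check you never know how far from optimal it landed.
unvisited = set(range(1, len(cities)))
tour = [0]
while unvisited:
    nxt = min(unvisited, key=lambda j: math.dist(cities[tour[-1]], cities[j]))
    tour.append(nxt)
    unvisited.remove(nxt)

print(f"exact: {tour_length(best):.3f}  heuristic: {tour_length(tour):.3f}")
```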

~~~
hueving
There are some questions though where the saved energy from not running the
batch jobs is worth more than the answer (e.g. searching for the next prime).

~~~
dekhn
We (Google) ran numerous theory problems on Google's computing platform via
the Exacycle program. Peter Norvig convinced me this was a bad idea, his point
being mainly that the time was better spent on disproving the conjecture with
pure math, not search.
[http://norvig.com/beal.html](http://norvig.com/beal.html)

"But Witold Jarnicki and David Konerding already did that: they wrote a C++
program that built a table of Cz(modp)Cz(modp) up to 5000500050005000 , and,
in parallel across thousands of machines, searched for A,BA,B up to 200,000
and x,yx,y up to 5,000, but found no counterexamples. On a smaller scale,
Edwin P. Berlin Jr. searched all CzCz up to 10171017 and also found nothing.
So I don't think it is worthwhile to continue on that path."
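
To give a feel for that kind of search, here's a toy Python version of the
idea (the real program kept a table of C^z mod p as a compact filter and
fanned out over thousands of machines; the bounds here are deliberately tiny):

```python
from math import gcd

BASE_MAX, EXP_MAX = 100, 7   # toy bounds; the real search used 200,000 / 5,000

# Table of perfect powers C^z (the real search stored C^z mod p instead of
# the full values to save memory).
powers = {}
for c in range(2, BASE_MAX + 1):
    for z in range(3, EXP_MAX + 1):
        powers[c ** z] = (c, z)

# A Beal counterexample would be A^x + B^y = C^z with gcd(A, B) == 1.
for a in range(2, BASE_MAX + 1):
    for x in range(3, EXP_MAX + 1):
        ax = a ** x
        for b in range(2, a + 1):          # b <= a avoids duplicate pairs
            if gcd(a, b) != 1:
                continue
            for y in range(3, EXP_MAX + 1):
                s = ax + b ** y
                if s in powers:            # only sound up to BASE_MAX^EXP_MAX
                    print("counterexample?", (a, x), (b, y), powers[s])
```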

------
slizard
Kudos to Google, and happy to see that, at least in principle, AMD is still an
option.

I wonder what device driver GCE will use with AMD. The new ROCm?

What about Power8 + NVLink hardware? Does anybody know if the current NVIDIA
GPUs, in particular the P100s, are all on x86?

~~~
boulos
I'll just answer your driver question: you bring whatever you want. Our VMs
just boot off of bytes you have stored in your image (which is in turn just
stored on GCS). If those bytes happen to be Debian 8 with some AMD driver that
supports the 9300, awesome ;).

We will be working with both vendors to make sure we highlight which drivers
work most reliably on our stack (virtualization + GPUs is way too rare!) and
to do the qualification, so you at least know what's known-good.

Disclosure: I work on Google Cloud.

~~~
puzzle
When is Kubernetes support going to be added? Right now Kubernetes has it only
in a very basic form, for NVIDIA GPUs.

~~~
eudoxus
There is a PR[1] to extend the current alpha GPU implementation. Hopefully it
will get merged soon.

[1] -
[https://github.com/kubernetes/kubernetes/pull/28216](https://github.com/kubernetes/kubernetes/pull/28216)

~~~
puzzle
That's still NVIDIA-only, plus you can't even tell a P100 from a K80 for
scheduling purposes.

~~~
boulos
I'd just use a separate NodePool and match your jobs using a selector as
appropriate.
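
For what it's worth, a minimal sketch of that pattern with the Kubernetes
Python client (the pool name "gpu-pool" and the image are hypothetical, and
the label key assumes GKE's standard per-pool node label):

```python
# Sketch: pin a GPU workload to a dedicated node pool via a nodeSelector.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-job"),
    spec=client.V1PodSpec(
        # Match only nodes in the GPU node pool.
        node_selector={"cloud.google.com/gke-nodepool": "gpu-pool"},
        restart_policy="Never",
        containers=[client.V1Container(
            name="trainer",
            image="nvidia/cuda",      # hypothetical GPU workload image
            command=["nvidia-smi"],
        )],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```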

~~~
puzzle
So, is GCI going to come with the right drivers?

------
boxerab
Very, very happy to finally see AMD GPUs in the cloud.

~~~
wyldfire
I love to root for an underdog. But I've been pulling for AMD for years and
they just can't get their Linux support to the same level as NVIDIA's. I
speculate that they have undiagnosed design errors in their driver.

~~~
boxerab
You should check out ROCm

[http://hothardware.com/news/amd-rocm-13-adds-support-for-
pol...](http://hothardware.com/news/amd-rocm-13-adds-support-for-polaris-and-
opencl)

They are revamping their entire Linux stack for HPC and heterogeneous compute.

~~~
wyldfire
Fool me several times, shame on me.... ;) Really -- ever since the ATI
acquisition I thought AMD could make it happen. And it's true that they have
come a long way since then.

"New Linux Driver and Runtime Stack optimized for HPC & Ultra-scale class
computing" sounds great, let's hope this time they can do it.

~~~
boxerab
:) I've only been deeply involved with compute for the past two years, so I am
still an optimist. I think AMD pretty much has no choice but to get it right
this time: they need server and HPC dough to stay afloat.

------
eudoxus
This is amazing!!! The first cloud provider to have the P100! Amazing
opportunities ahead with compute power like that.

~~~
eDameXxX
What about this[1]?

"GPU-Accelerated Microsoft Cognitive Toolkit Now Available in the Cloud on
Microsoft Azure and On-Premises with NVIDIA DGX-1"

DGX-1 is powered by 8 x P100

[1][http://nvidianews.nvidia.com/news/nvidia-and-microsoft-
accel...](http://nvidianews.nvidia.com/news/nvidia-and-microsoft-accelerate-
ai-together)

~~~
thesandlord
AFAIK the DGX-1 is for running on premises, not in the cloud.

------
fulafel
What's the assurance like regarding security against other concurrent users on
the same hardware? Historically, multitenancy with GPUs has been quite iffy,
with little security research around it, even if IOMMUs theoretically exist.

~~~
boulos
We care a lot about security; that's part of the motivation for passthrough
instead of any of the "split up a GPU" approaches. So as I quoted above from
our post:

> GPUs are offered in passthrough mode to provide bare metal performance. Up
> to 8 GPU dies can be attached per VM instance including custom machine
> types.

So when your VM boots up, the GPUs you attach will not be shared with others.
I agree that the state of GPU virtualization is not yet where it needs to be
to make me _personally_ comfortable enough.

Disclosure: I work on Google Cloud (and worked on this a little).

------
kozikow
Now it would be great if Kubernetes on GKE worked nicely with GPUs. It's
still in the works:
[https://github.com/kubernetes/kubernetes/blob/master/docs/pr...](https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/gpu-
support.md) .

------
otto_ortega
Awesome news! The Tesla P100 is a monster; this will push ML development to
new heights.

------
AlexCoventry
Is there any public access to the TPUs?

~~~
mikecb
If you use the Cloud ML service.

~~~
mnbbrown
Really? I was under the impression they just use standard compute instances to
power Cloud ML?

------
kesor
This happened some months ago ... how does it compare? Can anyone in the know
pitch in with a short comparison?

[https://aws.amazon.com/about-aws/whats-
new/2016/09/introduci...](https://aws.amazon.com/about-aws/whats-
new/2016/09/introducing-amazon-ec2-p2-instances-the-largest-gpu-powered-
virtual-machine-in-the-cloud/)

~~~
wmf
Amazon: up to 16 K80s and ~700 GB of RAM

Google: up to 8 K80s and ~200 GB of RAM but much more flexible

------
dylanz
There are a lot of excited posts here about this announcement! For someone
who doesn't use GPUs in everyday life, can someone explain why this is great
and maybe touch on the current landscape around GPU usage and costs?

~~~
slizard
The excitement comes from the fact that some workloads benefit greatly from
GPUs -- mostly those where the GPU architecture offers some niche advantage
that can give GPUs >=10x perf[/W]. A good example is deep learning, where
Intel is struggling to compete [1]; even in linear algebra there is a 4-5x
power efficiency difference [2, slide 6].

At the same time, in many other complex applications the performance/W/$
advantage has not been all that clear, especially since GPUs have been quite
behind in process technology. A good example is a simulation code I work on:
it gets a 2-4x speedup from GPUs, which (due to the inherent overhead of using
accelerators) gradually vanishes in strong scaling. That means that, once
price is considered, only cheap consumer cards improve the performance/buck
metric significantly (that's time-to-solution/buck in our case); professional
cards don't offer significant improvements other than performance density [3,
figures 6,7].

There are plenty more examples that NVIDIA collects [4], but always take the
claims with a grain of salt, especially the "incredible" >10x speedups :)

Last, I'd note that with the recent 14-16nm jump, the gap between traditional
CPUs and the simpler accelerator architectures has increased (I've just seen
MAGMA BLAS results on the Tesla P100 showing >10x GFlops/W, more than double
the previous arch). I expect it to keep increasing, partly because the
manufacturing/process technology gap between Intel and the rest is shrinking,
and partly due to architecture and programmability improvements in
accelerators.

[1] [https://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-benchmarks.html](https://www.nvidia.com/object/gpu-accelerated-applications-tensorflow-benchmarks.html)

[2] [http://on-demand.gputechconf.com/gtc/2015/presentation/S5476-Stanimire-Tomov.pdf](http://on-demand.gputechconf.com/gtc/2015/presentation/S5476-Stanimire-Tomov.pdf)

[3] [https://www.academia.edu/13753737/Best_bang_for_your_buck_GPU_nodes_for_GROMACS_biomolecular_simulations](https://www.academia.edu/13753737/Best_bang_for_your_buck_GPU_nodes_for_GROMACS_biomolecular_simulations)

[4] [https://www.nvidia.com/object/gpu-applications.html](https://www.nvidia.com/object/gpu-applications.html)
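
To make the performance/buck point concrete, here's a toy calculation (every
price and the speedup below are hypothetical placeholders, not numbers from
[3]):

```python
# Toy time-to-solution/buck comparison. All numbers here are hypothetical.
CPU_NODE = 3000.0      # assumed cost of the base CPU node ($)
CONSUMER_GPU = 700.0   # assumed consumer-card price ($)
PRO_GPU = 5000.0       # assumed professional-card price ($)
SPEEDUP = 3.0          # inside the 2-4x range quoted above

def perf_per_buck(speedup, extra_cost):
    # Throughput per dollar, normalized so the CPU-only node scores 1.0.
    return speedup * CPU_NODE / (CPU_NODE + extra_cost)

print("CPU only    :", perf_per_buck(1.0, 0.0))               # 1.00
print("consumer GPU:", perf_per_buck(SPEEDUP, CONSUMER_GPU))  # ~2.43
print("pro GPU     :", perf_per_buck(SPEEDUP, PRO_GPU))       # ~1.13
```

The same speedup yields very different economics depending on card price,
which is the point about consumer cards above.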

------
alecco
Is it possible to have a non-shared machine? Is it virtualized anyway?

~~~
boulos
Even if we had a "you get this box all to yourself" offering, we'd still run
our hypervisor on it. We wouldn't give someone a raw, unfettered machine in
our datacenter (and I hope you wouldn't either!).

What is your concern with virtualization in this context?

Disclosure: I work on Google Cloud (and for a long time specifically Compute
Engine).

~~~
pktgen
> We wouldn't give someone a raw, unfettered machine in our datacenter (and I
> hope you wouldn't either!).

I think you're alluding to security issues with renting bare metal hardware to
potentially untrusted customers, right? I've often wondered about the security
implications of that - think backdooring BIOS/UEFI or other component
firmware. But there are many large providers that rent dedicated hardware
without a hypervisor. Are they just ignoring these concerns, hoping that this
kind of tampering would be rare anyway?

> What is your concern with virtualization in this context?

I'm not the parent poster, but performance was probably the concern. There's
still some (marginal, these days) overhead with virtualization in general, and
VT-d specifically.

~~~
boulos
From our perspective: yes, we don't feel comfortable letting untrusted users
run side by side without a sandbox.

In _this_ context of GPU compute, though, you are just talking sporadically to
a PCIe-attached device directly. There isn't any virtualization or sharing of
the device. And yes, I assumed the complaint would be performance, but
especially with a passthrough GPU, it's honestly going to be funny if anyone
finds a genuine "oh wow, this is 20% slower specifically because of
_virtualization_" case.

~~~
pktgen
> From our perspective: yes, we don't feel comfortable letting untrusted users
> run side by side without a sandbox.

That's what I figured. I still wonder what other large providers (SoftLayer,
OVH, etc.) do about this. Unfortunately, I suspect the answer is "nothing, we
just hope/assume nothing will happen."

> In this context though of GPU compute, you are just talking sporadically to
> a PCIe attached device directly.

I know; I'm familiar with passthrough. It's not completely "free," but the
overhead is very minor. I don't think anyone will be able to complain with a
straight face about its performance. :P

------
shaklee3
Does anyone know what the cost will be for these? AWS is quite high for the
K80.

------
nojvek
Nvidia got a massive bump in share price. I was quite sad because I sold all
my shares after the post-election dip. I think this announcement might have
caused the huge spike. I could have made 10% in one day.

~~~
kayoone
They announced very good results for the quarter; I think that was the reason
for the spike. Their datacenter/compute revenue grew by >150%.

------
n00b101
Great news!

------
eDameXxX
Similar: [http://nvidianews.nvidia.com/news/nvidia-and-microsoft-
accel...](http://nvidianews.nvidia.com/news/nvidia-and-microsoft-accelerate-
ai-together)

------
jaspervdmeer
Mine ALL the bitcoins

------
largote
I wonder what kinds of cores will be available and whether that will be
visible. Optimizing your code for a particular GPU architecture can make a
massive performance difference, much more so than for CPUs.

~~~
boulos
9300x2s, K80s and P100s.

From the post:

> Google Cloud will offer AMD FirePro S9300 x2 that supports powerful, GPU-
> based remote workstations. We'll also offer NVIDIA® Tesla® P100 and K80 GPUs
> for deep learning, AI and HPC applications that require powerful computation
> and analysis. GPUs are offered in passthrough mode to provide bare metal
> performance. Up to 8 GPU dies can be attached per VM instance including
> custom machine types.

Disclosure: I work on Google Cloud and pitched in on this.

------
kesor
Am I the only one annoyed that their "announcement" talks about something that
_will_ happen in the future?

What kind of asshole move is this? Why not just say "here, you can use it now,
good luck".

~~~
puzzle
Presumably you can contact them and ask for early access.

~~~
boulos
Yes, that's the goal of the survey:

> Tell us about your GPU computing requirements and sign up to be notified
> about GPU-related announcements using this survey. Additional information is
> available on our webpage.

Survey: [https://goo.gl/mgEI9X](https://goo.gl/mgEI9X)
Landing page: [https://cloud.google.com/gpu/](https://cloud.google.com/gpu/)
(which also links to the survey and indicates, as you rightly presumed, that
if you fill it in, we'll add you to our waitlist)

Disclosure: I work on Google Cloud.

