
New Compute Engine A2 VMs–First Nvidia Ampere A100 GPUs in the Cloud - boulos
https://cloud.google.com/blog/products/compute/announcing-google-cloud-a2-vm-family-based-on-nvidia-a100-gpu
======
NiekvdMaas
In case you're wondering about pricing: the V100 costs $1260/GPU/month and
this A100 will have about 2.5x its performance [1]. A n2d-highmem-96 instance
is $4377 per month. So for the maxed out a2-megagpu-16g I would expect around
$54k per month before usage discounts etc.

1. https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/
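
For the curious, the arithmetic behind that estimate (all figures are the ones quoted above, not official pricing):

    # Back-of-envelope for a2-megagpu-16g, using the numbers quoted above.
    v100_per_gpu_month = 1260  # current V100 price, $/GPU/month
    perf_ratio = 2.5           # A100 vs. V100, per NVIDIA's Ampere post [1]
    host_month = 4377          # n2d-highmem-96 on-demand, $/month
    gpus = 16

    estimate = gpus * v100_per_gpu_month * perf_ratio + host_month
    print(f"${estimate:,.0f}/month")  # -> $54,777/month, i.e. ~$54k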

~~~
boulos
Disclosure: I work on Google Cloud.

I think the right way to think about the economics here is either “I would pay
$X/hr for this short-lived job” or “I want to compare with buying it” (3-yr
committed use discount in our case, RIs / Instance Savings Plan for AWS).
Unless you are an ML research lab (Google Brain, FAIR, OpenAI, etc.) or an HPC
style site sharing these, you won’t get 100% utilization out of your “I just
bought it” purchase. Worse, in ML land, accounting math about N-year
depreciation is pretty bogus: if the A100 is 2.5x faster, you’d have been
better off with a 1-yr CUD on GCP and refreshing, rather than buying Voltas
last year.

One amusing thing that’s not clear about “just buy a DGX” is that many people
can’t even rack one of these. At 400 watts per A100, our 16x variant is 6.4 kW
of GPUs. That’s before the rest of the system, etc., but there are (sadly) a
lot of racks in the world that just can’t handle that.

~~~
Zenst
Many excellent points, and the one of interest is power usage - it's not just
the cost of buying the kit, but also the powering and maintenance (admin)
aspects. So the bottom line of ownership is far bigger and, as you say, you
really need to get 100% utilisation to capitalise on that level of outlay.
Hence the on-demand cloud option becomes cost-effective on many levels, even
if it does not seem cheap. Sure, it is not cheap, but overall it can and does
work out way cheaper for many use cases.

Makes you wonder what kind of power costs you would incur running one at 100%
utilisation. Even with the best prices you'd be looking at several thousand a
year, I would have thought, and that's not even factoring in provisioning,
which would mean 3-phase power for that type of load, and then you have to
balance out the phases. So many little details become more of an issue when
you start getting to datacenter-level power usage. Then there's UPS
load/capacity costs and planning, and networking. So whilst the cost of these
units is high, the other costs sure do add up fast.
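
A rough sketch of that power bill, with illustrative numbers (the system
overhead, PUE and $/kWh here are my assumptions, not quotes):

    # Rough annual electricity cost for the 16x A100 box at 100% utilisation.
    gpu_kw = 16 * 0.400       # 400 W per A100, per the parent comment
    system_kw = gpu_kw + 2.0  # assumed extra for CPUs, RAM, NICs, fans
    pue = 1.5                 # assumed overhead for cooling, UPS, etc.
    usd_per_kwh = 0.12        # assumed commercial rate

    annual_usd = system_kw * pue * 24 * 365 * usd_per_kwh
    print(f"~${annual_usd:,.0f}/year")  # -> ~$13,245/year for power alone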

~~~
jjoonathan
Eh, the Titan V I bought last year broke even vs AWS inside of a month ($2000
vs $2234).

Obviously there are factors going into that -- I could live without paying
the Tesla tax (didn't need to virtualize, didn't need the vram, did need the
fp64), I bought used, I didn't have a problem keeping it fed, I didn't need to
burst, etc, but my point is that for some GPU workloads the cloud GPUs are
_really_ expensive and the break-even utilization is far south of 100%, more
like 5%.
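
The break-even math is easy to sketch (the AWS rate below is roughly the
single-V100 p3.2xlarge on-demand price, from memory; treat it as approximate):

    # Break-even for a used Titan V vs. renting a single V100 in the cloud.
    card_usd = 2000          # used Titan V, as above
    cloud_usd_per_hr = 3.06  # approx. p3.2xlarge (1x V100) on-demand rate
    hours_per_year = 24 * 365

    breakeven_hours = card_usd / cloud_usd_per_hr
    print(f"{breakeven_hours:.0f} h, ~{breakeven_hours / hours_per_year:.0%} of a year")
    # -> ~654 h: the card pays for itself at well under 10% utilization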

~~~
echelon
I'm currently renting LambdaLabs V100 instances at 100% utilization for
training https://vo.codes

It's really expensive, and I think I should lean into buying hardware at this
point.

I want to build a high end GPU rig, but was wondering how easy the setup was.
I've only built "consumer" systems before (2x 1080Ti). Is there any
appreciable difference?

Do you have a single card? Multiple? What motherboard do you use?

Do you have any takeaways or resources you can share?

~~~
jjoonathan
I only have one card. I swapped the Titan V with my 1x1080 in my old, cheap
motherboard and it just worked. I had to hook up the water cooling, but I did
that because it came with a block, not because I optimized the thermal design.
To verify motherboard compatibility, I looked up in the nvidia specs how many
PCIe lanes and at what speed I should expect and confirmed in HWinfo that they
were active in that configuration -- much like I'd look at a network interface
to make sure my 10/5/2.5GbE hadn't turned into 1GbE on account of gremlins in
the wires.
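
On Linux, nvidia-smi can report the negotiated link directly; a minimal
sketch of the same check:

    # Confirm the GPU actually negotiated the PCIe gen/width the spec promises.
    import subprocess

    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,pcie.link.gen.current,pcie.link.gen.max,"
         "pcie.link.width.current,pcie.link.width.max",
         "--format=csv"],
        capture_output=True, text=True, check=True)
    print(out.stdout)  # e.g. "TITAN V, 3, 3, 16, 16" means x16 at Gen3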

I'm not using this for machine learning, so you might want to talk to someone
who is before pulling the trigger. In particular, my need for fp64 made the
choice of Titan V completely trivial, whereas in ML you might have to engage
brain cells to pick a card or make a wait/buy determination.

------
rjhacks
I understand it's exciting to see the introduction of new machine types and
new GPUs, but for it to mean anything, Google should instead get its house in order
on the GPUs they already offer. Getting an n1 instance with a Tesla T4 GPU in
any datacenter I've tried has a <50% success rate on any given day ("resource
unavailable" more often than not, they just don't seem to have enough of
them), which is _hugely_ damaging to our ability to rely on the cloud for our
workload. Worse, there's no way for me to work around it: I'd be willing to
switch zones, or machine type, or GPU type, but there is no dashboard or
support guidance that'll tell me if there's any such configuration that'll be
reliably available.

Because of that, seeing this A100 announcement is just a bummer, as I fear
it'll be just another "resource unavailable" GPU...
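
The closest thing to a workaround I have is brute force: loop over candidate
zones and keep the first one that accepts the request. A rough sketch (the
zone list and image are placeholders, not recommendations):

    # Hunt for a zone with n1 + T4 capacity by just trying them in order.
    import subprocess

    ZONES = ["us-central1-a", "us-central1-b", "us-west1-b", "europe-west4-b"]

    def try_create(zone: str) -> bool:
        """Return True if gcloud managed to create the instance in this zone."""
        result = subprocess.run(
            ["gcloud", "compute", "instances", "create", "t4-worker",
             f"--zone={zone}",
             "--machine-type=n1-standard-8",
             "--accelerator=type=nvidia-tesla-t4,count=1",
             "--maintenance-policy=TERMINATE",  # required for GPU instances
             "--image-family=debian-10", "--image-project=debian-cloud"],
            capture_output=True, text=True)
        return result.returncode == 0

    for zone in ZONES:
        if try_create(zone):
            print(f"got capacity in {zone}")
            break
        print(f"{zone}: resource unavailable, trying the next zone")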

~~~
ckleban
Disclaimer: I work at Google Cloud.

Sorry to hear you have experienced this.

Customers can sometimes experience stockouts based on a variety of factors,
but we can surely help you out: we have T4 GPU capacity available and, like
you said, we may direct you to a different zone or region. Please open a
support case on the issue and we can help.

------
zetazzed
I think this is the most in-depth article on Ampere:
https://developer.nvidia.com/blog/nvidia-ampere-architecture-in-depth/

Lots of architectural changes like MIG, new floating point formats, etc. Great
to see GCP getting VMs out pretty soon after launch so people can start
kicking the tires.

~~~
g_airborne
Very cool! Does anyone know what the software support for all these features
is like? It seems that TF doesn't support the new TF32/BF16 types as of yet.
Is this something only CUDA engineers can use right now?

It does seem a little fishy to me that NVIDIA often boasts figures like a 10x
performance uplift, whilst in practice those are only achievable if you use
one of their non-default float types, which are hardly supported in most deep
learning libraries :(

~~~
zetazzed
Both the PyTorch and TensorFlow teams have announced they'll support TF32. By
design, it interoperates well with existing code, since calling code can just
treat it as a regular FP32 value.
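
For reference, this is roughly what opting in looks like on the PyTorch side
(flag names per PyTorch's announced TF32 support; treat the snippet as a
sketch, and note it only changes anything on Ampere hardware):

    import torch

    # Route FP32 matmuls and cuDNN convolutions through TF32 tensor cores.
    # Tensors stay ordinary FP32; only the internal math uses TF32's
    # reduced (10-bit) mantissa.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

    a = torch.randn(4096, 4096, device="cuda")
    b = torch.randn(4096, 4096, device="cuda")
    c = a @ b  # runs on TF32 tensor cores on an A100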

------
EwanToo
There's some more technical information on the A100 here:
https://www.anandtech.com/show/15801/nvidia-announces-ampere-architecture-and-a100-products

------
anon102010
I wish Google / AWS would avoid the overlapping names where possible.

the "A" series on AWS = AMD instances

The "A" series on GCP = Nvidia instances.

I know - probably on no one's radar at all :)

~~~
boulos
Disclosure: I work on Google Cloud.

Even worse is that for GCE, A was for AMD originally (and N was for iNtel). In
any case, this A is for Accelerator.

~~~
jeffbee
Are there any papers or blogs about how these GPUs are attached to the host? I
find it interesting that you can get a VM with 96 vCPUs, which I assume
amounts to a whole box (2x24-core hyperthreaded Xeon CPUs?) but either 8 or 16
GPUs. How does that keep from stranding 8 GPUs? Is there some kind of rack-
wide PCIe switch that can attach GPUs to various hosts or ??

~~~
boulos
_We_ sadly don’t talk about how we rack these at all, but the folks at
Facebook have made their OCP designs public for vaguely similar systems.

However, I’ll note that the 16 A100s here are way more expensive than the CPU
cores (and we can just run vanilla VMs on those leftover cores if really
needed).

------
ricardo81
I've only had a very slight dip into the waters of the GPGPU world in the
past 5 years, but it is obvious that Nvidia have the lion's share of the
market when it comes to hosted solutions.

I'm still not sure why competitors don't have their GPUs offered in cloud
services. At the time, the alternatives seemed to be more economical. I was
building a GPU version of Hashcash at the time, fwiw.

~~~
Nokinside
Nvidia has a software advantage. Unlike with AMD, you have libraries and ways
to build software that runs fast: CUDA, cuDNN.

~~~
BadInformatics
They do, but (as described elsewhere in this thread) the level of polish
isn't quite up to that of the CUDA stack. Add to that AMD's relatively
abysmal developer evangelism, and it's not surprising that adoption is so
low...

------
ksec
GCP seems to be getting new hardware way faster than AWS. Both AMD's Zen 2 and
Nvidia's Ampere.

------
alfonsodev
As an ML beginner, seeing these new offerings, does a local setup (2x Nvidia
2080 8GB) make sense, or is it better to learn using the cloud and let
hardware get even cheaper?

The cloud is a bit scary from a learner's perspective, because it's not
exactly clear how much power one would need to practice the core concepts and
see actual results.

On the other hand, a PC build is an upfront investment: less scary because
it's a fixed cost, but it also feels risky in case things progress quickly
and hardware gets outdated soon.

~~~
boulos
Disclosure: I work on Google Cloud.

Honestly, I’d start with Colab until you decide you need “dedicated hardware”.
It’s better to focus on learning before you decide “Okay, I’m serious about
this now”.

------
fhrifjr
100G of network bandwidth for 8-16 GPUs? Is this a joke? One GPU will easily
consume 50G. Who is going to use it?

------
beamatronic
Imagine a Beowulf cluster of these

