
New AMD EPYC-based Compute Engine family, now in beta - cvallejo
https://cloud.google.com/blog/products/compute/announcing-the-n2d-vm-family-based-on-amd
======
boulos
Disclosure: I work on Google Cloud.

This has come up a few times, so I wanted to reiterate that these are the
Zen2/Rome parts not the first generation “Naples” parts. We didn’t bother
launching Naples for GCE, because (as you can see) Rome is a huge step up.

~~~
Bluecobra
Are you using a custom CPU from AMD? I spun up an N2D instance and it's showing
up as an "EPYC 7B12", and I can't find any details about this CPU anywhere.

------
mdasen
Since people from Google Cloud are likely here, one thing I'd like to ask/talk
about: are we getting too many options for compute? One of the great things
about Google Cloud was that it was very easy to order. None of this "t2.large"
where you'd have to look up how much memory and CPU it has and potentially how
many credits you're going to get per hour. I think Google Cloud is still
easier, but it's getting harder to know what the right direction is.

For example, the N2D instances are basically the price of the N1 instances or
even cheaper with committed-use discounts. Given that they provide 39% more
performance, should the N1 instances be considered obsolete once the N2D exits
beta? I know that there could be workloads that would be better on Intel than
AMD, but it seems like there would be little reason to get an N1 instance once
the N2D exits beta.

Likewise, the N2D has basically the same sustained-use price as the E2
instances (which only have the performance of N1 instances). What's the point
of E2 instances if they're the same price? Shouldn't I be getting a discount,
given that Google can use the resources more efficiently?

It's great to see the improvements at Google Cloud. I'm glad to see lower-
cost, high-performance options available. However, I guess I'm left wondering
who is choosing what. I look at the pricing and think, "who would choose an N1
or N2 given the N2D?" Sure, there are people with specific requirements, but
it seems like the N2D should be the default in my mind.

This might sound a bit like complaining, but I do love how I can just look up
memory and CPU pricing easily. Rather than having to remember name-mappings, I
just choose from one of the families (N1, N2, E2, N2D) and can look at the
memory and CPU pricing. It makes it really simple to understand what you're
paying. It's just that as more families get added and Google varies how it
applies sustained-use and committed-use discounts between the families, it
becomes more difficult to choose between them.

For example, if I'm going for a 1-year commitment, should I go with an E2 at
$10.03/vCPU or an N2D at $12.65/vCPU? The N2D should provide more performance
than the 26% price increase, yes? Why can't I get an EPYC-based E-series to
really drive down costs?
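Spelling out that arithmetic (a quick sketch using the two prices quoted above
and the 39% performance-uplift figure cited earlier in this comment):

```python
# Committed-use prices quoted above, USD per vCPU per month (1-year term).
e2_vcpu = 10.03
n2d_vcpu = 12.65

premium = n2d_vcpu / e2_vcpu - 1
print(f"N2D premium over E2: {premium:.0%}")  # ~26%

# If the N2D really delivers ~39% more performance than N1/E2-class cores,
# its price per unit of performance actually comes out lower than E2's:
perf_uplift = 1.39
print(f"N2D per perf-unit: ${n2d_vcpu / perf_uplift:.2f} vs E2: ${e2_vcpu:.2f}")
```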

Again, I want to reiterate that Google Cloud's simpler pricing is great, but
complications have crept in. E2 machines don't get sustained-use discounts,
which means they're really only valuable if you're doing a yearly commitment
or non-sustained-use. The only time N1 machines are cheaper is in
sustained-use - they're the same price as Intel N2 machines if you're doing a
yearly commitment or non-sustained-use. Without more guidance on performance
differences between the N2D and N2, why should I ever use N2? I guess this is
a bit of rambling to say, "keep an eye on pricing complexity - I don't like
spending a lot of time thinking about optimizing costs".

~~~
boulos
Disclosure: I work on Google Cloud (and really care about this).

The challenge here is balancing diverse customer workloads against the
processor vendors. Historically, at Google, we just bought a single server
variant (basically) because almost all code is expected to care primarily
about scale-out environments. That made the GCE decision simple: offer the
same hardware we build for Google, at great prices.

The problem is that many customers have workloads and applications that they
can’t just change. No amount of rational discounting or incentives makes a 2
GHz processor compete with a 4 GHz processor (so now, for GCE, we buy some
speedy cores and call that Compute Optimized). Even more strongly, no amount
of “you’re doing it wrong” actually is the right answer for “I have a database
on-prem that needs several sockets and several TB of memory” (so, Memory
Optimized).

There’s an important reason though that we refer to N1, N2, N2D, and E2 as
“General purpose”: we think they’re a good balanced configuration, and they’ll
continue to be the right default choice (and we default to these in the
console). E2 is more like what we do internally at Google, by abstracting away
processor choice, and so on. As a nit to your statement above, E2 _does_ flip
between Intel and AMD.

You should choose the right thing for your workloads, primarily subject to the
Regions you need them in. We’ll keep trying to push for simplicity in our API
and offering, but customers really do have a wide range of needs, which
imposes at least some minimum amount of complexity. For too long (probably) we
tried to refuse to add that complexity, both for our sake and for customers'.
Feel free to ignore the extra options, though!

~~~
alfalfasprout
I mean, this mentality often _is_ wrong. Scaling out actually isn't the right
solution for everyone. It works for Google because it primarily offers web
services. It does not work for workloads that lean heavily on the CPU (think
financial workloads, ML, HPC/scientific workloads) or that have realtime
requirements. In fact, for many ETL workloads, vertical scaling proves _far_
more efficient.

It's long been the "google way" to try and abstract out compute but it's led
to an industry full of people trying to follow in their way and
overcomplicating what can be solved on one or two machines.

~~~
erulabs
Except, almost without exception, eventually the one or two machines will fall
over. Ideally you can engineer your way around this ahead of time - but not
always. Fundamentally relying on a few specific things (or people) will always
be an existential risk to a big firm. Absolutely agree re: start small - but
the problem with “scale out” is a lack of good tooling - not a fundamental
philosophical one.

~~~
aidenn0
Plenty of services can deal with X hours of downtime when a single machine
fails for values of X that are longer than it takes to restore to a new
machine from backups.

~~~
darkwater
I do not agree, and it is not my experience. Mind you, I've always worked in
small/mid-sized businesses (50-300 employees), and basically every service has
someone needing it for their daily work. Sure, they may live without it for
some time, but you will make their lives more miserable.

And anyway, if you already have everything in place to completely rebuild
every SPOF machine from scratch in a few hours, go the extra mile and make it
an active/passive cluster, even a manually switched one, and turn the downtime
into a matter of minutes.

~~~
aidenn0
A small amount of work over a long period of time (i.e. setting up a redundant
system) may be worse than losing a large amount of work in a short period of
time.

Single machines just don't fail that often. I managed a database server for an
internal tool, and the machine failed once in about 10 years. It was commodity
hardware, so I just restored the backups to a spare workstation and it was
back up in less than 2 hours. 15 people used this service, and they could get
_some_ work done without it, so there were less than 30 person-hours of
productivity lost. If I had spent 30 hours getting failover &c. working for
this system over a 10-year period, it would have cost the company more hours
than the failure did.

------
tpetry
Now it's getting really interesting: in the end you have to compare pricing
for a vCore (which is a thread on a CPU) against per-thread performance on AMD
vs. Intel. Does anyone know of a benchmark like this? EPYC processors are most
often tested on heavily parallelizable tasks, not strictly single-threaded
tasks.
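For rough relative comparisons between two otherwise-identical VMs, even a
crude single-thread probe can help (a sketch only, not a substitute for a real
benchmark suite like Geekbench or SPEC, and only meaningful between machines
running the exact same Python build):

```python
import hashlib
import time

def single_thread_rate(seconds=2.0):
    """Count SHA-256 rounds over a fixed buffer on one thread.

    The absolute number means nothing; only the ratio between two
    machines running identical software is interesting.
    """
    buf = b"\x00" * 65536
    rounds = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        hashlib.sha256(buf).digest()
        rounds += 1
    return rounds / seconds

print(f"{single_thread_rate():.0f} hash rounds/sec")
```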

~~~
t3rabytes
I did some rudimentary testing with AMD vs Intel on AWS recently and found
that AMD lacked enough in single threaded perf that it meant they weren’t
worth the savings for our workloads (Rails apps).

~~~
theevilsharpie
AMD-based AWS instances are running on first-generation Epyc processors.

Compared to the second-generation Epyc processors that Google is using, the
first generation has lower clock speeds, can execute fewer instructions per
clock (particularly in terms of floating-point operations), has substantially
less cache, and has a more complicated memory topology that can negatively
impact the performance of workloads that aren't NUMA-aware.

In short, your experience with AMD in AWS isn't relevant to Google's
offerings.

------
boulos
Disclosure: I work on Google Cloud.

cvallejo is the PM, so ask her anything!

~~~
JoshTriplett
Does the new n2d machine type support nested virtualization?

(Asking because Azure supports nested virtualization but only on some machine
types. AWS doesn't support nested virtualization at all. Google Cloud seems to
support nested virtualization on other machine types.)

Also, why 224 rather than 256?

~~~
boulos
We do not yet support AMD's nested implementation (we do on Intel). _But_
cvallejo is _also_ the PM for Nested :).

As for 224, we've always reserved threads on each host for I/O and so on.
Figure 2 from the Snap paper [1] is probably the best public reference. We
also don't make it clear (on purpose) what size the underlying host processors
are, though you can clearly guesstimate pretty easily.

[1]
[https://research.google/pubs/pub48630/](https://research.google/pubs/pub48630/)
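From inside a Linux guest, one best-effort way to see whether nested
virtualization is exposed is to read the standard KVM module parameter (a
sketch; it only reports what the loaded kvm_intel/kvm_amd module says, and a
missing file simply means no KVM module is loaded):

```python
from pathlib import Path

def nested_virt_enabled() -> bool:
    """Best-effort check on a Linux guest: does a loaded KVM module
    report its 'nested' parameter as enabled?"""
    for mod in ("kvm_intel", "kvm_amd"):
        param = Path(f"/sys/module/{mod}/parameters/nested")
        if param.exists():
            return param.read_text().strip() in ("1", "Y", "y")
    return False  # no KVM module loaded (or not a Linux host/guest)

print("nested virtualization:", nested_virt_enabled())
```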

~~~
JoshTriplett
> We do not yet support AMD's nested implementation (we do on Intel).

Any particular reason for that limitation, or just "not implemented yet"? (Not
asking for product roadmaps, just wondering if there's a specific technical
issue that makes it more difficult to support.)

> As for 224, we've always reserved threads on each host for I/O and so on.
> Figure 2 from the Snap paper [1] is probably the best public reference.

That's helpful, thank you.

~~~
boulos
Not implemented yet. In the stack rank of “stuff needed to update our
hypervisor for AMD again” it wasn’t at the top :).

Note the "again" as well: GCE _originally_ had it such that the N in N1 meant
iNtel, and A1 was for AMD (as Joe said publicly here:
[https://twitter.com/jbeda/status/1159891645531213824](https://twitter.com/jbeda/status/1159891645531213824)).
By the time I joined, though, we didn't see the point of the A1 parts, since
the Sandy Bridge parts smoked them.

------
privateSFacct
Does AWS have a comparable offering? I haven't seen anything on EPYC -
congrats to GCP for moving quickly. I'm mostly AWS based currently.

~~~
wolf550e
See [https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-
ec...](https://aws.amazon.com/about-aws/whats-new/2019/04/amazon-
ec2-t3a-instances-are-now-generally-available/)

~~~
Jyaif
Your link is for the 1st gen EPYC cpus, while this announcement is for the 2nd
gen.

------
tkinz27
It's really great to see more AMD options for cloud instances. Now I'm just
waiting for more ARM architecture options. Not having to cross-compile code
(a1/m6g AWS instance types) has been very useful in my day-to-day job.

------
m0zg
Well, they may be hypothetically "available" in us-central1, but they don't
show up for me. All I see is "Cascade Lake powered" N2.

~~~
boulos
Disclosure: I work on Google Cloud.

Sorry for the confusion. We should be clearer and explicitly state "now
rolling out". Over the course of our multi-day rollout, different regions and
the console will start showing these.

------
pier25
What are the implications? Higher perf and/or lower price?

~~~
kart23
Looks to be about $5/month cheaper, based on this page:

[https://cloud.google.com/compute/all-
pricing#n2_machine_type...](https://cloud.google.com/compute/all-
pricing#n2_machine_types)

Jeez, I didn't realize how expensive cloud compute was. I always wondered why
my school still has a datacenter. Having your own servers still makes sense
for a lot of orgs.

~~~
acdha
It’s a question of how dynamic your usage is and how much the better security
and management features save you. If you have very consistent workloads with
little idle hardware and modest admin needs, you can beat cloud environments
even with reservations, but usually when I see those numbers it means major
costs like staffing or power/HVAC aren’t being factored in.
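One way to make that concrete is a toy monthly-cost model (every number below
is a made-up placeholder purely for illustration; substitute your own):

```python
# Toy monthly TCO comparison. All figures are hypothetical placeholders.
cloud_vm = 400.0               # committed-use cloud VM, USD/month
server_capex = 8000.0          # on-prem server, amortized over 3 years
amortization_months = 36
power_cooling = 60.0           # USD/month
admin_time = 250.0             # slice of a sysadmin's time, USD/month

onprem = server_capex / amortization_months + power_cooling + admin_time
print(f"on-prem ${onprem:.0f}/mo vs cloud ${cloud_vm:.0f}/mo")
```

Note how the hardware itself is less than half of the on-prem figure once the
operating costs are included; that is usually the part left out of the math.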

------
Cyclenerd
Geekbench Multi Core Benchmarks:

1\. n2-standard-16 (Intel Cascade Lake):
[https://browser.geekbench.com/v5/cpu/1257619](https://browser.geekbench.com/v5/cpu/1257619)

2\. n2d-standard-16 (AMD EPYC):
[https://browser.geekbench.com/v5/cpu/1257340](https://browser.geekbench.com/v5/cpu/1257340)

3\. n1-standard-16 (Intel Skylake):
[https://browser.geekbench.com/v5/cpu/1257420](https://browser.geekbench.com/v5/cpu/1257420)

Intel Cascade Lake holds first place by a very small lead, but with a
three-year commitment AMD is the cheapest option. Great to have a choice. Thx
@cvallejo

------
dsign
Can those instances be used in GKE node pools?

~~~
boulos
Disclosure: I work on Google Cloud.

Should be, once the rollouts complete. So once you can see N2D types in the
Console for your project, I _think_ it'll just flow naturally to GKE.

~~~
cvallejo
Disclosure: I also work on Google Cloud

That's correct! Once the rollout completes, you will be able to use N2D
instances for GKE!

~~~
p_l
And if it doesn't, it was always possible to trick GKE into offering node
types that were not officially supported. Source: me, using preemptible
instances for a year before they got into GKE ;-)

Modifying instance group templates is your friend.

~~~
crazysim
Yeah, this can be done for nested virtualization instances (images) too.

------
ensacco
What's the intent / timeline for N2D in other regions, e.g. us-west1?

~~~
cvallejo
disclosure: I work at google!

We will be expanding the regional footprint of N2D. us-west1 should come
online in the first half of this year.

~~~
colinmcdonald22
Not directly related, but do you know when we can expect to see N2 instances
in us-east1-b? Currently they're in 2/3 of the zones, just one annoying zone
short of being able to use it in my GKE cluster.

------
benbro
AMD EPYC has hardware SHA acceleration, but in my test "openssl speed -evp
sha1" is slower than "openssl speed sha1". Any idea why?

------
carbocation
Any idea when these will be available on the genomics pipeline API (now "Cloud
Life Sciences" API)?

------
lallysingh
What's the topology of these machines? Dual socket 64c chips with some
reserved (or disabled)?

~~~
wmf
They don't say that but that's the only way to provide 224 threads.

~~~
bob1029
Their topology tells far richer tales than the press releases when you dig
deeper on the numbers.

224 threads = 112 SMT cores = 2 x 56-core CPUs. This is 8 cores short of the
64-core flagship. 8 cores == 1 CCD (chiplet).

It seems exceedingly unlikely that AMD would produce a Rome CPU with 7 out of
8 chiplets in perfect health but the 8th chiplet completely missing
(functionally). It seems more likely that the 8th chiplet is there with all 8
cores, and that it is reserved for some other type of service. One possibility
is that there are stronger guarantees of side-channel protection at the
chiplet boundary, and Google intends to use it for secure internal functions
or to sell to clients who are side-channel sensitive. Another is that they
simply want the hypervisor to have a very fat budget of 8 cores per socket to
work with. Considering the amount of potential I/O going on, this might be
required in some cases.
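The arithmetic behind that guess, spelled out (the 64-core flagship and
8-cores-per-chiplet figures are the public Rome numbers; the 2-socket host is
the speculation from this thread):

```python
# Core-count arithmetic for the 224-thread guest shape.
guest_threads = 224
smt = 2                      # threads per core
sockets = 2                  # assumed dual-socket host
flagship_cores = 64          # top Rome part, cores per socket
cores_per_chiplet = 8        # one Zen 2 CCD carries 8 cores

guest_cores = guest_threads // smt       # 112
per_socket = guest_cores // sockets      # 56
hidden = flagship_cores - per_socket     # 8 -> exactly one chiplet's worth
print(guest_cores, per_socket, hidden, hidden // cores_per_chiplet)
```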

~~~
smueller1234
See also boulos' comment:

"As for 224, we've always reserved threads on each host for I/O and so on.
Figure 2 from the Snap paper [1] is probably the best public reference. We
also don't make it clear (on purpose) what size the underlying host processors
are, though you can clearly guesstimate pretty easily."

I.e., if there's even one thread reserved on the host, your speculation comes
to naught. Sorry.

------
tempsy
AMD's stock is wild. It was around $2 just a few years ago and has been on a
non-stop trend up to almost $60 today.

~~~
sdesol
They really are doing something disruptive. I can't quite remember if this is
correct (it has been a while since I last studied business), but in business
there is a "blue ocean strategy". The basic premise is that if you can provide
a product at half the price with twice the value, you will destroy the
incumbent.

What AMD is doing is really insane, in my opinion. I'm not sure if they are
pricing their processors low on purpose, and/or if they have found a way to
manufacture more cheaply, and/or if Intel was screwing consumers with its
pricing since it was so dominant.

No matter what, AMD is able to provide something that is measurably better and
significantly cheaper than the incumbent, and if the blue ocean strategy
holds, they should become the new incumbent in the near future.

~~~
thesz
Smaller chips have better yield. As AMD's current chips are composed from
several smaller ones (I believe two or three), each composite has better yield
than one bigger chip of the same total area would.

So yes, they figured out how to produce cheaper solutions.
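A standard way to see why is the Poisson defect-yield model,
yield = exp(-defect_density x die_area). The numbers below are illustrative
only, not actual foundry or AMD figures:

```python
import math

defect_density = 0.2                 # defects per cm^2 (hypothetical)
monolithic_area = 4.0                # cm^2, imagined 64-core single die
chiplet_area = monolithic_area / 8   # one eighth of the cores per die

monolithic_yield = math.exp(-defect_density * monolithic_area)
chiplet_yield = math.exp(-defect_density * chiplet_area)
print(f"monolithic die yield: {monolithic_yield:.0%}")  # ~45%
print(f"single chiplet yield: {chiplet_yield:.0%}")     # ~90%
```

Because each small die passes or fails independently, the fraction of usable
silicon per wafer is far higher when the design is split into chiplets.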

~~~
dragontamer
> As AMD's current chips are composed from several smaller ones (I believe two
> or three)

For EPYC, AMD is using nine chips:
[https://images.anandtech.com/doci/13561/amd_rome-678_678x452...](https://images.anandtech.com/doci/13561/amd_rome-678_678x452.png)

That's 1x I/O chip (kind of like a router) and 8x compute chips, each with 8
cores. That totals 64 cores / 128 threads across 8 compute chiplets, all
talking through the central I/O-and-memory chip.

The I/O chip is the biggest, for a few reasons: 1. It's made on a cheaper
process. 2. It doesn't need the performance of the compute chips. 3. It has to
be big because driving external I/O requires more power.

So the I/O chip can be made on a cheap / less efficient 14nm process, while
the CPU chiplets are made on the more expensive 7nm process (maximizing clock
rates and power efficiency). The big I/O ports are going to eat a lot of power
regardless of 7nm vs. 14nm, so you might as well save money there.

------
867-5309
No pricing mentioned.

I was surprised to discover the other day that one of my VPSs had been
upgraded from 1 old Xeon 26XX core to 2 EPYC cores. Other stats: unmetered
10Gb/s up/down, low-latency A'dam location, 2GB RAM, SSD. It even outperformed
my i7-8700T in a single-core OpenSSL benchmark. Most importantly, it costs
€3/mo.

I really can't see Google competing with that

~~~
e12e
I see 10Gbps listed at USD 569/month?

[https://www.scaleway.com/en/virtual-instances/general-
purpos...](https://www.scaleway.com/en/virtual-instances/general-purpose/)

~~~
WhiteOwlLion
Euros are not the same as US dollars.

569 euros equals 613.73 United States dollars.

~~~
e12e
Right. Either way it's a little over 3 USD/month..

~~~
867-5309
€3/mo, and for a different plan

