
Why Cloudflare Chose AMD EPYC for Gen X Servers - jgrahamc
https://blog.cloudflare.com/technical-details-of-why-cloudflare-chose-amd-epyc-for-gen-x-servers/
======
aloknnikhil
I really wish AMD would focus on its tooling. Intel VTune Amplifier is a
fantastic multi-platform profiler that helps you understand how effectively
your software is using the hardware (pipelining, micro-op efficiency, cache
usage, etc.).

If AMD could come up with something similar, it'd make their offering a no-
brainer.

> [https://software.intel.com/en-us/vtune](https://software.intel.com/en-us/vtune)

~~~
effie
I doubt tooling is the problem slowing down AMD adoption. Compiler writers or
special-application developers care about that, and they are not the ones who
pay AMD for their platform. It's the big businesses - most care about
stability, predictability, and reliability, hence they stay with Intel. Those
interested mostly in bang/buck, like Cloudflare, choose to go with AMD,
tooling or no tooling. You don't need tooling to check that AMD is far more
efficient now; you just do some basic load tests and the result is easy to
understand.

~~~
mlyle
If you have crappy tooling, the software other people write -- in-house or
commercial -- runs crappier on your hardware.

~~~
dpacmittal
But it's irrelevant if it works faster than Intel.

~~~
hamandcheese
It’s not irrelevant if Intel has tools to let you close that gap easily.

~~~
jfkebwjsbx
Tools can't close a gap of entire cores, much less twice or more of them.

~~~
kohtatsu
It's a matter of principle.

Some people don't want cores that give them no insight into how their time is
being spent or wasted.

I'm not a direct customer of Intel or AMD, but in general I would trade off
clock speed/cores/cost for insight into what my software is doing.

------
jasode
Yes, there are lots of stories of individuals and companies choosing AMD over
Intel. For me, what's more interesting would be any _contrarian_ view listing
the compelling reasons to pay the higher price for Intel instead of AMD. Off
the top my head, some reasons would be:

\- Intel AVX-512 instruction set

\- Intel support for 4- and 8-socket systems

What realistic workloads have a better performance/cost advantage on Intel vs
AMD EPYC?

If you're a home user (gamer, programmer), is there any reason to buy Intel
today? I'm about to buy another motherboard & CPU and considering AMD for the
first time. In the past, I avoided it because AMD had an incompatibility (e.g.
VMware) that kept me on Intel.

(E.g. previous example of AMD showstopping bug affecting VMware:
[https://communities.vmware.com/thread/456094](https://communities.vmware.com/thread/456094))

~~~
throw_554323
> If you're a home user (gamer, programmer), is there any reason to buy Intel
> today?

If you need ECC RAM?

I know technically AMD supports it for their Ryzen line, but everything I’ve
read says it’s up to the motherboard manufacturer to support it. And even if
the motherboard manufacturer says you can use ECC RAM, it doesn’t mean it will
actually “use” it correctly.

So you mainly need to rely on others doing tests on their own systems to
confirm if the motherboard actually uses it.

My understanding is that you need to use the EPYC line for guaranteed ECC
usage, which is really expensive.

Intel’s W-series Xeons start off at a reasonable price, and there are even
cheaper Celerons that will use ECC RAM.

~~~
microcolonel
> _I know technically AMD supports it for their Ryzen line, but everything
> I’ve read says it’s up to the motherboard manufacturer to support it._

AFAIK every ASRock AM4 motherboard supports ECC.

Also, most Intel processors do not support ECC at all, except the ones which
are competing (in some limited sense, since they are being absolutely
_crushed_ at most price points) with Threadripper, in which case, ECC is
available across the board (and glorious quad-channel memory).

------
KaiserPro
At FAANG it's well known that some AMD procs are faster, cheaper, and cooler
(well, higher performance per watt). None of that is in doubt; what they are
having trouble with is supply.

It appears that AMD is not able to provide either the support (microcode
tweaks, firmware heads-up, firmware testing and bug reporting) or the volume
to be useful.

The conclusion was that AMD want to nail consumer first.

------
ddevault
SourceHut runs its performance critical infrastructure (git, builds) on AMD
EPYC. These CPUs are sweeeet. The hard part is sourcing motherboards. I
currently buy them from some dude in Germany on eBay because he's the only
consistent source I've been able to find.

~~~
fierarul
> SourceHut runs its performance critical infrastructure (git, builds) on AMD
> EPYC. These CPUs are sweeeet. The hard part is sourcing motherboards. I
> currently buy them from some dude in Germany on eBay because he's the only
> consistent source I've been able to find.

Heh, you know that guy is with / getting a visit from the German BDN now.

~~~
sauerbraten
*BND, from Bundesnachrichtendienst

~~~
bithavoc
Why, is it illegal?

------
MikusR
Two years ago they switched to ARM
[https://news.ycombinator.com/item?id=16646701](https://news.ycombinator.com/item?id=16646701)

~~~
jgrahamc
We tried to switch to ARM. We maintain our ARM port of all our code for the
day there are servers we can use. However, we are in production with AMD now.

From one of the posts: "Readers of our blog might remember our excitement
around ARM processors. We even ported the entirety of our software stack to
run on ARM, just as it does with x86, and have been maintaining that ever
since even though it calls for slightly more work for our software engineering
teams. We did this leading up to the launch of Qualcomm’s Centriq server CPU,
which eventually got shuttered. While none of the off-the-shelf ARM CPUs
available this moment are interesting to us, we remain optimistic about high
core count offerings launching in 2020 and beyond, and look forward to a day
when our servers are a mix of x86 (Intel and AMD) and ARM."

~~~
matt2000
Do you see any ARM chips on the horizon this year or next that might change
the equation? It looks like Amazon is seeing 40% better performance per watt
(I think they said the power draw is comparable somewhere) with their custom
design. Will a commercially available chip be able to match that soon?

Side note: Thanks for making a great product and keeping it cheap/free for
hobby projects! I got comfortable with the basic features on a small project
and now advocate for its use regularly.

~~~
jgrahamc
We continue to look at ARM and test things. We continue to look at Intel and
test things. Right now AMD makes the most sense for the servers we are
deploying today. But don't be surprised if we have more Intel or ARM in our
future.

------
gdm85
> Notably, Intel is not inside.

I can hear the loud silence of that full stop.

------
iRobbery
Now I wonder if these were considered/known:

[https://www.servethehome.com/ampere-altra-80-arm-cores-for-cloud/](https://www.servethehome.com/ampere-altra-80-arm-cores-for-cloud/)

------
lukevp
I’m a big fan of CloudFlare. I own a little stock, and I moved our company
over to it after running on a competitor for about a year. CloudFlare has been
much better. Whenever I use the front end, it just feels like it’s made by
people who actually need to use this stuff, and it is so straightforward. The
api is great too. When we were with the competition it was ridiculous how
often we had to have an engineer assist us with setup because the front end
didn’t support some feature we needed. With CF we signed up one day and have
never had to have manual changes made by support, and the health checks, load
balancing and failover have worked from day one. With the competitor the
health-check story did not live up to reality, and we had several outages
related to the failover system not recovering properly.

When they publish these blog posts, it shows to me that they are willing to
consider bold moves and that they aren’t afraid to be different. Also, that
their architecture has been designed with this agility in mind. I’ve visited
their offices and while their corporate culture didn’t seem like the best fit
for me, I liked the team and what they are doing.

~~~
endorphone
"it shows to me that they are willing to consider bold moves and that they
aren’t afraid to be different"

Because they benchmarked a drop-in-place commodity replacement as being a
better performance-per-dollar for their operation? I'm not sure what's bold
about that. I find the notion that they posted a blog entry about this pretty
bizarre -- almost rinky-dink. It seems like it's pandering to the AMD fanbase
(which I am right now, just as anyone who is paying attention is, but I can
still recognize it)

EDIT: Sorry, I didn't realize I stepped into a Cloudflare circlejerk.
Apologies and carry on. So brave!

~~~
sounds
There's a ton, just a ton, of Intel-specific integrations, and Cloudflare's
switch to AMD was not just a "drop-in-place" move.

~~~
endorphone
What "intel-specific integrations"? Cloudflare specifically talks about how
they keep their stack runnable across a wide-range of platforms, including
even entirely different instruction sets (e.g. ARM). Their own words support
my statement.

I'm pretty sure it was a "drop-in-place" move, and they say absolutely nothing
to the contrary. They benchmarked a system that performed better than their
existing system for whatever criteria they targeted, and chose it. Story at
11. Saying they "abandoned" Intel is uproarious given that if Intel pushes out
a new 128-core Xeon at a decent price, they'd as easily "abandon" AMD for
Intel.

~~~
jgrahamc
As I say in another comment... AMD makes sense for us today because of the
performance per watt. We continue to keep our software ARM-ready and look at
ARM-based solutions. We also continue to look at Intel. We'll use what gives
us the best requests per watt.

But it is notable that these are our first servers with NO Intel components at
all.

~~~
jjeaff
I'm curious why requests per watt is an important metric?

Are you running up against some sort of heat limit or optimizing for
electricity cost?

Seems like requests per amortized dollar all in would make more sense. Or
requests per cubic foot of data center space if that is your limitation.

~~~
fierarul
This is a good question and I would like to learn the answer too.

According to some napkin calculations, the CPU is $5000 and will be kept in
service for, I guess, 5 years.

The US average price is 13 cents/kWh, so this 225-watt CPU will cost about
$1280 in electricity over those 5 years.

Colocation price is probably low, let's say $50 / 1U server / month. So that's
another $3000.

Why is electricity the most important here? I guess because you can't lower
the other two costs very much? But isn't the CPU price itself based on the
power savings it provides?
~~~
tbyehl
Data center space is generally plentiful. Power and cooling, which for most
purposes can be considered equivalent, are the scarce resource.

CloudFlare is trading half their server density[0] for a ~25% gain in
performance-per-watt and performance-per-node[1]. Power is the whole ball game
at scale.

[0] [https://blog.cloudflare.com/a-tour-inside-cloudflares-g9-servers/](https://blog.cloudflare.com/a-tour-inside-cloudflares-g9-servers/)

[1] [https://blog.cloudflare.com/an-epyc-trip-to-rome-amd-is-cloudflares-10th-generation-edge-server-cpu/](https://blog.cloudflare.com/an-epyc-trip-to-rome-amd-is-cloudflares-10th-generation-edge-server-cpu/)

~~~
fierarul
Ah, so it's not about power being a big expense, but being a constraint that
is limiting their growth:

> We are constrained by the data centers’ power infrastructure which, along
> with our selected power distribution units, leads us to power cap for each
> server rack.

------
bluedino
Title is "Technical Details of Why Cloudflare Chose AMD EPYC for Gen X
Servers"

~~~
PhantomGremlin
Blog post is by Nitin Rao, Head of Global Infrastructure at Cloudflare.

Submission and editorialized title change is by John Graham-Cumming, CTO at
Cloudflare.

Not sure what to make of that.

~~~
sudhirj
jgrahamc is more active on HN, and he seems to keep tabs on all Cloudflare
discussions and answers questions when asked. Don't see what's odd about it.
Haven't noticed Nitin here.

~~~
Thorrez
It's not odd that he submitted it. It's odd that he changed the title, given
the HN guidelines against changing titles.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

------
abriosi
Let's put it this way.

I'd rather program single-core applications on my i5-4690K than on any of the
Ryzen-series chips, including Threadrippers with 12+ cores.

I have used two Threadrippers for more than 6 months.

Linux still runs faster on the i5-4690K, which is six years old.

~~~
undersuit
Odd, I switched from that processor to the AMD 2700 last month and I'm
enjoying the legroom.

