
AMD is determined to gets its rightful datacenter share - ChrisArchitect
https://www.nextplatform.com/2020/03/06/amd-is-determined-to-gets-its-rightful-datacenter-share/
======
drewg123
My biggest concern in moving to AMD is support for profiling and system
visibility. On Intel, there are many more performance counters exposed, and
Intel themselves make great tools (Vtune, Intel PCM).

AMD has a profiler, but it is nowhere near as capable as Vtune.

AMD has ways to do some of the things that Intel's PCM tools do (like
monitoring memory bandwidth), but they are exceedingly awkward (e.g., the only
output option is a CSV file; there is no live output). And there are lots of
things that are just missing (like support for monitoring NUMA fabric
bandwidth, PCIe bandwidth, power, etc.). I gave a talk at AMD in Austin last
fall where I beat them up about some of these things.
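As a stopgap, generic Linux `perf` events can approximate a little of what
Intel PCM reports; this is only a rough sketch using the kernel's generic
event aliases, nowhere near PCM's per-channel bandwidth detail:

```shell
# System-wide counts of cache traffic for 5 seconds, as a rough
# proxy for memory pressure. Works on both Intel and AMD; run
# `perf list` to see what your CPU family actually exposes.
perf stat -e cache-references,cache-misses -a -- sleep 5
```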

Oh, and I'm up to over 260Gb/s of 100% TLS traffic from an AMD Rome :)

~~~
AaronFriel
I think I see a comment to this effect on every post about AMD, that their
software ecosystem isn't as advanced.

But, isn't it a pretty narrow market that benefits from Intel's tooling here?

Startups running their entire business on JITed runtimes on top of multiple
layers of virtualization in cloud providers aren't using this. Enterprises
with enough resources to revise their entire stack to work on ARM (Cloudflare,
Microsoft, Google, Amazon) likely have the internal resources and skills
already to build the missing tooling and profile statistically using massive
fleets of servers.

How large is the target market of Intel purchasers that are large enough and
performance sensitive enough that it's profitable to micro-optimize assembly,
and yet not large enough that it's a rounding error to build similar tooling
for AMD x86-64 or ARM or otherwise?

~~~
drewg123
It's not a question of just building tooling. It is a question of AMD having
fewer performance counters. So even organizations like Google, with tons of
SWEs, can't do anything if there are no performance counters, or if the
performance counters are not exposed.

And we (Netflix Open Connect) kind of fit that middle ground that you're
talking about.

~~~
mda
Any examples of missing counters that are essential? I am curious because I
rarely see people care about things further than TLB, L1, L2 misses (which
exist on AMD).
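For what it's worth, the counters named above do map onto standard Linux
`perf` events on both vendors' cores; a hedged sketch (`./workload` is just a
placeholder for whatever you're profiling):

```shell
# TLB and cache-miss counts for one run of a workload; the kernel
# translates these generic events to vendor-specific raw counters
# on both Intel and AMD (Zen) CPUs.
perf stat -e dTLB-load-misses,L1-dcache-load-misses,LLC-load-misses ./workload
```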

------
chx
> But it will get very ugly if the market tightens and Intel can only lose
> both revenues and profits in a price war. There is no way to maintain at
> all. Period.

Yeah, adding "period" to a sentence like that totally makes up for the lack of
data backing it. AMD spent years in the red and didn't disappear. I looked it
up: Intel's cash on hand for the quarter ending December 31, 2019 was
$13.123B, a 12.64% increase year-over-year. That's about two quarters' profit.
Their debt-to-equity ratio is 0.33, which is on the lower end. They could
double their ~$29B debt if absolutely necessary and still be considered a good
investment. Those two together make for one gigantic war chest...

~~~
dirtydroog
I remember similar things being said about Nokia.

~~~
chx
Nokia found itself unable to compete in a new market -- the previous
generation of smartphones had grown up from the phone and the PDA, but the new
smartphones shrank down from computers, and that was fundamentally different.
Let's say the peak of Symbian Nokia smartphones was the E71/E72 -- the E72 and
the iPhone 3GS were released practically the same day. By that time some
100,000 apps were available for the iPhone (and a bit more than 10,000 for
Symbian, but that was a fractured ecosystem, another fatal flaw), and
cumulatively they had been downloaded a billion times. But even more
importantly, Symbian started as EPOC, the OS of the Psion PDAs, back in the
eighties, and it simply was unable to function well in the new world -- you
can patch a preemptive system only so many times. Android and iOS were built
on bona fide Unix. No such thing hampers Intel.

There's no new market here, just intensified competition, and Intel has 7nm in
the pipeline for late 2021/early 2022; that's how long they need to survive on
the venerable 14nm and the totally broken 10nm process.

------
nominated1
My biggest concern when considering AMD on Linux is drivers. I see in my feeds
that Phoronix is constantly reporting on AMD updates landing in recent
kernels. However, most of the equivalent features (power management, etc.)
have been available for years on Intel.

How are people finding AMD hardware in the real world wrt Linux on say 4.19 or
5.4 kernels?

~~~
jchw
I run Ryzen on Linux 5.5 on my desktop workstation at home. Admittedly I am
always running the latest kernel as long as I reboot often enough, but it has
been good. I also use an AMD GPU, and that has been very nice. Intel probably
still has an edge on supporting the Linux graphics stack but not by much, and
AMD GPUs have a lot more muscle, so it’s not a bad trade off imo. Anything to
avoid NVIDIA really.

(To those who love NVIDIA on Linux: I get it. However, Linux needs to move on
from X11 and the old, broken, fragmented graphics driver “model.” NVIDIA will
always be a second class citizen on Wayland until they change their tune
regarding open source. But hey, maybe they’re OK with losing Linux and Mac
users, and if that’s their position I’m perfectly happy losing them.)

~~~
gowld
Do AMD GPUs work well with ML drivers (CUDA and competitors)? Are higher-level
drivers like PyTorch and TensorFlow compatible (and optimized, or else no
point buying an expensive GPU to get only half-performance) with AMD backends?

~~~
lostmsu
TensorFlow works with AMD's CUDA alternative, ROCm. You'd need to check the
installation instructions though.
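For instance, the ROCm build of TensorFlow ships as a separate package on
PyPI; a rough sketch, assuming a supported AMD GPU and a matching ROCm stack
are already installed:

```shell
# Install the ROCm-enabled TensorFlow build and check that the GPU
# is visible; prints an empty list if ROCm can't see the device.
pip install tensorflow-rocm
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```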

~~~
yangikan
How about Pytorch?

~~~
jamesblonde
You have to compile it yourself, which sucks. For some reason the PyTorch
project hasn't agreed to upstream the ROCm support yet.

~~~
bgorman
You don't need to compile it; AMD provides Docker images.

For example, I was able to run fast.ai/PyTorch on my AMD GPU:
[https://github.com/briangorman/fastai_rocm_docker](https://github.com/briangorman/fastai_rocm_docker)
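AMD also publishes generic ROCm PyTorch images on Docker Hub
(`rocm/pytorch`); a typical invocation looks something like this, with the
device flags exposing the GPU to the container:

```shell
# /dev/kfd is the ROCm compute interface and /dev/dri the GPU render
# nodes; adding the container to the video group grants GPU access.
docker run -it --device=/dev/kfd --device=/dev/dri \
  --group-add video rocm/pytorch
```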

------
bluedino
I wonder how many sales they miss out on because you can't mix AMD and Intel
in VMware clusters.

~~~
erik_seaberg
Is this because runtimes optimize for the CPU stepping during startup? How
many people actually rely on moving workloads around as live processes without
restarting? I thought the industry had settled on "cattle, not pets" and
letting instances appear and die as long as a quorum is maintained.

~~~
AdamJacobMuller
> cattle, not pets

People running VMWare don't even consider that.

Most people have that as a goal which they aren't achieving (and that's fine).

I would doubt there's any company in existence actually attaining that goal.

~~~
erik_seaberg
I admit I haven't had occasion to use VMware for a decade. Different world, I
guess.

~~~
AdamJacobMuller
I don't use it either, but, it's The Thing in some IT departments.

It does offer some amazing capabilities for making old legacy applications
that were never designed to have any semblance of failover or redundancy or
disaster recovery operate in impressively resilient ways. You also pay for the
privilege.

------
walrus01
There was a brief period in the early 2000s when AMD had a clear performance
lead in 32-bit and x86-64 datacenter stuff.

This was at the time of the first generation Opteron CPUs (single core), which
could be configured in a dual socket motherboard in a 1U server. They had a
good price/performance advantage over Intel, which was also at the time a
single core per socket.

Around 2006 or so Intel started pulling ahead and AMD never really caught up.

With the development of parts based on the Zen/Zen2 cores, now may finally be
the time...

~~~
thedance
Your timeline is interesting to me, because it suggests you think Intel
started to pull ahead with Clovertown in 2006 (quad-core MCM with a
northbridge memory controller) rather than with Nehalem or Westmere
(integrated memory controllers) in 2009-10.

~~~
kllrnohj
Conroe, the Core 2 Duo E6300/E6400 era in 2006, is where Intel definitively
took the crown away from AMD. And AMD never got close to it again until
Zen/Zen2.

Intel's much better prefetcher let them get away without having an integrated
memory controller.
[https://www.anandtech.com/show/2045/5](https://www.anandtech.com/show/2045/5)

~~~
thedance
Clovertown was basically two Core 2 Duos in a multi-chip module, and capable
of multi-socket system configs. The AMD fans at the time snickered at the MCM,
for being a hack and a workaround, not a "true" quad core part. AMD didn't
launch the "true" quad core server part until a year after Clovertown. That
was the "Barcelona" Opteron 23xx, and it was terrible, and broken. To me that
was when AMD faded.

------
craigkilgo
100%?

------
chrisseaton
Rightful? You don't have any _right_ to people choosing your product.

~~~
kube-system
That is not the way the word is being used in this context.

Here, it simply means 'fitting'. It is a statement of opinion, not legal fact.

i.e. "the sports champion finally claimed their rightful place in the hall of
fame."

------
RKearney
They need to work on making faster cores instead of just throwing a ton of
cores at the problem. In the datacenter, we commonly have to pay per-core
license costs for certain vendor software. I'm not going to pay seven figures
to grow our Oracle license, for example, by switching to AMD when we can get
Intel CPUs with fewer, faster cores.

~~~
rmnoon
Out of curiosity have you read benchmarks recently on Rome? Per-core
performance is quite competitive now (unlike in the recent past).

~~~
RKearney
No but I'll certainly pass the word along. I handle the network and wasn't
directly responsible for evaluating AMD. I just heard the reasons why it
didn't work and saw the not-so-impressive benchmarks when looking at single
core performance. This was 2-3 years ago.

~~~
philjohn
Power is also a big cost in a data center, and Epyc also smokes Xeon on
perf-per-watt.

