AMD lands Google, Twitter as customers with newest server chip (reuters.com)
837 points by jonbaer on Aug 8, 2019 | hide | past | favorite | 255 comments

It is worth pointing out the David versus two Goliaths situation here.

AMD: 10K employees and a $1 billion annual R&D budget.

Intel: 100K employees (including 10K just in software) and a $20 billion R&D budget.

Nvidia: 11K employees and a $2 billion R&D budget.

AMD's CPUs now surpass Intel's, and their GPUs are competitive with Nvidia's except at the high end.

AMD's software ecosystem is pretty bad. They fired many of their Linux platform devs in ~2013 and never recovered. There is nothing like Intel's VTune, and even basic functions like performance counters, MCA support, ECC events etc. are janky on AMD CPUs. Intel does a pretty good job all around there. Nvidia is leagues ahead on GPGPU with CUDA and general Linux driver stability. Sure, it's a blob, but if it works well in a commercial setting nobody cares.

Rome is a nice chip nonetheless. I hope they can rehire a big enough Linux team to get over these humps; it isn't that much work on the CPU/platform side.

I work on game performance (or rather, I work on the underlying engines with an eye on performance) and that is the biggest reason why my computer has an Intel CPU. I would love the killer performance of a Zen 2 machine; the compile-time improvement alone would make it worth every cent. Unfortunately I use VTune a lot for work and there just isn’t any alternative, so I’m stuck with whatever Intel gives me. Which is doubly sad, because I end up optimizing for Intel’s microarchitecture and can only hope that it runs decently enough on AMD’s CPUs.

On the GPU side though, at least with the modern APIs (Vulkan, D3D12 and Metal), AMD isn’t too far behind Nvidia. I actually prefer RenderDoc to Nvidia's Nsight because I can capture multiple frames instead of pausing the application to look at one frame at a time. That being said, the OpenGL tooling for AMD is abysmal, and so is their OpenGL driver. All things being equal, just swapping an Nvidia card for an AMD card gives around a 40-50% speed-up when running OpenGL, purely by getting rid of the driver overhead. With Vulkan, though, AMD and Nvidia are actually on par, so at least that’s solving itself for us.

A whole lot of gamers, me included, have been buying up Ryzen 3000 series CPUs, so you might need a second computer with one in there.

Good excuse to build another machine!

I feel like this is a safe assumption: you do most of your performance work on Windows? Since you focus on OpenGL at all, I have to assume that you at least spend some of your time developing on Linux.

Have you tried the AMD mesa drivers? The open source driver performance doesn't look terrible, and in my own experience, it tends to work far better for OpenGL than AMD's proprietary GPU drivers.


Our game actually ships on Windows, macOS and Linux (as well as the two dominant mobile OSs), so I dabble in all of those platforms. The Linux OpenGL performance is indeed better than on Windows, but unfortunately it makes up only a super small fraction of our userbase.

I should note that the OpenGL stack on macOS isn't better than the Windows one either. Probably not surprising anybody, given the lackluster support that Apple has for it. We are seeing similar improvements with Metal as we do for Vulkan.

Probably Windows, because on Linux you have the AMD Mesa driver, plus perf, valgrind/callgrind, and kcachegrind, which are all pretty good. For OpenGL there are RenderDoc and apitrace. Unfortunately, the rr reverse debugger indeed does not work with Ryzen because the performance counters are too fuzzy, as mentioned in another comment.

I have been working with performance tools on Linux for over 10 years and occasionally debugged graphics issues.

kcachegrind is okay, but VTune is a lot better.

Yes, Mesa/Gallium is much faster than AMDGPU-PRO and Windows driver junk.

The Windows OpenGL driver is so bad that running it through ANGLE with the DX backend is faster.

I also work in games and agree with the sentiment on VTune. I've also enjoyed using Razor, but that is limited to specific hardware as well.

I built a Threadripper 2 machine recently for home, but I don't know that it would have similar utility in a work environment where I could use something like Incredibuild to farm out compiles. Is that not an option for you in your work?

This post doesn't make much sense to me, since both the PS4 and XB1 are based on AMD APUs.

Mobile is also not Intel.

The only case left is the Wintel PCs - and even that just makes Intel look bad, since AMD CPUs run the games just fine. For the actual gaming experience it basically amounts to no difference at all.

On the CPU tooling side of things, I was planning to buy into another Intel CPU for my workstation for the same reasons- until I ran into some 50%+ performance cliffs in Zen 1. (I suspect Zen 2 addresses some of these.)

I'll still need to keep some Intels around for VTune and optimizing deeply for AMD is going to be a chore... But given how the market is moving, I don't think I can avoid making AMD the primary development focus.

Are those performance cliffs about AVX or about very branchy code?

Ryzen before Zen 2(!) has a much narrower AVX unit, which could easily cause a 50% slowdown.

Branchy code like that of compilers tends to run somewhat slower on Ryzen, though closer to 15% than 50% (comparing AMD and Intel CPUs with otherwise same performance in mixed benchmarks). The latter really depends. Core and Ryzen have different branch predictors, both so complicated that they can't be fully described in documentation anymore. Ryzen 2's first level cache configuration also moved closer to Core's, presumably for software that has been fine-tuned for Core.

While the lack of full-rate 256-wide operations was quite noticeable, I was expecting it. The cliff that caught me by surprise was something else: certain kinds of inter-CCX communication seem to be catastrophic for performance. The rest of this is partially speculation, since I haven't had time to dig deeply into the problem.

Specifically, in both a physics engine solver and some machine learning backwards passes I was working on, there are areas where a little false sharing is likely. It's relatively uncommon and a negligible concern on the smaller Intel chips I tested on, but Zen cores will sometimes have to synchronize with another CCX's caches and it seems to come with a penalty vastly larger than naively expected latency or IF bandwidth.

A fully loaded 2950X ended up getting beaten by a 1700X in the solver, despite having virtually the same architecture, twice the memory bandwidth, and an observed clock-speed advantage. Cutting the used thread count down to the same as the 1700X helped a little, but it looked like Windows was scheduling the threads across as many CCXs as possible, and it ended up still being slower.

On the upside, the 2950x blasts through friendlier workloads like collision detection and inference with perfectly reasonable scaling.

I'm hoping that the dramatic redesign in Zen 2's memory architecture (unified IO die and whatnot) will help things a little. If not, I'll probably have to rework some stuff.

Thanks for the reply. I didn't think of the split caches. They didn't make such a big difference in my own workloads or the reviews I've read.

Wait for Google's game division to do something about it. They have their AMD-powered online gaming service coming up, so they must do something about it, and hopefully they will open-source the result.

AMD has uProf [1], which while maybe not at full parity, definitely does performance counters (along with energy and power profiling, call graphs, and the other basic things you'd expect).

I use ECC on a (consumer) Ryzen chip/board and edac-util seems to give me the same information that it does on Intel - what's missing?

On the CPU/platform side of things, my biggest annoyance is how far behind k10temp is (Zen 2 support not in mainline until 5.4?), and how bad sensor support is in general on the boards (requiring reverse-engineered non-mainline modules for my Zen/Zen 2 workstations).

While I agree that on the GPGPU front Nvidia is still ahead, I'm very happy these days with the state of AMDGPU in the mainline kernels and much prefer AMD GPUs for my workstations now vs Nvidia cards. That has led me to pay a bit of attention to ROCm - it looks like they are making very steady progress [2], and TF and PyTorch support seems pretty good at this point (also, stuff like MIVisionX/OpenVX, CenterNet, and BERT support all seem to be working relatively painlessly [3]). It'd be useful if anyone has a resource that does continual benchmarking comparing on-prem/cloud perf of the various platforms; it'd be nice to get good $/perf and W/perf numbers over time. (I assume that anything running on Nvidia's tensor cores still completely blows away what AMD has to offer atm.)

[1] https://developer.amd.com/amd-uprof/

[2] https://github.com/RadeonOpenCompute/ROCm/commits/master/REA...

[3] http://blog.gpueater.com/en/

uProf is decent, but the instruction-based sampling approach is the only way to get pinpoint accuracy on instructions.

> I use ECC on a (consumer) Ryzen chip/board and edac-util seems to give me the same information that it does on Intel - what's missing?

Event-based sampling on Intel is accurate to an instruction-level (while event-based sampling on AMD is less accurate. You're forced to use the more complicated IBS metrics if you want instruction-level accuracy of events).

Intel also has branch-history data stored. Super useful for some developer tools, but I forget which tools those were...


I think AMD uProf is certainly usable. And the price is good (free). But Intel VTune is just light-years ahead.

AMD's stack vs CUDA, on the other hand, is... closer than I think most people realize. CUDA has a bunch of libraries (Thrust, TensorFlow support, etc.), which helps. But if you're doing high-performance coding, you'll likely have to write your own specialized data structures anyway. At least, that's the approach I'm taking with some GPU hobby code I'm writing.

TensorFlow (due to tensor cores) and BLAS are solidly Nvidia advantages. But general-purpose libraries (ex: Thrust) are more of a convenience.

AMD's main disadvantage is documentation. But the tools are actually quite usable. AMD documents the lowest level well (the ISA), but their HIP / HCC / etc. etc. documents are lacking and difficult for beginners to follow.

AMD should work on updating their beginner guides (their OpenCL guides) for their ROCm framework. Even if it's ROCm OpenCL 2.0 stuff, it's important to get beginners onto their platform. Or at least update their beginner guides to reference GPUs that have come out within the past 5 years...

> AMD software ecosystem is pretty bad.

Sadly also the third-party hardware ecosystem is (at the moment) pretty bad.

For example, you can choose among almost 300 different motherboards with an Intel 1151v2 socket [1]. They come in all possible sorts and combinations of form factor, chipsets, ports, etc. In comparison, there are fewer than 100 motherboards with an AMD AM4 socket [2], most of which are in the large ATX form factor and have only basic I/O ports.

Let's hope that all these third-party companies have felt the tide change and are hard at work on new AMD-based products.

[1] https://geizhals.de/?cat=mbp4_1151v2 [2] https://geizhals.de/?cat=mbam4

What's the context for them firing off their Linux platform devs in ~2013?

I would say falling revenue would be the biggest driver here, in 2010 AMD had annual revenue of 6.49 Billion USD, by 2013 this number had fallen to 5.29 Billion. It continued to slide to a low of 3.99 Billion in 2015.

[0] - https://www.macrotrends.net/stocks/charts/AMD/amd/revenue

All true, though I'm optimistic about what SYCL can do for better ecosystem interop on the GPGPU side.

NVidia has no incentive to support SYCL properly, maybe they'll support it half-assedly like they do with OpenCL.

Well, the current implementations are on top of existing APIs, e.g. https://github.com/illuhad/hipSYCL so maybe everyone just keeps using these.

As an aside: why won't everyone migrate to Vulkan compute shaders? I hate these "special" compute stacks >_< Clearly I'm not alone in thinking this: Tencent's ncnn uses Vulkan as the only GPU option, some Googler is working on clspv (OpenCL to Vulkan SPIR-V compiler)..

> why won't everyone migrate to Vulkan compute shaders?

1. No libraries. cudnn, cublas, cufft, are all the fastest available (except maybe magma sometimes), plus no one writing an actual application wants to reinvent a fast gemm. Also cutlass, cub, thrust, ...

2. No C++. The "standard" seems to be GLSL, and a "prototype" OpenCL C -> SPIR-V compiler doesn't give me much confidence in that approach.

3. No one wants to use the Vulkan APIs directly, and there are approximately 5 billion different "Vulkan compute" wrapper utility libraries - i.e., no consistent platform.

Right, Khronos intends for SYCL on GPUs to be backed by OpenCL.

Vulkan compute and OpenCL are not entirely compatible (both are backed by SPIR-V, but different flavors of it). Khronos has chosen to maintain OpenCL (and OpenCL-next) separately from Vulkan.

How about CodeXL?

Intel is definitely bigger than AMD, but it's not exactly an apples to apples comparison. I'd hazard a guess that quite a few of Intel's employees and R&D dollars are in areas that AMD isn't competing in (e.g. FPGAs, SSDs, IOT, and until recently modems)

Intel also does their own manufacturing and AMD does not.

I don't particularly like Intel, but that is hardly a fair comparison.

TSMC 48K employees, and $13 - $15B R&D.

And even that excludes all the ecosystem and tooling companies around TSMC, compared to Intel, which does it all by themselves. Not to mention Intel does way more than just CPUs and GPUs: also memory, networking, mobile, 5G, WiFi, FPGAs, storage controllers, etc.

Intel sold off the 5G/Modem business to Apple recently:


Still surprises me the new Mac Pro isn’t AMD.

I guess Intel's current handcuffs on customers are them owning Thunderbolt.

Thunderbolt is royalty-free; does AMD have any implementations available?


Intel did announce in 2017 that they planned to release Thunderbolt royalty-free in the future; however, they still have not done so as of yet. And given their new competition from AMD, I doubt they plan to anytime soon.

It seems like it's not so binary and it's more like kinda royalty free: https://arstechnica.com/gadgets/2019/03/thunderbolt-3-become...

They gave it away to the USB forum, but maybe they won't for anything AMD wants to do?

It's called "USB 4" now. And yeah, a few years away at best.

USB4 Final Spec 1.0 should be ready any time now. It was originally scheduled for mid-2019, but you can definitely expect the draft before year end and products ready by next year.

There is at least one motherboard that I know of with Thunderbolt on it for AMD processors, the X570 Taichi. It does have some limitations though.

What are the limitations you're on about? I'm in the market for Ryzen 3900X + X570 + TB3.

Wake issues with some devices, and external GPU solutions do not work.

Will external GPUs never work? As in, physically impossible? That'd be my main purpose.

Physically impossible, no. External GPUs definitely work right now with Intel processors, but even they have limitations, related to usability more so than technical issues. I have one now with a 1080 Ti off of an Intel NUC, and not being able to see my BIOS without switching to the iGPU can be cumbersome when troubleshooting. Once in a while when I reboot (this is very frequent actually, because my eGPU PSU's power limiter cuts out if I forget to throttle the power usage), the GPU doesn't come back unless I hard-cycle a couple times.

The annoyances are such that I'm waiting for a 3900X to be available in-store so I can build my full workstation finally after sitting on a NUC + eGPU setup for over a year.

I don't want to buy Intel CPUs, nor Nvidia GPUs. Exception being an Nvidia Shield.

You'll be going for 3900X + X570 Taichi? Or another board? Ideally I'd also want ECC RAM.

I got a 3900X and the Asus X570 workstation board to make sure I'd be able to support ECC RAM within basically a couple hours after I made that post. My needs are video encoding and virtualization setups needing lots of cores so this makes a lot more sense for me than any comparable Intel setup for my money. I'm slightly cheaping out by not going with ECC RAM but in the era of catering to overclocking gamers over professionals I'd rather pay more for a workstation board.

Does the X570 support ECC RAM?

ECC support is built into the IMC (on the CPU), so de facto every single Zen chip (except the non-PRO APUs) has support, but official support is still on a per-motherboard basis (and you'd probably want to check the memory QVL as well). If it's important for you (it was for me), it's best to check each board model's specs.

The ASRock X570 board mentioned definitely has support, as do the other high-end X570s I looked at, like the Aorus Master, Asus C8H, Pro WS, etc.

Based on X570 BIOS updates so far (and their overkill VRMs this gen), I think the Gigabyte/Aorus boards would be my pick atm. (I went w/ a C8H and I'm somewhat unhappy w/ the CPU/memory voltage wonkiness when tweaking, but it does give me 30 IOMMU groups, so I'm able to do GPU passthrough via VFIO to a Windows VM w/o any issues.)

Thanks for the information. Does GPU passthrough to Windows allow high-end gaming from Linux (running a windows VM)? What specs would I need to look for in a motherboard to support that?

Yes, although it's not entirely straightforward and there are many sharp edges, so it's maybe not for the faint of heart. Since I just did it, here's a guide I wrote up with my specific hardware and some of the issues I've encountered to set up my system specifically for VR passthrough: https://forum.level1techs.com/t/zen2-ryzen-3700x-and-x570-as...

Note: I haven't dealt with sound yet, which is another potential sticking point if you're gaming (most people seem to use a separate sound device, although I've seen some people use network sound when running into issues; HDMI sound I assume would work fine if you're doing GPU passthrough), since I was just focused on getting my VR HMD working, which has its own sound output already.

IOMMU groups are defined by AGESA and the X570 boards in general seem to have much finer grained groupings vs earlier AMD chips/boards (although people seem to have done ACS workarounds). My recommendation is to find something that someone has gotten working already and just follow along (the VFIO subgroup at level1techs and r/VFIO on reddit seem to be the best resources).

Yes, that works. The motherboard doesn't need to do anything special so any should work, unless they did something silly like forget to put the IOV toggle in BIOS.


- Anti-cheat systems used in multiplayer games generally don't like running in VMs.
- Consumer Nvidia cards don't get reset on VM reboot; you need to reboot the machine.
- IIRC Nvidia cards need a KVM hack to get the driver to work.

Actually, it's AMD Vega cards that have a PCI reset issue that doesn't work well with VM cycling. There's a new kernel patch just the other week that fixes this: https://forum.level1techs.com/t/vega-10-and-12-reset-applica...

Nvidia cards do require a config workaround for VM detection but the workaround is relatively straightforward (and works w/o further mucking).

>Still surprises me the new Mac Pro isn’t AMD.

Yes, I was rather hoping for some surprise, like a 2-socket, 128-core max Mac Pro. I guess Apple has to keep Intel happy until they get their hands on Intel's modem unit.

However, don't Intel and Nvidia both manufacture their chips, whereas AMD designs them and licenses the designs? Which I assume at least partly accounts for the lower headcount.

AMD and Nvidia are both fabless: they design but do not fabricate their chips. AMD used to, but spun out its fabs as GlobalFoundries. Intel, as you noted, does manufacture its own chips.

No- nvidia uses TSMC as far as I can tell.

They are switching to Samsung's foundry soon.

AMD's Linux graphics software is really inadequate compared to Nvidia's.

Nvidia's Linux drivers are buggy and awful and regularly break systems. My bf switched to an AMD card, and the experience has been much better. It all just works, since the drivers are mainlined.

This choice is likely to backfire on Google and Twitter. Do you recall the major concern about speculative execution vulnerabilities about 2 years ago? Windows was able to release a patch for systems using Intel chips in about a day; AMD chips never got fully patched at the time because AMD doesn't properly document their chipset driver APIs.

The honest truth is that AMD is unreliable and their performance specs are only higher currently because they haven't fixed serious security vulnerabilities that would result in reduced multi-processing efficiency.

I hope this puts AMD on stable footing for many years to come. Having newer and faster chips is great, but what's even better is long-term competition in this space.

"...even better is long term competition in this space."

I hope that's the biggest outcome from this. My inside sources tell me that Intel has their briefs in a bind over AMD's latest technology, and are certainly kerfuffling over it, but I'm not sure how quickly they can respond. It seems like the disparity in this cycle has grown wider than in the past. But that is just my subjective take - I'm not really conversant with the manufacturing part; I'm just reviewing the historical power consumption and benchmarks.

Especially as the top dual socket Epyc BEATS the top dual socket Xeon in a workload heavily optimised for AVX-512 and using the Intel ICC compiler.

True, it has more cores, but it seems Xeons have to clock down when executing AVX-512 ops to stay within their power budget.

It's also doing this at almost half the cost and consuming less power ... that's nothing short of astounding.

Xeons do have a benefit for workloads that can stay in L3 cache, as their mesh means latency is stable, whereas Epyc has 16MB of L3 per CCX.

Overall memory latency is also lower on Xeon - that's the tradeoff Epyc makes with the IO die - but it will be interesting to see what real-world numbers for DBs etc. say about that.

I don't have exact performance numbers handy and haven't measured it personally but... from what I understand:

* With heavy AVX instruction churning, AVX should still be a net positive performance-wise, even when clocking down.

* It's when you're running AVX mixed with non-vector code that you can see the performance effects of clocking down, since you can drop way down from Turbo on non-vector instructions. Here are the different ranges [1]

[1] - Pages 14-21 https://www.intel.com/content/dam/www/public/us/en/documents...

From the AnandTech benchmark it was made clear that there is more than AVX-512 vs. clock speed going on. Apparently AMD deemed implementing AVX-512 less desirable than spending the same silicon area on more cores.

So there's both AVX-512 vs. clockspeed & AVX-512 vs. silicon budget going on. For many usecases having extra cores is a good trade-off.

There’s a hit that comes from “frequency scaling” on AVX-512 and AVX2 instructions on Intel (worse for 512), so a total of fewer instructions isn’t always worth it. IIRC, AMD doesn’t pay a cost for AVX2, but I don’t know how it works.

They do pay a cost, in the sense of having to clock lower if they run a lot of AVX code, but they advertise their AVX2 clock as their base clock, and more importantly, they can change clocks at a much finer granularity and only have to change after a latency period.

The problem with the AVX2/AVX-512 clocks on Intel is not that they must clock lower to use them (for pure AVX code, running wider at the lower clock speed is still worth it!); it's that they need to clock down pre-emptively for any such instructions, and must remain at the lower clock for a while. This means that code that executes a few AVX instructions now and then, mixed in with a lot of integer code, has to run the whole program at lower clocks.

In contrast, AMD runs the chip's power supply from a huge MIM cap (metal-insulator-metal capacitor) built into the chip, meaning they have enough margin to start executing exceptionally power-hungry instructions and only clock down reactively if it's actually needed. They also do the clocking down and up at a much finer granularity, clocking back up immediately after the need to clock down passes.

Ryzen also uses clock stretching where single(?) clock cycles can be lengthened on demand (power transients), which allows running stable at otherwise marginal clock speeds. I suspect that that is part of the reason why Ryzens have so little overclocking headroom - there are tiny safety margins in their clock frequency.

The rest of you folks make me feel dumb - but nice posts, I'm doing a lot of Googling now...

> but I'm not sure how quickly they can respond

AMD won't be in the position to just rest on their laurels for years to come. So the fact that Intel is scrambling to come up with a competitive response shouldn't be a concern as far as competition goes in the short-medium term.

I'd say we have at least a few CPU generations coming from AMD without worrying that they become complacent, even if Intel still comes up short.

Opteron was crushing it back in the day. I think tech is just cyclical. It seemed like AMD was in a lull for a while, which basically meant Intel didn't really have to compete. Hopefully this forces Intel to step its game up. Competition is always good for the consumer.

Back during the Opteron era, their chips had good compute but higher heat output and power consumption. I used to run dual Athlon XPs on their older dual-MP boards back in university. It's crazy we now have 8-16 cores on a single board, on consumer (or I guess prosumer/developer) targeted hardware.

In 2016, AMD was near bankruptcy, after many years of mediocre tech and bad decisions. At that time it looked like they were never going to compete with Intel again in any meaningful way, didn't have the resources to pull ahead, had a negative balance sheet, and the best hope was that they could survive on some cheaper x86 niches, on the grace of people wanting some x86 competition to exist.

Would it have been a lull if all of Intel's flaws and security issues had been known at the time? Those go back many years but weren't discovered until fairly recently, and AMD didn't have most of them.

Not all of them were known back then, obviously, but there were definitely known timing-info-leak flaws with Pentium 4 hyper-threading [1]; it's remarkable on that basis that the modern class of execution-timing exploits has taken so long to arise, considering their approach and concept really isn't fundamentally any different.

[1] http://www.daemonology.net/papers/htt.pdf by none other than our resident cpercival!

Was AMD in a lull or just making longterm bets on technology?

I've only followed casually in the last decade+ but I was under the impression that the ATI merger was meant to support a long term bet on better chips by AMD and we're starting to see the fruit of that labor.

They hired Jim Keller to design the Zen architecture around 2012, and he left in 2015. Many people point to him for their current success. He joined Intel in 2018.

AFAIK Keller mostly worked on K12, the so-far-unreleased ARM CPU. Mike Clark was the chief architect for Zen.

Is AMD getting ahead of Intel with a superior architecture, or due to the fact that it's running on 7nm instead of Intel's 14nm?

Part chip design, part 7nm being unexpectedly good.

But the chiplet architecture, supported by the fast Infinity Fabric interconnect, is what puts AMD ahead. Intel will have to go the same route to stay competitive. And right now they are two steps behind.

Chiplets were for sure an amazing bet AMD made that’s paid off big time.

The smaller dies and modular setup even on the same die gives them so much flexibility in their binning and package integration and must keep defect rates much lower than Intel’s monolithic high core count chips.

And the 14nm I/O die means they can still keep buying chips from GloFo, whom they still have a contract with as part of the spinoff from a decade ago.

> But the chiplet architecture, supported by fast Infinity Fabric interconnect is what puts AMD ahead.

Exactly this. Funny Intel is no longer mocking this as dies "glued together".

Intel has some very impressive chiplet technologies in house, e.g. EMIB. It's surprising they haven't used it for anything but that weird 8809G yet, but I'm sure it'll be used in more products next year.

As a layperson, I never understood that merger. How does buying a graphics card company make for better chips in general?

Some advantages:

* Integrated graphics in their CPUs is important for lower-end systems and APU/SoCs. Intel had an integrated GPU at the time, AMD didn't.

* It allowed them to provide the CPU+GPU for all Xbox and Playstation consoles since the acquisition

* It might have allowed them better deals with chip factories because their volume increased

It was a bad bet on netbooks that didn't fully anticipate the dominance of smartphones.

Their low-end Ryzen APUs, with their IGPs, can game better than low-end Intel chips.

At the time it seemed that GPUs were becoming developer friendly and productivity might approach CPU languages, but then it all plateaued and GPU languages & OS integration stayed clunky. And fragmented.

Also, most computers come with integrated graphics today. It's the no brainer choice for most uses.

Integrating graphics card into their chips is what Intel seems to want to do as well.

So it would appear AMD was ahead of the game.

I guess, at the time, the thinking was that GPUs were a core threat to the CPU business itself, since a lot of the buzz was around using the GPU to do CPU activities.

Yes, there was this thing called HSA (Heterogeneous System Architecture). Does anyone remember that now?

The PS4 is an example of that. It's an APU with GDDR5 acting as both system and video memory.

To be honest for a while I thought that was where everything but servers would be going. You can tune your OS and applications around the slower GDDR5 and it seemed a no brainer: instead of 8GB of system memory topping out and then 4GB of VRAM doing jack shit you could have 12GB that could be dynamically assigned to whatever you wanted!

Alas, that dream was never meant to be.

Edit: I posted about this a while a back and apparently Arch Linux has a hacky way to use your VRAM as system RAM. Good stuff.

Zen in many ways is still moving towards that model: an IO die that all cores hook into, along with all the PCIe lanes, with no need for complicated cross-core linking. It would be relatively "easy" to introduce specialized accelerator cores or beefy GPU chips into this mix on a separate chip, without massively complicating the CPU design or bloating its cost as a specialist die.

Yes, parts of it are now present in IOMMU (v2).

Intel has such a poor track record in recent years of expanding into new markets and hasn’t had to compete in their core market for so long that it’s questionable if they can.

we just got done saying that about AMD, so..

> but whats even better is long term competition in this space.

One thing missing from discussions of the Zen 2 EPYC results is that ARM on server is now pretty much dead. I don't think any hyperscaler really wants ARM servers per se; they simply want better pricing. And having AMD competing with Intel will provide just that.

I don't see ARM Server being a viable alternative in the next 5 years.

Looks like Intel's 10nm is still failing spectacularly, so I can totally see the 7nm Marvell ThunderX3 being EPYC Rome's real competition :P

Ampere has a chance too, if they get big gains from 7nm and/or massively improve their microarchitecture while keeping the current prices, they'll be unbeatable in price/performance.

> I don't think any HyperScaler really want ARM on server per se

There definitely seems to be a concern with the AMD/Intel duopoly, and Amazon clearly just wants absolute control, since they've already deployed in-house chips that mix ARM cores with <s>Bezos Backdoors</s> fancy cloudy networking and verified boot stuff.

5/7 use AMD now (not Facebook and Alibaba).

Seems AMD is back

While Rome is very impressive, I wouldn't count ARM out of the race just yet.

The historical context here is that AMD once had a monopoly inside Google's datacenters and pissed it away by shipping the horribly broken Barcelona followed by the not very broken, but also not very fast, Istanbul. This is a return to form for them, after a decade of poor form. The important thing for an operator like Google or Amazon is pricing power. As long as they can brandish a competing platform under the nose of Intel sales reps, they can get a better deal from Intel regardless of which they really prefer. You may have noted that a few years ago Google was showing a POWER platform at trade shows. That has the same purpose of putting Intel (and AMD) on notice that they have the capability to port their whole world to a different architecture if needed.

Sorry it is just not true that Google was AMD exclusive. During that time Google had a I/A release cycle where every other hardware platform cycled between Intel and AMD. I will give you that AMD had a huge issue launching and I still have my "Argo" bag that I got for helping with the hardware qualification trials, but Google was no way AMD exclusive before that.

You're right. There were also not a lot of AMD boards at first, to the point that services had to prove "Opteron worthiness". Unless your load tests showed improvements of 70% or faster, presumably what the most important products were seeing, you couldn't run on them. Those were the days. The next generation Intel systems weren't a great improvement, but they still ended up being built in large numbers.

> Google had a I/A release cycle where every other hardware platform cycled between Intel and AMD.

Was the I/A cycle intended just to keep Intel and AMD on their toes, knowing that Google's data centers and software seamlessly supported both platforms? Was there any technical benefit, other than ensuring Google's software was portable?

There was a delivery benefit. At the time we were one of the largest server manufactures by volume. Generally we were on the same scale as Dell or HP. Buying from only a single vendor had serious cost implications sure, but there was also a pure volume issues. If Intel can't get us 50k of a given chip fast enough we can fill capacity needs with an AMD platform instead. That was the goal at least.

AMD is not just competitive, it is better than Intel. Thus Google should adopt it and roll it out faster than any other cloud provider. This will win them customers. I want 256 threads per machine at competitive prices.

AWS has already rolled out EPYC instances.


Those are the previous generation Zen 1 EPYC CPUs, which were rolled out on AWS back in November.

Don't forget power consumption. Electricity costs are probably just as big a factor as performance when it comes to the number of computers Google has.

IBM quoted them as willing to switch to POWER if they could save 10% in energy costs.

AWS has AMD machines. Amazon claims they perform worse than the Intel machines they use, and they are priced lower as a result.

T3 are Intel, newer than the T2, and perform worse than the T2. T3a are AMD and perform on par/slightly better than T3 for less cost. (From my own testing; not a claim I can back up, this is just my observation.)

I thought they were cheaper due to the better energy efficiency. Less electricity means less cooling required, double whammy!

Where did you hear that? We're running a handful of m5a instances with fantastic performance. I figured they were priced lower because they're cheaper to purchase and operate.

Depends on the use case. Intel is still top for many games and some apps like Excel/Photoshop.


Very few developers are prepared to write code that can efficiently use 256 threads / machine. At that level, cache coherency becomes a real and non-trivial problem.

In most cases, I suspect developers will see improved wall-clock times with substantively worse FLOPS/watt. Good for developers, bad for data-centers.

«Very few developers are prepared to write code that can efficiently use 256 threads / machine»

This junk justification hasn't been relevant for years. Most developers don't care because (1) they rely on core applications that are already multi-threaded (web servers, SQL engines, transcoding, etc), or (2) in today's age of containers, VMs, etc, it doesn't matter to them. Now we scale by adding more containers and VMs per physical machine. Bottom line, data centers always need more cores/threads per machine.

Correct: if you partition a 256-core machine into 32 virtual 8-core machines aligned to the NUMA architecture, you are relatively unaffected by core count (minus the consequence of some scheduling algorithms not being tuned for N > 8).

Unsure what percentage of VMs use no time sharing or oversubscription, though.
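As a rough sketch of what that NUMA-aligned carving-up could look like (the core counts and node sizes below are illustrative assumptions, not any specific Epyc topology, and real node layouts are rarely this tidy):

```python
def numa_slices(total_cores=256, cores_per_vm=8, cores_per_node=16):
    """Group cores into VM-sized slices that never straddle a NUMA
    node boundary. Assumes nodes are contiguous ranges of core IDs,
    which is a simplification of real topologies."""
    assert cores_per_node % cores_per_vm == 0, "slices must fit within a node"
    slices = []
    for node_start in range(0, total_cores, cores_per_node):
        for vm_start in range(node_start, node_start + cores_per_node, cores_per_vm):
            slices.append(list(range(vm_start, vm_start + cores_per_vm)))
    return slices

# 256 cores -> 32 slices of 8 cores, each confined to a single node,
# so each virtual machine sees uniform memory latency.
vms = numa_slices()
```

Each slice could then be handed to a hypervisor as the CPU set for one guest, which is what keeps the guest oblivious to the host's total core count.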

Most devs I know are creating async workloads which don't require cache coherency, as they use parallelism to parallel process separate requests and workloads. I can see things being pretty linear in that sort of space.

They are not linear unless all requests take an identical amount of time OR the system is not oversubscribed (common in many workloads) - and even then, the current linux CFS scheduler has a complexity of `O(log N)`.

When you have variable length requests, you will find cores will not always be balanced, it is simply a statistical reality. And in those cases, the kernel will have to migrate your process to a different core, and if you have 256 cores, that core might be really far away.
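A quick simulation illustrates that statistical imbalance (the service-time distribution and scheduler model here are simplified assumptions, not the actual Linux CFS):

```python
import random

def simulate_least_loaded(num_cores, num_requests, seed=0):
    """Assign requests with variable service times to the currently
    least-loaded core, then report max load relative to the mean."""
    rng = random.Random(seed)
    load = [0.0] * num_cores
    for _ in range(num_requests):
        # Exponentially distributed service times: many fast
        # requests, occasional slow ones.
        load[load.index(min(load))] += rng.expovariate(1.0)
    return max(load) / (sum(load) / num_cores)

def simulate_random(num_cores, num_requests, seed=0):
    """Same workload, but requests land on a random core, as when
    placement ignores current load."""
    rng = random.Random(seed)
    load = [0.0] * num_cores
    for _ in range(num_requests):
        load[rng.randrange(num_cores)] += rng.expovariate(1.0)
    return max(load) / (sum(load) / num_cores)
```

With load-blind placement the busiest of 256 cores ends up well above the average, and that gap is exactly when the kernel starts migrating tasks, possibly to a core that is far away.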

Except that they are typically not. The Zen architectures are NUMA and controlling where memory is allocated is key to decent threaded performance. You may even have to do seemingly counterintuitive things like duplicating central data structures across nodes and other tricks from the distributed systems playbook.

Epyc 2's memory layout is not like Epyc 1's. Epyc 2 is very simple.

Yup everything is equally slow now. Kinda sad, but the original NUMA design was treated as a glass half empty situation instead of AMD letting people maximize performance. This change lets them avoid the bad press and everyone is happier despite the final design being slower than it could have been.

Epyc 2 has different memory latencies within and across NUMA nodes according to the information I have. So it is not equally slow for all memory. Can you point me to a source that says otherwise?

Edit: my source is this German article: https://www.heise.de/newsticker/meldung/AMD-Server-CPUs-Epyc...

See the architecture diagram here: https://www.anandtech.com/show/14694/amd-rome-epyc-2nd-gen/2

Everything goes through the central crossbar on the I/O die, where Zen1 had memory attached directly to each CPU chiplet which would relay as necessary. On Zen1 if you accessed direct attached memory you wouldn't pay the latency penalty from relaying the data. In Zen2 all data is relayed via the I/O die with the associated delay that entails.

I did some more digging. It seems like the Linux NUMA topology shown in the anandtech article is a deliberate lie. There are different latencies between cores and memory controllers on the same socket, but these are deemed insignificant enough not to expose them in the reported NUMA topology.

That is true with Intel chips as well. In the HFT space people actively work with Intel to determine which cores they should pin tasks to.

The speed of light is constant, and some cores will always be a little closer to various resources.

That was true before Skylake, but is no longer true since they moved away from the multi ring architecture.

Even with the mesh the number of hops is variable based on which core is requesting and the physical geometry of the chip. The cores right beside the IMC will have the lowest latency. See this diagram: https://en.wikichip.org/wiki/intel/mesh_interconnect_archite...

The main improvement is that the maximum number of hops scales roughly with the square root of the core count instead of N/2.

Epyc 1 was NUMA within the socket while Epyc 2 is officially UMA within the socket (although not really). Unfortunately Epyc memory latency is much higher than Intel so it's fair to call it uniformly slow.

Yeah, I actually was not so happy with the benchmarks because the memory access latency is not all that good... for most of the workloads that I care about, I don't know that the Epyc will be faster than a Xeon.

I suspect cache coherency doesn't mean what you think it means. It's a hardware feature.

But yes, writing correct and performant highly parallel code is difficult & error prone, often prohibitively so.

Must be strange to work on a huge technical project knowing your work will likely never be used and is there primarily to put pressure on someone else to lower their prices.

I guess you get paid, and can think of it like a hobby project for your own technical chops. But still.

This is my favorite kind of work, because there's no chance you'll ever get called by an irate customer.

Been there, done that.

"Upper management want us to be able to offload burst capacity to AWS, MS, Google or other public provider, do what you can to make it work but I reckon in-house can beat them on pricing"

Six months later -

"Congrats, good work! We showed them, we're getting a new data centre!"

A lot of people love working on more esoteric technical things and get more satisfaction from the intellectual component than its direct utility (I certainly feel this way about some things, though trans-architectural portability is not one of them). I would imagine this type of person is better-represented in this sort of field.

In addition, this kind of work does indirectly help keep Intel competitors viable, which helps keep Intel in check for everyone. Stuff like that is pretty exciting in its own way.

It’s an interesting concept to me. I do have a lot of “hobby work”, which is still meant to be used eventually. I just don’t apply a timeline, which enables me to focus on correctness.

Then I have my professional work, where the timeline is the primary focus, and correctness can only be pursued where it moves the timeline forward.

These projects you’re talking about are an interesting mixture. There is still a timeline that must be hit, because you need to do your demos, and you need to be ready to shift to a production timeline if negotiations go south. But since there are no customers the business model isn’t changing. And you don’t need to do any polish. So you can stay focused on the raw architectural problems.

It’s like my hobby projects in that you can focus on readiness over completeness, but there still is some timeline pressure.

Interesting to think about. Strange for me, but I guess it’s every day for others!

Just imagine how nice a project is when you don't have to worry about supporting it for years.

Projects exist to expand revenue or cut costs. In my experience the revenue expansion ones are far more likely to “never be used”

Tons of neat stuff gets built and then not used for a number of reasons (political, pricing, scaling, etc). It doesn't meant building it isn't worth it though.

You're right about pricing power. I think the net benefit from new competitive AMD chips will be to force Intel to adjust its premium pricing. Personally, when I'm shopping for personal needs or pricing out a build for work (basically not quite "big" data, but data about as big as can be handled on a single high-end workstation), I don't really care about brand. I care about cost-performance factors and component compatibility. I'll happily choose AMD if they're a 20% discount over Intel.

I think STH's writeup does a particularly good analysis here: https://www.servethehome.com/amd-epyc-7002-series-rome-deliv...

The second is important. Customers need to adopt AMD EPYC. To our readers, it is important when you get a quote to at minimum quote an AMD EPYC alternative on every order. More important, follow through and buy ones where Intel is not competitive. If AMD EPYC 7002, with a massive core count, memory bandwidth, PCIe generation and lane count, power consumption, and pricing advantage cannot take significant share, we are basically done. If AMD does not gain enormous share with this much of a lead, and easy compatibility, Intel officially has a monopoly on the market and companies like Ampere and Marvell should shut down their Arm projects. If AMD does not gain significant share, there is no merit to having a holistically better product than Intel.

As for bettering cost-performance, the full review gives plenty of context that the new Epyc 2's soundly beat out the current Intel Xeon lineup (often by 2X or more), but I think AMD is also doing what they need to do get marketshare (while still raising their ASPs):

When it comes to the top-bin SKUs, the value proposition is simple, just get a higher-end SKU and consolidate more servers to save money. AMD is extracting value for the higher-core count SKUs. For AMD a chip with 64-cores, 256MB L3 cache, 128x PCIe Gen4 lanes at just under $7000 compares favorably when its nearest Intel Xeon competitors are two Intel Xeon Platinum 8280M SKUs (M for the higher-memory capacity) that run just over $13,000 each. AMD at around $7000 is essentially saying Intel needs to start their discounting at 73% to get competitive, and that is not taking into account using fewer servers.

On the AMD EPYC 7702P side, AMD is telling Intel that if it wants to be performance competitive, it needs to discount two Platinum 8280M's by 83% plus the incremental cost of a dual-socket server versus a single-socket server. This is a big deal.

What was horribly broken in these Opterons?

The Barcelona chips initially had a pretty nasty bug in the TLB. AMD stopped shipments for about 5 months so they could put out a new stepping with the bug fixed. The Istanbul chips arrived a few months after Intel's Nehalem, which is where Intel caught up with features like the on-die memory controller and started roughly a decade of unchallenged performance lead.

The TLB bug (Errata 298, doc 41322 if you really care - while the processor was attempting to set the A/D bits in a page table entry, an L2->L3 eviction of that PTE could occur) was one of a great many things wrong with that chip.

* A number of errata (not just 298) delayed full production, sapped performance, or negatively impacted idle power. Take a look at doc 41322, DR-BA step for many samples.

* It was late and didn't achieve performance targets; it missed clock rate targets and 2 MiB L3 was insufficient.

* Intel delivered a very compelling server part (Nehalem) during the lifecycle of family 10h.

How do you measure performance per watt? FLOPS/watt? I don't think FLOPS is a worthy measure of chip performance, since it doesn't take the L1/L2/L3 cache size into account.

Are there performance benchmarks that are designed to measure server application performance?

There are plenty of server performance benchmarks and even one for power efficiency: https://www.spec.org/power_ssj2008/

Google probably has a whole team internally to benchmark their own applications on different hardware.

The article is missing the largest elephant in the room for datacenter CPUs: the security issues Intel chips have had latent for the last decade, and the constant patching and BIOS updates (and re-updates as researchers discover new attacks) that are needed to make datacenter use sane.

Intel really messed up, and has no one to blame but themselves.

I don't disagree generally.

But anytime there is renewed competition I wonder if the new competitors are better security wise, or just haven't been tested much yet security wise.

I hope AMD is doing better, but I'm not sure how to tell just yet. Things like the speculative execution problems seem to be general issues inherent to speculative execution, so if AMD is doing it and they become a bigger target, I would expect new issues to arise.

There are issues both in general regarding speculative execution, and also Intel-specific errors in their implementation of it.

The combination of the two, flaws in the general technique plus flaws in Intel's particular implementation, is what makes it so much worse for Intel than for anyone else on the market.

Does anyone know of any benchmarks that go through the full impacts for all of Intel’s hardware-level security issues?

I remember seeing the Linux kernel devs discussing some massive 10+% performance hits back around Meltdown/Spectre patch time, and I’m now wondering what the final impact has been.

10 percent? My team's RDS CPU usage spiked up by 40 percent after the initial patch [0].

[0]: https://m.imgur.com/a/khGxU

Epic Games saw their CPU usage double after applying the Meltdown patch. Things have gotten quite a bit slower since then. Turns out flushing on every switch into kernel space carries a massive impact if you are a realtime service making lots of network calls.


Oh wow. That's a lot worse than I'd expected. Intel really royally screwed up, especially since so much of their current optimization relies on the same hardware functionality that makes these side-channel attacks possible.

What is RDS short for?

Amazon Web Services' Relational Database Service: https://aws.amazon.com/rds/

I don't know about full pre and post benchmarks but there is the latest benchmark from Phoronix which benchmarks the latest Spectre "SWAPGS" mitigations [0].

[0]: https://www.phoronix.com/scan.php?page=article&item=swapgs-s...

The impacts really depend on your use case. The mitigations make each context switch have a much higher fixed cost.

If you're doing something that's syscall heavy, you're going to see a big negative difference. If it's something that's CPU heavy without making many syscalls, you're not going to see a lot of difference.
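A rough way to see that fixed-cost difference on your own machine (purely illustrative: it measures Python call overhead too, and says nothing about any specific mitigation's cost):

```python
import os
import time

def bench(fn, n=100_000):
    """Approximate per-call cost of fn in nanoseconds."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1e9

# Syscall-heavy: each call crosses into the kernel, so any
# per-context-switch mitigation overhead is paid every iteration.
syscall_ns = bench(lambda: os.stat("."))

# CPU-heavy: pure userspace work, essentially unaffected by the
# mitigations since it never enters the kernel.
compute_ns = bench(lambda: sum(range(10)))

print(f"stat(): ~{syscall_ns:.0f} ns/call, userspace loop: ~{compute_ns:.0f} ns/call")
```

Running the same comparison before and after toggling the mitigations (e.g. `mitigations=off` on the kernel command line) would show the gap growing only on the syscall-heavy side.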

How are these relevant in a datacenter context? If you are the sole tenant of a machine then you will disable all the mitigations.

Whether or not you consider it an important part of insider threat prevention depends on too many other factors (i.e., have you already prevented other, easier avenues of attack) to generalize, but it's not unreasonable to want isolation of jobs even within a single-tenant datacenter. You may also do things like run external code in sandboxes, and you'd like that sandboxing to be safe and effective.

You're going to make someone's life very special by having them deploy to a completely different environment than their secure workstation or build server and telling them that the security mitigations (or lack thereof) are causing an issue.

Google is not the sole tenant. It's running a cloud.

Sure, on a handful of their machines. But they have a lot of private and dedicated hosts, too.

Yes, but even internally, separation between departments in a company is extremely important.

Google cannot have a breach between the services running gmail and the services running adwords for example, even if those are running in the same server on an internal cloud that has strict permissions being enforced by software.

This is especially even more relevant in any kind of datacenter application, even if the company is the sole tenant, because they are working with Client Data - which is data that does not belong to Google, but to their customers.

"Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation" [0].

All of their processes run together on the same machines, so you wouldn't want to risk one compromised process accessing data of possibly any other process.

[0]: https://ai.google/research/pubs/pub43438

This is a big win for AMD, and for me it reconfirms that their strategy of pushing into the mainstream features that Intel tries to hold hostage for the "high end" is a good one. Back when AMD first introduced the 64-bit extensions to the x86 architecture, they directly challenged Intel, which was selling 64 bits as a "high end" feature in its Itanium line; commoditizing 64-bit processors was a place Intel was unwilling to go.

That proved pretty successful for them. Now they have done it again by commoditizing "high core count" processors.

Each time they do this I wonder if Intel will ever learn that you can't "get away" with selling something for a lot of money that can be made more cheaply forever. Processors are not a veblen good.

> That proved pretty successful for them. Now they have done it again by commoditizing "high core count" processors.

They've done much more than that. Intel's current server CPU lineup is tightly siloed into segments, so that every feature some customers would pay more for is limited to its own line, priced to match. That's why they currently have Xeon Scalable {Bronze, Silver, Gold, Platinum} and Xeon {D, W, E} lines, with 402 different Xeon CPUs actively being sold.

In contrast, AMD has two EPYC lines, P and non-P, differing only in that P is for 1-socket servers. The models in these lines differ only in core counts and clocks; all the features that Intel gates and segments by are found in every AMD CPU.

> Processors are not a veblen good

Huh, TIL


"Veblen goods are types of luxury goods for which the quantity demanded increases as the price increases"

> Each time they do this I wonder if Intel will ever learn that you can't "get away" with selling something for a lot of money that can be made more cheaply forever

I think the keyword is "forever". They know they can get away with that for a long time though. Because they historically have.

> new Intel chip features for machine learning tasks and new Intel memory technology being with customers such as German software firm SAP SE (SAPG.DE) could give Intel an advantage in those areas.

I hope AMD turns their attention to machine learning soon, not just against Intel but NVIDIA too. The new Titan RTX GPUs, with their extra memory and NVLink, allow some really awesome tricks to speed up training dramatically, but NVIDIA nerfed them by only selling a version without a blower-style fan, making them useless for multi-GPU setups. So the only option is the Titan RTX rebranded as a Quadro RTX 6000 with a blower-style fan, for a $2,000 markup. $2,000 for a fan.

The only way to stop things like this will be competition in the space.

Their GPUs already have really good performance for machine learning tasks, but it doesn't seem like there's a whole lot more they can do about people overwhelmingly using CUDA. Their ROCm software stack has gotten reasonably good, but getting developers to buy in is hard with how much inertia NVIDIA/CUDA has.

ROCm is supported natively in TF 2.0, so they will benefit from that.

Is it? TF2.0 Alpha and Beta release notes don't mention anything about it, and the official docs at https://www.tensorflow.org/install/pip are quite explicit that the tensorflow-gpu supports only CUDA cards.

The last time I looked at ROCm (which was something like a year ago) it was "supported" as in "there's reports on the internet that someone got it to work" but when I tried, I couldn't get it to work and it really wasn't worth the effort. If I'd dig out a machine with an AMD GPU right now, can I get it working (like, train MNIST or some other helloworld'y system) within an hour, and is there documentation available on how exactly that should be done?

Why can't AMD make their GPUs CUDA compatible?

I'd say it's likely because NVIDIA's CUDA compiler produces code for NVIDIA GPUs, and AMD would have a lot of (questionably legal) reverse engineering ahead of them in order to support the same code on their own GPUs.

If you're asking why AMD doesn't make a compiler for CUDA source code that targets their own GPUs--that's basically what ROCm currently does. They're pushing their CUDA alternative, called "HIP", which is essentially just CUDA code with a find-and-replace of "cuda" with "hip". (And other similarly minor changes.) They have an open source "hipify" program that does this automatically (https://github.com/ROCm-Developer-Tools/HIP/tree/master/hipi...).

So, basically, AMD GPUs are already sort of CUDA compatible: just run your CUDA code through hipify, then compile it using the HIP compiler, and run it on a ROCm-supported system (which, for now is the most spotty of all of these steps IMO).
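As a toy illustration of the kind of renaming hipify automates (the real tool is a proper source translator handling far more than token substitution; the mapping below covers only a few illustrative API names):

```python
# A handful of CUDA -> HIP API renames, as performed by hipify.
# This dict is a tiny illustrative subset, not the full mapping.
CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cuda_runtime.h": "hip/hip_runtime.h",
}

def toy_hipify(source: str) -> str:
    """Naive find-and-replace port of CUDA source towards HIP."""
    # Longest names first, so cudaMemcpyHostToDevice is not
    # partially rewritten by the shorter cudaMemcpy rule.
    for cuda_name in sorted(CUDA_TO_HIP, key=len, reverse=True):
        source = source.replace(cuda_name, CUDA_TO_HIP[cuda_name])
    return source

snippet = "cudaMalloc(&ptr, n); cudaMemcpy(ptr, h, n, cudaMemcpyHostToDevice);"
print(toy_hipify(snippet))
```

The point being that a large fraction of a CUDA codebase really does port this mechanically, which is why HIP exists as a migration path at all.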

If I remember correctly, CUDA is pretty tightly tied to Nvidia’s GPU architecture, so it would be very difficult to get the same things to run on AMD chips without them making huge changes to conform to Nvidia’s way of doing things.

I think there have been a few projects that try to translate CUDA stuff into OpenCL or other AMD-compatible compute platforms.

>So the only option is to get Titan RTX rebranded as a Quadro RTX 6000 with a blower style fan for $2,000 markup. $2000 for a fan

for $2000, surely you can install an aftermarket cooler or watercooling loop?

If I was doing a personal build in a tower I would probably look at a watercooling loop. The build is 8x GPU rackmount for a small business though and I have not found an aftermarket blower style cooler anywhere. I have found a discussion where someone brought a rackmount case to a machine shop and had custom airflow guides put in between the cards and a hole cut in the top of the case to direct airflow out of since the backplate prevents normal airflow [1].

[1]: https://forum.level1techs.com/t/cooling-8x-titan-rtx-in-a-se...

I am sure it is aimed at Intel fanboys, if you know what I mean :). The people who live by Intel as a brand no matter what and will defend its superiority regardless of facts. And yes, for $2,000 one could probably build a liquid-cooled super rig. I for one am happy there is finally some competition.

NVIDIA won't ever come out with a blower style Titan again. Look at the changes they made to their EULA for the driver stack as it explicitly states that it is not to be a data center product. Which is a shame really.

It kind of amazes me that there can be a EULA for software that is absolutely required in order to use physical hardware that you’ve purchased. Guess that’s the world we live in now...

Mainframe customers face more onerous licensing terms from their vendors.

I mean would it make you feel any better if they just put the requirement on the hardware itself?

I mean people seem to get non-commercial software but why is non-commercial hardware weird?

I mean it makes sense if they just choose not to sell the cards they don’t want in datacenters to people who will use them that way.

What weirds me out is that you could have purchased the card, eventually decide to use it in a way that goes against their EULA, and they could just decide to take away your access to the software required to use the hardware.

I guess it’s more or less the same feeling as certain people have about Windows licensing or buying games on certain platforms. You’re buying a license to use whatever it is rather than actually buying the thing, and this kind of feels like an extension of that to hardware.

It doesn't stop the 'raid the local Microcenter' problem though -- which I was guilty of at my last job. As it turns out people realized that at scale it was actually cheaper to just buy out big-box stores and play the silicon lottery with GeForce cards than buy Nvidia's high-end cards.

Now look, Nvidia wants to make more money, let's not pretend that there's really any other primary motivation here. And segmenting the people who make money with their cards and the people who use them for entertainment is a pretty solid way to do that. However, the secondary reason for this is that on the consumer side people complained that stores were constantly sold out of new cards from large businesses buying them all up. Some stores implemented rationing schemes but the 'final solution' it seems is to just stick a line in the license that says you can't stick these cards in your DC.

OTOH, Intel found ways to segment without attempting insane things like "oh but I don't allow you to put a Core i5 in a datacenter".

A kind of clause which might very well be void in tons of jurisdictions, by the way.

Because software you license, hardware you buy.

I get the sense that is more "you need to pay for data center usage". The GPU manufacturers used to make a ton of money having special drivers for industry. That used to be for CAD, but over the years they lost that cash cow, and now they want it back for ML volume customers.

SAP is a steaming pile of garbage and needs to die. Any company that has SAP and isn't in the midst of getting off of it needs to fire their CTO.

SAP is a prime example of how attempting to be all things at once can make a program ridiculously unusable.

Beating Intel using their own codec! That's gotta hurt.

Looks like the dual socket systems don't really fare well in that benchmark. Either way, it's really funny that we need a state-of-art 64 core CPU to exceed 60 fps in AV1 encode.

Jokes aside, I know H.264 was at this point in the past, I just wonder how long it's going to be before we see AV1 hardware encoders that produce good quality video (and hell, hardware decoders at that as well).

Codecs make N-way tradeoffs between implementation complexity, latency, throughput, quality, ...

AV1 is designed to give you the ability to throw more CPU cycles at the encode side to achieve higher quality/byte while maintaining reasonable decoding performance. You don't have to use it that way, you can encode faster but give up quality/byte. But without AV1 we would not even have that choice, the previous generations reach diminishing returns at some point.

And hardware decoders are not that great. They use less power and can achieve realtime encoding, but if you want the maximum quality/byte (at the expense of encoding time) then software encoders generally reign supreme due to years of iterative improvements that you don't see in hardware. This is important for streaming services which spend those cycles once and then streams to millions. The asymmetry makes it worth it.

> Either way, it's really funny that we need a state-of-art 64 core CPU to exceed 60 fps in AV1 encode.

When H.264 was introduced, you needed a state of the art CPU just to play it back... You can imagine how slow encoding it was.

Wasn't that bad. H.264 was ratified in 2003; meanwhile even the first Xbox, released in 2001 (~733MHz Celeron), could play H.264 480p 2.5Mbit movies (DVD resolution, equal quality to 10Mbit MPEG-2) with ease, as could budget desktop processors released in 2001 (Celeron 800/Duron 800, both with SSE).

Higher resolutions were another matter. 720p needed at least a 2GHz Athlon/2.8GHz P4 for smooth H.264 720p 5Mbit playback: either top-of-the-line 2003 CPUs, or 2006-2007 budget ones.

H.264 1080p 30Mbit bluray released in 2006 could be decoded purely in software on top of the line 2006-2007 CPUs (dual core P4, A64 X2).

AV1 is not Intel's codec, though they have released an encoder for it.

AV1 is a video format. SVT-AV1 [0] is a codec, and it is (co-)authored by Intel.

[0] https://github.com/OpenVisualCloud/SVT-AV1

That is actually just an encoder, not a full codec ((en)coder + (dec)oder).

I'm curious, since I don't follow hardware that closely: Is the current battle really best framed as a fight between Intel and AMD, or between Intel and TSMC? I.e. is AMD's recent resurgence due to better chip design, or because Intel's fabrication is struggling, whereas TSMC's isn't?

It isn't just both, it is all four of those areas: Intel's design and fabrication, and AMD's design and fabrication.

It was basically a perfect storm that was unthinkable a few years ago. (To me it is still very much unreal, even with today's announcement.) Intel's 10nm can't be fixed in time (in fact, for 24 months they just kept lying, both publicly and in investor meetings); that is Intel's fab problem. And Ice Lake couldn't arrive on time because of 10nm: their design could not be adapted to 14nm or another node, it was stuck on 10nm. Compare that to Apple and AMD, which have adopted a train development model where designs are less fixated on a node schedule.

And AMD managed to execute to perfection. Naples set the tone for the industry, Rome (Zen 2) was a huge leap in performance, and the chiplet design gave AMD an advantage in cost (smaller dies, higher yield, mass manufacturing in volume), so while it is priced lower than Intel, they are not hurting their margins to fight this battle. Very important for the long-term survival of AMD. And TSMC's 7nm was running in perfect harmony alongside it. Not to mention TSMC was willing to fight and secure 7nm capacity for AMD, risking more CapEx and building more fabs.

And to add a fifth thing to all these, Intel had major security problems just months before AMD's Zen 2 launch.

As if the whole thing were scripted as the perfect counterattack by AMD and TSMC. But no, it was the hard work and dedication of AMD and TSMC, the will to fight and deliver against all odds. Compare that to Intel over the past 4-5 years.

So if you loathe Intel, AMD is now not only an alternative, but also possibly the best option on servers.

And if you love Intel, you should buy AMD to teach them a lesson for milking the market and sitting on their butt not innovating.

Intel has always had their design and fabrication teams work much more closely together than their competitors. That's often been an advantage as it's allowed fab to sometimes impose unusually restrictive design rules, but only in ways that the design team could handle and thereby give the fabs scope to reach a node faster or improve its performance.

Don't forget that the IO die (which is actually much larger than the CPU chiplets) is still on GlobalFoundries 14nm, so by area most of these chips is 14nm rather than 7nm. This increases the number of 7nm chiplets AMD can get per wafer, which definitely helps them keep up with the (seemingly) massive demand.

There is real demand, I'm building a Zen 2 based system for a co-worker and I had to try three different places (online) to get my hands on a 3600 the other day.

Damn thing is faster than my 2700X (which I'll be upgrading to a 3900X when I get back from holiday).

AMD is straight killing it at the moment.

It's both. TSMC's 7nm process seems way more stable than Intel's 10nm, but a simple die shrink wouldn't help a poorly designed chip that much. Zen is much, much improved over Bulldozer, and their Infinity Fabric design could theoretically let AMD just add more cores; together with arch and node improvements, that could keep them competitive for a long time.

To add a touch more information to the chorus of "both" replies:

Intel's development process is such that many components of its chips were expected to be introduced on 10nm. With 10nm in such production trouble, these components mostly weren't backported to 14nm designs. As a result, Intel's chip design has largely stagnated along with its 10nm fabrication.

Both. AMD made a good decision to divest from GlobalFoundries. They have also made great strides in chip design that improve instructions per clock (IPC), as well as increasing cache sizes and improving the various interconnects. The switch from a monolithic die to chiplets, with I/O at the center, is an important change as well. But TSMC enabling AMD to make a successful switch to 7nm adds to the efficiency equation and is likely key for clock-speed targets.

As I understand it, AMD pretty much couldn't divest from GlobalFoundries due to a contract, the WSA (Wafer Supply Agreement), which is not public. In 2019 AMD announced that the WSA had been amended to allow AMD to use TSMC.

What did they use before?

Originally AMD owned their own fabrication plants. In the late 2000s they were forced to spin them off (as GlobalFoundries) for a quick injection of cash after their merger with ATI went badly. Part of that spin-off was the Wafer Supply Agreement (WSA), a contract which obligated AMD to buy a certain amount of silicon from GlobalFoundries, effectively requiring them to manufacture the majority of their products with GF or pay GF for the privilege of manufacturing their products elsewhere.

If GF had kept up with the competition that wouldn't necessarily be a problem, but for the past 10 years GF has struggled to deliver new nodes on time, making the WSA a millstone around AMD's neck. With 14nm, GlobalFoundries ended up licensing Samsung's 14nm process, and with 7nm they gave up altogether. While the details are confidential, the latest amendment to the WSA presumably gives AMD much more freedom to manufacture its newest products elsewhere, as GF literally can't.

AMD is using a kind of modular processor architecture which, as I understand it, makes the chips easier / less expensive to fab than an Intel design of comparable size, so it seems like the two factors are intertwined. I don't know enough about the specifics of Intel's 10nm issues to say whether it would have helped them though.

>AMD is using a kind of modular processor architecture which, as I understand it, makes the chips easier / less expensive to fab than an Intel design of comparable size

You pretty much got it. It all basically comes down to the yield you get per wafer. The larger the die, the fewer you can fit on a wafer, and the greater the chance any given die contains a defect. Using a few smaller dies is a smart way to get high yields out of your wafers.
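A back-of-the-envelope way to see that effect is a simple Poisson defect-yield model. The die sizes and defect density below are made-up illustrative numbers, not AMD's or TSMC's actual figures:

```python
import math

def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson model: probability that a die of the given area has zero defects."""
    return math.exp(-defects_per_cm2 * area_mm2 / 100.0)

D0 = 0.2  # hypothetical defect density, defects per cm^2

monolithic = die_yield(700, D0)     # one big 700 mm^2 die
chiplet = die_yield(75, D0)         # one small 75 mm^2 chiplet
eight_good = chiplet ** 8           # a CPU needing 8 good chiplets

print(f"monolithic die yield:  {monolithic:.2f}")
print(f"single chiplet yield:  {chiplet:.2f}")
print(f"8 good chiplets:       {eight_good:.2f}")
```

Even needing eight good chiplets per CPU beats the one big die in this sketch, and it understates the advantage: partially defective chiplets can often be salvaged as lower-core-count SKUs, which the model doesn't count at all.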

The modular design adds some memory-access latency, but it also lets them add larger L3 cache more easily, which can mitigate that. For instance, the new Ryzen CPUs crush Intel on GCC compile-time benchmarks due to the large L3.

Most of AMD's improvement is due to better chip design. The last 10% or so are from fabrication process.

There's also fabrication cost and yield which factors into the pricing and profitability of Intel and AMD products. In this respect Intel is competing against TSMC and GlobalFoundries (for the IO die).

It’s not just Google and Twitter, Azure is in as well: https://azure.microsoft.com/en-us/blog/announcing-new-amd-ep...

Reminds me of many years ago when the absolute best $ per performance setup was a rackmount dual socket opteron. Right when the opteron first became a thing.

They need to build a server ecosystem. I'm hoping that this success will help with that so that things are better positioned for the next new server CPU launch.

We've had engineering samples of Rome for quite a while. However, there are very few available boards with PCIe4 right now. The one we've tried (under NDA) has a busted BMC that won't accept network settings, which has really hampered Rome testing. We've actually done most of our Rome testing using 1st generation boards, with slower RAM and PCIe Gen3.

Genuinely asking, since I have little to no concept of this space - how does the prevalence of either Intel or AMD affect developers?

AMD being competitive means faster and / or cheaper CPUs for everyone, short term from AMD and medium term from both AMD and Intel. People have been frustrated with slow CPU performance progress resulting from diminishing returns of process improvement and lack of competition for Intel ("four cores should be enough for everyone and also expensive").

For most developers, it doesn't. They're both making x86_64 chips with coherent caches of similar size, similar core and thread counts, and a lot of overlap in instruction set extensions. For people doing very high performance work (heavy data processing, simulations, etc.) or performance critical work (think hand coding optimized crypto library routines in assembly), the subtle differences in those categories might affect how they lay out data and access it, how many threads they use and how they use them, and preferences for the availability of instruction set extensions that are specific to the kind of workloads they have (vector processing extensions, native crypto operations, unusual bit twiddling patterns).
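One concrete place those subtle differences surface is the CPU feature-flag list (on Linux, the `flags` line of `/proc/cpuinfo`). A minimal sketch of checking for extensions; the flag strings below are abridged, hypothetical samples rather than real dumps:

```python
# Abridged, illustrative flag strings (real /proc/cpuinfo lines are much longer).
ZEN2_FLAGS = "fpu msr sse sse2 ssse3 sse4_1 sse4_2 avx avx2 aes sha_ni"
SKYLAKE_FLAGS = "fpu msr sse sse2 ssse3 sse4_1 sse4_2 avx avx2 aes"

def has_ext(flags: str, ext: str) -> bool:
    """Check whether a CPU's flag string advertises a given ISA extension."""
    return ext in flags.split()

# Both advertise AVX2, so most vectorized code runs unchanged on either vendor.
print(has_ext(ZEN2_FLAGS, "avx2"), has_ext(SKYLAKE_FLAGS, "avx2"))
# SHA extensions, for example, appeared on AMD Zen before most Intel desktop parts.
print(has_ext(ZEN2_FLAGS, "sha_ni"), has_ext(SKYLAKE_FLAGS, "sha_ni"))
```

Hand-optimized libraries dispatch on runtime checks like these rather than assuming a vendor, which is why most application developers never notice the difference.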

Another difference is that Intel and AMD virtualization (vmx and svm) are incompatible with each other.

>For most developers, it doesn't. They're both making x86_64 chips

This is true for the vast majority, but there are niche cases where there are differences even though both are x86_64 (like Intel's FlexMigration and AMD-V Extended Migration).

Competition benefits the space. It lowers price of compute and advances technology. One party prevailing over the other for a significant length of time leads to monopoly economics and stagnation of the industry.

The context of recent news is that Intel has been dominant for many years, prices of processors have been high and growing, and performance improvements minimal. AMD is now offering cheaper and faster processors, which characterises a resurgence of competition in the desktop and server compute market. Ideally Intel will improve their offering in response, otherwise facing loss of the market to AMD, and further continue the competition. No impartial consumer should want either company to prevail, but AMD's present lead signals an end to Intel's dominant position.

Developers benefit directly from cheaper and more plentiful computing, but also indirectly as more applications and approaches become practical for their target platform. As devices become cheaper, the potential user base for an application becomes larger.

Developers will be more indirect beneficiaries: the cost of compute may go down, since buyers will have leverage against Intel and even AMD. Other benefits may include a faster refresh cycle, and hardware bugs affecting the dominant chipmaker no longer dragging down the entire ecosystem.

Ignoring who is "winning": it's always good when viable competition enters a previously dominated market. In this case both companies will be pushing to increase performance and lower cost, which AMD has done a great job at; this forces Intel to lower prices or improve features to stay competitive.

Win-win for consumers either way. And for the last, what, almost 10 years Intel has been ahead in many use cases, and their new offerings are rather stale IMO.

The new Ryzens will compile code a lot faster, for one. We might need to start learning more about AMD-specific optimizations.

It depends where in the stack you are doing development. For compiler, kernel, and driver developers it could have a reasonable effect. For user-space application developers and web developers it should have very little effect.

Very little. Unless you write a compiler, they're both just x64 processors.

Intel stock has dropped quite a bit in the last month.

I remember when AMD's stock was a running joke on WallStreetBets. Doesn't seem like much of a joke now.

It’s still a running joke. It’s just running up now.

Would love to see someone who YOLO'd @ $2.00 and HODL'd till now...what a bet that would've been.

I seriously considered it just under $2. If I believed all I read about Zen, it would have been a sure thing to double or triple my money on an AMD comeback. But I couldn't tell if that was my intuition or just hope, so no bet. If I had bought, there's no way I would have held all this time either (maybe some shares, but not all). But with what's happening now, it looks like $40 is entirely possible. That would be 20x in just a few years, but I didn't play...

I bought at $5.75 and held for the past 11 years...

Bought at $2.02, four years ago. Not all that many shares, sadly, but I'm happy. :) It somewhat makes up for me stupidly selling the cheap Apple stock I had, just before the iPad came out. Oops.

Only €4ish to €12ish, then a few successful call options later. Made a month's salary or two from it, no big deal. But it made me not regret paying launch price for a Ryzen 1800X :)

Bought at 8, sold at 32 (it reached that months ago or so). When it was at 2 we would bet on whether it dies or gets bought. Incredibly unstable stock with decades of underperforming. Hope this new trend continues well into the future, as everything seems to be working out for them.

I love that! This AMD/Intel competition is bringing us faster and cheaper CPUs!

How long do you guys think it will take Intel to come up with an Infinity Fabric equivalent architecture? I feel like Intel wasn't working on it until recently and that it might actually be a few years before they can produce competitive silicon, especially accounting for the possibility that AMD is going to keep releasing on this yearly cadence while simultaneously leveraging full node improvements @ TSMC/Samsung...

Weird sensation, I remember the struggling days of AMD. And now that they're about to become very successful (plausibly) ... I'm almost tempted to donate to them. Go AMD.

Donate? Buy some of their products.

If their intention is to support AMD or reward their competitive technological investment, yet they don't need a new processor, then it would be more appropriate to buy AMD stock.

I'm not sure how buying AMD stock benefits AMD in any meaningful way.

I can think of a couple of indirect ways.

1. It pushes up the price which helps incentivize AMD executives and key employees who have an equity component to their remuneration

2. It makes it easier/cheaper for AMD to raise funding via equity or equity linked markets if they ever need to

At scale, either or both of those could help them though sales might be better

Been using a Ryzen thinkpad. It's been good!

I wasn't imagining donating $100 or anything near the price of one of their CPUs. It's just that they broke through after years of hard work, and that's great IMO.

Are you actually talking about giving away your money to a multi billion dollar public company?

Buy some stock instead, don't just throw your money away. How would you even 'donate' to them?

It's just an emotion toward a company that made business in a way I value. Maybe buying stock is how you express "emotional" appreciation in the business world.

buy a share, $32.50 at the moment

Maybe it has something to do with all the recent speculative side channel attacks. Intel seems to have more hardware vulnerabilities than AMD.
