The yields for Intel's big monolithic high-core-count dies must be abysmal compared to the small 7nm TSMC chiplets AMD is packing together.
HBM memory for instance is still quite expensive, and even 3D NAND was more expensive initially. Now it's harder to tell because the whole market crashed and they reached a limit for density anyway.
In both this case and their delays in getting to 10nm due to wanting larger dies, it really feels like Intel's management is letting better be the enemy of good whereas AMD is making smarter choices about where to compromise.
By the time Intel releases their awesome 10nm 3D chiplet stack, AMD will likely have moved onto 5nm compute chiplets with a 7nm IO chiplet. It's not clear to me how Intel will catch up in the next 5 years or so.
Point being, you were wrong in the statements you have been making. 5nm will not be out by the time Foveros is introduced. I did not state Foveros is a competitor to a 128-thread Epyc processor.
What's your point again?
Plus, these chips with two compute chiplets also have double the PCIe lanes (40)! So a number of NVMe drives, GPUs, 10GbE etc. can run together without fighting over lanes (and that's before counting the doubled bandwidth of PCIe 4.0).
It does still feel weird calling a 16-core/32-thread CPU with 72MB of cache 'consumer'.
As DDR5 is coming out next year, that will mean a new socket, limiting the upgrade path for the CPU, RAM & motherboard. Although, 16 cores at ~4.5GHz shouldn't be a problem for the near future (maybe even 5 years). Same goes for the PCIe bandwidth.
Edit: Just done some checking, and it appears the 3950X has 24 PCIe lanes (16+4+4), but they are twice as fast, so not far behind the current 2nd-generation Threadripper!
The chipset multiplexes up to x16 lanes of "stuff" onto the x4 chipset lanes from the CPU.
All of this is physically determined by the pinout of the socket and none of this can change unless AMD moves to a new socket. What did change is the speed of the lanes - x4 lanes on 4.0 is twice as fast as x4 lanes on 3.0.
AMD, like Intel, likes to pretend that chipset lanes "count" as full CPU lanes, arriving at a total of 36 effective lanes. But that's nothing new either.
But if you are thinking of upgradeability of CPUs you are much much better sticking with AMD, who don't change their CPU socket every couple of years.
Purchasing decisions are not always rational, many times they are just emotion driven. The fear of missing out is hardwired into our brains.
I remember paying something like $800 for an Athlon 700 MHz CPU (the cartridge one) in 1999.
Intel has always been the go-to. The #1 priority is single-thread performance, first and foremost. Second is at least 4 cores. Most modern games can utilize at least 4, but it's also important to give the OS and other programs like Discord plenty of cores.
While Ryzen Gen 1 and Gen 2 have been amazing values, for gaming performance Intel has still been king. When you compare AMD to Intel FPS for FPS, Intel nearly ALWAYS wins.
CSGO is especially reliant on single-thread performance, but this goes for most games. It's worth noting too that while games can use multiple cores, I don't believe most engines scale to 8+ cores very well.
Supposedly Zen 2 solved most of that. (And some game benchmarks like CSGO suggest they really did) We'll see how it actually pans out since there's still the issue of inter-CCX latency (and now even cross-chiplet latency).
Inter-CCX communication requires hopping over the Infinity Fabric bus, which (in the case of Zen 1; no newer benchmarks) increases inter-thread latency from ~45ns to ~131ns. I'm sure it was reduced in Zen+ and is probably closer to 100ns by now. However, I'm not sure if inter-chiplet communication will be the same (e.g. it has its own IF bus) or worse (IO die overhead).
Hopefully someone runs the same inter-thread communication benchmarks on Zen 2.
Otherwise, yeah it can be terrible.
I've said in other comments my 4790K is getting a bit old at this point, not slow for most stuff, but definitely hungry for more cores for a lot of tasks, and looking to break past 32GB of RAM. I'd also been considering Epyc or even Xeon, as older/used Xeons can be very well priced. Guess I'm waiting until September.
I’m in nearly the exact same boat. I’d like to have ECC RAM the second time around for my home server, which the Zen chips reportedly support, though I don’t see many people using it. I’d also like better power usage. I think I’m going to wait one more year.
For now, planning on just playing around with it. I haven't decided if I'll be running Windows or Linux as the base OS yet.
Game developers have always made good use of the available resources. They'll use the extra power available. The newest techniques they have, like work-stealing queues, can scale to a large number of cores.
So games and gamers will use the extra cores. It's much less of a jump from 4 cores to 16 than from 1 to 2.
Give it a year or two.
No it's not worth it IMO, but some people spend crazy amounts chasing a few extra fps.
High-end phones have gotten way more expensive, but they are still consumer products.
It sounds like you're saying performance/power is a benefit for Intel, possibly based upon the history of AMD chips, but that line of thought has been wrong since the Ryzen architecture.
AMD gives their TDP with turbo enabled (similar to real usage); Intel gives TDP at rest / with no turbo enabled.
There is still some variance from both between stated and real TDP, but the core of the difference is well understood, and dates back almost a dozen CPU generations to when Intel already had to guzzle power like crazy to superclock their chips in the vague hope that they could compete with AMD's products of the time (and then they never reverted it once they took the lead back with the Core architecture).
It's kind of similar to the whole "Intel wants comparisons done with SMT off" thing: because the last 15 years were theirs, the whole thing is biased toward Intel... yet they still massively lose those comparisons.
As a Small Form Factor enthusiast, I can attest to this with utmost confidence. The chips will run at their expected TDP when configured as specified by the factory, that's just not the default on almost any enthusiast board from known companies. In the case of ASUS it can actually be a bit of a battle to get things to run as intel specifies, both with MCE and automatic overclocking behaviors.
If that's the case, then also the performance is "massively blown out", since essentially all the benchmarks around are based on popular motherboards.
Anandtech did a test some time ago with a real, fixed, 95 W TDP[¹], and it ain't pretty.
It's definitely good for Intel that "every popular motherboard" is, uh, guilty of going out of spec, otherwise, the popular opinion of Intel chips would be significantly lower.
Regardless, I'm also not really convinced that this can be considered "cheating" by the motherboards. According to the official Intel page [²]:
> The processor must be working in the power, temperature, and specification limits of the thermal design power (TDP)
so ultimately, it's the CPU that sets the performance/consumption ceiling.
Source: I have one of these.
Maybe Intel took that back with their lower clocked 8c/16t chips, dunno, this isn't something that comes up all that much in consumer reviews. But there's at least not a significant gap in either direction, it's pretty much a wash.
On the server side of things Anandtech didn't seem to go much into it but at least with this one: https://www.anandtech.com/show/11544/intel-skylake-ep-vs-amd...
The dual EPYC 7601 used 100W less than the Xeon competition in POV-Ray while also being the fastest system there by a substantial margin. That would put performance, power, and performance/watt all firmly in the EPYC 7601's domain on that one test. And Intel took it back on MySQL. So a 50/50 split.
When limited to its "official" 95W TDP, the 9900K does about 4.3 GHz and has a higher perf/watt than Ryzen (both higher performance and lower power consumption).
So basically you are in a situation where the Ryzen pulls less at stock, has slightly higher efficiency at stock, but has a much lower clock ceiling. While the 9900K ships with much higher clocks and worse efficiency, but has a much lower power floor if you pull the clocks back to 2700X levels.
Of note, the 2700X is actually pulling ~130W under AVX loads (33W more than the 95W-limited 9900K).
The Stilt noted that the default power limit AMD ships is 141.75W and the 2700X will run it for an unlimited amount of time (whereas Intel at least claims PL2 obeys a time limit, although in practice all mobo companies violate the spec and boost for an unlimited amount of time as well). So really "TDP" is a joke all around these days. Nobody really respects TDP limits when boosting, and it doesn't directly correspond to base clocks either (both 9900K and 2700X can run above baseclocks at rated TDP). It is just sort of a marketing number.
Epyc is a different matter, and once again more cores translates into better efficiency than fewer, higher-clocked cores. But the gotcha there is that Infinity Fabric is not free either; on Epyc the fabric alone is pulling more than 100W (literally half of the total power!).
Similarly, the 2700X spends 25W on its Infinity Fabric, while an 8700K is only spending 8W. So, Infinity Fabric pulls roughly 3x as much power as Intel is spending on its Ringbus. This really hits the consumer chips a lot harder, mesh on the Skylake-X and Skylake-SP is closer to Infinity Fabric power levels (but still lower).
Plus, GF 14nm wasn't as good a node as Intel 14nm. So Ryzen is starting from a worse node.
Bottom line: core for core, power efficiency on first-gen Ryzen and Epyc was inferior, but of course Epyc lets you have more cores than Xeon. The Ryzen consumer platform's efficiency was strictly worse than Intel's, though.
And that goes double for laptop chips, which are the one area that Intel still dominates. Raven Ridge and Picasso are terrible for efficiency compared to Intel's mobile lineup. And AMD mobile won't be moving to 7nm until next year.
Because of that whole "nobody obeys TDP and it doesn't correspond to base clocks or any other performance level", we'll just have to wait for reviews and see what Zen2 and Epyc are actually like. I am really interested in the Infinity Fabric power consumption, that's potentially going to be the limitation as we move onto 7nm and core power goes down, while AMD scales chiplet count up further.
Why is this shocking? Zen 2 is 7nm and Intel's latest is at 14nm. It would be a far bigger shock if they didn't beat Intel in performance/watt. Zen 2 vs whatever Intel releases on 10nm in the next ~6-18 months is a much more interesting comparison.
AMD wasn't really a consideration except for budget builds until they launched the Athlon in the late 90s. The success of Athlon was as much about Intel's fumble with Netburst as it was about Athlon being a solid competitor.
It took Intel almost a decade to roll out Core and in that time AMD failed to capture the market despite making tremendous gains and legitimizing itself.
Ultimately AMD fumbled with the Bulldozer/Excavator lines of CPUs and lost almost everything they had gained.
The reasons AMD couldn't capture the market are complex but the short answer is that Intel influences every aspect of a computer from software, to compilers, to peripherals, to firmware.
And by "AMD failed" you mean Intel used illegal means to stop them, right?
The US, Japanese and Korean fair trade commission equivalents all either blamed Intel or fined them. The EU was still too young in that area to act in time, but in 2009 they handed Intel one of their biggest fines ever, €1.06 billion, for what they did, along with an appropriate "oh, and if you do it again we won't be late, and won't be so nice".
Calling it "AMD failed to capture the market" is technically true, but that's one funny point of view.
Because Intel played dirty and illegal.
I've heard this baseless assertion before but so far I've never heard any semblance of support. Why do you believe that AMD "fumbled" with their Bulldozer line?
This article about Zen starts with an overview of why Bulldozer failed to deliver: https://arstechnica.com/gadgets/2017/03/amds-moment-of-zen-f...
Or that while it was power efficient at idle, it was exceptionally power hungry under load?
Maybe it was when the CEO admitted it failed to meet expectations, said we'd have to wait 4 years for a successor, and then stepped down?
Idk... I'm probably way off base.
I'm planning to hold out for next gen when they get ray tracing hardware to be a bit more future proof (my GTX 970's not dead yet), but since I'm thinking of trading my Wintendo out for a Mac + eGPU setup it's nice to see that AMD could actually be a good GPU option now.
Those were just announced this week, so keep an eye out for 3rd party benchmarks soon.
AMD's recent GPUs have a reputation for shipping "hot/high-power" at stock and then doing much better when undervolted. Navi will get the die shrink, so the results for both power and thermals are likely to be even better, but benchmarking needs to be done before we have a full picture of what's changed.
The Ryzen processor is 105W vs. 165W for the significantly slower Intel processor. Additionally, AMD's TDP numbers are much more accurate in terms of real peak usage than Intel's. So the Zen 2 processors will almost certainly have a much better performance/power ratio than the corresponding Intel ones moving forward. That was definitely not the case for AMD in their last generation.
> In this case, for the new 9th Generation Core processors, Intel has set the PL2 value to 210W. This is essentially the power required to hit the peak turbo on all cores, such as 4.7 GHz on the eight-core Core i9-9900K. So users can completely forget the 95W TDP when it comes to cooling.
In other words
1) Intel's advertised "TDP" = true? (they no longer use the original meaning of "Thermal Design Power")
2) Intel's advertised peak performance = true (with caveats such as all the mitigations required for the CPU flaws, which lower performance)
3) Intel's advertised peak performance at advertised TDP = BIG FAT LIE
There also seem to be some new X570 motherboards that will actually support this level of craziness, too.
Intel is currently getting absolutely destroyed on that front.
Both AMD & Intel list TDP for all cores active at base clock frequencies. The major difference is Intel heavily leverages what they call all-core boost to never actually run at their base clock, allowing them to list ridiculously low base clock frequencies. For example the i9-9900K's base frequency is listed at 3.6GHz, but the all-core turbo frequency is a whopping 4.7GHz. That difference is how you end up with a CPU that expects 210W of sustained power delivery (the 9900K's PL2 spec) even though its TDP is only 95W.
AMD doesn't (didn't?) have an all-core boost concept, so their base clocks are just higher, making their TDP number closer to real-world. But still technically base-clock numbers and not boost numbers, and so you will still see power draw in excess of TDP.
Officially the Ryzen 9 3950X supports up to DDR4-3200 (1600 MHz) according to the published specs https://www.amd.com/en/products/cpu/amd-ryzen-9-3950x; however, in this benchmark the memory was overclocked to 2063 MHz:
Memory: 32768 MB DDR4 SDRAM 2063MHz
No, it supports "4200+ with ease, 5133 demonstrated".
From official slides https://www.anandtech.com/show/14525/amd-zen-2-microarchitec...
Note I am not playing down the 3950X's performance. It is overall a processor superior to Intel's counterparts in most aspects.
Every DDR4 module beyond that is officially a 3200 module with an overclock option.
That's why you need to enable an Extreme Memory Profile (XMP) in your BIOS to use speeds beyond 3200.
The point being that this is a tricked out rig, not an official reporting of the CPU's performance. And that makes the headline essentially a lie.
You can see all 9980XE Geekbench results here: https://browser.geekbench.com/v4/cpu/search?dir=asc&page=1&q...
This 3950X result is definitely not faster than the top overclocked 9980XE, but it is faster than something like 3/4 of them. Given the base clocks of each I would expect the stock 3950X will end up at least slightly faster than the stock 9980XE though.
For example, one entry:
  Name: Intel Core i9-9980XE
  Memory: 65536 MB DDR4 SDRAM 2101MHz
  (Single-Core Score / Multi-Core Score as listed in the linked results)
34650 to 61072 in a generation is no joke, especially coming from a far smaller, much lower-power part.
Before the release and subsequent independent testing the trust in any exceptional results should be very low.
I mean, no one should lose their minds over it right now or anything, but it seems impressive. I certainly don't see an upside to giving bogus stats right now.
An Epyc 7501 (32c/64t) apparently only gets 17k multicore score on geekbench under windows: https://browser.geekbench.com/processors/2141
Which is hilariously wrong. And if you think that's some quirk of Epyc, well, same CPU gets 65k when run under Linux: https://browser.geekbench.com/v4/cpu/10782563 So clearly there's a software issue in play. Maybe this is related to the new Windows scheduler change. Maybe geekbench just has some pathologically bad behavior. Who knows.
So yes we should wait for release & independent testing before getting too excited, even if that's just so we get numbers from something other than geekbench.
This Ryzen 9 3950X scores so high because the memory is heavily overclocked by +29%, see my other post in this thread.
Looks like a couple hit 70k+ at 3.00 GHz base.
It depends on the work. So as always benchmark suites are to be taken with a grain of salt. More specific benchmarks, such as compiling a standard set of real software packages, can give a clearer picture of performance for those more specific use cases.
Until we see more specific data on how these chips perform for certain tasks, this is just FUD.
* Ryzen has a longer branch prediction history than Intel's processors.
* This will give it an advantage on repetitive executions.
* It's a challenge to robustly measure tasks since using repeated executions to gain confidence intervals can interfere with the measurement itself.
What's not clear is to what extent real-world tasks are repetitive enough to benefit or random enough to be negatively impacted. It's likely a mix of both.
By no means am I attempting to spread FUD — I find it quite interesting and wanted to spark a bit of discussion on it.
Is there a good place to go for this? I've tried to find software development focused benchmarks before, but I've come up mostly empty.
For a more specific example, Linux kernel compilation benchmarks: https://openbenchmarking.org/showdown/pts/build-linux-kernel
Funny, but by making the name of the platform a blank, this applies just as much to 2005 as it does to 2019.
The new Xbox will feature a custom Ryzen of some form. Who’s next, Apple?
Given that it's AMD, shouldn't that be "it's fabless"?
I've got a mix of Intel and AMD, and have had no loyalty back to when I replaced my Pentium 75 with a pre-unlocked AMD Duron from OcUK.
I'm so glad to see AMD not only raise its game exponentially, but also force Intel to compete. It's good for everyone.
My next purchase will probably be a Ryzen 5 2600, because the price drop ahead of the 3xxx has made them ridiculous value for money.
Definitely a good time to be a PC gamer.
Slightly frustrating that the integrated graphics 3x00G chips are basically Ryzen 2xxx chips though. I hope the g-range gets a refresh with proper Zen 2-based chips shortly.
WRT "who next", did you see the Chinese AMD custom Ryzen+Vega APU console last year, the Subor Z-Plus, with 8GB GDDR5 as shared system and graphics memory?
It's "fabless fab!"
There are modified Darwin kernels that allow Hackintosh to work on AMD processors. These kernels have some stability issues, but if hobbyist outsiders can get most of the way, I don't foresee it being a big hurdle for actual Apple engineers.
As I see it, as long as Apple is putting out x86 hardware, there’s no reason why it can’t be AMD x86 hardware.
(I’m also secretly hoping the ARM thing won’t actually happen, but that’s neither here nor there, and I’m probably wrong.)
I also have major concerns about raw performance at the high end, and I suspect ARM would come with even more software lockdown, although there's no reason that has to be the case.
I subscribe to the theory that the Air will move to ARM at some point. Adding this feature to Xcode sounds like the sort of thing you would do to prepare the way for an architecture shift, especially if you were still on the fence about that shift. Let's just get a feel for how viable this space is before committing to anything.
After dropping Gen 7's PPC, both the PS4 and the XB1 were customised AMD APUs, and there's no great architectural rival.
> The Xbox had a customized Coppermine Pentium-III era processor from Intel.
The original Xbox was a PC in a box; the CPU was not a customised part.
And yes, the CPU was customized:
At the very least, half of the cache is disabled. They cherry-picked a feature from the Pentium III lineup that they wanted to keep while lowering the cache to Celeron levels. It's a deliberate modification to reduce cost while maintaining desired performance.
It's not detectably customized beyond that but it's not like it's a SKU you can buy off the shelf, either.
I'm not sure what they're waiting for exactly.
Secondly Apple might be waiting for their own chips to reach a point where they can be used in their laptops/desktops and jump on to that. It would be overkill to use ryzen as an interim.
I'm hoping someone eventually just does the needful and sticks a Thunderbolt controller on a PCIe 4.0 graphics card and makes it work somehow.
The numbers provided by AMD are supposedly benched before 1903 Windows scheduler updates (for CCX aware process threading, much faster clock ramping, etc) and without the latest Intel security mitigations, so it's possible that real world numbers might be even better: https://www.anandtech.com/show/14525/amd-zen-2-microarchitec...
Besides the massive L3 cache, Zen 2 now supports very fast RAM overclocking on par with Intel platforms (DDR4-3600 OOTB, air-cooled 4200+, and 5K+ on high-end motherboards - a huge improvement considering how finicky Zen, and even Zen+, was) and also a huge FPU bump (including single-cycle AVX2), but I think for full details we'll again be waiting either for July or later, for AMD's Hot Chips presentation.
Every workload will be different, but considering AMD's node, efficiency, and security advantages, I wouldn't take it for granted anymore that Intel will have a lead even for single-core perf (especially once thermals come into play).
Because the software doesn't do it (much; I've been told some applications do time-delayed mixing for stuff like delay) and the software is entrenched.
Then other CPUs would be free to start the next chunk of samples. The amount of parallelism is going to depend on the buffer size and number of samples each plugin needs to operate.
For example, if each plugin includes any kind of LUT, you don't have data locality either way, and you're much better off passing data between the plugins. If the plugins are complex, you'll be flushing your instruction cache, which will have to be refilled via random access as opposed to the linear reading of an audio segment.
Further, 192kHz 24-bit audio is only about 0.5 megabytes per second. Skylake lists sustained L3 bandwidth as 18 bytes/cycle. That is enough to transfer 100k such audio streams simultaneously. It's very unlikely this is a bottleneck.
Also instructions shouldn't be huge, but more importantly they don't change. If the audio buffer stays on the same CPU, it doesn't change either.
Don't forget that writing takes time too. Writing can be a big bottleneck. Keep the data local to the same CPU and it doesn't have to go out to main memory yet.
Other things you are saying about 'flushing' the instruction cache, L3 bandwidth numbers and theoretical LUT that make a difference in one scenario and not the other without measuring (even though the whole scenario is made up) just seem like stabs in the dark to argue about vague what-ifs.
OK, so we're left with a single core running a thousand plugins, and instruction cache pressure is a 'stab in the dark to argue about vague what-ifs'?
You take an absolutist view on what is so obviously a complicated trade off and talk down to me to boot. Maybe I know about high performance code, maybe I don't, maybe you do, maybe you don't. But I do know enough about talking to people on the internet to know to nip this conversation in the bud.
The latency is mostly about initial cache misses. There is no reason to take the time to write out a buffer of samples to memory, only to have another CPU access them with a cache miss. One of the many things you are missing here is prefetching. Instructions will be heavily prefetched, as will samples when accessed in any sort of linear fashion.
Also you can't explicitly use caches or send data between them; that is up to the CPU, and it will use the whole cache hierarchy.
> You take an absolutist view
Everything dealing with performance needs to be measured, but I have a good idea of how things work so I know what to prioritize and try first. Architecture is really the key to these things and in my replies I've illustrated why.
> Maybe I know about high performance code, maybe I don't
It sounds like you have read enough, but haven't necessarily gone through lots of optimizations and rectified what you know with the results of profiling. Understanding modern CPUs is good for understanding why results happen, but less so for estimating exactly what the results will be when going in blind.
> maybe you do, maybe you don't
I've got a decent handle on it at this point.
Your experience led to overconfidence and you identified a ridiculous bottleneck for the problem domain. This is complicated and FPU heavy code running on few pieces of tiny data. And yes, riddled with LUTs. The latency cost you're worried about is in the noise.
Instead of doing some back of the envelope calculations and realizing your mistake, you double down, handwave and smugly attack me.
Your conclusions are bullshit, as is your evaluation of my experience. For anyone else that happens to be reading, I suggest taking a look through the source of a few plugins and judging for yourself.
That being said the LUTs would follow the same pattern as execution - all threads would use them and if they are a part of the executable they don't change. This combined with prefetching and out of order instructions means that their latency is likely to be hidden by the cache.
New data coming through, however, would be transformed, creating more new data. While the instructions and LUTs aren't changing, the new data created by each transformation can either be kept local to the same core, avoiding those write-back penalties and cache misses, or incur them by allocating new memory, writing to it, and eventually getting it to another CPU.
If the same CPU is working on the same memory buffer there is no need to try to allocate them for every filter or manage lifetimes and ownership of various buffers.
1) It's very common for the processing of samples to not be independent, but have iterative state; for example delay effects, amplifiers, noise gates...
2) The work done per sample is substantial with nested loops, trig functions and hard to vectorize patterns
So not only does your technique break the model of the problem domain, the L3 latency you're so worried about when retrieving a block of samples is comparable to a single call to sin, which in some cases we're doing multiple times per sample.
Now you conflate passing data between threads with memory allocation, as though SPSC ring buffers aren't a trivial building block. This is after lecturing me on my many "misunderstandings"... if you're willing to assume I'm advocating malloc in the critical path (!?), no wonder you're finding so many.
I'm not upset, I'm just being blunt. Ditch the cockiness, or at least reserve it for when your arguments are bulletproof.
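For reference, the kind of SPSC ring buffer I mean is roughly this (a minimal, fixed-capacity sketch: nothing is allocated on the hot path and each side only writes its own index):

    #include <array>
    #include <atomic>
    #include <cstddef>
    #include <optional>

    // Minimal single-producer/single-consumer ring buffer: one thread
    // calls push(), one thread calls pop(); no locks, no allocation.
    template <typename T, std::size_t Capacity>
    class SpscRing {
    public:
        bool push(const T& item) {   // producer thread only
            std::size_t head = head_.load(std::memory_order_relaxed);
            std::size_t next = (head + 1) % Capacity;
            if (next == tail_.load(std::memory_order_acquire))
                return false;        // full; caller decides what to do
            buf_[head] = item;
            head_.store(next, std::memory_order_release);
            return true;
        }

        std::optional<T> pop() {     // consumer thread only
            std::size_t tail = tail_.load(std::memory_order_relaxed);
            if (tail == head_.load(std::memory_order_acquire))
                return std::nullopt; // empty
            T item = buf_[tail];
            tail_.store((tail + 1) % Capacity, std::memory_order_release);
            return item;
        }

    private:
        std::array<T, Capacity> buf_{};
        std::atomic<std::size_t> head_{0};
        std::atomic<std::size_t> tail_{0};
    };

In a plugin chain, T would just be a pointer to (or small handle for) a preallocated sample buffer, so handing a block to the next stage is an index bump, not a malloc.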
I'm not sure where this is coming from. If one cpu is generating new data and another CPU is picking it up, it's wasting locality. If lots of new data is generated it might get to other CPUs though shared cache or memory, but either way it isn't necessary.
Data accessed linearly is prefetched and latency is eventually hidden. This, combined with the fact that instructions aren't changing and are usually tiny in comparison, is why instruction locality is not the primary problem to solve.
The difference it makes is up to measurement, but trying to pin one filter per core is a simplistic and naive answer. It implies that concurrency depends on how many different transformations exist, when the reality is that the number of cores that can be utilized will come down to the number of groups of data that can be dealt with without dependencies.
> SPSC ring buffers
That's a form of memory allocation. When you fabricate something to argue against, that's called a straw man fallacy.
In any case, we're clearly not going to find common ground here.
The data rates for real-time audio are so much smaller than modern memory system capabilities that we can almost ignore them. A 192 kHz, 24-bit, 6-channel audio program is less than 3 MB/s, thousands of times slower than a modern workstation CPU and memory system can muster.
The stack of audio filters you describe are a natural fit for pipelined software architectures, and such architectures are trivially mapped to pipelined parallel processing models. Whatever buffer granularity one might make in a single-threaded, synchronous audio API to relay data through a sequence of filter functions can be distributed into an asynchronous pipeline, with workers on separate cores looping over a stream of input sample buffers. It just takes an SMP-style queue abstraction to handle the buffer relay between the workers, while each can invoke a typical synchronous function. Also, because these sorts of filters usually have a very consistent cost regardless of the input signal, they could be benchmarked on a given machine to plan an efficient allocation of pipeline stages to CPU cores (or to predict that the pipeline is too expensive for the given machine).
Finally, audio was a domain motivating DSPs and SIMD processing long before graphics. An awful lot of audio effects ought to be easily written for a high performance SIMD processing platform, just like custom shaders in a modern video game are mapped to GPUs by the graphics driver.
The biggest issue is that we're using plugins written by third parties to a few common standards. Even when the plugins themselves are not trying to make use of a multicore environment, you still get compatibility bugs and various taxes on re-encoding input and output streams to the desired bit depth and sample rate. It can really throw a wrench into optimizing at the DAW level because you can't just go in and fix the plugins to do the right thing.
Then add in the widely varying quality of the plugin developers, from "has hand-tuned efficient inner loops for different instruction set capabilities" to "left in denormal number processing, so the CPU dies when the signal gets quiet." Occasionally someone tries to do a GPU-based setup, only to be disappointed by memory latency becoming the bottleneck on overall latency (needless to say, latency is really prioritized over throughput in real-time audio).
Finally, the skillsets of the developers tend to be math-heavy in the first place: the product they're making is often something like a very accurate simulation of an analog oscillator or filter model, which takes tons of iterations per sample. Or something that is flinging around FFTs for an effect like autotune. They are giving the market what it wants, which is something that is slightly higher quality and probably dozens or hundreds of times more resource-hungry to process one channel.
If all you're doing is mixing and simple digital filters, you're in a great place: you can probably do hundreds of those. But we've managed to invent our way into new bottlenecks. And at the base of it, it's really that the tooling is wrong and we do need a DSP-centric environment like you suggest. (SOUL is a good candidate for going in this direction.)
For N stages, instead of having each filter run at 1/N duty cycle, waiting for their turn to run, they can all remain mostly active. As soon as they are done with one buffer, the next one from the previous pipeline stage is likely to be waiting for them. This can actually lower total latency and avoid dropouts because the next buffer can begin processing in the first stage as soon as the previous buffer has been released to the second stage.
Whatever you can calculate sequentially, like:
    buf0 = input.recv()
    buf1 = filter1(buf0)
    buf2 = filter2(buf1)
    buf3 = filter3(buf2)
can instead be run as a pipeline of workers. Each worker is dedicated to running a specific filter function, so its internal state remains local to that one worker. Only the intermediate sample buffers get relayed between the workers, usually via a low-latency asynchronous queue or similar data structure. If a particular filter function is a little slow, the next stage will simply block on its input receive step until the slow stage can perform the send.
(Edited to try to fix pseudo code block)
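To make the shape concrete, here's a rough C++ sketch of that pipeline (using an ordinary blocking queue purely for clarity; a real-time version would use preallocated buffers and lock-free SPSC queues instead):

    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    using Buffer = std::vector<float>;   // one block of samples

    // Simple blocking channel between two pipeline stages.
    class Channel {
    public:
        void send(Buffer b) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(b)); }
            cv_.notify_one();
        }
        Buffer recv() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [&] { return !q_.empty(); });
            Buffer b = std::move(q_.front());
            q_.pop();
            return b;
        }
    private:
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<Buffer> q_;
    };

    // One worker per filter stage: receive a buffer, process it in place,
    // pass it on. The filter's internal state never leaves this thread.
    std::thread stage(Channel& in, Channel& out,
                      std::function<void(Buffer&)> filter) {
        return std::thread([&in, &out, filter = std::move(filter)] {
            for (;;) {
                Buffer b = in.recv();
                if (b.empty()) { out.send({}); return; }  // empty = shutdown
                filter(b);
                out.send(std::move(b));
            }
        });
    }

    // Wiring it up (filter1..filter3 stand in for whatever the plugins do):
    //   Channel c0, c1, c2, c3;
    //   auto t1 = stage(c0, c1, filter1);
    //   auto t2 = stage(c1, c2, filter2);
    //   auto t3 = stage(c2, c3, filter3);
    //   // the audio input callback feeds c0; the output drains c3

Same data flow as the sequential version above, but stage N can be working on buffer k while stage N+1 is still finishing buffer k-1.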
A completely sequential process would have a full end-to-end pipeline delay between each audio frame. The first stage cannot start processing a frame until the last stage has finished processing the previous frame. In a real-time system, this turns into a severe throughput limit, as you start to have input/output overflow/underflow. The pipeline throughput is the reciprocal of the end-to-end frame delay.
But, concurrent execution of the pipeline on multiple CPU cores means that you can have many frames in flight at once. The total end-to-end delay is still the sum of the per-stage delays, but the inter-frame delay can be minimized. As soon as a stage has completed one frame, it can start work on the next in the sequence. In such a pipeline, the throughput is the reciprocal of the inter-frame delay for the slowest stage rather than of the total end-to-end delay. The real-time system can scale the number of pipeline stages with the number of CPU cores without encountering input/output overflow/underflow.
Because frame drops were mentioned early on in this discussion, I (and probably others who responded) assumed we were talking about this pipeline throughput issue. But, if your real-time application requires feedback of the results back into a live process, i.e. mixing the audio stream back into the listening environment for performers or audience, then I understand you also have a concern about end-to-end latency and not just buffer throughput.
One approach is to reduce the frame size, so that each frame processes more quickly at each stage. Practically speaking, each frame will be a little less efficient as there is more control-flow overhead to dispatch it. But, you can exploit the concurrent pipeline execution to absorb this added overhead. The smaller frames will get through the pipeline quickly, and the total pipeline throughput will still be high. Of course, there will be some practical limit to how small a frame gets before you no longer see an improvement.
Things like SIMD optimization are also a good way to increase the speed of an individual stage. Many signal-processing algorithms can use vectorized math for a frame of sequential samples, to increase the number of samples processed per cycle and to optimize the memory access patterns too. These modern cores keep increasing their SIMD widths and effective ops/cycle even when their regular clock rate isn't much higher. This is a lot of power left on the table if you do not write SIMD code.
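As a toy illustration of the per-frame SIMD point (assuming AVX, 32-byte-aligned buffers, and a frame length that's a multiple of 8; a real kernel would handle the tail and pick the ISA at runtime):

    #include <immintrin.h>
    #include <cstddef>

    // Apply a gain to one frame of samples, 8 floats per AVX iteration.
    void apply_gain_avx(float* samples, std::size_t n, float gain) {
        const __m256 g = _mm256_set1_ps(gain);
        for (std::size_t i = 0; i < n; i += 8) {
            __m256 v = _mm256_load_ps(samples + i);
            _mm256_store_ps(samples + i, _mm256_mul_ps(v, g));
        }
    }

A gain stage is the simplest possible case; effects with feedback (biquads, delays) can't be vectorized across time like this, but they can often be vectorized across channels or voices instead.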
And, as others have mentioned in the discussion, if your filters do not involve cross-channel effects, you can parallelize the pipelines for different channels. This also reduces the size of each frame and hence its processing cost, so the end-to-end delay drops while the throughput remains high with different channels being processed in truly parallel fashion.
Even a GPU-based solution could help. What is needed here is a software architecture where you run the entire pipeline on the GPU to take advantage of the very high speed RAM and cache zones within the GPU. You only transfer input from host to GPU and final results back from GPU to host. You will use only a very small subset of the GPU's processing units, compared to a graphics workload, but you can benefit from very fast buffers for managing filter state as well as the same kind of SIMD primitives to rip through a frame of samples. I realize that this would be difficult for a multi-vendor product with third-party plugins, etc.
If it is possible to process in small sample blocks (of length T), with roughly correspondingly small processing time (X), there shouldn't be a problem keeping the latency small with pipelining. If filters depend on future data (lookahead), it is plausible that reducing T might not be possible. Otherwise, it should be mostly a problem of weak software design and lots of legacy software and platforms.
This precludes parallel processing of individual packets, but does not prevent concurrent processing of packets.
Plugin A accepts a packet, processes it, outputs it. Plugin B accepts a packet from A, processes it, outputs it. Plugin C accepts a packet from B, processes it, outputs it. [...] Plugin G accepts a packet from F, processes it, outputs it.
Everything is serial so far. Got it. Here's the thing though: Plugin A processes packet n, Plugin B processes packet n-1, Plugin C processes packet n-2, [...] Plugin G processes packet n-6. Now you have 7 independent threads processing 7 independent data packets. As long as the queues between plugins are suitably small you won't introduce latency.
The mental model here should be familiar to anyone in the music industry; each pedal between the instrument and the amp is a plugin, each wire is a queue. Each pedal processes its data concurrently (but not parallel with) with every other pedal.
It's relatively common in game development for AI/physics to generate the data for frame n while graphics displays frame n-1 (there's a natural, fairly hard sequential barrier separating physics from graphics, and there's a hard sequential barrier when the frame is finally shipped off to the GPU), especially on consoles that have 8-core CPUs where each core is really slow. The PS4/Xbox One use AMD's Jaguar cores, the low-power line of that era. The single-core performance of these CPUs is absolutely atrocious, but the devs make it work for latency-sensitive activities like gaming.
> Data travelling from one core to another could mean additional performance loss.
Only if it is evicted from the L3 cache, and the 3950X has 64MB of it. That's over a second(!!) of audio at 16 channels × 192kHz × 32 bits/sample.
Speaking of channels, that seems like a natural opportunity for parallelism.
I get that legacy code is legacy code, and a framework designed to run optimally on Netburst isn't necessarily going to run optimally on Zen 2. (or any other CPU from the past decade) But this is an institutional problem, not a technical one. It sounds to me like somebody needs to bite the bullet and make some breaking changes to the framework.
The process is realtime, so you cannot receive events ahead of time. It is actually running how you describe, but you can only process so much during the length of a single buffer. The typical solution is to increase the length of the buffer, but that increases latency; or to reduce the length of the buffer, but that introduces overhead.
> Each pedal processes its data concurrently (but not parallel with) with every other pedal.
That's how it works.
> The single core performance of these CPUs are absolutely atrocious, but the devs make it work for latency sensitive activities like gaming.
I am talking about realistic simulations. You can definitely run simple models without latency, that's not a problem.
> Only if it is evicted from the L3 cache, and the 3950X has 64MB of it. That's over a second(!!) of latency at 16 channel+192kHz+32 bits/sample audio.
That's nothing. A typical chain can consist of dozens of plugins times dozens of channels.
There is no problem with a case as simple as running 16 channels with simple processing.
> Speaking of channels, that seems like a natural opportunity for parallelism.
That works pretty well. If you are able to run your single chain in realtime, you can typically run as many of them as you have available cores.
But, as another person mentioned, this benchmark wasn't run at the full boost clock for the 3950X, assuming this isn't a faked result entirely.
Please excuse my lack of experience with audio processing, but...
What you're describing about the output of one plugin being fed into the input of another is analogous to unix shell scripts piping data between processes. It actually does allow parallelization, because the first stage can be working on generating more data while the second stage is processing the data that was already generated, and the third stage is able to also be processing data that was previously generated by the second stage.
Beyond that, if you have multiple audio streams, it seems like each one would have their own instances of the plugins.
So, if you had 3 streams of audio, with 4 different plugins being applied to each stream, you would have at least 12 parallel threads of processing... assuming the software was written to take advantage of multiple cores.
If the software is literally just single threaded, there's nothing to be done but to either accept that limitation or find alternative software.
AMD claims that their benchmarks show that the 3900X is faster at Cinebench single threaded than the Intel 9900K. (https://images.anandtech.com/doci/14525/COMPUTEX_KEYNOTE_DRA...) The 3950X has a higher boost clock, so it should be even faster.
I really think you should wait until you see audio processing benchmarks before making dramatic claims like "It looks like I wouldn't be able to run my chain in realtime on this new AMD" based on a -3% difference in performance on a leaked benchmark of a processor that isn't even running at its full clock speed. How can you be so sure that a 3% difference would actually prevent you from running your "chain" in realtime? Based on the evidence available, the chip should do 9% better than the recorded result here (4.7GHz actual boost divided by 4.3GHz boost used in the benchmark), reversing the situation and making the Intel chip slower. Suddenly the Intel chip is inadequate?! No, I really don't think so. Even though Zen 2 seems like it will be better, I feel confident that even a slower chip like the 9900K would be perfectly fine for audio processing.
Conceptually yes, but technically, multimedia frameworks don’t have much in common with unix shell pipes.
Pipes don’t care about latency, their only goal is throughput. For realtime multimedia, latency matters a lot.
Processes with pipes have very simple data flow topology. In multimedia it’s normal to have wide branches, or even cycles in the data flow graph. E.g. you can connect delay effect to the output of a mixer, and connect output of the delay back into one of the inputs of the mixer.
Bytes in the pipes don’t have timestamps, multimedia buffers do, failing to maintain synchronization across the graph is unacceptable.
I’m not saying multimedia frameworks don’t use multiple cores, they do. But due to the above issues, multithreading is often more limited compared to multiple processes reading/writing pipes.
The main advantage is that you wouldn't be limited in the number of plugins you could run by the performance of a single core, since you could run each plugin on its own core, like you mentioned.
Obviously, having faster individual cores means that each plugin introduces less total latency, but the difference in single-threaded performance between Zen 2 and Intel's best is likely to be very small, and I fully expect Zen 2 to have the best single-threaded performance in certain applications.
Even though I do a lot of Docker and some rendering and Photoshop - most development tasks, Docker builds, and even most Photoshop tasks that aren't GPU-accelerated are bottlenecked on single-core performance.
Same goes for the overall zippiness of the OS. The most important thing for me is that whatever I am doing this moment is as fast as possible and single core performance still rules since most software still does not take advantage of multiple cores.
For the next home server though, I am definitely planning on a high core count AMD.
I would add, though, that all the new processors are getting so fast that the difference in single-core performance is probably not noticeable. Your main issue would be long-running single-core tasks, but long-running tasks are the ones most likely to be multithreaded anyway.
I totally agree with this. I can't stand having a resource limit on creativity when I'm making music. What's worse, is even if you get dedicated hardware (DSP chips, etc.) they are normally designed for specific software, and aren't (and likely can't be) a 'global accelerator' for all audio plugins, regardless of the developer.
My understanding is that typically the TDP is designed to fit to the base clock of the processor, and doesn't necessarily include the amount of power necessary to achieve the boost clocks.
Also, what about GPUs TDP?
Never investigated GPUs. One way to find out would be to trawl Anandtech reviews and collect TDP and measured power draw numbers, they always take measurements.
Binning? There's variation in yields, the better parts might get classified as 3950X, the lesser ones get 4 cores disabled and a 3900X branding.
This is different from Intel's TDP (Thermal Design Power), i.e. power usage when running at base frequency; in reality they run quite a lot higher.
Because Ryzen 3000 is manufactured on 7nm, it's extremely efficient, producing less waste heat for the same amount of work. Both the 3900X and 3950X are designed to produce no more than about 105 watts of heat. But of course, that doesn't say how much current they actually draw under full load. That specification is the key and is very hard to find.
When these chips are released, you will likely see reviews that measure the total system power, that is the power CPU draws plus PSU inefficiencies, VRM inefficiencies, motherboard component inefficiencies, on top of all the power ram, ssds, and everything else uses. So it will not be an accurate measurement, but it will give you an overall sense of how power hungry it really is.
AMD CPU designs have historically been very power hungry, and I expect the new ones to be no different. Looking at how their 7nm GPUs compare against RTX in power consumption leads me to believe the 3000 series will require quite a bit of juice.
In order to make optimum use of that many cores, I wish to see:
1. The most-used legacy software libraries incorporate concurrent/parallel algorithms for both CPU-bound and mixed (CPU + IO) loads.
2. Some inventive, compact and powerful heatsink designs implemented in laptop models.
In the past I thought maybe the rumored move to ARM could be the reason, but now with the new Mac Pro I doubt Apple will move to ARM except for some of its laptops.
If AMD can keep it up this time (or Intel keeps flopping) then it may very well happen down the road. Until then, the age old investor relations statement rings true: “past performance is no guarantee of future results.”
Note: I have a Ryzen 5 1600 in my gaming rig and a Ryzen 5 2600 in the wife’s, I love these chips - but I also see the reality of Apple’s ecosystem is all.
Just the same as choosing AMD would involve trade-offs in terms of a very slight loss of single threaded performance, or a higher idle power consumption, particularly in laptops.
In either case, you have good options. Neither product is completely devastatingly useless for any task, as was the case with Bulldozer, which had single threaded performance that was nearly half that of Intel's.
With the release of Zen, there was no longer a clear market leader dominating in performance of all classes, or pricing, or whatever other metric you want. That's called "competitive."
Zen 2 looks like it will be "uncontested." It will have the advantage in essentially everything, including single and multithreaded performance, gaming performance, power consumption, and price... if AMD's benchmarks are to be believed. The general sentiment is that AMD's benchmarks were actually conservative.
The benchmark leaked above in this thread is not running at the production boost clock, which would be 9% higher than the benchmark given, making it theoretically uncontested.
Obviously, we will have to wait for extensive third party benchmarking, but Zen has always been competitive, immediately and unequivocally reducing Intel to merely being competitive as well. Zen 2 has the opportunity be more.
Intel opened Thunderbolt up to non-Intel platforms a while ago, and we're already seeing motherboards that offer it for Ryzen.
I think I lost track of the thread though, because you're not necessarily the one who asked "why" about Apple.
Where AMD does compete is thread count. A higher number of slower cores did fill a few niches. Except... many software vendors charge per core (a Windows Server license is limited to 16 cores), so fewer, faster cores work out better value for most business users. Plus, power usage is a huge issue in data centres, again favouring Intel.
The biggest problem right now is virtual machines can't move (live migrate) from Intel to AMD hardware (and vice versa) without having to be restarted. So AMD is only really a viable option for new clusters, but I would think Intel is still nervous.
Zen 2 raises IPC by 15%, and raises clock speeds by a solid 10% or more. Single threaded Zen 2 performance is not even a slight concern for me.
Add 9% to the benchmark result this entire thread is about, because this engineering sample was not running at the specified boost frequency that the 3950X will have. Intel has nothing to compete against that... it should be uncontested.
On Epyc, their clock speeds were generally comparable to Intel's, and the single threaded performance was already great there, except for a few specialty processors that Intel released for servers that don't care about high core counts. Epyc 2 stands to completely annihilate any advantage Intel had left.
AMD Zen has always used less power than Intel for each unit of work done, which was one of the original surprises, so... power consumption is absolutely not favoring Intel.
I really feel like you're mentally comparing to the old Bulldozer Opteron processors, based on the concerns you listed.
Intel seems to be in a perfect storm, while AMD seems to have all their ducks lined up (architecture, Fabrication Process, clock speeds).
Still, exciting times! Intel has stagnated on quad-core enthusiast CPUs for a decade (Q6600 - 7700k), it's good to finally have some competition again.
Now that AMD is using TSMC and/or Global Foundries, not sure if still the case.
- Don't want to rely too much on a single manufacturer (they already use AMD GPUs). Always keep multiple supplies alive/well.
- Don't take away too much from Intel to not affect other components (they were in the game for LTE modems which Apple needed/needs)
- How good are integrated Intel vs AMD GPUs? Could play a role as well
Currently in fanless environments (Such as the iPad Pro) the latest CPU, A12X, outperforms Intel's fanless offerings by a good amount.
I would imagine that Apple could build similarly performing parts, if not better, using current A12 tech, and don't forget that Apple is already using TSMC's 7nm process. Additionally, Apple could make use of big.LITTLE in varying sizes to bring large power consumption advantages to Macs, along with their Neural Engine.
Or who knows, maybe they'll just wait for RISC-V to mature before making any sort of switch.
When I say leapfrog I am implying that I believe this list to be correctly ordered and that Apple will not use AMD chips but wait until they can use ARM.
Just idle speculation
AMD is clearly better now, but Apple just needs the CPUs not to suck, so Intel it is.