Hacker News: AMD Reveals Threadripper 2: Up to 32 Cores, 250W, X399 Refresh (anandtech.com)
340 points by gnufied 8 months ago | 212 comments

Just put together a first gen Threadripper build for work a few months ago, where out-of-the-box ECC support was too good to pass up for a box with >= 64GiB DRAM. If it's possible to do a drop-in upgrade (cooling solution notwithstanding) then this will be a big deal, as Intel has offered basically no backwards compatibility in this part of the market in over a decade, let alone such a compelling upgrade.

But 16 to 32 cores is a leap (250W TDP!!!!!), so I wonder what cooling solutions will remain viable (e.g., what about the TR4 edition Noctua cooler made just for TR?)

It's not actually that hard. AMD didn't cheap out on the integrated heat spreader (like Intel did) and it's HUGE. Commodity waterblocks on Newegg already handle "500+ watts". I'm sure there will be a wide variety of solutions for 250 watts by the time they ship. It's really not that much power; many cases these days are designed for 3 or more 300 watt GPUs.

Just for example, my first gen TR has this guy on it[0], which is already spec'd for 300W.

I would think power delivery on existing motherboards only designed for 180W would be a bigger problem. But again, any board targeted at overclockers should already be prepared to handle at least 250W. The older boards will mostly just not support as much overclocking of the new, larger part.

[0]: https://www.arctic.ac/us_en/liquid-freezer-360.html

I hate to be the bearer of bad news, but that cooler is smaller than the Threadripper socket.


There are "proper" TR4 coolers, such as the Noctua TR4-specific air cooler, which actually covers the entire socket for maximum contact between the cooler's base and the heat spreader.


Pictured there is the Enermax TR4-specific liquid-cooler. There is another one on the market somewhere, I forgot the name though.

Anyway, the bigger heat-plate makes a big difference. People have measured somewhere on the order of ~10C to ~20C cooler.

> I hate to be the bearer of bad news, but that cooler is smaller than the Threadripper socket.

Yes, I am aware of that (it's not news to me). The cooler is smaller than the integrated heat spreader (the visible metal cover on the top of the CPU), but it does fully cover all four (two active) CCX dies and conducts plenty of heat away.

I know you can get TR4-shaped coolers today, and they perform marginally better, but they were not available at the time. With all cores busy, this thing keeps the CPU at 50°C or less in ~25°C ambient (stock clocks/boost). That is totally sufficient — the CPU is rated to 68°C (and oddly, the Ryzen 1800X is rated to 95°C, so it's unclear why the TR isn't rated up to 95°C as well).

> Anyway, the bigger heat-plate makes a big difference. People have measured somewhere on the order of ~10C to ~20C cooler.

Yes, absolutely it makes a difference; something like ~10°C, at least under fairly common conditions, from my reading. 20°C probably indicates an incorrectly installed cooler.

> Yes, I am aware of that (it's not news to me). The cooler is smaller than the integrated heat spreader (the visible metal cover on the top of the CPU), but it does fully cover all four (two active) CCX dies and conducts plenty of heat away.


Technically covers, but you can see that the screw-holes of the cooler go right over the dies. So you've got some air-pockets over the most-important parts to cool.

> Yes, absolutely it makes a difference; something like ~10°C, at least under fairly common conditions, from my reading. 20°C probably indicates an incorrectly installed cooler.

I was looking at overclocked numbers for the ~20C figures, btw. 10C was closer to the "normal use" situation. Overclocking adds a lot of heat, and benefits from a cooler with a larger contact area.

> Technically covers, but you can see that the screw-holes of the cooler go right over the dies. So you've got some air-pockets over the most-important parts to cool.

Sorry, I don't see any screw holes overlapping dies in that image; can you point them out?

Also, won't screw holes generally have screws in them, which are both conductive and not air?


I added a few arrows to make it more obvious.

Here's the article these pics are from: https://www.gamersnexus.net/news-pc/3008-threadripper-cooler...

> Also, won't screw holes generally have screws in them, which are both conductive and not air?


Screws have heads, and those heads have divots and are simply not a perfectly-flat surface.

I mean, nothing is perfectly flat. That's why we use thermal compound / thermal paste to connect things together. (Thermal compound is worse than copper at thermal exchange, but it's way, way WAY better than air-gaps.) So you use thermal-paste to fill in any air-gaps.

If you are going to use a "regular size" cooler, you probably should put thermal paste in all of those screw-heads, for a slightly better connection.
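To put rough numbers on the air-gap point (typical textbook conductivity values, not measurements of any particular product):

```python
# Approximate thermal conductivities in W/(m*K). The paste value
# varies a lot by product; 8 is a mid-range assumption.
copper = 400.0
thermal_paste = 8.0
still_air = 0.026

# Paste is far worse than copper, but orders of magnitude better
# than the air it displaces.
print(round(copper / thermal_paste))     # copper vs paste: ~50x
print(round(thermal_paste / still_air))  # paste vs air: ~300x
```

So even mediocre paste in a screw-head divot beats leaving an air pocket by a couple orders of magnitude.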

With a good cooler you might be limited by the current handling of the socket pins. I'm interested in what solutions will be found to circumvent that issue.

I think the solution AMD found to circumvent that issue was using 4094 pins.

I looked, and found the limit to be a little over 300W, based on research into LGA and AMD's publications pertaining to TR4.

Cooling Threadripper has been my issue with adopting it as well. The only pre-assembled cooler I've seen is an Enermax AIO block. Otherwise, you have to loop it on your own.

Do you mean for liquid-cooled solutions only? We've been using this Noctua U14S TR4-SP3 (https://noctua.at/en/nh-u14s-tr4-sp3) on a 1950X (no overclock though) without issues.

Oh man, I love that fan. But I misread the height specs on it and I couldn't fit everything into my case any longer, but my old fan had died. I ordered a replacement on Amazon and they sent out the wrong item twice, so for about 3 weeks I just ran it open and yelled at the cats if they came in my office. Amazon finally gave up on trying to send the right case, and just told me to buy it somewhere else, and eventually gave me a refund. I wanted to not take any chances, and the cats were getting terrified of me, so I ordered a Thermaltake Core X9. The thing is insane. You could reasonably fit two motherboards and 8 hard drives in it. Don't try to fit it on a desk. It could almost be a desk. But it is all worth it because I love that fan so much.

I did not know this was a thing. Typical Noctua coming to the rescue!

How do the thermals compare with an AIO cooler? I've heard that their coolers come close to or exceed AIO cooling (assuming a standard Asetek Gen 5) on the usual square sized heat spreaders. Does Threadripper's more rectangular IHS impact temps at all with this cooler?

EDIT: Just saw the bit about not having an overclock. Has anyone tried overclocking on this Noctua cooler? What were the results when compared to the AIO solutions?

Take a look at this: https://i.imgur.com/jT5iXWP.jpg (new cooler in the box of TR2)

What is the Wraith Ripper? https://imgur.com/gallery/08xsWbu

Noctua's U14-TR4/SP3 should work fine, easy peasy. And there are a ton of single piece AIO devices. No need for a custom loop.

Starting with Threadripper 2, AMD will (optionally) be shipping a capable air cooler derived from their current lineup of air coolers (Wraith coolers). Faithful to its namesake, it will be called "Wraith Ripper."

Power delivery by a motherboard might be a problem, too.

Definitely. On the other hand, many early TR boards were "gamer" boards and advertised overclocking ability. So they may have some power overhead to play with. Less overclocking ability with the old boards and the new 250W parts, of course.

AMD has reported that in actuality this chip should rarely hit 250W, essentially only if overclocked. As with the Ryzen 2 lineup, overclocking can actually hurt performance.

But you are correct that some of the more budget-oriented (and I use that term loosely) TR4 boards will struggle with power delivery. Worst case, you will shorten the lifespan of your motherboard and increase the risk of capacitor failure (eg: shorting, popping, etc).

I'm somewhat curious about what kind of cooling system Intel was using the other day in their 28-core 5 GHz demo/stunt. Some people isolated a frame of the streamed video where it shows the (insulated?) pipes that exit from the case and go behind the desk.

Edit: grammar

Water cooling with a chiller - water was close to freezing temperature.

The most popular way is to use an aquarium chiller (the kind sold for cold-water fish tanks)

Edit: someone snapped a photo of it: https://i.imgur.com/dRINhkW.jpg and people identified it as http://www.hailea.com/e-hailea/product1/HC-1000A.htm

Anandtech also has an article on that.

It's fine for showing off how far the chip can go, but nothing someone would have under the desk.


In addition to the overwhelmingly slap-dash appearance, its power draw was reportedly through the roof (north of 750W). With that chip, a fully-loaded workstation's power draw would exceed the wattage of most North American wall outlets.

Suffice it to say, Intel's 5GHz 28-core chip is nowhere near production-ready and their demo is more than likely a heavily overclocked top-binned Xeon chip. In other words, Intel's new offerings are looking like less-than-stellar vaporware.

Threadripper 1 still beats the pants off of Intel's HEDT offerings in terms of price (Intel's costs quickly exceed the $5K mark); Threadripper 2 twists the knife so much that it's a gaping wound in Intel's side. The Epyc 2 lineup will similarly affect Intel's enterprise/data-center offerings.

I was pretty sure it was mostly for show and not a feasible product but nonetheless I was guessing around 500W. Without counting the chiller running under the desk.

People have been saying some form of subambient cooling, phase change or similar.

250W should be no issue per se; even 500W for 32 cores is not a lot at any rate, and it would not be hard for a 12-phase VRM to keep stable. It may require an extra fan.

Don't forget the LGA pins.

LGAxxxx is an Intel socket type, not AMD. If you look at the pinout of LGA sockets, most pins are power-related.

Extrapolating from the Ryzen 1800X and a 16-core first-gen Threadripper, you'd overload the LGA pins at about 50% (300W) of the ~600W you could potentially make the chip take without sub-ambient cooling (and expect it to live for more than half a year at that level), i.e. with a forced-flow nucleate boiling cooler or a powerful waterblock. I once did some calculations for this, trying to find out if it could be worthwhile to wait for the 64-core Epyc and clock that one to like 3.5~3.8GHz, possibly with a governor that throttles the clock (and voltage) if the system is under low load. But judging from some numbers I could find/calculate for how much less power they take, the predictions were so pessimistic that I did not pursue it further. (I did dream about a high-performance machine that could live in a case you could carry as a backpack, even though you'd probably not want to carry it very far due to the coolant being heavier than air.)
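A back-of-envelope version of that pin-budget estimate; every constant here is an assumption for illustration, since the power-pin allocation and per-contact current ratings are not published figures:

```python
# Hypothetical pin-current budget for a 4094-contact LGA socket.
# All constants are guesses for illustration, not AMD specs.
SOCKET_CONTACTS = 4094
POWER_PIN_FRACTION = 0.25   # assume ~1/4 of contacts carry Vcore
VCORE = 1.2                 # volts under load (typical)
AMPS_PER_CONTACT = 0.5      # assumed safe continuous rating

power_pins = SOCKET_CONTACTS * POWER_PIN_FRACTION
ceiling_watts = power_pins * AMPS_PER_CONTACT * VCORE
print(round(ceiling_watts))  # ~614 W theoretical ceiling
```

With those assumptions the theoretical ceiling lands near the ~600W figure above; derating for contact resistance and longevity would put a sustained limit well below that.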

Threadripper's TR4 socket is an LGA socket with 4094 contacts.

What does this mean compared to say a 2011 i7?

I'm frankly amazed my first post-college desktop is soldiering on as a high-end gaming system with gpu, ssd and dram updates

Generally speaking, gaming is single-thread intensive on the CPU side, and all parallel tasks are offloaded to the GPU.

This is really more of a workstation or server cpu than a gaming cpu.

Edit: Tons of comments are pointing out exceptions. I was hoping to cover that with "generally speaking".

As a game dev, I agree with you. Also: all the casual observers that say "just multithread/core it! would make it so much faster!" and don't understand one bit about parallelisation problems, as if it's just a bit you flip.

To be fair to the criticism, game engines are way behind the state of the art in concurrency and parallelism. It's not as easy as flipping a switch, but it shouldn't be as dire as it is either.

Actually, I think game engines define the state of the art in concurrency. GPUs have hundreds of threads, and the SIMD paradigm is more powerful than the more flexible MIMD one used by most multithreaded server frameworks.

I think the big problem is that games are hyper optimized for single threaded performance. ECS systems are basically big arrays of mutable data manipulated by multiple systems. To do threading well, you'd pretty much have to dump that.

Not to say they aren't behind. I once saw a game talk about how awesome they are because they reinvented a thread pool. But their state of the art is in direct conflict with good threading practices.

Unfortunately, we haven't seen the paradigm shift in game engine development that would be necessary to really get away from hyper optimized single threaded performance. Until someone figures out a way to efficiently utilize (highly) parallel execution in a deterministic game engine that has to run at at least 60fps, there isn't an incentive to move beyond the tried and true ways of doing things. At least, I don't think we have, I'm not in AAA and only keep a tangential pulse on that level of gamedev.

That said, I'd be really interested to see how some of these Battle Royale engines tick (no pun intended). It seems like there's a ton of potential for "clustering" execution, essentially treating separate sections of maps as separate games, but considering it could come down to 100 people all being within the same 100 square meters, you can't necessarily rely on that model.

> Until someone figures out a way to efficiently utilize (highly) parallel execution in a deterministic game engine that has to run at at least 60fps, there isn't an incentive to move beyond the tried and true ways of doing things.

I was doing that 10 years ago... in a radar signal processor. I could do the same thing in a game engine, but the games industry seems to be allergic to devs who do not come up in the games industry. I suspect that has as much to do with the poor state of parallelism as anything.

They're optimized that way for a reason.

It'll be a number of years yet for large-scale multithreading to penetrate the consumer market.

92% of users still have 2 or 4 physical cores, so 2-8 threads.


There is a big difference between 2core/2thread and 4core/8thread. It would be interesting to see more granular data.

Is the difference of physical cores vs modern SMT that big for gaming?

I'd assume given asset size they're already dealing with a large amount of cache misses, and if you're stalling for a memory read anyway then it seems less impactful than on a high-throughput tuned scientific workload.

Game studios are already taking advantage of all the cores they can get.

These go all the way up to 6 hardware cores, depending on availability.


AAA studios, within the last couple years, yeah. But single-threaded design was certainly the general trend until quite recently, and lower budget games may still be very single threaded on the CPU.

Also, many games that do utilize multiple CPU cores do not scale well beyond about 4-6 cores (which makes a lot of sense given the CPUs most gamers have and the priorities of the studios).

Even most indie titles use 2-4 cores now. Buying a threadripper is overkill, but a hexcore won't be wasted on games. Some rare AAA games will occasionally even use 8 cores.

An easy way to see this is to underclock your CPU as far as it will go, and see how many percent CPU your favorite game will use.

Fun fact: A friend has a first-gen Ryzen with 8 cores (16 hyperthreads), and wondered why Civilization V was always crashing on his system. Turns out that Civ V runs into weird race conditions with that many threads. When he pinned the game to 8 or less hyperthreads, it started working properly.

Writing code on top of legacy single threaded frameworks / content interfaces, under an insane holiday release deadline, while trying to enable multithreading... is pretty much my definition of hell.

I feel bad for the poor programmers who got that job.

Doubtful. Nobody's reporting issues with 5960Xs, and if it were really a race condition then you would occasionally get crashes even in 4- and 6-core processors.

Civ5 is one of the games that can trigger Ryzen segfaults, guessing he has a faulty chip. He should run kill-ryzen and see if his chip is stable.

Note that contrary to popular belief, the segfault bug does not "only" affect Linux or compiling workloads (it's a litho fault - these workloads just happen to trigger it) and there is no "safe" date range of first-gen Ryzen chips. Newer chips are more likely to work properly, but a smaller number are still faulty - even some RMA chips, since AMD stopped testing them after a few months.

I went through the AMD support thread and pulled out some examples: https://news.ycombinator.com/item?id=17074355

Looking through the latest posts in the AMD community thread, at least one user is reporting this even on a second-hand processor, although that's not really enough to make a definitive statement.

"Gaming = single threaded" really hasn't been true for a while now, at least in the arena of AAA games. My gaming desktop has a Core i7-4790k (4 core/8 thread), and I can remember several games offhand that were loading all 8 HT cores at 75+%.

No, most games are single threaded, even AAA ones. You might get this impression from a vocal minority though, "vocal" in the sense that they're the ones landing on your radar, while the technical details of other games do not.

(You could argue for the PS3 of course, but there it's not a choice.)

I don't know enough about games, but if that's true, do the PlayStation 4 and the Xbox One have 6 to 7 cores just idling most of the time? Why did Sony and Microsoft put 8-core CPUs in their consoles instead of fewer cores with higher clock speed, if that's the case?

They use extra cores for live video recording and fun like that; also, console games usually run in a VM on the console these days, so it's giving the OS some spare space too (contact lists/chat, achievements, etc.). It also gives games the possibility to run on more cores if you're AAA and want to leverage them.

if you are disagreeing with my broad-ish statements about two very different platforms please explain how I am wrong instead of just downvoting.

You are wrong because games released in the past few years are heavily multithreaded. The only game recently that I've seen not be multithreaded is Battletech. The latest Total War series are heavily multithreaded. As far as the cores go on consoles, game developers try to use all of them all the time, and in general there's only one core that is half dedicated to doing OS / interaction / video recording tasks.

I checked a few sources and you are right. Turns out I was in a bubble there.

This is not true. If nothing else a lot of the work done by a game engine pipeline is actually handled by the OS or device driver (GPU, audio, and network) and scheduled in a different thread. So even a “single threaded” game would benefit from 2-4 cores.

confirming that there's definitely a trend towards separate threads for audio, but the rest, especially networking, depends strongly on the engine

No... it’s made concurrent by the OS anyway. If you tell OpenGL or DirectX or Vulkan or whatever to render a vertex buffer, what actually happens is it is added to a queue and returns immediately. An OS-provided thread asynchronously processes that queue and actually spends a fair amount of resources, on a different hardware core, scheduling the hardware to do what you asked. Same for audio and network. This is regardless of what concurrency the game is written to exploit.
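The enqueue-and-return-immediately pattern described above can be sketched with plain Python threads (names here are illustrative; this is not any real graphics API):

```python
import queue
import threading

# The "driver's" command queue and a record of what it processed.
commands = queue.Queue()
submitted = []

def driver_thread():
    """Stands in for the OS/driver submission thread: drains the
    queue asynchronously, off the caller's core."""
    while True:
        cmd = commands.get()
        if cmd is None:        # sentinel: shut down
            break
        submitted.append(cmd)  # stand-in for programming the hardware

worker = threading.Thread(target=driver_thread)
worker.start()

def draw(vertex_buffer):
    """The 'API call': enqueue a command and return immediately."""
    commands.put(("draw", vertex_buffer))

draw("vb0")
draw("vb1")
commands.put(None)
worker.join()
print(submitted)  # both commands were processed on the worker thread
```

So even a single-threaded game loop calling `draw()` keeps a second core busy with submission work, which is the point being made above.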

Great, the 2011 i7 has more than one core.

it's still true, the sweet spot just moved to 2 cores instead of 1

This is changing (very slowly), more and more games are scaling to higher thread counts (and having more capabilities as a result). Ashes of the Singularity is a good example of a recent game that actually benefits from higher end high core count CPUs.

I imagine many genres will never be heavy CPU users though, like low player count competitive FPS and MOBA.

I've got an i7 from a similar time period (4700 maybe? something like that). It's been in a closet for a couple years cause I haven't had the time for it, but I'm considering upgrading to some more current components. Sorry, this isn't really the place to ask, but what sort of GPU and ram are you running?

The problem with GPUs right now is that there is great uncertainty about when the new Nvidia GPUs are coming out... the CEO of Nvidia said "not for a long time," but that doesn't have to mean anything.

GPU and RAM prices are sadly also very high at the moment... probably the worst time to build a new high-end PC.

yeah for sure. I mean GPU/RAM prices have been trending up for 18 months, and don't show any sign of slacking, so it is what it is. I think I'll just have to bite the bullet and upgrade from the geforce 5xx or whatever it has right now. It sounds like Radeons are the more OSS-friendly option right now anyway? (I don't game on Linux, but "vote with your wallet" etc)

AMD is the clear winner on Linux simply because the official drivers are in the mainline kernel. No more dealing with NVIDIA binary blobs, and DKMS + the breakage caused when NVIDIAs shim is out of step with the particular kernel you want to run. NVIDIA wins perf/price though, and availability depending on where you are. :(

I love the idea of open source drivers in the mainline kernel. But from what I can tell, linux + nvidia blob is still the more featureful, stable, and performant 3D stack for Linux. I can stay logged in for weeks, watch Blu-ray (with low CPU utilization), games, tinker with WebGL, and open a zillion terminals/tabs.

AMD is getting better, but doesn't seem quite there yet. Steam and Phoronix still seem to generally agree nvidia is still the #1 choice on Linux. #2 (stability wise) seems to be Intel's GPU. AMD's working on it, but decent performance often requires a custom-built stack not supported on any current Linux distro. I'd rather not have to recompile my own kernel, mesa, and related just to get reasonably stable and decent performance.

> I love the idea of open source drivers in the mainline kernel. But from what I can tell linux + nvidia blob is still the more featureful, stable, and performant 3d stack for linux

In my personal experience, I disagree.

> nvidia blob is still the more featureful

yet which still cannot run Wayland/Xwayland+XGL. And you cannot do anything about it.

I game on Linux with Nvidia (sadly, instead of training neural nets like I thought I'd do with the GPU).

There was a cryptocurrency bubble at the end of last year, and that was the primary driver of GPU prices. New GPUs are almost going for MSRP now.

I'm also at something similar; 2nd gen i7 (2600K) from 2012. I upgraded to Nvidia 1060, and have the original 16GB of DDR3. It's both amazing and disappointing how well the system is still managing.

I have the same hardware, and I'm sitting here just looking for excuses to upgrade (of all reasons, for a new heatsink I was given) and I just can't bring myself to place the order because I just don't need it.

As you say, these i7s are still handling the load quite nicely.

i5-2500K at 4.6GHz on big air that I bought when it came out was running great with a GTX 1080. Upgrading to a Ryzen 7 1800X @ 4GHz was a marginal perf gain (and ~0 FPS gain in most games). The M.2 SSD upgrade from a SATA SSD seems like it helped the most.

Still running a 2500K here too, paired with a GTX 1070. Absolutely no qualitative issues with any games I play. Interestingly though I do notice its age when programming - builds take noticeably longer when compared to my work laptop (XPS 13 w/ i7-7700). Could be due to the higher core count I guess (not sure what level of parallelism my builds exploit beyond the parallelism offered by Make).
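For what it's worth, Make's parallelism is opt-in; a quick way to see the thread-count gap between the two machines (assuming GNU make and coreutils):

```shell
# `nproc` reports the hardware threads the OS sees: 4 on an
# i5-2500K (no hyperthreading), 8 on an i7-7700 (4C/8T).
JOBS="$(nproc)"
echo "would build with ${JOBS} parallel jobs"
# The actual invocation would be:
# make -j"${JOBS}"
```

Without `-j`, Make runs one job at a time, so the laptop's extra threads would only help to the degree the build system asks for them.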

Indeed, I'm still running an i5-2500K as my primary gaming machine (with a GTX 970) and upgrading is hard to justify. It doesn't even have USB3!

Slap a USB 3.0 card in the thing and keep running it. The state of the art hasn't advanced all that much since Sandy Bridge.

I've got a i5-4670k and I was all set to upgrade my processor this year, but then I looked at the numbers and realized it would barely help what I use it for (coding and gaming). Made me very sad, as I just couldn't justify the upgrade. So now I've got it running a 4.5 GHz overclock to see if it'll give me a reason, and to make up the 10% performance in the meantime.

TDP is way too high for a gaming system. Unless you like sitting in a hot room and needing a gigantic power supply?

No game out right now is going to leverage all those cores, while gaming it's not going to be pushing 250W.

As far as heat, the Vega 64 GPU I've got draws way more power than that too. I don't think this product is really targeted towards gamers anyway, but I don't think they'd care much about the power draw.

I realise your general point is correct, but Football Manager 2018 is pretty mainstream and will eat as many cores as you throw at it.

You assume there’s that much computation to perform. If they’re able to achieve anywhere close to 100%, there’s a pretty extreme bug in the game.

I'm not sure how you see this as a bug - it runs a huge number of agent-based simulations to calculate match results, there are many matches to be played, and yes, it will use all your CPU power to do that. It doesn't do this 100% of the time, but the vast majority of computation in the game happens in a way that is very parallelisable.

If a football manager game maxes out a processor of this caliber, it's doing calculations for fun, not for function.

I don't see what your objection is to a game that is fundamentally a simulation, trying to simulate as much as possible as accurately as possible. You can sit down and watch every single one of the hundreds of games it simulates each day of game time for 90 minutes in real time, and the gameplay bears up pretty well to reality. You can also turn all this off and just run a single league, but I certainly appreciate the effort they've put into parallelising it. Obviously if you think games are fundamentally frivolous, then you'll think any amount of computational power sunk into them is also frivolous, but it seems an odd hill to die on.

Or it's mining.

Gaming is irrelevant for this kind of CPU. Gaming is not the driving force for CPU in general. It is a niche market.

I first read this as "a soldered-on GPU" :O

I was just telling myself today that I wasn't going to build a new desktop for a while. Now I'm thinking current desktop might become the new VM server and I get a new gaming/dev box!

I would not count on this being great for gaming. Already the current threadripper is slower in games than a regular Ryzen processor or Intel i5/i7, slower base and turbo clock and added memory latency won't help with that.

It would still be great - just not as good as some alternatives.

But if you're doing development, spinning up 32 cores is a very realistic proposition (compilers are good at parallelizing work, especially on big projects; spinning up a dev cluster locally; etc.), so you're getting a top productivity machine that can game. If you can afford it and are not interested in chasing gaming benchmarks, then I don't see how it wouldn't be great.

Also virtual machines!

It doesn't have to be great. It just has to be good enough.

Is it actually slower or is it just the good old Intel compiler at work again? Binaries compiled with it have great performance on Intel chips, but perform a vendor id check and slow down on non Intel chips. As long as Intel CPUs remain dominant game developers will continue to use it.

It really is slower. In games it is slower than AMD Ryzen processors, no compiler manipulation here. It is also to be expected with a lower clock and higher memory latency.

I got a 1st gen TR w/ 16 cores, coming from a 2013 4-core Haswell Xeon, and I don't regret it at all.

Kinda excited and can't wait for more details. Key detail from anandtech article:

"Threadripper this is still the case: the two now ‘active’ parts of the chip do not have direct memory access. This technically adds latency to the platform, however AMD is of the impression that for all but the most memory bound tasks, this should not be an issue"

Sounds pretty much exactly like an Epyc, but with half the memory channels disabled.

Seems kinda silly unless it's pretty cheap. After all the 7351P is $750, guarantees ECC, and has twice the memory bandwidth of the threadripper.

7351P is the 16-core Epyc part. The comparable Epyc part is the 32-core 7551P, at $2317 (Newegg).

There is definitely differentiation between the server line of chips and the TR "HEDT" chips, other than the crippled memory channels.

The TR4 socket sacrifices half of the SP3 memory channels to use for more power.

As a result, TR models come with (much) higher stock clocks: 3.0 GHz base -> 3.4 GHz all-core turbo vs Epyc 7551P 2.0 GHz base -> 2.55 GHz all core.

So all in all, TR may be better for your workflow than Epyc even at the same core count and price (if clock matters more to you than memory bandwidth). And keep in mind, server boards aren't free.

Not sure I get the value. Unless you like motherboards with LEDs hot glued everywhere.

To compare:

Epyc 7351P, 2.9GHz, 16 core, 32 threads, 8 memory channels = $799

TR 1920X, 3.5GHz, 12 core, 24 threads, 4 memory channels = $761

TR 1950X, 3.4GHz, 16 core, 32 threads, 4 memory channels = $959

Sure the 1950X is 17% faster clocked, but with half the mem bandwidth and half the mem channels (half the performance for random lookups).

So generally the 1950X will be 17% faster with no cache misses, and 50% as fast with all cache misses.
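That trade-off can be sketched as a crude linear model (numbers from this comparison, not benchmarks):

```python
# Relative 1950X speed vs the Epyc 7351P as a function of how
# memory-bound the workload is. A crude linear blend between the
# clock-bound and bandwidth-bound extremes claimed above.
TR_CLOCK, EPYC_CLOCK = 3.4, 2.9       # GHz (base clocks quoted)
TR_CHANNELS, EPYC_CHANNELS = 4, 8     # memory channels

def relative_speed(memory_bound_fraction):
    clock_ratio = TR_CLOCK / EPYC_CLOCK            # ~1.17
    bandwidth_ratio = TR_CHANNELS / EPYC_CHANNELS  # 0.5
    f = memory_bound_fraction
    return (1 - f) * clock_ratio + f * bandwidth_ratio

print(round(relative_speed(0.0), 2))  # all cache hits: ~1.17
print(round(relative_speed(1.0), 2))  # fully memory-bound: 0.5
```

Real workloads land somewhere in between, which is why the right pick depends entirely on the cache behavior of what you run.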

Motherboard prices are pretty close, the Epyc, Dual GigE, 3x PCI-e x16, etc = $369. I found some TR boards cheaper, but not with ECC or 3x PCI-e x16.

Sure the future thread ripper 2 will be faster... but so will the Epyc 2. Not sure I see the point in the TR.

The 17% faster clock is the point. No one wants their computer to be 17% slower because sometimes they also want a lot of threads. In server terms it’s sort of like latency vs throughput.

> Not sure I get the value. Unless you like motherboards with LEDs hot glued everywhere.

Clock and single core performance absolutely matters to some customers. This is part of why Intel can continue to dominate the home and server market despite selling lower core count chips — they are clocked a little higher and have a little higher IPC.

(Also, I totally agree about the silly LEDs, but I also think board manufacturers wouldn't do it unless it produced a return, i.e., added value on average.)

> To compare: Epyc 7351P, 2.9GHz, 16 core, 32 threads, 8 memory channels = $799

> TR 1950X, 3.4GHz, 16 core, 32 threads, 4 memory channel = $959

7351P @2.9 GHz is the boost clock — base is 2.4.

1950x @3.4 is the base clock — boost is 4.0, on up to 4 cores (not all-core like the 7351P part).

> So generally the 1950X will be 17% faster with no cache misses, and 50% as fast with all cache misses.

Your 17% number is sort of wrong, or missing the point — yes, that's the clock difference for all-core workloads, but TR boost kicks in at 1-4 thread workloads.

4.0/2.9 is huge. That's 38% additional CPU on (very common) 1-4 thread workflows. That's worth a premium to some people.
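Spelled out (clock figures from the post above):

```python
# Single-core clock advantage of the 1950X boost (4.0 GHz, on 1-4
# cores) over the 7351P boost (2.9 GHz all-core).
tr_boost, epyc_boost = 4.0, 2.9
advantage_pct = (tr_boost / epyc_boost - 1) * 100
print(round(advantage_pct))  # ~38 percent
```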

The obvious question you might then ask me is, if your workflow is only 1-4 threads, why buy a 16 core CPU? Well, sometimes people have workflows that vary over time. Maybe some of the time you only need a few cores to manage interrupt traffic and keep the GPUs fed, and maybe other times you do want all of the cores to complete a parallelizable, CPU-bound task like compilation or h.265 compression faster.

If your workload is exclusively embarrassingly parallel and you can keep all cores busy all the time, yes, a server platform like Epyc 7351P is a much better value for you than TR. Or if your workflow needs more than 80 GiB/s (TR1) or 95 GiB/s (TR2) of memory throughput, a server platform like Epyc is probably a better value for you.

> Sure the future Threadripper 2 will be faster... but so will the Epyc 2. Not sure I see the point in the TR.

Epyc 2 will almost certainly not be clocked higher than TR2, due to the additional power draw the TR socket has vs SP3 — though, the gap may shrink.

You make a good case; agreed, 4.0/2.9 is a noticeable difference and I can see how a 1-4 core boost (on one CCX) would be useful.

I keep my TR (1950X) running at 4.0 GHz 24/7... runs like a champ.

Neat. Just curious, if you don't mind — is it overvolted, and if so, by how much? And what's your cooling solution and how well does it do (offset from ambient)?

AMD is going to be pushing Epyc to 64 core with the new 7nm node early next year, they just pushed the changes to the TR lineup first.

Nice I guess, but I wish we could get back to working on getting FASTER cores rather than more of them. Some tasks just don't multi-thread.

The laws of physics are intruding. The days of doubling clock rates without prohibitive cost, power, and heat are gone. At 3 GHz, signals don't travel very far in 1/3rd of a ns. Chips are getting physically bigger (look at Epyc). Easy increases in IPC are gone. There's an embarrassing number of transistors available, but few uses for them actually increase performance on things consumers care about.

Even GPUs are starting to peter out with lengthening product cycles, smaller improvements, and the increasing tendency to just rename the previous generation with a new name, as if it was actually significantly improved.

So CPUs have peaked; most of the low-hanging fruit has already been taken. This is most easily seen in how pretty much all the popular CPUs of the previous generation maxed out at 95 watts or so (desktop or server), but are at 180 watts in the current generation. This happened years ago for desktops, but is now hitting laptops (last year) and smartphones (this year or so).

GPUs are about half way there, improvements are harder. No more annual updates with 2x the performance on most common use cases. Feature improvements are getting more obscure, gaming sites are using 16x zoom to show the improvements. GTX1170 is rumored to be 50% faster than the GTX1070... from 2 years ago.

Specific use ASICs for mining, AI, and similarly narrow use cases are still early in their development. Multiple vendors are showing 2x improvements with each generation, even crazy things like including said chips in consumer devices with tiny power budgets.

So basically, a core twice as fast isn't happening, and it's not because anyone is lazy. There's tons of room for improvement in specific use cases where today's silicon is crazily inefficient. Things like machine learning, software-defined radios, vision-specific processing, and the like are seeing rapid improvements.

I think it is not happening because it doesn't have to. Current CPUs are more than fine for home users, and the HEDT market is too small to justify huge research spending. I think once Electron apps become more popular, home users will get power-hungry again, and that will kick off another CPU revolution.

It's not happening because it's hard to do. Extracting instruction-level parallelism is difficult (which is how CPUs gain IPC), and the cores also have to get wider, with exponential complexity. I.e., at the point where you have to sacrifice 50% of a CPU's power efficiency for a 5% IPC gain, it's no longer worth it.

(Very good points, small doubt to pick)

> Chips are getting physically bigger

They are? Over the years, I got the notion that the cheap controller chips were about 30 mm2, the regular desktop CPUs about 100 mm2, and server class powerhouses around 300 mm2.

With a quick look, I found mention of Epyc being four 200 mm2 dies packed together: https://www.eetimes.com/author.asp?section_id=36&doc_id=1331...

Wafers have gotten bigger, https://en.wikipedia.org/wiki/Wafer_(electronics), but I suppose that manufacturers just want more dies per wafer, not larger dies.

The Epyc chip (the package including the 4 ASICs) is pretty huge. It's a decent fraction of a mini-ITX motherboard just for the socket. The good news is that while it takes twice the power (170-190 watts or so, 250 in the next generation), it's much easier to dissipate that much from such a large chip. The heat density is as low as or lower than previous generations'.

AMD was pretty crafty with this generation. Intel has a huge R&D budget, has its own fabs, and pushes many different ASICs split into a dizzying array of products. Intel can afford this because they have a MUCH higher volume of parts to amortize the costs over.

To compete at many different price/power/performance points AMD uses the exact same ASIC for desktop (1 ASICs with 2 memory channels), workstation (2 ASICs with 4 memory channels), and server (4 ASICS with 8 memory channels).

Now AMD can compete across a decent fraction of Intel's products with a single ASIC and spend more R&D on the second generation without having to re-engineer an entire family of ASICs.

Look at any photo of a Threadripper or Epyc chip; they are easily visible even onstage, about as big as the palm of your hand. I believe it's 69.2 mm x 57.9 mm according to a spec I found for the TR4 socket.

Heterogeneous computing is picking up the slack. More dedicated ASICs for things like vision processing and deep learning. Today's floating-point unit is tomorrow's on-board media processor.

> I wish we could get back to working on getting FASTER cores rather than more of them. Some tasks just don't multi-thread.

Out of curiosity, are you working on tasks that don't parallelize? There are some for sure, but maybe more that do parallelize, just not trivially or easily? It seems like some people have wanted faster CPUs rather than more parallelism for as long as I can remember... decades. I'm wondering out loud if your wish has ever not been true.

I suspect we're stuck with getting more cores until there's a new fundamental breakthrough in physics. We're hitting the wall with CPU speeds, which is why GPUs have been seeing all the growth recently.

So I just went looking and found a CS term for hard to parallelize I hadn't heard before: P-complete. https://en.wikipedia.org/wiki/P-complete Are you working on stuff related to or dependent on any of these examples?

I'm pretty sure OoO execution (which is how single-threaded execution is improved in modern CPU cores) won't help with P-complete problems.

No, not really. One of the biggest attractions of OoO is that it helps you find parallelism at the instruction level. So OoO as an architectural technique is one of the most helpful for problems that are hard to parallelize. The problem is there is a limited window of instructions where it can be applied. Trying to make one massive processor that finds huge amounts of instruction-level parallelism is just too hard.

Recently on here[0], Arm claimed that for every 7% of extra resources they put toward increasing the ROB, they got 1% more performance. Past a certain point it's not worth it. That was their justification for having half the window of rival Samsung chips.

[0] https://www.anandtech.com/show/12785/arm-cortex-a76-cpu-unve...

Yes, but if there's mathematically zero parallelism then OoO can't really help that much.

Audio production. It is difficult if not impossible to parallelise one channel's workload, as each plugin depends on the results of the previous one.

If the plugins are chained together in a pipeline, that's actually trivial to parallelize: Each plugin runs on its own core/hyperthread, with small ring buffers between them to pass data along the pipeline.
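A minimal sketch of that wiring (the plugin functions and queue depth are hypothetical; note that in CPython the GIL limits true parallelism for pure-Python DSP, so this only illustrates the structure, not a real speedup):

```python
import queue
import threading

def run_stage(plugin, inbox, outbox):
    # Each plugin runs on its own thread; the bounded queues act as the
    # small ring buffers that pass audio blocks down the pipeline.
    while True:
        block = inbox.get()
        if block is None:            # sentinel: propagate shutdown
            outbox.put(None)
            return
        outbox.put(plugin(block))    # process, hand to the next stage

def build_pipeline(plugins, depth=4):
    # Chain the plugins with bounded queues; returns (input_q, output_q).
    first = queue.Queue(maxsize=depth)
    inbox = first
    for plugin in plugins:
        outbox = queue.Queue(maxsize=depth)
        threading.Thread(target=run_stage, args=(plugin, inbox, outbox),
                         daemon=True).start()
        inbox = outbox
    return first, inbox
```

Blocks fed into the first queue come out the last one fully processed, with each stage free to run concurrently on its own core (in a runtime without a GIL).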

But processing one plugin still takes x amount of time, and once the cumulative x is longer than the duration of the audio buffer, you'll get dropouts. That's why single-core performance is the most important factor.

Ah, you mean realtime? Yeah, then your concerns are right.

What kind of workloads require more than ten thousand instructions per sample?

Splice type of synthesizer emulations, variants of convolution, physical modelling, filtering, oversampling and so on...

Synth emulation sounds like it can take a fractional-millisecond hit to run buffered on a different core. "variants of convolution" is hopelessly vague. Physical modeling can't be parallelized? Filtering I'm not an expert on but that's exactly where I'd expect a few instructions to go very far. Oversampling I already took into account when saying "ten thousand".

This is not to say that your problems aren't real. But I'm skeptical a concerted effort was ever made to go multi-core here.

What's an acceptable level of total latency, so I can understand better?

There are plugins that run their processing in parallel, but still they depend on results from the previous one in the chain. So there is only x number of plugins you can run in a channel, regardless how they process samples internally.

Acceptable latency would be between 2 and 25ms, but typically it is 5ms to work comfortably. My projects stutter at 25ms latency on a Ryzen 1800X.

I completely blame the OS and/or software here, not the chip.

Let me put it this way:

Take a sample rate of 200kHz.

Take a pipeline with 12 steps that each require 2-3GHz.

These cores can easily synchronize to within a fraction of a microsecond, so round up to 1us.

Assume each plugin needs 10 samples available, and that the input-to-output latency might get as high as 20 samples when it's running on a completely dedicated core.

So put a buffer between each core of 20 samples, plus an extra 50% to be conservative, plus an extra 15us=3 samples to synchronize the cores with very conservative slop. So 33 samples per buffer.

12 inter-core buffers now take up a total of 396 samples, or slightly under 2ms. For a pipeline doing ~30GHz of work.
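Restating that arithmetic as a quick script (every number is one of the assumptions just stated):

```python
SAMPLE_RATE = 200_000    # 200 kHz sample rate (assumed above)
STAGES = 12              # pipeline steps, each needing 2-3 GHz
LATENCY_SAMPLES = 20     # worst-case per-stage input-to-output latency
SYNC_SLOP_US = 15        # very conservative inter-core sync slop

sync_samples = SYNC_SLOP_US * SAMPLE_RATE // 1_000_000   # 3 samples
# 20 samples, plus an extra 50%, plus the sync slop = 33 per buffer
buffer_samples = LATENCY_SAMPLES * 3 // 2 + sync_samples
total_samples = STAGES * buffer_samples                  # 396 samples
total_ms = 1000 * total_samples / SAMPLE_RATE            # 1.98 ms
```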

Nothing about modern multi-core CPU design makes a pipeline like this difficult. It's all software considerations, where the use of soft-realtime threads 'should' be straightforward, while letting the OS use the spare cores.

If you have a stutter at 25ms then someone blatantly screwed up their code.

Now do this for 32 (or more) audio tracks simultaneously.

You sound like you're disagreeing? If so I think you lost track of what this discussion was about.

I'm arguing that in audio processing, if the software is well-written, more cores can crush the performance of a single fast core. Even if it's a very serial pipeline.

Now you're here, describing an audio processing scenario where it's trivial to split the load up over many cores and run 10x faster or more.

I agree, more cores are better...?

If you're just arguing that you can't dedicate 12 cores per track while also having 32 tracks... that's certainly true, but completely irrelevant? Completely accurate that the multi-core CPU is not 300 times faster than a single fast core. But nobody said it was.

More cores are better, but you are limited by the slowest chain. So you can have 100 cores, but if even one chain can't be computed within the duration of the buffer, it will stutter. It is not uncommon to have something like 15% utilisation on a 16+ core CPU and have it stutter, because every core is too slow to process the chain on time.
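That dropout condition can be sketched as a quick check (the buffer size, sample rate, and chain times below are illustrative assumptions, not measurements):

```python
def buffer_ms(buffer_samples, sample_rate_hz):
    # Duration of one audio buffer in milliseconds: the hard deadline.
    return 1000 * buffer_samples / sample_rate_hz

def will_stutter(chain_times_ms, buffer_samples=256, sample_rate_hz=48_000):
    # A session drops out if ANY serial plugin chain takes longer than one
    # buffer period -- total core count and utilisation are irrelevant.
    deadline = buffer_ms(buffer_samples, sample_rate_hz)  # ~5.33 ms here
    return max(chain_times_ms) > deadline
```

One slow 6 ms chain stutters even if dozens of other chains finish in 1 ms each and most cores sit idle.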

SO many comments missing the entire point in reply to this!

It's like no one has noticed that all processors, even gaming-centric ones, come with a bunch of cores these days. The Athlon dual core is not exactly a recent innovation, it's archaeology. Hence all the trivia and "yeah but..." replies.

+ Gaming and a few other tasks mostly do not benefit from more cores and tend to be extremely GHz dependent. Thanks to massive GPU parallelisation we're left mainly CPU bound.

+ Even gaming pitched "Extreme" SKUs are disappointing from a GHz PoV

+ We haven't been in MS-DOS mode or had problems serving 120 OS background processes for a very, very long time indeed.

+ Power and heat have still gone wildly up despite the switch to cores not GHz

So, processor manufacturers hopped off the GHz race for well-known reasons. We're now a decade on from that and GHz is about the same, as if they have stopped thinking about GHz entirely.

It's not unreasonable to have hoped to have hit 5GHz without wildly unrepresentative cheats hidden under the desk, or that they would have kept at least a few percent of their research chasing some more Hz as well as cores.

Sure, we can all wish that physics were to bend to our 'reasonable' expectations.

The reality of it is:

1) increasing the clock rate is hard past 3 GHz or so. The memory wall doesn't get any easier, so IPC drops as clock rate increases. Also heat becomes more of an issue.

2) increasing pipeline length (to make it easier to hit higher clocks) has increasing penalties for each branch misprediction. So IPC drops significantly and you end up back where we are today (high IPC, but low clock).

3) shrinking chips to increase the clock rate no longer works. Generally the insulation layer is so thin that making it thinner results in more leaking.

4) Ever-smarter cores (to get higher IPC) make it ever harder to increase the clock. It's actually pretty amazing what happens in 1/3rd of a ns. Voltage changes in silicon actually propagate quite slowly (relative to the speed of light). Things like looking at a queue of 80 instructions, figuring out which ones have no pending dependencies, and issuing them to the right unit is pretty impressive for 0.33 ns. Moving to 0.2 ns makes it worse.

So could AMD, Intel, or anyone with a decent CPU team design a 5 GHz cpu... sure, no problem. Would it be any faster for general purpose code running on today's 3 GHz CPUs, unlikely.

Personally, for my desktop use I don't care about best-case performance (0% cache misses), or even average-case performance (which is generally way more than I need). I care about worst-case performance: stutters, hangs, lagginess, etc. Linux, enough RAM (32GB in my case), GTX 1070, and an M.2 for /home. So I welcome the additional cores. Plex transcoding, compressing family videos, making new icons for 10,000 photos, large parallel compiles, etc. Hell, sometimes I game while the kids are watching Plex streams and neither of us notices. Thank god for M.2, which killed off the rather serious bottlenecks with SATA.

I will keep my dual X5690 machine for a few more years, I guess. The IPC increase of modern CPUs is only 33% compared to these old Xeons. Although power consumption is another factor to consider...

> Nice I guess, but I wish we could get back to working on getting FASTER cores rather than more of them.

Absolutely. I can double the speed* for a ~10-15X increase in power. But then you won't buy my core ;)

*In theory a massive ROB could do this, frontend bottlenecks notwithstanding.

Yeah, but good luck renaming registers fast enough.

The breakthrough has to come from software. Might need to reimagine our programming/system model altogether.

Sure they do.

My computer has 130 processes on average, even if they are single threaded, they get all nicely distributed across all cores.

We are not in MS-DOS where a single application owns the hardware while it runs.

And 129 of them are sleeping 99.95%+ of the time and wouldn't have any noticeable impact on a same-clock single core system anyway.

Ok, but almost every single one of those processes uses a tiny bit of processor time.

I remember reading an article here on HN about inverse hyper-threading (I don't remember the correct term), where multiple cores would work together to increase single-threaded performance (if some cores are idle). I believe some company (not a major player like AMD, Intel or Samsung etc.) claimed they had solved major problems with previous implementations of this idea, or something along those lines.

Sounds interesting but is that even possible?

Seems like it would have been the perfect fit for AMD's Bulldozer back then, due to the chip's worse single-threaded performance and its unused multi-threaded performance in mainstream software.

I was under the impression there were still some advances here, even given the lack of movement in clock speeds. That is, basically being able to pipeline better and service specific operations in fewer cycles or with less latency.

I just did a little googling and came across the Intel Optimization Reference Manual[1]; in Appendix C you can see how latency and throughput changed across architectures over time (e.g. table C-4). Being able to assess all the implications of these is beyond me currently, but if an operation goes from 5 clock cycles to 4 (or it ties up certain transistors less for other ops at the same time; my understanding is this stuff is super complex), that could be a huge boost for specific actions.

1: https://www.intel.com/content/dam/www/public/us/en/documents...


We can't make cores higher frequency, because the power consumption (and heat dissipation) is prohibitive.

Doing more per cycle? It's hard to see how more work could be squeezed out of most operations (maybe fdiv or div could be shrunk a few cycles), so you're left with introducing niche instructions or wider, more complex SIMD instructions.

Spending more of your time being productive? This requires more accurate branch prediction and other speculative execution, but everyone's freaking out and trying to rip out speculative execution because of a bug that lets you read memory you already have access to.

You might be able to squeeze out more performance, but you're looking at 10% gains, not 2× gains.

There is some movement in the clock-speed arena [1].

[1] https://www.anandtech.com/show/12893/intels-28core-5-ghz-cpu...

That is just Intel grabbing some attention from the Threadripper 32-core announcement. It doesn't run at 5 GHz; that was heavily overclocked using a custom cooling system hidden under the table.

I was amused that the heroic efforts by Intel to make AMD looks bad included a 1 horse power (735 watts) pump in the cooler.

So sure, Intel can make 28 cores go fast: take a $10,000 chip (literally, it's a relabeled Intel Xeon 8180), a giant power supply, impressive cooling, and a 735 watt pump. It still only has 75% of the memory bandwidth/memory channels of AMD's Epyc.

My main hope with threadripper is that they hit the lower price points that the HEDT intel chips do (I.e. $300-$500).

> My main hope with threadripper is that they hit the lower price points that the HEDT intel chips do (I.e. $300-$500).

1900x sells for ~$450. It's just the 8 core part, though — might as well grab the Ryzen 8 core for way less cost instead unless you need the memory b/w or PCIe lanes.

Yeah, that's the one I have my eye on. Was hoping for something more competitive with the i7-7800x or so (6 core, 12 threads, 4 memory channels, 4 GHz, and $390 list).

They're pretty similar, but I think 1900x is slightly higher end. (And arguably, better value if you can utilize all the cores.)

i7-7800x (Skylake-X) is a sort-of last gen part (Kabylake-X briefly existed in the form of i7-7740X, although only in a 4-core SKU); 4 GHz is its maximum turbo clock (may be only single core); base clock is 3.5.

1900x is current gen, although about to become last-gen; its base clock is 3.8, 4.0 turbo (single core).

Same number of memory channels. 33% more cores. AMD's IPC lags Intel a little bit, but not by 25%. Is it worth the $60 (15%) premium? Maybe.

When TR2's version of 1900x comes out, TR1 1900x may experience a nice price drop, making it (even) more competitive/affordable.

The 40th anniversary of 8086 is just attention grabbing from Intel? Not every move from Intel has to be a reaction to competitors...

Nowhere in the linked article is there any mention of 40th anniversary. The title is simply "Intel’s 28-Core 5 GHz CPU: Coming in Q4" and the first line of the article is "...the launch of Intel’s first 5 GHz processor, the 6-core Core i7-8086K, Intel today also showcased a 28-core single socket machine also running at 5 GHz."

A bit of Googling verifies that you are at least partially correct: "Intel announced that it is releasing the Core i7-8086K, a special edition processor that commemorates the 40th anniversary of the 8086..." and "...reports speculated that Intel would only produce 50,000 units. Our sources have confirmed that this is a limited-edition chip, so Intel isn't positioning it against AMD's competing Ryzen processors."

That is water chilled phase change cooling, no consumer is going to run a desktop with that kind of cooling. It is as pointless as liquid nitrogen based benchmarks IMO.

But I think threadripper 2 should have faster clock speeds regardless. The zen+ 12nm cores that it uses are known to have higher clock speeds.

We've only seen early 32- and 24-core numbers. Hopefully the 16-core part gets a refresh with interesting improvements.

Or we could just write more efficient software?

What makes you think people aren’t working on it?

Look at paper topics at major computer architecture conferences like ISCA. There's been a clear shift from single thread architecture research to parallel and low power single thread core research. Not a complete shift of course, but the trend is clear.

Are individual cores in this thing something nearly as powerful as in conventional quad-core desktop/server CPUs?

Also can it sleep/downclock individual cores to cool down when its full power is not needed?

Yes. Threadripper is essentially just four well-binned Ryzen 7s tied together with a creative interconnect. The memory bandwidth doesn't increase as proportionally as the core count, but each individual core should operate just as fast as in a typical desktop part.

Edit: To answer the second part of the parent question, since the dies are the same, it should have all the same power-saving features available in AMD's laptop and desktop processors.

Epyc is 4 ryzen 7's. The threadripper (at least the ones shipping today) are two Ryzen 7s.

The memory bandwidth does increase proportionally (it's 2x ryzen for threadripper and 4x ryzen for eypc).

Individual cores do tend to be lower on the threadripper and epyc because of power/heat limitations, so they tend to be lower clocked and tend to have less cache per core.

According to the linked article, Threadripper 2 uses the equivalent of 4 Ryzen 7s, however they disable the memory busses on two of them which yields only a 2x increase in memory bandwidth. You're correct that Threadripper tends to clock slightly lower, but the cache is on-die so it shouldn't be any different per-core than in a typical Ryzen.

Wow. Now I am amazed. Thanks!

Particularly in the case of Threadripper, they use cherry-picked binned cores, so they can actually hit even [slightly] higher clocks than their mainstream cousins.

We don't know Threadripper 2's clocks yet, but last gen's TR 1950X could hit 4.2 GHz with Turbo, while the best mainstream Ryzen, the 1800X, could do 4.1 GHz with Turbo.

The current Ryzen 2 top dog can do 4.25 GHz with Turbo... so we could perhaps expect TR2 to hit 4.3 GHz+ on Turbo.

I was hoping they would put a GPU and HBM2 in place of those non-functional die. Like their old EHP concept that never saw the light of day. With enough HBM you wouldn't even need regular RAM modules and could fit a monster system on an ITX board.

(1) the TR4 socket is too big for mITX, even the "hold my beer" team over at Asrock only could stuff it into mATX.

(2) TR4 boards don't have video outputs, so a GPU would only be useful for compute.

If you can't fit a discrete GPU the Hades Canyon NUC is still the best you can do, although AMD is readying their own competitor called "Fenghuang" with a bigger+newer GPU - 28 CU Vega vs the 24 CU Polaris on the Intel (yes, despite Vega branding it's actually based on Polaris).

It would fit if the GPU and RAM were both in the module. No GFX or RAM slots needed on the board.

Well, I would be really happy if they would add a couple of GBs HBM RAM in their next APU. After the Linux 4.17 kernel release I finally ordered a Ryzen 5 2400G to cram it in a tiny mini ITX desktop.

I just got a 2400G too. I'm in the process of designing a custom 3d printed case - aiming for a very small design. It's a great APU for everyday stuff, but the EHP concept could be a monster in a similar package. Running Fedora Rawhide on it and I still get lockups after a few minutes of BZflag - but it's starting more reliably after the last update.

Yeah, not following the HW releases I was surprised the other day to find out the Intel NUC Hades Canyon and how well the integrated GPU+HBM work.

I already bought a Streacom F1CWS but I'm already regretting it as, without mods, there's no way that this case can have a decent airflow even if it's allegedly designed to support CPU with 65W TDP. ^__^;

I liked the form factor too much tho.

Problem is, Windows support for this many cores is still shaky. On a Linux server, sure, we've been using multi-socket motherboards with high-core-count CPUs for years. But many applications (e.g. Microsoft Edge) still crash with 16 cores.

So all those 1- and 2-CPU Xeon servers with 16+ cores running Windows that don't crash are just an anomaly? I wouldn't be so quick to judge; I would guess that your case is the anomaly. Anecdotally speaking, as admin of several 12- and 24-core dual-CPU systems running Windows 7/10, my first guess would be that you have some other issue, e.g. a faulty stick of RAM (have you run memtest86?), a bad driver, or the most common cause of crashes: a bug in the code which only happens in certain specific cases, such as "while Skype is running in the background".

I have Intel's 20 core CPU and never had problems.

My guess is it's a timing bug that likely exists with any number of cores, but becomes more likely the higher the parallelism.

My guess is it's something else entirely specific to that system.

Really weird to just single out many cores as the cause. It is also just one application that barely has anything to do with windows anyway. There has never been a lack of multiple-cpus nor many-cores (and 16 isn't even many) on windows as the post implies either.

Hey, I work on the web platform team at Microsoft. Do you have any more information about this? I'm not sure this is a bug that we're aware of. If you've personally experienced the bug, could you pull the Windows build number (Settings->System->About) and send it to t-jobak (at) microsoft.com along with the steps to reproduce (if any)? We'd really appreciate it and might be able to get a fix out in the next release.

Intel leads in AVX/AVX2/AVX512 performance by a large factor over Ryzen CPUs. (One piece of software that makes this manifest is Prime95, a Mersenne primality tester.) Looks like AMD did have to cut something after all.

Yeah, it was known Ryzen cut AVX performance in order to lower power consumption. That's how we ended up with 65W 8-core at 14nm.

Indeed, but the problem is that AVX-intensive code often triggers Intel chips to throttle significantly, or runs out of bandwidth from cache or main memory.

A surprising number of codes don't seem to benefit from the AVX performance. Most notably, even the most FP-intensive codes don't seem to benefit much from the higher-end Xeons (with two AVX512 units) over the lower models with a single AVX512 unit.

So sure, if you compute Mersenne primes all day, go for it. If you are actually doing some real-world work that's vector intensive, you are often better off with AMD (per $).

Keep in mind that today's chips have more performance per unit of memory bandwidth than previous generations. So generally memory is a bigger bottleneck, and AMD's 1.33x advantage in bandwidth/memory channels is an advantage on a wider variety of codes than in the previous generation.

> Indeed, but the problem is that often AVX intensive code triggers intel to significantly throttle

Xeon-W, HEDT, and consumer processors suffer this to a much smaller extent than Skylake-SP does, and this slowdown is already baked into the benchmarks (the deviation from benchmarks only really exists for mixed workloads). In most cases, AVX is still a huge speedup.

Also, on consumer+HEDT you can manually configure the AVX offset anyway. Want zero offset, and can handle the power/heat? Go hog wild...

> Most notably even on the most FP intensive code don't seem to benefit much from the higher end xeons (with two AVX512 units) over the lower models with a single AVX512 unit.

Intel's documentation on some of the HEDT chips was incorrect, the i7s actually have dual AVX512 as well as the i9s. If you were referring to HEDT as "Xeons" that would be why. Otherwise, it might be another documentation issue. The difference should be very clear in something like x265 encoding.

2 of the 4 CCXs lack direct memory access. This is kind of a deal breaker. Threadripper is already memory bound, so the 2 new packages will just be waiting on RAM most of the time.

Cite: http://en.community.dell.com/cfs-file/__key/telligent-evolut...

It would be nice if L3 wasn’t an eviction cache :/

> Threadripper is already memory bound

This depends heavily on your workflow. If you find TR memory bound, then you want a server part. TR has 4 memory controllers that will each do DDR4-3200 in TR2 per TFA. That should be able to push ~95 GiB/s.

What do you compare that against?

Intel's highest end HEDT part, i9 7980XE with 18 cores, only supports DDR4-2666 and has the same number of controllers, so it should hit ~80 GiB/s. It retails for $2000.

If you want more memory bandwidth you're either buying an even more expensive Xeon, or an AMD Epyc part.
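The peak figures above follow from channels x transfer rate x 8-byte bus width; a quick sanity-check sketch (theoretical peaks only, not measured throughput):

```python
def ddr4_peak_gib_s(mt_per_s, channels, bus_bytes=8):
    # Peak theoretical bandwidth: transfers/s x bus width x channel count,
    # converted from bytes/s to GiB/s.
    return mt_per_s * 1_000_000 * bus_bytes * channels / 2**30

tr2_quad  = ddr4_peak_gib_s(3200, 4)   # ~95 GiB/s (TR2, DDR4-3200)
i9_quad   = ddr4_peak_gib_s(2666, 4)   # ~80 GiB/s (i9-7980XE, DDR4-2666)
epyc_octa = ddr4_peak_gib_s(2666, 8)   # ~159 GiB/s (Epyc, DDR4-2666)
```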

>This depends heavily on your workflow. If you find TR memory bound, then you want a server part. TR has 4 memory controllers that will each do DDR4-3200 in TR2 per TFA. That should be able to push ~95 GiB/s.

Wrong. You read neither the OP nor my citation. Epyc can push 95GiB/s with its 8 channels. TR2 isn't increasing the number of channels, so 2 CCXs (8 cores, 16 hyperthreads) don't have direct memory access. They have to go through the inter-chip CCX bus to get memory access. This is going to carry a lot of performance penalties.

> Wrong.

What? Are you seriously claiming that a workflow that only uses 4 GiB/s of memory throughput is somehow memory bound on a controller that can push 95 GiB/s? Workflow matters and affects reasonable CPU choice.

> You neither read the OP or my citation. Eypc can push 95GiB/s with its 8 channels.

Um, also not true. Your link does not include any memory throughput figures supporting your contention that TR is "memory bound." I believe you are totally mistaken about the 95 GiB/s figure for Epyc; see below:

TR, quad channel: https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x#Me...

Epyc, octa channel: https://en.wikichip.org/wiki/amd/epyc/7551p#Memory_controlle...

Epyc can push 159 GiB/s with DDR4-2666. Per socket:

> In a dual-socket configuration, the maximum supported memory doubles to 4 TiB along with the maximum theoretical bandwidth of 317.9 GiB/s.

You said:

> TR2 isn't increasing the number of channels, so 2 CCX's (8 cores, 16hyper threads) don't have direct memory access.

Understood. I have never said otherwise.

> This is going to carry a lot of performance penalties.

The extra cores will have higher NUMA latency than the memory-local cores, yes. But it does not somehow decrease the total memory bandwidth available across the CPU.

Cache coherence in application development is more important now than ever

Are there any reason, why you don't want to use Threadripper for Server?

Quad memory channels + 32 cores + NVMe sounds perfect to me for a server. And it should be priced similar to the 16-core Xeon-D.

No guarantee on ECC, half the cores and half the memory bandwidth of epyc.

TR supports ECC, and many have it running. Of course, if you need the memory bandwidth, EPYC is the best fit.

Sexy, but for home use I don't need a space heater, on the contrary. What's the fastest (multi-core) CPU you can get these days inside 65 W, and who wins there, AMD or Intel?

Pretty sure it is AMD, with the Ryzen 7 1700 or Ryzen 7 2700, both rated at 65W TDP.

I have a Ryzen 1700 at home, and I am very happy with it. The fan that came with it runs very quietly even under heavy load. On workloads that utilize all cores, frequency will drop to 2-2.5 GHz after a while, but for those, the number of cores/threads tends to have a bigger impact on throughput than clock frequency / performance per core.

Aww thanks - for some reason I had forgotten there are Ryzen 7s at 65 W and was only looking at the 5s...

I was very happy to discover the 1700 back then, because I wanted an 8-core machine really badly (not that I really need one, but hey, I only upgrade my desktop every ~5 years), but I did not want an electric radiator. ;-)

It's only a space heater when it's being used. These CPUs consume very little power when idle. So if you have a task that can leverage 32 cores.. that task should in theory complete that much faster than on your 65W CPU. So the difference in total heat generated shouldn't be too bad.
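To put toy numbers on that argument (this assumes perfect per-core scaling and equal per-core throughput, which real workloads won't match, and ignores idle draw — it's only meant to show that energy ≈ power × time, so faster completion offsets higher power):

```python
# Toy energy comparison: the same fixed batch of work on two CPUs.
# If the 250W 32-core part finishes proportionally sooner than a 65W
# 8-core part, the total heat dumped into the room is comparable.
WORK_CORE_HOURS = 128.0  # e.g. a job worth 128 core-hours

def energy_wh(power_w, cores):
    hours = WORK_CORE_HOURS / cores  # perfect scaling assumed
    return power_w * hours

print(energy_wh(250, 32))  # 32-core @ 250W -> 1000.0 Wh
print(energy_wh(65, 8))    # 8-core  @ 65W  -> 1040.0 Wh
```

Under those (generous) assumptions the two machines emit roughly the same total heat for the job; the big part just emits it four times faster.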

It's very likely that my definition of a silent PC differs greatly from yours :)

Oh lovely competition. How I adore you.

I wish I could get a Threadripper on a mini-ITX motherboard.

The socket size and accompanying cooler make that impossible without using laptop-sized (and lower-performing) RAM. There are mATX boards and very small mATX chassis out there, though.

Have you seen a Threadripper socket? They're freakishly huge (78mm X 105mm). That's a huge chunk of a mini-ITX board (170mm X 170mm) all by itself. There's no way.

And even with an ordinary socket, mITX boards have anemic amounts of PCIe lanes. One of the top selling points of TR is the huge number of PCIe lanes for a non-server part. All ATX boards around launch were required to support the full complement of physical lanes. A mITX configuration would have both anemic PCIe and anemic memory.

On the other hand, I think TR can work without a chipset. Am I right?

Would 4 DIMM slots be enough to saturate all the controllers?

Yeah, 4 DIMMs should be sufficient[0], and I guess some mini-ITX boards have 4 slots (e.g., ASRock X299E-ITX/ac).

I don't know about TR working without a chipset. This picture[1] suggests the X399 part isn't absolutely essential, but I don't know if there are other considerations that make it essential.

Anandtech says[2]:

> Unlike Ryzen, the base processor is not a true SoC as the term has evolved over the years. In order to get the complement of SATA and USB ports, each Threadripper CPU needs to be paired with an X399 chipset. So aside from the CPU PCIe lanes, the 'new' X399 chipset also gets some IO to play with.

So, maybe? If you are willing to forgo all the chipset-attached IO, and most PCIe lanes. That could be useful for a niche workflow or maybe a small form-factor desktop workstation that just needs CPU compute.

[0]: https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x

[1]: https://en.wikichip.org/w/images/6/61/x399_platform.png

[2]: https://www.anandtech.com/show/11685/amd-threadripper-x399-m...

Thank you for full and informative reply!

This looks like a fun CPU for running Redis on.

At one instance per core, this ..... ok, I'm getting carried away, aren't I ?

Could an admin please fix the title?

AMD will make a killing in this market if they can price this 32-core chip around $1999.

The top end Threadripper 1 16c/32t was priced at $1k. I don't think this is going to be much more than that. $1499 max is my guess. (Disclaimer: I own AMD stock so I am optimistic)

That sounds right to me. At the beginning of the year I found a TR 1950X + a nice motherboard bundled together at a local shop, out the door for around $900 before rebates. I'd be a little surprised if this part came in at more than $1500, and not at all surprised to see it available for around $1200 within a couple of months.

$1499? too good to be true. ;)

Given that their 1-socket 32c EPYC part (the 7551P) goes for $2300, I'd guess it'll be cheaper than $2k, unless they can show that the two dies (half the chip) lacking direct memory channel access won't have a problem being fed data.

Somebody told me that cpu prices are like flight prices and your best bet is to buy a bit old.

What is the best time to buy a CPU? After how many months does the biggest drop occur?

Check on https://camelcamelcamel.com/

There are big dips, but not related to time since release. I guess it's more about when a new version comes out. Anyway, you can verify whether that happened or not from the chart.

If you are shopping for a CPU a bit older than the newest models, make sure that a motherboard that fits your requirements is also still available. Motherboards tend to get more expensive second-hand once they stop being manufactured, as opposed to CPUs.

I will keep this in mind, good info, thanks
