But 16 to 32 cores is a leap (250W TDP!!!!!), so I wonder what cooling solutions will remain viable (e.g., what about the TR4 edition Noctua cooler made just for TR?)
I would think power delivery on existing motherboards only designed for 180W would be a bigger problem. But again, any board targeted at overclockers should already be prepared to handle at least 250W. The older boards will mostly just not support as much overclocking of the new, larger part.
There are "proper" TR4 coolers, such as the Noctua TR4-specific air cooler which actually covers the entire socket, for a maximum connection between the radiator and socket.
Pictured there is the Enermax TR4-specific liquid-cooler. There is another one on the market somewhere, I forgot the name though.
Anyway, the bigger heat-plate makes a big difference. People have measured somewhere on the order of ~10C to ~20C cooler.
Yes, I am aware of that (it's not news to me). The cooler is smaller than the integrated heat spreader (the visible metal cover on the top of the CPU), but it does fully cover all four (two active) CCX dies and conducts plenty of heat away.
I know you can get TR4-shaped coolers today, and they perform marginally better, but they were not available at the time. With all cores busy, this thing keeps the CPU at 50°C or less in ~25°C ambient (stock clocks/boost). That is totally sufficient — the CPU is rated to 68°C (and oddly, the Ryzen 1800X is rated to 95°C, so it's unclear why the TR isn't rated up to 95°C as well).
> Anyway, the bigger heat-plate makes a big difference. People have measured somewhere on the order of ~10C to ~20C cooler.
Yes, absolutely it makes a difference; something like ~10°C, at least under fairly common conditions, from my reading. 20°C probably indicates an incorrectly installed cooler.
Technically it covers them, but you can see that the screw holes of the cooler go right over the dies. So you've got some air pockets over the most important parts to cool.
> Yes, absolutely it makes a difference; something like ~10°C, at least under fairly common conditions, from my reading. 20°C probably indicates an incorrectly installed cooler.
The ~20C numbers I was looking at were from overclocked systems, btw; ~10C was closer to the "normal use" situation. Overclocking adds a lot of heat, and benefits from a cooler with a larger contact area.
Sorry, I don't see any screw holes overlapping dies in that image; can you point them out?
Also, won't screw holes generally have screws in them, which are both conductive and not air?
I added a few arrows to make it more obvious.
Here's the article these pics are from: https://www.gamersnexus.net/news-pc/3008-threadripper-cooler...
> Also, won't screw holes generally have screws in them, which are both conductive and not air?
Screws have heads, and those heads have divots and are simply not a perfectly-flat surface.
I mean, nothing is perfectly flat. That's why we use thermal compound / thermal paste to connect things together. (Thermal compound is worse than copper at thermal exchange, but it's way, way WAY better than air gaps.) So you use thermal paste to fill in any air gaps.
If you are going to use a "regular size" cooler, you probably should put thermal paste in all of those screw-heads, for a slightly better connection.
How do the thermals compare with an AIO cooler? I've heard that their coolers come close to or exceed AIO cooling (assuming a standard Asetek Gen 5) on the usual square sized heat spreaders. Does Threadripper's more rectangular IHS impact temps at all with this cooler?
EDIT: Just saw the bit about not having an overclock. Has anyone tried overclocking on this Noctua cooler? What were the results when compared to the AIO solutions?
But you are correct that some of the more budget-oriented (and I use that term loosely) TR4 boards will struggle with power delivery. Worst case, you will shorten the lifespan of your motherboard and increase the risk of capacitor failure (eg: shorting, popping, etc).
The most popular way is to use an arctic-fish aquarium chiller.
Edit: someone snapped a photo of it: https://i.imgur.com/dRINhkW.jpg and people identified it as http://www.hailea.com/e-hailea/product1/HC-1000A.htm
It's fine for showing off how far the chip can go, but nothing someone would have under their desk.
Suffice to say, Intel's 5GHz 28-core chip is nowhere near production-ready and their demo is more than likely a heavily overclocked top-binned Xeon chip. In other words, Intel's new offerings are looking like less than stellar vaporware.
Threadripper 1 still beats the pants off of Intel's HEDT offerings in terms of price (Intel's costs quickly exceed the $5K mark); Threadripper 2 twists the knife so much that it's a gaping wound in Intel's side. The Epyc 2 lineup will similarly affect Intel's enterprise/data-center offerings.
I'm frankly amazed my first post-college desktop is soldiering on as a high-end gaming system with GPU, SSD and DRAM upgrades.
This is really more of a workstation or server cpu than a gaming cpu.
Edit: Tons of comments are pointing out exceptions. I was hoping to cover that with "generally speaking".
Not to say they aren't behind. I once saw a game talk about how awesome they are because they reinvented a thread pool. But their state of the art is in direct conflict with good threading practices.
That said, I'd be really interested to see how some of these Battle Royale engines tick (no pun intended). It seems like there's a ton of potential for "clustering" execution, essentially treating separate sections of maps as separate games, but considering it could come down to 100 people all being within the same 100 square meters, you can't necessarily rely on that model.
I was doing that 10 years ago... in a radar signal processor. I could do the same thing in a game engine, but the games industry seems to be allergic to devs who do not come up in the games industry. I suspect that has as much to do with the poor state of parallelism as anything.
It'll be a number of years yet for large-scale multithreading to penetrate the consumer market.
92% of users still have 2 or 4 physical CPUs, so 2-8 threads.
I'd assume given asset size they're already dealing with a large amount of cache misses, and if you're stalling for a memory read anyway then it seems less impactful than on a high-throughput, tuned scientific workload.
These guys go all the way up to 6 hardware ones, depending on availability.
Also, many games that do utilize multiple CPU cores do not scale well beyond about 4-6 cores (which makes a lot of sense given the CPUs most gamers have and the priorities of the studios).
An easy way to see this is to underclock your CPU as far as it will go, and see how many percent CPU your favorite game will use.
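If you want to actually watch this, here's a rough sketch (Python, assuming the psutil package is installed) that samples per-core usage while the game runs, so you can see how many cores it really keeps busy:

    # Sample per-core CPU usage for ~30 seconds; run this while the game is busy.
    import psutil

    for _ in range(30):
        per_core = psutil.cpu_percent(interval=1.0, percpu=True)
        busy = sum(1 for p in per_core if p > 50)
        print(f"{busy} cores >50% busy | " + " ".join(f"{p:5.1f}%" for p in per_core))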
I feel bad for the poor programmers who got that job.
Civ5 is one of the games that can trigger Ryzen segfaults, guessing he has a faulty chip. He should run kill-ryzen and see if his chip is stable.
Note that contrary to popular belief, the segfault bug does not "only" affect Linux or compiling workloads (it's a litho fault - these workloads just happen to trigger it) and there is no "safe" date range of first-gen Ryzen chips. Newer chips are more likely to work properly, but a smaller number are still faulty - even some RMA chips, since AMD stopped testing them after a few months.
I went through the AMD support thread and pulled out some examples: https://news.ycombinator.com/item?id=17074355
Looking through the latest posts in the AMD community thread, at least one user is reporting this even on a second-hand processor, although that's not really enough to make a definitive statement.
(You could argue for the PS3 of course, but there it's not a choice.)
I imagine many genres will never be heavy CPU users though, like low player count competitive FPS and MOBA.
GPU and RAM prices are sadly also very high at the moment... probably the worst time to build a new high-end PC.
AMD is getting better, but doesn't seem quite there yet. Steam and Phoronix still seem to generally agree NVIDIA is still the #1 choice on Linux. #2 (stability-wise) seems to be Intel's GPU. AMD's working on it, but decent performance often requires a custom build stack not supported on any current Linux distro. I'd rather not have to recompile my own kernel, Mesa, and related just to get reasonably stable and decent performance.
In my personal experience, I disagree.
yet which still cannot run Wayland/Xwayland+XGL. And you cannot do anything about it.
As you say, these i7s are still handling the load quite nicely.
As far as heat, the Vega 64 GPU I've got draws way more power than that too. I don't think this product is really targeted towards gamers anyway, but I don't think they'd care much about the power draw.
But if you're doing development spinning up 32 cores is a very realistic proposition (compilers are good at parallelizing work, especially on big projects, spinning up dev cluster locally, etc.) - so you're getting a top productivity machine that can game - if you can afford it and are not interested in chasing gaming benchmarks then I don't see how it wouldn't be great.
"Threadripper this is still the case: the two now ‘active’ parts of the chip do not have direct memory access. This technically adds latency to the platform, however AMD is of the impression that for all but the most memory bound tasks, this should not be an issue"
Seems kinda silly unless it's pretty cheap. After all the 7351P is $750, guarantees ECC, and has twice the memory bandwidth of the threadripper.
There is definitely differentiation between the server line of chips and the TR "HEDT" chips, other than the crippled memory channels.
The TR4 socket sacrifices half of the SP3 memory channels to use for more power.
As a result, TR models come with (much) higher stock clocks: 3.0 GHz base -> 3.4 GHz all-core turbo vs Epyc 7551P 2.0 GHz base -> 2.55 GHz all core.
So all in all, TR may be better for your workflow than Epyc even at the same core count and price (if clock matters more to you than memory bandwidth). And keep in mind, server boards aren't free.
Epyc 7351P, 2.9GHz, 16 cores, 32 threads, 8 memory channels = $799
TR 1920X, 3.5GHz, 12 cores, 24 threads, 4 memory channels = $761
TR 1950X, 3.4GHz, 16 cores, 32 threads, 4 memory channels = $959
Sure the 1950X is 17% faster clocked, but with half the mem bandwidth and half the mem channels (half the performance for random lookups).
So generally the 1950X will be 17% faster with no cache misses, and 50% as fast with all cache misses.
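Just to make the arithmetic behind those two bounds explicit (a quick Python sketch using only the spec-sheet clocks and channel counts above; real workloads will land somewhere in between):

    # Rough bounds: clock ratio for compute-bound, channel ratio for memory-bound.
    tr_1950x   = {"clock_ghz": 3.4, "mem_channels": 4}
    epyc_7351p = {"clock_ghz": 2.9, "mem_channels": 8}

    clock_ratio = tr_1950x["clock_ghz"] / epyc_7351p["clock_ghz"]        # ~1.17
    bw_ratio    = tr_1950x["mem_channels"] / epyc_7351p["mem_channels"]  # 0.5

    print(f"no cache misses:  TR ~{(clock_ratio - 1) * 100:.0f}% faster")
    print(f"all cache misses: TR ~{bw_ratio * 100:.0f}% of Epyc throughput")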
Motherboard prices are pretty close, the Epyc, Dual GigE, 3x PCI-e x16, etc = $369. I found some TR boards cheaper, but not with ECC or 3x PCI-e x16.
Sure, the future Threadripper 2 will be faster... but so will Epyc 2. Not sure I see the point of the TR.
Clock and single core performance absolutely matters to some customers. This is part of why Intel can continue to dominate the home and server market despite selling lower core count chips — they are clocked a little higher and have a little higher IPC.
(Also, I totally agree about the silly LEDs, but I also think board manufacturers wouldn't do it unless it produced a return, i.e., added value on average.)
> To compare: Epyc 7351P, 2.9GHz, 16 cores, 32 threads, 8 memory channels = $799
> TR 1950X, 3.4GHz, 16 cores, 32 threads, 4 memory channels = $959
7351P @2.9 GHz is the boost clock — base is 2.4.
1950x @3.4 is the base clock — boost is 4.0, on up to 4 cores (not all-core like the 7351P part).
> So generally the 1950X will be 17% faster with no cache misses, and 50% as fast with all cache misses.
Your 17% number is sort of wrong, or missing the point — yes, that's the clock difference for all-core workloads, but TR boost kicks in at 1-4 thread workloads.
4.0/2.9 is huge. That's 38% additional CPU on (very common) 1-4 thread workflows. That's worth a premium to some people.
The obvious question you might then ask me is, if your workflow is only 1-4 threads, why buy a 16 core CPU? Well, sometimes people have workflows that vary over time. Maybe some of the time you only need a few cores to manage interrupt traffic and keep the GPUs fed, and maybe other times you do want all of the cores to complete a parallelizable, CPU-bound task like compilation or h.265 compression faster.
If your workload is exclusively embarrassingly parallel and you can keep all cores busy all the time, yes, a server platform like Epyc 7351P is a much better value for you than TR. Or if your workflow needs more than 80 GiB/s (TR1) or 95 GiB/s (TR2) memory throughput, a server platform like Epyc is probably a better value for you.
> Sure the future thread ripper 2 will be faster... but so will the Epyc 2. Not sure I see the point in the TR.
Epyc 2 will almost certainly not be clocked higher than TR2, due to the additional power draw the TR socket has vs SP3 — though, the gap may shrink.
Even GPUs are starting to peter out with lengthening product cycles, smaller improvements, and the increasing tendency to just rename the previous generation with a new name, as if it was actually significantly improved.
So CPUs have peaked; most of the low-hanging fruit has already been taken. This is most easily shown by the fact that pretty much all the popular CPUs from the previous generation maxed out at 95 watts or so (desktop or server), but are at 180 watts in the current generation. This happened years ago for desktops, but is hitting laptops (last year) and smartphones (this year or so).
GPUs are about half way there, improvements are harder. No more annual updates with 2x the performance on most common use cases. Feature improvements are getting more obscure, gaming sites are using 16x zoom to show the improvements. GTX1170 is rumored to be 50% faster than the GTX1070... from 2 years ago.
Specific use ASICs for mining, AI, and similarly narrow use cases are still early in their development. Multiple vendors are showing 2x improvements with each generation, even crazy things like including said chips in consumer devices with tiny power budgets.
So basically a core twice as fast isn't happening, it's not because anyone is lazy. There's tons of room for improvement in specific use cases where today's silicon is crazy inefficient. Things like machine learning, software device radios, vision specific processing, and related are seeing rapid improvements.
> Chips are getting physically bigger
They are? Over the years, I got the notion that the cheap controller chips were about 30 mm2, the regular desktop CPUs about 100 mm2, and server class powerhouses around 300 mm2.
With a quick look, I found mention of Epyc being four 200 mm2 dies packed together: https://www.eetimes.com/author.asp?section_id=36&doc_id=1331...
Wafers have gotten bigger, https://en.wikipedia.org/wiki/Wafer_(electronics), but I suppose that manufacturers just want more dies per wafer, not larger dies.
AMD was pretty crafty with this generation. Intel has a huge R&D budget, has its own fabs, and pushes many different ASICs split into a dizzying array of products. Intel can afford this because they have a MUCH higher volume of parts to amortize the costs over.
To compete at many different price/power/performance points AMD uses the exact same ASIC for desktop (1 ASICs with 2 memory channels), workstation (2 ASICs with 4 memory channels), and server (4 ASICS with 8 memory channels).
Now AMD can compete across a decent fraction of Intel's products with a single ASIC and spend more R&D on the second generation without having to re-engineer an entire family of ASICs.
Compare any photos of a Threadripper or Epyc chip: they are easily visible even onstage; the package is about as big as the palm of your hand. I believe it's 69.2 mm x 57.9 mm according to a spec I found for the TR4 socket.
Out of curiosity, are you working on tasks that don't parallelize? There are some for sure, but maybe more that do parallelize, just not trivially or easily? It seems like some people have wanted faster CPUs rather than more parallelism for as long as I can remember... decades. I'm wondering out loud if your wish has ever not been true.
I suspect we're stuck with getting more cores until there's a new fundamental breakthrough in physics. We're hitting the wall with CPU speeds, which is why GPUs have been seeing all the growth recently.
So I just went looking and found a CS term for hard to parallelize I hadn't heard before: P-complete. https://en.wikipedia.org/wiki/P-complete Are you working on stuff related to or dependent on any of these examples?
This is not to say that your problems aren't real. But I'm skeptical a concerted effort was ever made to go multi-core here.
What's an acceptable level of total latency, so I can understand better?
Acceptable latency would be between 2 and 25 ms, but typically it needs to be around 5 ms to work comfortably.
My projects stutter on 25ms latency on Ryzen 1800x.
Let me put it this way:
Take a sample rate of 200kHz.
Take a pipeline with 12 steps that each require 2-3GHz.
These cores can easily synchronize to within a fraction of a microsecond, so round up to 1us.
Assume each plugin needs 10 samples available, and that the input-to-output latency might get as high as 20 samples when it's running on a completely dedicated core.
So put a buffer between each core of 20 samples, plus an extra 50% to be conservative, plus an extra 15us=3 samples to synchronize the cores with very conservative slop. So 33 samples per buffer.
12 inter-core buffers now take up a total of 396 samples, or slightly under 2ms. For a pipeline doing ~30GHz of work.
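Spelled out as a quick Python sketch (every number here is just the assumption stated above, not a measurement):

    sample_rate_hz  = 200_000  # 200 kHz
    pipeline_stages = 12
    worst_case_lat  = 20       # samples of input-to-output latency per stage
    safety_factor   = 1.5      # +50% headroom
    sync_slop       = 3        # ~15 us of inter-core sync, rounded up to samples

    buffer_samples = int(worst_case_lat * safety_factor) + sync_slop  # 33
    total_samples  = pipeline_stages * buffer_samples                 # 396
    total_ms       = total_samples / sample_rate_hz * 1000            # ~1.98 ms

    print(f"{buffer_samples} samples/buffer, {total_samples} total = {total_ms:.2f} ms")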
Nothing about modern multi-core CPU design makes a pipeline like this difficult. It's all software considerations, where the use of soft-realtime threads 'should' be straightforward, while letting the OS use the spare cores.
If you have a stutter at 25ms then someone blatantly screwed up their code.
I'm arguing that in audio processing, if the software is well-written, more cores can crush the performance of a single fast core. Even if it's a very serial pipeline.
Now you're here, describing an audio processing scenario where it's trivial to split the load up over many cores and run 10x faster or more.
I agree, more cores are better...?
If you're just arguing that you can't dedicate 12 cores per track while also having 32 tracks... that's certainly true, but completely irrelevant? Completely accurate that the multi-core CPU is not 300 times faster than a single fast core. But nobody said it was.
It's like no one has noticed that all processors, even gaming-centric ones, come with a bunch of cores these days. The Athlon dual core is not exactly a recent innovation, it's archaeology. So all the trivia and "yeah but..."
+ Gaming and a few other tasks mostly do not benefit from more cores and tend to be extremely GHz dependent. Thanks to massive GPU parallelisation we're left mainly CPU bound.
+ Even gaming pitched "Extreme" SKUs are disappointing from a GHz PoV
+ We haven't been in MS-DOS mode or had problems serving 120 OS background processes for a very, very long time indeed.
+ Power and heat have still gone wildly up despite the switch to cores not GHz
So, processor manufacturers hopped off the GHz race for well-known reasons. We're now a decade on from that, and GHz is about the same, as if they have stopped thinking about GHz entirely.
It's not unreasonable to have hoped to have hit 5GHz without wildly unrepresentative cheats hidden under the desk, or that they would have kept at least a few percent of their research chasing some more Hz as well as cores.
The reality of it is:
1) increasing the clock rate is hard past 3 GHz or so. The memory wall doesn't get any easier, so IPC drops as clock rate increases. Also heat becomes more of an issue.
2) increasing pipeline length (to make it easier to hit higher clocks) has increasing penalties for each branch misprediction. So IPC drops significantly and you end up back where we are today (high IPC, but low clock).
3) shrinking chips to increase the clock rate no longer works. Generally the insulation layer is so thin that making it thinner results in more leaking.
4) Ever-smarter cores (to get higher IPC) make it ever harder to increase the clock. It's actually pretty amazing what happens in 1/3rd of a ns. Voltage changes actually propagate quite slowly in silicon (relative to the speed of light). Things like looking at a queue of 80 instructions, figuring out which ones have no pending dependencies, and issuing them to the right unit is pretty impressive for 0.33 ns. Moving to 0.2 ns makes it worse.
So could AMD, Intel, or anyone with a decent CPU team design a 5 GHz cpu... sure, no problem. Would it be any faster for general purpose code running on today's 3 GHz CPUs, unlikely.
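Rough numbers behind point 4, if anyone wants to play with them: even light in a vacuum only covers about 10 cm per cycle at 3 GHz, and on-die signals propagate a good deal slower than that.

    C = 3.0e8  # speed of light in vacuum, m/s

    for clock_ghz in (3.0, 5.0):
        period_ns = 1.0 / clock_ghz                 # ns per cycle
        light_cm  = C * (period_ns * 1e-9) * 100    # distance light covers per cycle
        print(f"{clock_ghz} GHz: {period_ns:.2f} ns/cycle, light travels ~{light_cm:.1f} cm")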
Personally for my desktop use I don't care about best case performance (0% cache misses), or even average case performance (which is generally way more than I need). I care about worst case performance. Stutters, hangs, lagginess, etc. Linux, enough ram (32GB in my case), GTX 1070, and a M2 for /home. So I welcome the additional cores. Plex transcoding, compressing family videos, making new icons for 10,000 photos, large parallel compiles, etc. Hell sometimes I game while the kids are watching plex streaming and neither of us notice. Thank god for M.2 which killed off the rather serious bottlenecks with SATA.
Absolutely. I can double the speed* for a ~10-15X increase in power. But then you won't buy my core ;)
*In theory a massive ROB could do this, frontend bottlenecks notwithstanding.
The breakthrough has to come from software. Might need to reimagine our programming/system model altogether.
My computer has 130 processes on average, even if they are single threaded, they get all nicely distributed across all cores.
We are not in MS-DOS where a single application owns the hardware while it runs.
Sounds interesting but is that even possible?
Seems like it would have been the perfect fit for AMD's Bulldozer back then due to the chip's worse single-threaded performance and its unused multi-threaded performance in mainstream software.
I just did a little googling, and came across the Intel Optimization Reference Manual, and in Appendix C you can see how latency and throughput changed for architectures over time (e.g. table C-4). Being able to assess all the implications of these is beyond me currently, but if an operation goes from 5 clock cycles to 4 (or it ties up certain transistors less for other ops at the same time; my understanding is this stuff is super complex), that could be a huge boost for specific actions.
We can't make cores higher frequency, because the power consumption (and heat dissipation) is prohibitive.
Doing more per cycle? It's hard to see how more work could be squeezed out of most operations (maybe fdiv or div could be shrunk a few cycles), so you're left with introducing niche instructions or wider, more complex SIMD instructions.
Spending more of your time being productive? This requires more accurate branch prediction and other speculative execution, but everyone's freaking out and trying to rip out speculative execution because of a bug that lets you read memory you already have access to.
You might be able to squeeze out more performance, but you're looking at 10% gains, not 2× gains.
So sure Intel can make 32 cores go fast, by taking a $10,000 chip (literally, it's a relabel of the Intel Xeon 8180), a giant power supply, impressive cooling, and a 735 watt pump. Still only has 75% of the memory bandwidth/memory channels of the AMD epyc.
My main hope with threadripper is that they hit the lower price points that the HEDT intel chips do (I.e. $300-$500).
1900x sells for ~$450. It's just the 8 core part, though — might as well grab the Ryzen 8 core for way less cost instead unless you need the memory b/w or PCIe lanes.
i7-7800x (Skylake-X) is a sort-of last gen part (Kabylake-X briefly existed in the form of i7-7740X, although only in a 4-core SKU); 4 GHz is its maximum turbo clock (may be only single core); base clock is 3.5.
1900x is current gen, although about to become last-gen; its base clock is 3.8, 4.0 turbo (single core).
Same number of memory channels. 33% more cores. AMD's IPC lags Intel a little bit, but not by 25%. Is it worth the $60 (15%) premium? Maybe.
When TR2's version of 1900x comes out, TR1 1900x may experience a nice price drop, making it (even) more competitive/affordable.
A bit of Googling verifies that you are at least partially correct:
"Intel announced that it is releasing the Core i7-8086K, a special edition processor that commemorates the 40th anniversary of the 8086..." and "...reports speculated that Intel would only produce 50,000 units. Our sources have confirmed that this is a limited-edition chip, so Intel isn't positioning it against AMD's competing Ryzen processors."
But I think threadripper 2 should have faster clock speeds regardless. The zen+ 12nm cores that it uses are known to have higher clock speeds.
Also can it sleep/downclock individual cores to cool down when its full power is not needed?
Edit: To answer the second part of the parent question, since the dies are the same, it should have all the same power-saving features available in AMD's laptop and desktop processors.
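If you want to see the per-core downclocking for yourself on Linux, here's a quick Python sketch (assumes the usual cpufreq sysfs paths are present, which they are on most distros):

    from pathlib import Path

    # Print each core's current frequency; idle cores should show much lower clocks.
    for cpu_dir in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
        if freq_file.exists():
            mhz = int(freq_file.read_text()) / 1000  # sysfs reports kHz
            print(f"{cpu_dir.name}: {mhz:.0f} MHz")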
The memory bandwidth does increase proportionally (it's 2x Ryzen for Threadripper and 4x Ryzen for Epyc).
Individual cores do tend to be lower clocked on Threadripper and Epyc because of power/heat limitations, and they tend to have less cache per core.
We don't know the Threadripper 2's clocks yet, but last gen TR 1950x could hit 4.2Ghz with Turbo, while the best mainstream Ryzen the 1800x could do 4.1Ghz with Turbo.
The Ryzen 2 top dog right now can do 4.25Ghz with Turbo.. so we could perhaps expect TR2 to hit 4.3Ghz+ on Turbo.
(2) TR4 mobos don't have video outputs, so a GPU would only be useful for compute.
If you can't fit a discrete GPU the Hades Canyon NUC is still the best you can do, although AMD is readying their own competitor called "Fenghuang" with a bigger+newer GPU - 28 CU Vega vs the 24 CU Polaris on the Intel (yes, despite Vega branding it's actually based on Polaris).
I already bought a Streacom F1CWS but I'm already regretting it as, without mods, there's no way this case can have decent airflow even if it's allegedly designed to support CPUs with a 65W TDP. ^__^;
I liked the form factor too much tho.
Really weird to just single out many cores as the cause. It is also just one application that barely has anything to do with Windows anyway. There has never been a lack of support for multiple CPUs or many cores (and 16 isn't even many) on Windows, as the post implies, either.
A surprising number of codes don't seem to benefit from the AVX performance. Most notably, even the most FP-intensive codes don't seem to benefit much from the higher-end Xeons (with two AVX512 units) over the lower models with a single AVX512 unit.
So sure, if you compute Mersenne primes all day, go for it. If you are actually doing some real-world work that's vector intensive you are often better off with AMD (per $).
Keep in mind that today's chips have more performance per memory bandwidth than previous generations. So generally memory is a bigger bottleneck, and AMD's 1.33x advantage in bandwidth/memory channels is an advantage on a wider variety of codes than the previous generation.
Xeon-W, HEDT, and consumer processors suffer this to a much smaller extent than Skylake-SP does, and this slowdown is already baked into the benchmarks (the deviation from benchmarks only really exists for mixed workloads). In most cases, AVX is still a huge speedup.
Also, on consumer+HEDT you can manually configure the AVX offset anyway. Want zero offset, and can handle the power/heat? Go hog wild...
> Most notably even on the most FP intensive code don't seem to benefit much from the higher end xeons (with two AVX512 units) over the lower models with a single AVX512 unit.
Intel's documentation on some of the HEDT chips was incorrect, the i7s actually have dual AVX512 as well as the i9s. If you were referring to HEDT as "Xeons" that would be why. Otherwise, it might be another documentation issue. The difference should be very clear in something like x265 encoding.
It would be nice if L3 wasn’t an eviction cache :/
This depends heavily on your workflow. If you find TR memory bound, then you want a server part. TR has 4 memory controllers that will each do DDR4-3200 in TR2 per TFA. That should be able to push ~95 GiB/s.
What do you compare that against?
Intel's highest end HEDT part, i9 7980XE with 18 cores, only supports DDR4-2666 and has the same number of controllers, so it should hit ~80 GiB/s. It retails for $2000.
If you want more memory bandwidth you're either buying an even more expensive Xeon, or an AMD Epyc part.
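For reference, those theoretical peaks fall straight out of channels x transfer rate x 8 bytes per 64-bit channel (spec-sheet peaks, not sustained numbers); a quick Python sketch:

    GIB = 2**30

    def peak_gib_s(channels, mega_transfers_per_s):
        # 64-bit (8-byte) channel width
        return channels * mega_transfers_per_s * 1e6 * 8 / GIB

    print(f"TR2,       4 x DDR4-3200: {peak_gib_s(4, 3200):.0f} GiB/s")  # ~95
    print(f"i9-7980XE, 4 x DDR4-2666: {peak_gib_s(4, 2666):.0f} GiB/s")  # ~79
    print(f"Epyc,      8 x DDR4-2666: {peak_gib_s(8, 2666):.0f} GiB/s")  # ~159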
Wrong. You read neither the OP nor my citation. Epyc can push 95 GiB/s with its 8 channels. TR2 isn't increasing the number of channels, so 2 CCXs (8 cores, 16 hyperthreads) don't have direct memory access. They have to go through the inter-die bus to get memory access. This is going to carry a lot of performance penalties.
What? Are you seriously claiming that a workflow that only uses 4 GiB/s of memory throughput is somehow memory bound on a controller that can push 95 GiB/s? Workflow matters and affects reasonable CPU choice.
> You read neither the OP nor my citation. Epyc can push 95 GiB/s with its 8 channels.
Um, also not true. Your link does not include any memory throughput figures supporting your contention that TR is "memory bound." I believe you are totally mistaken about the 95 GiB/s figure for Epyc; see below:
TR, quad channel: https://en.wikichip.org/wiki/amd/ryzen_threadripper/1950x#Me...
Epyc, octa channel: https://en.wikichip.org/wiki/amd/epyc/7551p#Memory_controlle...
Epyc can push 159 GiB/s with DDR4-2666. Per socket:
> In a dual-socket configuration, the maximum supported memory doubles to 4 TiB along with the maximum theoretical bandwidth of 317.9 GiB/s.
> TR2 isn't increasing the number of channels, so 2 CCX's (8 cores, 16hyper threads) don't have direct memory access.
Understood. I have never said otherwise.
> This is going to carry a lot of performance penalties.
The extra cores will have higher NUMA latency than the memory-local cores, yes. But it does not somehow decrease the total memory bandwidth available across the CPU.
Quad memory channel + 32 cores + NVMe sounds perfect to me for a server. And it should be priced similar to the Xeon-D 16 core.
Would 4 DIMM slots be enough to saturate all controllers?
I don't know about TR working without a chipset. This picture suggests the X399 part isn't absolutely essential, but I don't know if there are other considerations that make it essential.
> Unlike Ryzen, the base processor is not a true SoC as the term has evolved over the years. In order to get the complement of SATA and USB ports, each Threadripper CPU needs to be paired with an X399 chipset. So aside from the CPU PCIe lanes, the 'new' X399 chipset also gets some IO to play with.
So, maybe? If you are willing to forgo all the chipset-attached IO, and most PCIe lanes. That could be useful for a niche workflow or maybe a small form-factor desktop workstation that just needs CPU compute.
At one instance per core, this ..... ok, I'm getting carried away, aren't I ?
What is the best time to buy a CPU? After how many months does the biggest drop occur?
There are big dips, but not related to time since release. I guess it's more about when a new version comes out. Anyway, you can verify whether that happened or not from the chart.