Also, as a person who used to work at Intel, I don't know whose idea this was, but that person should probably have a long hard look at themselves -- hardware people are exactly the people that this kind of shit wouldn't fly with, because they'll almost always ask for details and can spot a hack from a mile away.
On the one hand I can sympathize with Intel -- seeing how tough it was to stay on the market year over year, trying to predict and start developing the next trend in hardware. But on the other hand... Why in the world would you do this -- Intel basically dominates the high end market right now, just take your time and make a properly better thing.
This is the opposite though?
The dedicated servers are turning into HEDTs. AMD's 32-core EPYC has been available since last year, and Intel's 28-core Skylake (although $10,000) has also been available for a year.
So dedicated servers got this tech first, then HEDT got it a bit later. I guess Threadripper is Zen+, so technically HEDT gets the 12nm tech first, but the 32-core infrastructure was in EPYC first.
In practice, the only difference between the dedicated servers of yesteryear and the HEDTs of today is perception (well, and some very specific features). Considering that the computational load of most things hasn't actually gotten that much bigger, and that languages that can adequately use multiple cores have proliferated, it feels like everything is getting cheaper yet better -- that's what excites me.
Presenting it as an extreme overclocking demo would have been a much wiser option.
Don't assume they would. Plenty of purchasing in large companies is associated with some higher up hearing about something, wanting it, then buying it to 'help' in some obscure way.
I can imagine a few cases where first-to-ring-the-bell performance on a single core determines whether you get a specific quote in HFT, but that's about it.
That is actually the case from what I've heard. A lot of them buy consumer chips, then disable all but one core and overclock it to the max.
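Disabling cores is a BIOS-level thing, but the software side of the same idea -- keeping a latency-critical process on exactly one core -- can be sketched with Python's standard `os.sched_setaffinity` on Linux. This is just an illustration of the concept, not any particular firm's setup:

```python
import os

def pin_to_core(core: int) -> None:
    """Restrict the calling process to a single CPU core (Linux-only)."""
    os.sched_setaffinity(0, {core})  # pid 0 = the calling process

# Pick the first core we're currently allowed to run on and pin to it.
# On an "all cores but one disabled" box this would be the lone core.
core = min(os.sched_getaffinity(0))
pin_to_core(core)
print(os.sched_getaffinity(0))  # now a single-element set
```

The same affinity can be set from outside the process with `taskset -c 0 <cmd>`.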
Here is a guy from optiver talking about their process at CppCon: https://www.youtube.com/watch?v=NH1Tta7purM
Edit: Plus, the TR4 socket is guaranteed to be supported for 4 years, per AMD's roadmap at https://community.amd.com/thread/226363
Regardless damn good on AMD.
Unfortunately they just state "... some first-generation X399 motherboards may not be able to deliver enough power..."
And not specifically which.
If their statement is false then that's on them.
It's also a big reason I'm not going with Intel, since I know I can upgrade to something significantly better without having to get a new motherboard.
Still a better scenario than changing the socket all the time, but it can catch you out if you're used to Intel's "socket = generation" philosophy.
Source: happened to me a few weeks back.
The two things I don't like: first, their CPUs are pin-based, which seemed kind of old-fashioned after Intel CPUs -- but this is really a minor thing. The other issue is that memory compatibility is a bit finicky. Maybe it has to do with the CPU being so new. Not sure.
To me that's a win. If a CPU pin is bent, it's typically fairly easy to straighten. Fixing a bent pin in a socket is a massive pain.
But it's much easier to protect socket pins with the cover. So there are pros and cons either way.
Good luck. You'd better have a loupe or magnifier.
I buy Intel Mobile CPU boards for exactly this reason. Pins.
I have not bent a single one in years.
The notebook CPUs are so thermally efficient that it's a real selling point for me.
But in Threadripper (2-dies) or EPYC (4-dies), the "infinity fabric" bus is what connects the CPUs and Memory-controllers together.
Besides I wanted to replace the i7 920, so that it won't be that hot anymore in that room (130W TDP vs 65W). I think a threadripper would achieve the opposite.
Maybe I should just do seasonal CPUs... Threadripper in Winter and Ryzen in Summer.
Running 2 gaming VMs for me and my kids and one ubuntu WS on a 12 core Threadripper. Host is also doing file serving and runs Unifi controller.
One server to rule them all.
The LGA1366 motherboards still fetch some money too, if you'd rather sell it.
I can understand HPC applications where the high-speed interconnect on the chip would make a big difference.
But in business applications where the cores are dedicated to running independent VMs, or are handling independent client requests, what is really gained? There would still be some benefits from a shared cache, but how large quantitatively would that be?
On a TR 1920x system:
$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
node 0 size: 32107 MB
node 0 free: 20738 MB
There's a BIOS setting. I personally enabled it using AMD's "Ryzen Master" program to set up NUMA mode (aka "Local" mode in Ryzen Master).
 - https://www.anandtech.com/show/11697/the-amd-ryzen-threadrip...
e7-4860:~ Mon Jun 11
03:06 PM william$ numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 40 41 42 43 44 45 46 47 48 49
node 0 size: 16035 MB
node 0 free: 1306 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 19 50 51 52 53 54 55 56 57 58 59
node 1 size: 16125 MB
node 1 free: 3237 MB
node 2 cpus: 20 21 22 23 24 25 26 27 28 29 60 61 62 63 64 65 66 67 68 69
node 2 size: 16125 MB
node 2 free: 11004 MB
node 3 cpus: 30 31 32 33 34 35 36 37 38 39 70 71 72 73 74 75 76 77 78 79
node 3 size: 16123 MB
node 3 free: 12044 MB
node distances:
node   0   1   2   3
  0:  10  20  20  20
  1:  20  10  20  20
  2:  20  20  10  20
  3:  20  20  20  10
On this server, CPU socket 0 is hardwired to RAM slots 0-15,
CPU 1 to RAM slots 16-31,
CPU 2 to RAM slots 32-47,
and CPU 3 to RAM slots 48-63.
If CPU 0 wants to read something outside its local RAM slots, the request has to go over the inter-socket link to the remote node's memory controller, which is noticeably slower than a local access.
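The distance table `numactl` prints above can also be read programmatically: `10` means a node's own memory, `20` means one interconnect hop away. A small sketch using the standard Linux sysfs path `/sys/devices/system/node` (on a non-NUMA or non-Linux machine this simply returns an empty or single-node result):

```python
from pathlib import Path

def node_distances() -> dict[int, list[int]]:
    """Read the NUMA distance matrix from Linux sysfs.
    Distance 10 = local node; 20 = one interconnect hop (remote)."""
    dists = {}
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        n = int(node.name[len("node"):])
        dists[n] = [int(x) for x in (node / "distance").read_text().split()]
    return dists

print(node_distances())  # e.g. {0: [10, 20, ...], 1: [20, 10, ...], ...}
```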
I've got a Threadripper 1950x and got 2x NUMA nodes. You gotta enable a BIOS setting.
Second: "$ numactl --hardware " is a Linux command. The Windows equivalent is coreinfo.
In my use case, the core/thread count really helps DB2's SQL implementation, as an iSeries is effectively a giant DB2 database with extras added on. Hence the query engine (SQE/CQE; see the old doc linked below) on our machine can make great use of many cores/threads. When serving data to intensive batch applications, as well as thousands of warehouse users and double that through web services, access to data is the name of the game.
 https://www.ibm.com/support/knowledgecenter/en/ssw_i5_54/rza... <- that is quite a few years old but describes the query engines available - CQE is 'legacy' and SQE is modern
I've seen several mainframe companies dogmatically believing their sales rep that their workload is special and needs a high-end system. But none of the ones I've talked to have actually tested it for themselves.
First is that a single-socket motherboard is still a simpler design to produce with all the advantages that entails.
Second is that you’re allowed to stick two of these on a two-socket board for CPU-bound loads. Better density for when you have the thermal capacity to spare.
Databases are the big one I'm aware of.
Intel's L3 cache is truly unified. On Intel's 28-core Skylake, a database really does get a single 38.5MB L3: when any core requests data, it goes into one giant distributed L3 cache that all cores can access efficiently.
AMD's L3 cache however is a network of 8MB chunks. Sure, there's 32MB of cache in its 32-core system, but any one core can only use 8MB of it effectively.
In fact, pulling data out of a "remote" L3 cache is slower (higher latency) than pulling it from RAM on the Threadripper/EPYC platform. A remote L3 pull has to coordinate over Infinity Fabric and remain coherent: under the MESI protocol, that means "invalidating" other copies and waiting to become the "exclusive" owner before a core can start writing to an L3 cache line. (I know AMD uses something more complex and efficient, but my point is that cache coherence has a cost that becomes clear in this case.) Which doesn't bode well for any HPC application, but also for databases, which will effectively be locked to 8MB per thread with poor sharing, at least compared to Xeon.
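You can see the "L3 in chunks" layout for yourself on Linux: sysfs exposes, per cache, which CPUs share it. The sysfs paths below are standard Linux; on a chiplet design each L3 slice lists only a few cores, while a monolithic Xeon lists all of them. A sketch, with the range-parsing helper doing the real work:

```python
from pathlib import Path

def parse_cpu_list(s: str) -> list[int]:
    """Expand a sysfs CPU list like '0-3,8-11' into [0,1,2,3,8,9,10,11]."""
    cpus = []
    for part in s.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus

def l3_groups() -> dict[str, list[int]]:
    """Map each distinct L3 slice to the cores that share it (Linux sysfs)."""
    groups = {}
    for cache in Path("/sys/devices/system/cpu").glob("cpu[0-9]*/cache/index*"):
        if (cache / "level").read_text().strip() == "3":
            shared = (cache / "shared_cpu_list").read_text().strip()
            groups[shared] = parse_cpu_list(shared)
    return groups

# On a Threadripper/EPYC you'd expect several small groups (one per CCX)
# rather than one group spanning every core.
print(l3_groups())
```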
Of course, "databases" might just be the most common HPC-ish application in the enterprise that needs communication and coordination between threads.
This is less true now. Intel's L3 cache is still all on one piece of monolithic silicon, unlike the 4 separate caches of the 4 separate dies on a 32-core TR. But the L3 slice for each core is now physically placed right next to the core and other slices are accessed through the ringbus or in Skylake and later, the mesh. Still faster than leaving the die and using AMD's Infinity Fabric, and a lot less complicated than wiring up all the cores for direct L3 access.
However, when communication is necessary, the length of the bus matters, and having the dies next to each other does help a lot.
On the CPU side, AMD has patched Linux way before Ryzen was available in shops and has been contributing various patches afterwards.
I'd say they are working to get a decent track record for their Ryzen and Vega lineups.
Being one of the top contributors in Linux/OSS/FLOSS doesn't allow them to come clean from their 28-core, inadvertently but conveniently miscommunicated demo.
Intel's iGPUs had the best Linux drivers for the longest time, while AMD just managed to mainline their GPU drivers into the 4.15 kernel.
I think AMD is "worse" at it but for understandable reasons. They're a smaller company, so it takes a bit longer for AMD to release low-level manuals. (Ex: still waiting on those Zen architecture instruction timings AMD!!). Even then, AMD releases "draft" manuals which allow the OSS world to move. So the important stuff is still out there, even as AMD is a bit slow on the documentation front.
Basically, Intel is a bigger company and handles the details a bit better than AMD. But both are highly supportive of OSS.
Unsurprisingly, older generation models tend to dominate that list. Keep in mind that the thermal efficiency for Xeon v2 and v3 models is lower than v4; Passmark does not include power draw in this ranking. If you can afford the extra power usage and don't mind DDR3 RAM, high clockspeed + high core CPUs can be had relatively cheaply by going with V2.
Intel has done an amazing job stuffing 28 cores into one piece of silicon and extracting as much performance as possible, all for the low price of $10k.
AMD took their 8 core part that they are selling essentially up and down their product line... and slapped 4 of them together.
Also, Intel's $350 8700K was cheaper at launch than AMD's $450 1800X even though the 8700K is faster in gaming.
When AMD fabs a TR2, they have to find 4 good Ryzen dies. Those dies are fairly small, and AMD makes a lot of them since they're used across the entire lineup. Once they have 4 good dies, they get glued together.
If they wanted 64 cores, they'd just have to find 8 good Ryzen dies instead of 4 -- twice as many good dies consumed per package, but no hit to per-die yield.
On the Intel side they have to increase the silicon area and then hope that all 64 cores are capable of full core speed in the current setup.
Glueing CPUs together makes it cheap to scale at little cost.
AMD's advantage of picking and choosing smaller parts still reigns supreme. If an Infinity Fabric component on an AMD die is defective enough that it has to be turned off, the die simply can't participate in some of the more complex multi-die pairings. If the mesh component on an Intel die is defective, the core (two, actually, because of SKU segmentation) has to be fused off, and the part automatically bins as a lower SKU. As the mesh increases in complexity, more and more of the design can be compromised.
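The yield argument above can be put into rough numbers with the classic Poisson defect model, yield ≈ exp(-D·A). The defect density and die areas below are made-up illustrative values, not AMD's or Intel's actual numbers:

```python
import math

# Poisson defect model: yield ~ exp(-D * A).
# D = defects per cm^2, A = die area in cm^2. Numbers are illustrative only.
D = 0.2            # assumed defect density
small_die = 2.0    # assumed area of one 8-core chiplet
big_die = 8.0      # assumed area of a hypothetical monolithic 32-core die

chiplet_yield = math.exp(-D * small_die)     # ~0.67
monolithic_yield = math.exp(-D * big_die)    # ~0.20

# Chiplets are binned individually before packaging, so a 4-die package
# just consumes 4 / chiplet_yield fabbed dies on average; going to 8 dies
# doubles that cost linearly instead of cratering yield exponentially.
dies_per_32c_package = 4 / chiplet_yield
print(f"chiplet yield {chiplet_yield:.2f}, monolithic yield {monolithic_yield:.2f}")
print(f"avg dies fabbed per 32-core package: {dies_per_32c_package:.1f}")
```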
And it is still $960 on Newegg: https://www.newegg.com/Product/Product.aspx?Item=N82E1681911...
The "price-competitive" i9-7900x is 10-cores for $799, and seems to be the best price-competitive comparison. Better single-thread, better at AVX512 tasks, but weaker in general purpose multithreading due to having fewer cores.
As of 2018-06-11 18:27 PDT (when I clicked on that link) the current Newegg price is $799.99.
AMD’s interconnect seems fast enough, and they don’t have the yield/cost problems from massive single die chips.
Edit: initial reports said that AMD was only planning to announce the 24-core CPU, and may have advanced the announcement of the 32-core chip due to Intel's stunt. TFA doesn't mention that, so possibly the initial reports were not accurate.
AMD is already set to launch their 7nm EPYC processor based on Zen 2 in 2019 (skipping the Zen+ used by the new Threadripper and Ryzen 2xxx), which is expected to have 48 cores (some rumors even suggest 64, but that seems more likely for 7nm EUV than for the first 7nm processes). So they will have no problem releasing more cores with Threadripper 3 next year (if they keep up the yearly releases).
On top of that, to my layman eyes, AMD's approach of using Infinity Fabric to connect dies seems better suited to reacting to changes than Intel's monolithic design.
Each core in those chips was seriously underclocked compared to a Xeon of similar vintage and price point (1-1.67 GHz; compared to 1.6 GHz to 3 or more), and lacked features like out-of-order execution and big caches that are almost minimum requirements for a modern server CPU. Sun hoped to make up for the slow cores in server applications with having more cores and having multiple threads per core (though with a simpler technology than SMT/hyper-threading).
However, Oracle eventually decided to focus on single-threaded performance with its more recent chips - it turns out that no OoO and < 2 GHz nominal speeds look pretty bad for many server applications. My suspicion is that even though the CPU-bound parts of games are becoming more multi-threaded, AMD will be forced to fix its slower architecture or lose out to Intel again in the server AND high-end desktop markets in a few years.
As always, if you really care about a single application, then you test it. But I wouldn't say that Intel wins on all single thread or few thread workloads anymore.
Especially if you consider that often a new Intel CPU requires a new Intel motherboard, and AMD often keeps motherboard compatibility across multiple generations, like the Ryzen and Ryzen 2.
Even Z170 can run the 8700K.
Z170 (and Z270) needs a cooked BIOS, and the CPU needs a pin short (easy with a soft 4B pencil) -- and it can even be overclocked if the motherboard VRM is good enough.
Wasn't that 10-20% before meltdown? Or is there still some similar disadvantage in clock speed per Watt or IPC?
And I'm not sure it's possible to fairly discount Intel's performance with meltdown mitigations applied. I think the impact will vary depending on workload.
That's why I wanted to keep it out of the discussion and just mentioned Meltdown, AFAIK Spectre applies to both so it would be pure speculation to identify who'll be hit harder.
> And I'm not sure it's possible to fairly discount Intel's performance with meltdown mitigations applied. I think the impact will vary depending on workload.
I think we have this problem already all the time (with or without mitigations applied), that's why we (should) interpret benchmarks only as a proxy.
That makes sense. Many people conflate the two, so I just wanted to be explicit about what I was saying :-).
> I think we have this problem already all the time (with or without mitigations applied), that's why we (should) interpret benchmarks only as a proxy.
That's totally reasonable. I think there were some discussions of the impact of the Meltdown patch (on Intel performance) on the LKML list at the time the patch(es) were being reviewed. (Other OS may have different perf impact for their Meltdown mitigations, of course, but it helps ballpark.)
Here's some discussion on anandtech, although it doesn't measure Spectre mitigations alone vs Spectre+MD; only base, MD alone, and MD+Spectre:
Most of that clock hit comes from the 12nm LPP process that AMD is currently using, from what I can tell. A low-power process typically equates to lower clocks (see mobile chips), so it's not surprising -- and it's why Zen 2, based on GloFo's 7nm process, will hopefully close that gap.
Personally, considering the amount of random crap that I run, I'd rather have more cores. And more importantly, 80% of the performance for 50% of the price is just fine(tm).
More cores do benefit if you run stuff besides the game, which most people do.
Sadly for AMD, that's only if I needed a new machine. My 2012 Core i7 still seems to be enough for my needs (except the GPU, which I replaced recently).
That's not the base clock, at any rate: you can't get all cores at 4.7GHz stock; 4.7 is a single-core turbo boost. There's no way you get that without a pretty decent cooler on a non-delidded CPU, unless you happen to have a chip that requires no extra voltage at all.
It's true that many games are predominantly single-core, but it's likely you'd get a single core to around 4.5GHz on Ryzen as well.
Intel is mainly faster because of a significant clock advantage. Clock for clock the advantage is small.