Desperate move by Intel. They are stuck with Cascade Lake until 2021, when their 'new' architecture will be available.
In order for this Xeon to stay under a 300W TDP, they disabled Hyper-Threading, and when they benchmarked it against AMD's Epyc they disabled AMD's SMT as well. Just wow.
Not only that, but they also apparently recompiled Linpack with the Intel compiler -- which is notorious for favoring Intel chips -- before running the benchmarks. Some really shady stuff going on here.
On the other hand, these shady practices are a testament to AMD's technical superiority: the incumbent is showing itself to be desperate to react even though it has absolutely no answer to AMD's new line of products.
I like to hate on Intel as much as the next guy, but as someone currently shopping for a new workstation, I still see Intel as having the better offerings across the board, at least if you're not taking price into consideration. Take for example their HEDT competitor, Threadripper. As much as I want to like it, the fact that two of the dies don't have direct memory access really is a problem. You can see the detrimental effects this can have: https://www.anandtech.com/show/13516/the-amd-threadripper-2-...
AMD does not have technical superiority, as Intel cores have better single-thread performance. Look at the recent low-end i3-8100 - it is an amazing chip.
But Intel won't even give that to you. The i3-8100 doesn't have turbo boost, which means it's stuck at 3.6GHz and typically doesn't have faster single thread performance than AMD's faster chips. If you want Intel to sell you the one that hits 5GHz you have to pay for more cores which, if all you care about is single thread performance, you can't even use.
Single-threaded performance is a toss-up and depends on workload. Overall AMD beats them on price/performance and multithreaded performance, which matter more for everything but some games and a few not easily parallelizable tasks.
> Single-threaded performance is a toss-up and depends on workload.
I would actually say the exact opposite is true. Single-threaded performance is much more reliable and every single application can use it. Multithreaded performance is much more workload dependent, and there are many applications that can’t fully utilize it.
Did they run the icc-generated binaries on the Epyc processors? That's... questionable.
Apart from that, I'm fine with seeing numbers from icc because, if I'm shopping for the highest possible performance, that's probably the compiler I'll use for my code.
Performance results are based on testing or projections as of 6/2017 to 10/3/2018 (Stream Triad), 7/31/2018 to 10/3/2018 (LINPACK) and 7/11/2017 to 10/7/2018 (DL Inference) and may not reflect all publicly available security updates.
LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High-Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095GFs, tested by Intel as of July 31, 2018. compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
Stream Triad: 1-node, 2-socket AMD EPYC 7601, tested by AMD as of June 2017 compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018. DL Inference:
Platform: 2S Intel Xeon Platinum 8180 CPU 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).
I really dislike Intel's marketing practices. And their history of how they've treated AMD. Intel is a reputable company with strong products, yet they sometimes act like kids.
That didn't turn out well. Once the Wintel standard got established, other CPU architectures were relegated to roles outside the desktop. Desktop and all but the highest end and specialized niches of servers is x86-land.
And Intel implemented the AMD64 instruction set for its own CPUs.
Are they using the ICC-compiled Linpack on the AMD chips too? I think it's totally fair as long as they used GCC or another best-in-class compiler for AMD.
One theory is that the new Xeon comes with no HT to keep the power draw under 300W, which is why the footnotes mention no HT/SMT in the test configuration...
This would also "fix" a lot of side channel attacks without new silicon being ready and Intel claims this CPU includes some hardware mitigations.
I'll be one of those to say that HT/SMT should probably die off unless the chipmakers learn how to make it secure. I'd rather have twice the physical cores for a little higher price anyway.
However, Intel starting to sell chips without HT has nothing to do with the insecurity of HT and everything to do with the fact that they are trying to sell chips that are cheaper to make (or more profitable), while also being a little more competitive (price-wise) with AMD's offerings.
But while they do this, they know they'll have a severe handicap in performance as long as AMD continues to sell competing products with SMT enabled. So their "solution" is to start making it "fair" in their benchmarks by using only half the cores or the SMT disabled for competing AMD products.
One of the first clues that tell us this move is not a response to the recent security vulnerabilities of HT is that they started offering chips without HT only a few months after those discoveries were made. Do you really think something like planning to design and ship a product without HT happened only a few months earlier? No, this was decided at least a year and a half ago.
Not everyone runs programs they don't trust on hardware they don't own. These side-channel attacks are really only an issue for consumers with JIT'd JavaScript, and for enterprises with shared hosting. For HPC, or consumer/professional applications (e.g. gaming, rendering, encoding), i.e. all those places where performance actually really matters, SMT is a great way to get a decent boost in a lot of workloads.
It seems like something that should be improved at the OS level, where some workloads are untrusted and can't be HT'd next to others, whereas trusted software can be.
I don't think we can know for sure why they disabled HT. There's arguments for both. It's certainly possible that during the long Spectre embargo, someone was looking around and realized HT was going to be a problem. It's also not that hard to disable HT -- it's a flag somewhere in the on chip settings, in some parts of their product line, it's already done as part of binning and market segmentation. It's just a matter of deciding to have more SKUs with that set. Actually removing the die space used for HT would be different, but then we'd see chip families released without HT in any SKU.
L1 and L2 (IIRC) are usually tied to a single physical core and, also IIRC, are partitioned between logical cores when HT is enabled. With HT disabled, cache per logical core doubles (because there is only one) and cache misses go down. Many HPC installs run with HT disabled because of that (YMMV depending on specifics of your workload - if your hot data fits in the cache, you'll be happy).
Now that we mention it, I wonder if HT can be enabled or disabled per core. If that's possible, the machine would look like it has some logical cores with more cache than others and we could use cache miss counters as input to decide where a process should run.
>With HT disabled, cache per logical core doubles (because there is only one) and cache misses go down.
That's too naive a way to look at it. Yes, if there is less scheduling on the core (less concurrency) the likelihood of misses is lower, but the whole idea is to pay less for each miss by effectively doing something else in the meantime instead of being stalled on memory.
>Now that we mention it, I wonder if HT can be enabled or disabled per core.
The OS scheduler can do that. Normally the guidance for hyperthreading is: consider each "thread" as a separate core. If the OS chooses to schedule work on only one logical core of each pair, hyperthreading won't be used. That is, imagine logical core 0 and core 1 represent the same physical one: if the OS schedules only onto core 0, hyperthreading is not in play.
Notes: controlling scheduling on Linux is done via taskset, and core info is available via "cat /proc/cpuinfo".
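As a minimal sketch of what that looks like in practice (assuming a Linux box; the benchmark binary name is a placeholder), you can check which logical CPUs are siblings of the same physical core and then pin a run to only one of each pair:

    # Show which logical CPU maps to which physical core
    lscpu -e=CPU,CORE
    # or, per logical CPU, list its HT siblings
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

    # If siblings pair up as 0/16, 1/17, ..., pinning to 0-15 puts one
    # software thread on each physical core and leaves the HT siblings idle
    # (./my_benchmark is a made-up name)
    taskset -c 0-15 ./my_benchmark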
Back when the first CPUs with HT came onto the market, I got a developer workstation with them.
After trying out WebSphere developer tools (RAD) with HT enabled and disabled for several days, I eventually came to the conclusion that with HT disabled everything just felt faster.
Also it is not clear if the security issues caused by HT, which affect other vendors as well, could ever be properly sorted out.
The first HT processors had smaller caches and that was before operating systems had SMT-aware schedulers.
The place where SMT really seems to benefit is in particular applications that can use arbitrarily many threads. This also points to a solution to the security issue, because you get most of the benefit even if you only schedule threads from the same process group together on the same core and it would be possible for programs to specify whether it's safe to do that (e.g. yes for a render, no for a web browser).
That would also improve SMT performance in general since SMT makes the performance of some programs worse, and they could use the same mechanism to opt out of it.
True, but the scheduler can't tell the CPU to allocate all of the partitioned cache to a thread if the silicon thinks it belongs to another logical core, so you end up with effectively half of your cache dark. If L1 and L2 are pooled between the logical cores, it makes no difference, but I'm not sure how HT behaves WRT partitioning caches on current generations of Intel parts, and I remember it used to be like that (because I turned off HT on a couple dozen boxes to improve single-thread performance, which was what was bothering me).
Got back from Intel's website. It seems the cache was partitioned on early HT implementations, but now L1 and L2 are shared between the logical cores and, therefore, it should make no difference to disable HT.
The caches aren’t “partitioned” between logical cores. You have two cores generating memory requests, and the combined working set could be larger than the caches, but there is no partitioning going on, in the sense of allocating cache lines to one thread or the other.
> also IIRC, are partitioned between logical cores when HT is enabled.
A lot of things are partitioned. But my understanding is that L1 and L2 cache are competitively shared. Which means if one thread only uses 512 bytes of cache, the OTHER thread can use nearly the full 32kB for itself.
There are other benefits: Both hyperthreads can share L1-code cache for instance, if there's a tight loop that is shared between the two threads.
Lower-level things like the retirement engine (aka: shadow registers / out-of-order registers) are partitioned. I dunno whether or not the branch predictor is shared or partitioned. But... things like that.
Do you see any performance differences when enabling/disabling? Our cluster uses Sandy Bridge through Skylake chips and all of our nodes have HT disabled. When things were tested (before I got here) there apparently was a non-trivial difference for the applications running. Curious what your results are.
The only significant issue I’ve seen is that many libraries using threading are, or were, a little naive in that they’d take the core count to be the thread count. Better libraries know the difference.
But I’m not aware of any benchmarks that pin one execution thread per physical core and compare with and without hyperthreading. I guess I have some homework to do, but then it’s a lot of effort when the results could depend on generation, clock speed etc.
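For anyone who wants to try, here's a rough sketch of one way to do that comparison on Linux, assuming a hypothetical 16-core/32-thread box where CPUs 16-31 are the HT siblings of 0-15 (the benchmark name is made up):

    # Pinned to one thread per physical core, siblings still online (HT on)
    taskset -c 0-15 ./my_benchmark

    # Take the sibling logical CPUs offline, effectively turning HT off...
    for n in $(seq 16 31); do echo 0 | sudo tee /sys/devices/system/cpu/cpu$n/online; done
    taskset -c 0-15 ./my_benchmark

    # ...and bring them back when done
    for n in $(seq 16 31); do echo 1 | sudo tee /sys/devices/system/cpu/cpu$n/online; done

The results will of course depend on generation, clock speed, and workload, which is exactly the commenter's point.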
The most pertinent technical questions and concerns seem to be around the interconnects, since inter-die bandwidth and latency are big factors in performance. I think the major surprise/disappointment there is that within the MCM they apparently are not using their Embedded Multi-die Interconnect Bridge (EMIB), which was supposed to be their next-gen interposer replacement. They have been pretty flowery about it, right up to the official page [1] having fun with a Star Wars riff and calling EMIB "an Elegant Interconnect for a More Civilized Age". Looks like it's all UPI here though. According to Cutress it also isn't clear exactly how many UPI links are used between the two sockets.
It's not like MCM usage is some necessarily exotic thing, IBM has been doing it in various forms including for POWER forever. Even beyond TDP however Intel's approach here definitely seems kind of "odd" to put it neutrally. I think if this is going to be a major part of their strategy going forward they'll need a different approach, but for that very reason I wonder if this is more of a one-off placeholder?
Most of the pins are Vcc and GND, but still impressive indeed. The 6502 had 3,500 transistors and 40 pins... and multiple addressing modes (with zero page+Y being my fav).
I find the timing of this announcement a bit amusing. Given AMD's planned 'new horizon' announcements today, I'm assuming Intel rushed out this press release over the weekend in order to preempt AMD.
It lets all those would-be EPYC2 buyers know about, or at least keep in mind, an upcoming Intel chip, hopefully delaying the purchase decision until later and giving Intel's sales team time to do their job. Which, so far, their sales team has been doing spectacularly well: literally everyone who has been complaining about Intel continues to buy Intel. Sad but true.
We should know more about Zen 2 in a few hours time.
> and literally everyone who has been complaining about Intel continues to buy Intel
While I am sure this isn't LITERALLY true, there is some truth to it. I work at a smallish company, which bought 3 new Intel servers about 5 months before Epyc servers started to be available from OEMs. We plan on adding another server to our stack next year. I would LOVE to go with Epyc, but I am locked into Intel at this point (until we decide to replace the entire stack). VMware doesn't allow you to mix CPU architectures and maintain HA. So we're stuck buying a crappy Intel Xeon instead.
Judging from the server shipment and market share numbers, this is exactly what is happening. But to be fair, many fought for EPYC; it's just that those who made the decisions, the CEO / CFO / CIO / purchasing director etc., are all well connected with Intel.
> Add that to AMD's recent surge, Intel's in a world of hurt.
Meh. Looking at the sales numbers of past quarters, AMD still hasn't significantly hurt Intel. They have made a lot of inroads among hobbyists and in small-store prebuilts, but the major brands still appear to be dominated by Intel.
It's a real shame that nobody seems to have made an actually good Raven Ridge laptop. The CPU is imho objectively better for laptop use than any Intel competitor, but all laptops built with it appear to have a major deficit of some kind or another.
Another mystery is lack of Epyc uptake. There are a lot of workloads where Epycs should be dominating Intel server chips when compared at the same price or power level, especially after all the recent chip vulnerabilities which are much less serious for AMD than they are for Intel, but Epyc sales are definitely lagging. Is AMD unable to make them, or are companies just wary of buying AMD?
There's a lot of interest in Epyc for certain use cases. It's just the server world moves slowly because when you're spending millions on massive amounts of hardware, and it needs to run 24/7 for years without defects, you want to make sure it works.
THIS! I am not sure why people are expecting Epyc to take off like a rocket; it won't. Companies maintaining 24/7 hardware don't jump on new tech. I think Epyc will eventually take a good chunk of the market, but it will take a few years.
Also, if you are expanding an existing VMware stack you can't mix CPU architectures and have HA. So unless you are building a new stack, Epyc might not even be an option, even if the admin would prefer Epyc.
Cloud moves slowly because you don't want to force users to redeploy and, once racked, those boxes stay there for a good couple of years. In-house, ever-expanding large installs are much more agile. Look at how Google adopted POWER, or how Cray is offering ARM nodes in their supercomputers.
So, I work in what is effectively a sysadmin role for a decently large deployment in a decently large company, and I can say that we're buying EPYC for some stuff but not everything _yet_.
Build farms, for instance, where a lot of compute is needed and the reliability of individual systems doesn't really matter, have EPYC. Some workstations have Threadripper (nobody has Ryzen). As time goes on, with AMD beating Intel in the middle market, we will expand from builders to VM farms, from VM farms to bare metal hosting, and from bare metal hosting to underpinning entire projects.
It will take another 2 years of AMD being on top for our company to favour AMD. But I think it's possible.
It's not the number of sales alone that tells the story. It's also the profit margin. Apple is well known for allowing very low margins to their suppliers. It's not clear to me how much of Intel's revenue comes from Apple.
> Don’t underestimate AMD’s ability to shoot itself in the foot.
Every single consumer-targeted Intel chip has integrated graphics capable of displaying a 4K movie on the screen/TV without so much as breaking a sweat. AMD has exactly 3 Ryzen chips with that capability, none of which fit into their "Lots of cores!" strategy. I'm not sure that AMD understands what drives the majority of consumer purchases.
Non-technical, non-gaming consumers don't care about integrated graphics. They're fine with whatever's in the box. But for folks like me (and many others here) who build their own machines, we typically don't settle for integrated graphics. I have two 4K monitors and I occasionally game. Sure, the latest and greatest Intel chips can probably handle the multi-monitor setup, but I doubt you can play Call of Duty: Black Ops 4 on them and expect a reasonable framerate.
It takes a long time to design and fab new layouts. So at best you would get firmware mitigations baked in. Or, as mentioned in other comments, the HT sidechannels might be fixed simply because HT is turned off.
Intel also did another nasty one, after being caught lying about an earlier benchmark done the same way: they disabled EPYC's SMT, likely because this chip doesn't have HT, so they didn't want AMD's chip to benefit from it either, even though in real-world scenarios it would.
The die shot looks to have 15 core modules at 2 cores each. I assume the 28-core variant lasers out 2 for yields and clock speeds. I guess this one would then be cutting out 6 cores per chip. Big chips really aren't the way forward.
Per the caption it’s an Ivy Bridge, i.e. not the Xeon the article talks about.
There’s an annotated die photo of the CLAP (seriously, did nobody in Intel marketing predict “Intel gives you the CLAP” and “Intel CPUs have the CLAP” and put the kibosh on this name?!) predecessor die at [1].
The Intel Xeon Platinum 8180 currently sells for $10k USD each and comes with 28 cores. For 48 cores and 12 memory channels, you can almost be assured of a price tag close to $20k USD. So basically, when they fail to come up with a better architecture to face the competition from AMD, they choose to glue together their out-of-date designs (out of date compared with the Zen 2-based new Epyc line) and sell them to you as a bundle without a significant price cut!
To be honest, I am not surprised - we are talking about a company that worked so hard for 10 years to fool its customers into thinking they don't need more than 4 cores on the desktop. The whole joke is so INTEL!
If it costs twice as much per socket but the same per core, and allows me to use half as much space, half as much cooling, and 25% less energy to run the same workloads for the next five years, I'm in.
It's wild that so many people here think that e.g. major cloud vendors use the same purchasing logic for multi-hundred-million dollar deals on tens/hundreds of thousands of units, that a 17 year old uses when deciding how to spend $700 on a new computer. "They're scared", "Zen is a superior product", "EPYC/Rome will blow them away", etc. It doesn't actually matter.
Not only do many customers not pay the list price you see, they get custom SKUs specifically designed for them -- which is a growing portion of their Xeon business and one AMD has no clear, large-scale answer to AFAICS. And they basically can't get enough of them. Xeons practically print money and Intel recently announced something like 16% revenue increases and even higher demands for Xeon than they've ever had. They just invested an extra billion in new fab production just because they're at capacity right now...
Intel is going to remain very strong in this market for a good, long time as far as I can tell, and things like list SKU prices are an incredibly, ridiculously small reason to believe otherwise.
That said I'm very happy with the price/power ratio of my TR 1950X.
> which is a growing portion of their Xeon business and one AMD has no clear, large-scale answer to AFAICS
Probably depends on the size of the company. AMD has their semi custom department which was responsible for the PS4 and Xbox One, as well as PS4 Pro, Xbox One X (the GPUs are based on Polaris but have features from Vega, which no normal Polaris GPU has) and Subor Z (Zen APU with GDDR5 as main memory like the consoles, but can run Windows).
AMD is probably the better vendor if you want custom SKUs (as the underdog they also have more incentive for custom SKUs).