Intel announces Cascade Lake Xeons: 48 cores and 12-channel memory per socket (arstechnica.com)
149 points by rbanffy 7 months ago | 104 comments

Desperate move by Intel. They are stuck with Cascade Lake until 2021, when their 'new' architecture will be available. To keep this Xeon under 300W TDP, they disabled Hyper-Threading, and when they benchmarked against AMD Epyc they disabled AMD's SMT as well. Just wow.

Not only that, but they also apparently recompiled Linpack with the Intel compiler -- which is notorious for favoring Intel chips -- before running the benchmarks. Some really shady stuff going on here.

On the other hand, these shady practices are a testament to AMD's technical superiority: the incumbent is showing itself to be desperate to react, even though it has absolutely no answer to AMD's new line of products.

That's not something you can cite or point to when justifying a purchasing choice to $pointy_haired_boss

I like to hate on Intel as much as the next guy but as someone currently shopping for a new workstation, I still see Intel as having the better offerings across the board, at least if you're not taking price into consideration. Take for example their HEDT competitor, Threadripper. As much as I want to like it, the fact that two of the cores don't have direct memory access really is a problem. You can see the detrimental effects this can have: https://www.anandtech.com/show/13516/the-amd-threadripper-2-...

AMD does not have technical superiority, as Intel cores have better single-thread performance. Look at the recent low-end i3-8100 - it is an amazing chip.

But Intel won't even give that to you. The i3-8100 doesn't have turbo boost, which means it's stuck at 3.6GHz and typically doesn't have faster single thread performance than AMD's faster chips. If you want Intel to sell you the one that hits 5GHz you have to pay for more cores which, if all you care about is single thread performance, you can't even use.

Single-threaded performance is a toss up and depends on work load. Overall AMD beats them on price/performance and multithreaded performance, which matter more for everything but some games and a few tasks that are not easily parallelizable.

> Single-threaded performance is a toss up and depends on work load.

I would actually say the exact opposite is true. Single threaded performance is much more reliable and every single application can use it. Multithreaded performance is much more workload dependent, and there are many applications that can’t fully utilize it.

You might be surprised what is single-threaded bound in practice.

Such as video editors: Adobe Premiere in particular.


Did they run the icc-generated binaries on the Epyc processors? That's... questionable.

Apart from that, I'm fine with seeing numbers from icc because, if I'm shopping for the highest possible performance, that's probably the compiler I'll use for my code.

from this [0]

Performance results are based on testing or projections as of 6/2017 to 10/3/2018 (Stream Triad), 7/31/2018 to 10/3/2018 (LINPACK) and 7/11/2017 to 10/7/2018 (DL Inference) and may not reflect all publicly available security updates.

LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High-Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095GFs, tested by Intel as of July 31, 2018. compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.

Stream Triad: 1-node, 2-socket AMD EPYC 7601, tested by AMD as of June 2017 compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018. DL Inference:

Platform: 2S Intel Xeon Platinum 8180 CPU 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).
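For a sense of scale, the LINPACK footnote above can be turned into a couple of back-of-the-envelope numbers. This is only a rough sketch: N, the score and the installed memory are taken from the footnote, while the (2/3)·N^3 operation count is the usual leading-order estimate for LU factorization and is my assumption, not something stated in Intel's disclosure.

```python
# Rough numbers for the HPL run described in the footnote.
# N, the score and the DIMM configuration come from the footnote itself;
# the operation count is the standard leading-order estimate (assumption).

N = 168_960              # problem size ("N=168960")
SCORE_GFLOPS = 1095.0    # reported score ("1095GFs")
INSTALLED_GB = 16 * 32   # "16x32GB DDR4-2666"


def hpl_matrix_bytes(n: int) -> int:
    """Bytes needed for a dense n x n matrix of 8-byte doubles."""
    return n * n * 8


def hpl_flops(n: int) -> float:
    """Leading-order floating-point operation count for LU: (2/3) * n^3."""
    return (2.0 / 3.0) * n ** 3


def estimated_runtime_s(n: int, gflops: float) -> float:
    """Seconds needed to do hpl_flops(n) operations at the given rate."""
    return hpl_flops(n) / (gflops * 1e9)


matrix_gb = hpl_matrix_bytes(N) / 1e9
runtime_min = estimated_runtime_s(N, SCORE_GFLOPS) / 60
print(f"matrix size : {matrix_gb:.1f} GB of {INSTALLED_GB} GB installed")
print(f"runtime est.: {runtime_min:.0f} min")
```

If that estimate is in the right ballpark, the matrix fills a bit under half of the installed memory and the run takes somewhat under an hour, so this is a fairly ordinary HPL configuration; the contested parts are the compiler and the SMT setting, not the problem size.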


How bad is icc on EPYC compared to, say, gcc?

I really dislike Intel's marketing practices, and their history of how they treat AMD. Intel is a reputable company with strong products, yet they sometimes act like kids.

I think they have regretted many times over ever having licensed x86 designs to other companies.

Without AMD we would all be using some variation of Itanium.

Early on they had little choice. If they wanted to sell to the US government, they needed to allow other suppliers to compete.

Were the other suppliers required to compete with x86 clones?

They had to compete against Intel.

It's a quite sane policy: if you want to sell to me, make sure I'm not locked in with you.

They could compete with having better hardware instead of blank copies.

I remember our government had other computers besides PCs and none of them had CPU clones.

As far as I remember, the US government was a big customer for DEC, Xerox and IBM hardware, and those did not have any CPU clones either.

That didn't turn out well. Once the Wintel standard got established, other CPU architectures were relegated to roles outside the desktop. Desktop and all but the highest end and specialized niches of servers is x86-land.

And Intel implemented the AMD64 instruction set for its own CPUs.

True, but that is a completely different matter from having the government require Intel to sponsor clones.

Itanium had a great potential as the next-gen CPU arch. Now we're stuck with the heavily emulated x86.

It even had an x86 emulator

Even with all the shade, AMD still beats them in price/performance. That and performance/watt matter the most if you're building a big data center.

Are they using the ICC-compiled Linpack on the AMD chips too? I think it's totally fair as long as they used GCC or another best-in-class compiler for AMD.

One theory is that the new xeon comes with no HT to keep the power draw under 300W, which is why the footnotes mention no HT/SMT in test configuration...

This would also "fix" a lot of side channel attacks without new silicon being ready and Intel claims this CPU includes some hardware mitigations.

I'll be one of those to say that HT/SMT should probably die off unless the chipmakers learn how to make it secure. I'd rather have twice the physical cores for a little higher price anyway.

However, Intel starting to sell chips without HT has nothing to do with the insecurity of HT and everything to do with the fact that they are trying to sell chips that are cheaper to make (or more profitable), while also being a little more competitive (price-wise) with AMD's offerings.

But while they do this, they know they'll have a severe handicap in performance as long as AMD continues to sell competing products with SMT enabled. So their "solution" is to start making it "fair" in their benchmarks by using only half the cores or the SMT disabled for competing AMD products.

One of the first clues that tells us this move is not a response to the recent security vulnerabilities of HT is that they started offering chips without HT only a few months after those discoveries were made. Do you really think something like planning to design and ship a product without HT happened only a few months earlier? No, this was decided at least a year and a half ago.

Not everyone runs programs they don't trust on hardware they don't own. These sidechannel attacks are really only an issue for consumers with JIT JavaScript, and for enterprise with shared hosting. For HPC, or consumer/professional applications eg. gaming, rendering, encoding, ie. all those places where performance actually really matters, SMT is a great way to get a decent boost in a lot of workloads.

It seems like something that should be improved at the OS level, where some workloads are untrusted and can't be HT'd next to others, whereas trusted software can be.

I don't think we can know for sure why they disabled HT. There are arguments for both. It's certainly possible that during the long Spectre embargo, someone was looking around and realized HT was going to be a problem. It's also not that hard to disable HT -- it's a flag somewhere in the on-chip settings, and in some parts of their product line it's already done as part of binning and market segmentation. It's just a matter of deciding to have more SKUs with that set. Actually removing the die space used for HT would be different, but then we'd see chip families released without HT in any SKU.

It's still rather telling that they compare only with the equivalent also off on the AMD chip.

Especially since AMDs SMT is better.

It also doubles the effective amount of cache per core. I kind of like that.

do you mean the L2 or L3 cache?

Hyperthreading is there to help reduce the times the CPU is stalled.

If you run fewer active threads, the cache efficiency would be the same in that case, hyperthreading or not.

L1 and L2 (IIRC) are usually tied to a single physical core and, also IIRC, are partitioned between logical cores when HT is enabled. With HT disabled, cache per logical core doubles (because there is only one) and cache misses go down. Many HPC installs run with HT disabled because of that (YMMV depending on specifics of your workload - if your hot data fits in the cache, you'll be happy).

Now that we mention it, I wonder if HT can be enabled or disabled per core. If that's possible, the machine would look like it has some logical cores with more cache than others and we could use cache miss counters as input to decide where a process should run.

>With HT disabled, cache per logical core doubles (because there is only one) and cache misses go down.

This is too naive a way to look at it. Yes, if there is less scheduling on the core (less concurrency) the likelihood of misses is lower, but the idea is to pay less for each miss by effectively doing something else in the meantime instead of being stalled on memory.

>Now that we mention it, I wonder if HT can be enabled or disabled per core.

OS scheduler can do that. Normally the guidance for hyperthreading is: consider each "thread" as a separate core. If the OS chooses to use one logical core at a time and not schedule anything on its sibling, hyperthreading won't be used. That is, imagine logical core0 and core1 represent the same physical core: if the OS schedules only onto core0, hyperthreading is not in play.

Notes: controlling scheduling on Linux is done via taskset, and core info is available via "cat /proc/cpuinfo".
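To make the taskset idea concrete, here is a minimal sketch in Python of picking one logical CPU per physical core from /proc/cpuinfo-style text and restricting the current process to those CPUs. The sample text and the helper names (physical_core_map, one_thread_per_core, pin_current_process) are made up for illustration. It assumes the x86 Linux layout with "physical id" and "core id" fields, and it only keeps the sibling threads idle for this process; it does not turn HT off in hardware.

```python
import os
from collections import defaultdict


def physical_core_map(cpuinfo_text: str) -> dict:
    """Group logical CPU ids by (physical id, core id)."""
    cores = defaultdict(list)
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        cpu = int(fields["processor"])
        # Fall back to treating each logical CPU as its own core if the
        # topology fields are missing (assumption for non-x86 layouts).
        socket = int(fields.get("physical id", 0))
        core = int(fields.get("core id", cpu))
        cores[(socket, core)].append(cpu)
    return dict(cores)


def one_thread_per_core(core_map: dict) -> set:
    """Pick the lowest-numbered logical CPU of each physical core."""
    return {min(cpus) for cpus in core_map.values()}


def pin_current_process(cpuinfo_path: str = "/proc/cpuinfo") -> set:
    """Restrict this process to one logical CPU per physical core (Linux only)."""
    with open(cpuinfo_path) as f:
        wanted = one_thread_per_core(physical_core_map(f.read()))
    usable = wanted & os.sched_getaffinity(0)
    if usable:
        os.sched_setaffinity(0, usable)
    return usable


# Hypothetical sample: one socket, two physical cores, two threads each.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""

print(sorted(one_thread_per_core(physical_core_map(SAMPLE))))
```

On the sample, logical CPUs 0 and 2 share a core, as do 1 and 3, so the selection keeps one of each pair. From a shell, the rough equivalent would be passing the same CPU list to taskset -c.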

Back when the first CPUs with HT came into the market I got a developer workstation with them.

After trying out WebSphere developer tools (RAD) with HT enabled and disabled for several days, I eventually came to the conclusion that with them disabled everything just felt faster.

Also it is not clear if the security issues caused by HT, which affect other vendors as well, could ever be properly sorted out.

The first HT processors had smaller caches and that was before operating systems had SMT-aware schedulers.

The place where SMT really seems to benefit is in particular applications that can use arbitrarily many threads. This also points to a solution to the security issue, because you get most of the benefit even if you only schedule threads from the same process group together on the same core and it would be possible for programs to specify whether it's safe to do that (e.g. yes for a render, no for a web browser).

That would also improve SMT performance in general since SMT makes the performance of some programs worse, and they could use the same mechanism to opt out of it.
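A toy model of that policy, just to show the shape of the rule: two threads may share a physical core only if both opted in and both belong to the same process group. Everything here (the Task type, the smt_ok flag, the greedy placement) is hypothetical and is not how any real scheduler implements it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Task:
    name: str
    group: str    # process group the thread belongs to
    smt_ok: bool  # hypothetical opt-in: safe and worthwhile to share a core


def may_share_core(a: Task, b: Task) -> bool:
    """Siblings are allowed only for opted-in tasks of the same group."""
    return a.smt_ok and b.smt_ok and a.group == b.group


def assign_cores(tasks: List[Task]) -> List[Tuple[Task, Optional[Task]]]:
    """Greedy placement: one task per physical core, plus a sibling if the
    policy allows one. Otherwise the sibling hyperthread stays idle."""
    pending = list(tasks)
    cores = []
    while pending:
        first = pending.pop(0)
        sibling = None
        for i, candidate in enumerate(pending):
            if may_share_core(first, candidate):
                sibling = pending.pop(i)
                break
        cores.append((first, sibling))
    return cores


tasks = [
    Task("render-0", "render", True),
    Task("browser-tab", "browser", False),
    Task("render-1", "render", True),
    Task("browser-gpu", "browser", False),
]
placement = assign_cores(tasks)
for first, sibling in placement:
    print(first.name, "+", sibling.name if sibling else "(idle sibling)")
```

The two render threads end up on one core, while each browser thread gets a core to itself with the sibling left idle, which is the trade-off being described: the renderer keeps its SMT throughput and the browser never shares a core with anything.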

Yes I do agree the first HT generations were quite lame.

Regarding threads, I used to be very keen on them when they became available everywhere, yet nowadays I see them as yet another attack vector.

It is not only the security issues, also the overall application stability, as a buggy thread can bring the whole application down.

> OS scheduler can do that

True, but the scheduler can't tell the CPU to allocate all of the partitioned cache to a thread if the silicon thinks it belongs to another logical core, so you end up with effectively half of your cache dark. If L1 and L2 are pooled between the logical cores, it makes no difference, but I'm not sure how HT behaves WRT partitioning caches on current generations of Intel parts, and I remember it used to be like that (because I turned off HT on a couple dozen boxes to improve single thread performance, which was what was bothering me).

L1&L2 are exclusive to each designated core. L3 is shared (hence my very 1st question); also updated the info how to force scheduling on your own.

Got back from Intel's website. It seems the cache was partitioned on early HT implementations, but now L1 and L2 are shared between the logical cores and, therefore, it should make no difference to disable HT.

The caches aren’t “partitioned” between logical cores. You have two cores generating memory requests, and the combined working set could be larger than the caches, but there is no partitioning going on, in the sense of allocating cache lines to one thread or the other.

> also IIRC, are partitioned between logical cores when HT is enabled.

A lot of things are partitioned. But my understanding is that L1 and L2 cache is competitively shared. Which means if one thread uses 512 bytes of cache, the OTHER thread can use the full 32kB for itself.

There are other benefits: Both hyperthreads can share L1-code cache for instance, if there's a tight loop that is shared between the two threads.

Lower-level things like the retirement engine (aka: shadow registers / out-of-order registers) are partitioned. I dunno whether or not the branch predictor is shared or partitioned. But... things like that.

> Many HPC installs run with HT disabled

the ones I use (and admin) don't. Slurm for example handles threads vs cores explicitly, and smart defaults can be set.

Do you see any performance differences when enabling/disabling? Our cluster uses Sandy Bridge through Skylake chips and all of our nodes have HT disabled. When things were tested (before I got here) there apparently was a non-trivial difference for the applications running. Curious what your results are.

The only significant issue I’ve seen is that many libraries using threading are or were a little naive in that they’d take the core count to be the thread count. Better libraries know the difference.

But I’m not aware of any benchmarks that pin one execution thread per physical core and compare with and without hyperthreading. I guess I have some homework to do, but then it’s a lot of effort when the results could depend on generation, clock speed etc.

C'mon they removed 'only' the SMT, nothing like disabling half the cores of 2700x on the infamous 9900k launch game tests

Disabling half the cores can be (poorly) justified. "We enabled game mode on the AMD 2700x, wasn't that what we should have done?"

It's also supposedly not an actual benchmark, just a projection of what the (not yet actually existing) chip should perform like.

I would take the numbers with several bags of salt.

Someone watches AdoredTV..

Anandtech's article has more technical details and discussion:


The most pertinent technical questions and concerns seem to be around the interconnects, since inter-die bandwidth and latency are big factors in performance. I think the major surprise/disappointment there is that within the MCM they apparently are not using their Embedded Multi-die Interconnect Bridge (EMIB), which was supposed to be their next-gen interposer replacement. They have been pretty flowery about it, right up to having fun with a Star Wars riff on the official page [1] and calling EMIB "an Elegant Interconnect for a More Civilized Age". Looks like it's all UPI here though. According to Cutress it also isn't clear exactly how many UPI links are used between the two sockets.

It's not like MCM usage is some necessarily exotic thing, IBM has been doing it in various forms including for POWER forever. Even beyond TDP however Intel's approach here definitely seems kind of "odd" to put it neutrally. I think if this is going to be a major part of their strategy going forward they'll need a different approach, but for that very reason I wonder if this is more of a one-off placeholder?


1: https://www.intel.com/content/www/us/en/foundry/emib.html

Well... Pentium Pro was an MCM and that was a looong time ago.

I also remember Unisys Micro-A boards (the Unisys equivalent to IBM's PC/370). IIRC, they had 6 or 7 chips within a single package.

" … currently believed to be a 5903 pin connector."

This thing has way more connector pins than the 6502 (the processor that ran inside the AppleII) had _transistors!_

Most of the pins are VCC and GND, but still impressive indeed. The 6502 had 3500 transistors, 40 pins... and multiple addressing modes (with zero page+y being my fav)

I find the timing of this announcement a bit amusing. Given AMD's planned 'new horizon' announcements today, I'm assuming Intel rushed out this press release over the weekend in order to preempt AMD.

It lets all those would-be EPYC2 buyers know about, or at least keep in mind, an upcoming Intel chip, hopefully delaying their purchase decision and giving Intel's sales team time to do their job. So far that sales team has been doing spectacularly well, and literally every one who has been complaining about Intel continues to buy Intel. Sad but true.

We should know more about Zen 2 in a few hours time.

>and literally every one who has been complaining about Intel continues to buy Intel

While I am sure this isn't LITERALLY true, there is some truth to it. I work at a smallish company that bought 3 new Intel servers about 5 months before Epyc servers started to be available from OEMs. We plan on adding another server to our stack next year. I would LOVE to go with Epyc, but I am locked into Intel at this point (until we decide to replace the entire stack). VMWare doesn't allow you to mix CPU architectures and maintain HA. So we're stuck buying a crappy Intel Xeon instead.

Judging from the Server shipment and market share this is exactly what is happening. But to be fair, many fought for EPYC, but those who made the decisions, CEO / CFO / CIO / Purchasing Director etc are all well connected with Intel.

Just wait until Apple drops Intel. It's going to be a bloodbath.

Is Apple really a major customer of Intel? I’m not so sure.

I wouldn't say it's a particularly small number: https://www.statista.com/statistics/263444/sales-of-apple-ma...

Add that to AMD's recent surge, Intel's in a world of hurt.

> Add that to AMD's recent surge, Intel's in a world of hurt.

Meh. Looking at the sales numbers of past quarters, AMD still hasn't significantly hurt Intel. They have made a lot of inroads among hobbyists and in small-store prebuilts, but the major brands still appear to be dominated by Intel.

It's a real shame that nobody seems to have made an actually good Raven Ridge laptop. The CPU is imho objectively better for laptop use than any Intel competitor, but all laptops built with it appear to have a major deficit of some kind or another.

Another mystery is lack of Epyc uptake. There are a lot of workloads where Epycs should be dominating Intel server chips when compared at the same price or power level, especially after all the recent chip vulnerabilities which are much less serious for AMD than they are for Intel, but Epyc sales are definitely lagging. Is AMD unable to make them, or are companies just wary of buying AMD?

There's a lot of interest in Epyc for certain use cases. It's just the server world moves slowly because when you're spending millions on massive amounts of hardware, and it needs to run 24/7 for years without defects, you want to make sure it works.

> It's just the server world moves slowly

THIS! I am not sure why people are expecting Epyc to take off like a rocket, it won't. Companies maintaining 24/7 hardware don't jump on new tech. I think Epyc will eventually take a good chunk of the market, but it will take a few years.

Also, if you are expanding an existing VMWare stack you can't mix CPU architecture and have HA. So unless you are building a new stack, then Epyc might not even be an option, even if the Admin might prefer Epyc.

Cloud moves slowly because you don't want to force users to redeploy and, once racked, those boxes stay there for a good couple of years. In-house ever-expanding large installs are much more agile. Look at how Google adopted POWER, or how Cray is offering ARM nodes in their supercomputers.

So, I work as what is effectively a sysadmin role for a decently large deployment in a decently large company and I can say that we're buying EPYC for some stuff but not everything _yet_.

Build farms, for instance, where a lot of compute is needed and the reliability of the systems doesn't really matter, have EPYC. Some workstations have Threadripper (nobody has Ryzen). As time goes on, with AMD beating Intel in the middle market, we will expand from builders to VM farms, from VM farms to bare metal hosting, from bare metal hosting to underpinning entire projects.

It will take another 2 years of AMD being on top for our company to favour AMD. But I think it's possible.

> Another mystery is lack of Epyc uptake.

AMD and AWS just announced EC2 instances with Epyc processors a few minutes ago: https://www.anandtech.com/show/13547/amd-next-horizon-live-b...

Edit: Now on the front page: https://news.ycombinator.com/item?id=18392834

We're looking at moving to Epyc but waiting for Epyc2.

Currently on 2 socket Gold 6126's across the board.

>I wouldn't say it's a particularly small number:

It's not the number of sales alone that tells the story. It's also the profit margin. Apple is well known for allowing very low margins to their suppliers. It's not clear to me how much of Intel's revenue comes from Apple.

Well, Googling, I see this:


Very rough math, but to the tune of $3B a year from Macs. It's not nothing, but is less than 5% of Intel's revenue.

This site says $4B: https://markets.businessinsider.com/news/stocks/intel-stock-...

Intel's revenue growth per year these days is around this amount.

Don’t underestimate AMD’s ability to shoot itself in the foot.

> Don’t underestimate AMD’s ability to shoot itself in the foot.

Every single consumer-targeted Intel chip has integrated graphics capable of displaying a 4k movie on the screen/tv without so much as breaking a sweat. AMD has exactly 3 Ryzen chips with that capability, none of which fit into their "Lots of cores!" strategy. I'm not sure that AMD understands what drives the majority of consumer purchases.

Non-technical, non-gaming consumers don't care about integrated graphics. They're fine with whatever's in the box. But folks like me (and many others here) who build their own machines typically don't settle for integrated graphics. I have two 4k monitors and I occasionally game. Sure, the latest and greatest Intel chips can probably do the multi-monitor setup, but I doubt you can play Call of Duty Black Ops 4 on them and expect a reasonable framerate.

It's not just any customer either: being dropped by such a well-known company is free bad press.

Since most other companies tend to copy Apple like sheep, Apple dropping Intel would be a huge blow to Intel's image and brand.

Yes, all the radios are Intel-made

I know they use Intel components. I’m asking if it’s a large volume of components relative to Intel total revenue.

And for the high end of the market, there’s Power9 processors, which are truly incredible. Intel cannot compete with that.

Could you elaborate on that? For some reason that eludes me, realistic performance comparisons between different CPUs are hard to find.

do you have any references for power/price? I've used a power8 system and it was great but a little pricey.

With any luck, they'll use IBM POWER again!

(Please don't be ARM, please don't be ARM)

Why not ARM? It's seeming like it'd be a very sensible choice at this point, given the performance being delivered for iOS devices these days.

I hope they have solved the problems with latest security flaws before releasing these CPUs

It takes a long time to design and fab new layouts. So at best you would get firmware mitigations baked in. Or, as mentioned in other comments, the HT sidechannels might be fixed simply because HT is turned off.

Intel finally goes MCM! I wonder if their implementation works as well as AMD's though. Can it scale to smaller chips (which have good yields)?

So much glue. Glue everywhere!


Intel also did another nasty one, after being caught lying about it with another benchmark they did in the same way: they disabled EPYC's SMT, likely because this chip doesn't have SMT, so they don't want AMD's chip to benefit from it either, even though in real world scenarios it will.

How many side channels though?

Right out of Pentium D (remember that?) playbook:


It bought Intel some time until Core 2.

Finally, a CPU that can keep up with Google Chrome!

And Slack!

What's the difference? /s

IMHO, /s is superfluous.

The die shot looks to have 15 core modules at 2 cores each. I assume the 28-core variant lasers out 2 for yields and clockspeeds. I guess this one would then be cutting out 6 cores per chip. Big chips really aren't the way forward.

Per the caption it’s an Ivy Bridge, i.e. not the Xeon the article talks about.

There’s an annotated die photo of the CLAP (seriously, did nobody in Intel marketing predict “Intel gives you the CLAP”, “Intel CPUs have the CLAP” and kibosh this name?!) predecessor die at [1].

1: https://en.wikichip.org/wiki/intel/microarchitectures/skylak...

Intel uses three-letter codenames, so it's probably something like CLX-AP, not CLAP.

From TFA:

> Intel is labeling them "Cascade Lake Advanced Performance,"

The Anandtech article suggests they’re de-emphasising the “Lake” so it’s more of a C-AP.

Does Intel's PR seem to be on point to you these days?

No doubt they’re scrambling but managing to name a product after an STI is more “Gerald Ratner” [1] than “on point”.

1: https://en.wikipedia.org/wiki/Gerald_Ratner

Intel Xeon Platinum 8180 currently sells for $10k USD each and comes with 28 cores. For 48 cores and 12 memory channels, you can almost be assured of a price tag close to $20k USD. So basically, when they fail to come up with a better architecture to face the competition from AMD, they choose to glue together their out-of-date designs (out of date when compared with the Zen2-based new Epyc line) and sell them to you as a bundle without a significant price cut!

To be honest, I am not surprised - we are talking about a company that worked so hard for 10 years to fool its customers on why they don't need more than 4 cores on desktops. The whole joke is so INTEL!

If it costs twice as much per socket but the same per core, and allows me to use half as much space, half as much cooling, 25% less energy to run the same workloads for the next five years, I'm in.
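Taking the two figures in this subthread at face value, the "same per core" premise is easy to check. A quick sketch: the $10k list price and the $20k guess are the numbers quoted above (the latter is speculation from the thread, not an announced price), and price_per_core is just a helper for the arithmetic.

```python
# Figures quoted in the thread: $10k list for the 28-core Xeon Platinum 8180,
# and a commenter's guess of roughly $20k for the 48-core part.
def price_per_core(price_usd: float, cores: int) -> float:
    """List price divided by core count."""
    return price_usd / cores


xeon_8180 = price_per_core(10_000, 28)
cascade_ap_guess = price_per_core(20_000, 48)
ratio = cascade_ap_guess / xeon_8180

print(f"8180          : ${xeon_8180:.0f} per core")
print(f"48-core guess : ${cascade_ap_guess:.0f} per core")
print(f"ratio         : {ratio:.2f}x")
```

Under those assumptions the 48-core part would cost about one sixth more per core rather than the same, so the density and power savings would have to cover that gap. Actual negotiated prices are of course a different story.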

That's the list price, not many customers pay that

It's wild that so many people here think that e.g. major cloud vendors use the same purchasing logic for multi-hundred-million dollar deals on tens/hundreds of thousands of units, that a 17 year old uses when deciding how to spend $700 on a new computer. "They're scared", "Zen is a superior product", "EPYC/Rome will blow them away", etc. It doesn't actually matter.

Not only do many customers not pay the list price you see, they get custom SKUs specifically designed for them -- which is a growing portion of their Xeon business and one AMD has no clear, large-scale answer to AFAICS. And they basically can't get enough of them. Xeons practically print money and Intel recently announced something like 16% revenue increases and even higher demands for Xeon than they've ever had. They just invested an extra billion in new fab production just because they're at capacity right now...

Intel is going to remain very strong in this market for a good, long time as far as I can tell, and things like list SKU prices are an incredibly, ridiculously small reason to believe otherwise.

That said I'm very happy with the price/power ratio of my TR 1950X.

> which is a growing portion of their Xeon business and one AMD has no clear, large-scale answer to AFAICS

Probably depends on the size of the company. AMD has their semi custom department which was responsible for the PS4 and Xbox One, as well as PS4 Pro, Xbox One X (the GPUs are based on Polaris but have features from Vega, which no normal Polaris GPU has) and Subor Z (Zen APU with GDDR5 as main memory like the consoles, but can run Windows).

AMD is probably the better vendor if you want custom SKUs (as the underdog they also have more incentive for custom SKUs).

because AMD joined the competition again with its EPYC product line.
