I would actually say the exact opposite is true. Single threaded performance is much more reliable and every single application can use it. Multithreaded performance is much more workload dependent, and there are many applications that can’t fully utilize it.
Such as video editors: Adobe Premiere in particular.
Apart from that, I'm fine with seeing numbers from icc because, if I'm shopping for the highest possible performance, that's probably the compiler I'll use for my code.
Performance results are based on testing or projections as of 6/2017 to 10/3/2018 (Stream Triad), 7/31/2018 to 10/3/2018 (LINPACK) and 7/11/2017 to 10/7/2018 (DL Inference) and may not reflect all publicly available security updates.
LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High-Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 22.214.171.124, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095GFs, tested by Intel as of July 31, 2018, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
Stream Triad: 1-node, 2-socket AMD EPYC 7601, tested by AMD as of June 2017 compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018. DL Inference:
Platform: 2S Intel Xeon Platinum 8180 CPU 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC).
Without AMD we would all be using some variation of Itanium.
It's a quite sane policy: if you want to sell to me, make sure I'm not locked in with you.
I remember our government had other computers besides PCs and none of them had CPU clones.
As far as I remember, the US government was a big consumer of DEC, Xerox, and IBM hardware, and they also did not have any CPU clones.
And Intel implemented the AMD64 instruction set for its own CPUs.
This would also "fix" a lot of side-channel attacks without new silicon needing to be ready, and Intel claims this CPU includes some hardware mitigations.
However, Intel starting to sell chips without HT has nothing to do with the insecurity of HT and everything to do with the fact that they are trying to sell chips that are cheaper to make (or more profitable), while also being a little more competitive (price-wise) with AMD's offerings.
But while they do this, they know they'll have a severe handicap in performance as long as AMD continues to sell competing products with SMT enabled. So their "solution" is to start making it "fair" in their benchmarks by using only half the cores, or disabling SMT, on the competing AMD products.
One of the first clues that this move is not a response to the recent security vulnerabilities of HT is that they started offering chips without HT only a few months after those discoveries were made. Do you really think planning, designing, and shipping a product without HT happened in only a few months? No, this was decided at least a year and a half ago.
Hyperthreading is there to help reduce the time the CPU spends stalled.
If you run fewer active threads, cache efficiency would be the same in that case, hyperthreading or not.
Now that we mention it, I wonder if HT can be enabled or disabled per core. If that's possible, the machine would look like it has some logical cores with more cache than others and we could use cache miss counters as input to decide where a process should run.
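The counter half of that idea is doable from userspace already: Linux exposes hardware miss counters through perf_event_open(2). A minimal sketch (Linux-specific, error handling omitted) that counts cache misses for whatever runs between the two ioctls; a hypothetical placer daemon could sample something like this per thread:

    /* Count LLC misses for a code section via perf_event_open. */
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <sys/ioctl.h>
    #include <string.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CACHE_MISSES;
        attr.disabled = 1;
        attr.exclude_kernel = 1;

        /* measure the calling thread (pid=0) on any CPU (cpu=-1) */
        int fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        /* ... workload under test goes here ... */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t misses;
        read(fd, &misses, sizeof(misses));
        printf("cache misses: %llu\n", (unsigned long long)misses);
        close(fd);
        return 0;
    }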
This is too naive a way to look at it. Yes, if there is less scheduling on the core (less concurrency) the likelihood of misses is lower, but the idea is to pay less for each miss by effectively doing something else in the meantime instead of stalling on memory.
>Now that we mention it, I wonder if HT can be enabled or disabled per core.
The OS scheduler can do that. Normally the guidance for hyperthreading is: consider each "thread" as a separate core. If the OS chooses to use one logical core per physical core and schedules nothing on the sibling, hyperthreading won't be used. That is, imagine logical core 0 and core 1 represent the same physical one: if the OS schedules only to core 0, hyperthreading is not in play.
Notes: controlling scheduling on Linux is done via taskset, and core info is available via "cat /proc/cpuinfo".
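To make that concrete, here is a rough sketch of the call taskset wraps, sched_setaffinity(2), pinning a process to one logical CPU per physical core. It assumes the common Linux enumeration where logical CPUs i and i+N/2 are hyperthread siblings; /sys/devices/system/cpu/cpu*/topology/thread_siblings_list is the authoritative source:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <unistd.h>

    int main(void) {
        long n = sysconf(_SC_NPROCESSORS_ONLN);  /* logical CPUs online */
        cpu_set_t set;
        CPU_ZERO(&set);
        for (long i = 0; i < n / 2; i++)
            CPU_SET(i, &set);                    /* first sibling of each core */
        sched_setaffinity(0, sizeof(set), &set); /* 0 = calling process */
        /* exec the real workload here; it now runs one thread per core */
        return 0;
    }

From the shell, the equivalent on, say, a 16-core/32-thread box would be "taskset -c 0-15 ./workload".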
After trying out WebSphere developer tools (RAD) with HT enabled and disabled for several days, I eventually came to the conclusion that with it disabled everything just felt faster.
Also it is not clear if the security issues caused by HT, which affect other vendors as well, could ever be properly sorted out.
The place where SMT really seems to pay off is in particular applications that can use arbitrarily many threads. This also points to a solution to the security issue, because you get most of the benefit even if you only schedule threads from the same process group together on the same core, and it would be possible for programs to specify whether it's safe to do that (e.g. yes for a render, no for a web browser).
That would also improve SMT performance in general since SMT makes the performance of some programs worse, and they could use the same mechanism to opt out of it.
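For what it's worth, newer Linux kernels expose pretty much exactly this opt-in as "core scheduling": tasks that share a cookie may be co-scheduled on sibling hyperthreads, tasks that don't never are. A minimal sketch, assuming a kernel with PR_SCHED_CORE support (5.14+):

    #define _GNU_SOURCE
    #include <sys/prctl.h>
    #include <stdio.h>

    /* Constants from the kernel uapi; guarded in case the libc
       headers already define them. */
    #ifndef PR_SCHED_CORE
    #define PR_SCHED_CORE        62
    #define PR_SCHED_CORE_CREATE 1
    #endif
    #define SCOPE_PROCESS_GROUP  2   /* kernel's PIDTYPE_PGID */

    int main(void) {
        /* Give this whole process group a shared core-scheduling cookie:
           the kernel only pairs hyperthread siblings between tasks that
           carry the same cookie. */
        if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0,
                  SCOPE_PROCESS_GROUP, 0) != 0)
            perror("PR_SCHED_CORE");  /* older kernel, or SMT disabled */
        /* ... spawn render/worker threads here; a process that skips
           this opt-in never shares a physical core with this group ... */
        return 0;
    }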
Regarding threads, I used to be very keen on them when they became available everywhere, yet nowadays I see them as yet another attack vector.
It is not only the security issues but also overall application stability, as a buggy thread can bring the whole application down.
True, but the scheduler can't tell the CPU to allocate all of the partitioned cache to a thread if the silicon thinks it belongs to another logical core, so you end up with effectively half of your cache dark. If L1 and L2 are pooled between the logical cores, it makes no difference, but I'm not sure how HT behaves with respect to partitioning caches on current generations of Intel parts, and I remember it used to be like that (because I turned off HT on a couple dozen boxes to improve single-thread performance, which was what was bothering me).
A lot of things are partitioned. But my understanding is that L1 and L2 cache is competitively shared. Which means if one thread uses 512 bytes of cache, the OTHER thread can use the full 32kB for itself.
There are other benefits: Both hyperthreads can share L1-code cache for instance, if there's a tight loop that is shared between the two threads.
Lower-level things like the retirement engine (aka: shadow registers / out-of-order registers) are partitioned. I dunno whether or not the branch predictor is shared or partitioned. But... things like that.
The ones I use (and admin) don't. Slurm, for example, handles threads vs. cores explicitly, and smart defaults can be set.
But I’m not aware of any benchmarks that pin one execution thread per physical core and compare with and without hyperthreading. I guess I have some homework to do, but then it’s a lot of effort when the results could depend on generation, clock speed etc.
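For that homework, the harness itself is not much effort; the hard part is, as you say, the matrix of generations and clock speeds. A rough sketch: pin one worker per physical core, time it, then rerun with WORKERS doubled (or HT toggled in the BIOS) and compare. It assumes the usual numbering where logical CPU i is the first sibling of physical core i, and spin() is a stand-in for a real workload:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>

    #define WORKERS 8   /* = physical cores; double it for the HT run */

    static void *spin(void *arg) {
        /* placeholder workload; substitute the thing you care about */
        volatile double x = 0;
        for (long i = 0; i < 100000000L; i++) x += i * 1e-9;
        return NULL;
    }

    int main(void) {
        pthread_t t[WORKERS];
        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (long i = 0; i < WORKERS; i++) {
            pthread_attr_t attr;
            cpu_set_t set;
            pthread_attr_init(&attr);
            CPU_ZERO(&set);
            CPU_SET(i, &set);   /* CPU i = first sibling of core i */
            pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
            pthread_create(&t[i], &attr, spin, NULL);
            pthread_attr_destroy(&attr);
        }
        for (int i = 0; i < WORKERS; i++) pthread_join(t[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &b);
        printf("wall time: %.2fs\n",
               (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9);
        return 0;
    }

(Build with -pthread.)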
I would take the numbers with several bags of salt.
The most pertinent technical questions and concerns seem to be around the interconnects, since inter-die bandwidth and latency are big factors in performance. I think the major surprise/disappointment there is that within the MCM they apparently are not using their Embedded Multi-die Interconnect Bridge (EMIB), which was supposed to be their next-gen interposer replacement. They have been pretty flowery about it, right up to having fun with a Star Wars riff on the official page and calling EMIB "an Elegant Interconnect for a More Civilized Age". Looks like it's all UPI here though. According to Cutress it also isn't clear exactly how many UPI links are used between the two sockets.
It's not like MCM usage is some necessarily exotic thing; IBM has been doing it in various forms, including for POWER, forever. Even beyond TDP, however, Intel's approach here definitely seems kind of "odd", to put it neutrally. I think if this is going to be a major part of their strategy going forward they'll need a different approach, but for that very reason I wonder if this is more of a one-off placeholder?
I also remember Unisys Micro-A boards (the Unisys equivalent to IBM's PC/370). IIRC, they had 6 or 7 chips within a single package.
This thing has way more connector pins than the 6502 (the processor that ran inside the Apple II) had _transistors!_
We should know more about Zen 2 in a few hours time.
While I am sure this isn't LITERALLY true, there is some truth to it. I work at a smallish company, which bought 3 new Intel servers about 5 months before Epyc servers started to be available from OEMs. We plan on adding another server to our stack next year. I would LOVE to go with Epyc, but I am locked into Intel at this point (until we decide to replace the entire stack): VMware doesn't allow you to mix CPU architectures and maintain HA. So we're stuck buying a crappy Intel Xeon instead.
Add that to AMD's recent surge and Intel's in a world of hurt.
Meh. Looking at the sales numbers of past quarters, AMD still hasn't significantly hurt Intel. They have made a lot of inroads among hobbyists and in small-store prebuilts, but the major brands still appear to be dominated by Intel.
It's a real shame that nobody seems to have made an actually good Raven Ridge laptop. The CPU is imho objectively better for laptop use than any Intel competitor, but all laptops built with it appear to have a major deficit of some kind or another.
Another mystery is lack of Epyc uptake. There are a lot of workloads where Epycs should be dominating Intel server chips when compared at the same price or power level, especially after all the recent chip vulnerabilities which are much less serious for AMD than they are for Intel, but Epyc sales are definitely lagging. Is AMD unable to make them, or are companies just wary of buying AMD?
THIS! I am not sure why people are expecting Epyc to take off like a rocket, it won't. Companies maintaining 24/7 hardware don't jump on new tech. I think Epyc will eventually take a good chunk of the market, but it will take a few years.
Also, if you are expanding an existing VMware stack, you can't mix CPU architectures and have HA. So unless you are building a new stack, Epyc might not even be an option, even if the admin would prefer Epyc.
Build farms, for instance, where a lot of compute is needed and the reliability of individual systems doesn't really matter, have EPYC. Some workstations have Threadripper (nobody has Ryzen). As time goes on, with AMD beating Intel in the middle market, we will expand from builders to VM farms, from VM farms to bare metal hosting, from bare metal hosting to underpinning entire projects.
It will take another 2 years of AMD being on top for our company to favour AMD. But I think it's possible.
AMD and AWS just announced EC2 instances with Epyc processors a few minutes ago: https://www.anandtech.com/show/13547/amd-next-horizon-live-b...
Edit: Now on the front page: https://news.ycombinator.com/item?id=18392834
Currently on 2-socket Gold 6126s across the board.
It's not the number of sales alone that tells the story. It's also the profit margin. Apple is well known for allowing very low margins to their suppliers. It's not clear to me how much of Intel's revenue comes from Apple.
Well, Googling, I see this:
Very rough math, but to the tune of $3B a year from Macs. It's not nothing, but is less than 5% of Intel's revenue.
This site says $4B: https://markets.businessinsider.com/news/stocks/intel-stock-...
Intel's revenue growth per year these days is around this amount.
Every single consumer-targeted Intel chip has integrated graphics capable of displaying a 4K movie on the screen/TV without so much as breaking a sweat. AMD has exactly 3 Ryzen chips with that capability, none of which fit into their "Lots of cores!" strategy. I'm not sure that AMD understands what drives the majority of consumer purchases.
(Please don't be ARM, please don't be ARM)
Intel also did another nasty one, having already been caught lying with another benchmark done the same way: they disabled the EPYC's SMT, likely because this chip doesn't have SMT, so they didn't want AMD's chip to benefit from it either, even though in real-world scenarios it will.
It bought Intel some time until Core 2.
There’s an annotated die photo of the CLAP (seriously, did nobody in Intel marketing predict “Intel gives you the CLAP” and “Intel CPUs have the CLAP” and kibosh this name?!) predecessor die at .
> Intel is labeling them "Cascade Lake Advanced Performance,"
The Anandtech article suggests they’re de-emphasising the “Lake” so it’s more of a C-AP.
To be honest, I am not surprised. We are talking about a company that worked hard for 10 years to convince its customers that they don't need more than 4 cores on their desktops. The whole joke is so INTEL!
Not only do many customers not pay the list price you see, they get custom SKUs specifically designed for them -- which is a growing portion of Intel's Xeon business and one AMD has no clear, large-scale answer to, AFAICS. And customers basically can't get enough of them. Xeons practically print money: Intel recently announced something like 16% revenue increases and higher demand for Xeons than they've ever had, and they just invested an extra billion in new fab production because they're at capacity right now...
Intel is going to remain very strong in this market for a good, long time as far as I can tell, and things like list SKU prices are an incredibly, ridiculously small reason to believe otherwise.
That said I'm very happy with the price/power ratio of my TR 1950X.
Probably depends on the size of the company. AMD has their semi-custom department, which was responsible for the PS4 and Xbox One, as well as the PS4 Pro, the Xbox One X (their GPUs are based on Polaris but have features from Vega, which no normal Polaris GPU has), and the Subor Z (a Zen APU with GDDR5 as main memory, like the consoles, but which can run Windows).
AMD is probably the better vendor if you want custom SKUs (as the underdog they also have more incentive for custom SKUs).