Sure, Intel's clocks are slightly higher than before, and their parts use less electricity, but they've been standing still in virtually all other areas.
AMD managed to innovate at a price point most people can afford. But it's a bit disingenuous to ignore the advances that Skylake-X has brought forward.
Skylake-client is very similar to Haswell, but it still has a larger ROB, decoder, and branch predictor. That doesn't lead to much better clocks, but IPC is still up over the last few generations.
I do like AVX512 in principle, but in practice the runtime cost of using it is so high that I've offloaded anything highly parallel to other hardware (i.e. GPGPU) -- there's always work enough to keep the CPU busy anyway.
Have you found AVX512 great in practice? I'd love to hear about it!
But what I can say is that AVX and AVX2 are rather limiting. And I want those new instructions that are found only in AVX512.
Not even to run 512-bit computations, mind you. I want to run 128-bit computations with AVX512. You can use XMM registers in AVX512, ya know!!
The main benefits of AVX512 are the mask registers, the scatter instructions (AVX / AVX2 have gather, but no scatter), and the extension all the way to 32 XMM / YMM / ZMM registers.
True, using ZMM registers causes severe downclocking issues. But XMM (128-bit) and YMM (256-bit) registers are still sufficient for CPU-based tasks.
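To make that concrete, here's a minimal sketch (mine, not from the comment above) of what AVX-512VL looks like on plain XMM registers: a masked add plus a scatter, neither of which exists in AVX/AVX2. The function name and parameters are illustrative; it assumes a CPU with AVX512F + AVX512VL and the corresponding compile flags (e.g. -mavx512f -mavx512vl).

```c++
#include <immintrin.h>
#include <cstdint>

// Masked add-and-scatter on 128-bit registers: no ZMM, so no heavy downclock.
void masked_scatter_add(int32_t* table, const int32_t* idx,
                        const int32_t* vals, uint8_t mask) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(vals));
    __m128i i = _mm_loadu_si128(reinterpret_cast<const __m128i*>(idx));
    // Add 1 to each lane, but only where the mask bit is set (AVX-512 masking).
    v = _mm_mask_add_epi32(v, mask, v, _mm_set1_epi32(1));
    // Scatter the 4 lanes to table[idx[0..3]] -- scatter doesn't exist in AVX/AVX2.
    _mm_mask_i32scatter_epi32(table, mask, i, v, /*scale=*/4);
}
```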
Practical, pragmatic uses of XMM registers include:
* Cuckoo Hashing (see: https://github.com/stanford-futuredata/index-baselines/blob/...) -- eight slots per bucket and two candidate buckets per key, for a total of 16 slots. Use SIMD to perform 8 comparisons at once.
* A huge number of database applications: http://www.cs.columbia.edu/~orestis/sigmod15.pdf
* Bloom Filters are the most obvious "SIMD" data structure, with applications to databases and many other tasks. Sorting networks are also best implemented in SIMD.
* See this discussion: https://news.ycombinator.com/item?id=16171806
In effect, you get Base64 encoding / decoding that is faster than a PCIe x16 slot: you can encode / decode Base64 using AVX registers faster than you can even pipe the data to a GPU.
True, the GPU is the ultimate SIMD processor. But in many cases, the SIMD task is too fast to go to the GPU. In particular, the Base64 encoder was encoding 20GB/s, while PCIe can only transport 15.6GB/s!! The CPU is done before the data even gets to the GPU.
Ditto with latency-sensitive code, like Cuckoo Hashing. SIMD speeds up the overall hash table, but you're only parallel by 8x. There's no point in actually offloading a Cuckoo Hash to the GPU.
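For a flavor of what that 8-wide probe looks like, here's a minimal sketch assuming 16-bit fingerprints and 8-slot buckets (my parameters, not necessarily those of the linked Stanford code). One SSE2 compare checks a whole bucket, so probing both candidate buckets of a cuckoo table is just two compares.

```c++
#include <emmintrin.h>
#include <cstdint>

// Returns a bitmask of slots in 'bucket' whose fingerprint equals 'fp'
// (two bits per matching slot, because movemask operates on bytes).
int probe_bucket(const uint16_t bucket[8], uint16_t fp) {
    __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(bucket));
    __m128i q = _mm_set1_epi16(static_cast<short>(fp));
    return _mm_movemask_epi8(_mm_cmpeq_epi16(b, q));
}

// Hypothetical usage, with table/h1/h2/fp standing in for your own hash table:
// bool found = probe_bucket(table[h1(key)], fp) || probe_bucket(table[h2(key)], fp);
```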
In effect, you look for "small" parallelism of size 8 to 16 or so, and that's where AVX / AVX2 / AVX512 really shines. It's too expensive to move to the GPU, but you still get a HUGE speedup when you process it on the CPU.
That seems like a good case for using integrated graphics that has access to the CPU's memory controller and so can access data directly without copying it over PCIe.
Though if you're running that kind of volume you may have PCIe as the bottleneck in any case to access the data on whatever storage/network device.
The Xeon Phi (Knights Landing) used a mesh topology, and it was released before Skylake-X.
So you do all the lithography and vapor deposition on a wafer. That wafer has ~100 physical processors on it (100 just to make the rounding easier). You split them into individual chips, and you start testing.
Say on ~10 of them, Hyper-Threading, all the cache cells, and all the magic virtualization stuff work. These become some prosumer Xeon-type deal.
On another ~10, Hyper-Threading and all cache cells work. These are your i9s.
On another ~30, there's no Hyper-Threading and only some cache cells work. These are your i7s.
On the rest, there's no Hyper-Threading, only some cache cells work, and wow, only 4 physical cores work. These are your i5s and i3s (kind of).
The idea is that, yeah, whole parts of a CPU are defective or underperforming. So they just get disabled and the chip gets "binned" as another, lower-tier CPU of the same microarchitecture. All of these get sold at >100-5000x markup to offset the $50bil+ in R&D Intel spends each year. Yes, their margins are... amazing.
Also, the reality is that the consumer market is a massive beneficiary of this whole scheme. The server market is effectively subsidizing consumer processors to the tune of billions; if consumers had to pay full freight on their dies, prices would be several times higher than they are.
> if consumers had to pay full freight on their dies prices would be several times higher than they are.
The fundamental cost of producing a consumer SoC with the same die area as a Xeon on the same node is likely roughly equal (or I imagine the fabs-as-a-service would go out of business). So saying Intel *needs* the consumer market to subsidize their server line is a total lie.
Intel puts an extreme markup on their server-class processors, and a milder markup on the consumer segment.
Looks like they could just be trying to brute-force their way past the performance lost to the speculative-execution mitigations simply by adding more cores.
But I'm a developer on Linux who uses a couple of languages that compile to machine code. Which means that I don't care if AMD has something twice as fast, I'm going to buy Intel for one reason alone: their performance counters support running rr.
That's at least a factor of two improvement in my productivity right there.
Last I heard, Ryzen's perfcounters weren't deterministic enough for rr, so it's dead to me.
Core i9-9900K 8/16 cores 3.6-5.0 GHz - $488
Core i7-9700K 8/8 cores 3.6-4.9 GHz - $374
Ryzen 7 2700X 8/16 cores up to 4.3 GHz - $320
Easy pick for me, Intel isn't even a consideration.
The Intel is probably faster... but enough to justify literally double the money? I seriously doubt it.
Only in specific IPC heavy workflows.
In any case, bit flips are much more common than was suspected: https://arstechnica.com/information-technology/2009/10/dram-...
I believe strongly that ECC should be standard, because you can't safely assume that your users are doing worthless work. Apple got this right on (non-Mini) desktops a long time ago. Not yet on laptops, unfortunately.
EDIT: At -3 so far, does anyone want to explain the downvotes? I saw the google slides first hand, and there are comments from 2009 in that article saying the same thing.
Your comment provided no substantiation of your claim, merely hand-waving, while casting aspersions on someone else's work.
Like I said though, just a guess.
Bitsquatting: DNS Hijacking without exploitation
When bit-errors occur they can change memory content. Computer memory content has semantic meaning. Sometimes, that meaning will be a domain name. And applications utilizing that memory will use the wrong domain name.
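As a toy illustration of how little it takes (my sketch, not from the talk): enumerate every single-bit flip of a domain name and keep the ones where the flipped byte is still a legal hostname character. Each surviving string is a "bitsquattable" name someone could register.

```c++
#include <cctype>
#include <iostream>
#include <string>

int main() {
    std::string domain = "example.com";
    for (size_t i = 0; i < domain.size(); ++i) {
        for (int bit = 0; bit < 8; ++bit) {
            char c = domain[i] ^ (1 << bit);            // flip one bit of one byte
            if (std::isalnum(static_cast<unsigned char>(c)) || c == '-') {
                std::string flipped = domain;
                flipped[i] = c;
                std::cout << flipped << '\n';           // e.g. "ezample.com", "exa-ple.com"
            }
        }
    }
}
```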
Most people don't buy desktops with 16 or 32 or 64-thread processors designed to maximize throughput.
Those who do tend to want to max out how much RAM they can shove in their box.
Also, are cosmic rays really the main source of single-bit flips, as opposed to just bad RAM, maybe?
Some of the time the instructions don't match up, indicating corruption _somewhere_.
For the specific case of crashes in JIT-generated code, the contents of registers and the instructions can be related in various ways (e.g. if you have a jmp instruction the register better contain your code location). And if you know where your code locations might be (because you're a JIT, and are generating the code and aligning it in memory yourself) and the register with the code location looks like the sort of address you would end up with but with one extra low bit set, say...
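The heuristic boils down to something like this (a sketch of the idea, not the actual crash-analysis code): if the faulting address is exactly one bit away from an address the JIT knows it emitted, a bitflip is the most likely explanation.

```c++
#include <cstdint>
#include <bit>   // C++20; __builtin_popcountll works on older GCC/Clang

// True when the two addresses differ in exactly one bit.
bool looks_like_bitflip(uint64_t faulting_addr, uint64_t expected_code_addr) {
    return std::popcount(faulting_addr ^ expected_code_addr) == 1;
}
```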
I am having trouble right now finding the bug report where some of the JIT engineers were analyzing crashes in jitcode, but about 1/3 of those were due to bitflips if I recall correctly. What that means in terms of absolute numbers (or numbers per user-hour, which would be even more useful), I don't know.
Note, by the way, single-bit flips can be a consequence of a bad memory chip, not just of cosmic rays.
I am not fanboying AMD here either. The facts for me are clear: AMD is going to continue to eat Intel for breakfast for the next 2 years at least. Intel isn't going to be competitive until they release a modular chip design, which, although they haven't announced they are working on one, I am 100% certain they are. Why am I certain? First, they now have Jim Keller, who designed the Ryzen chip. Second, they acquired NetSpeed, which has IP around modular CPU design (likely a play to ensure they don't get sued by AMD when they release a modular chip). I know a lot of people are looking at Intel right now, thinking it is at a bargain price and a good time to buy. I think they have a lot further to fall. When the chips that Jim Keller is working on are about to hit the market, that is when it will be time to buy Intel. Until then I have no interest in anything Intel has to offer.
The tech is called "Infinity Fabric"; you can look it up, there is lots out there about it. It basically allows AMD to make several smaller dies and have them function together as one CPU. Here is an excellent video that explains it all (and also goes into why this is such a huge advantage that allows AMD to have significantly better yields with their wafers).
AMD is offering a 16-core / 32-thread 4.4GHz monster for $899, and a 32-core / 64-thread for $1799.
If you wanted the best under $5000 (total cost of a computer), it seems like AMD Threadripper is the best. If you wanted the best below $1000, it seems like AMD Ryzen is the best.
Only if you are single-thread bound (5GHz clocks!!), AVX2- or AVX512-bound, or PEXT / PDEP bound (Stockfish 9) should you consider Intel. Otherwise, AMD is offering more performance at all price points up to the EPYC 7601 (~$3000 CPU: 32 cores / 64 threads / 8 memory channels / 64 PCIe lanes direct to the CPU, with support for dual-socket).
With that being said, I'm definitely interested in Intel's Xeon Silver platform. If Intel pushed dual-socket harder, they would be cheaper AND faster. IMO, it's a bit weird that Intel isn't taking advantage of their dual-socket solutions to counter AMD Threadripper (I mean... Threadripper really is just a dual-socket or quad-socket NUMA system combined into a single socket).
As it is, Xeon Silver is hard to find and seems to be ticking up in price, unfortunately. Its nominal prices are actually quite good, although the clocks are kinda low. But Xeon Silver really seems like Intel's price/performance champ (even if it's still a bit more expensive than Threadripper or EPYC).
If you're doing primarily single-threaded things (e.g. gaming) then the Intel chips will give you better performance.
Let's say you can split your threads up into AI, Physics, Rendering, Networking, etc. But let's say Physics dominates: then your game is still single-thread bound. You only get faster if you make your physics faster.
You can split your game up into work queues, thread pools, and such, except not everyone is up to date with the latest techniques yet. Furthermore, thread pools aren't always cache friendly and may hamper your performance. (If Core 1 starts working on something and then Core 4 completes the work, you have to transfer all the data out of Core 1's L1 and L2 caches to continue working on it on Core 4.)
So it's not exactly an easy thing to program.
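As a rough sketch of the work-queue idea (mine, not how any particular engine does it), here's the physics update fanned out with std::async and joined before the rest of the frame continues:

```c++
#include <future>
#include <vector>

// Split the per-body physics update across n_threads workers, then join.
// The join is the point where the frame becomes serial again.
void step_physics(std::vector<float>& bodies, int n_threads) {
    std::vector<std::future<void>> tasks;
    const size_t chunk = bodies.size() / n_threads;
    for (int t = 0; t < n_threads; ++t) {
        tasks.push_back(std::async(std::launch::async, [&, t] {
            size_t begin = t * chunk;
            size_t end   = (t == n_threads - 1) ? bodies.size() : begin + chunk;
            for (size_t i = begin; i < end; ++i)
                bodies[i] += 0.016f;   // stand-in for the real per-body update
        }));
    }
    for (auto& f : tasks) f.get();     // wait before the (serial) game loop continues
}
```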
But many games are still single-thread bound (despite being multithreaded). As such, Gamers should still prefer single-thread performance.
Games make use of up to 6 threads, but single-threaded performance still determines overall framerate to a large extent. Games are not exempt from Amdahl's law: as long as you've got enough threads to offload work, it comes down to the single-threaded portion of the workload. The faster you can run the main game loop the faster the game runs.
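A quick back-of-the-envelope version of that argument, with made-up numbers purely for illustration:

```c++
// Amdahl's law applied to a frame budget. Assume (hypothetically) 6 ms of
// the frame is serial main-loop work and 10 ms parallelizes perfectly.
double frame_time_ms(int n_threads) {
    const double serial_ms   = 6.0;   // main game loop, can't be split
    const double parallel_ms = 10.0;  // physics/AI/etc., scales with cores
    return serial_ms + parallel_ms / n_threads;
}
// 1 thread: 16 ms (~62 FPS); 6 threads: ~7.7 ms (~130 FPS);
// 64 threads: ~6.2 ms (~162 FPS) -- the serial 6 ms caps the gains.
```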
Gamers who prefer cutting-edge performance should prefer multi-thread performance.
Optimizing for MT-heavy games is the gaming equivalent of premature optimization - everybody can run Doom at a million FPS, but Fallout 4 or PUBG shit all over every system and you'll be begging for every frame you can get.
You can "prefer" whatever you want but developers don't care. This site of all places knows that time-to-market is what really matters. If you want to play those titles, you have to deal with it. Either optimize for the shitty titles, accept that you're going to be losing a fairly significant amount of frames (the 2700X is behind as much as 30% in some titles), or don't play those titles.
I mean that if the software is implemented in the correct manner, "more can be done" by utilizing multiple threads.
That's because once you are down to the critical path, the only way to go faster is by improving single-core performance.
There is a reason why Core i5s are so popular with gamers. They have just enough cores not to limit a game engine, excellent single-thread performance, and they are affordable.
But for what application do you need more than 10Gbps externally?
Laptop docking. Running a 4K monitor over a USB Type-C cable leaves you with only the USB 2.0 lanes available for data. (And 5K is impossible.)
[EDIT - removed AVX-512 claim that was wrong]
Overall perf is likely a lot better with the i9 for most workloads.
It's true that Coffee Lake has a performance edge over Ryzen when running AVX/AVX2 code though.
Ryzen doesn't have 256-bit wide units, so 256-bit instructions take 2x as long.
The benefit of that is that Ryzen doesn't throttle the clock speed of the whole chip when executing AVX2 instructions :P
Intel's widening of units has brought a giant downclocking problem: https://blog.cloudflare.com/on-the-dangers-of-intels-frequen...
Which do you think is more profitable to sell: one CPU with many PCIe lanes that you can attach 8 GPUs to (8 NVIDIA GPUs per 1 Intel CPU), or more CPUs with fewer PCIe lanes (8 NVIDIA GPUs per 2 Intel CPUs)?
That said, the next-gen Ryzen is rumored to be a 4.5 GHz chip on turbo. So it's definitely picking up steam.
If you look at the cost difference, it is placed pretty much dead-on, about ~$130 higher than the 8-core Ryzen. People trash-talk AMD/Intel like it's their home-town sports team. I find that everywhere, including on HN.
Let's wait for the benchmarks and specific load comparisons to see if the price difference makes sense. The 8700K is ridiculously fast and it beats the hell out of the Ryzen 1600X.
This is a critical security bug that was discovered over a year ago. Intel just ignoring the problem and casually releasing yet another (underwhelming) upgrade that doesn't address it at all...
The fact anyone accepts that just goes to show how low our expectations have dipped when it comes to Intel.
See for example https://en.wikipedia.org/wiki/Spectre_(security_vulnerabilit... :
There were also stories that they were aware of the problem even earlier, and basically ignored it. If you check out that history section, there were multiple public presentations about the feasibility of an attack for years before the practical exploit was discovered. The only way Intel didn't at least suspect it is if they were very sloppy, or didn't care at all.
The i7 has the same amount of cache as its predecessor and the i9 has more.
9-series has hardware fixes for Meltdown variant 3 as well as the L1 terminal fault fix.
I will only buy Intel if I'm forced to by other people (think: employer, laptop). By my own choice, I will buy Ryzen/Threadripper whenever I can. And I will do so gladly, knowing I give my money to the David who's fighting Goliath and giving me a much better bang for the buck at the same time. Win/win.
Workloads limited by a single thread.
That's about all I see.
Core Generation | Microarchitecture | Process Node | Release Year
2nd | Sandy Bridge | 32nm | 2011
3rd | Ivy Bridge | 22nm | 2012
4th | Haswell | 22nm | 2013
5th | Broadwell | 14nm | 2014
6th | Skylake | 14nm | 2015
7th | Kaby Lake | 14nm+ | 2016
8th | Kaby Lake-R | 14nm+ | 2017
| Coffee Lake-S | 14nm++ | 2017-2018
| Kaby Lake-G | 14nm+ | 2018
| Coffee Lake-U/H | 14nm++ | 2018
| Whiskey Lake-U | 14nm++ | 2018
| Amber Lake-Y | 14nm+ | 2018
| Cannon Lake-U | 10nm | 2017*
9th | Coffee Lake Refresh | 14nm% | 2018
% Intel '14nm Class'
* https://en.wikichip.org/wiki/intel/core_i3/i3-8121u
There have always been U, H, Y, R, and sometimes C variants for most of those prior generations. Each die gets a codename, just like now.
Let's say you want to run 4 GPUs off your CPU for a powerful workstation or small server (e.g. for a research group). There's debate over whether you need to run cards in x8 or x16 mode for most deep learning applications, so let's say 8x as a conservative choice. That means you need 32 lanes (or 16 for just two cards). But your drive and other peripherals might take some lanes too. So 40 looks like a safer number. Easy enough to find that on a motherboard...
Most of the mid-high range Intel desktop CPUs only have 16 lanes total. Base model Threadripper (1900x) which is price-competitive with those has 64. You can go to Xeon but that (AFAIK) can be problematic in several ways for midrange or mixed use workstations (no integrated graphics, less mobo selection for the needed features).
I think this is pretty important. If enough researchers go to Ryzen, then math libraries will get optimized for it and the lead that libs like MKL provide could be nullified. This will filter into the server market, where the CPU is used even in deep learning production deployments (e.g. for inference servers). And having lower-end processors matters because there are lots of independent researchers doing important work in the field who don't have huge budgets, as well as academics who can't get budget and might be spending out of pocket on their workstation.
 I'll admit I don't know enough about the math hardware in Ryzen vs Intel to say if this is possible.
More importantly for deep learning (this price difference is lost in the cost of GPUs), AFAIK all X399 motherboards for Threadripper are set up as 16x/8x/16x/8x, whereas there are X299 motherboards with PCIe switches that give you 16x/16x/16x/16x with neighboring GPUs sharing bandwidth. For deep learning, each individual GPU having 16 lanes of bandwidth is more important than having to share with another GPU.
Intel stepping up the lane counts in HEDT is definitely a good reactive move, but it won't affect the budget-constrained scenarios I discussed until a few years from now, because you can just buy a couple of generations back. I might be wrong, but I believe many X299 chips before gen 9 had 16-24 lanes?
I also don't think your point about the insignificance of the price difference is universally applicable, because buying extra GPUs over time is quite common, and having an extensible box with 1-2 GPUs at the start is potentially a good move. As for the PCIe switches on the X399 chipset, IMO this depends on the assertion that x16 is X% better than x8. That depends on use case, and analyses of the penalty you get at x8 vary, but many people would take a 10-15% penalty down the road @ 4 cards (really 5-7.5%, because it only affects 2 of the 4 cards) to save money now.
I built a box with Intel because I had the capital, and most people at an industry job probably should, but if you read forum and mailing list threads many people are faced with this economic decision and are going TR - I see quite a few academics doing this. Math libraries are a good thesis topic :)
Oh how the tables have turned.
Speaking of which, time for a new gaming rig...
GPUs? I have a few hundred old unplayed games on Steam that I'm working through. See you next gen!
Will probably do a new system in the spring with my income tax return. Leaning towards R7, maybe Threadripper, currently. They're finally enough faster to justify the cost.
aside: switched to hackintosh for my OS last year, been very happy. Hoping the osx-amd support isn't too difficult when I upgrade.
Each year the new CPU is about 20% faster than the old one. After 4 years, it adds up (1.2^4 ≈ 2.07). A top Intel CPU will now deliver twice the single-thread performance of a 5-year-old one at the same price.
Source? I suspect you're being shown a few choice benchmarks which demonstrate some improvement, perhaps where something became newly vectorizable...but realistic and consistent benchmarking doesn't show nearly that level of improvement.
One area in which you may benefit is in peripherals - an M.2 SSD and DDR4 RAM will help, and those aren't available on 5 year old motherboards. Also, power savings may be found with newer systems if you're in a laptop form factor where that matters.
But let's not pretend that single-threaded performance has doubled in 5 years. Here's one article that goes back quite a ways....and 2012 SPECint comes in at about 60,000 while 2017 comes in at 83,600.
But in one thing you err: after a small custom BIOS mod, my home PC now boots from NVMe -- with a Z68 board (Sandy Bridge era), the module sitting in a PCIe adapter card. It was just a matter of adding some UEFI module.
The main thing lately is that when I'm running a lot of containers/VMs for workloads it tends to bog down a little. Outside of that and 4K 3D, I don't notice it ever. But the new stuff is only just starting to get enough better that I may upgrade next year sometime.
I had the same thought coming into this thread -- what a great time to be in the market for a new machine. I welcome this competition; I want AMD to get better in the CPU game, but I will gladly scavenge the spoils of this.
Edit: clarified that I don't see Intel not ruling laptop market.
If you had a top of the line AMD CPU from 4 years ago, you'd have a nice toaster.
AMD fans have gotten louder than ever (and quieter, heh), but the fact remains, AMD can barely compete at the top end as far as gaming goes.
For a short while, AMD had finally met up with Intel at the top end of gaming CPUs (i7 8700k vs Ryzen 7 2700x).
Now the i9 9900K clobbers that comparison.
And yes, some people will immediately yell "no fair" because the 9900K will cost up to $200 more, but if you're building a PC to last years, who cares?
My gaming PC has a 4790K that's been running at 4.9 GHz for over 4 years now. I upgraded the GPU to a newer card, a Vega 64 (it would have been a 1080 Ti, but all the 38" ultrawides only support FreeSync), and that 4-year-old CPU isn't a bottleneck yet.
Saving $200 back then would have been the difference between needing a new motherboard, CPU, and RAM today or not -- easily a justified investment.
My whole comment is about how, if you're trying to build a PC that lasts, you can easily justify the only drawback: the $200 increase.
I remember a recent review examining whether you could still use an FX chip for modern games, and the answer was essentially: technically, if you only play at 1080p, have a very beefy GPU (which will then be gimped by the FX), and are careful with settings.
The 7700k was getting almost double the FPS in the same rig...
Meanwhile, benchmark videos show the 4790K within at most 10 FPS of the 7700K in almost every single game tested, and the gap is never the difference between playable and struggling to exceed 60 FPS, even at higher resolutions.
Do most gamers who shelled out dollars for the top of the line gaming CPUs of their day care about framerates? Yes.
Does that mean they don’t care “actually playing games”? Apparently you think so...
I am not really being offered Ryzens at work though. I get sent lots of Dell and HP offers and they are all Intel atm. I wonder if this is about to change?
"What makes this a little different are the eight-core products. In order to make these, Intel had to create new die masks for the manufacturing line, as their previous masks only went up to six cores (and before that, four cores). This would, theoretically, give Intel a chance to implement some of the hardware mitigations for Spectre/Meltdown. As of the time of writing, we have not been given any indication that this is the case, perhaps due to the core design being essentially a variant of Skylake in a new processor. We will have to wait until a new microarchitecture comes to the forefront to see any changes."
> * Speculative side channel variant Spectre V2 (Branch Target Injection) = Microcode + Software
> * Speculative side channel variant Meltdown V3 (Rogue Data Cache Load) = Hardware
> * Speculative side channel variant Meltdown V3a (Rogue System Register Read) = Microcode
> * Speculative side channel variant V4 (Speculative Store Bypass) = Microcode + Software
> * Speculative side channel variant L1 Terminal Fault = Hardware
Edit for clarification: $100-200 over the 9900K's price.
More cores/threads are good at least to the point where you have as many hardware threads as software threads.
Even a pile of single-threaded apps running simultaneously benefits from multiple hardware threads.
"The only significant difference between the two platforms is that,
although both platforms offer 2-way Simultaneous Multi-Threading
(SMT), Intel’s hyper-threading implementation seems to be much
better. On the Skylake, we see a performance boost from hyper-
threading for all queries. On the AMD system, the benefit of SMT
is either very small, and for some queries the use of hyper-threads
results in a performance degradation."
> On average, both Xeons pick up about 20% due to SMT (Hyperthreading). The EPYC 7601 improved by even more: it gets a 28% boost on average.
These results are the same as I've seen everywhere except the quote you pulled out. Zen's SMT is more effective than Intel's HT.
Or I can get a CPU that traded blows with your last-generation CPU for $320.
I'll take my $500 CPU, please and thank you.
If you're stretching out of your comfort zone to afford a 2700X it's one thing, but the moment you're the type of consumer comparing the 9900k to a 2700X, it's not even a contest. Everyone accepts PC parts are a case of diminishing returns, the 9900k is delivering beastly, even if diminishing, returns for what it costs.
I would suspect that the 9900K will also end up being priced way above the official price.
But for single-core perf/gaming... these things are going to be unmatched. A stock turbo frequency of 4.9 GHz? You could OC to over 5 GHz with ease.
I think I'm beginning to understand Intel's market positioning with these. They know that for most consumers, even including gamers, single-core perf dominates everything. So if they can drop some multicore perf by killing hyperthreading, and trade that for a higher turbo, that's a huge win for their bread and butter. And they'll sell you the i9 or Xeon if you need the higher core counts, though maybe they recognize that the HEDT segment is already being dominated by AMD, so it's back to the labs to R&D a strategy for that.
(Cache per core reduction might be negative though)
The boost clocks are higher, and you're probably going to be running at them more often than not when you care about it.
There may be some workloads where having more cores actually matters, but HT isn't all that huge of a performance increase.
It would be nice to see some low-voltage CPUs that aren't reserved for OEMs only... unfortunately this is one area where AMD hasn't pressured Intel (AMD's offerings are all >100W).
The marketing team does whatever the customers demand and value.
I go to an event called PDXLAN twice a year. It was previously a 550-person event, until this next one and future ones where we've expanded into a new venue and will have 800 gamers present with the possible option of further expanding to 3,200 in the future. It's one of the largest bring-your-own-computer events in the USA, sponsored by nVidia, AMD, Intel, and several other companies that produce hardware for gamers.
We RGB our cases and equipment because it's fun to personalize your stuff. There are over 500 people there, and seeing so many unique setups is entertaining. We've had someone make their computer look like a boombox, a centipede, or Rey's speeder from Star Wars, complete with a custom-painted mouse and keyboard.
You might think it's all silly and stupid, but we get a lot of entertainment out of it.
I'm a professional software engineer who likes to game but doesn't wish to outwardly identify with it. I like interior design, and I like nice clean hardware that fits in with my flat.
This excludes me from having the top-end Acer Predator monitor, because it's made to look like a throbbing red quantum spaceship component, which it isn't -- it's just a damn display, and it would look ridiculous.
If you want a laptop with decent GPU hardware, most of the options have all these edges and lights, dressed up to look like some kind of weapon or something. If I need to bring my laptop to a meeting, I can't have that and be taken seriously.
It just feels childish, and it's hard to believe that the number of people like me isn't big enough for the manufacturers to cater to.
Frequency is mostly determined by process node and can't be increased without a die shrink. In fact, early 10nm nodes will actually be slower than what Intel can do on their highly-refined 14nm node. Jokes aside, 14+++ is actually amazingly fast.
Yeah, right. If that were the case I'd rig up a compile farm with iPhones.
iPhone XS: 4794 points
The newest Xeon E-2176M: 4900 points
The difference is only ~2%...