This article really drives home the commonly ignored point about Intel CPUs: Sandy Bridge is a P4 derivative with all the faults fixed and the sharp edges filed off, not a P6/Core 2 derivative.
> Sandy Bridge is a P4 derivative with all the faults fixed and the sharp edges filed off, not a P6/Core 2 derivative.
That SB reintroduced Netburst features doesn't change the history of the µarch. That's like saying that Netburst not keeping results in its ROB means it was an evolution of the Alpha rather than of P6.
SB was not a renaissance of Netburst; it was a reintroduction, into the Core family, of features which Intel had first introduced in Netburst. That had already happened before (e.g. hyperthreading in Nehalem) and would continue afterwards (e.g. the reintroduction of non-unified schedulers).
The "faults and sharp edges" of netburst were all caused by features which were at the core of netburst's identity, that's why it was a dead end and intel had to reset back to a sane working slate. As the article notes:
> Some of the fundamental ideas behind the architecture were definitely flawed.
And that Core was able to successfully integrate over time features first attempted in Netburst is exactly what you'd expect: many (though not all) of the individual features were not intrinsically bad and in fact several had been deployed successfully by other companies.
As the article specifically notes, the issue was introducing them all at once:
> Netburst debuted a mountain of new microarchitecture techniques. Some of these had been implemented by other companies, but were new to Intel. In general, the more you depart off the known path, the more risk you take. More changes mean more moving parts to tune and validate. Intel clearly took on way too much risk.
Especially in an architecture whose fundamental concepts and tradeoffs turned out to be unsound.
>As the article specifically notes, the issue was introducing them all at once:
And you could say the same thing about Itanium. (In fact, Gordon Bell did.)
The real problem was Intel's focus on frequency above all else at the time. I think they were demoing a 10 GHz processor at the Intel Developer Forum at one point.
A few years later, as the world had pretty much entirely shifted to multicore, a senior Intel exec I was doing some work for basically told me that of course they knew the challenges around ramping frequency. But Microsoft was apparently very concerned about the ability of Windows to use multiple cores.
There was a lot of concern in the industry at the time around how well software--especially on the desktop--would be able to take advantage of multiple cores.
>The real problem was Intel's focus on frequency above all else at the time. I think they were demoing a 10 GHz processor at the Intel Developer Forum at one point.
I don't know if they demoed that or not, but I can remember early claims in the press that they expected the P4/Netburst architecture to scale to 10GHz.
Intel was surfing the Moore's Law frequency curve. The message they were trying to drive home was that if you stick with us, your existing code will get faster with each release.
They knew they couldn’t keep it up. But when you have a process advantage you want to maintain it. To put it another way, they wouldn’t be sacrificing an advantage to make multi-core chips.
This is why I think Intel parts are so power hungry: this pathological focus on single-core perf.
I don't know if you can say that without the trace cache and super long pipeline (31 stages!). More generally, the problem with Netburst was that it was extremely inefficient (high L1d latency, steep branch-misprediction cliffs, a smallish trace cache standing in for the L1i) and tried to solve the inefficiencies just by throwing transistors and hertz at it.
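To put rough numbers on the pipeline-length point (the figures below are illustrative assumptions of mine, not measurements): the misprediction penalty grows with the number of stages you have to re-fill, so a deep pipeline has to out-clock a shallow one just to break even.

```python
# Toy model of how misprediction refill cost eats into IPC.
# base_ipc, branch frequency and miss rate are made-up but
# plausible numbers, chosen only to illustrate the trend.
def effective_ipc(base_ipc, branch_freq, miss_rate, refill_stages):
    stall_per_insn = branch_freq * miss_rate * refill_stages
    return 1.0 / (1.0 / base_ipc + stall_per_insn)

for name, stages in [("P6-ish, ~12 stages", 12),
                     ("Netburst, ~31 stages", 31)]:
    print(f"{name}: {effective_ipc(3.0, 0.20, 0.05, stages):.2f} IPC")
# P6-ish, ~12 stages: 2.21 IPC
# Netburst, ~31 stages: 1.55 IPC
```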
That's absolutely not how Sandy Bridge is designed. It's clearly a Core derivative with a uop cache, if you look at the overall structure of the pipeline. Yes, other ideas such as SMT, better branch predictors and the Alpha-like OoO engine have been successfully integrated into Core derivatives, but then even the first in-order Atoms had SMT.
> It's clearly a Core derivative with a uop cache, if you look at the overall structure of the pipeline.
Er, no. The structure of the pipeline is exactly what it took from the P4. P6 derivatives, including Core, store in-flight results in the ROB, and then write them back to the RRF at retire. RRF read ports are a limited, shared resource, with full throughput only achieved when most instructions get their operands from the ROB.
P4 and SNB store only pointers in the ROB, putting all results directly into the PRF, with no work done on retire. The PRF has enough ports to serve all requests. This is an entirely different way to organize the CPU.
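To make the contrast concrete, here is a toy sketch of the two organizations (heavily simplified; the class names and structure are mine, not Intel's):

```python
# P6-style: in-flight results live in the ROB entry itself and get
# copied into the retirement register file (RRF) when they retire.
class P6Style:
    def __init__(self):
        self.rrf = {}    # architectural register values (retired state)
        self.rob = []    # in-order entries: [dest_reg, result_or_None]

    def allocate(self, dest_reg):
        self.rob.append([dest_reg, None])
        return len(self.rob) - 1

    def execute(self, rob_idx, value):
        self.rob[rob_idx][1] = value       # result parked in the ROB

    def retire_one(self):
        dest, value = self.rob.pop(0)
        self.rrf[dest] = value             # data movement at retire

# P4/SNB-style: every result is written once into a physical register
# file (PRF); the ROB and the RAT only hold pointers into it.
class PRFStyle:
    def __init__(self, nregs=128):
        self.prf = [None] * nregs          # all values live here
        self.rat = {}                      # arch reg -> PRF index
        self.rob = []                      # entries: (arch_reg, prf_idx)

    def rename(self, arch_reg, free_idx):
        self.rat[arch_reg] = free_idx
        self.rob.append((arch_reg, free_idx))

    def execute(self, prf_idx, value):
        self.prf[prf_idx] = value          # written once, never moved

    def retire_one(self):
        self.rob.pop(0)                    # no data movement at retire
```

In the first scheme retirement moves data and the RRF read ports are the shared bottleneck; in the second, retirement just pops the entry.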
You are completely over-complicating things. P6 derivatives stuck with the “traditional” Tomasulo algorithm for way too long. Everyone else moved to pointers for arch regs long before Intel did: the MIPS R10K, DEC, AMD, Intel's own P4, etc. It isn’t really that big of a deal, and is actually a simplification from a design standpoint. There hasn’t been a total overhaul of CPU arch in decades.
For whatever reason, Intel called this mechanism “marbles”. There are a bunch of names in the industry for this: PRF, arch pointers, etc… it’s all the same thing.
By the way, moving to arch pointers is precisely what adds additional work on retirement: you need to do a walk or a checkpoint restore to bring the arch state back to where it needs to be. With P6/Tomasulo, it is free. You got it backwards.
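A sketch of the recovery work being described, using the checkpoint variant (names and structure are illustrative; the ROB walk is the other option mentioned above):

```python
# With pointer-based renaming, the RAT must be brought back to the
# mapping it had at the offending instruction; one way is to snapshot
# it at every predicted branch and restore the snapshot on a miss.
class RATWithCheckpoints:
    def __init__(self):
        self.rat = {}            # arch reg -> physical reg index
        self.checkpoints = []    # RAT snapshots, one per branch in flight

    def rename(self, arch_reg, new_prf_idx):
        self.rat[arch_reg] = new_prf_idx

    def on_branch(self):
        self.checkpoints.append(dict(self.rat))   # cheap to take...

    def on_mispredict(self):
        self.rat = self.checkpoints.pop()          # ...but must be restored

# In a P6-style design the retired state already sits in the RRF,
# so recovery amounts to squashing the bad ROB entries.
```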
The PRF is what I meant by "the Alpha-like OoO engine". It was not an innovation specific to Netburst; AMD uses it too nowadays, and it's not really possible to do otherwise with 256- and 512-bit AVX registers.
The structure of the Netburst front end depended heavily on the trace cache while the Sandy Bridge uop cache is more like an "L0" that can miss sometimes. The back end of Sandy Bridge doesn't need the messy replay system that was in Netburst. So this is what I meant by the pipeline being more similar to Nehalem.
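A sketch of that "L0" behavior (illustrative structure, not Intel's actual front end): the uop cache is a fast path backed by the legacy decoders, whereas the trace cache was effectively the only L1i, so a miss meant rebuilding traces from L2.

```python
# Uop-cache-as-L0: a hit skips decode entirely, a miss falls back to
# the legacy x86 decoders and fills the cache for next time.
class FrontEnd:
    def __init__(self):
        self.uop_cache = {}     # fetch address -> decoded uops

    def decode_x86(self, addr):
        # Stand-in for the (slower) legacy decode pipeline.
        return [("uop", addr)]

    def fetch(self, addr):
        if addr in self.uop_cache:       # hit: stream uops directly
            return self.uop_cache[addr]
        uops = self.decode_x86(addr)     # miss: decode as usual
        self.uop_cache[addr] = uops      # and fill for next time
        return uops
```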
What I love about HN is that we can be having the (totally normal) speculation food fight and then someone just casually drops by with: “It was X, I was in the room when it happened”.
I remember how many angry calls there were about "nearly the same" CPUs, with nearly the same frequency and nearly the same cache size, but one on the P3 microarchitecture and the other on Netburst :-E
They had similar superscalar abilities (two instructions per cycle, in order), but it was not based on the P5 as far as I know. It had a pretty deep pipeline, unlike the one in the P5, and didn't have any AGU stalls.
That did keep some aspects of previous AMD designs, notably sub-clusters of shared last-level cache on a single chip. Zen 1 and Zen 2 used 4-core clusters of shared L3, with accesses to the other 4-core cluster requiring a bit more latency, since they went outside the CCX. Zen 3 changed this, so it no longer applies in the same way.
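On Linux you can see this grouping directly from sysfs (assuming the usual layout where index3 is the L3); on a Zen 1/Zen 2 part, each CCX shows up as its own shared_cpu_list:

```python
# Print which logical CPUs share each core's L3 slice.
import glob

paths = sorted(glob.glob(
    "/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"))
for path in paths:
    cpu = path.split("/")[5]             # e.g. "cpu0"
    with open(path) as f:
        print(cpu, "shares L3 with", f.read().strip())
```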
If failure really were a foundation for success, we would reimplement all these technologies that we now emulate in software and whose patents have expired, so we could implement them in hardware, firmware included, by hard-baking it into an ASIC (certainly ASIC memory is denser than flash, and obviously more so than SRAM).
For example, putting a pre-Y2K Pentium on a chip with a SoundBlaster, using diodes for power, and using wireless communication. It'd make the RISC-V ESP32 look wimpy.
I don't know if they put a SoundBlaster on it, but Intel's first generation of Edison boards is basically a 32 nm Pentium (complete with LOCK prefix bugs) combined with wireless stuff in the footprint of an SD card.
I wonder where the cutoff is at which the value of the existing (legacy) x86 ecosystem beats the performance offered by a new platform like RISC-V.
The software stacks available are probably going to be more battle-tested. I'm also curious if there's more potential in resuscitating legacy code at the "application" level, if you knew you could run it in an embedded box which drew a fraction of a watt. Maybe all you really need is something that already existed as a DOS application.
However, it's comparing apples and freight locomotives. The main part of the x86 historic story revolved around desktop computing and keyboard-mouse-screen interactions[1], and the ESP32's is about internet-of-things and GPIOs, so even if the x86 module is beefier, its tooling and connectivity may not be what you want out of the box if you were considering an ESP32.
On the other hand, we do see some places where they're clearly trying to hammer things like low-end ARM and ESP32 into vaguely more screen-and-keyboard things. I'm thinking of things like dedicated word-processor appliances (AlphaSmart, etc.) or 300-in-1 knockoff consoles, where you could reasonably implement them as a 586-class CPU running off-the-shelf software.
[1] Yes, I know full well there are loads of embedded x86 environments that look nothing like a PC. In university, our assembly language course involved setting up an 80186 developer board tethered to an ADM-3 terminal to be the world's most expensive digital clock.
Apple’s M1 instantly made Windows laptops obsolete. Why go back to much worse battery life and loud fans?
I checked Intel's roadmap and it looks promising. They seem to aim for parity with TSMC before 2025. It’s going to be interesting if they win some of Apple’s business and make ARM chips for them.
Actually good software, and choice? I don't care about having to plug in twice a day, because I'm not working on remote mountaintops, and I don't care about occasional fan noise.
I'd rather be able to run the OS I want with the software I want than the crap Apple wants me to run with them calling all the shots. Apple silicon could be literally twice as fast and I wouldn't care. I'm not that impatient and freedom is IMO more important than speed.
Because it's not like Windows laptops stood still in the face of the M1, and we're already seeing them get ahead in many use cases with lower-power chips than the M1 Pro when you look at the comparisons with the 1260P/6800U.
Historically speaking, Intel's roadmaps have always looked quite promising. However, they're also known for not really delivering on them. I seriously doubt they can catch up to TSMC by 2025 (if ever).
Nah. Just having tons of manufacturing capacity on 14nm and 10nm and being able to sell cheap can guarantee Intel a place at the table for a while. They don’t have to compete for manufacturing capacity at TSMC, and this is their current advantage.
Apple's M1 and Intel have continued to do well because theirs was silicon you could actually get for the last two years.
Short to mid term: they can't compete on performance per watt, but they can still sell fast power hungry chips.
Mid to long term: they can't compete on absolute performance or price when their transistors are a lot larger than TSMC's. Imagine AMD at 2nm by 2025 and Intel still stuck at 10nm++ whatever.
The high-end semiconductor industry is a winner-takes-all game.