
From the great article:

"x86_64 is the 64-bit extension of a 32-bit extension of a 40-year-old 16-bit ISA designed to be source-compatible with a 50-year-old 8-bit ISA. In short, it’s a mess, with each generation adding and removing functionality, ..."

Nice way of wording that! :)

It also explains the complexity of the following 10 pages of text.




See also this old Microsoft Windows 95-era joke:

32 bit extensions and a graphical shell for a 16 bit patch to an 8 bit operating system originally coded for a 4 bit microprocessor, written by a 2 bit company, that can't stand 1 bit of competition.


> “32 bit extensions and a graphical shell for a 16 bit patch to an 8 bit operating system originally coded for a 4 bit microprocessor, written by a 2 bit company, that can't stand 1 bit of competition.”

DOS was a 16-bit operating system. The 8088 (the processor of the IBM PC) was a 16-bit processor if you consider the instruction set, or an 8-bit one if you consider the width of the data bus.


CP/M was an 8-bit operating system.


To be fair, the 8088 was essentially the same die as an 8086 with an 8-bit bus... memory was expensive back then.


The OP referenced is what is called a 'joke'. Oddly, 'jokes' are not always intended to be pedantically accurate.


Every time Intel has tried to move away from x86 (i960? Itanium? Maybe others...) they end up coming back. The years of backwards compatibility are a big selling point.


There is value in the fact that the instruction set is an abstraction. CPUs built with lower-level instruction sets became obsolete faster because they couldn't adapt to new features and maintain compatibility as easily.


Sounds like survivorship bias.

x86's longevity is due to the amount of money thrown at the problem. You could surely start with a much cleaner instruction set like the M68k and wind up with a same-or-better result after spending billions on multiple projects to invent new ways of ameliorating the complexity of the ISA, some in parallel, over time.

Or you can start by eliminating most of the decode complexity and not spend those billions, like ARM.


As much as I like the M68k, I don't think it would be easy to extend it to 64 bits. I'm looking at the MOVE instruction, encoded as:

    00ssRRRMMMmmmrrr

    ss  = size
          01 = 8 bits
          10 = 32 bits
          11 = 16 bits
    RRR = dest register
    MMM = dest mode
    mmm = src mode
    rrr = src register
The obvious choice of setting the size bits to '00' for 64 bits is out of the question, because that pattern overlaps many other instructions (bit manipulations, bounds checking, several specialized move instructions). The whole instruction set is like this: where you would expect 64 bits to be specified, there sits a bunch of other instructions instead.
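
As a rough sketch of that point, here is how those MOVE fields unpack; the helper and the sample opcodes are mine, purely illustrative, with bit positions following the format above:

    #include <stdint.h>
    #include <stdio.h>

    /* Toy decoder for the 68k MOVE format above (names are mine).
       Size bits: 01 = byte, 11 = word, 10 = long; 00 cannot mean
       "64 bits" because that pattern already belongs to other
       instruction groups (bit manipulation, immediates, etc.). */
    static void decode_move(uint16_t op)
    {
        unsigned size     = (op >> 12) & 0x3;  /* bits 13-12 */
        unsigned dst_reg  = (op >> 9)  & 0x7;  /* bits 11-9  */
        unsigned dst_mode = (op >> 6)  & 0x7;  /* bits 8-6   */
        unsigned src_mode = (op >> 3)  & 0x7;  /* bits 5-3   */
        unsigned src_reg  = op         & 0x7;  /* bits 2-0   */

        if ((op >> 14) != 0 || size == 0) {
            printf("%04x: not a MOVE (size bits 00 decode as other opcodes)\n", (unsigned)op);
            return;
        }
        printf("%04x: MOVE size=%u dst=(mode %u, reg %u) src=(mode %u, reg %u)\n",
               (unsigned)op, size, dst_mode, dst_reg, src_mode, src_reg);
    }

    int main(void)
    {
        decode_move(0x3001);  /* MOVE.W D1,D0 */
        decode_move(0x0801);  /* size bits 00: bit-manipulation territory (BTST), not MOVE */
        return 0;
    }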


The decoder doesn't actually take all that much space in the hardware, though. It's going to be smaller than the normal OoO logic, which means it's a pretty minor tax at best for actual hardware.


> It's going to be smaller than the normal OoO logic, which means it's a pretty minor tax at best for actual hardware.

But it's a major tax for designing that hardware, and a potential source of bugs (the more complex, the more difficult to debug and verify).


Instruction-set complexity can be used to save bandwidth and latency (and the associated energy consumption) at the cost of decoder size. Communication limits are increasingly the dominant constraint in processors, AFAIK, so the analysis doesn't come down to decoder size alone.


Compressed RISC ISAs (e.g. the RISC-V C extension) achieve density comparable to x86, with a much simpler decoder.
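
To make the "simpler decoder" part concrete, here is a minimal sketch (my own helper, not from any real decoder) of length decoding under the C extension, relying on the standard RISC-V rule that the low two bits distinguish 16-bit from 32-bit encodings:

    #include <stdint.h>

    /* Minimal sketch: with the RISC-V C extension, the low two bits of the
       first 16-bit parcel give the instruction length, so fetch/decode stays
       simple even though 16- and 32-bit encodings are mixed in the stream. */
    static unsigned rv_insn_length(uint16_t first_parcel)
    {
        if ((first_parcel & 0x3) != 0x3)
            return 2;  /* compressed (C extension) instruction */
        return 4;      /* standard 32-bit instruction (longer encodings ignored here) */
    }

Compare that with x86, where the length of an instruction only falls out after looking at prefixes, the opcode, ModRM/SIB, and any displacement or immediate bytes.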


Have you seen die shots of Atom? The decoder makes up more than half of the core, and there is no instruction cache.

In a loop, you will be spending more joules on decoding than on the actual computation.


What die shots are you looking at? I'm looking at a Silverthorne die and most of the frontend is taken up by the L1I$.

https://en.wikichip.org/wiki/intel/microarchitectures/bonnel...


My bad


I really wish Itanium had taken off. IMO it is a superior architecture that was simply ahead of its time.

Wouldn't it be great if software instead of hardware, had complete control of instruction ordering? Wouldn't it be great to not be limited by the current SIMD restrictions? Wouldn't it be nice if you could choose to spend more compile time to get even faster programs (vs relying on the hardware to do it JIT)?

I mean, I get why it didn't happen. Stupid history chose wrong (Like making electrons have a negative charge).


Itanium was one of those scenarios where theory blew up in practice. In theory it's great for software to have complete control of instruction ordering. In practice, software simply doesn't have enough information at compile time to do that. As proven by the fact that even Itanium moved to an OOO architecture in Poulson.

It comes down to memory latency. Even an L3 cache hit these days is 30-40 cycles. It’s hard to predict when loads and stores will miss the cache, so there is little a compiler can do to account for that in scheduling. OOO can cover the latency of an L3 cache hit pretty well. And once you add it for that, why not just pretend you’ve got a sequential machine?
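
A toy illustration of that point (plain C, mine, not from the thread): the compiler cannot know whether the loads below hit L1 or miss all the way to DRAM, so it cannot statically schedule around the latency, while an OoO core just keeps several independent loads in flight.

    /* Toy example: whether each of these loads hits L1 or misses to DRAM
       depends on the contents of idx[], which the compiler cannot know when
       it schedules the code. An OoO core keeps several of the independent
       loads outstanding at once and hides much of the latency. */
    long gather_sum(const long *table, const int *idx, int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += table[idx[i]];  /* ~4 cycles on an L1 hit, hundreds on a DRAM miss */
        return total;
    }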


Right. Generally, memory accesses in real-world software are unpredictable enough (no matter how good the compiler is) that single-threaded execution is always going to get a big boost from OOO.

An interesting question is why Intel believed otherwise when they created IA64. I think there's a strong case that publication bias and other pathologies of academic computer science destroyed billions of dollars in value, and would have killed Intel had it not had monopoly power.


> An interesting question is why Intel believed otherwise when they created IA64.

For numerics code, VLIW can indeed offer huge advantages. Unfortunately, programs from other areas have quite a different structure and thus do not benefit from VLIW nearly as much.


I'm not too much in the know about how the OOO stuff works/worked within Poulson. Did OOO migrate instruction execution between batches? Did it simply ignore them altogether?

I'd still imagine you'd see benefits for the same reason you see SIMD benefits (assuming you aren't doing a whole bunch of pointer chasing).


The stupid quip about sufficiently advanced compilers has actually been true until relatively recently, and shipping shared libraries was also a thing for a long while until recently (these days we basically ship shared libraries as statically linked, aka containers).


I'd say roughly around 2010, maybe 2015, compilers got to the point where they didn't totally suck at vectorization.

Before that point, yeah, they were just too dumb to be able to make Itanium fast.


Polyhedral optimization is going to be exciting too.


> I really wish Itanium had taken off. IMO it is a superior architecture that was simply ahead of it's time.

Itanium was an architecture that was designed for "big iron", i.e. fast, powerful computers. It is thus, in my opinion, much harder to "scale down" to, say, mobile devices than x86.


x86 hasn't really proven that it scales down well for mobile devices either.


Intel produced SoCs for mobile devices, concretely the Spreadtrum SC9853i and Spreadtrum SC9861G-IA.

If you want to scale even further down, consider Intel Quark (https://en.wikipedia.org/wiki/Intel_Quark). For an analysis of why it failed in the market, see http://linuxgizmos.com/who-killed-the-quark/

To me, it seems that the central reason these chips failed commercially is that at the lower end, SoCs offer much thinner margins than CPUs for laptops, desktop PCs and servers.


There are some x86 Android phones, and Windows 10 Mobile's Continuum looked designed to be run on an x86 phone, but Intel killed that line of processors before Microsoft built a device.


Interestingly enough, the baseband for iPhone XS runs x86.


My friend uses an Asus Zenfone which runs on an Intel CPU. It works fine. Intel might have failed with economics or marketing, but the engineering works.


I'd say the opposite is true. There isn't anything about Itanium that makes it worse for mobile; if anything, it would be better for mobile because it was designed to push more of the optimization work into the compiler rather than the hardware. That means less power spent optimizing the running software.

Itanium fits mobile just as well as ARM does, for much the same reasons. After all, Itanium is essentially a RISC architecture.

It never touched mobile because it was dead before mobile computing was really taking off. Heck, it was dead before ARM got a stranglehold on the market.


> There isn't anything about Itanium that makes it worse for mobile; if anything, it would be better for mobile because it was designed to push more of the optimization work into the compiler rather than the hardware. That means less power spent optimizing the running software.

There's a big problem with that: the VLIW layout is not as memory-efficient, so programs were larger and the instruction cache needed to be larger to compensate. Mobile architectures have traditionally had smaller caches and less memory bandwidth to save power.

There is an interesting what-if question here: one of the big things which killed Itanium was the poor x86 compatibility, meaning that while it was not entirely uncompetitive when running highly-optimized native code, it was massively slower for legacy apps even before you factored in the price. Compiler technology has improved by a huge degree since the 90s, and in particular it's interesting to imagine what could happen in an Apple App Store-style environment where developers ship LLVM bitcode which is recompiled for the target device, substantially avoiding the need to run legacy code.


Mobile is expected to run JIT translated code.

Itanium's design was finalized before JIT became important. Suddenly JIT was everywhere and spreading: Java, the C#/.NET CLR, JavaScript, and more.

JIT output cannot be well-optimized because that takes too long. The code must compile while the end user waits, so there is no time to do anything good. The code will be terrible. Itanium can't handle that.


> There isn't anything about Itanium that makes it worse for mobile.

Shit code density?


VLIW has ultimately failed several times outside of IA-64. It was briefly tried for GPUs too.


It's alive and kicking on the Texas Instruments DSP chips. You can get incredible performance out of them, but you pay with horrible compile times.

To give you a taste what these chips do:

- 64 registers, 8 execution units, so 8 instructions can execute per cycle. Each instruction executes in a single cycle but may write back its result later (multiplications do this, for example). It's your responsibility to make sure you don't generate a conflict.

- for loops, the hardware has a very complicated scheduling mechanism that effectively lets you split the instruction pointer into 8 different pointers, so you can run multiple iterations of a loop at the same time.

I wrote assembler code for that. Sudoku is a piece of cake compared to it.


I'm feeling extremely masochistic. What's the model number of one of these chips and/or a pointer to its instruction set reference?


For me it was the TMS320C64x+. There are newer versions of that chip out there with floating point support.

The glorious instruction set reference is here:

https://www.ti.com/lit/ug/spru732j/spru732j.pdf



ARM64 (aka AArch64) is the best version of x86 yet.

It's clean, with very few warts; they learned from their mistakes with ARMv7/Thumb2 (specifically the IT instruction).

It helps that Apple controls the whole ecosystem and could seamlessly move to ARM64. The Android transition has been making progress also.

Maybe someday we'll drop 32bit and 16bit support in x86 systems (and also in "modern" programming languages!).


Thumb2 was good. The main issue with AArch64 is that they dropped the variable instruction length. As such all instructions are huge, and this significantly slows down code, especially after a mispredicted branch. I'm observing on average a 20% performance loss from Thumb2 to AArch64 on the exact same CPU and same kernel, just switching executables, and roughly 40% larger code. Also something to consider: an A53 can only read 64 bits per cycle from the cache, i.e. just two instructions. That doesn't even allow it to fetch a bit ahead and start decoding in advance.


> Maybe someday we'll drop 32bit and 16bit support in x86 systems (and also in "modern" programming languages!).

You do realize that there exists a world beyond desktop and server CPUs, right? There are plenty of 32-bit embedded microprocessors, and plenty of applications where a 64-bit processor would be overkill.


> ARM64 (aka AArch64) is the best version of x86 yet.

Huh? Isn't that ARM and not even remotely compatible with x86?


ARM64 is what Intel would design if they learned from the lessons of x86 and got a chance to restart.


I'm sure that's not true. I've heard a great rant from an Intel CPU engineer about how CISC is a great fit for large OoO cores. Like how memory RMW instructions can be thought of as allocating physical register file resources with no architectural register file requirements, no extra instruction stream bits required, and no confusion inside the core about the register data dependencies of the instruction.

It'd be fun to throw together a modern CISC-V or something that does a better job than x86 from an instruction encoding efficiency perspective, and see how it stacks up against modern RISCs.


>It'd be fun to throw together a modern CISC-V or something that does a better job than x86 from an instruction encoding efficiency perspective, and see how it stacks up against modern RISCs.

It'd probably be about the same, since the combination of microcode and macro-op fusion makes RISC and CISC essentially the same thing internally. You're basically just trading complexity of the instruction decoder for code density.


Well, actually they did try a few times. Itanic and the Intel iAPX 432 come to mind...



Every mobile app: "Sorry your version is incompatible with the service as it existed three months ago, you have to update."



