I guess we should be thankful they aren't still calling it EM64T :) I've also heard that the term AMD64 originated from marketing -- the internal name during development was x86-64, which is what everybody but Microsoft ended up calling it anyway.
"amd64" is the official name, per AMD; just as "ia32" is the official name of x86-32, per Intel.
The fact that ia32 (and its sub-classifiers: i386, i486, etc.) is still somewhat used, while amd64 has almost completely disappeared, is 100% due to Intel's marketing shenanigans in the 00s.
For those truly too lazy to click through to the article, x86-S stands for "simplified," with the idea being to boot directly into 64-bit mode instead of booting into 16-bit and bootstrapping to 64-bit mode. 16-bit mode would be removed entirely. It's not clear to me if 32-bit mode would be axed as well, or if it would be retained (maybe partially).
> It's not clear to me if 32-bit mode would be axed as well, or if it would be retained (maybe partially).
I've been reading the detailed PDF. It looks to me like 32-bit mode, specifically in the sense of a 32-bit distribution of modern software, is being retained. Segments would be converted more towards their 64-bit extremely attenuated mode, rather than the more robust functionality they have right now. So most 32-bit software shouldn't be affected, unless you're getting really dirty with the segmentation features or trying to support something like Win16.
> Segments would be converted more towards their 64-bit extremely attenuated mode, rather than the more robust functionality they have right now.
Which, ten years down the line, should free up a bit more opcode space when they decide to no longer support CS/DS/ES/SS segment prefixes. The space could be used for instructions that are the same in 32-bit and 64-bit modes.
The primary ("first byte") opcode space is quite busy in 32-bit mode. Intel and AMD figured out a way to "hide" extra instruction prefixes in illegal modes of certain 16/32-bit instructions. Intel did this with the VEX prefix and AMD did it with the XOP prefix. The instructions they chose were LES/LDS and POP. This is the reason why some of the bits in those multi-byte prefixes have to be inverted, btw.
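For the curious, here's a minimal sketch in Python of how that disambiguation works for the VEX case. It's not a real decoder; the byte values and the "top two bits must be 11" rule come from the public opcode documentation, so treat it as an illustration of the trick rather than anything from the x86-S proposal itself:

    # How a 0xC4/0xC5 byte is told apart from legacy LES/LDS.
    def is_vex(opcode: int, next_byte: int, long_mode: bool) -> bool:
        """True if a C4/C5 byte starts a VEX prefix rather than LES/LDS."""
        if opcode not in (0xC4, 0xC5):
            return False
        if long_mode:
            # LES/LDS don't exist in 64-bit mode, so C4/C5 are always VEX.
            return True
        # In 32-bit mode, C4/C5 are VEX only if the next byte's top two bits
        # are 11 -- read as a ModRM byte, that would be a register-form
        # LES/LDS, an encoding that was never legal. The inverted R/X/B and
        # vvvv bits of VEX guarantee those two bits are set for any VEX
        # instruction that is valid in 32-bit mode.
        return (next_byte & 0xC0) == 0xC0

    # "C5 F8 57 C0" is VXORPS xmm0, xmm0, xmm0 in both modes...
    assert is_vex(0xC5, 0xF8, long_mode=False)
    assert is_vex(0xC5, 0xF8, long_mode=True)
    # ...while "C5 06" in 32-bit mode is LDS eax, [esi], not a VEX prefix.
    assert not is_vex(0xC5, 0x06, long_mode=False)

XOP's reuse of 8F (legacy POP r/m) works the same way: the map-select values it puts in the next byte would, read as a POP, hit an encoding that was never valid.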
Having 4 clean bytes available with no need for such silly games should prove useful in the future.
The 4 string I/O instructions they suggest removing will free up bytes that are useful immediately.
Removing all ring 3 I/O instructions (also part of the plan) will free up 8 further bytes in the primary opcode map for user space code. This could perhaps be used to provide shorter aliases of certain existing encodings -- so the instructions in question would still be available for kernels but applications would see better encoding density.
And then there is the 67h byte being freed as well due to removing address size overrides.
If they are sensible, they will also remove the BOUND instruction from 32-bit mode, but that is not in the current version of the plan.
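To put concrete byte values on that list, here's a quick tally in Python. The opcode assignments come from the standard one-byte opcode map, not from the comment or the proposal, and whether each of them actually ends up reusable depends on the final spec:

    # One-byte opcodes discussed above (standard x86 one-byte opcode map).
    segment_prefixes = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}

    string_io = {                      # the 4 string I/O instructions
        0x6C: "INSB", 0x6D: "INSW/INSD",
        0x6E: "OUTSB", 0x6F: "OUTSW/OUTSD",
    }

    port_io = {                        # the 8 register/immediate I/O opcodes
        0xE4: "IN AL, imm8",  0xE5: "IN eAX, imm8",
        0xE6: "OUT imm8, AL", 0xE7: "OUT imm8, eAX",
        0xEC: "IN AL, DX",    0xED: "IN eAX, DX",
        0xEE: "OUT DX, AL",   0xEF: "OUT DX, eAX",
    }

    other = {
        0x67: "address-size override prefix",
        0x62: "BOUND (32-bit only; already reused as the EVEX prefix in 64-bit mode)",
    }

    total = len(segment_prefixes) + len(string_io) + len(port_io) + len(other)
    print(f"{total} one-byte slots in play")   # 18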
So what are the benefits?
Free space in the primary opcode map => better encoding density... and they might not be doing this to give us shorter programs. They might be doing this so the parallel instruction decoders can handle more instructions per cycle.
Getting rid of 16-bit data will make the data flow engine in future CPUs simpler (no more partial register stalls).
They also get to remove lots and lots of special cases that undoubtedly cost engineering time and validation time.
> Which, ten years down the line, should free up a bit more opcode space when they decide to no longer support CS/DS/ES/SS segment prefixes. The space could be used for instructions that are the same in 32-bit and 64-bit modes.
The document explicitly says that segment prefixes are _ignored_.
> Removing all ring 3 I/O instructions (also part of the plan) will free up 8 further bytes in the primary opcode map for user space code.
It does not, unless the same opcodes are removed in ring 0 as well. But if ring-0 I/O instructions are converted to multi-byte encodings, why can't the ring-3 I/O instructions get the same treatment?
> And then there is the 67h byte being freed as well due to removing address size overrides.
Again, not quite: here the prefix isn't ignored, but unsupported addressing forms raise a #GP, so the byte isn't actually freed.
> Free space in the primary opcode map => better encoding density...
Won't happen. What's more, most encodings are shared with 32-bit mode and are still being developed that way.
> Getting rid of 16-bit data will make the data flow engine in future CPUs simpler (no more partial register stalls).
This one misses as well. 16-bit data is still handled; only 16-bit addressing is disallowed.
Looks like you will still be able to run 32 bit user mode applications under a 64 bit kernel. I'm not sure, but I think 32 bit guest kernels get the ax as well.
"
3.3 Removal of 16-Bit and 32-Bit Protected Mode
16-bit and 32-bit protected mode are not supported anymore and cannot be entered. The CPU always operates in long mode. The 32-bit submode of Intel64 (compatibility mode) still exists.
...
"
> I'm not sure, but I think 32 bit guest kernels get the ax as well.
This is somewhat beyond my knowledge of system programming, but my read is that it's just possible to support 32-bit guest kernels in the VM with some more emulation of the de-supported instructions. I don't know how many of those instructions already cause VM exits.
If I had to list one hundred ways x86 could be improved, not booting in 16-bit mode wouldn't be on it. It's just an incantation. There's no fallout from it.
If we're talking firmware, the gigantic mess that used to be called ACPI still leaves users suffering every day, for example.
> If we're talking firmware, the gigantic mess that used to be called ACPI still leaves users suffering every day, for example.
And replace it with what? APM? I'm completely unaware of another power management abstraction that has even a fraction of the capabilities and proven track record. Sure, if you're Apple or IBM you can build a machine with a custom, tightly bound firmware/OS/hardware interface that works well. But if you look at, for example, the Arm ecosystem's attempts to use DT in the Linux kernel and manage everything from a heavyweight kernel, it's a fool's errand. It has been successfully failing for fully half the lifetime of ACPI. Never mind that no one in their right mind designs a system these days where the big cores try to manage their own power/perf curves, or models the system's power/cooling without a small low-power core continuously monitoring and adjusting the hundreds of fine-grained power domains needed, even on smaller machines, to achieve modern perf/power parity.
I think that depends on whether the existence of 16-bit mode affects the processor's IPC.
If it's 2% of instruction cost, then kill it with fire. If it's raising the development cost by 1%, kill it with fire. If it's a fixed cost and isn't hurting anything, then meh.
Unless they want to completely eliminate 16-bit operations, it won't matter.
...and eliminating 16-bit operations would be pure ideologically-driven stupidity, given how many classic RISCs with word-width-operations-only suffered greatly and the ones that survived (like ARM) evolved to add them back in.
What they're getting rid of is all of the extra fun memory management modes the x86 processor has, condensing everything down into just the paging mode with extremely gimped segment support that's all that's available in x86-64. That suggests to me that someone within Intel is thinking about basically rewriting the memory subsystem, where having to support all the fun things like real mode or virtual 8086 mode is much more painful. The 16-bit data operations are still around, with the 0x66 prefix, and even the ability to default a segment to not needing the 0x66 prefix to get 16-bit registers may still be around, though I'm not savvy enough on the retained features to say for certain.
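As a concrete illustration of what "16-bit data with the 0x66 prefix" means (standard encodings, shown here just to make the distinction visible), in a 32-bit or 64-bit code segment:

    # The same opcode, with and without the operand-size override prefix.
    mov_eax_ebx = bytes([0x89, 0xD8])        # mov eax, ebx (default 32-bit operand size)
    mov_ax_bx   = bytes([0x66, 0x89, 0xD8])  # mov ax, bx   (0x66 selects 16-bit operands)

In a 16-bit code segment the defaults flip and it's the 32-bit form that needs the prefix -- that per-segment default is the part in question above.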
To be honest, there are two legacy x86 features which I suspect do have a noticeable tax on the support in the execution units: the x87 FPU stuff (due to the 80-bit floating-point types, and a variety of hardware-implemented transcendental instructions that need to be supported), and the legacy addressing modes.
> where having to support all the fun things like real mode or virtual 8086 mode is much more painful
If they're going to still support virtualisation to run legacy code, which is what the article is implying, they'll still need the logic to deal with all the existing modes being used inside a VM.
...and those modes are honestly not hard to support. (Protected mode) segmentation is far simpler than paging, and realmode is even easier given that the original 8086 with its 29k transistors could implement it.
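For a sense of how little is actually involved, here is a minimal sketch of real-mode address formation (the classic segment*16+offset rule, with the A20 wrap as on the original 8086). This is just the well-known formula, not anything specific to the proposal:

    def real_mode_linear(segment: int, offset: int, a20_enabled: bool = True) -> int:
        """Physical address = segment * 16 + offset, wrapped to 20 bits if A20 is masked."""
        addr = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
        return addr if a20_enabled else addr & 0xFFFFF

    # FFFF:0010 reaches 0x100000 with A20 enabled, but wraps to 0 on an 8086.
    assert real_mode_linear(0xFFFF, 0x0010) == 0x100000
    assert real_mode_linear(0xFFFF, 0x0010, a20_enabled=False) == 0x00000

That's essentially the whole "MMU" of real mode.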
From what I can tell, they're not supporting them in virtualization. Instead, they're set up to cause processor traps, which the hypervisor could catch and use to emulate those instructions if necessary.
> ...and those modes are honestly not hard to support.
I have to imagine that there's a decent amount of extra logic in the load/store units to use the current processor mode to work out memory semantics, and now I'm thinking about questions like "what would happen if one thread is in virtual 8086 mode while the hyperthread partner isn't". That's a noticeable burden in things like validation.
And even if the implementation of segmentation is simpler than paging, that's not the issue; the issue is that it's a different pathway, and potentially one that cuts deep into the execution unit. If it's the sort of thing where eliminating support for anything other than paging means getting an L1 cache hit back one cycle faster... then sign me up.
Clock cycles are dictated by the gate timing of the slowest fractional instruction. If they’re getting rid of conditional logic, that could just reduce heat a bit, or if this is now the tall tent pole, it could allow them to clock the chip a little higher at the same gate pitch.
Or, they are adding something new that puts them over that budget, and cleaning up the conditional branching lets them stay under it.
If you mean the graphics near the beginning, then no: they refer to boot phases and OS-level stuff, which isn't quite the same as what you need to run 32-bit applications on a 64-bit OS.
> 16-bit and 32-bit protected mode are not supported anymore and cannot be entered. The CPU always operates in long mode. The 32-bit submode of Intel64 (compatibility mode) still exists.
Yes, but there is a bit of a catch, because some 32-bit programs abuse certain instructions created for handling segmentation, and segmentation support will be partially removed. No idea if the instructions in question will be gone, though.
Ok, so maybe you do that in boot order or some boot modes. But ... remove it?
There is SO MUCH silicon these days; honestly I think CPUs should start putting instruction-set compatibility cores in, so you dedicate 0.5% to a 16-bit compatibility core, 1% to a 32-bit mode core, and maybe even have an ARM mode.
Honestly, since all CPUs are basically fronted by microcode, why can't modern CPU microcode be compatible with multiple instruction sets? Or provide some sort of programmable microcode section so emulators aren't done in software?
While I'm bitching, what happened to actually running multiple OSes at once? I have like 16 cores, gigabytes of RAM, multiple display outputs ... why can't I actually run both Linux and Windows and OSX all at once without some bad virtualization container inside one overarching OS? Like, can't Intel manage that with chipset and CPU logic?
Intel (and AMD) are DESPERATE to figure out things to get people to use all their extra cores and silicon for. Well, how about making multiple OS on the same machine an actually pleasant experience? Hell, I would love to run a smartphone OS as well.
> why can't I actually run both Linux and Windows and OSX all at once without some bad virtualization container inside one overarching OS? Like, can't Intel manage that with chipset and CPU logic?
Sounds like what you want is a LPAR or proper type 1 hypervisor, which despite everyone claiming to be type 1 none of them really are. And this is largely caused by the various device HW standards on the PC not natively supporting virtualization (or like nvidia charging extra for SRIOV functionality on their GPUs). So what happens is that all these hypervisors need something like VirtIo or hypervisor emulation of devices (ex e1000/ne2000, sb16, etc) to provide generic storage/networking/display/input devices on top of devices which don't themselves provide any virtualization support. Which in PC land is pretty much everything that doesn't support SRIOV. Which in turn means that a heavyweight OS +driver stack is required to manage the HW.
So, you could build a PC based LPAR hypervisor. It would just be limited to the SRIOV capable devices, or plugging in piles of adapters, each one uniquely bound to a single partition. Ex: https://www.hitachi.com/rev/pdf/2012/r2012_02_104.pdf
64-bit (OS) only, and kicking out legacy cruft that doesn't just add complexity but in some edge cases can also make security harder, sounds like a very sane idea, just maybe kind of late. I mean, they probably could have started pushing this in some areas, like server and high-end CPUs, 5-10 years ago.
This makes no sense to me. Backward compatibility is a huge competitive advantage for Intel, and IMHO, it's royally messed up that vm86 mode doesn't work in 64-bit mode.
One DOS application I use was hurt by this: "old DOS OrCAD". It works well in Windows-XP on a 32-bit machine, but does not work at all in 64-bit Windows. (It's actually a 32-bit DPMI program and has drivers to use Windows GDI so you don't have to mess around with drivers).
More evidence: IBM 360 mainframe software still works in Z/OS.
It might be worth it for non-generic computing devices like cell phones, but Intel missed the boat there already.
16-bit x86 code is ancient enough that a software emulation solution can be fast, efficient, and perfectly compatible, so there's no need to keep support for it in hardware at this point.
Microsoft could have used emulation to keep old 16-bit x86 apps working with Windows for AMD64 (in fact they had done this before for non-x86 architectures) but I guess they weren't interested. WINE and DOSBox did it though.
16-bit code most of the time breaks because Windows itself is incompatible, so you need to emulate it anyway.
And if you have very old but important Windows legacy software, you either want to run old hardware and an old Windows OS anyway, or you run emulation anyway.
On Linux this is already somewhat the case: most distros are by now 64-bit only at the OS level, with at most 32-bit application support, which isn't always enabled by default.
vm86 doesn't work in 64-bit mode, but you can simply temporarily switch down to 32-bit mode, and then vm86 works just fine. Early amd64 Linux would do this, and there was even a kernel module to do it after they dropped support for it.
Microsoft had other reasons to drop 16-bit support in Windows 64.
Potentially there is one, if a large customer is paying you for it.
At work, I'm helping a client upgrade from a decade-old version (v12) of our product and having to copy over the switches to keep the even older (v11 and previous) behaviours.
...and taken up mostly by cache and vector execution units.
An 8086 has 29000 transistors. A modern CPU has several billion. The amount needed to implement the 16-bit subset of the ISA is an absolutely tiny fraction of the total area.
My immediate sense is that the entire vector system easily dwarfs the 16-bit system... as I'm guessing most of the modern 16-bit mode is handled entirely in microcode.
In the case of OrCAD this was very easy because it already had plug-in architecture for graphics. It had this because even the mouse was not standardized in the early days.
But it's also a stretch in general, as ARM has been the dominant platform for personal computers for years (there are way more phones, tablets, etc. than x86 laptops/desktops by now, all of which fit the original definition of a personal computer).
I can certainly see how this is beneficial. At the same time it seems like it's a local optimum, to be eclipsed by other architectures. How much software really depends on 64-bit x86 architecture? And for how long?
A large amount of server software can be reasonably ported to a new architecture. New platforms can adopt new architectures (phone/tablet, AR/VR). General purpose software like a web browser abstracts hardware, as does very popular software (as Facebook had, and WeChat does).
Apple hasn't been tied to architectures and transitioned a number of times, always optimizing the whole rather than optimizing intermediate/stationary fixed points. If Intel is to make it to the next phase, it needs more than incremental improvements to compete. I hope that there's a path/future for AMD and Intel to evolve x86 and thrive but it won't be a given or easy.
The almost 50 years of backwards compatibility (along with the accompanying creation of a huge amount of documentation) is one of the strongest reasons for choosing the x86/PC.
With each feature removal, they weaken that argument and push their (prospective) customers towards reconsidering all the other competitive CPUs out there like ARM, MIPS, RISC-V, etc. that are not distant in performance.
Intel has made SoCs for phones, tablets, and other miscellaneous devices, but they weren't PC-compatible. Not surprisingly, they were not well-received.
I believe the only path forward for x86 is to be a "x86 accelerator", a x86 mode in some future AMD/Intel RISC-V processors, perhaps limited to unprivileged "usermode".
Easing the transition until x86 can be relegated to emulation.
> Easing the transition until x86 can be relegated to emulation.
Uh, isn't that the point? X86 has been "emulated" since the P6/NexGen Nx586. That is why people talk about the architecture not mattering much anymore in the face of very similar OoO microarchitectures running arm/ppc/x86/etc. code.
Or I guess you could look at it another way: CISC won because it turns out that specific high-level functions like, say, AES, memcpy or vector ops are better implemented with explicit hardware blocks, or provide additional hinting that gives the microarchitecture an advantage, while the cost of cracking inconvenient sequences into simpler ones means the processor designer effectively has the choice between building hardware for something like Arm's load/store multiple or simply breaking it into a sequence of uOps (which one might suggest is basically the modern equivalent of microcode).
And this is why I'm a bit skeptical of RISC-V at the moment. No serious architect of high-performance microprocessors can think that designing an architecture to match a modern low-transistor-budget microcontroller is a good design choice. Because, like every single other RISC processor, what happens over time is that they are forced to add higher-level instructions for the additional hinting they can provide. If you look at the past 10 years of Arm architecture updates you can read between the lines: "the RISC solution for X we were using couldn't compete with vendor Y's product, so we added this instruction/etc. to solve the problem". Same with POWER: they even gave up on their famous inverted page tables because it was basically a software-managed TLB, and it turns out building page-walking hardware is a better solution than writing software to do it. It's also why no one takes a processor without some kind of SIMD seriously anymore.
That said, it's not like RISC-V can't evolve into something "good enough"; it's just going to be stuck with a bunch of ugly baggage by the time it gets there, because it's starting from a false set of premises.
Fun fact: VIA's "alternate instruction set" that gives access to the uops shows that it's basically a slightly modified MIPS, with more useful x86-ish addressing modes and instructions.
Maybe we'll see RISC-V backends with x86 frontends in the future.
I wonder, with MS's virtualization-based security and the Arm/CHERI proposals, whether it isn't time to revisit much of the capability hardware that AMD removed when they designed long mode. Never mind the long list of C-mitigation hardware that both Intel and Arm are now reinventing.
The 4-layer security architecture that the 386 originally proposed (user, privileged library/service, driver space, and kernel), now with the addition of a hypervisor layer, would be a good starting place for modern OS privilege separation. Plus, base+size-restricted pointers with an attached privilege, as provided by the 386's variable-length segments, are the kinds of things that CHERI is trying to bolt on using a slightly different paradigm. Even call gates, if used between different modules/etc., would have acted as a control-flow mitigation technique, likely partially obviating the need for CET, PAC and BTI.
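To make "base+size-restricted pointers with an attached privilege" concrete, here's a rough conceptual sketch in Python. It is not the real descriptor format or the exact 386 privilege check, just the shape of the idea:

    # A 386-style data segment, reduced to its essence: every access goes
    # through a base, a limit, and a privilege level.
    class Segment:
        def __init__(self, base: int, limit: int, dpl: int):
            self.base, self.limit, self.dpl = base, limit, dpl

        def translate(self, offset: int, cpl: int) -> int:
            if cpl > self.dpl:
                raise PermissionError("#GP: privilege check failed")
            if offset > self.limit:
                raise IndexError("#GP: offset past segment limit")
            return self.base + offset

    data = Segment(base=0x1000, limit=0xFF, dpl=3)
    data.translate(0x10, cpl=3)     # ok: 0x1010
    # data.translate(0x200, cpl=3)  # would fault: past the limit

Swap the terminology around (bounds, permissions, sealed capabilities) and it's recognisably the same thing CHERI attaches to individual pointers.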
So it wouldn't surprise me at all, if in another 20 years 90% of the 386's capability model has been bolted back onto AMD's long mode in a slightly different, and likely worse way.
>So it wouldn't surprise me at all, if in another 20 years 90% of the 386's capability model has been bolted back onto AMD's long mode in a slightly different, and likely worse way.
It will surprise me if x86 or arm are still relevant 3 years from now.
The way I see it, as the first batch of very high performance designs releases, including but not limited to Ventana Veyron before year end, as well as Tenstorrent Ascalon in 2024, it will be obvious to any leftover skeptics in the industry that it is over, and RISC-V has won.
*cough*[1]
[1] https://en.wikipedia.org/wiki/X86-64#History