Funnily enough, everyone I know who works at Intel who's expressed a thought about Itanium would not agree with the statement that Itanium is superior to x86.
Itanium of course was a huge win for Intel. By convincing all the RISC Unix guys to jump onto the sinking Itanium, they captured the high-end server market for x86.
I have no idea whether those Intel employees had direct experience with Itanium or whether they merely drank the "Itanium is bad" kool-aid. The hardware was pretty bad in the early years, but Intel fixed those problems (speed, power, feature size, etc.).
In the end the problem was timing: we didn't know how to write compilers that could figure out micro-scheduling in advance, so software couldn't take advantage of the Itanium, and when AMD64 came out, added a bunch of registers, and did all that nasty instruction reordering and speculative execution in real time on-chip, everybody said "just use that." Nowadays compiler writers have a much keener understanding of static analysis and almost certainly could take advantage of the chip. And vulnerabilities like Spectre and Meltdown would never have happened, because all that optimization would have been done in the compiler, where it belongs.
In fact, several of them did have direct experience with the Itanium. One comment one of them made was that the hardware team had a bad habit of creating problems and hoping the compiler would fix them.
One of the things about Itanium that I do know is that it's basically VLIW. And VLIW is a technology that sounds really good on paper, but it turns out that you just can't get good results for realistic workloads--it has basically been abandoned everywhere it's been tried except for DSPs, where it lives on, apparently because DSPs have so few kernels they care about.
And speaking as a compiler writer, no compiler will ever do as good a job as the hardware can at maximizing instruction-level parallelism. The compiler is forced to make a static choice while hardware can choose based on dynamic effects that the compiler will never be able to predict.
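To make that concrete, here's a toy C fragment (purely illustrative, not from any real compiler's test suite) where the best schedule depends on run-time behaviour the compiler simply cannot see:

    /* Whether a[idx[i]] hits L1 or misses all the way to DRAM depends entirely
     * on the contents of idx[], which only exist at run-time. An out-of-order
     * core can keep issuing the independent b[i] work while a miss is
     * outstanding; a static (VLIW/EPIC) schedule has to bake in one guessed
     * latency for every iteration. */
    long mixed_work(const long *a, const int *idx, const long *b, int n)
    {
        long gather = 0, stream = 0;
        for (int i = 0; i < n; i++) {
            gather += a[idx[i]];   /* latency: ~4 cycles or ~300, the data decides */
            stream += b[i] * 3;    /* independent work that could fill the gap */
        }
        return gather + stream;
    }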
> The compiler is forced to make a static choice while hardware can choose based on dynamic effects that the compiler will never be able to predict.
At the limit of complete generality the Halting Problem guarantees your statement is correct. But it also appears to be extremely difficult to do this at runtime without side-effects that change global state such that information leaks. The real cost for speculative execution may be that it can't be done in a purely-functional manner.
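For anyone who hasn't seen it, the leak in question needs remarkably little code. This is roughly the bounds-check-bypass shape from the Spectre v1 write-ups (my reconstruction from memory; the variable names are the usual textbook ones):

    /* If the branch predictor guesses "taken" for an out-of-range x, the core
     * may transiently load array1[x] (potentially a secret) and then touch a
     * cache line of array2 indexed by that value. The architectural state is
     * rolled back, but the cache footprint -- the global side effect -- is not,
     * and can be recovered afterwards with a timing probe. */
    extern unsigned int array1_size;
    extern unsigned char array1[];
    extern unsigned char array2[256 * 512];
    extern unsigned char temp;

    void victim(unsigned long x)
    {
        if (x < array1_size) {                 /* the bounds check being bypassed */
            temp &= array2[array1[x] * 512];   /* encodes array1[x] into the cache */
        }
    }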
It is true that we currently cannot hide the microarchitectural effects of unsuccessful speculation, as exposed by transient execution attacks like Spectre and cousins.
So what?
Most workloads and customers care much more about performance than security. We can trivially remove such attacks by switching off all speculation (branch prediction, caches, OOO, prefetching). Indeed you can buy CPUs like that, and they are used in environments where safety is of extreme importance. The cost of this is an orders-of-magnitude loss of performance.
Most workloads (e.g. Netflix streaming, Snapchat filters, online advertising, protein folding, computer games, Instagram, chat apps) are simply not security sensitive enough to care.
Building a competitive general-purpose CPU costs a lot (probably > $1 billion end-to-end), and who would buy a CPU that is safe against Spectre but three orders of magnitude slower than the competition? (Not to mention that there are many more dangerous vulnerabilities, from Rowhammer to Intel's Management Engine, to rubber hoses ...)
The superficial VLIW-like encoding is a bit of a red herring. The instruction bundles that resemble VLIW aren't really semantically significant after instruction decoding. The grouping by stop bits is the Itanium-specific feature that affects scheduling and dependencies more.
It is not possible, in June 2021, to build a compiler that can schedule general workloads as well as an out-of-order scheduler can at run-time in a modern CPU. Note that the scheduling we are talking about is mostly about data location: in order to hide data-movement latencies, you need to know whether the data you need is in L1, L2, L3, or main memory. In general workloads this is very much data-dependent. How is a static, or even JIT, compiler supposed to know with enough precision where data is located at run-time?
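A concrete (toy) example of what "very much data-dependent" means in practice:

    #include <stddef.h>

    struct node { struct node *next; long val; };

    /* Pointer chasing: every load address comes out of the previous load, and
     * whether each node is sitting in L1, L3 or DRAM depends on how the list
     * was allocated and what ran before us. There is nothing a static schedule
     * can pin that latency to. */
    long sum_list(const struct node *p)
    {
        long s = 0;
        for (; p != NULL; p = p->next)
            s += p->val;
        return s;
    }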
An additional problem is that compiler scheduling needs bits for encoding scheduling information, which does not come for free: those bits cannot be used for other purposes (e.g. larger immediate arguments).
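For a sense of scale (this is from memory, so treat the exact split as approximate): an IA-64 bundle spends a chunk of its 128 bits on grouping and template information rather than on the instructions themselves.

    /* Rough IA-64 bundle layout, reconstructed from memory -- illustrative only:
     *   128-bit bundle = 5-bit template + 3 x 41-bit instruction slots.
     * The template encodes which execution-unit types the slots target and
     * where the stops fall, i.e. scheduling metadata; each slot also carries a
     * qualifying-predicate field. All of that is encoding space a conventional
     * ISA could have spent on, say, wider immediates. */
    enum ia64_bundle_layout {
        IA64_BUNDLE_BITS   = 128,
        IA64_TEMPLATE_BITS = 5,
        IA64_SLOT_BITS     = 41,
        IA64_SLOTS         = 3    /* 5 + 3 * 41 == 128 */
    };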
For more specific workloads where data movement is more predictable, i.e. not itself data-dependent, the situation is different, whence the rise of GPUs, TPUs, signal processing cores etc.
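E.g. a streaming kernel like this (illustrative) is the friendly case: every address is a simple function of the loop counter, so the data movement is known before the loop even starts, and static scheduling or software pipelining works well.

    /* All addresses are affine in i and nothing is data-dependent, so a
     * compiler (or a GPU/DSP toolchain) can prefetch, software-pipeline and
     * schedule this statically and get close to peak. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }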
Spectre and Meltdown are orthogonal. They come from speculation, e.g. branch prediction. Compile-time branch prediction is similarly infeasible for general workloads.
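Toy example of why: the direction of the branch below is a property of the input data, so a compile-time hint (or last week's profile) has to pick one answer for all runs, while a dynamic predictor adapts to the data actually flowing through.

    /* Mostly taken, mostly not taken, or alternating -- it depends entirely on
     * the contents of data[], which the compiler never sees. */
    long count_above(const int *data, int n, int threshold)
    {
        long count = 0;
        for (int i = 0; i < n; i++)
            if (data[i] > threshold)   /* data-dependent branch */
                count++;
        return count;
    }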
It is impossible to schedule any reasonable program in advance because you don't know the latency of any given memory load. (Also, you'd need a lot of register names in the ISA - modern OoO CPUs have hundreds of physical registers backing ~16 register names.)
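A toy sketch of what that renaming buys (hypothetical code, nothing like a real implementation): the core maps the ~16 architectural names onto hundreds of physical registers on the fly, so repeated writes to the same architectural register stop being a hazard -- something a static scheduler could only avoid by having more architectural names to spend.

    #include <stdio.h>

    /* Toy register renamer, illustrative only: each write to an architectural
     * register gets a fresh physical register, so a later write to the same
     * name no longer has to wait for earlier in-flight readers or writers
     * (WAR/WAW hazards disappear). Real cores add free lists, checkpoints and
     * reclamation; this only shows the mapping idea. */
    #define ARCH_REGS 16
    #define PHYS_REGS 256

    static int rename_map[ARCH_REGS];   /* architectural reg -> current phys reg */
    static int next_phys = ARCH_REGS;   /* naive allocator, never reclaims */

    static int rename_read(int arch)    /* sources read the current mapping */
    {
        return rename_map[arch];
    }

    static int rename_write(int arch)   /* destinations get a fresh phys reg */
    {
        rename_map[arch] = next_phys++ % PHYS_REGS;
        return rename_map[arch];
    }

    int main(void)
    {
        for (int r = 0; r < ARCH_REGS; r++)
            rename_map[r] = r;

        /* "add r3, r1, r2" then an unrelated "mov r3, r4": both write r3, but
         * they land in different physical registers, so the mov need not wait. */
        int s1 = rename_read(1), s2 = rename_read(2), d1 = rename_write(3);
        printf("add: p%d = p%d + p%d\n", d1, s1, s2);

        int s3 = rename_read(4), d2 = rename_write(3);
        printf("mov: p%d = p%d\n", d2, s3);
        return 0;
    }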
Besides that, if you think x86 is bad you should try using some supposedly clean architectures like MIPS or Alpha. They may be RISC but they are definitely too RISC.
I'm no CPU designer, but I seem to remember that Intel took the crown back from AMD with the Core 2 Duo. IIRC, that involved going back to an older design and going down a different path.
If Itanium was just ahead of its time, Intel could always try again. ARM and RISC-V are getting a lot of attention these days. Mill seems to be missing its window of opportunity. 2-nm fab seems like a fantasy. Announcing a new Gadolinium processor family wouldn't be completely cringeworthy.