> After publication, a spokeswoman for Microsoft got back to us with some extra details. "E2 is currently a research project, and there are currently no plans to productize it," she said.
> "E2 has been a research project where we did a bunch of engineering to understand whether this type of architecture could actually run a real stack, and we have wound down the Qualcomm partnership since the research questions have been answered."
> As for the missing webpage, she added: "Given much of the research work has wound down, we decided to take down the web page to minimize assumptions that this research would be in conflict with our existing silicon partners.
> "We expect to be able to incorporate learnings from the work into our ongoing research."
However, it's likely to cause panic at Intel over the long term. It may prompt Intel to move into Microsoft's territory, or to partner more closely with competing OS vendors (Chrome OS?)
> All modern, advanced compilers convert source code through various stages and representations into an internal data-flow representation, usually a variant of SSA. The compiler backend converts that back into an imperative representation, i.e. machine code. That entails many complicated transforms, e.g. register allocation, instruction selection, instruction scheduling and so on. Lots of heuristics are used to tame their NP-complete nature. That implies missing some optimization opportunities, of course.
> OTOH a modern CPU uses superscalar, out-of-order execution. So the first thing it has to do is perform data-flow analysis on the machine code to turn it back into an (implicit) data-flow representation! Otherwise the CPU cannot analyze the dependencies between instructions.
> Sounds wasteful? Oh, yes, it is. Mainly due to the impedance loss between the various stages, representations and abstractions.
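To make the round trip concrete, here's a toy example of my own (nothing from the article, and deliberately simplified): the same three-statement computation seen at the levels described above.

```c
#include <stdio.h>

int main(void) {
    int a = 3, b = 4;

    /* What the compiler's middle end sees (SSA-ish, dependencies explicit):
     *   t1 = a * a
     *   t2 = b * b
     *   t3 = t1 + t2    ; t1 and t2 are independent, t3 needs both
     */
    int t1 = a * a;
    int t2 = b * b;
    int t3 = t1 + t2;

    /* The backend flattens this graph into a linear, register-allocated
     * instruction stream and throws the explicit dependencies away. A
     * superscalar/OoO front end then has to re-derive, from register names
     * alone, that the two multiplies can issue in parallel while the add
     * must wait for both -- i.e. it rebuilds the data-flow graph in hardware. */
    printf("%d\n", t3);
    return 0;
}
```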
The trouble was that, because the machine code was quite specific to the internal structure of the CPU, every time they released a new major revision of the CPU all the existing executables had to be recompiled. All of their customers needed to get an entirely new OS and new third-party software, and they had to recompile all their own code, everything. This proved too much of a burden for many, and the popularity of the architecture suffered.
The other problem with these kinds of architectures is that they're quite inefficient at encoding code that lacks inherent parallelism. The instruction words are long, and most of the slots have to be NOPs whenever most of the execution units are idle, which is a lot of the time. This code bloat in turn wastes memory bandwidth and instruction cache space, making them overall less efficient at using precious cache space than architectures with more compact instructions.
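A rough back-of-the-envelope sketch of that bloat, using made-up numbers (a hypothetical 4-slot bundle, 4 bytes per slot, and a serial dependency chain that only ever fills one slot per bundle):

```c
#include <stdio.h>

/* Hypothetical VLIW encoding: one fixed slot per execution unit per bundle. */
enum { SLOTS_PER_BUNDLE = 4, BYTES_PER_SLOT = 4 };

int main(void) {
    /* Low-ILP code (e.g. pointer chasing): only one useful op per bundle,
     * the remaining slots are padded with NOPs.                            */
    int useful_ops    = 100;
    int vliw_bytes    = useful_ops * SLOTS_PER_BUNDLE * BYTES_PER_SLOT;
    int compact_bytes = useful_ops * BYTES_PER_SLOT;  /* e.g. a fixed 4-byte RISC encoding */

    printf("VLIW: %d bytes vs. compact ISA: %d bytes (%.1fx more I-cache used)\n",
           vliw_bytes, compact_bytes, (double)vliw_bytes / compact_bytes);
    return 0;
}
```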
Recompiling and increased code size would've been fine if those were the only concerns.
The big problem is that so far, in general-purpose workloads (the type with heavy control flow and gnarly memory access), VLIW has never been able to match OoO at extracting enough ILP.
And recompiling seems relatively easy for servers, where everything is in a controlled environment?
I contest "now". It seems we ran into security issues at the latest with Sandy Bridge (lazy FP), and many even earlier.
> the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write
- Donald Knuth, source: http://www.informit.com/articles/article.aspx?p=1193856
But I was more just talking about things like branch predictors. You can be informed by runtime data without being a full-on JIT, which is the world very nearly every modern CPU lives in these days.
There's a good article here spoiled by the bitrotted formatting: http://archive.arstechnica.com/reviews/1q00/dynamo/dynamo-1.... It looks better if you use Firefox Reader view.
Today's CPUs contain a kind of JIT compiler implemented in hardware. There is no reason why this JIT compiler couldn't be implemented in software.
A lot of business software already runs in some kind of VM with a JIT compiler. These JIT compilers could just as easily emit the hardware-dependent low-level code instead of, e.g., x64 assembly, so a new hardware revision would only require an update to the runtime's JIT compiler (see the rough sketch below).
I think IBM has been doing something similar for decades with their mainframes. "Native" executables are "only" some bytecode, so you never need to recompile your applications when you upgrade the hardware.
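Something like this, very hand-waved (the revision names and the "detection" are made up, and a real JIT would emit native code into an executable buffer rather than print strings): the application ships as portable bytecode, only the runtime's backend knows the machine encoding, and a new CPU revision means swapping the backend, not recompiling applications.

```c
#include <stdio.h>

typedef enum { OP_ADD, OP_MUL } Op;   /* the portable "bytecode" */

/* One backend per hardware revision; these are stand-ins for real code emitters. */
static void emit_rev1(Op op) { printf("rev1 encoding for op %d\n", op); }
static void emit_rev2(Op op) { printf("rev2 encoding for op %d\n", op); }

int main(void) {
    int hw_revision = 2;  /* pretend the runtime detected this at startup */
    void (*emit)(Op) = (hw_revision == 2) ? emit_rev2 : emit_rev1;

    Op program[] = { OP_ADD, OP_MUL, OP_ADD };   /* shipped unchanged across CPUs */
    for (unsigned i = 0; i < sizeof program / sizeof *program; i++)
        emit(program[i]);
    return 0;
}
```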
So long as we value software freedom we can take steps to defend software freedom and continue to see practical gains. But proprietary software leads us to a dead end, waiting for a willing proprietor to do something for us (and thus producing a result we cannot trust).
I’m not sure what the solution is, but a recent HN post caught my attention — it was about how quickly Germany caught up to England’s industrialization in the 19th century when books weren’t copyrightable. This may provide a clue. 
Try to get source code for Apple, Amazon or Google products...
Granted they don't make it very easy to run, but it can be done with a HELL of a lot of effort.
This seems to be a fairly close descendant of the older dataflow designs that also inspired out-of-order execution. The problem with dataflow processors, as I understand it, was that since they weren't even pretending to execute instructions in order, they didn't provide the precise exceptions you need for memory protection, multiplexing across threads, etc. Is there a standard EDGE solution to that?
HP was the only one who could sell them and Oracle put restrictions on how they could be used.
When you have an ultra-niche product that also happens to require a lot of groundwork, and which is utterly incompatible with everything else on the market, you aren't likely to succeed.
If anything, given the handicap that Itanium was playing with, it can be considered an astonishing success.
I believe if that wasn't the case, we would be using Itanium laptops nowadays.
The first iteration (I have one in the garage) was ~486-level fast, and, sure, the final iterations were fast, but that took an AWFUL lot of silicon. The perf/transistor is terrible, even worse than x86 (which is bad).
That you can't run object code for one ISA on a totally different architecture is kind of the default state, and you solve it by recompiling your code. It's why we have portable programming languages. There's no need to rewrite your software except for some assembly routines in the OS and support libraries.
The Wintel dominance really fucked up the industry, badly.
Obviously everyone is entitled to their opinion, but I don't find your argument convincing. ARM had a much better story even with the early Windows CE portables/laptops, and they were a dismal failure -- just like the later Surface RT.
ARM had a power/battery-life advantage for laptops and support from Microsoft, and that didn't help; switching to Itanium wouldn't even have had that. If Itanium had found any general market success, it would only have been because of Intel's dominance. Luckily for us, that didn't happen.
ARM has just as crappy a story, just from the opposite direction: they had the power consumption but not the performance or the PC presence. They never duplicated their early PC support in the US. Palm and Windows CE handhelds did OK but got clobbered by the iPhone. ARM never had a big launch with multiple PC OS vendors, and it also didn't have motherboards available in a PC configuration for much of its lifetime. That's what made the Raspberry Pi so special: it was an actual motherboard available to people, not some dev/eval board. Now we see ARM showing up in the server and PC markets because the performance is there, smartphones have paid for the research, and ARM is customizable and multi-vendor.
If we had been lucky, IBM would have chosen the 68000.
Palm and Windows CE had been around since 1998, maybe even earlier; the iPhone came out in 2007. There were multiple ARM PDAs at the time (Compaq's iPAQ, HP's Jornada, Palm's Pilot, many Windows CE devices).
> If we had been lucky, IBM would have chosen the 68000.
Totally agree. The 68000 ISA is so elegant compared to the mess that is x86.
Palm and CE date from 1996. Like I said, they did OK, but they never sold in the volumes that the iPhone and later Android achieved.
I dearly miss the 68000, and I seriously sometimes wish the whole PowerPC thing hadn't happened in favor of a continued 68000 line. If x86 proved anything, it's that CISC vs. RISC wasn't exactly clear-cut. The 88000 wasn't that bad, except for a price that makes me think they were doing a bit of hallucinogens. I never got to play with ColdFire.
Also, JIT runtimes: once you have a hardware-specific build of your browser (or of your server stack), you are 90% of the way there.
I chuckled at this one.
The designers claim a 10x speedup compared to existing processors. Their presentations are online on YouTube; especially the security-related ones are amazing, since they throw away a lot of the current design in x86/ARM CPUs.
Most likely they'll mitigate the waiting problem by hyperthreading. GPUs do extreme hyperthreading to stay busy without any kind of out-of-order execution.
EDGE compilers construct "blocks", each made up of many tightly connected instructions, and send each block to an execution cluster, which dynamically schedules the instructions within the block as it sees fit.
Communication of dependencies across blocks is more expensive. Hopefully there is enough parallelism across blocks that you can execute multiple blocks simultaneously, in addition to ILP within the block itself.
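As I understand the model (a toy sketch of my own, not the actual TRIPS/E2 microarchitecture): within a block, each instruction names its consumers directly, and it fires as soon as all of its operands have arrived, in whatever order that happens; only block inputs and outputs go through the more expensive cross-block path.

```c
#include <stdio.h>

#define MAX_TARGETS 2

typedef struct {
    const char *name;
    int needed;                 /* operands still outstanding                */
    int targets[MAX_TARGETS];   /* indices of consumer instructions in block */
    int ntargets;
} Insn;

static void deliver(Insn *block, int idx);

static void fire(Insn *block, int idx) {
    printf("fired: %s\n", block[idx].name);
    for (int t = 0; t < block[idx].ntargets; t++)      /* forward the result */
        deliver(block, block[idx].targets[t]);
}

static void deliver(Insn *block, int idx) {
    if (--block[idx].needed == 0)    /* last operand arrived: ready to issue */
        fire(block, idx);
}

int main(void) {
    /* t1 = a*a ; t2 = b*b ; t3 = t1+t2 -- the add waits for both multiplies. */
    Insn block[] = {
        { "mul a,a",   0, {2}, 1 },
        { "mul b,b",   0, {2}, 1 },
        { "add t1,t2", 2, {0}, 0 },
    };
    fire(block, 0);   /* the multiplies' operands are block inputs */
    fire(block, 1);
    return 0;
}
```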