
Microsoft ports Windows 10, Linux to homegrown “E2” CPU design - cpeterso
https://www.theregister.co.uk/2018/06/18/microsoft_e2_edge_windows_10/
======
TimTheTinker
Just in case you missed it, there's an update at the bottom to the effect that
work on E2 has wound down and there aren't any plans to turn it into a
product:

> After publication, a spokeswoman for Microsoft got back to us with some
> extra details. "E2 is currently a research project, and there are currently
> no plans to productize it," she said.

> "E2 has been a research project where we did a bunch of engineering to
> understand whether this type of architecture could actually run a real
> stack, and we have wound down the Qualcomm partnership since the research
> questions have been answered."

> As for the missing webpage, she added: "Given much of the research work has
> wound down, we decided to take down the web page to minimize assumptions
> that this research would be in conflict with our existing silicon partners.

> "We expect to be able to incorporate learnings from the work into our
> ongoing research."

~~~
ChicagoDave
They don’t want to scare the piss out of their current silicon partners. Who
knows what will come of this, but I doubt it's nothing.

~~~
chx
A sort of blackmail is what comes out of this. You can say to Intel: if you
don't give us a nice enough discount on the current order of 100,000 Xeons for
Azure, we'll go back to these plans and eventually your revenue stream will
dry up.

~~~
mankash666
While that may be true for something that's more production ready (like an AMD
or Qualcomm chip), this project seems unlikely to affect negotiations.

However, it's likely to cause long-term panic at Intel. It may prompt Intel to
move into Microsoft's territory, or to partner more closely with competing OS
vendors (Chrome?)

------
PeCaN
I like that this architecture tackles a large problem in existing
architectures, which Mike Pall put rather nicely:

> All modern and advanced compilers convert source code through various stages
> and representation into an internal data-flow representation, usually a
> variant of SSA. The compiler backend converts that back to an imperative
> representation, i.e. machine code. That entails many complicated transforms
> e.g. register allocation, instruction selection, instruction scheduling and
> so on. Lots of heuristics are used to tame their NP-complete nature. That
> implies missing some optimization opportunities, of course.

> OTOH a modern CPU uses super-scalar and out-of-order execution. So the first
> thing it has to do, is to perform data-flow analysis on the machine code to
> turn that back into an (implicit) data-flow representation! Otherwise the
> CPU cannot analyze the dependencies between instructions.

> Sounds wasteful? Oh, yes, it is. Mainly due to the impedance loss between
> the various stages, representations and abstractions.

[https://www.freelists.org/post/luajit/Ramblings-on-languages...](https://www.freelists.org/post/luajit/Ramblings-on-languages-and-architectures-was-Re-any-benefit-to-throwing-off-lua51-constraints)
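
To make Pall's point concrete, here's a minimal sketch in Python (with a
made-up three-address instruction format, not any real ISA) of the dependency
analysis an out-of-order core effectively redoes at runtime, after the
compiler has already computed and thrown away the same graph:

    # Hypothetical three-address code: (dest, op, src1, src2).
    # The compiler had this as a dataflow graph (SSA); the CPU must
    # rediscover the edges from the linear instruction stream.
    instrs = [
        ("r1", "load", "a", None),
        ("r2", "load", "b", None),
        ("r3", "add", "r1", "r2"),
        ("r4", "mul", "r3", "r1"),
    ]

    def dataflow_edges(instrs):
        """Recover producer -> consumer (read-after-write) edges."""
        last_writer = {}  # register -> index of the instruction that wrote it
        edges = []
        for i, (dest, _op, *srcs) in enumerate(instrs):
            for s in srcs:
                if s in last_writer:
                    edges.append((last_writer[s], i))
            last_writer[dest] = i
        return edges

    print(dataflow_edges(instrs))  # [(0, 2), (1, 2), (2, 3), (0, 3)]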

~~~
tinus_hn
The obvious answer is to compile directly to the instructions the CPU uses
internally. That would mean recompiling for every CPU revision though.

~~~
zik
That approach has (sort of) been tried before for superscalar architectures.
VLIW architectures were the Next Big Thing in the early 1990s. The general
idea was that the machine code explicitly told the CPU what to do with each of
its execution units. It seemed like a good idea. Intel released their i860
processor and waited for the cash to roll in.

The trouble was that, because the machine code is pretty specific to the
internal structure of the CPU, every time a new major revision of the CPU was
released all the existing executables had to be recompiled. All of their
customers needed an entirely new OS and new third-party software, and had to
recompile all their own code - everything. This proved too much of a burden
for many, and the architecture's popularity suffered.

The other problem with these kinds of architectures is that they're quite
inefficient at encoding code which lacks inherent parallelism. The
instructions are long, and most slots have to be NOPs whenever most of the
execution units are idle - which is a lot of the time. This code bloat in turn
wastes memory bandwidth and instruction cache space, making them overall less
efficient at using precious cache than architectures with more compact
instructions.
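
A toy bundle packer in Python (hypothetical 4-wide machine, invented
instruction names) shows the bloat: a serial chain of four dependent ops fills
only 4 of 16 slots, and the other 12 are NOPs that still occupy memory
bandwidth and cache:

    # Toy VLIW packer: 4 issue slots per bundle, NOP-padded.
    SLOTS = 4

    def pack(chains):
        """chains: independent instruction chains (lists of op names).
        Each cycle, at most one op per chain can issue."""
        bundles, pos = [], [0] * len(chains)
        while any(p < len(c) for p, c in zip(pos, chains)):
            bundle = []
            for i, chain in enumerate(chains):
                if pos[i] < len(chain):
                    bundle.append(chain[pos[i]])
                    pos[i] += 1
            bundle += ["nop"] * (SLOTS - len(bundle))
            bundles.append(bundle)
        return bundles

    # One serial chain of 4 ops -> 4 bundles, 12 of 16 slots wasted.
    for bundle in pack([["load", "add", "mul", "store"]]):
        print(bundle)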

~~~
ksec
That was in the 90s though. We haven't had much new "design" in CPUs for five
years. We're stuck at 4-5GHz max, IPC seems to have hit its limit, and now
we've run into security issues from too much clever hardware optimisation.

And it seems recompiling is relatively easy for servers, where everything is
in a controlled environment?

~~~
chx
> And now we've run into security issues from too much clever hardware
> optimisation.

I contest "now". It seems we ran into security issues at the latest with Sandy
Bridge (lazy FP state restore), and many even earlier.

------
Digital-Citizen
So Microsoft is highlighting the value of software freedom and the harm of
software non-freedom: we're all granted permission to port free software (as
Microsoft claims to have done with the Linux kernel, BusyBox, and more) but
only Microsoft -- the proprietor -- is allowed to port Windows.

So long as we value software freedom, we can take steps to defend it and
continue to see practical gains. But proprietary software leads us to a dead
end: waiting for a willing proprietor to do something for us (and thus provide
a result we cannot trust).

~~~
jankotek
Microsoft is pretty open compared to its competitors. It has a long history of
providing Windows source code to universities (I recall Win2000). It also
provides debug symbols for its libraries.

Try to get source code for Apple, Amazon or Google products...

~~~
scruffyherder
Like Darwin source code?

Granted they don't make it very easy to run, but it can be done with a HELL of
a lot of effort.

------
Symmetry
Here's the Wikipedia page on EDGE:

[https://en.wikipedia.org/wiki/Explicit_data_graph_execution](https://en.wikipedia.org/wiki/Explicit_data_graph_execution)

This seems to be a fairly close descendant of the older dataflow designs that
also inspired out-of-order computing. The problem with dataflow processors, as
I understand it, was that because they weren't pretending to execute
instructions in order, they didn't provide the precise exceptions you need for
memory protection, multiplexing across threads, etc. Is there a standard EDGE
solution to that?

~~~
adrianratnapala
What is interesting is that the new breed of timing attacks shows that the
pretense of ordering isn't as tight as we had imagined. So maybe we might once
again prefer processors like this, because software is going to have to deal
with that complexity anyway.

------
paulgdp
How similar is this architecture to the Mill?
[https://en.wikipedia.org/wiki/Mill_architecture](https://en.wikipedia.org/wiki/Mill_architecture)

~~~
Symmetry
It's unrelated except to the extent that both are communicating the dependency
chain to the processor through the ISA in a way other than sequential order.

------
gmueckl
Compiler hints for data dependencies between groups of instructions sound
very much like the mistake that sank the Itanium. What is different this time
around?

~~~
PeCaN
That wasn't a mistake, and Itanium did just fine. Circa 2008 the fastest
database servers in the world ran on Itanium.

~~~
pjmlp
Intel was just unlucky that AMD exists and came out with AMD64.

I believe if that wasn't the case, we would be using Itanium laptops nowadays.

~~~
scruffyherder
I don't think they would ever have made a mobile Itanium; instead we'd be
stuck with that PAE nonsense and living in a 32-bit world, because 'general
users don't need 64 bits'...

~~~
wmf
I think Intel wanted to eventually phase out x86 and push Itanium all the way
down the stack, not for technical reasons but because it was a proprietary
architecture that would never have competing implementations.

~~~
FullyFunctional
You are thinking of Intel as having a single mind. The reality, as in most
large organizations, is that the ant hill is teeming with divergent and
competing opinions. In particular, there were two powerful, competing camps:
x86 and Itanium. The combined effect of Itanium performance not living up to
promises [1] and AMD introducing a 64-bit x86 extension shifted the power
balance, but it wasn't an inevitable outcome.

[1] The first iteration (I have one in the garage) was ~ 486 level fast and,
sure, the final iterations were fast, but that took an AWFUL lot of silicon.
The perf/transistor is terrible, even worse than x86 (which is bad).

------
ultimoo
> E2 uses an instruction set architecture known as explicit data graph
> execution, aka EDGE which isn't to be confused with Microsoft's Edge
> browser.

I chuckled at this one.

~~~
toasterlovin
Let us give thanks that Ballmer is no longer around. He woulda made sure it
was named Windows.

~~~
cptskippy
Or .NET

------
zaarn
People interested in VLIW (or EDGE, which sounds like VLIW but less strict)
should also check out the Mill architecture.

The designers claim a 10x speedup compared to existing processors. Their
presentations are online on YouTube; especially the security-related ones are
amazing, since they throw away a lot of the current design in x86/ARM CPUs.

~~~
ahartmetz
They actually claim a 10x power/performance advantage; what I gather is that
it's supposed to be higher performance at much lower power. It isn't
out-of-order, so it has to wait a long time on last-level cache misses. But it
has insane throughput when it doesn't have to wait for memory.

Most likely they'll mitigate the waiting problem by hyperthreading. GPUs do
extreme hyperthreading to stay busy without any kind of out-of-order
execution.

~~~
deepnotderp
To be fair, they've addressed memory access issues and have proposed
"deferred loads" (which work well:
[http://people.duke.edu/~bcl15/documents/huang2016-nisc.pdf](http://people.duke.edu/~bcl15/documents/huang2016-nisc.pdf))
as well as other, thus far unnamed, techniques for mitigating it.

------
bch
I’m not a hardware expert, so I’m not sure how interesting this is here, but
along the lines of MS architectures and FOSS operating systems, see also their
work with eMIPS and NetBSD[0]

[0]
[https://blog.netbsd.org/tnf/entry/support_for_microsoft_emip...](https://blog.netbsd.org/tnf/entry/support_for_microsoft_emips_extensible)

------
andreiw
On a related note, another interesting CPU design -
[http://multiclet.com/index.php/en/support/general-technical-...](http://multiclet.com/index.php/en/support/general-technical-information)

------
solomatov
That's really good news. We need more progress in CPU design.

------
fulafel
This is written in a baby-talk style, but scroll to the end: there are links
to research publications.

------
solomatov
This system seems to be the same as VLIW:
[https://en.wikipedia.org/wiki/Very_long_instruction_word](https://en.wikipedia.org/wiki/Very_long_instruction_word)

~~~
_chris_
VLIW compilers construct a "bundle" of instructions which are to be executed
simultaneously, and thus must be completely independent of each other. The
VLIW compiler must further space bundles apart so that any use-bubbles are
respected (e.g., a load's result won't be ready for 3 cycles).

EDGE compilers construct "blocks", each made up of many instructions which are
tightly connected, and send the block to an execution cluster which will
dynamically schedule each instruction within the block as it sees fit.

Communication of dependencies across blocks is more expensive. Hopefully there
is enough parallelism across blocks that you can execute multiple blocks
simultaneously, in addition to ILP within the block itself.
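
A rough way to picture the EDGE side is this Python sketch (invented block
encoding for illustration, not the actual E2 ISA): instructions in a block
name their consumers rather than registers, and each one fires as soon as its
operands arrive, in whatever order the hardware finds convenient:

    # Toy EDGE-style block: each instruction names the instructions
    # that receive its result, instead of naming registers.
    block = {
        0: ("const", 2, [2]),   # produce 2, send it to instruction 2
        1: ("const", 3, [2]),   # produce 3, send it to instruction 2
        2: ("add", None, [3]),  # fires once both operands have arrived
        3: ("print", None, []),
    }

    operands = {i: [] for i in block}

    def fire(i, ready):
        op, imm, targets = block[i]
        if op == "const":
            result = imm
        elif op == "add":
            result = operands[i][0] + operands[i][1]
        elif op == "print":
            print(operands[i][0])
            result = None
        for t in targets:
            operands[t].append(result)
            need = 2 if block[t][0] == "add" else 1  # add is binary
            if len(operands[t]) == need:
                ready.append(t)

    ready = [0, 1]  # the consts have no inputs, so they fire immediately
    while ready:
        fire(ready.pop(), ready)  # firing order follows data, not program text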

------
dmitrygr
Sounds a lot like...the Itanic

~~~
legulere
One big problem with the Itanic was that compilers weren't (yet) smart enough
to produce good code for it. That doesn't seem like as much of a problem here,
since this seems pretty close to the intermediate representation compilers
keep internally anyway.

