
Intel’s Exascale Dataflow Engine Drops X86 and Von Neumann - ssvss
https://www.nextplatform.com/2018/08/30/intels-exascale-dataflow-engine-drops-x86-and-von-neuman/amp/?__twitter_impression=true
======
shriver
> a new architecture that could in one fell swoop kill off the general purpose
> processor as a concept and the X86 instruction set as the foundation of
> modern computing.

Do you want me to think you're a credulous idiot? Because this is how you
achieve that.

Okay, so laying aside the bizarrely stereotypical tech journalism: from what I
understand, there are a number of problems with this that need addressing.

If you create a custom compute unit layout for a specific dataflow graph, it's
very difficult to identify which layout is most efficient, and when you then
want to optimize for higher performance it's almost impossible, because you
don't know what you're targeting. Your optimization may push the design to a
completely different layout, and all the cost functions are impossible to know
up front. You end up with too many free variables to optimize over. We're very
good at taking a fixed design like a CPU and then taking a program and jamming
it into that paradigm; the reverse is much harder.
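
To make the search problem concrete, here's a toy sketch in Python (made-up
graph and grid, nothing to do with Intel's actual tooling): even for a
four-node graph on a 2x2 grid you're brute-forcing every placement, and the
"best" layout can flip completely if you tweak one edge.

    # Toy placement search: map a tiny dataflow graph onto a 2x2 grid of
    # compute units, scoring each placement by total Manhattan wire length.
    # The space is n! placements, and the winner is fragile to graph edits.
    from itertools import permutations

    nodes = ["load", "mul", "add", "store"]                    # made-up graph
    edges = [("load", "mul"), ("mul", "add"), ("add", "store")]
    slots = [(0, 0), (0, 1), (1, 0), (1, 1)]                   # 2x2 unit grid

    def wire_cost(placement):
        pos = dict(zip(nodes, placement))
        return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
                   for a, b in edges)

    best = min(permutations(slots), key=wire_cost)
    print(dict(zip(nodes, best)), "cost:", wire_cost(best))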

The second problem is that either you need one architecture that will
dynamically reconfigure to different graphs, or you need lots of
architectures. They seem to be going for the 'spin 100 designs' path - so
firstly, how is a customer meant to know which of those designs to actually
buy, and what happens if their design evolves from one of them into another?
Secondly, how is this cost effective? There's a good reason why Intel only
spins a handful of designs per CPU generation.

The third problem is that if you have a custom compute unit layout and your
program doesn't fit it well, it's not like a CPU: you can't re-order
operations to maximally use the units. The bits that aren't useful are just
dead silicon - and from history it seems like the killer is that the dead
silicon tends to be a LOT of silicon for any given program.
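
As a back-of-the-envelope illustration (the unit mix and counts below are
invented, not from any real design):

    # Invented numbers: a fixed unit mix vs. what one particular program uses.
    # Whatever the program doesn't touch is dead silicon for that workload.
    provisioned = {"fp_mul": 64, "fp_add": 64, "int_alu": 32, "sram_kb": 4096}
    used        = {"fp_mul": 12, "fp_add": 20, "int_alu": 32, "sram_kb": 1024}

    for unit in provisioned:
        print(f"{unit}: {1 - used[unit] / provisioned[unit]:.0%} idle")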

To be honest, this is a very well understood problem, and there are good
reasons why it hasn't worked so far, and this article doesn't really give us
any information on why it would work this time.

~~~
saas_co_de
> it's very difficult to identify which layout is most efficient

Don't worry. The compiler will figure it out. And this time the compiler will
have AI™.

> the killer is that dead silicon tends to be a LOT of silicon for any given
> program

Part of the idea here is that the cost dynamic has changed. Silicon is cheap
compared to power, so even if you have lots of chips not being used at any
given time, as long as they can be fully powered off the total system cost
(capex + opex) is still better.
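
Rough sketch with made-up numbers, just to show the shape of that trade-off
(not a real cost model):

    # Made-up numbers: a bigger die that spends most of its area power-gated
    # can still win on total cost of ownership if energy dominates silicon.
    def tco(chip_cost_usd, avg_watts, years=4, usd_per_kwh=0.10, pue=1.5):
        kwh = avg_watts / 1000 * 24 * 365 * years
        return chip_cost_usd + kwh * usd_per_kwh * pue

    print(tco(chip_cost_usd=2000, avg_watts=300))  # general-purpose, always busy
    print(tco(chip_cost_usd=2600, avg_watts=100))  # more silicon, mostly gated off

Which side wins obviously depends on what you plug in; the point is just that
power dominates over the lifetime of a deployment.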

> why it would work this time

The difference is scale. If you are running millions of CPUs and adding
hundreds of thousands more per month, then something like this could work,
assuming the AI™ magic that figures out which new chips to build.

Intel is talking up their grandiose vision, but practically this is the same
as AMD's chiplets on active interposers
(https://spectrum.ieee.org/tech-talk/semiconductors/design/amd-tackles-coming-chiplet-revolution-with-new-chip-network-scheme).

The physical technology is real and probably coming soon, but until the magic
compilers arrive it will be limited to incremental improvements to the
existing CPU/GPU compute architecture from increased bandwidth and decreased
latency.

~~~
shriver
If I'm understanding you right, what you're suggesting is that the individual
silicon modules will be fixed, but they will be connected in a single package
in lots of different ways.

If that's correct, I'd love to see the cost of making the trip between the
modules; I've got to imagine that cost is just huge. The interconnect is also
crazy - it's easy to do complex routing at high performance within a single
die, but I don't know how you achieve that in a scalable fashion between dies.
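
For a sense of scale, here's a toy comparison with assumed orders of magnitude
(not measured figures) for energy per bit moved:

    # Assumed orders of magnitude, not measured data: the gap between on-die
    # wires and off-die links is exactly the cost I'm worried about here.
    energy_pj_per_bit = {"on_die": 0.1, "interposer": 1.0, "off_package": 10.0}
    link_gbps = 512  # assumed link rate

    for hop, pj in energy_pj_per_bit.items():
        watts = pj * 1e-12 * link_gbps * 1e9
        print(f"{hop}: ~{watts:.2f} W just to move {link_gbps} Gb/s")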

