
In-Depth View of Wave Computing’s DPU Architecture, Systems - rbanffy
https://www.nextplatform.com/2017/08/23/first-depth-view-wave-computings-dpu-architecture-systems/
======
deepnotderp
Fairly certain a true dataflow machine doesn't have half as much
instruction RAM as it has data RAM.

Also putting all the "tensors" (I assume this means activations and weights?)
into the external memory is downright dumb.

Only having 16MB of memory on die is also probably not a wise decision.

The use of GALS is certainly interesting though.

~~~
feelix
Are you qualified to make these criticisms?

~~~
deepnotderp
Absolutely not, I'm very biased (see my profile) :)

I do think the criticisms are valid regardless though. Data movement is far
more expensive than computation, so having so little on-die memory is almost
definitely a bad decision.
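To put rough numbers on that claim: these are approximate 45nm energy figures often cited from Horowitz's ISSCC 2014 keynote, so treat the exact values as illustrative assumptions, not measurements of this chip.

```python
# Back-of-envelope energy comparison: off-chip DRAM access vs. on-die
# work. Rough 45nm figures (Horowitz, ISSCC 2014); exact values vary
# by process node and design, so these are illustrative only.
DRAM_ACCESS_PJ = 1300.0  # ~1.3 nJ per 64-bit off-chip DRAM access
FP_ADD_PJ = 0.9          # ~0.9 pJ per 32-bit floating-point add
SRAM_PJ = 10.0           # ~10 pJ per access to a small on-die SRAM

print(f"DRAM access ~{DRAM_ACCESS_PJ / FP_ADD_PJ:.0f}x a float add")
print(f"DRAM access ~{DRAM_ACCESS_PJ / SRAM_PJ:.0f}x an on-die SRAM access")
```

By this arithmetic a single off-chip access costs on the order of a thousand floating-point adds, which is why spilling tensors to external memory is so expensive.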

------
zackmorris
Does anyone know how general-purpose the cores are on this and other parallel
architectures? As I recall, the first video cards couldn't do things like
branching, so it wasn't possible to program loops or flow control. I'm
concerned that if these new processors can only run niche frameworks like
OpenCL or CUDA that it will create friction slowing the adoption of concurrent
programming in languages like MATLAB/Octave/Erlang/Elixir/Julia/Go etc.

There seems to be an opening in the market for truly general purpose
processors having say > 16 or 32 cores. So it would be nice to have a curated
list somewhere of multipurpose chips.

~~~
deepnotderp
Processors like these do not even run OpenCL or CUDA; they execute
computational graphs devoid of conventional control flow.
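To illustrate what "executing a graph with no control flow" means, here's a toy sketch: nodes fire as soon as all their operands have arrived, with no program counter or branches. The graph, node names, and firing rule are invented for illustration, not Wave's actual execution model.

```python
from collections import deque

# Toy dataflow executor: a node "fires" once all of its operand slots
# are filled; there is no program counter and no branching.
def run_dataflow(graph, tokens):
    # graph: {node: (fn, n_inputs, [successor nodes])}
    # tokens: list of (node, value) initial operand tokens
    slots = {n: [] for n in graph}
    results = {}
    ready = deque(tokens)
    while ready:
        node, value = ready.popleft()
        slots[node].append(value)
        fn, n_inputs, succs = graph[node]
        if len(slots[node]) == n_inputs:
            out = fn(*slots[node])
            results[node] = out
            for s in succs:  # propagate the output token downstream
                ready.append((s, out))
    return results

# y = (a + b) * (a + b) as a two-node graph; "add" fans its output
# out into both operand slots of "mul".
graph = {
    "add": (lambda x, y: x + y, 2, ["mul", "mul"]),
    "mul": (lambda x, y: x * y, 2, []),
}
print(run_dataflow(graph, [("add", 3), ("add", 4)])["mul"])  # 49
```

Note there is no `if` in the "program" itself: the schedule falls out of token arrival order, which is why data-dependent control flow maps so poorly onto this style of machine.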

The Adapteva Epiphany or the Rex Neo is probably closer to what you're looking
for. However, once you ask for anything that handles control flow well, you
pretty much end up back at the general-purpose CPU, which is highly
inefficient for a reason. For example, even your GPU today will become very
inefficient with control flow due to branch divergence.
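A toy model of why divergence hurts: a SIMT warp executes in lockstep, so when lanes disagree on a branch, the warp runs both sides serially with inactive lanes masked off. The cycle counts below are made up for illustration, not real GPU numbers.

```python
# Toy SIMT branch-divergence model: if any lane takes the "then" path
# and any lane takes the "else" path, the warp pays for both serially.
def warp_cycles(predicates, then_cost, else_cost):
    """Cycles for one warp to execute if/else given per-lane predicates."""
    takes_then = any(predicates)
    takes_else = not all(predicates)
    return then_cost * takes_then + else_cost * takes_else

uniform = warp_cycles([True] * 32, then_cost=10, else_cost=10)
divergent = warp_cycles([i % 2 == 0 for i in range(32)], 10, 10)
print(uniform, divergent)  # 10 20 -- divergence doubles the cost here
```

With a two-way branch the worst case is 2x; nested data-dependent branches compound this, which is the commenter's point about control flow on wide machines.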

------
FullyFunctional
The article is [only] a write-up of the Hot Chips presentation (I was there).
Without external validation of the claims it's merely pretty pictures.
_Iff_ their claims hold true, then there are some interesting elements here.

This is far from the only data-flow architecture out there (my favorite is the
TRIPS instance of EDGE), but so far none have succeeded in the market.

The self-timed part is using NULL Convention Logic (NCL). Wave has dubbed
their implementation of this WTL (see
[https://news.ycombinator.com/item?id=11469749](https://news.ycombinator.com/item?id=11469749)).
There are some very interesting questions about exactly how they use this, but
it sounds like the synchronization between compute nodes uses NCL, while the
compute nodes themselves are conventional synchronous logic, clocked by the
completion signal from NCL. That would certainly be an interesting hybrid, but
the 6.7 GHz result begs a lot of questions.

------
k_lander
Is this similar to what graphcore.ai is attempting?

