
WTL (Wave Threshold Logic) Primer - throwaway000002
https://docs.google.com/viewer?url=http://nebula.wsimg.com/282bb4ac56b1e17a5b8ff819f03ade5a%3FAccessKeyId%3D63CF6890887317CC859D%26disposition%3D0
======
thesz
This is good old asynchronous logic with a separate no-data state, renamed
(probably for patent reasons).

~~~
throwaway000002
I did post about them earlier. [1]

I think there may be some novelty to the design of their "threshold gate" at
the silicon level to avoid metastability issues and thus enable their design
tool flow.

They seem to have since pivoted, though, to something akin to neuromorphic
computing popularized by the large asynchronous circuitry in IBM's TrueNorth
chip.

The work towards making deep learning power-efficient will be important, and I
think that's where they're heading.

There are area tradeoffs, but if you get an efficiency gain, you can take
advantage of "dark silicon".

Asynchronous design's time may indeed be coming. [2]

[1]
[https://news.ycombinator.com/item?id=10625233](https://news.ycombinator.com/item?id=10625233)

[2]
[https://news.ycombinator.com/item?id=11425533](https://news.ycombinator.com/item?id=11425533)

~~~
thesz
Having a no-data symbol in the alphabet is a sure-fire way to get very stable
circuitry. I am rusty on the details (I did this kind of development a long
time ago and have not returned to it since), but for the cost of O(width)
AND-NOT gates, two Muller C-elements, and glitch-free synthesis you can get
async logic without metastability issues, just by design.
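
To make the "no-data symbol" idea concrete, here is a minimal behavioural sketch in Python. It assumes a dual-rail encoding (one common way to add a no-data state, not necessarily what the primer uses), and the names `encode`, `is_complete`, and `CElement` are made up for illustration.

```python
# Dual-rail code words as (rail0, rail1). (1, 1) is illegal and never produced.
NULL = (0, 0)   # no data present yet
ZERO = (1, 0)   # logical 0
ONE = (0, 1)    # logical 1


def encode(bits):
    """Encode a list of 0/1 ints as dual-rail pairs."""
    return [ONE if b else ZERO for b in bits]


def is_null(word):
    """True when every bit position is still in the no-data state."""
    return all(pair == NULL for pair in word)


def is_complete(word):
    """True when every bit position carries data (exactly one rail high)."""
    return all(r0 ^ r1 for r0, r1 in word)


class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out
```

A receiver only acts when `is_complete` is true and only accepts the next word after seeing `is_null` again, so a half-arrived word such as `[ONE, NULL]` is simply "not ready" rather than a wrong value. The C-element is what lets the handshake wait for both sides without a clock.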

There is a Russian chip named NeuroMatrix:
[https://en.wikipedia.org/wiki/NeuroMatrix](https://en.wikipedia.org/wiki/NeuroMatrix)

Its main accelerator exploits the idea that deep neural networks need less
precision in layers closer to the output. So it uses a variable-width,
variable-count multiply-add circuit within a single 64x64 multiply-add
circuit. At the start there are, say, only two multiplications done, 32x64,
but in the last layer you can fire up to 64 multiplications.

Were this circuit built with async logic, it would be even faster, because in
clocked logic they have to limit logic depth to meet the clock frequency for
the worst case (1x64x64) and can't gain anything except multiplication count
in the 64x1x1 case. In an async setting, the 64x1x1 case would be blazingly
fast: one-bit depth!
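
A toy model of that argument, as a sketch only. It assumes (my simplification, not a NeuroMatrix fact) that logic depth grows linearly with operand width and that count times width is always 64; the function names and delay units are invented for illustration.

```python
WORST_CASE_WIDTH = 64  # the clock period must cover the widest configuration


def clocked_throughput(count, width):
    """Results per unit time when every configuration pays the worst-case delay."""
    return count / WORST_CASE_WIDTH


def async_throughput(count, width):
    """Results per unit time when latency tracks the actual operand width."""
    return count / width


# (count, width) pairs: one wide multiply, two medium ones, 64 one-bit ones.
for count, width in [(1, 64), (2, 32), (64, 1)]:
    print(count, width, clocked_throughput(count, width), async_throughput(count, width))
```

Under these assumptions the clocked design gains only the factor of 64 from the multiplication count, while the self-timed one also gains from the shallower logic.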

(I may be wrong here and there; right now I am editing a VHDL grammar,
adapting it for ANTLR4, sorry. I hope you get the idea, though.)

I don't know how TrueNorth operates. But I believe they use the same trick or
something close to it: it's cheap and impressive.

------
finfet1
haha... self-timed domino... published like 10-15 years ago already.

This won't work, for many reasons that everyone in the industry already knows.
High fan-out back to many gates that are spatially distant means that you need
not only complex routing, but also complex sizing of the feedback inverter.
So... this won't even fit in with automatic place and route of complex cells.
And good luck doing this custom in an efficient way, such as by making it a
semi-standard cell library.

Furthermore, the feedback inverter only works in simple cases such as the
cascaded gates you show, where the switching of one gate depends on only one
fan-out. If it depends on more than one fan-out, the feedback inverter
essentially requires another logic gate in front of it, which just blew up
your entire circuit size and wiped out any power savings you were attempting.

------
FullyFunctional
This is just NULL-Convention Logic, renamed so they could trademark it. I
mentioned NCL in a comment under the micropipelines thread:
[https://news.ycombinator.com/item?id=11425533](https://news.ycombinator.com/item?id=11425533)
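
For anyone who hasn't met NCL: its basic cell is the THmn threshold gate with hysteresis. A minimal behavioural sketch in Python follows; the class name `ThresholdGate` is made up here, and this models only the logical behaviour, not any particular silicon implementation.

```python
class ThresholdGate:
    """THmn gate: output sets when at least m of n inputs are 1,
    resets only when all inputs are 0, and otherwise holds its value."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m = m
        self.n = n
        self.out = 0

    def update(self, *inputs):
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        asserted = sum(inputs)
        if asserted >= self.m:
            self.out = 1      # threshold reached: assert
        elif asserted == 0:
            self.out = 0      # all inputs back to NULL: deassert
        # anything in between: hold the previous output (hysteresis)
        return self.out
```

With m equal to n (for example TH22) this behaves like a Muller C-element. The hysteresis is the point: the output cannot drop back until the whole NULL wavefront has arrived.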

