
Asynchronous (Clockless) CPU - peter_d_sherman
https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU
======
childintime
Though fundamentally very different, in some cases clocking with a free-
running clock is an alternative. That clock has a frequency that represents
several gate-delays, and these delays in turn depend on the current core
voltage. The end result is a clock close to optimal given the voltage applied
to the core, and that even adjusts the clock within a single SMPS charging
cycle.

A RISC-V prototype achieved almost 40% power savings:
[https://people.eecs.berkeley.edu/~bora/Journals/2017/JSSC17-...](https://people.eecs.berkeley.edu/~bora/Journals/2017/JSSC17-1.pdf)
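A free-running clock of this sort can be sketched with a toy model: a ring oscillator built from the core's own gates has a period of a few gate delays, and those delays stretch as the supply voltage drops toward the threshold voltage. All constants below are illustrative assumptions, not figures from the linked paper:

```python
# Toy model of a free-running (ring-oscillator) clock whose frequency
# tracks gate delay, which in turn depends on core voltage.
# vth and k are made-up illustrative constants.

def gate_delay_ns(vdd, vth=0.4, k=0.5):
    """Crude delay model: gate delay blows up as Vdd approaches Vth."""
    return k * vdd / (vdd - vth) ** 2

def ring_clock_mhz(vdd, n_stages=11):
    """An n-stage inverter ring oscillates with period = 2 * n * gate delay."""
    period_ns = 2 * n_stages * gate_delay_ns(vdd)
    return 1000.0 / period_ns

# The clock automatically slows as the supply sags, staying near the
# fastest speed the gates can actually sustain at that voltage.
for vdd in (1.1, 0.9, 0.7):
    print(f"Vdd = {vdd:.1f} V -> clock ~ {ring_clock_mhz(vdd):.0f} MHz")
```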

------
ofrzeta
I have read a lot about async designs recently, and most of the research seems
to have dried up around 2010.

There doesn't seem to be a consensus on how much power you can actually save
with an async CPU. Clock distribution on modern CPUs (boards?) is said to
account for around 30 percent or more of overall power consumption, but on
the other hand the realized savings do not necessarily amount to that much.

From a Technology Review article on clockless chips: Intel's "clockless
prototype in 1997 ran three times faster than the conventional-chip
equivalent, on half the power." Apparently that didn't make economic sense
for Intel, because you'd have to re-create virtually an entire industry that
is built around clocked chips.

Another Intel scientist (unfortunately I can't re-find that source) later said
that the power savings of async CPUs aren't as high as claimed by their
proponents.

Interestingly, Intel Chief Scientist Narayan Srinivasa left the company to
become CTO at Eta Compute, which develops an asynchronous ARM Cortex-M3
microcontroller.

~~~
ofrzeta
Adding to my own post

> ... Apparently that didn't make economic sense for Intel, because you'd
> have to re-create virtually an entire industry that is built around clocked
> chips.

Also, you'd have to invent a sales/marketing scheme as an alternative to the
existing one that is based on increasing clock rates. GHz is to the PC what HP
(horsepower) is to the car. That will obviously come to an end at some point,
but for now at least we have core counts.

~~~
wolfgke
> Also you'd have to invent a sales/marketing scheme as an alternative to the
> existing one that is based on increasing clock rates. GHz is to the PC what
> HP (horsepower) is to the car.

The GHz race has been over for a long time. Since Intel Core (and AMD Zen, I
think; AMD Bulldozer at least had, in my opinion, a different design
philosophy), it has been all about smarter cores that do more in fewer clock
cycles. Also, since AMD Zen the core-count race has regained traction.
Finally, Intel in particular tries to promote extra-wide SIMD instructions
(AVX-512).

~~~
ofrzeta
Just look at how they are promoting the new i9 (!) with 8 cores and 5 GHz :)
[https://thenextweb.com/plugged/2018/10/08/intels-9th-gen-pro...](https://thenextweb.com/plugged/2018/10/08/intels-9th-gen-processors-bring-8-cores-and-a-5-ghz-i9-chip/)

------
xenadu02
Race conditions: now in hardware at the gate level!

A few questions

1. Could you call current SoCs asynchronous, since they not only clock
different blocks at different rates, but subsections within a block also run
at various rates?

2. Does a variable clock rate deliver many of the benefits of async without
the complexity? In other words, how much more blood is there to squeeze from
the async stone in the current world?

I doubt we'll see a competitive async chip anytime soon, but as CPUs continue
to evolve perhaps we'll see the functional blocks broken up into smaller and
smaller clock domains until it becomes difficult to tell the difference?

~~~
staticfloat
1. No, asynchronous means something different from multiple clocks. Think of
it like the difference between polling-based programming and using coroutines;
with multiple clock domains you have separate sections of your chip performing
tasks at predefined instants in time (when your clock signal rises/when your
polling loop swings around again) but with a truly asynchronous design, you
simply start processing the next chunk of work when the previous chunk is
finished (when the previous chunk of logic drives a signal high/when the
previous coroutine finishes and control flow resumes in your coroutine).
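That polling-vs-handshake distinction can be made concrete with a toy timing simulation (the per-stage delays and the clock period below are invented for illustration): a clocked pipeline rounds every stage's completion up to the next clock edge, while a self-timed chain just accumulates the actual delays.

```python
import math

# Total latency through a chain of logic stages, async vs. clocked.
# Delays are in arbitrary time units and are made up for illustration.

def latency_async(stage_delays):
    """Each stage starts the instant the previous one signals completion."""
    return sum(stage_delays)

def latency_clocked(stage_delays, clock_period):
    """Each stage's result is only sampled on the next rising clock edge."""
    t = 0.0
    for d in stage_delays:
        finished = t + d
        t = math.ceil(finished / clock_period) * clock_period
    return t

delays = [0.7, 1.2, 0.4, 0.9]
print(f"async:   {latency_async(delays):.1f}")        # 3.2: sum of real delays
print(f"clocked: {latency_clocked(delays, 1.5):.1f}")  # 6.0: edges quantize time
```

The clocked version loses both to quantization and to the fact that the period must cover the worst-case stage, which is exactly the margin an async handshake avoids.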

2. It does deliver some benefits, but not all. Truly clockless design is
desirable in some cases due to power concerns; for example, the Novelda XeThru
ultra-wideband radar SoCs are actually clockless, because clock distribution
networks can account for 20%+ of the power consumed in chips like this. (This
is what I've heard; I don't have a citation for it. The paper I cite below
similarly hand-waves, throwing around numbers from 26% all the way up to 40%
without any analysis of its own.)

I've never used a clockless CPU design before, but the theoretical advantages
are laid out quite nicely in this paper [0], which lists (among other things)
the natural ability for the CPU to sit at idle (not executing `NOP`
instructions, actually idle) when no work is available. It appears that the
AMULET3 processor (which is compared against an ARM9 core) is competitive in
power consumption, but doesn't quite stand up in performance. While still
pretty impressive for a research project, this shows that we do still have
quite a bit of work to do before these chips are ruling the world (if, indeed,
we can scale up our tools to the point that designing these isn't just an
exercise in frustration).

[0]:
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83....](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.7655&rep=rep1&type=pdf)

~~~
ofrzeta
> the natural ability for the CPU to sit at idle (not executing `NOP`
> instructions, actually idle) when no work is available.

That just makes so much sense. Just think about how much power could be saved
across all the computing devices that sit idle most of the time.

------
deepnotderp
As someone who is working on an asynchronous processor myself, I should
remind you that asynchronous is more of a design-level choice than a magic
bullet that makes everything better.

In particular, in CPUs with a big centralized register file there can be
significant overhead to having an asynchronous CPU.

There are certain architectural cases in which it can be a killer advantage,
and other cases in which it is pretty much the only way forward (e.g. in a 3D
chip it can be quite difficult to distribute a high speed, high quality/low
skew+jitter clock).

~~~
Dylan16807
> (e.g. in a 3D chip it can be quite difficult to distribute a high speed,
> high quality/low skew+jitter clock)

How thick is "3D" here, and why is that? What makes a bit of vertical distance
harder than several mm of horizontal distance?

~~~
insonifi
I believe the trouble is caused by inductive coupling from the adjacent
layers (similar to self-inductance [0]).

[0] - [https://en.wikipedia.org/wiki/Inductance#Self-inductance_of_...](https://en.wikipedia.org/wiki/Inductance#Self-inductance_of_a_wire_loop)

~~~
simcop2387
Not just inductance, but also capacitance between the layers themselves,
since you've got to have an insulator between them. All the fun problems you
get in 2D from nanometer-scale devices are now compounded in another degree
of freedom. The only thing I don't think you have to work around is quantum
tunneling between layers, because they'll probably still be too far apart for
that to matter.

------
oregontechninja
I've been studying the Caltech and SEAforth processors lately, and they've
actually inspired me to go back to school so I might one day take part in an
asynchronous design.

Does anybody recommend any readings besides papers on the above?

As far as I can tell, most engineers view asynchronous processors as arcane
equipment meant only for the most specialized tasks.

~~~
person_of_color
Where are you doing an MS in CompA?

~~~
oregontechninja
I actually don't even have a BS yet. I dropped out after doing a cost-benefit
analysis and learning I could get tech jobs without a degree. It's been a
bumpy road, to say the least, haha. I'm currently unemployed and looking for
a gig/job before I go back to finish my degree. I plan on attending OSU since
it's nearby.

------
tomxor
I hate to be the one to ask this, because a clockless CPU sounds like such a
neat idea... but wouldn't it also open up a whole new world of timing
attacks? (I'm very happy to be enlightened as to how it would not.)

~~~
Sephr
On the contrary, it should close off an entire world of power-analysis and
timing attacks.

There probably are new internal timing attacks that some asynchronous CPU
designs could expose.

------
gambler
No mention of Ivan Sutherland and his Fleet Architecture?

[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167...](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.167.7784&rep=rep1&type=pdf)

~~~
sparkie
Some more recent publications here:
[http://arc.cecs.pdx.edu/publications](http://arc.cecs.pdx.edu/publications)

------
jhallenworld
Modern synchronous techniques can already be low overhead; take a look at
"skew-tolerant" circuits, where you need not have flip-flop setup/hold delays
in the circuit path:

[http://pages.hmc.edu/harris/class/e158/01/lect21.pdf](http://pages.hmc.edu/harris/class/e158/01/lect21.pdf)

Also, big CPUs are power limited anyway. I mean "speed step" allows one core
to run fast, as long as the others are unloaded.

~~~
deepnotderp
Skew-tolerant domino needs a multi-phase clock, and domino logic, as you
know, can be quite the power hog.

------
nivertech
The future is running event-triggered lambdas using async system calls in
tickless kernels on clockless CPUs.

Q. would tickless kernels benefit from running on async CPUs?

~~~
agumonkey
Using a Landauer-capable architecture.

[https://en.wikipedia.org/wiki/Landauer%27s_principle](https://en.wikipedia.org/wiki/Landauer%27s_principle)
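For scale, Landauer's principle puts a floor on the energy needed to erase one bit; a quick back-of-the-envelope at room temperature (T = 300 K):

E_min = k_B * T * ln 2 ≈ 1.38×10⁻²³ J/K × 300 K × 0.693 ≈ 2.9×10⁻²¹ J

Today's logic spends something like femtojoules to picojoules per switching event, so there are still many orders of magnitude between current hardware and that limit.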

------
Tempest1981
What’s involved in creating simulation tools for async? Is it straightforward,
or in need of research?

~~~
ofrzeta
I can't answer your question, but here's an open-source async synthesis
system with a simulator:
[http://apt.cs.manchester.ac.uk/projects/tools/balsa/](http://apt.cs.manchester.ac.uk/projects/tools/balsa/)

The last update is from 2010, so I guess it could use some research :)

------
motiw
Over the last 20 years, I have seen multiple attempts to commercially take
advantage of clockless logic; they all disappeared.

------
zymhan
I don't understand how you buffer the output from stage X if stage Y is
running more slowly.

~~~
wmf
Stage X probably just stops until stage Y is ready; that's why there needs to
be an acknowledge signal that propagates backwards through the pipeline.
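That stall-and-acknowledge behavior can be sketched as a token-passing simulation: model each stage as a slot holding at most one value, and let a value advance only when the slot ahead is empty; the empty slot is, in effect, the acknowledge rippling backwards. (Stage contents below are made up for illustration.)

```python
# Self-timed pipeline with backpressure: a stage fires only when it holds
# data AND the next stage is empty, so a slow stage stalls everything
# behind it without any buffering beyond the one slot per stage.

def step(pipeline):
    """Advance the pipeline one event round; returns True if anything moved."""
    moved = False
    # Scan from the output end so a freed slot ripples backwards in one pass.
    for i in range(len(pipeline) - 1, 0, -1):
        if pipeline[i] is None and pipeline[i - 1] is not None:
            pipeline[i] = pipeline[i - 1]   # data advances...
            pipeline[i - 1] = None          # ...and the freed slot is the ack
            moved = True
    return moved

stages = ["A", "B", None, "C"]  # the last stage is slow: "C" hasn't left yet
step(stages)
print(stages)  # -> [None, 'A', 'B', 'C']: tokens bunch up behind the stall
```

Once every slot behind the slow stage is full, `step` moves nothing: that is exactly stage X "just stopping" until stage Y is ready.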

------
deebeeoh
To confirm something already stated: back in the day in grad school (the
90s), my understanding was that clocked circuits were just too far ahead in
tooling, so clockless could never catch up in the big, complex domains where
it would make a difference.

One of my profs likened it to AI (at the time), which held out such promise,
got tons of hype, then reliably disgraced itself every ten years...

