
Computers without clocks (2002) [pdf] - dluc
http://www.cs.virginia.edu/~robins/Computing_Without_Clocks.pdf
======
nbingham
Heyo, I'm a PhD student in the field. I figured I can talk about its current
status.

First, here are various search terms: clockless, self-timed, delay-
insensitive, latency-insensitive, quasi delay-insensitive (QDI), speed
independent, asynchronous, bundled-data

There are a wide variety of clockless circuits that each make their own timing
assumptions. QDI is the most paranoid, making the fewest timing assumptions.
Bundled-data is the least paranoid (it's effectively clock-gating).

A clockless pipeline is always going to be slower than a clocked one and
requires about 2x the area. However, clockless logic is way more flexible,
letting you avoid unnecessary computation. Overall, this can mean
significantly higher throughput and lower energy, but getting those benefits
requires _very_ careful design and completely different computer
architectures.

Most of the VLSI industry is woefully uneducated in clockless circuit design
and the tools are terribly lacking. I've seen many projects go by that make a
synchronous architecture clockless, and they have always resulted in worse
performance.

What this means is that it would take billions of dollars for current VLSI
companies to retool, and doing so would only give them a one-time benefit. So,
you probably won't see clockless processors from any of the big-name companies
any time soon. What they seem to be doing right now is buying asynchronous
start-ups and shutting them down.

As of the 90nm technology node, it's not possible to be switching all of the
transistors on chip without lighting a fire. This means that the 2x area
requirement is not much of a problem since a well-designed clockless circuit
only needs to switch 25-50% of them at any given time. Also since 90nm,
switching frequencies seem to have plateaued with a max of around 10 GHz and
typical at around 3 GHz. When minimally sized, simple clockless pipelines
(WCHB) can get at most 4 GHz and more complex logic tends to get around 2 GHz
(for 28nm technology). Leakage current has become more of a problem, but it's
a problem for everyone.

There is a horribly dense Wikipedia page on QDI, but it has links to a bunch
of other resources if you are curious.

~~~
avmich
> A clockless pipeline is always going to be slower than a clocked one

How come? In a clocked design, you have to have clocks slow enough so all
possible logic paths would finish. In a clockless one the propagation only
takes as much as needed, and in a case of shorter path can take less time,
doesn't it?

~~~
nbingham
Synchronous design tools are very good at making all of the pipeline stages
have about the same logic depth, which is generally 6-8 transitions/cycle but
can be much less. The fastest possible QDI circuit is a very simple, very
small WCHB buffer which has 6 transitions/cycle. Most QDI logic will have
10-14 transitions/cycle.
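
A rough back-of-the-envelope check of these numbers against the frequencies quoted earlier in the thread (at most 4 GHz for a WCHB, around 2 GHz for more complex logic at 28nm). This is only a sketch: it assumes every transition takes the same time, which real gates do not, and it backs that per-transition delay out of the 4 GHz WCHB figure. The function name is made up for illustration.

```python
def cycle_frequency_ghz(transitions_per_cycle, transition_delay_ps):
    """Frequency of a handshake cycle, assuming uniform transition delays."""
    cycle_time_ps = transitions_per_cycle * transition_delay_ps
    return 1000.0 / cycle_time_ps  # 1000 ps per ns, so the result is in GHz


# Assumption: a 6-transition WCHB running at 4 GHz has a 250 ps cycle,
# which gives roughly 41.7 ps per transition.
transition_delay_ps = (1000.0 / 4.0) / 6

for n in (6, 10, 14):
    freq = cycle_frequency_ghz(n, transition_delay_ps)
    print(f"{n} transitions/cycle -> {freq:.2f} GHz")
```

Under that assumption, 10-14 transitions/cycle lands between roughly 1.7 and 2.4 GHz, which is in line with the "around 2 GHz" figure.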

Also, the speed of a linear pipeline is limited to the slowest stage in the
pipeline whether or not you use clockless. Clockless only helps pipeline speed
when you have a complex network.
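
To illustrate the slowest-stage point, here is a toy model of a linear self-timed pipeline. It is a sketch under simplifying assumptions: fixed per-stage delays in arbitrary time units, unbounded buffering between stages, no handshake overhead, and input tokens always available. The function name and the delay values are hypothetical.

```python
def completion_times(stage_delays, num_tokens):
    """Return the time at which each token leaves the last stage."""
    # finish[s] is when stage s finished its most recent token.
    finish = [0] * len(stage_delays)
    outputs = []
    for _ in range(num_tokens):
        arrival = 0  # input tokens are always ready
        for s, delay in enumerate(stage_delays):
            # A stage starts once the token has arrived AND the stage is free.
            start = max(arrival, finish[s])
            finish[s] = start + delay
            arrival = finish[s]
        outputs.append(arrival)
    return outputs


def steady_state_gap(stage_delays, num_tokens=20):
    """Spacing between the last two outputs, i.e. 1 / throughput."""
    times = completion_times(stage_delays, num_tokens)
    return times[-1] - times[-2]


print(steady_state_gap([1, 3, 2]))  # spacing set by the 3-unit stage
print(steady_state_gap([1, 3, 1]))  # speeding up a fast stage changes nothing
```

In both cases the output spacing equals the delay of the slowest stage, so making the other stages faster only reduces latency, not throughput.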

~~~
deepnotderp
I don't think it's really fair to condemn all of asynchronous due to the
slowness of QDI. There are faster ways of doing things like GaSP, dual rail
domino done detection, bundled data, one sided handshaking, etc.

~~~
nbingham
I think there's been a misunderstanding.

You're right, and I don't intend to condemn all of async, or even QDI for that
matter :) I am doing my PhD on it, so I do think there is promise. I just
think that arithmetic is better handled by Bundled-data specifically. Let QDI
do the control leg-work and tack high-performance arithmetic to it.

Also, GaSP is certainly faster, but is limited to simple pipelines. That's why
I like QDI, it lets me make weird circuits.

EDIT: Sorry, I got mixed up between the conversation threads... dyslexia is a
thing.

I'm not saying condemn async or QDI, but we must recognize what it is good at
and what it is not. A QDI pipeline stage may be slower, yes. So don't use it
if you just want to implement a linear pipeline. But do use it if you have a
complex network because of the previously mentioned benefits. GaSP and other
async pipeline topologies don't have the flexibility of QDI, and there isn't
really a good framework to mix them with QDI techniques at the moment (maybe
relative timing?). The power of async comes from this flexibility and the
ability to avoid unnecessary computation.

------
lizknope
I've been designing semiconductors for 23 years. This article is 17 years old
and I haven't seen any clockless designs in my professional experience in all
of that time.

About 1/3 to 1/2 the power usage is leakage and I don't see how a clockless
design will help that. We dynamically lower the voltage and have power islands
for unused or lesser used portions.

nbingham makes a great point about the tools. We have invested billions of
dollars in tool flows. We are not going to throw that away until we see some
proof that clockless designs are better in some measurable ways.

~~~
nbingham
> This article is 17 years old and I haven't seen any clockless designs in my
> professional experience in all of that time.

Yeah, async design takes a while, and async chips don't tend to be well
advertised, but they are there.

Async FPGA has 60% less power, 70% increased throughput
[http://csl.yale.edu/~rajit/ps/fpga2p.pdf](http://csl.yale.edu/~rajit/ps/fpga2p.pdf)

High speed routing (from Fulcrum, one of the startups bought by Intel and shut
down)
[https://www.hotchips.org/wp-content/uploads/hc_archives/hc15...](https://www.hotchips.org/wp-content/uploads/hc_archives/hc15/3_Tue/2.fulcrum.pdf)

Ultra low power processor
[https://ieeexplore.ieee.org/abstract/document/1402056/](https://ieeexplore.ieee.org/abstract/document/1402056/)

Ultra low power neural network accelerator from IBM
[https://www-03.ibm.com/press/us/en/pressrelease/44529.wss](https://www-03.ibm.com/press/us/en/pressrelease/44529.wss)

------
snazz
The GA144, a 144-processor stack machine that’s insanely energy efficient, is
clockless:
[http://www.greenarraychips.com/home/documents/index.html#arc...](http://www.greenarraychips.com/home/documents/index.html#architecture)

I haven’t seen any legitimate uses for it yet, but it’s very cool.

~~~
tachyonbeam
I watched a presentation on this chip at Strange Loop 2013. Unfortunately, the
whole presentation was a sequence of "look at this cool cryptic programming
trick I can do on this chip" by Chuck Moore, without a single legitimate use
case outlined. It made the design look needlessly complicated to use and
really didn't answer the question of why the chip is actually useful or in any
way better than existing low-power microcontrollers. That says more about
Chuck Moore's lack of marketing abilities than about clockless designs,
however.

Strange Loop 2013 presentation:
[https://www.infoq.com/presentations/power-144-chip](https://www.infoq.com/presentations/power-144-chip)

------
EdwardCoffin
Previous discussion from almost exactly three years ago [1]. From my comment
that time [2] here is the link to archive.org's copy of Sutherland's FLEET
project [3]

[1]
[https://news.ycombinator.com/item?id=11425533](https://news.ycombinator.com/item?id=11425533)

[2]
[https://news.ycombinator.com/item?id=11426825](https://news.ycombinator.com/item?id=11426825)

[3]
[https://web.archive.org/web/20120227072220/http://fleet.cs.b...](https://web.archive.org/web/20120227072220/http://fleet.cs.berkeley.edu/)

~~~
dang
There's also
[https://news.ycombinator.com/item?id=11995966](https://news.ycombinator.com/item?id=11995966).

