
Computers Without Clocks – Ivan Sutherland (2002) [pdf] - dang
http://www.cs.virginia.edu/~robins/Computing_Without_Clocks.pdf
======
trollied
Steve Furber (creator of the BBC Micro, and co-designer of the ARM CPU) headed
up a team at Manchester University that designed an asynchronous version of
the ARM CPU, called AMULET.

Details:
[http://apt.cs.manchester.ac.uk/projects/processors/amulet/AM...](http://apt.cs.manchester.ac.uk/projects/processors/amulet/AMULET3_uP.php)
[https://en.wikipedia.org/wiki/AMULET_microprocessor](https://en.wikipedia.org/wiki/AMULET_microprocessor)

------
vvanders
"First, asynchrony may speed up computers. In a synchronous chip, the clock’s
rhythm must be slow enough to accommodate the slowest action in the chip’s
circuits. If it takes a billionth of a second for one circuit to complete its
operation, the chip cannot run faster than one gigahertz."

Haven't read the whole article yet but pipelining was made specifically to
address this exact problem.

Also, synchronous circuits have nice properties for dealing with metastability.
Merging different clock domains is a nightmare, and I would love to know how
they plan on solving similar issues.

~~~
vmarsy
Could you give a bit more information to non EE experts like me:

\- What do you mean by pipelining?

I tried to make an analogy with instruction pipelining, which can increase
your throughput but doesn't fix the issue that your CPU has a fixed clock
rate.

\- What is metastability?

\- Why are you mentioning merging clocks, given that each asynchronous circuit
is clock-less?

EDIT: Thanks for all the replies!

~~~
trsohmers
It takes a certain amount of time for a signal to propagate through a series
of logic gates (or other electronic components) within a chip, and that time
depends on many other factors. In most synchronous chip design, you look at
the worst-case (slowest) timing path in the design, and constrain your clock
speed to that.

You can break up critical (the longest/slowest) paths of a design through
pipelining, which can be done manually, or through nice automated techniques
like register retiming. Basically, you can add flops (as in D-flip flops, also
known as registers) between sections of the design that can be broken into
independent pipelined components.

Example:

Say you have a design that takes 10 ns from start flops to end flops. This
means the max clock speed for that component is 100 MHz. If you are clever,
you may be able to dice that up into 10 separate components, which are
pipelined, meaning that while there is a 10-cycle startup latency, with
continuous throughput you can run the design at up to 1 GHz. Even better,
nowadays synthesis tools can do automatic pipelining through something called
register retiming. Without doing any work, you can tell the synthesis tool
what clock speed you want to run at (or how many cycles you want in your
pipeline), and it can automagically insert flops to shorten the critical path
of the overall design.
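
The arithmetic in the example above can be sketched as a toy timing model
(the 10 ns / 10-stage numbers are from the comment; real designs pay extra
per-stage overhead for register setup time and clock skew, which this ignores):

```python
def max_clock_mhz(path_delay_ns, stages=1):
    """Max clock frequency (MHz) when the critical path is split into
    `stages` ideally balanced pipeline stages (registers assumed free)."""
    return 1000.0 / (path_delay_ns / stages)

def total_time_ns(n_items, path_delay_ns, stages=1):
    """Time to process n_items: pipeline fill latency, then one result
    per cycle of continuous throughput."""
    cycle = path_delay_ns / stages
    return stages * cycle + (n_items - 1) * cycle

assert max_clock_mhz(10) == 100.0              # unpipelined: 100 MHz
assert max_clock_mhz(10, stages=10) == 1000.0  # 10 stages: 1 GHz
```

Note the trade-off the model makes visible: pipelining multiplies throughput
but leaves the latency of any single item unchanged (10 cycles at 1 ns
instead of 1 cycle at 10 ns).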

------
Dylan16807
[http://www.greenarraychips.com/](http://www.greenarraychips.com/) these are
an interesting example of clockless chips. Not sure how they work.

[https://users.soe.ucsc.edu/~scott/papers/NCL2.pdf](https://users.soe.ucsc.edu/~scott/papers/NCL2.pdf)
This sort of circuit appeals to me a lot. Multiple rail encoding, where every
single gate has a hysteresis threshold before it can change its output.
Pipeline stages start out dark, and gates light up as data flows in. There are
no inverters inside a stage; gates only go from low to high. Once stage N+1 is
done calculating, an inverted ack signal cuts off the input to stage N and it
goes dark again.
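
The hysteresis behavior described above can be sketched as a toy model of an
NCL-style threshold gate (a "THmn" gate, my own illustrative code, not from
the linked paper): the output asserts once m of the n inputs are high, and
holds until ALL inputs return to NULL, so a stage lights up as data flows in
and only goes dark again when the all-NULL wavefront clears it.

```python
class ThresholdGate:
    """Toy m-of-n threshold gate with hysteresis (NCL-style)."""

    def __init__(self, m):
        self.m = m      # threshold: number of high inputs needed to assert
        self.out = 0    # gates start out dark (NULL)

    def evaluate(self, *inputs):
        high = sum(inputs)
        if high >= self.m:
            self.out = 1    # enough data arrived: light up
        elif high == 0:
            self.out = 0    # all-NULL wavefront: go dark
        # otherwise: hold the previous output (hysteresis)
        return self.out

# A TH23 gate (2-of-3 threshold):
g = ThresholdGate(2)
assert g.evaluate(1, 1, 0) == 1  # two inputs high -> asserts
assert g.evaluate(1, 0, 0) == 1  # partial NULL: output held high
assert g.evaluate(0, 0, 0) == 0  # all inputs NULL -> resets
```

The hold-until-all-NULL rule is what makes the data/NULL handshake
delay-insensitive: a stage's outputs can't glitch back low while inputs are
still draining.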

~~~
FullyFunctional
A better link might be
[http://www.theseusresearch.com/NullConventionLogic.htm](http://www.theseusresearch.com/NullConventionLogic.htm)

Sutherland's micropipelines and most (all?) of the other clock-less
approaches are fundamentally racy and depend on a difficult timing analysis
to determine that the latch is slow enough. What makes NCL so interesting IMO
is that it is guaranteed to work timing-wise by construction. This also means
that it is tolerant to changes in logic timing, which means its circuits can
tolerate a wider range of voltage swings (= can save power). (The gate
construction has to satisfy a trivial timing requirement, but it's local to
the gate, not the complete circuit.)

The obvious drawbacks of NCL are that it uses quite a few more transistors
than the equivalent circuit in a traditional clocked implementation, and that
the tooling is weak or non-existent.

Karl and his student Matthew presented "Aristotle – A Logically Determined
(Clockless) RISC-V RV32I" at the 2nd RISC-V workshop. Slides & video:
[http://riscv.org/2015/07/2nd-risc-v-workshop/](http://riscv.org/2015/07/2nd-risc-v-workshop/)
I'm not sure of the status of that.

------
EdwardCoffin
They used to have a website talking about the FLEET architecture, but it seems
to have been taken down. Here it is on archive.org:
[https://web.archive.org/web/20120227072220/http://fleet.cs.b...](https://web.archive.org/web/20120227072220/http://fleet.cs.berkeley.edu/)

Edit: the page cited above has these links, but I should explicitly call out
the slides they call the best introduction to Fleet [1], and a page full of
memos [2]

[1]
[https://web.archive.org/web/20120227072220/http://fleet.cs.b...](https://web.archive.org/web/20120227072220/http://fleet.cs.berkeley.edu/docs/slides.pdf)

[2]
[https://web.archive.org/web/20120227072220/http://fleet.cs.b...](https://web.archive.org/web/20120227072220/http://fleet.cs.berkeley.edu/docs)

~~~
ChuckMcM
FWIW, as I recall this was FLEET's fatal flaw (part of the communication
discussion):

* This can cause deadlock

* Programmer must keep input dock fifos from overflowing

Sun did a lot of work with async logic in the SPARC 10; it was written up in
IEEE Spectrum, I believe. One thing that is always a problem is that fabrics
without flow control (back pressure or emission control) are subject to
failure at the worst possible time.

------
marcosdumay
The one question I have had about asynchronous chips since I studied them as
an undergrad is: how does one sell them?

Selling a clocked processor is easy. One tests a finite set of clock speeds,
and marks the chip with the fastest one that works. People buy the chip, run
it at the tagged clock, and get predictable performance.

Now, make it a batch of asynchronous processors. Each chip you make will have
a different performance profile - one will add floats faster, another will
fetch faster (but only if the second bit of the address is set), while a
third will shine at integer addition but completely suck at subtraction (due
to a problem in a single transistor).

How does one tag those chips?

~~~
jamesbowman
I have an asynchronous CPU cluster on my desk right now (a pair of GA144s).
The performance spread isn't actually very dramatic; just a few percent. After
all, the foundries aim for consistency so that synchronous devices get good
yields.

~~~
marcosdumay
You can have either consistency or high performance, not both.

Your batch is consistent because the foundry you bought from isn't pushing
the envelope for performance. The latest Intel or AMD chips don't have this
level of consistency.

------
nocarrier
I've always been fascinated by async circuits but don't know how state of the
art has progressed since the early 2000s. Would any EEs be willing to comment?

~~~
Adutchperson
I'm not an EE, but my dad was the co-author on this paper, if you have a
couple of questions for him, I could pass them along if you'd like.

~~~
ontouchstart
I am curious about how it turned out after 13+ years. Any serious road blocks
in theory or practice?

~~~
jerf
There are a number of technologies that just aren't worth pursuing until the
"normal progress" slows down. Transmeta, for instance, arguably died because
while they produced a superior chip, by the time they could ship it they were
basically tied with what Intel was putting out anyhow.

Asynchronous chips is an example of the sort of thing I expect to start
hearing about again when we run out of die shrinks. Which we're getting pretty
close to, probably. (Another example is "active RAM" where the RAM sticks can
do some sort of computation. Also something like the greenarray chips [1]...
while they're trying to compete with normal growth it's hard for a tiny
company to get traction.)

[1]: [http://www.greenarraychips.com/](http://www.greenarraychips.com/)

~~~
petra
>> There are a number of technologies that just aren't worth pursuing until
the "normal progress" slows down.

I don't know. The field of low-power microcontrollers doesn't really benefit
much from scaling, since sleep current increases as transistor size
decreases. And they are relatively simple circuits (with low-cost
development) but still a huge market, so it's an ideal place to try a new
development methodology.

And yes, some have tried, but it's not being used today, so it probably
failed.

~~~
sliverstorm
The tools are a big obstacle. The industry is built around synchronous design.
How are you going to time your circuit? Verify it? Etc.

It's a really big chunk of work to bite off, even with a "little"
microcontroller.

We might see it one day, but as best I can tell things like sleep states are
still the big focus, as they can save orders of magnitude in power, instead
of a few percent.

------
pmero_44
A great 20-minute talk by Rajit Manohar from Cornell about self-timed (or
asynchronous) circuits and their use in neuromorphic chips.

[https://www.youtube.com/watch?v=AVrJRPL-e0g](https://www.youtube.com/watch?v=AVrJRPL-e0g)

Async is a perfect design style for these kinds of event-driven chips, since
you really don't need to run a fast clock if most of the time the circuits
aren't computing anything ...

------
blt
Oh man, this article was one of the first things I ever read about computer
architecture when I was about 14. I had no idea Ivan Sutherland was the author
until just now. It really stuck with me - I recall the bucket brigade
illustration quite vividly whenever I think about asynchronous CPUs.

------
digi_owl
It intrigues me how similar the internals of a CPU and the workings of a
network are.

------
deadgrey19
See also:
[https://users.soe.ucsc.edu/~scott/papers/NCL2.pdf](https://users.soe.ucsc.edu/~scott/papers/NCL2.pdf)

------
jhallenworld
I thought most of the benefit can also come from skew tolerant circuit design:

[http://www.cerc.utexas.edu/~jaa/vlsi/lectures/23-2.pdf](http://www.cerc.utexas.edu/~jaa/vlsi/lectures/23-2.pdf)

For example, in a pipeline "allowing a slow stage to “borrow” from the time
normally allocated to a faster stage"
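
The "borrowing" arithmetic can be sketched as a toy two-stage example (the
12 ns / 8 ns figures are hypothetical, and latch overhead and borrowing
limits are ignored): with hard edge-triggered flops the clock must cover the
slowest stage, while ideal latch-based time borrowing lets the clock cover
only the average stage delay.

```python
def min_period_rigid(stage_delays_ns):
    """Edge-triggered flops: the period must fit the slowest stage."""
    return max(stage_delays_ns)

def min_period_borrowing(stage_delays_ns):
    """Ideal transparent-latch time borrowing: slack from fast stages
    covers slow ones, so the period only has to fit the average."""
    return sum(stage_delays_ns) / len(stage_delays_ns)

stages = [12.0, 8.0]  # ns: one slow stage, one fast stage
assert min_period_rigid(stages) == 12.0      # clocked at slowest stage
assert min_period_borrowing(stages) == 10.0  # slow stage borrows 2 ns
```

This captures part of the asynchronous benefit (average-case rather than
worst-case stage timing) while keeping a conventional clock.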

------
akuma73
Intel used an asynchronous technique in their Pentium 4 processors. You may
recall that the internal core ALUs ran at 2x the frequency of the rest of the
chip. This was done with self-timed domino circuits.

These are notoriously difficult to get working.

------
ontouchstart
> The technological trend is inevitable: in the coming decades, asynchronous
> design will become prevalent.

I wonder if this statement is right on time or still decades away.

~~~
marcosdumay
Since it was stated just over a decade ago, the complete lack of change we
have seen until today does not invalidate it yet.

------
dang
I got this from
[https://news.ycombinator.com/item?id=10328784](https://news.ycombinator.com/item?id=10328784).
Thanks vmorgulis!

If you haven't heard one of Alan Kay's many explanations of Sutherland's
seminal Sketchpad work ("a Newton-like leap"), here's a wonderful one:
[https://www.youtube.com/watch?v=TY-hBgYLJqc#t=46m30s](https://www.youtube.com/watch?v=TY-hBgYLJqc#t=46m30s). Note
the reference to Wes Clark, the pioneering system designer who died recently
([https://news.ycombinator.com/item?id=11183970](https://news.ycombinator.com/item?id=11183970)).
Clark liked Sutherland and gave him computer time in the middle of the night,
which is how the Newton-like leap came to be.

~~~
agumonkey
There's way too much Kay content online nowadays. Thanks for the tip. It's
cool to see him rant about the forgotten wonders on stage, it's a different
thing to see him look around like a kid when describing sketchpad 'face to
face'.

~~~
dang
I got to meet him last week and couldn't resist gushing about how much I've
learned from him. He seemed embarrassed. I couldn't help it—there's no one
who's influenced me more in computing. He's agreed to do an AMA on HN, so
hopefully we can set that up soon.

If you get beyond its terrible sound quality, that YouTube video has many
stretches of Alan riffing that are pure gold. He embodies the history of our
field and the values of the classic ARPA community culture. Much of that
precious stuff is encoded in oral culture that we don't have a good way of
continuing. I wish we could find a way for HN to facilitate that. It already
does, to a small extent. But we need more than just to capture it as history,
we need to carry it on, and I don't see that happening.

------
mozumder
Wave pipelining is also a technique you could use to run critical datapath
circuits synchronously without using clocks. It saves space as well, by
eliminating pipeline flip-flops.

~~~
Taniwha
it's very VERY difficult to do this over a reasonable range of process
corners, and it limits you to a single carefully chosen clock speed

