A RISC-V prototype achieved almost 40% power savings: https://people.eecs.berkeley.edu/~bora/Journals/2017/JSSC17-...
There doesn't seem to be a consensus on how much power you can actually save with an async CPU. It's said that clock distribution on modern CPUs accounts for around 30 percent or more of the overall power consumption, but on the other hand the actual savings do not necessarily amount to that much.
From a Technology Review article on clockless chips: the Intel "clockless prototype in 1997 ran three times faster than the conventional-chip equivalent, on half the power." Apparently economically that didn't make sense for Intel because you'd have to re-create virtually a whole industry that is based on clocked chips.
Another Intel scientist (unfortunately I can't re-find that source) later said that the power savings of async CPUs aren't as high as claimed by their proponents.
Interestingly, Intel Chief Scientist Narayan Srinivasa left the company to become CTO at Eta Compute, which develops an asynchronous ARM Cortex-M3 microcontroller.
To quote: "Up to 96 billion operations per second" and "instantaneous power ranges from 14 microwatts to 650 milliwatts".
Let's say there will be ten times fewer IPS (addition has O(log(WordSize)) delay in the average case; for GA144 the word size is less than 32, hence the assumed 10x reduction; the fastest operation for 2-in-1 self-sync encoding is inversion, which is just a swap of the two lines and does not incur any computation whatsoever), and that we run at max power. That would be 650 mW for 1/(9.6×10^9) seconds per operation, which is 6.8×10^-11 Joules, or 68 pJ (picojoules), per operation.
https://www.ics.forth.gr/carv/greenvm/files/tr450.pdf - page 28 lists the power consumption of different operations. "Simple integer" includes addition, and the 32-bit variant is in the 50-to-80 pJ range. If we take the average, 65 pJ, it is very close to my worst-case analysis for the GA144.
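As a quick sanity check of the arithmetic above (the 10x derating and the resulting 9.6×10^9 ops/s figure are my assumptions, not vendor numbers), here is a small Python sketch:

    # Back-of-the-envelope: energy per operation for the GA144 estimate
    # versus the 50-80 pJ "simple integer" range from the FORTH-ICS report (tr450, p. 28).
    peak_ops_per_s = 96e9      # "up to 96 billion operations per second"
    derating = 10              # assumed 10x slowdown from average-case carry delays
    max_power_w = 650e-3       # "instantaneous power ... to 650 milliwatts"

    ops_per_s = peak_ops_per_s / derating        # 9.6e9 ops/s
    energy_per_op = max_power_w / ops_per_s      # ~6.8e-11 J
    print(f"GA144 worst-case estimate: {energy_per_op * 1e12:.0f} pJ/op")

    arm_simple_int_pj = (50, 80)                 # 32-bit simple integer ops
    print(f"ARM simple integer average: {sum(arm_simple_int_pj) / 2:.0f} pJ/op")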
This means, in my opinion, that a self-synchronous CPU, as exemplified by the GA144, has efficiency at least on par with a synchronous ARM CPU, plus very efficient sleep mode entry/exit.
Essentially it is a cluster of special purpose FORTH CPUs.
Because all modern chips that are made with any consideration for power saving use clock and power gating.
In its current implementation, it does. Consider that this is Intel's first iteration of AVX-512, and don't forget that in its first iteration AVX was also plagued by performance problems. This first iteration's main purpose is to let "ordinary developers" (i.e. not only highly selected specialists that have to sign lots of Intel NDAs) begin to develop and experiment with these new extensions. I believe that Intel has grand plans for AVX-512, and its next iteration will be the one that aims to let "ordinary users" profit performance-wise from applications that use AVX-512 instructions.
> ... Apparently economically that didn't make sense for Intel because you'd have to re-create virtually a whole industry that is based on clocked chips.
Also, you'd have to invent a sales/marketing scheme as an alternative to the existing one that is based on increasing clock rates. GHz is to the PC what HP (horsepower) is to the car. That might obviously come to an end, but for now at least we have cores.
The GHz race has been over for a long time. Since Intel Core (and AMD Zen, I think; at least AMD Bulldozer had, in my opinion, a different design philosophy), it has been all about smarter cores that do more in fewer clock steps. Also, since AMD Zen, the "number of cores" race has regained traction. Finally, Intel in particular tries to promote extra-wide SIMD instructions (AVX-512).
"One of the biggest claims to fame for asynchronous logic is that it consumes less power due to the absence of a clock. Clock power is responsible for almost half the power of a chip in a modern design such as a high-performance microprocessor. If you get rid of the clock, then you save almost half the power. This argument might sound reasonable at a glance, but is flawed. If the same logic was asynchronous, then you have to create handshake signals, such as request and acknowledgment signals that propagate forward and backwards from the logic evaluation flow. These signals now become performance-critical, have higher capacitive load and have the same activity as the logic. Therefore, the power saving that you get by eliminating the clock signal gets offset by the power consumption in the handshake signals and the associated logic."
A few questions
1. Could you call current SoCs asynchronous, since they not only clock different blocks at different rates, but even subsections within a block run at various rates?
2. Does variable clock rate deliver many of the benefits of async without the complexity? In other words how much more blood is there to squeeze from the async stone in the current world?
I doubt we'll see a competitive async chip anytime soon, but as CPUs continue to evolve perhaps we'll see the functional blocks broken up into smaller and smaller clock domains until it becomes difficult to tell the difference?
It is the queues between clock domains that prevent splitting a synchronous design into smaller and smaller clock domains.
Async design, on the other hand, adds single-clock "queues" between computation stages, and those queues are clocked by the completion of the operation, not by some fixed clock rate, however variable it may be.
And here is some propaganda (which translates as "explanation", BTW). A ripple-carry adder in an asynchronous design has average-case complexity O(log(W)) (W being the word size), the same as the worst-case complexity of a carry-lookahead adder. The worst-case complexities are different, of course: O(W) versus O(log(W)). But that worst case occurs with probability about 1/2^W. What's more, if you add words of size W where one operand has two parts H and L and H is either all ones or all zeroes, then the average-case complexity of the asynchronous adder will be O(log(size of L)). That may be the case when the same adder is used both for address computation and for general addition in a generic CPU pipeline. Given that ARM has, I believe, a 12-bit immediate operand, you get a nice speedup in address computation out of thin air, without introducing any hacks and/or optimizations.
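If you want to check the average-case claim empirically, here is a quick simulation sketch of my own (not from any of the linked sources): it measures the longest carry-propagation chain, which is roughly what bounds the completion time of a self-timed ripple-carry adder, for random operands and for a hypothetical sign-extended 12-bit immediate.

    import math
    import random

    def longest_carry_chain(a, b, width):
        # Longest "generate, then keep propagating" run; a self-timed ripple-carry
        # adder can signal completion after roughly this many stages.
        g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # carry generate
        p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # carry propagate
        longest = 0
        for start in range(width):
            if not g[start]:
                continue
            length = 1
            j = start + 1
            while j < width and p[j]:
                length += 1
                j += 1
            longest = max(longest, length)
        return longest

    def average_chain(width, trials=10000, imm_bits=None):
        total = 0
        for _ in range(trials):
            a = random.getrandbits(width)
            if imm_bits is None:
                b = random.getrandbits(width)
            else:
                # Model a sign-extended immediate: high part is all zeroes or all ones.
                low = random.getrandbits(imm_bits)
                high = ((1 << (width - imm_bits)) - 1) if random.random() < 0.5 else 0
                b = (high << imm_bits) | low
            total += longest_carry_chain(a, b, width)
        return total / trials

    for w in (8, 16, 32, 64):
        print(f"W={w:3d}: avg longest chain {average_chain(w):5.2f}   log2(W) = {math.log2(w):4.2f}")
    print(f"W= 32 with 12-bit immediate: {average_chain(32, imm_bits=12):5.2f}")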
2. It does deliver some benefits, but not all. Truly clockless design is desirable in some cases due to power concerns; for example the Novelda Xethru ultra-wideband radar SoCs are actually clockless, because the clock distribution network can account for 20%+ of the power consumed in chips like this. (This is what I've heard; I don't have a citation for it. The paper I quote below similarly handwaves and throws around numbers from 26% all the way up to 40%, but they don't do any analysis of their own on this.)
I've never used a clockless CPU design before, but the theoretical advantages are laid out quite nicely in this paper, which lists (among other things) the natural ability for the CPU to sit idle (not executing `NOP` instructions, actually idle) when no work is available. It appears that the AMULET3 processor (which is compared against an ARM9 core) is competitive in power consumption, but doesn't quite stand up in performance. While still pretty impressive for a research project, this shows that we do still have quite a bit of work to do before these chips are ruling the world (if, indeed, we can scale up our tools to the point that designing these isn't just an exercise in frustration).
That just makes so much sense. Just think about how much power could be saved with all the computing devices that are idle pretty much all of the time.
Power consumption matters for HPC - the current path to exascale is limited by power consumption.
It's already the case even in synchronous microprocessors. It's exploited by side-channel attacks based on power analysis, and it makes it very difficult to implement effective countermeasures using the dual-rail protocol in hardware. You can read a bit about it in the state-of-the-art section of one of my papers, which will also give you some other references :).
2. Not really. You still have the possibility of varying the speed on an asynchronous CPU. When you want stuff to go faster you can raise the voltage and increase the cooling.
I remember reading, at the time of the ARM AMULET, that they tested one by cooling it with liquid nitrogen and running it at a high voltage, and got it to go faster in benchmarks than the contemporary standard ARM processor.
This is why all your USB peripherals don't need to be in perfect sync, and why there is no associated penalty for frequency matching. In reality, things are a bit more complex, but in general a self-clocking signal form is a must-have for such applications.
In particular, in CPUs with a big centralized register file there can be significant overhead to having an asynchronous CPU.
There are certain architectural cases in which it can be a killer advantage, and other cases in which it is pretty much the only way forward (e.g. in a 3D chip it can be quite difficult to distribute a high-speed, high-quality/low-skew+jitter clock).
But scoreboarding allows for most of the gains of out-of-order execution and is cheap.
How thick is "3D" here, and why is that? What makes a bit of vertical distance harder than several mm of horizontal distance?
Which, incidentally, is also the main reason why asynchronous logic isn't more popular.
 - https://en.wikipedia.org/wiki/Inductance#Self-inductance_of_...
Can anybody recommend any readings besides the papers on the above?
As far as I can tell most engineers view asynchronous processors as arcane equipment only meant for the most specialized tasks.
Also check out Ivan Sutherland's group at PSU: http://arc.cecs.pdx.edu/publications
There probably are new internal timing attacks that could be exposed through some asynchronous CPU designs.
Also, big CPUs are power limited anyway. I mean "speed step" allows one core to run fast, as long as the others are unloaded.
Q: Would tickless kernels benefit from running on async CPUs?
Of course you probably still want load-balancing between long-lived tasks, not sure how you'd handle that with a fully async system.
Last update is from 2010 so I guess it could use some research :)
One of my profs likened it to (at the time) AI, where it held out such promise, got tons of hype, and then reliably would totally disgrace itself every ten years...