Ah, one of my hobby horses. I'd love for this to take off, but so far I have not seen anything that convinced me it will. I have a proof-of-concept of 'life' in clockless form, which was quite a headache, and since life is theoretically Turing complete, that generalizes to the possibility of making clockless computers at scale. The one nasty little detail is that the computer implemented on the life substrate would have a clock simulated in life...
My favorite subject too. Do you have a writeup posted anywhere? What delay model did you assume (e.g., SI, DI, QDI, burst mode, something else)? FWIW, the Amulet processor from the Manchester group and a prototype from Martin's group at Caltech have demonstrated that asynchronous general purpose processors are possible.
I don't remember ever doing a proper write-up of that project; I was hip deep in all kinds of esoteric computing a few years ago (which started with brainfuck and ended with computing fabrics). Fascinating stuff. I was not aware of the Amulet processor; I will definitely look into that, thank you very much for the pointer.
True, but inventing a new language for asynchronous hardware wouldn't be as big a job as reinventing an industry standard like VHDL. Something like a small, lightweight concurrent process description formalism with a Petri-net-based operational semantics would do nicely, and such things have been well understood for decades. The hard part is for it to gain traction.
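To make that concrete, here is a rough Python sketch of the sort of token-game semantics I have in mind (purely illustrative; the place and transition names are mine, not from any existing standard): a transition fires whenever its local preconditions hold, and there is no global tick anywhere.

    # Toy Petri net used as operational semantics for a two-stage async pipeline.
    # Place and transition names are illustrative, not from any real standard.
    from dataclasses import dataclass, field

    @dataclass
    class PetriNet:
        marking: dict                                     # place -> token count
        transitions: dict = field(default_factory=dict)   # name -> (inputs, outputs)

        def add_transition(self, name, inputs, outputs):
            self.transitions[name] = (inputs, outputs)

        def enabled(self):
            return [t for t, (ins, _) in self.transitions.items()
                    if all(self.marking.get(p, 0) > 0 for p in ins)]

        def fire(self, name):
            ins, outs = self.transitions[name]
            for p in ins:
                self.marking[p] -= 1
            for p in outs:
                self.marking[p] = self.marking.get(p, 0) + 1

    # A stage may "compute" only when its input holds a token and the next
    # buffer has a free slot -- a purely local condition, no clock anywhere.
    net = PetriNet(marking={"in_full": 1, "mid_free": 1, "out_free": 1})
    net.add_transition("stage1", ["in_full", "mid_free"], ["mid_full"])
    net.add_transition("stage2", ["mid_full", "out_free"], ["out_full"])

    while (ready := net.enabled()):
        net.fire(ready[0])
        print(ready[0], net.marking)

Which transition fires first when several are enabled is deliberately left open; that nondeterminism is exactly the clockless intuition, and it's also what makes this kind of formalism compose nicely.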
As Moore's law slows down, this time for real, there have been a few unorthodox ideas lately: 3D ICs, or Cerebras' whole-wafer CPU. This looks like a thing worth pursuing. With free money and big corps racing to be first to market (disruptors trying not to be disrupted), who knows.
I wonder how much in terms of power-performance-area can be gained from going asynchronous.
There have been many quantitative studies presented over the years at the IEEE Async conferences [1]. The best use cases until now have been niche applications calling for ultra-low power where performance isn't too critical.
Ladder logic already exists and was designed for documenting and programming asynchronous, clock-less computers. The problem is that it's like programming C++ in the early '90s, where every hardware vendor sold you their own proprietary implementation.
Even performance specification gets harder. How does one benchmark the latest AMD and Intel processors if performance is different for each unit (and completely changes for each workload)?
Asynchronous circuits are full of promises (the largest ones in the low-power segment); the reason they are not standard is because they are hard.
Aren't we heading for the same problem the way things are going with dynamic frequency scaling in AMD and Intel processors? How do you predict when a particular workload running a little hotter than anticipated will make it kick in? If the answer is to benchmark them at the factory under carefully controlled conditions and reject those that don't fall within a narrow range, then what's the difference?
It's similar. The difference is that there are way more dimensions in asynchronous processors.
Anyway, when I was studying the subject, the one problem people wrote about was testing. I think the GP's problem and the one I pointed out can only get some attention after testing gets reasonably solved, and they both may be easier than they look.
> the reason they are not standard is because they are hard.
With your first sentence, I was really hoping that the punchline was going to be "because they are hard to market"...
Not only because I'm superficially a cynic, but also because it implies there's some hope, since the MHz/GHz wars are probably at or near their end.
Follow the Apple strategy for marketing. Focus on real-world use cases and how much better they are (current performance with less battery, new use cases that weren't possible before, etc.).
Yeah that's fine if you're vertically integrated already and have a captive customer that would likely line up around the block for your chip even if it was worse.
My question is, if you're a hw startup that is looking for a bulk buyer, how do you land that buyer? Or, alternatively, how do you get bought by Apple so they put your startup's chip in their products? How do you compete with sham artists like Nervana that are good at talking the talk?
Or is it hopeless? Is there only room for skunk works internal hardware r&d embedded in multibillion dollar companies?
Nope. Apple (& many others) buy hardware companies all the time. The tech (or at least some of it) behind AirPods was an acquisition if I recall correctly.
I disagree that you need to be vertically integrated to make that kind of claim. Figure out the niche market you are targeting that is underserved by the big entities today (eg smart speakers, headphones, deep learning in mobile phones, flexible chips, etc) and why your HW solution is better.
The more challenging situation is if your solution has a higher switching/training cost. If suddenly optimizing your chip requires a lot of training, be up front with that, and the training materials for SW devs will need to be top notch. If your chip has integration challenges with existing external components, simplify them or make sure it's a training thing for the HW engineers integrating your component.
Just as with SW, the harder part is figuring out how to position your solution to be an easy yes. The part that makes HW difficult as a business is that prototyping and manufacturing generally require much larger lead times, so all your costs go up (meaning you need deep, deep pockets to bring it to market and patient partners).
Like don’t do what Mill CPU did. If your solution means “all SW ever needs to be rewritten”, that’s a non-starter of a business unless you’re targeting a small niche where that’s not prohibitive (eg military applications with 20-year duty cycles). Generally you want easy ways to make migration possible. So in the Mill example, if they can’t focus on just the compiler to make their CPU perform well, they’re DOA. This pivot of “we also need to write our own OS” tells me they’re not going to be successful. Not because their technical claims are incorrect (they may be), but because the market they’re going after (or at least the one they said) was the “x86” general compute market, which is dominated by existing OSes (why not fix the parts of Linux that are a problem for Mill?). Their approach is fundamentally incoherent and misdirected IMO.
Do you understand how this second post of yours, which is more substantive, has absolutely nothing in common with, and in many ways is diametrically opposed to, your initial advice of "follow Apple's strategy for marketing"? Apple has done literally all the things you say not to do: "all software must be rewritten" (sometimes in alien languages to boot), "don't require another OS"... oh, I seem to remember Apple having a different operating system from what everyone else does...
I feel like you're kind of twisting my words (maybe because I wasn't clear & you got confused?).
My specific advice was originally towards the marketing strategy. So "all software must be rewritten" is not a marketing strategy. That is a go to market strategy. Additionally note that I'm specifically providing my thoughts on the go to market strategy a new entrant (i.e. startup) might use. I don't know why you think I would be saying that the go to market strategy for a startup would be the same as Apple. It's not. Apple has the scale and bankroll that they can operate & innovate on multiple fronts simultaneously. That is not an avenue available to a smaller competitor.
I'm talking about general trends & general advice. I'm saying if you're starting a new chip company, you do not start by boiling the ocean. If you're Google you already make ~40 billion dollars of (relatively high margin) revenue each quarter (Apple makes ~100 billion). The kinds of investments you make with that cash flow are very different from if you're a chip company with a few hundred million in the bank from VCs to bankroll development & no revenues to speak of initially (Nuvia, Cerebras). It's also immediately apparent from this general framework that something like Mill is DOA & will never get anywhere. The company either doesn't have the focus OR they're extremely underfunded to land what actually needs to be done. Likely the latter. From reading their forums, they're trying to innovate too much on too many different parts while simultaneously trying to build a company on principle, free of VCs & generally alien to how SV works.
If you want advice on how a completely new chip is brought to market & the extreme hyper-focus that kind of startup needs, I think you can use my advice to help you be more successful (or, if you're an investor, to evaluate whether such a company is worth investing in or is trying to do too much). None of these are necessarily hard & fast rules of course. The specifics of the situation can matter (although usually not as much as you think), but I use this as a first-principles starting point before adjusting for the specifics of the situation. For example, if an asynchronous chip + a matching novel language can reduce asynchronous programming costs by 20% & provide performance 1000x greater per watt but you also need a paired custom OS, I'd probably say "go & build your own language & operating system" but pair it with "what's the developer story? Do we build something like Rosetta? Is there a non-trivial amount of code we'd need ported to make our solution attractive? etc". I'd of course first evaluate what amount of money that company is looking to raise & whether they're banking on their own abilities to actually generate novel industry-leading inventions or if they're just taking existing tech/concepts that aren't popular yet & polishing them. Building world-class teams on multiple fronts, especially across chip design, compilers, operating systems, & language design, is extraordinarily hard. No one does it all at the beginning. It took Facebook a while to get to the point where they built their own PHP compiler for their needs; the cost of it didn't make sense when FB had smaller revenues & needed more of that revenue to focus on other areas of the business.
You also have to keep in mind that modern flows can already automatically place more area/power efficient but slower transistors on the non-critical paths.
Thus signals can get balanced and the wasted area/performance is not as big as one might think.
This article takes me back to my college days ca. 2009; my senior design project was asynchronous cryptography circuits... I read this article and a few research papers more than a few times before I gained an entry-level understanding of asynchronous circuits. Unfortunately the entire concept was very difficult for even senior computer engineering students to wrap their heads around when they had just been studying clocked circuits in depth. Understanding the Muller C-element [0] in particular caused me significant difficulties.
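For whatever it's worth, the behaviour itself fits in a few lines; here's a rough Python model of the kind I'd use to explain it today (the class is my own sketch, not from any textbook kit): the output follows the inputs only when they agree and holds its previous value when they disagree.

    # Behavioural model of a two-input Muller C-element: the output copies the
    # inputs when they agree and holds its previous value when they disagree.
    class CElement:
        def __init__(self, initial=0):
            self.out = initial

        def step(self, a, b):
            if a == b:         # inputs agree -> output follows them
                self.out = a
            return self.out    # inputs disagree -> output holds state

    c = CElement()
    print([c.step(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]])
    # -> [0, 0, 1, 1, 0]

That state-holding behaviour is what makes it a rendezvous point: it only switches once every input has switched, which is why it shows up all over handshake circuits.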
Understanding the concept from a high level isn't too bad, however. As the article covers, if you visualize each piece of information moving along the pipeline as a hand-off, like a chain of people passing balls to each other, with the rule that they can't pass a new ball until they've passed the previous ball, you have a rough idea of how asynchronous processing circuits work. What's funny is that a natural "clock" begins to take shape at that point, although it's determined by the slowest component in your pipeline, rather than by an oscillator. The ball diagrams from the Rx libraries are actually also good visualizations for asynchronous circuits [1].
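A toy simulation of that ball-passing picture (my own assumed model, not a real circuit simulator) shows the "natural clock" emerging: give the stages different delays and the hand-off interval at the output settles to the delay of the slowest stage.

    # Each stage holds at most one ball and hands it off only when the next
    # stage is empty; the output interval converges to the slowest stage delay.
    delays = [1, 3, 2]              # per-stage processing time (arbitrary units)
    stage = [None] * len(delays)    # time at which the held ball is ready, or None
    t, done_times = 0, []

    while len(done_times) < 10:
        for i in reversed(range(len(stage))):   # move balls forward, last stage first
            if stage[i] is not None and stage[i] <= t:
                if i == len(stage) - 1:
                    done_times.append(t)
                    stage[i] = None
                elif stage[i + 1] is None:
                    stage[i + 1] = t + delays[i + 1]
                    stage[i] = None
        if stage[0] is None:                     # feed whenever the first stage is free
            stage[0] = t + delays[0]
        t += 1

    print([b - a for a, b in zip(done_times, done_times[1:])])
    # -> [3, 3, 3, ...]: the 3-unit slowest stage sets the effective rate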
Those were some rough months attempting to achieve our advisor's goals, clearly the trauma still hasn't left me.
Wow. Somehow this feels like an even more extreme version of Carl Hewitt's actor model in which individual actors themselves involve asynchronous components.
I worked on a research project using these—the design is totally wild, and not just because they don't have clocks. I remember talking to one of the other people at the company, and he had a debugging story that—if I'm recalling correctly—came down to one core seeing the output of another core part way through the register being updated. Cores communicate by linking the top registers of their stacks against the top register of neighboring cores and, since there's no system-wide clock, two cores can run slightly out of step with one another.
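A crude software analogy of that failure mode (nothing like the actual hardware, just to illustrate the idea): one "core" updates a shared word in two non-atomic steps while another polls it with no common clock, and the reader occasionally catches a half-updated value.

    # One thread updates a "register" in two steps; a reader with no shared
    # clock can observe the halves mid-update -- a torn read.
    import random
    import threading
    import time

    word = {"hi": 0, "lo": 0}

    def writer():
        for v in range(1, 2001):
            word["hi"] = v                       # two separate writes stand in
            time.sleep(random.random() * 1e-5)   # for a non-atomic register update
            word["lo"] = v

    w = threading.Thread(target=writer)
    w.start()

    torn = 0
    while w.is_alive():                          # the "other core", free-running
        hi, lo = word["hi"], word["lo"]
        if hi != lo:                             # inconsistent snapshot
            torn += 1
    w.join()
    print("torn reads observed:", torn)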
That must not have been easy to diagnose! I suppose that's the trade-off with not having clocks, although I don't really know enough about computer architecture to judge.
I wonder if it makes sense to have two identical cores trying to talk to each other without a clock to sync them. If they were more big.LITTLE style you might not fool yourself into thinking you can count on any congruity between runs or over time.
External systems have clocks built into their communications protocols. Maybe the cores are just islands unto themselves.
I saw the title and was going to comment about the same thing. My understanding is the GA144 is clockless. From the product literature the advantage is reduced power consumption. It is also a Forth machine.
I've wanted to get one of these for a while, but the eval board price is a little steep for someone who just wants to tinker with it. The chip individually is not terribly expensive. At one point there was a budget version for around $20, but those are no longer available.
Pretty hard to get around that limitation.