Hacker News
Transputer (wikipedia.org)
236 points by simonpure 5 months ago | 140 comments



In 1988 a friend and I presented at the 3rd Conference on Hypercubes and Concurrent Computers. We had spent the summer and fall programming an Intel Scientific Hypercube. Fractal calculations, heat diffusion, chemical reactions, that sort of thing. Used to take us 20 minutes a run.

We’re on the vendor display floor walking around, talking to folks, watching demos. A guy has an expansion board in his desktop computer. The cover is off and he’s pointing out the 2 transputer chips on it. Meanwhile the display screen is filling out a familiar drawing of a fractal. I ask how long it took them to render the drawing. “Oh, no, it’s doing that work now.” A cute little board readily outstripping our refrigerator-sized machine back home.

I ended up talking with David May. The thing he seemed most proud of at the moment was their floating point work. They had written a description of their new floating point unit in Occam. They had tested and debugged it. They had formally proved that it correctly implemented a standard (IEEE 754?). And then they had built the silicon.

I returned home with a set of inmos manuals. Never could get anyone back at uni to see the potential. Sigh. Still, at the moment I knew I was truly in the presence of the state of the art.



Could you be a bit more specific in how the two companies and / or products are related, apart from the nameplay? I noticed that the then chief architect of Inmos also co-founded XMOS.

There seems to be a pretty active forum here: https://www.xcore.com


“The name XMOS is a loose reference to Inmos. Some concepts found in XMOS technology (such as channels and threads) are part of the Transputer legacy.”


They were both started by the same person, https://en.m.wikipedia.org/wiki/David_May_(computer_scientis...

The XMOS processors share many of the same architectural properties as Transputers.


I have an XMOS dev board sitting in a drawer basically unused since I found I had to use their proprietary extended C dialect. But I understand they may have fixed this since?


You have to use their XC dialect, but it is mostly C anyway.

And you can include regular .h files to easily link to and build regular C. With the provided macros you can write .h files that use some XC features but remain compatible with both XC and C files (making it easy to wrap XC code and call it from C and vice versa).

There is also this that can use XC features from C. https://github.com/xmos/lib_xcore_c

But I don't find XC to be bad, and likely a much better starting point than the library above.

The compiler is a fork of GCC from 2006 (if I remember correctly), and that shows its age somewhat. The many protections in XC can be a bit tedious as well.


I had existing code written in C++. I ran it using GCC on the Parallax Propeller instead. I had no desire to port to XC as I was trying to keep a core of it relatively cross platform.

Of course this is just another of my unfinished hobby projects, so :-)


I don't know anything about that dialect. What were your experiences?

Is your codebase uncompilable using their tools? Would a simple hello world compile? How foreign does a simple blinky program look to a C programmer?


Their "xc" is C with language extensions to support CSP style parallel processing.

https://en.wikipedia.org/wiki/XC_(programming_language)
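To give a feel for the CSP style that XC borrows from Occam, here is a minimal sketch in Python rather than XC, since no XC code appears in the thread. Everything in it is an assumption for illustration: `Channel`, `producer`, `squarer` and `collect` are hypothetical names, the channel is emulated with two one-slot queues so that `send` blocks until the matching `recv` has happened (a rendezvous, valid for one sender and one receiver), and `None` is used as an end-of-stream marker. In Occam and XC, channels and parallel composition are language primitives, not library objects.

```python
import queue
import threading


class Channel:
    """Unbuffered point-to-point channel: send() blocks until recv() takes the value."""

    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def send(self, value):
        self._data.put(value)  # hand the value over...
        self._ack.get()        # ...and wait until the receiver has taken it

    def recv(self):
        value = self._data.get()
        self._ack.put(None)    # release the blocked sender
        return value


def producer(out, n):
    for i in range(n):
        out.send(i)
    out.send(None)  # end-of-stream marker (a convention of this sketch)


def squarer(inp, out):
    while True:
        x = inp.recv()
        if x is None:
            out.send(None)
            return
        out.send(x * x)


def collect(inp, results):
    while True:
        x = inp.recv()
        if x is None:
            return
        results.append(x)


def run_pipeline(n):
    """Roughly Occam's PAR of three processes joined by two channels."""
    a, b = Channel(), Channel()
    results = []
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=collect, args=(b, results)),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```

The processes share no state except through channels, which is the property that let Occam programs map onto separate transputers connected by links.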


Transputers were and are interesting: they are the closest thing to a dataflow or systolic array processor with enough power and forethought gone into their design that you could use them for real-world applications. But the MHz wars killed all those efforts, and only now, when we've exhausted the easy gains, does it make sense to review the past to see what ideas we can salvage.


Transputers weren't dataflow. They were parallel processors. The Transputer had a program counter (instruction pointer). Dataflow machines don't need one: when an instruction has all of its inputs, it executes on the next available processor. Only a few experimental dataflow machines were ever built, e.g. the one at Manchester University.
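The firing rule described above can be sketched as a toy interpreter in Python. This is a simplification under stated assumptions: the graph is a hypothetical example, each node fires exactly once, and tokens are not consumed when read (real dataflow machines consume tokens and may fire a node repeatedly). The sequential sweep stands in for "the next available processor"; the point is that execution order follows data availability, not a program counter or the order in which nodes are listed.

```python
import operator

# Hypothetical graph: node name -> (operation, input arcs, output arc).
# "mul" is listed first on purpose: listing order does not determine execution order.
GRAPH = {
    "mul": (operator.mul, ["s", "d"], "out"),
    "add": (operator.add, ["a", "b"], "s"),
    "sub": (operator.sub, ["a", "c"], "d"),
}


def run_dataflow(graph, initial_tokens):
    """Fire any node whose input tokens are all present, until nothing more can fire."""
    tokens = dict(initial_tokens)
    fired = set()
    order = []
    progress = True
    while progress:
        progress = False
        for name, (op, inputs, output) in graph.items():
            if name in fired:
                continue
            if all(arc in tokens for arc in inputs):  # the firing rule
                tokens[output] = op(*(tokens[arc] for arc in inputs))
                fired.add(name)
                order.append(name)
                progress = True
    return tokens, order


tokens, order = run_dataflow(GRAPH, {"a": 2, "b": 3, "c": 1})
print(tokens["out"], order)  # 5 ['add', 'sub', 'mul']
```

On a real dataflow machine "add" and "sub" could fire simultaneously, since neither depends on the other.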


Technically true, which is probably the best kind of true, but if we're going to get technical I did say closest. You could program a Transputer to use its links as inputs and outputs in a fabric where the implementation details of program counters and firmware running on a particular Transputer were hidden from sight. This allowed for all kinds of interesting architectures to be created 'ad hoc' without having to go through the stages of circuit design and so on, and it allowed for much more complex operations than you'd be able to get out of a 'real' dataflow processor on a single tick or message passed from one processor to another, because of the much higher level at which they operated.

So sure, they weren't true dataflow processors right off the bat but you could make them look like a pretty good simulation of one on the outside, even if you didn't have access to a crossbar switch (which would be cheating) and absent the real thing that was as close as I was going to get at the time.


Running something like https://github.com/TimelyDataflow/timely-dataflow on a transputer fabric might be how general purpose computing could interface with this sort of hardware.


Check out https://en.wikipedia.org/wiki/Dataflow_programming -- the word means a lot more than you've mentioned.


I've been developing a dataflow programming language, which runs on a tagged dataflow virtual machine: http://www.fmjlang.co.uk/fmj/tutorials/TOC.html

I'll update the link soon, as I've added several new features to the language.


I've worked on a ton of dataflow systems which were macro-dataflow, i.e. they executed on conventional processors. Great programming model in many instances.


Good name


Please do!


You could program a dataflow system on a network of Transputers but the hardware isn't dataflow. We would have called it a General Purpose Parallel machine.


They're talking about https://en.wikipedia.org/wiki/Dataflow_architecture not dataflow programming.


The two are related. The Wikipedia article I posted has appropriate links that explain the relationship. Note that I replied to someone who said “dataflow“ and not dataflow hardware or architecture.


> Note that I replied to someone who said “dataflow“ and not dataflow hardware or architecture.

The comment you directly replied to, yes, but it was itself responding to a comment saying "they are the closest to a dataflow... processor" which seems to be referring to dataflow architecture?


What is the key difference between transputers and GPUs / vector processors?

Is PS3 with several Cells somehow reminiscent of a transputer?


Yes - there were a number of commonalities, and some research was done into running Occam (a language co-designed with the transputer) on the Cell by means of a transputer VM: http://www.transterpreter.org/publications/pdfs/a-cell-trans...

Fun times :)


For what it's worth, the FPS T machine attached processors used transputers plus vector units.


GPUs use SIMT or similar, where there are several parallel threads with a shared program counter.
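A simplified Python model of what a shared program counter means when lanes disagree on a branch: every lane evaluates the condition, then the single PC walks the "then" path and the "else" path in turn, with lanes masked off when a path is not theirs. Each transputer, by contrast, had its own PC and could run entirely different code. The function name and the example branch are hypothetical, and real GPUs differ in the details of how they handle divergence.

```python
def simt_branch(values):
    """Run `y = -x if x < 0 else 2 * x` across lanes that share one program counter.

    Returns the per-lane results and the order in which the shared PC visited each path.
    """
    lanes = range(len(values))
    y = [None] * len(values)
    visited = []
    # All lanes evaluate the condition in lockstep; the result becomes an active mask.
    mask = [x < 0 for x in values]
    # The "then" path runs if any lane needs it; inactive lanes simply sit idle.
    if any(mask):
        visited.append("then")
        for i in lanes:
            if mask[i]:
                y[i] = -values[i]
    # Then the same PC walks the "else" path with the mask inverted.
    if not all(mask):
        visited.append("else")
        for i in lanes:
            if not mask[i]:
                y[i] = 2 * values[i]
    return y, visited


print(simt_branch([-1, 2, -3, 4]))  # ([1, 4, 3, 8], ['then', 'else'])
print(simt_branch([1, 2, 3, 4]))    # ([2, 4, 6, 8], ['else'])
```

When lanes diverge, both paths cost time even though each lane uses only one of them, which is why divergent code runs poorly on SIMT hardware.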


The Maxeler hardware claims to be dataflow, and there is some in these parts. However, I couldn't find out much about it except that it doesn't seem to support languages with a Lucid/SISAL-type heritage.


MIT Tagged Token machine is another TTDA. I actually wonder if cheaper CAM or Qubits would help, even though debugging would still be very, very painful.


My favorite one (that probably will never come back) is the concept of ternary computer on a balanced ternary number system.

In hindsight it is a much better way to encode signs for both integers and floating-point numbers.


This was the Setun (https://en.wikipedia.org/wiki/Setun), developed at Moscow State University.


Why is ternary better? I haven't heard this argued before?


Maximal information efficiency is achieved at "e-ary" (~2.718281828) distinct values per place, and this is closer to ternary than binary.
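The "e-ary" claim can be checked with the usual radix economy measure, E(b, N) = b * (number of base-b digits of N): the number of distinct symbols per place times the number of places. The Python sketch below (helper names are hypothetical) averages that cost over 1..10,000 to smooth out the jumps at powers of the base, and also computes the large-N limit b / ln b, which is smallest at b = e among real bases.

```python
import math


def num_digits(n, base):
    """Number of base-`base` digits needed to write the positive integer n."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return count


def radix_economy(n, base):
    """Cost model: (distinct symbols per place) * (number of places)."""
    return base * num_digits(n, base)


def average_economy(base, limit):
    """Average cost over 1..limit, which smooths out the staircase at powers of the base."""
    return sum(radix_economy(n, base) for n in range(1, limit + 1)) / limit


def asymptotic_cost(base):
    """For large N, radix_economy(N, b) / ln N approaches b / ln b, minimized at b = e."""
    return base / math.log(base)


best = min(range(2, 11), key=lambda b: average_economy(b, 10_000))
print(best)                                     # 3
print(asymptotic_cost(2) / asymptotic_cost(3))  # about 1.0566
```

Among integer bases 2 to 10, base 3 comes out cheapest, and the large-N ratio of base 2 to base 3 is about 1.057. This measures digit-count cost only; it says nothing about how hard ternary circuits are to build.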


I read this as information density in theory, not necessarily in practice.

For this result to hold in practice, we would need evidence that ternary hardware can be manufactured and operated at the scale of binary hardware, e.g. in clock speed, error rates and power draw. Everything I have heard before pointed to the conclusion that binary is more efficient when accounting for the physical properties of the substrate.

I'm sceptical of claims in the realm of physics on the basis of pure math without empirical confirmation.

Is there any modern research into manufacturing ternary chips?


Binary also has mathematical advantages over ternary, for instance 2 is the smallest integer greater than 1 (no shit) so powers of 2 are as close together as you can get in the integers.

Bitwise operations correspond to the finite field GF(2); while there is a GF(3), it's not nearly as interesting or useful.
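A small Python sketch of what "bitwise operations correspond to GF(2)" means: XOR is addition and AND is multiplication in the two-element field, applied independently to every bit of a word, so the field laws hold for whole words at once. The `gf3_add` helper on lists of trits is a hypothetical contrast only; mainstream CPUs have no such instruction.

```python
def gf2_add(a, b):
    """Addition in GF(2) is XOR; on a machine word it acts on every bit lane at once."""
    return a ^ b


def gf2_mul(a, b):
    """Multiplication in GF(2) is AND, again lane by lane."""
    return a & b


def gf3_add(xs, ys):
    """Hypothetical trit-wise GF(3) addition on lists of digits in {0, 1, 2}."""
    return [(x + y) % 3 for x, y in zip(xs, ys)]


# Single-bit tables match arithmetic mod 2.
for x in (0, 1):
    for y in (0, 1):
        assert gf2_add(x, y) == (x + y) % 2
        assert gf2_mul(x, y) == (x * y) % 2

# Field laws hold for whole words: every element is its own additive inverse,
# and AND distributes over XOR.
a, b, c = 0b1100_1010, 0b1010_0110, 0b0111_0001
assert gf2_add(a, a) == 0
assert gf2_mul(a, gf2_add(b, c)) == gf2_add(gf2_mul(a, b), gf2_mul(a, c))

print(gf3_add([1, 2, 2], [2, 2, 0]))  # [0, 1, 2]
```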


Binary has advantages in a discrete/logical sense (namely bitwise operations).

Balanced ternary (not plain ternary) has the advantage that the sign is built into the number, so there is no need to deal with ones' vs two's complement or signed vs unsigned extension.

This is partially a lie, as a balanced ternary word can also be read as normal ternary, giving only non-negative integers... Still, it is a more natural encoding of arithmetic.
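To make the sign argument concrete, here is a minimal Python sketch of balanced ternary, with trits in {-1, 0, 1} stored least significant first. Negation is just flipping every trit, with no complement step, sign bit, or separate signed/unsigned handling. The helper names are hypothetical, and `+`, `0`, `-` is a display convention chosen for this sketch.

```python
def to_balanced_ternary(n):
    """Return the trits of integer n, least significant first, each in {-1, 0, 1}."""
    trits = []
    while n != 0:
        r = n % 3      # Python's % gives 0, 1 or 2 here, even for negative n
        if r == 2:
            r = -1     # write 2 as 3 - 1: digit -1 plus a carry into the next place
        trits.append(r)
        n = (n - r) // 3
    return trits or [0]


def from_balanced_ternary(trits):
    return sum(t * 3**i for i, t in enumerate(trits))


def negate(trits):
    """Negation flips every trit; no complement or sign bit is involved."""
    return [-t for t in trits]


def show(trits):
    """Most significant trit first, using '+', '0' and '-'."""
    return "".join({1: "+", 0: "0", -1: "-"}[t] for t in reversed(trits))


five = to_balanced_ternary(5)
print(show(five), show(negate(five)))  # +-- -++
```

Here 5 = 9 - 3 - 1 and -5 = -9 + 3 + 1, so the two representations are exact mirror images.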


> powers of 2 are as close together as you can get in the integers.

Why is that useful?

> not nearly as interesting or useful.

Why is that important in computing?


I'm unfamiliar with the literature in this area. May I ask why? (Also, why does Euler's constant appear all over the place?)


         /
       (0)
      /   \
    0      (1)
   / \     / \
  0   1   0  (1)

  = 011 = 3 (in decimal)
a) Numeral Systems (e.g. ternary) are just trees, and specific numerals are just paths from root to leaf.

b) A 6-digit numeral roughly corresponds to a path through a tree of depth 6.

c) Base_10 corresponds to a tree with 10 possible children for each node.

d) e is the most efficient multiplier when trying to achieve compound growth in the fewest iterations of multiplication.

> Also, why does Euler's constant appear all over the place?

e is special because e^x is its own derivative. It also acts as a "bridge" between addition and multiplication. It often appears where growth or trees are involved.


Thank you for taking the time to explain, with a diagram even.

Another comment mentioned "radix economy", which, together with your description helped me understand (generally) why Euler's constant is the most efficient base in terms of number of digits needed to express numbers.

https://en.wikipedia.org/wiki/Radix_economy


I don't understand point d)

What do you mean by efficient?

Any other literature you could reference? Because this sounds interesting and I'd like to do some more research into it.


The cost of finding a leaf in balanced tree (with data only at leaves, which is how we represent integers in base B) of size N is on average proportional to its branching factor B multiplied by its height H. Height is (log N)/(log B), so cost is B * (log N)/(log B).

By derivatives, that's minimized when B satisfies

0 = (log N) (1/(log B) + B (-1/(log B)^2) (1/B))

and, dividing through by (log N),

0 = 1/(log B) - 1/(log B)^2

1 = log B

Base of the logarithm = B.

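This looks like you can pick any logarithmic base b you want, and so any B you want, but in fact the derivative I wrote assumes e is the base (hence the term "natural" logarithm).

With any other base b, differentiating log_b B introduces a scaling factor of 1/(ln b), since d(log_b x)/dx = 1/(x ln b)

which follows from b^x = e^((ln b) x). Redoing the algebra then gives (ln b)(log_b B) = 1, i.e. ln B = 1, so B = e whichever base you start with.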
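To check the derivation numerically, the Python sketch below minimizes the cost model B * log(N) / log(B) with a golden-section search (a standard technique, chosen here for illustration) and also shows that the cost does not depend on which logarithm base you use, because log(N) / log(B) is base-independent. The function names and the choice N = 10^6 are assumptions of this sketch.

```python
import math


def tree_cost(branching, n, log=math.log):
    """Cost model from the comment: branching factor times height, B * log(N) / log(B)."""
    return branching * log(n) / log(branching)


def argmin_golden(f, lo, hi, iters=100):
    """Golden-section search for the minimizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(iters):
        if f(c) < f(d):
            b, d = d, c               # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d               # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return (a + b) / 2


n = 10**6
best_branching = argmin_golden(lambda b: tree_cost(b, n), 1.5, 10.0)
print(best_branching, math.e)  # the two values should agree to many decimal places

# log(N) / log(B) does not depend on the logarithm's base,
# so the cost (and hence its minimizer) is the same with ln, log2 or log10.
print(tree_cost(3, n), tree_cost(3, n, math.log2), tree_cost(3, n, math.log10))
```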

You can chase "deeper" reasons for this all day long by digging into the many definitions/properties of e and proving they are equivalent.

Playing with e is the most fun you can have in pre/calculus.


Frankly, it's just a pattern I've observed from playing with numbers myself. I'm unsure how to explain it properly, and I have no academic sources to point toward. You can probably find a better explanation somewhere in an article on optimization problems.

The intuition is that e is the optimal water level when a limited amount of water is distributed among a variable number of buckets and the levels of all the filled buckets are multiplied together. Alternatively, it's like how volume() is maximized where

  volume(x, y, z) = (x y z)
  const = (x + y + z)
when (x = y = z). Except in our original situation, the number of dimensions is arbitrary instead of fixed.
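The "water level" intuition can be checked directly. For a fixed total split into k parts, the product is largest when the parts are equal (the x = y = z case, by the AM-GM inequality), so the only remaining question is k. The Python sketch below (the function name is hypothetical) searches integer k and shows the best part size settling near e.

```python
import math


def best_split(total):
    """Split `total` into k equal parts to maximize their product (total / k) ** k.

    Compares the logarithm k * ln(total / k) to avoid overflow; returns (k, part size).
    """
    best_k = max(range(1, math.floor(total) + 1), key=lambda k: k * math.log(total / k))
    return best_k, total / best_k


for total in (10, 20, 100, 1000):
    k, part = best_split(total)
    print(total, k, round(part, 3))  # part sizes hover around e = 2.718...
```

For a total of 100 the best choice is 37 parts of about 2.70 each; the continuous optimum would be total / e parts of exactly e.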


> Also, why does Euler's constant appear all over the place?

That is just a natural consequence.


Downvoters, this is a reference to the "natural" logarithm.


OK, but the percentage increase in efficiency will be what? At the cost of completely overhauling all your low-level hardware?


About 5.7 percent, so not all that much. For comparison, binary is about 42 percent more efficient than decimal. See [1] for more numbers.

1: https://en.wikipedia.org/wiki/Radix_economy#Comparing_differ...


Balanced ternary is insane, and most data formats would need to be adapted to deal with it. Except in specialized processors, it is ludicrous to even consider it until almost every other avenue has been explored.


Because we're now locked into binary? Balanced ternary computers were built and used in the Soviet Union until 1970: http://www.computer-museum.ru/english/setun.htm


> Because we're now locked into binary

Binary has so many useful properties that are relatively intuitive to understand. Binary is a uniquely convenient number system. Even day to day, base-two number systems are hyper-convenient.

One thing I can think of that balanced ternary has interesting properties for is 2D space partitioning. Balanced ternary partitioning is an interesting tool; then again, you could just use modulo binary space partitioning (i.e. wrap the space around by half the width of a binary partition), it's almost as good. I wonder what the balanced ternary equivalent of an octree would be; I guess it would be a 27-tree (3³ children per node instead of the octree's 2³), kinda cool.

> used in the Soviet Union until 1970

So was a planned economy, and they still haven't recovered. The KGB sure have staying power though, must be the balanced ternary advantage! ;- )

It's not like balanced ternary has no positives, but you have to acknowledge that well before even decent relay computers were invented, binary was already much better explored than balanced ternary; obviously relay computers helped solidify that (though bi-quinary did not survive, popular as it was since Colossus).


The Setun wasn't part of the planned economy, which resulted in it being cancelled by the USSR's economic planners.


> Binary is a uniquely convenient number system.

Binary has many advantages, but more from a logical perspective than a numerical one.

Namely signs.

I don't think balanced ternary computers are ever coming back. I think it is a nice lost technology :)

(And if I ever find myself in a position to start a civilization from nothing, I will consider a balanced ternary/5/7 number system.)


> Namely signs.

Signed arithmetic is elegant enough in binary, I guarantee you will fail to express even basic arithmetic concepts in balanced ternary to any lay person.

As for experts implementing them in a system, there are advantages to balanced forms, but good luck finding enough experts. The benefits probably start kicking in for very large integers, where the carry itself can take longer than you might like.


Information efficiency defined how?

What specifically becomes more efficient about computation?

Very curious, I never heard of this before.


Ternary is one of those things where the theory looks beautiful until you need to start putting circuits together. Then it's kinda shitty.


This is a matter of physical implementation. Using normal transistors ternary is horrible, but there are other physical media where ternary is more natural (often using magnetic fields).


Can you make 5nm CPU features with magnetic fields, even in principle?


Like mining tailings for gold, or fracking for oil... but for silicon.


If you're interested in transputers, you will likely find this interesting (https://en.wikipedia.org/wiki/Atari_Transputer_Workstation)

The Atari Transputer Workstation was a technological oddball of which there were quite a few around that time.

see also: https://en.wikipedia.org/wiki/Connection_Machine and https://en.wikipedia.org/wiki/Lisp_machine

The 80's were a wonderful time for computer hardware...


There was a range of transputer add-ons and dedicated systems, all the way up to the very pretty and very expensive Meiko Computing Surface[1] with something like 100 (300? honestly can't remember) transputers in it. It seemed, for a while, like they were being shoehorned into everything. There was a distinct belief that transputers were the way of the future. I think the expectation was that before too long every machine would gain a set of transputers, as a replacement for a single-threaded CPU or in addition to one, much like machines gained graphics cards.

[1] https://en.wikipedia.org/wiki/Meiko_Scientific#Computing_Sur...

Seem to remember there was once a dedicated page with pictures, could be mistaken, but if there was it's been deleted. Such is Wikipedia "progress".


I had an encounter with a Meiko Computing Surface (64 or 128 Transputers, I think?) around 1990, as I was studying Occam at the time. It was impressive to watch it rendering high-res 3D fractal landscapes in real time.


Ditto a few years before that - we had a Meiko box in the visuals R&D dept of the flight simulator company where I was doing my electronic engineering apprenticeship, although I didn't really get hands-on with it.


Amusingly the Atari Transputer started out as an Amiga project but Commodore wasn't interested. http://www.classiccmp.org/transputer/metacomco.htm


I was at Atari when, one fine day, some folks in the UK that we had never heard of did a product announcement about an Atari computer that definitely was not coming out of our engineering group. (We were down to one building at that point, if you don't count our remote office in Monterey as part of the GEM porting effort).

I remember the Tramiels being a little flabbergasted by the Transputer announcement as well.


I was a summer intern at the UK company (Perihelion) doing the Atari-based Transputer machines. The word there was that because Atari had invested in the UK company (?) the Atari machine was essential to include as the front-end, even though it didn't really fit the UK designers' idea of what a good front-end machine would be... (nothing against the Atari design, just that its quirks/shortcuts didn't match up with the Transputer cluster backend 'vision').


Wow, thanks for that tidbit of history. Always enjoy your Atari reminiscences.


Yep, a very interesting machine, and interesting time.

I still have the Helios reference manuals (OS for the ATW).

Now that Intel can no longer snuff out competition by being one fab generation ahead, we might have space for more interesting architectures.


I got to play with one a few months ago. cool thing!


I programmed it in university in one of our classes. The objective was to make a Mandelbrot renderer faster.


I’ve always thought the Zachtronics game TIS-100 [1] resembles programming an extremely simple transputer. It's definitely worth trying the game, if only to discover all the new kinds of problems you run into when you have to distribute your processing over multiple independent nodes that communicate over serial links!

[1] http://www.zachtronics.com/tis-100/


Feel it's more like Chuck Moore's GA144, but yes good game



So we can expect a new appearance on the front page in 2022 :)


I hypothesize what was really missing to make the transputer successful was a language/compile-target to express propagators: https://youtu.be/nY1BCv3xn24

We are starting to see languages that could leverage such an architecture:
* http://minikanren.org/
* https://cuelang.org/
* https://github.com/ekmett/guanxi

Is there an algebraic logic strongly associated with what we conventionally think of as types (see https://cuelang.org/docs/concepts/logic/) for which we can effectively and efficiently compute unions and Boolean satisfiability?
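For readers unfamiliar with propagators, here is a minimal Python sketch of the general idea under simplifying assumptions: cells hold partial information (here an interval that can only narrow), a propagator recomputes constraints between its cells, and any cell that changes re-schedules the propagators watching it until a fixed point is reached. `Cell`, `adder` and `propagate` are hypothetical names; this is not the API of miniKanren, CUE or guanxi.

```python
class Cell:
    """A cell holding partial information: an interval [lo, hi] that can only narrow."""

    def __init__(self, name, lo=float("-inf"), hi=float("inf")):
        self.name, self.lo, self.hi = name, lo, hi
        self.watchers = []

    def merge(self, lo, hi):
        """Monotone merge: intersect with new information; return True if anything changed."""
        new_lo, new_hi = max(self.lo, lo), min(self.hi, hi)
        if new_lo > new_hi:
            raise ValueError(f"contradiction in {self.name}")
        changed = (new_lo, new_hi) != (self.lo, self.hi)
        self.lo, self.hi = new_lo, new_hi
        return changed


def adder(a, b, total):
    """Propagator enforcing a + b = total, usable in every direction."""
    def run():
        return [
            (total, a.lo + b.lo, a.hi + b.hi),
            (a, total.lo - b.hi, total.hi - b.lo),
            (b, total.lo - a.hi, total.hi - a.lo),
        ]
    for cell in (a, b, total):
        cell.watchers.append(run)
    return run


def propagate(propagators):
    """Run propagators until no cell changes (a fixed point)."""
    pending = list(propagators)
    while pending:
        run = pending.pop()
        for cell, lo, hi in run():
            if cell.merge(lo, hi):
                pending.extend(cell.watchers)


a, b, total = Cell("a", 1, 3), Cell("b"), Cell("total", 10, 10)
p = adder(a, b, total)
propagate([p])
print((b.lo, b.hi))  # (7, 9): deduced "backwards" from a and total
b.merge(8, 8)
propagate([p])
print((a.lo, a.hi))  # (2, 2)
```

Each propagator touches only its own neighbouring cells, which is presumably the locality the comment has in mind when suggesting a fit with networks of small processors.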


The thing that sank the Transputer was that, during the period when it had a hardware performance advantage over its competitors, you couldn't program it in C.


A friend of mine made a C compiler for Transputer.

He also made a loader that could copy a program to a whole network of transputer chips in seconds, instead of hours like theirs.

They hated his C compiler, and hated him for making it, and did their best to keep anybody from knowing about it. That was easier to do, before www.


I'm not sure who is "They" here. Well...ok I have a bit of an idea of one or two people ;)

But for 90% of the transputer group at the time it was very obvious that we needed alien language support. We had to spin up a compiler group, which took time. I seem to remember we also OEM'ed some compilers from outside vendors before the internal ones were ready (memory hazy and I didn't work directly on software there). I had direct contact with many customers, specifically discussing C language support and I don't at all remember anything like what you describe where 3rd party products were deliberately not mentioned.


Supply, cost and the 9000 being delayed. The 486 killed the Transputer basically.


I'm speaking about a time long before the T9 was even thought of. The 386 was performance competitive with the T8 if you didn't need FP.


I'm not sure that's true, at least if you were running HeliOS.


Right, and Fortran was used for scientific work (not necessarily with Helios?). https://en.wikipedia.org/wiki/HeliOS


Again, I'm talking about 1986/7 or so. In that time frame the T4/T8 were performance competitive with anybody's CPU. However you could only program in Occam. Huge hygiene factor.

Of course today we have golang which has much the same features as Occam and everybody thinks it is the best thing since sliced bread. And we have folks seriously re-writing dusty decks in Rust. But back then you just couldn't seriously sell a CPU that forced you to program in an unknown, fairly basic, programming language.

Also, internally we still used a lot of BCPL. So the company itself didn't eat its own dogfood in the way it expected customers to.


http://tardis.dl.ac.uk/computing_history/parallel.html confirms my memory, but perhaps the timeframe is wrong. (I rather thought Bill Purvis wrote a compiler, but it seems that was Occam for the T machine.) Thanks for the history, anyhow.


What did you program it in?


Occam was the native INMOS provided programming language.


I programmed in Occam for a time. It used two-space indentation as significant syntax, which I thought a great idea then -- and which is one reason I liked Python and took to it easily. https://en.wikipedia.org/wiki/Occam_(programming_language)

For a time around 1986-1987 I had several Transputer boards in a robot lab I worked in at Princeton University. I liked the INMOS sales pitch that someday you could have a transputer at every robot joint. Something similar is relatively easily accomplished now using various serial connections between microcontrollers -- even if it might still be easier to just run cables to the joints from boards elsewhere. That combination of Transputer boards (including several loaner boards from another lab on campus which had gotten them for NSA-funded signal processing work) for a time gave me the most powerful computer on the Princeton campus in terms of total memory and CPU cycles -- even if the IBM 308X mainframe in the computer center no doubt had more total I/O capacity.

One other thing I liked about transputers was the (for the time) high-speed serial links between them, which could also go to special chips to read or write parallel data. I used such chips to interface a network of transputers to the teach pendant of a Hitachi A4 industrial robot so they could essentially press the buttons on the pendant to drive the robot around. It's interesting how modern USB is, in a way, essentially just a serial transputer link. A PU undergrad student I helped used that setup for a project to use sensing whiskers to "feel" where objects were (inspired by a robotics magazine article I had seen by a group in Australia who had done that for a different robot).
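As a rough illustration of what went over those links, here is a Python sketch of the packet framing commonly described for INMOS links: each data byte travels as an 11-bit packet (two 1 bits, eight data bits, then a 0 stop bit), and the receiver answers with 2-bit acknowledge packets (1 then 0). The bit order within the byte, the idle-line handling and the helper names are assumptions of this sketch; the INMOS datasheets are the authority on timing and flow control.

```python
def encode_data(byte):
    """Frame one byte as an 11-bit data packet: 1, 1, eight data bits, 0."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first (assumed bit order)
    return [1, 1] + data_bits + [0]


# Acknowledge packet: a start bit 1 followed by 0, which distinguishes it from data.
ACK = [1, 0]


def decode_stream(bits):
    """Split a received bit stream into ('data', value) and ('ack', None) events.

    Zeros between packets are treated as an idle line. A truncated packet raises IndexError.
    """
    events = []
    i = 0
    while i < len(bits):
        if bits[i] == 0:          # idle line between packets
            i += 1
        elif bits[i + 1] == 0:    # 1 0: acknowledge
            events.append(("ack", None))
            i += 2
        else:                     # 1 1: data packet
            value = sum(bit << k for k, bit in enumerate(bits[i + 2:i + 10]))
            if bits[i + 10] != 0:
                raise ValueError("missing stop bit")
            events.append(("data", value))
            i += 11
    return events


stream = encode_data(0x41) + [0, 0] + ACK + encode_data(0xFF)
print(decode_stream(stream))  # [('data', 65), ('ack', None), ('data', 255)]
```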

While transputers not supporting C was an issue early on, another issue was that Occam did not easily support recursion. At the start the Transputer also lacked standard libraries of code like those C had from Unix. And transputers were expensive (as was the software development kit), and they were not easy to get in the USA because of various supply chain issues and delays.

Occam was mind-expanding in its own way. I had previously networked a couple of Commodore VICs and a Commodore 64 for a robot project (via modems to an IBM mainframe) for my undergraduate senior project at Princeton a couple of years earlier using assembler, BASIC, and C -- so the transputers with Occam were a big step up from that conceptually. Still, practically I had gotten the Commodore equipment plus a parallel code simulator (a VM in C) I wrote for the IBM mainframe running under VMUTS to do more interesting stuff -- perhaps because I had years of experience programming in those other languages by then whereas Occam was so different and I did not spend that much time with it before leaving that job. Even without Occam and transputers, that robot lab job was the best paying job working for someone else I ever would have in many ways as far as autonomy, mastery, and purpose -- but I did not appreciate it then as it was my first real job beyond college and I figured the grass would be greener somewhere else -- not realizing that grass tends to be greener where you water it. Thanks for creating such a great working situation for me, Alain!


> While Inmos and the transputer did not achieve this expectation, the transputer architecture was highly influential in provoking new ideas in computer architecture, several of which have re-emerged in different forms in modern systems.

I have a vague sense that there's a rich vein of potential projects and maybe businesses in "before their time" ideas in the world of computing.


If you want inspiration the Computer Chronicles was a show on PBS documenting in real time the process and rise of computers. I believe all episodes are on YouTube here: https://www.youtube.com/user/ComputerChroniclesYT


This is incredible. I think I've just found my new late-night / fall-asleep 'fictional' TV show — because it really feels that way, in our weird cognitive perception of the past.

Thanks a lot for the pointer. This may spark funny and uncanny world-domination ideas! ( :

A serious hint/note for the "futurologists" / innovation-driven folks out there — my people: look for what was true back then but is no longer, especially in the form of limitations or roadblocks or axiom (hard limit) to a design. Remove that piece and see what you get...


Couldn't agree more. A lot of things were invented in the 60's-80's which didn't work out due to lack of applications, technological limitations of those times etc.


With co-host Gary Kildall, of CP/M and Digital Research fame!


Yes, hundreds, many of which are now being 'reinvented' or 'rediscovered'. Time to figure out how to persuade investors to get on board - processor architecture isn't one of their fads at the moment.


The Japanese tried to build massively parallel computers back in the 80s. The project ultimately failed because the performance of normal CPUs continued to improve. It looks like we are once again in an age where parallelism seems to be the only way out: CPUs have hit a thermal ceiling, speculative execution has led to vulnerabilities... Given the popularity of today's GPUs and their applications, I'd say the 5th generation computer project was ahead of its time.

https://en.wikipedia.org/wiki/Fifth_generation_computer


A partial list:

* USB
* On-chip PLL clock generator
* Automatic power on reset
* On-chip programmable DRAM controller
* Hardware threading
* On-chip I/O DMA
* Concurrency and parallelism in programming languages


I wonder if the Cell microarchitecture will make a comeback someday in some form.


Some additional commentary from a blog analysing the Inq's story:

https://assemblergames.com/threads/ps3-concerns-cell-archite...

«It means that no more than three of the SPE's can talk to the memory controller at one time using full capacity. Four SPE's will fill the bus, and the CPU controller will not be able to access the memory at all.

[...]

What you can do is have all the SPE's work at the same time, but using nowhere near the capacity they each have. Basically the PS3 gets punished for performing well, and tweaking cell for better results in the future seems to be a nightmare if not impossible. »


I doubt it. As shipped, it was profoundly broken.

Charlie Demerjian did a good exposé on this for the Inquirer, but the page has been taken down. Here is a blog post that references it, though:

http://topofcool.com/blog/2006/06/05/sony-suicide-watch-ps3-...

The memory read speeds for the SPEs (Synergistic Processing Elements, or "APUs" these days) were horrible. They were about 3 orders of magnitude slower than they were meant to be.

This is why the chip basically died. Memory read speeds for the SPUs were impossibly slow, meaning that they were crippled into uselessness, AIUI.


On the 40th anniversary of Inmos, creators of Transputer, we filmed this set of talks discussing the legacy and impact of Inmos in Bristol, UK. You'll find some fun insight into Inmos and the transputer in some of the longer talks.

https://www.youtube.com/playlist?list=PLKbvCgwMcH7A_taW2Td3R...


Question to people who know anything about circuit or processor design: are we confident that current designs and paradigms are almost-as-good-as-it-gets for our classes of materials, or is it just the result of 'good enough design + scale economics = winner CPU arch' (and likewise for all types of processors), with many great "unknown unknowns" possibly still out there?


If Intel were given a container ship full of gold, a mandate to redo the basic underpinnings of computation, and a ten-year alternative timeline, I'd wager they'd be about 30% faster than an equivalent Intel that maintained the status quo.

Honestly I think we're more held up by programming languages that make concurrent programming hard than by the underlying architecture. It's not that hard to compile an existing C++ or Java codebase to a new ISA. It's hard to rewrite it in a language that scales well to thousands of threads. And that language doesn't even exist. The closest we have is whatever Nvidia is calling C++ on CUDA and that sucks.


Current CPUs are nowhere near "almost-as-good-as-it-gets" for the simple reason that compatibility with existing software (OS and consumer software) is still the main driving force in CPU design. If you come up with something revolutionary, you have to convince a hell of a lot of software engineers to port to your arch, because without software no one will buy your hardware.


I guess the choice thus comes down to how much potential benefits may be possible, versus what software actually needs? (I can see first-hand we don't need more single-thread performance in so many use cases)

I just have no idea "how good it can get", compared to what we have now. Is there an alternative world where CPUs vastly outperform ours in performance/power using the same materials? But more importantly how much can "vastly" be?


Yes!

And, by a lot.

A huge portion of the power a CPU consumes is simply due to the clock running which is why modern CPUs go through tremendous efforts to disable circuits not in use and lower the clock speed when the CPU is idle.

However, imagine if we didn't have a clock at all. If, instead, the only thing that caused CPU transistors to switch was new instructions being executed.

In that world, the only time CPUs would consume any power is through gate leakage and when the CPU is actually doing stuff. That would translate into extremely low power draw.
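For reference, the standard first-order model of CMOS power splits it into a switching term and a leakage term. The symbols are the usual textbook ones, not taken from the comment: α is the fraction of nodes that toggle per cycle, C the switched capacitance, V_dd the supply voltage, f the clock frequency and I_leak the leakage current. Clock nets toggle every cycle (α = 1 for them), which is why clock gating and clockless designs both go after the first term, while neither removes the second.

```latex
P_{\text{total}} \approx \underbrace{\alpha \, C \, V_{dd}^{2} \, f}_{\text{dynamic (switching)}} \; + \; \underbrace{I_{\text{leak}} \, V_{dd}}_{\text{static (leakage)}}
```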

Performance would also be way up. Throughput would mainly be constrained by switching speed, which can be super fast. Today, it is mostly constrained by the delay of the slowest pipeline stage and by how fast the clock runs.
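A deliberately simplified, non-pipelined toy model of that argument (the function names and delay numbers are hypothetical): a clocked design must size its period for the slowest operation, while a self-timed one finishes each operation when its own logic settles but pays a handshake overhead per operation.

```python
def synchronous_time(delays, setup=0.0):
    """Clocked model: every operation gets one full period, sized for the slowest operation."""
    period = max(delays) + setup
    return period * len(delays)


def self_timed_time(delays, handshake=0.0):
    """Self-timed model: each operation ends when its own logic settles, plus handshake overhead."""
    return sum(d + handshake for d in delays)


# Hypothetical per-operation delays (arbitrary units): mostly fast, one slow.
delays = [1.0, 1.0, 1.0, 4.0]
print(synchronous_time(delays))                # 16.0
print(self_timed_time(delays, handshake=0.5))  # 9.0
```

The gain depends on the handshake cost: with an overhead of 3.0 units the self-timed total becomes 19.0, worse than the clocked 16.0, which is one reason real-world gains are smaller than the idealized argument suggests.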

So why don't we do this today? Mostly because the entire industry is built around synchronous (clocked) CPU design. Switching to an async design would be both super difficult (there's a lack of tools for it) and very different from the way things work now. Just as parallel programming is very hard, async circuit design requires a large amount of verification that we just lack. Further, HDLs are simply not well built in general, and especially not for async.

Async circuitry usually requires more transistors and more lines. That, however, isn't really a problem anymore. Today, the vast majority of transistors in a modern CPU are spent not on logic, but on the cache.
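Many of those extra lines are the local request/acknowledge wires that replace the global clock. As a hedged sketch (not a model of any real chip's circuits), the common four-phase, bundled-data handshake can be simulated in Python with two threads standing in for a sender and a receiver and a condition variable standing in for the req/ack wires. All names here are hypothetical.

```python
import threading


class Wires:
    """Shared req/ack/data signals; the condition variable stands in for the physical wires."""

    def __init__(self):
        self.cv = threading.Condition()
        self.req = 0
        self.ack = 0
        self.data = None


def sender(w, values, log):
    for v in values:
        with w.cv:
            w.data = v  # bundled data: data is set up before req rises
            w.req = 1   # phase 1: raise request
            log.append(("req+", v))
            w.cv.notify_all()
            w.cv.wait_for(lambda: w.ack == 1)  # wait for phase 2
            w.req = 0   # phase 3: drop request
            log.append(("req-", v))
            w.cv.notify_all()
            w.cv.wait_for(lambda: w.ack == 0)  # wait for phase 4: channel idle again


def receiver(w, count, out, log):
    for _ in range(count):
        with w.cv:
            w.cv.wait_for(lambda: w.req == 1)
            out.append(w.data)  # latch the data
            w.ack = 1           # phase 2: acknowledge
            log.append(("ack+", w.data))
            w.cv.notify_all()
            w.cv.wait_for(lambda: w.req == 0)
            w.ack = 0           # phase 4: return to zero
            log.append(("ack-", w.data))
            w.cv.notify_all()


def run_channel(values):
    w = Wires()
    out, log = [], []
    rx = threading.Thread(target=receiver, args=(w, len(values), out, log))
    tx = threading.Thread(target=sender, args=(w, values, log))
    rx.start()
    tx.start()
    tx.join()
    rx.join()
    return out, log


received, trace = run_channel([3, 1, 4])
print(received)                            # [3, 1, 4]
print([event for event, _ in trace][:4])   # ['req+', 'ack+', 'req-', 'ack-']
```

Nothing advances on a clock tick: each transition waits for the other side's previous one, so the channel only switches when there is data to move.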

It'd be super expensive to adopt. It would be totally worth it. But I doubt we'll see it happen until AMD and Intel both completely stop at advancements.


I was in a startup trying to commercialise async technology back in 2002; we wound it back to just doing better clock gating and eventually sold out to Cadence.

It's not a silver bullet. It gives you maybe 30% less power? Gate leakage has been creeping upwards too, since there's a direct speed/leakage tradeoff.


Interesting.

There have been a few advances that limit gate leakage (primarily FinFETs, that I'm aware of). But it is still there.

I agree though, not a silver bullet. It would buy one generation of power gains and performance.

Marketing wouldn't like it either because clock speed has so often been used to sell CPUs.

I'm also not sure how much async modern CPUs could adopt incrementally, rather than through a ground-up redesign, and whether that would get them close to the same power gains. Modern CPUs already have impressive latencies for most instructions.

The real gains, though, are somewhat unknowable. If I were to guess, the first place we'll see an async CPU will be mobile. After that happens, we should have a much clearer picture of the real gains it grants.


Twenty years ago my supervisor had one of these: https://en.wikipedia.org/wiki/AMULET_microprocessor

There was apparently (I didn't see it) a neat demo where you could run a live benchmark and spray freeze spray on the processor, which would cause it to speed up, since the propagation delay was inherently dependent on gate temperature.

I don't expect to see it commercialised any time soon. Too much retraining and retooling required.


> One very notable feature due to the asynchronous design is the drop of power dissipation to 3 μW when not in use

That's impressively low power!

I wasn't aware that AMULET was a thing. Neat to see that someone put the effort into making an async CPU.

I had heard from one of my professors that Intel tried the same thing with a Pentium 1 and ultimately gave up due to poor tooling. (I don't know the exact timeframe of this, but I believe it was around the P2 or P3.)


Sort of - today's synchronous low power MCUs can achieve better. First Google hit for me: https://www.st.com/en/microcontrollers-microprocessors/stm32...

...gives me 0.34 µA for a 32-bit Cortex-M4.
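Comparing a current with a power figure needs a supply voltage. Assuming 3.0 V (the test conditions aren't given here, so this is an assumption):

```latex
P = I \cdot V \approx 0.34\,\mu\text{A} \times 3.0\,\text{V} \approx 1.0\,\mu\text{W}
```

That is about a third of AMULET's quoted 3 μW, though the two figures come from very different process generations and likely different measurement conditions, so the comparison is rough.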


The wiki says they are ARM-based. Doesn't this mean all you need to do is to recompile?


Yes. When I said "Too much retraining and retooling required", I meant on the IC design side. An AMULET user would see nothing unusual about the processor apart from uneven execution speed.

(If you're worried about side-channel attacks, you definitely don't want asynchronous technology as it's going to leak data-dependent timing information!)
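To see why, consider a self-timed adder with completion detection: it signals "done" once its carries have settled, so its latency tracks the longest carry-propagation chain, which depends on the operands. The toy model below uses hypothetical names and assumes delay is proportional to chain length, which real circuits only approximate.

```python
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of consecutive bit positions that produce a carry-out when adding a + b."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Full-adder carry: generated (x AND y) or propagated (carry-in AND (x XOR y)).
        carry = (x & y) | (carry & (x ^ y))
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest


def toy_add_latency(a: int, b: int, base: float = 1.0, per_bit: float = 0.25) -> float:
    """Hypothetical completion time: fixed overhead plus a cost per bit the carry ripples through."""
    return base + per_bit * longest_carry_chain(a, b)


print(toy_add_latency(1, 1))           # 1.25 (short carry chain)
print(toy_add_latency(0xFFFFFFFF, 1))  # 9.0 (carry ripples through all 32 bits)
```

Someone who can time these additions learns something about the operands; a clocked adder, by contrast, always takes one clock period regardless of the data.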


This is a very interesting perspective. The evident benefit of async helps us see a world/civilization where indeed computing is pervasive to a much, much deeper/higher degree.

Positive note: it could certainly help take silicon-based electronics further in a resource-starved world; it could/should also simply be part of the paradigm of the next thing if it comes soon enough — photo-hype, buzz-ristors, whatever tech wins.

(Thanks for an uplifting glimpse at the "TO DO" list of humanity, and one more spark of interest as a programmer!)


This actually seems like it would be somewhat ideal for many web servers, where in a large number of cases the only interesting things going on are in response to events.



These kinds of questions have an essential unknowability to them, in that every phenomenon you try to model through computing is an inexact approximation - whether it's something like the precision of a mathematical computation or "what is this human being's real name".

What we have done to date is the low-hanging fruit: take known categories of application and juice them up by applying CPU power. And for that, the amount of power you need is "enough to have an effect". The lion's share of the benefits of putting computers in the loop was realized in the '70s or '80s, even though the measured amount of power that could be applied then was relatively minuscule. There were plenty of alternatives at various times that were "better" or "worse" in some technical sense, but succeeded or failed based on other market factors. The real story going forward from then has been changes in I/O interfaces, connectivity and portability; computing has merely "kept up with" the demands imposed by higher-definition displays, networking, etc.

So then we have to ask, what are the remaining apps? If we can engineer a new piece of hardware that addresses those, it'll see some adoption. And that's the thrust of going parallel: It'll potentially hit more categories at once than trying to do custom fast silicon for single tasks.

But the long-term trend is likely to be one of consolidating the bottom end of the market with better domain solutions. These solutions can be software-first, hardware-later, because the software now has the breathing room to define the solution space and work with the problem domain abstractly, instead of being totally beholden to "worse-is-better" market forces.


What an intelligent bird's-eye view of the problem. Thank you. I love the perspective, you kind of blend the evolution of simple bits at the hardware level (increasingly lots of them, but nonetheless "known category of applications") with complex high-level software space. The clarity, looking forward, is greatly improved through this lens.

That last paragraph confirms my own vision for the next cycle, the next decade or so.

About "remaining apps": the current cycle of ML, post-parallelism (GPUs, DL, etc. since the early 2010s), is imho one such candidate for transformative computing, where, like steam or hydrocarbons or electricity, computing lends itself to enabling a whole new category of 'machines', of tools. That's really interesting (just not the be-all and end-all some seem to think, but a whole new solution space to build upon).

I guess robotics is a fitting candidate as well (insofar as interacting with real physical objects changes everything) but we are many years away from commercially-viable solutions, afaik.

I also like to think there are (potentially transformative) social or behavioral use of compute-enabled machines that we haven't scratched much yet. Areas of life/civilization generally too complex to be brute-forced or even fully modeled, like biology and health, or the more advanced social behaviors (topics best described by Shakespeare, Tocqueville, Stephen Covey, Robert Greene...); or things we 'just' need to brute-force e.g. life-like graphics/VR or seamless/ambient/'augmented' computing/reality. Some of these may be in for the taking during the next cycle or two.


Single thread performance is still the limiting factor for things like perceived web browser performance. Raw MIPS is available in absurd amounts, but things like memory round trip delay become limiting.

The lower energy limit is https://en.m.wikipedia.org/wiki/Landauer%27s_principle but we're a long way from that.
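Worked out at room temperature (T = 300 K is an assumed value), Landauer's bound for erasing one bit is:

```latex
E_{\min} = k_B T \ln 2 \approx (1.38 \times 10^{-23}\,\text{J/K})(300\,\text{K})(0.693) \approx 2.9 \times 10^{-21}\,\text{J} \approx 0.018\,\text{eV}
```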


> memory round trip delay

Is that where Zen's architecture, with memory controller central to the chip (close to memory IO up/down) and 'chiplets' containing cores left and right, makes so much sense?

I'm partial to announcements of "3D" stacking chips; wherein memory cells, controller and CPU cores are arranged like cake layers, thus with extremely short memory round trips. A clever design could entirely remove the need for L3 cache I suppose, make RAM behave essentially as such. It's elegant I think, somewhat balanced in design, essential. And I find it crazy and awesome that cooling those is even possible.
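One standard way to reason about this is average memory access time (AMAT). With purely illustrative numbers (not measurements of Zen or of any stacked design): a 1 ns hit time, a 5% miss rate and an 80 ns miss penalty, versus the same hierarchy with the penalty halved to 40 ns by shortening the round trip:

```latex
\text{AMAT} = t_{\text{hit}} + m \cdot t_{\text{miss}}
\qquad
1 + 0.05 \times 80 = 5\,\text{ns}
\qquad
1 + 0.05 \times 40 = 3\,\text{ns}
```

Because the miss term dominates, shortening the trip to memory pays off even when the cache itself is unchanged.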


I don't think this is the case. We have seen MIPS and ARM appear, and all sorts of ISA extensions (like NEON). Amazon even offers non-x86 machines now!

The old mantra "without software no one will buy your hardware" is no longer generally true. There are lots of programmers nowadays, and as long as your new chip can show improvement in one area, someone may use it. We've even had chips which execute Java bytecode directly!


Absolutely not.

“Worse is better” applies on all levels: microarchitectures, ISAs, operating systems, applications, languages, computing paradigms, etc. Once Moore’s law is truly over, we will be forced to make some real progress in areas other than shrinking silicon features.


That was kind of my intuition, from what little I know about these things (nerdy culture, but tangential and second-hand at best).

Tangential remark, however: I hope there's room for post-silicon / post-electronics technology. We are far from some of the more advanced use-cases of computing (currently "sci-fi" but hard/real science, just hard problems in technology as well), in ways that no cleverness or cheat (e.g. qubits) could substitute for actual processing capability. I don't know how many orders of magnitude silicon still has to go, but my impression is that it's relatively limited physically, compared to what could be. But that's a topic for the 2030s and later, probably, hopefully. Here's to the quest for Landauer's limit!


There's a nice quote on this topic from David May, ISA designer of the Transputer, at the end of this article (available free if you sign up - irritating pay wall): https://www.telegraph.co.uk/technology/2019/12/06/end-moores...


No signup needed, just change your user agent to "Googlebot" :)

>“Moore’s ‘law’ came to an end over 20 years ago,” says David May, professor of Computer Science at Bristol University and lead architect of the influential "transputer" chip. “Only the massive growth of the PC market then the smartphone market made it possible to invest enough to sustain it.

>“There’s now an opportunity for new approaches both to software and to processor architecture. And there are plenty of ideas – some of them have been waiting for 25 years. This presents a great opportunity for innovators – and for investors.”



I feel the same. I long for a stagnation in processing tech so that we can go back to weird assembly tricks being commonplace and necessary to keep up. But GaN or diamond will probably step in and prevent that.


Mill CPUs are a potential alternative for the current status quo.

What is going on with them? Curious about how they'll turn out. Hopefully they'll be released one day.

https://en.wikipedia.org/wiki/Mill_architecture


Their compiler isn't ready and they haven't started working on an FPGA implementation yet. I personally expect they'll never launch a product.


It's not the class of materials but the class of problems.

"Good enough for real world existing programs" and "best possible for a specific problem" produce very different designs.


Could you please elaborate? I'm pretty sure I only get 10% at best of the implications of this. I understand crystal clear what you mean — very much agree.

FWIW I can't stress enough that this is just a question / thought experiment, whose point is probably to explain why we could do better but don't, and with good reason (boiling down to "not worth it, at least yet, possibly ever" I suppose, but the details are the meaty part for me).


Related: four-phase logic[0] and its red-haired, CMOS step-child, domino logic[1].

0. https://en.wikipedia.org/wiki/Four-phase_logic

1. https://en.wikipedia.org/wiki/Domino_logic


David May was my computer architecture professor.


David May is my PhD supervisor and cofounder of our new microprocessor company - BeyondRISC. David's 3rd startup ;)


If you're interested in more academic Transputers take a look at the CPA (Communicating Process Architectures) conferences [0]. Just last year it featured two papers on Transputers.

[0] http://wotug.cs.unlv.edu/conference.php?id=46


The concurrent programming language occam, which ran on the transputer, can still be run today with occam-pi.

https://www.cs.kent.ac.uk/projects/ofa/kroc/
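occam programs are composed of sequential processes that share no memory and communicate only over channels (`c ! x` to send, `c ? x` to receive). A rough Python analogy, assuming threads for processes and `queue.Queue(maxsize=1)` for channels; real occam channels are unbuffered rendezvous, so this only approximates them, and the function names and `None` end-of-stream marker are hypothetical, not occam or KRoC conventions.

```python
import queue
import threading

END = None  # hypothetical end-of-stream marker


def producer(out_ch, n):
    """Send 0..n-1, then the end marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(END)


def squarer(in_ch, out_ch):
    """Read values, square them, pass them on."""
    while (v := in_ch.get()) is not END:
        out_ch.put(v * v)
    out_ch.put(END)


def consumer(in_ch, results):
    """Collect everything until the end marker."""
    while (v := in_ch.get()) is not END:
        results.append(v)


def run_pipeline(n):
    a, b = queue.Queue(maxsize=1), queue.Queue(maxsize=1)
    results = []
    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for p in procs:  # roughly occam's PAR: start all processes...
        p.start()
    for p in procs:  # ...and wait for all of them to finish
        p.join()
    return results


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```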


I look forward to running KRoC on Wasm, the http://www.transterpreter.org/ is an inspiration.


Nice to see that there's still Transputer work going on at Kent - I studied Occam briefly there in the late 80s/early 90s and was allowed to look at (not touch!) their Meiko Computing Surface rendering fractal landscapes in real time.


I remember seeing these advertised ubiquitously in computer magazines in the late 1980s. Was always a bit curious about them. This Wikipedia link 30 years later is the closest I ever came to interacting with one myself.


It sounds a liiitle bit like GreenArrays? http://www.greenarraychips.com/

Except with multiple chips, not many cores like GreenArrays.

Similar ideas seem to pop up often and not really catch on too much.


Green Arrays suffer from the issue of low compute power. They target the extreme low-power market using a proprietary Forth dialect. It is difficult to benchmark how they perform against existing general-purpose ARM/x86-64/RISC-V or even NVIDIA chips, beyond their claims of efficiency gains due to a parallel architecture. Most of their existing press (especially on HN) is because their founder is the inventor of Forth and because everything is allegedly built on a single bootstrapped end-to-end Forth system with its own EDA toolkit. It would be interesting if somebody did a rigorous test against e.g. Parallella chips.


I so wanted an Atari ABAQ back in the late '80s.


Are all current computers essentially transputer?



