Inmos and the Transputer – Parallel Ventures (thechipletter.substack.com)
56 points by klelatti 9 months ago | 36 comments



I worked on the ParSys SuperNode in the early '90s, a 96-Transputer machine with a reconfigurable network switch that meant any 4-regular graph could be realised in hardware.

So the network of transputers could be anything you wanted ... you could configure it to suit your problem.

I learned Occam and wrote some programs for it, but one of my main tasks was to port the NAG serial Fortran library to the machine. If it was going to be used for scientific work, that was regarded as essential.

I remember one occasion when running an "ls" command on a directory not only crashed the machine but wiped the hard drive. The OS at the time was a home-grown Unix-alike, and the internal tables for "ls" were fixed size. The NAG library had so many files in a directory that the tables overflowed and system memory got overwritten. The machine was on the network, although it wasn't possible to log in remotely, so to use it you had to be in the room. It got to the point where when people saw me coming they backed up all their work across the network and logged off ... they knew it wouldn't be long before the machine would crash and need a hard restore.

The T800 series didn't have memory protection, and didn't have floating point, so using it for scientific calculations seemed doomed. Worse, every time someone brought me a program for parallelisation, I could refactor it and get a 100 to 10,000 times speed-up on a serial machine ... the parallel machine wasn't needed.

It was a beautifully conceived machine, and I desperately wanted it to succeed, but it ended up being nibbled to death by circumstances and never found the niche it needed.

I still occasionally thumb through some of the books I have. Possibly I should donate them to a computing museum.

Or sell them.


> ...every time someone brought me a program for parallelisation, I could refactor it and get a 100 to 10,000 times speed-up on a serial machine...

That is a good summary of my experience working in HPC. Many HPC codes had a strong "just throw hardware at it" vibe, with relatively little effort put into maximizing the performance of individual nodes. There were a couple of cases where the optimized code ran faster on my laptop than the original code did on the supercomputer. Expensive network interconnects have diminishing returns if you don't do the software design work.

Another recurring issue was that the process nodes for exotic silicon were usually a few generations behind the state of the art for commodity silicon, so even if I could get the exotic silicon to perform at its theoretical limits, the commodity silicon was sufficiently superior at the basics that, with good performance engineering, you lost most of the advantages of the exotic architecture in practice.


The T800 series did have floating point, and as I understand it, it was one of the most extensive uses of formal methods in processor design of its era http://www.transputer.net/tn/06/tn06.html

The rough guide to transputers: T2, 16-bit; T4, 32-bit; T8, FPU; T9, superscalar.


You're absolutely right, I was mistaken.

Now I'm trying to remember if our SuperNode had T4s or T8s. I might have to dig out my notes.


Please do. I am fascinated by computer history and finding info from people who worked with them is always welcome.

What books do you have? I might be interested in buying them from you :)



I did my Physics degree thesis on computing with Transputers.

As part of that I got 3 Transputers (T425 I think) and designed and built a board to house them, plus an interface to the BBC Micro which acted as the host.

We didn't have the budget for any of the software tools (like the Occam compiler), so the first thing I had to do was write an assembler.

Transputer instructions are all variable length, with parameters encoded as variable-length integers. So a jump a few bytes away would be a 1-byte instruction, but you'd need a 2-byte instruction to jump further away, then a 3-byte instruction, etc.
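
For the curious: this is the standard Inmos pfix/nfix prefixing, where each instruction byte holds a 4-bit function code and a 4-bit data nibble, and prefix bytes build up larger operands. A minimal Python sketch of the encoding (not the original BBC BASIC, obviously):

    PFIX, NFIX, J = 0x2, 0x6, 0x0   # function codes (high nibbles); J = direct jump

    def encode(fn, operand):
        # Each byte: function code in the high nibble, 4 data bits in the low.
        # Larger or negative operands are built up with pfix/nfix prefix bytes.
        if 0 <= operand <= 15:
            return [fn << 4 | operand]
        if operand > 15:
            return encode(PFIX, operand >> 4) + [fn << 4 | (operand & 0xF)]
        return encode(NFIX, (~operand) >> 4) + [fn << 4 | (operand & 0xF)]

    encode(J, 5)      # [0x05]             -- a short jump fits in 1 byte
    encode(J, 0x123)  # [0x21, 0x22, 0x03] -- a longer jump takes 3 bytes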

This meant that the assembler needed lots of passes, as it wasn't clear in advance how many bytes each jump instruction would take. In fact it could take up to 7 passes until the program converged into a stable state with each jump at its shortest length.
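
The convergence loop itself is simple: re-measure every jump from the current address estimates and repeat until nothing changes. A sketch under the same assumptions as the encoding above (the program representation and names are mine; only growing a jump guarantees termination, since lengthening one jump can lengthen the displacements of others):

    def operand_len(n):
        # bytes needed to encode operand n: one data nibble per byte,
        # extra bytes being pfix (positive) or nfix (negative) prefixes
        length = 1
        while not (0 <= n <= 15):
            n = (n >> 4) if n > 15 else ((~n) >> 4)
            length += 1
        return length

    def size_jumps(program):
        # program: list of ('bytes', n) for n bytes of ordinary code, or
        # ('jump', i) meaning "jump to the item at index i"
        size = [1 if kind == 'jump' else arg for kind, arg in program]
        changed = True
        while changed:
            addr, pos = [], 0                 # addresses from current sizes
            for s in size:
                addr.append(pos)
                pos += s
            changed = False
            for i, (kind, target) in enumerate(program):
                if kind == 'jump':
                    # displacement is relative to the end of the jump itself,
                    # so a jump's own length feeds back into its displacement
                    disp = addr[target] - (addr[i] + size[i])
                    n = operand_len(disp)
                    if n > size[i]:           # only ever grow: guarantees a fixpoint
                        size[i] = n
                        changed = True
        return size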

I wrote the assembler in BBC BASIC and due to the fact the BBC Micro had very little memory it worked on (floppy) disk rather than in memory. Floppy disk! This made it probably the slowest assembler in the world - my largest programs would take 10 minutes or more of disk chunking before they could be assembled.

Once I'd written the assembler I used it to write a boot loader which configured the Transputers and loaded itself around the whole network of Transputers. Transputers could boot each other so I only needed to write the boot loader to the one the BBC micro was connected to and it would run and copy itself to the other Transputers it found, figuring out the topology of the network as it went.
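
The propagation is essentially a flood fill. A toy Python simulation of the idea (the real loader discovered the wiring by probing its links; here it's given explicitly, and names like boot_flood are mine):

    from collections import deque

    def boot_flood(wiring, root):
        # wiring: node -> list of (link_no, neighbour) pairs; a stand-in for
        # the physical links the real loader probed rather than knew up front
        booted, topology, frontier = {root}, [], deque([root])
        while frontier:
            node = frontier.popleft()
            for link_no, neighbour in wiring[node]:
                topology.append((node, link_no, neighbour))
                if neighbour not in booted:   # neighbour boots from this link
                    booted.add(neighbour)
                    frontier.append(neighbour)
        return topology                       # every (node, link, node) edge seen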

Once the boot loader was loaded I could run the application. I did two: a Mandelbrot set generator (of course!) which displayed the images on the BBC micro, and a physics simulation of an Ising spin model (which was kind of the point of all of this). I did the simulation on the university's mainframe in Fortran too, and the Transputer cluster (I guess you'd call it that nowadays) was faster!

The Transputer was an amazing chip and so ahead of its time. I often imagine an alternate future where we have 1000s of small single-core chips with their own memory rather than the current SMP model. It would certainly be harder to program but I bet it would fly!


> This made it probably the slowest assembler in the world - my largest programs would take 10 minutes or more of disk chunking before they could be assembled.

I can beat that.

In the early 80s I was designing and programming 6502 based controllers for UPSs. The code filled an 8 kByte EPROM and took 45 minutes to assemble on an Apple II.


> I often imagine an alternate future where we have 1000s of small single-core chips with their own memory rather than the current SMP model.

It's not individual chips, but that's basically what GPUs or systolic processor arrays (Adapteva Epiphany) are like, in net effect. Every core gets some memory/scratchpad to work with, and there is a giant shared memory.


Here's the paper on the Adapteva Epiphany, btw; never thought about it in those exact terms, but it actually does seem like a bit of a spiritual successor to the Transputer in some ways:

https://www.parallella.org/docs/e5_1024core_soc.pdf

https://en.wikipedia.org/wiki/Zero_ASIC

The Parallella boards allow access to a small array (16 cores) in a Raspberry Pi-style form factor, although it's been a number of years, so it's probably fairly far behind things like (e.g.) the NVIDIA Orin.


Transputer (and Occam) designer David May founded XMOS in 2005, where the ideas of the Transputer are still alive in the xCore architecture they make and sell. Their SDK offers a programming language called "XC" which contains Occam features in a C skin:

https://handwiki.org/wiki/XC_(programming_language)


Feels like bare metal MPI.


I spent the summer of 1989 working on a Meiko Computing Surface, which was a parallel computer built from T800 Transputers. I learnt Occam and implemented a few parallel algorithms on it. As I recall, the code running on the Transputers did not have access to the file system, so getting data into and out of the mesh networks was difficult. It appeared at first glance that back propagation might be a good fit on the Transputer but I never did get it running. Instead, I used a Connection Machine CM2 to train my neural networks. However, getting time on the CM2 was hard, so I wound up just using a collection of IBM RS6000 workstations that IBM donated to my university.


The University of Kent worked on the occam-pi language, which extended the occam language used by the Transputer.

http://occam-pi.org/

I partially attribute my deep interest in parallelism (async, coroutines) as a hobby to this language, because I studied occam-pi at university.

Imagine a language in which every program is parallel and linearly scalable by default because of the construction of the language and the problem. That's my dream.


The main innovation of Occam, i.e. the only one of its features that did not exist in earlier programming languages, was what in Occam (1985) was named "replicated parallel".

This Occam "replicated parallel" was based on the paper "Communicating Sequential Processes" by C.A.R. Hoare (1978-08), where it was named "array of processes". Hoare was an important contributor to the definition of Occam.

The Occam "replicated parallel" (1985) is the same as the "PARALLEL DO" of OpenMP Fortran (1997-10) and the "parallel for" of OpenMP C and C++ (1998-10), i.e. the concurrent execution of many threads that share the same code.

This structure is essential for being able to use CPUs or GPUs with many cores, because if a programmer had to write one thousand different function definitions, or even just one thousand different function invocations, to fully occupy a processor that can execute one thousand threads simultaneously, that would be hopeless.
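
In modern terms the shape is something like this Python sketch (Occam wrote it as a replicated PAR, OpenMP writes it as a parallel for; the body function here is just a placeholder):

    from concurrent.futures import ThreadPoolExecutor

    def body(i):
        # every worker runs this same code; only the index differs
        return i * i

    # the "replicated parallel": many concurrent instances of one body
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(body, range(1000)))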

That is why "parallel for" is the main structure used by CUDA, OpenCL and the graphics shader languages, even if, for some weird reason, all these GPU-oriented programming languages use an obfuscated terminology in which many things have different names than in the rest of the computer-related literature, without any rational justification; they do not say "parallel for", but "kernels" or the like.


As part of my Physics with Computing degree at UKC (late 80s-early 90s) we learned Occam (pen & paper only!) and were allowed to see (but not touch) the Meiko Computing Surface drawing fractal landscapes in real time.


Did you use the pi-calculus-derived parts of the language at all? I only ever stuck to the classic CSP occam 2.


Hi everyone, author here. Just to note that this is just Part 1 of a multi-part series.

I’d love to cover some of the real-world applications of Transputers in Part 2, so please share anything you think would be of interest.


My final year project at University was building a real time simulation of the human peripheral hearing system on an array of T800s, written in Occam. The work continued without me as a PhD project.

My supervisors published a paper about it in Microprocessors and Microsystems https://www.sciencedirect.com/science/article/abs/pii/014193...

What I had in mind at the time (once Moore's law reduced the size and power demand) was customisable hearing aids - the sort of thing that can now be done with AirPods Pro.

For a certain generation of systems engineers, it seems like early exposure to parallel programming on Transputers helped them get a leg up in distributed system design, especially once multi-core systems came along.


My internship at Daimler-Benz used Inmos transputers in the vision system and control system for a completely autonomous vehicle project.

I've written a few comments about it before on HN:

https://news.ycombinator.com/item?id=8950421

https://news.ycombinator.com/item?id=10333126

https://news.ycombinator.com/item?id=18759446

https://driving.ca/auto-news/news/three-decades-ago-mercedes...


Did you work with Ernst Dickmanns or his students on self-driving cars at Daimler? They were (arguably) the first.


Yes. My direct boss was Andreas Kuehnle. Ernst Dickmanns was my boss's boss (or maybe my boss's boss's boss). I met him during one of our vehicle demonstrations and presented my final intern project review to a panel that included him.

He may have also been the higher-up boss that our system threw to the ground during a false-positive emergency braking event as described here: https://news.ycombinator.com/item?id=10333126


That's great - thanks for sharing!


This comment might be interesting: https://news.ycombinator.com/item?id=9808159


That’s really interesting- thanks so much.


Now that's a name I have not heard in a long time! I saw the Transputer at a trade show (Comdex, perhaps) circa 1986. I was very intrigued by it and the Occam language. It disappeared from the public eye but I always wondered what happened to it.


While the Transputer had very limited success, all modern server CPUs have inherited its way of structuring a CPU's interfaces.

At that time (around 1984), and also during the next quarter of a century, almost all CPUs had interfaces based on a universal bus, possibly shared with other processors in a multiprocessor system, like the Intel CPUs until as late as 2008.

The Transputer, on the other hand, had its interfaces partitioned into fast communication links to other processors (like today's inter-socket links), a local memory interface (like DDR5 today) and a local peripheral interface (like PCIe today).

After the disappearance of Transputers, this system architecture was revived in some late DEC Alpha CPUs; then AMD launched the Opteron in 2003, with the help of some members of the DEC Alpha team; and eventually, six years later, Intel followed the AMD model with Nehalem.

Now nobody makes server CPUs that do not have Transputer-like interfaces.


Very interesting, thanks. Thinking Machines was much in the news back then and that idea of multiple relatively simple but highly interconnected CPUs was "in the air."

I had gone to the trade show to look at computers for the research group where I was a student - physical science, not computer science. We ultimately bought something more conventional which was obsolete in about 18 months when DEC workstations started appearing. It wasn't until I was firmly midcareer that I realized our (actually my) limitations had to do with lack of imagination and not any intrinsic limitations of the PC/XT/AT machines of our time.


Some of the work was reused in the ST20 for embedded applications.


You can build a bridge from the Transputer to Linn, the golden-ear dream hi-fi company. The crazy, letter-K-obsessed Linn CPU (the Rekursiv) was even weirder, and aligned to Inmos. Apparently the prototype wound up chucked into a canal in a fit of rage.


I guess some people who worked on the Transputer later went on to design Graphcore's IPU? The architecture looks similar (and Bristol-based).


Like others here I met the Transputer at university (with Occam) and was amazed. Occam is not that much of an abstraction over the Transputer, as the chip does quite a lot in microcode, including not really having an execution stack but a tree formed of workspace pointers. This means the processor itself keeps track of multiple tasks, so when one is blocked (by message channel I/O, also microcoded) the next one in the list is jumped to. That fits very well with Occam, which prefers procedures to communicate via channels rather than returning values.

Those message channels could be local to that Transputer or go via a link to another one. From the code's point of view it looked the same; the difference was handled, again, in microcode.

The shocking thing is how simple this all seemed to be, with the exception of the implementation of ALT (the equivalent of select in Go: waiting on multiple channels).
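
For anyone who wants the flavour of it, here's a toy Python sketch of that scheduling model: generator processes yield channel operations to a round-robin scheduler, and one side of a rendezvous blocks until the other arrives. It illustrates the idea, not the actual microcode, and it assumes at most one process waits on a channel at a time:

    from collections import deque

    class Channel:
        def __init__(self):
            self.waiting = None   # the one blocked (process, value) pair, if any

    def run(processes):
        # Round-robin over generator processes, like the transputer's list of
        # workspace pointers. Each yield is ('send', ch, value) or ('recv', ch).
        ready = deque((p, None) for p in processes)
        while ready:
            proc, resume = ready.popleft()
            try:
                op = proc.send(resume)
            except StopIteration:
                continue
            if op[0] == 'send':
                _, ch, value = op
                if ch.waiting:                   # a receiver is already blocked
                    recv_proc, _ = ch.waiting
                    ch.waiting = None
                    ready.append((recv_proc, value))
                    ready.append((proc, None))
                else:
                    ch.waiting = (proc, value)   # block until a receiver turns up
            else:  # 'recv'
                _, ch = op
                if ch.waiting:                   # a sender is already blocked
                    send_proc, value = ch.waiting
                    ch.waiting = None
                    ready.append((proc, value))
                    ready.append((send_proc, None))
                else:
                    ch.waiting = (proc, None)    # block until a sender turns up

    def producer(out):
        for i in range(3):
            yield ('send', out, i)

    def consumer(inp):
        for _ in range(3):
            value = yield ('recv', inp)
            print('got', value)

    ch = Channel()
    run([producer(ch), consumer(ch)])   # got 0 / got 1 / got 2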


Now there is https://www.xmos.ai/

It's a fascinating architecture - the development board and toolchain are relatively affordable.


I had a uni project in 1990 where we mapped old-fashioned neural networks onto a 16-Transputer machine. That was fun.


Transputers seem really neat, but hardware is getting hard to find. Has anyone played with this emulator?

https://github.com/pahihu/t4


I remember reading about it in the German c't magazine back then and wanting one, but then never heard of it again.




