We’re on the vendor display floor walking around, talking to folks, watching demos. A guy has an expansion board in his desktop computer. The cover is off and he’s pointing out the 2 transputer chips on it. Meanwhile the display screen is filling in a familiar drawing of a fractal. I ask how long it took them to render the drawing. “Oh, no, it’s doing that work now.” A cute little board readily outstripping our refrigerator-sized machine back home.
I ended up talking with David May. The thing he seemed most proud of at the moment was their floating point work. They had written a description of their new floating point unit in Occam. They had tested and debugged it. They had formally proved that the implementation met a standard (IEEE 754?). And then they had built the silicon.
I returned home with a set of inmos manuals. Never could get anyone back at uni to see the potential. Sigh. Still, in that moment I knew I was truly in the presence of the state of the art.
There seems to be a pretty active forum here: https://www.xcore.com
The XMOS processors share many of the same architectural properties as Transputers.
And you can include regular h-files to easily link to and build regular C. With the provided macros you can have h-files which use some XC features but are still compatible with both XC and C files (making it easy to wrap XC code and call it from C and vice versa).
There is also this that can use XC features from C. https://github.com/xmos/lib_xcore_c
But I don't find XC to be bad, and likely a much better starting point than the library above.
The compiler is a fork of gcc from 2006 (if I remember correctly), and that shows its age somewhat. The many protections in XC can be a bit tedious as well.
Of course this is just another of my unfinished hobby projects, so :-)
Is your codebase uncompilable using their tools? Would a simple hello world compile? How foreign does a simple blinky program look to a C programmer?
So sure, they weren't true dataflow processors right off the bat but you could make them look like a pretty good simulation of one on the outside, even if you didn't have access to a crossbar switch (which would be cheating) and absent the real thing that was as close as I was going to get at the time.
I'll update the link soon, as I've added several new features to the language.
The comment you directly replied to, yes, but it was itself responding to a comment saying "they are the closest to a dataflow... processor" which seems to be referring to dataflow architecture?
Is PS3 with several Cells somehow reminiscent of a transputer?
Fun times :)
In hindsight it is a much better way to encode signs for both integers and floating-point numbers.
For this result to be true in practice, we would need evidence that ternary hardware can be manufactured and operated at the scale of binary hardware, e.g. clock speed, error rates and power draw. Everything I have heard so far points to the conclusion that binary is more efficient once you account for the physical properties of the substrate.
I'm sceptical of claims in the realm of physics on the basis of pure math without empirical confirmation.
Is there any modern research into manufacturing ternary chips?
Bitwise operations correspond to the finite field GF(2); while a GF(3) exists, it's not nearly as interesting or useful.
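Concretely, GF(2) arithmetic is exactly what the bitwise operators already give you: addition is XOR and multiplication is AND. A minimal Python sketch (function names are mine):

```python
# In GF(2) the elements are {0, 1}; addition is mod 2 (XOR), multiplication is AND.
def gf2_add(a: int, b: int) -> int:
    return a ^ b

def gf2_mul(a: int, b: int) -> int:
    return a & b

# The field axioms hold, e.g. every element is its own additive inverse,
# and both operations agree with arithmetic mod 2.
for a in (0, 1):
    assert gf2_add(a, a) == 0
    for b in (0, 1):
        assert gf2_add(a, b) == (a + b) % 2
        assert gf2_mul(a, b) == (a * b) % 2
```

This is why hardware adders without carry, parity checks, and CRCs all fall out of ordinary logic gates.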
Balanced ternary (plain ternary doesn't) has the advantage that the sign is trivially included in the number, so there is no need for one's/two's complement or signed/unsigned extension.
This is partially a lie, as balanced ternary digits can also be read as normal ternary, giving only non-negative integers... Still, it is a more natural encoding of arithmetic.
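A minimal sketch of the encoding (function names are mine): with digits in {-1, 0, 1}, negation is just flipping every digit, so no separate sign machinery is needed.

```python
def to_balanced_ternary(n: int) -> list[int]:
    """Encode an integer as balanced-ternary digits in {-1, 0, 1}, least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        r = n % 3
        if r == 2:
            r = -1          # a remainder of 2 becomes -1 with a carry into the next place
        n = (n - r) // 3
        digits.append(r)
    return digits

def from_balanced_ternary(digits: list[int]) -> int:
    return sum(d * 3**i for i, d in enumerate(digits))

# Negation is digit-wise: the sign is built into the number itself.
d = to_balanced_ternary(5)                       # [-1, -1, 1], i.e. -1 - 3 + 9 = 5
assert from_balanced_ternary([-x for x in d]) == -5
```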
Why is that useful?
> not nearly as interesting or useful.
Why is that important in computing?
       .
      / \
     0   1
    / \ / \
   0  1 0 (1)

= 011 = 3 (in decimal)
b) A 6-digit numeral roughly corresponds to a tree of depth 6.
c) Base 10 corresponds to a tree with 10 possible children for each node.
d) e is the most efficient multiplier when trying to achieve compound growth in the fewest iterations of multiplication.
> Also, why does Euler's constant appear all over the place?
e is special because e^x is its own derivative. It also acts as a "bridge" between addition and multiplication. It often appears where growth or trees are involved.
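A quick numerical sanity check of the "own derivative" property, using nothing but a central difference:

```python
import math

h = 1e-6
for x in (0.0, 1.0, 2.5):
    # central-difference approximation of d(e^x)/dx at x
    deriv = (math.exp(x + h) - math.exp(x - h)) / (2 * h)
    # for e (and only e as the base) the derivative matches the function itself
    assert abs(deriv - math.exp(x)) < 1e-4
```

For any other base b, the same check would come out scaled by ln(b) instead.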
Another comment mentioned "radix economy", which, together with your description helped me understand (generally) why Euler's constant is the most efficient base in terms of number of digits needed to express numbers.
What do you mean by efficient?
Any other literature you could reference? Cause this sounds interesting and I'd like to do some more research into this.
By derivatives, that's minimized when B satisfies

  0 = (log N) (1/(log B) + B (-1/(log B)^2) (1/B))
  0 = 1/(log B) - 1/(log B)^2
  1 = log B
  B = base of the logarithm.

This looks like you can pick any logarithmic base b you want, and so any B you want, but in fact the derivative I wrote assumes e is the base (hence the term "natural" logarithm). Other bases b would yield a scaling factor of ... e/b, since d(e^x)/dx = e^x and b^x = e^((ln b) x).
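The conclusion is easy to check numerically. Taking radix economy as B · log_B N (base size times digit count), a small sketch (the function name is my own):

```python
import math

def radix_economy(b: float, n: int) -> float:
    """Cost of representing n in base b: b * log_b(n) = b * ln(n) / ln(b)."""
    return b * math.log(n) / math.log(b)

N = 10**6
for b in (2, math.e, 3, 10):
    print(f"base {b:.3f}: economy {radix_economy(b, N):.2f}")
```

Running this shows e is the minimum, and integer base 3 edges out base 2, which is where the "ternary is theoretically optimal" claim comes from.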
You can chase "deeper" reasons for this all day long by digging deeper into the many definitions/properties of e and proving they are equivalent.
Playing with e is the most fun you can have in pre/calculus.
The intuition is that e is the optimal water level when limited water is distributed among a variable number of buckets, where all the filled buckets multiply each other. Alternatively, it's like how volume() is maximized where

  volume(x, y, z) = x * y * z
  const = x + y + z

That is just a natural consequence.
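A tiny numeric version of the bucket picture (the total S = 20 is an arbitrary choice of mine): split a fixed total into n equal parts, maximize the product of the parts, and the winning part size lands near e.

```python
import math

S = 20.0  # fixed "water" total, split into n equal buckets
# maximize the product of the bucket levels, (S/n)^n, over the bucket count n
best_n = max(range(1, 41), key=lambda n: (S / n) ** n)
part = S / best_n
print(best_n, round(part, 3))  # the optimal bucket level is close to e ~ 2.718
```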
Binary has so many useful properties that are relatively intuitive to understand. Binary is a uniquely convenient number system. Even day to day, base-two number systems are hyper-convenient.
One thing I can think of where balanced ternary has interesting properties is 2D space partitioning. Balanced ternary partitioning is an interesting tool; then again, you could just use modulo binary space partitioning (i.e. wrap the space around by half the width of a binary partition), which is almost as good. I wonder what the balanced ternary equivalent of an octree would be. I guess it would be a 27-tree, kinda cool, like the 3³ tree.
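For illustration only, here is what a 27-tree cell index might look like: a three-way split per axis, so each 3D node has 3^3 = 27 children indexed by balanced-ternary digits (the function name and layout are hypothetical):

```python
def bt_child(p, center, half):
    """Index of the child cell containing point p, splitting each axis in thirds.

    Each coordinate maps to -1 / 0 / +1, so a 3D node has 3**3 = 27 children,
    the balanced-ternary analogue of an octree's 2**3 = 8.
    """
    idx = []
    for x, c in zip(p, center):
        t = (x - c) / half                      # position within the cell, in [-1, 1]
        idx.append(-1 if t < -1/3 else (1 if t > 1/3 else 0))
    return tuple(idx)

print(bt_child((0.9, -0.2, 0.05), center=(0.0, 0.0, 0.0), half=1.0))  # (1, 0, 0)
```

A nice property: the index (0, 0, 0) means "straddles the middle", which an octree has no natural slot for.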
> used in the Soviet Union until 1970
So was a planned economy, and they still haven't recovered. The KGB sure have staying power though, must be the balanced ternary advantage! ;- )
It's not like balanced ternary has no positives, but you have to acknowledge that well before even decent relay computers were invented, binary was already much better explored than balanced ternary; obviously relay computers helped solidify that (though bi-quinary did not survive, popular as it was since Colossus).
Binary has many advantages, but more from a logical perspective than a numerical one.
I don't think balanced ternary computers are ever coming back. I think it is a nice lost technology :)
(And if I ever will find myself in the condition to start a civilization from nothing I will consider a balanced ternary/5/7 number system)
Signed arithmetic is elegant enough in binary, I guarantee you will fail to express even basic arithmetic concepts in balanced ternary to any lay person.
As for experts implementing them in a system, there are advantages to balanced forms, but good luck finding enough experts. The benefits probably start kicking in for very large integers, where the carry itself can take longer than you might like.
What specifically becomes more efficient about computation?
Very curious, I never heard of this before.
The Atari Transputer Workstation was a technological oddball of which there were quite a few around that time.
see also: https://en.wikipedia.org/wiki/Connection_Machine
The 80's were a wonderful time for computer hardware...
Seem to remember there was once a dedicated page with pictures; could be mistaken, but if there was, it's been deleted. Such is Wikipedia "progress".
I remember the Tramiels being a little flabbergasted by the Transputer announcement as well.
I still have the Helios reference manuals (OS for the ATW).
Now that Intel can no longer snuff out competition by being one fab generation ahead, we might have space for more interesting architectures.
2018 (a bit): https://news.ycombinator.com/item?id=18576037
Same Wikipedia page from 2018: https://news.ycombinator.com/item?id=16190102
We are starting to see languages that could leverage such an architecture:
Is there an algebraic logic strongly associated with what we conventionally think of as types (see https://cuelang.org/docs/concepts/logic/) with which we can effectively and efficiently compute unions and Boolean satisfiability?
He also made a loader that could copy a program to a whole network of transputer chips in seconds, instead of hours like theirs.
They hated his C compiler, and hated him for making it, and did their best to keep anybody from knowing about it. That was easier to do, before www.
But for 90% of the transputer group at the time it was very obvious that we needed alien language support. We had to spin up a compiler group, which took time. I seem to remember we also OEM'ed some compilers from outside vendors before the internal ones were ready (memory hazy and I didn't work directly on software there). I had direct contact with many customers, specifically discussing C language support and I don't at all remember anything like what you describe where 3rd party products were deliberately not mentioned.
Of course today we have golang which has much the same features as Occam and everybody thinks it is the best thing since sliced bread. And we have folks seriously re-writing dusty decks in Rust. But back then you just couldn't seriously sell a CPU that forced you to program in an unknown, fairly basic, programming language.
Also, internally we still used a lot of BCPL. So the company itself didn't eat its own dogfood in the way it expected customers to.
For a time around 1986-1987 I had several Transputer boards in a robot lab I worked in at Princeton University. I liked the INMOS sales pitch that someday you could have a transputer at every robot joint. Something similar is relatively easily accomplished now using various serial connections between microcontrollers -- even if it might still be easier to just run cables to the joints from boards elsewhere. That combination of Transputer boards (including several loaner boards from another lab on campus, which had got them for NSA-funded signal processing work) for a time gave me the most powerful computer on the Princeton campus in terms of total memory and CPU cycles -- even if the IBM 308X mainframe in the computer center no doubt had more total I/O capacity.
One other thing I liked about transputers was the (for the time) high speed serial links between them, which could also go to special chips to read or write parallel data. I used such chips to interface a network of transputers to the teach pendant of a Hitachi A4 industrial robot so they could essentially press the buttons on the pendant to drive the robot around. Interesting how modern USB in a way is essentially just a serial transputer link. A PU undergrad student I helped used that setup for a project to use sensing whiskers to "feel" where objects were (inspired by a robotics magazine article I had seen by a group in Australia who had done that for a different robot).
While transputers not supporting C was an issue early on, another issue was that Occam did not easily support recursion. Transputers at the start also did not have standard libraries of code like C had from Unix. Transputers were also expensive (as was the software development kit), and they were not easy to get in the USA, with various supply-chain issues and delays.
Occam was mind-expanding in its own way. I had previously networked a couple of Commodore VICs and a Commodore 64 for a robot project (via modems to an IBM mainframe) for my undergraduate senior project at Princeton a couple of years earlier using assembler, BASIC, and C -- so the transputers with Occam were a big step up from that conceptually. Still, practically I had gotten the Commodore equipment plus a parallel code simulator (a VM in C) I wrote for the IBM mainframe running under VMUTS to do more interesting stuff -- perhaps because I had years of experience programming in those other languages by then whereas Occam was so different and I did not spend that much time with it before leaving that job. Even without Occam and transputers, that robot lab job was the best paying job working for someone else I ever would have in many ways as far as autonomy, mastery, and purpose -- but I did not appreciate it then as it was my first real job beyond college and I figured the grass would be greener somewhere else -- not realizing that grass tends to be greener where you water it. Thanks for creating such a great working situation for me, Alain!
I have a vague sense that there's a rich vein of potential projects and maybe businesses in "before their time" ideas in the world of computing.
Thanks a lot for the pointer. This may spark funny and uncanny world-domination ideas! ( :
A serious hint/note for the "futurologists" / innovation-driven folks out there — my people: look for what was true back then but is no longer, especially in the form of limitations, roadblocks, or axioms (hard limits) of a design. Remove that piece and see what you get...
On-chip PLL clock generator
Automatic power on reset
On-chip programmable DRAM controller
On-chip I/O DMA
Concurrency and parallelism in programming languages
«It means that no more than three of the SPE's can talk to the memory controller at one time using full capacity. Four SPE's will fill the bus, and the CPU controller will not be able to access the memory at all. What you can do is have all the SPE's work at the same time, but using nowhere near the capacity they each have. Basically the PS3 gets punished for performing well, and tweaking Cell for better results in the future seems to be a nightmare if not impossible.»
Charlie Demerjian did a good exposé on this for the Inquirer, but the page has been taken down. Here is a blog post that references it, though:
The memory read speeds for the SPEs (Synergistic Processing Elements, or "APUs" these days) were horrible: about three orders of magnitude slower than they were meant to be.
This is why the chip basically died. Memory read speeds for the SPUs were impossibly slow, meaning that they were crippled into uselessness, AIUI.
Honestly I think we're more held up by programming languages that make concurrent programming hard than by the underlying architecture. It's not that hard to compile an existing C++ or Java codebase to a new ISA. It's hard to rewrite it in a language that scales well to thousands of threads. And that language doesn't even exist. The closest we have is whatever Nvidia is calling C++ on CUDA and that sucks.
I just have no idea "how good it can get", compared to what we have now. Is there an alternative world where CPUs vastly outperform ours in performance/power using the same materials? But more importantly how much can "vastly" be?
And, by a lot.
A huge portion of the power a CPU consumes is simply due to the clock running which is why modern CPUs go through tremendous efforts to disable circuits not in use and lower the clock speed when the CPU is idle.
However, imagine if we didn't have a clock at all. If, instead, the only thing that caused CPU transistors to switch was new instructions being executed.
In that world, the only power a CPU would consume would be gate leakage plus whatever it spends actually doing stuff. That would translate into extremely low power draw.
Performance would also be way up. Data would mainly be constrained by switching speed, which can be super fast. Today, data speed is mostly constrained by how big the largest section of the pipeline is and how fast the clock moves.
So why don't we do this today? Mostly because the entire industry is built around synchronous (clocked) CPU design. Switching to an async design would be both super difficult (we lack the tools for it) and very different from the way things work now. Just like parallel programming is very hard, async circuit design requires a large amount of verification capability that we just don't have. Further, HDLs are simply not well built in general, and especially not well built for async.
Async circuitry usually requires more transistors and more lines. That, however, isn't really a problem anymore. Today, the vast majority of transistors in a modern CPU are spent not on logic, but on the cache.
It'd be super expensive to adopt. It would be totally worth it. But I doubt we'll see it happen until AMD and Intel both completely stop at advancements.
It's not a silver bullet. It gives you maybe 30% less power? Gate leakage has been creeping upwards too, since there's a direct speed/leakage tradeoff.
There's been a few advances that have limited gate leakage (primarily finfets that I'm aware of). But it is still there.
I agree though, not a silver bullet. It would buy one generation of power gains and performance.
Marketing wouldn't like it either because clock speed has so often been used to sell CPUs.
I'm also not sure how much modern CPUs could incrementally add async rather than needing a ground-up redesign, and whether that would get them close to the same power gains. Already, modern CPUs have impressive latencies for most instructions.
Real gains, though, are somewhat unknowable. If I were to guess, the first place we'll see an async CPU will be mobile. After that happens, we should have a much clearer picture of the real gains it grants.
There was apparently (I didn't see it) a neat demo where you could run a live benchmark and spray freeze spray on the processor, which would cause it to speed up, since the propagation delay was inherently dependent on gate temperature.
I don't expect to see it commercialised any time soon. Too much retraining and retooling required.
That's impressively low power!
I wasn't aware that AMULET was a thing. Neat to see that someone put the effort into making an async CPU.
I had heard from one of my professors that Intel tried the same thing with a Pentium 1 and ultimately gave up due to poor tooling. (I don't know the exact timeframe of this, but I believe it was around the P2 or P3.)
.. gives me 0.34 uA for a 32-bit Cortex-M4.
(If you're worried about side-channel attacks, you definitely don't want asynchronous technology as it's going to leak data-dependent timing information!)
Positive note: it could certainly help take silicon-based electronics further in a resource-starved world; it could/should also simply be part of the paradigm of the next thing if it comes soon enough — photo-hype, buzz-ristors, whatever tech wins.
(Thanks for an uplifting glimpse at the "TO DO" list of humanity, and one more spark of interest as a programmer!)
What we have done to date is the low-hanging fruit: take known categories of application and juice them up by applying CPU power. And for that, the amount of power you need is "enough to have an effect". The lion's share of the benefits of putting computers in the loop were realized in the 70's or 80's, even though the measured amount of power that could be applied then was relatively minuscule. There were plenty of alternatives at various times that were "better" or "worse" in some technical sense, but succeeded or failed based on other market factors. The real story going forward from then has been changes in I/O interfaces, connectivity and portability - computing has merely "kept up with" the demands imposed by having higher definition displays, networking, etc.
So then we have to ask, what are the remaining apps? If we can engineer a new piece of hardware that addresses those, it'll see some adoption. And that's the thrust of going parallel: It'll potentially hit more categories at once than trying to do custom fast silicon for single tasks.
But the long-term trend is likely to be one of consolidating the bottom end of the market with better domain solutions. These solutions can be software-first, hardware-later, because the software now has the breathing room to define the solution space and work with the problem domain abstractly, instead of being totally beholden to "worse-is-better" market forces.
That last paragraph confirms my own vision for the next cycle, the next decade or so.
About "remaining apps": the current cycle of ML, post-parallism (GPUs, DL, etc. since early 2010s) is imho one such candidate for transformative computing, where like vapor or hydrocarbons or electricity, computing lends itself to enabling a whole new category of 'machines', of tools. That's really interesting (just not the end-all be-all some seem to think, but a whole new solution space to build upon).
I guess robotics is a fitting candidate as well (insofar as interacting with real physical objects changes everything) but we are many years away from commercially-viable solutions, afaik.
I also like to think there are (potentially transformative) social or behavioral use of compute-enabled machines that we haven't scratched much yet. Areas of life/civilization generally too complex to be brute-forced or even fully modeled, like biology and health, or the more advanced social behaviors (topics best described by Shakespeare, Tocqueville, Stephen Covey, Robert Greene...); or things we 'just' need to brute-force e.g. life-like graphics/VR or seamless/ambient/'augmented' computing/reality. Some of these may be in for the taking during the next cycle or two.
The lower energy limit is https://en.m.wikipedia.org/wiki/Landauer%27s_principle but we're a long way from that.
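For a sense of scale, the Landauer limit at room temperature works out to a few zeptojoules per bit erased; a one-liner from the constants:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact, SI 2019)
T = 300.0            # room temperature, K

# minimum energy required to erase one bit of information (Landauer's principle)
landauer = K_B * T * math.log(2)
print(f"{landauer:.2e} J per bit")  # roughly 2.9e-21 J, far below any real gate today
```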
Is that where Zen's architecture, with memory controller central to the chip (close to memory IO up/down) and 'chiplets' containing cores left and right, makes so much sense?
I'm partial to announcements of "3D" stacking chips; wherein memory cells, controller and CPU cores are arranged like cake layers, thus with extremely short memory round trips. A clever design could entirely remove the need for L3 cache I suppose, make RAM behave essentially as such. It's elegant I think, somewhat balanced in design, essential. And I find it crazy and awesome that cooling those is even possible.
The old mantra "without software no one will buy your hardware" is no longer generally true. There are lots of programmers nowadays, and as long as your new chip can show improvement in one area, someone may use it. We've even had chips which execute Java bytecode directly!
“Worse is better” applies on all levels: microarchitectures, ISAs, operating systems, applications, languages, computing paradigms, etc. Once Moore’s law is truly over, we will be forced to make some real progress in areas other than shrinking silicon features.
Tangential remark, however: I hope there's room for post-silicon / post-electronics technology. We are far from some of the more advanced use-cases of computing (currently "sci-fi" but hard/real science, just hard problems in technology as well), in ways where no cleverness nor cheat (e.g. qubits) could substitute for actual processing capability. I don't know how many orders of magnitude silicon still has to go, but my impression is that it's relatively limited physically, compared to what could be. But that's a topic for the 2030's and later, probably, hopefully. Here's to the quest for Landauer's limit!
>“Moore’s ‘law’ came to an end over 20 years ago,” says David May, professor of Computer Science at Bristol University and lead architect of the influential "transputer" chip. “Only the massive growth of the PC market then the smartphone market made it possible to invest enough to sustain it.
>“There’s now an opportunity for new approaches both to software and to processor architecture. And there are plenty of ideas – some of them have been waiting for 25 years. This presents a great opportunity for innovators – and for investors.”
What is going on with them? Curious about how they'll turn out. Hopefully they'll be released one day.
"Good enough for real world existing programs" and "best possible for a specific problem" produce very different designs.
FWIW I can't stress enough that this is just a question / thought experiment, whose point is probably to explain why we could do better but don't, and with good reason (boiling down to "not worth it, at least yet, possibly ever" I suppose, but the details are the meaty part for me).
Except with multiple chips, not many cores like GreenArrays.
Similar ideas seem to pop up often and not really catch on too much.