
Revisiting the Intel 432 (2008) - jsnell
http://dtrace.org/blogs/bmc/2008/07/18/revisiting-the-intel-432/
======
ChuckMcM
The Colwell paper really is excellent. And given the feature sizes of chips
today, it would be fascinating to see a 432 implemented as envisioned, rather
than as was possible given the transistor counts of the day. It was going to
be the microprocessor version of the Multics system, and much of what it
imagined doing in hardware (capabilities) would make for secure environments
that you could reason about more effectively. It would probably make for a
great FPGA project now.

~~~
willvarfar
CHERI.
[http://www.cl.cam.ac.uk/research/security/ctsrd/cheri/](http://www.cl.cam.ac.uk/research/security/ctsrd/cheri/)

~~~
ChuckMcM
Thank you! Those are awesome. Downloaded all the papers to my iPad for
perusal.

~~~
willvarfar
This is the newest (and my favourite) paper
[http://www.csl.sri.com/users/neumann/2015oak.pdf](http://www.csl.sri.com/users/neumann/2015oak.pdf)

Another chip with better support for privilege separation (though not using
capability-based addressing) is the Mill. (Disclosure: I'm on the Mill team.)

------
SixSigma
Since learning of it, I have always been disappointed that I was not able to
write and run software for a Burroughs machine running the MCP

[http://en.wikipedia.org/wiki/Burroughs_MCP](http://en.wikipedia.org/wiki/Burroughs_MCP)

"the first operating system to manage multiple processors, the first
commercial implementation of virtual memory, and the first OS written
exclusively in a high-level language."

using the Work Flow Language

[http://en.wikipedia.org/wiki/Work_Flow_Language](http://en.wikipedia.org/wiki/Work_Flow_Language)

Like the 432 and the success it could have been, I feel that the Burroughs
architecture was prematurely abandoned; at modern speeds it would have plenty
to offer.

~~~
skissane
Burroughs MCP still exists, so I am not sure it was really "prematurely
abandoned". Burroughs became (through a series of M&As) Unisys, and Unisys
still supports MCP and updates it with new versions. It runs on Unisys
ClearPath mainframes. Unisys has moved away from its distinctive physical
hardware to software emulation on an x86 platform; x86 has improved so
dramatically that, even with the emulation overhead, it is still faster than
the old physical mainframe CPUs.

There is also an emulator which runs an old (1970s) version of MCP, not the
current (2010s) version -
[http://www.phkimpel.us/B5500/](http://www.phkimpel.us/B5500/)

~~~
SixSigma
Thank you, your response was partly what I was hoping to get by saying it.

------
pavlov
This post could have its publishing year 2008 included in the HN title.

Seven years later, we have some perspective on the author's prediction:

"Indeed, like an apparition from beyond the grave, the Intel 432 story should
serve as a chilling warning to those working on transactional memory today."

Intel's transactional memory implementation TSX was famously broken in its
initial incarnations in Haswell/Broadwell. [1]

[1]
[http://en.wikipedia.org/wiki/Transactional_Synchronization_E...](http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions)

~~~
bcantrill
And a few months after that, the gloves really came off.[1]

[1] [http://dtrace.org/blogs/bmc/2008/11/03/concurrencys-shysters/](http://dtrace.org/blogs/bmc/2008/11/03/concurrencys-shysters/)

~~~
acqq
Thanks!

The link to the article in ACM Queue (October 24, 2008), which is a dead link
on that page at the moment, is now:

[http://queue.acm.org/detail.cfm?id=1454462](http://queue.acm.org/detail.cfm?id=1454462)

"Real-world Concurrency, Bryan Cantrill and Jeff Bonwick, Sun Microsystems"

It's sad that ACM is not able to redirect their old links properly.

The other article mentioned is probably:

[http://queue.acm.org/detail.cfm?id=1454466](http://queue.acm.org/detail.cfm?id=1454466)

"Software Transactional Memory: Why is it only a Research Toy?"

I'm actually searching for an analysis of Intel's implementation.

~~~
bcantrill
Argh -- my apologies for the dead links! I have updated all of them;
unfortunately the ACM seems unable to honor its old links.

------
EdSharkey
Are there any parallels worth observing between the Intel 432 history and the
Itanium history?

I was always enthusiastic about Itanium when it was announced and sad when it
didn't unseat the x86.

~~~
kps
I think of Itanium as the i860¹ redux. Andy Glew et al had some interesting
discussion on comp.sys.arch² about MPX³ as an emasculated descendant of a
capability system.

¹
[http://en.wikipedia.org/wiki/Intel_i860](http://en.wikipedia.org/wiki/Intel_i860)

² which this www is too small to contain

³
[http://en.wikipedia.org/wiki/Intel_MPX](http://en.wikipedia.org/wiki/Intel_MPX)

------
scott_s
I want to call this text out, because I nearly did a spit-take when I saw it,
and someone skimming the post may miss it:

 _The mortally wounded features included a data cache (!), an instruction
cache (!!) and registers (!!!). Yes, you read correctly: this machine had no
data cache, no instruction cache and no registers — it was exclusively memory-
memory._

No. Registers. I would love to know what discussions they had, and what
arguments were made, to come to that decision. That point alone has made me
re-open the paper to take a closer look.

~~~
spitfire
It was a stack machine. No user-visible registers.

One of the neat things about stack machines is that you can quietly add
registers to a machine in the background without having to recompile code.

I think this would have been a huge advantage over time: being able to add
registers and performance without having to recompile code. It took us decades
to add a few new registers to the x86 architecture.

Just imagine if every single (286, 386, 486, pentium, pII, etc) generation had
more registers and they were automatically used by software.

Pretty neat imho.
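A minimal sketch of why that works (made-up opcodes, nothing like the real 432
encoding): the program only ever names the stack, so an implementation is free
to keep the top few slots in however many physical registers it has, with no
recompile needed.

```python
# Toy stack machine with hypothetical opcodes. Note that no instruction
# names a register: an implementation could transparently cache the top
# N stack slots in hardware registers without changing this program.

def run(program):
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# (2 + 3) * 4, with no register names anywhere in the code
print(run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))
```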

IIRC it was also a tagged architecture, so they could define a generic
instruction set, fall back to software implementations for many ops, and add
hardware implementations at leisure.

Executive summary: they could have done it right first, then made it fast,
rather than making it fast first and trying to clean up the technical debt
later.

~~~
yuriks
>Just imagine if every single (286, 386, 486, pentium, pII, etc) generation
had more registers and they were automatically used by software.

This is, in fact, exactly what modern processors do. They have upwards of a
hundred registers internally.

~~~
twic
Although, because x86 has only a handful of architectural registers, it has to
spend quite a bit of area and heat working out how to make use of them all.

The SPARC's register window design was rather more elegant - provide 520
architectural registers, arranged in a stack, and shuffle between physical
registers and memory as needed:

[http://ieng9.ucsd.edu/~cs30x/sparcstack.html](http://ieng9.ucsd.edu/~cs30x/sparcstack.html)

Unfortunately, it seems it didn't actually work very well!
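The spill/fill behavior can be sketched as bookkeeping (a toy model, not the
actual SPARC mechanism, which traps to the OS and has its own window-pointer
registers): deep call chains push old windows out to memory, and deep return
chains pull them back in.

```python
# Toy model of register-window spill/fill. The register file holds a fixed
# number of windows; a call with no free window spills the oldest to memory,
# and a return past the resident windows fills one back from memory.

class WindowedFile:
    def __init__(self, num_windows=4):
        self.num_windows = num_windows
        self.resident = 1   # windows currently held in the register file
        self.spills = 0     # windows pushed out to memory
        self.fills = 0      # windows pulled back from memory

    def call(self):
        if self.resident == self.num_windows:  # overflow: spill oldest window
            self.spills += 1
        else:
            self.resident += 1

    def ret(self):
        if self.resident == 1:                 # underflow: fill from memory
            self.fills += 1
        else:
            self.resident -= 1

w = WindowedFile(num_windows=4)
for _ in range(10):   # a 10-deep call chain...
    w.call()
for _ in range(10):   # ...and the matching returns
    w.ret()
print(f"spills={w.spills} fills={w.fills}")
```

Recursion deeper than the window count degenerates into exactly the memory
traffic the windows were supposed to avoid, which is part of why they aged
badly.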

~~~
angersock
I had a professor in college bag on those, but he never really explained why.
Does anybody know?

~~~
FullyFunctional
Oh where to start.

* Primarily, compiler technology leapfrogged it to the point where you can do at _least_ as well with a fixed set of registers and a global allocator.

* The windows were inspired by SPUR (IIRC), which allowed much finer granularity, whereas SPARC's window is always exactly 24 registers (the 8 ins and 8 outs overlap with the adjacent windows, and the 8 globals sit outside the windows entirely).

* Windows turned out to be a real PITA for super scalar implementation.

* Windows assume a constrained model of computation, and make efficient tail recursion hard and co-routines impossible.

etc etc

Give me more time and I could make the list longer, but the crux is that it's
another example of a misguided, short-sighted optimization (like branch delay
slots, shared with many RISCs).

~~~
twic
I really liked the idea of branch delay slots too :(.

~~~
FullyFunctional
I can promise you wouldn't once you'd tried going beyond the simplest possible
single-issue pipeline. Thankfully, branch prediction made them sort of
pointless. RISC-V and Alpha are two of the better RISC ISAs in this world, and
they don't have them. Read the RISC-V ISA spec's footnotes [1] for excellent
design-decision rationales.

[1]
[http://riscv.org/download.html#tab_isaspec](http://riscv.org/download.html#tab_isaspec)
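For anyone who hasn't met them: a delay slot means the instruction after a
branch executes unconditionally, because early pipelines had already fetched
it by the time the branch resolved. A toy trace (illustrative, not any real
ISA) shows the quirk that deeper pipelines then have to preserve forever:

```python
# Toy execution trace with a one-instruction branch delay slot: the
# instruction immediately after a taken branch still executes before
# control transfers to the branch target.

def execute(program):
    """Return the order in which instruction indices execute."""
    trace, pc = [], 0
    while pc < len(program):
        op, arg = program[pc]
        trace.append(pc)
        if op == "branch_taken":
            trace.append(pc + 1)   # delay slot: next instruction runs anyway
            pc = arg               # only then does control reach the target
        else:
            pc += 1
    return trace

# index 0: branch to 3; index 1: delay slot; index 2: skipped; index 3: target
prog = [("branch_taken", 3), ("nop", None), ("nop", None), ("nop", None)]
print(execute(prog))   # [0, 1, 3]: the delay slot executed, instruction 2 did not
```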

------
orionblastar
[http://en.wikipedia.org/wiki/Intel_iAPX_432](http://en.wikipedia.org/wiki/Intel_iAPX_432)

Intel's first 32-bit processor. It was an attempt to get away from the 8008
and 8080, beginning life as the 8800 project, and was more like a
micro-mainframe in design: a stack machine with no registers.

------
zackmorris
_" every function was implemented in its own environment, meaning that every
function was in its own context, and that every function call was therefore a
context switch!. As Colwell explains, this software failing was the greatest
single inhibitor to performance, costing some 25-35 percent on the benchmarks
that he examined."_

It's too bad, because this is the future of computing. Or more precisely:
rather than having a "process" that owns child data and functions, future
processors will default to running every function in its own isolated
environment, with no shared state. There will be a more advanced
copy-on-write permissions model that determines what's shared (if anything),
more like shell processes or channels in languages like Go. To have that kind
of scalability (with respect to future goals like multiprocessing) and
security at only a 35% performance cost would certainly have been compelling,
especially in the 1970s.
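A sketch of that sharing model (purely illustrative, not any real hardware or
OS API): the caller hands out only an explicit read-only view, and the callee
copies before mutating, so nothing is shared by default.

```python
# Illustrative "no shared state by default" call: the caller exposes a
# read-only view of chosen state; the callee's mutations happen on its own
# private copy (copy-on-write in spirit).

from types import MappingProxyType

def isolated_call(fn, shared):
    """Invoke fn with a read-only view of the caller's chosen shared state."""
    return fn(MappingProxyType(shared))

def untrusted(view):
    env = dict(view)   # callee's private environment: a copy, not a reference
    env["x"] += 1      # mutations stay local to this call
    return env["x"]

state = {"x": 41}
print(isolated_call(untrusted, state))   # the callee computed with its copy
print(state["x"])                        # the caller's state is untouched
```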

It's too bad that they didn't understand the physical limitations that make
registers and caching basically a necessity, because that understanding might
have saved them. These things are only optimizations, so with today's
available chip area and place-and-route techniques, they could be wedged into
a conceptually correct architecture without much trouble.

In short, I would take these findings with a grain of salt and consider how
technology has advanced to the point where yesterday's blunders might be
tomorrow's breakthroughs.

------
acqq
Reading about 432, I didn't expect:

"Instructions are bit variable covering a range from 6 up to and beyond 300
bits in length using Huffman encoding (I, 171)."

Huffman-compressed instructions. Wow.

[http://www.brouhaha.com/~eric/retrocomputing/intel/iapx432/c...](http://www.brouhaha.com/~eric/retrocomputing/intel/iapx432/cs460/)
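To see how that works, here is a toy decoder over a prefix-free code
(illustrative opcodes and code lengths, nothing like the actual 432 encoding):
frequent operations get short codewords, rare ones long, and instructions
need not start on byte boundaries.

```python
# Toy Huffman-style instruction decoder. The code is prefix-free (no
# codeword is a prefix of another), so the bit stream can be decoded
# unambiguously even though instruction lengths vary.

CODES = {
    "0":   "ADD",    # most frequent op -> 1 bit
    "10":  "LOAD",
    "110": "STORE",
    "111": "CALL",   # rarest ops -> longest codes
}

def decode(bits):
    ops, cur = [], ""
    for b in bits:
        cur += b
        if cur in CODES:          # a complete codeword ends an instruction
            ops.append(CODES[cur])
            cur = ""
    return ops

print(decode("0100110111"))   # ['ADD', 'LOAD', 'ADD', 'STORE', 'CALL']
```

The downside, of course, is that the decoder must walk the stream bit by bit,
which is part of why this kind of encoding never caught on in hardware.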

