
Which Machines Do Computer Architects Admire? (2013) - dhotson
https://people.cs.clemson.edu/~mark/admired_designs.html
======
xscott
I'm not a computer architect (so my opinion shouldn't count in this thread),
but as someone who did a lot of numerical programming over the years, I really
thought Itanium looked super promising. The idea that you can indicate that a
whole batch of instructions can run in parallel seemed really scalable for FFTs and
linear algebra. Instead of more cores, give me more ALUs. I know "most"
software doesn't have enough work between branches to fill up that kind of
pipeline, but machine learning and signal processing can certainly use long
branchless basic blocks if you can fit them in icache.
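
Concretely, here's a minimal sketch in plain C (my illustration, not actual
Itanium code) of the kind of long, branchless basic block I mean: four
independent accumulator chains, so the multiply-adds in each iteration have no
dependencies on each other and a VLIW/EPIC compiler is free to issue them in
the same instruction group.

    #include <stddef.h>

    /* Four independent accumulators expose instruction-level
     * parallelism: the four multiply-adds per iteration can all be
     * in flight at once on a wide machine. */
    double dot(const double *restrict a, const double *restrict b, size_t n)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        size_t i;
        for (i = 0; i + 4 <= n; i += 4) {
            s0 += a[i + 0] * b[i + 0];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; i++)          /* scalar tail */
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }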

At the time, it seemed (to me at least) that it really only died because the
backwards compatibility mode was slow. (I think some of the current perception
of Itanium is revisionist history.) It's tough to say what it could've become
if AMD64 hadn't eaten its lunch by running precompiled software better. It
would've been interesting if Intel and compiler writers could've kept focus on
it.

Nowadays, it's obvious GPUs are the winners for horsepower, and it's telling
that we're willing to use new languages and strategies to get that win.
However, GPU programming really feels like you're locked outside of the box -
you shuffle the data back and forth to it. I like to imagine a C-like language
(analogous to CUDA) that would pump a lot of instructions to the "Explicitly
Parallel" architecture.

Now we're all stuck with the AMD64 ISA for our compatibility processor, and it
seems like another example where the computing world isn't as good as it
should be.

~~~
jcranmer
Itanium is essentially a VLIW architecture and... well, as the bottom of the
page mentions, VLIW architectures tend to turn out to be bad ideas in
practice.

GPUs showed two things: one, you can relegate kernels to accelerators instead
of having to maximize performance in the CPU core; and two, you can convince
people to rewrite their code, if the gains are sufficiently compelling.

~~~
cpr
Specifically, the promise of VLIWs (scalar parallelism) was overhyped--the
Multiflow compiler couldn't find enough scalar parallelism in practice to keep
all 28 ALUs busy (in the widest model we built), or even all 7 ALUs (in the
narrowest).

(I ended up running the OS group at Multiflow before bailing right before they
hit the wall.)

~~~
zozbot234
We need new programming models that make it easier to expose static
parallelism to the compiler. Doing it all in plain old C/C++, or even in
"managed" VM-based languages, cannot possibly work - and even conventional
multi-threading is way too coarse-grained by comparison to what's most likely
needed. Something based on a dataflow-oriented description of the code would
probably work well, and be possible to integrate well enough with modern
functional-like paradigms.
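
As a toy illustration of what exposing static parallelism can look like in
today's terms (my sketch; `restrict` is only a small step in this direction,
not the dataflow model itself): the two loops below compute the same thing,
but only the second tells the compiler the iterations are independent, so only
the second can be freely unrolled, vectorized, or statically scheduled.

    /* Without restrict, the compiler must assume out[] may overlap
     * in[] and be conservative about reordering. */
    void scale(float *out, const float *in, float k, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = k * in[i];
    }

    /* With restrict, every iteration is provably independent: the
     * static parallelism is visible to the compiler. */
    void scale_indep(float *restrict out, const float *restrict in,
                     float k, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = k * in[i];
    }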

~~~
Symmetry
VISC seems at least potentially promising.

[https://www.anandtech.com/show/10025/examining-soft-machines-architecture-visc-ipc/2](https://www.anandtech.com/show/10025/examining-soft-machines-architecture-visc-ipc/2)

~~~
cpr
Yes, but VLIW's premise was that all that coordination that the VISC
architecture is doing at runtime in hardware could be computed at compile-time
in software.

------
tachyonbeam
I think one of the most influential designs of recent times has been the DEC
Alpha lineage of 64-bit RISC processors[1]. Originally introduced in 1992,
with a superscalar design, branch prediction, instruction and data caches,
register renaming, speculative execution, etc. My understanding is that when
these came out, they were way ahead of any other CPU out there, both in terms
of innovative design and performance.

Looking at this chip, it seems to me that almost all the innovations Intel
brought to the Pentium line of CPUs over many years were basically
reimplementations of features pioneered by the DEC Alpha, arriving just over a
decade later to bring those innovations to consumer-grade CPUs.

[1]:
[https://en.wikipedia.org/wiki/DEC_Alpha](https://en.wikipedia.org/wiki/DEC_Alpha)

~~~
xscott
I loved working on DEC Alphas. They seemed to me like the best-of-breed
conventional 64-bit machines, and it was sad when we quit buying them because
x86 boxes were cheaper.

> it seems to me that almost all the innovations Intel brought to the Pentium
> lines of CPU over many years were basically reimplementing features
> pioneered by the DEC Alpha

I can't find a strong source to link, but I thought most of the Alpha team
ended up at Intel. If so, that would explain the trickling in of
re-implementations.

~~~
russler23
One of my CS professors used to work at DEC. His lectures were starkly clear
and pretty intense.

Jim Keller and other folks from PA Semi worked at DEC earlier in their
careers.

~~~
cbm-vic-20
I worked at DEC - one thing that I really miss is the high quality of
documentation they produced. A lot of it has been archived here:

[https://archive.org/details/bitsavers_dec](https://archive.org/details/bitsavers_dec)

------
kragen
There are some really great designers on the list, like Sophie Wilson and
Gordon Bell, but the list of admirable machines comes up really short — and is
missing a lot of really significant and admirable machines.

Maybe these are the machines _bad_ computer architects, like Alpert, admire.
Alpert is notable mostly for leading the computer industry's most expensive
and embarrassing failure, the Itanic (formally known as the Itanium), despite
the presence on his team of many of the world's best CPU designers, who had
just come from designing the HP-PA -- a niche CPU architecture nevertheless
so successful that HP's workstation competitors, such as NeXT, started using
it. Earlier in his career he sunk the already-struggling 32000, the machine
that by rights _should_ have been the 68000. (And maybe if they'd funded GCC
it could have been.)

What about the Tera MTA, with its massive hardware multithreading and its
packet-switched RAM, which was gorgeous and prefigured significant features of
the GPU explosion?

What about the DG Nova, with its bitslice ALU chips and horizontal-microcode
instructions? What about the MuP21, with its radical on-chip dual circular
stacks?

What about the HP 9100, with its dual stacks and PCB-inductance microcode,
where the instruction set was the _user interface_?

What about the LGP-30, which managed to deliver a usable von Neumann computer
with only 113 vacuum tubes (for amplification, inversion, and sequencing)?

What about the 26-bit ARM, with its conditional execution on every
instruction, and packing the program status register into the program counter
so it automatically gets restored by subroutine return, and, more importantly,
interrupt return?

What about Thumb-2 with its unequaled code density?

What about the CM-1? Anyone can see that AVX-512 (or for that matter modern
timing-attack-resistant AES implementations!) owes everything to the CM-1.

And the conspicuous omission of the Burroughs 5000 has already been noted by
others.

I mean, there are some good designs on the list! But it hardly seems like a
very comprehensive list of admirable designs.

~~~
sitkack
It sounds like they just went around the room and asked some folks to list off
some systems. I don't think a huge amount of thought was put into this.

I'd add the Tandem NonStop to my personal list. I don't know why I overlooked
the LGP-30 [1]; I'll have to find a schematic. 113 vacuum tubes is really
impressive, and I wonder if there is any overlap between this design and
System Hyper Pipelining [2]. Do you know of other architectures that use time
multiplexing to reduce part count?

What bit-serial computers do you like?

Ahh, it is the Story of Mel computer, awesome.

[1]
[https://en.wikipedia.org/wiki/LGP-30](https://en.wikipedia.org/wiki/LGP-30)

[2] [https://arxiv.org/abs/1508.07139](https://arxiv.org/abs/1508.07139)

~~~
kragen
NonStop is super interesting! HP has most of the old Tandem papers and manuals
online still, I think, and you can see how the software and hardware
co-evolved. It's mind-boggling the extent to which they designed the operating
system around transactions; with things like TIP, the Transaction Internet
Protocol, they did try to get wider adoption for that approach, but it's
largely been forgotten. A shame, since we spend so much of our time debugging
highly-concurrent distributed systems these days.

Delay-line and drum computers (like the LGP-30, the HP 9100, and the grandmama
of them all, the Pilot ACE) all had to do a sort of time multiplexing; the Tera
MTA I mentioned, as well as the CDC 6600's PPs (FEPs), worked that way too,
time-sharing a single ALU and control unit among many register sets. That's
also one of the things going on in modern GPUs, but it's hard to say it's done
to _reduce_ part count. Still, they'd need a lot more parts to do the same
thing if they didn't do it.

This CSR/SHP thing sounds really interesting! Thank you!

~~~
sitkack
I should have provided a link to the Tandem tech reports [1]; they make for
great reading, great for the Little Gray Cells. I do think hardware-supported
distributed transactions would make many problems go away. Fusing what was
once modular has unlocked lots of gains (ZFS), and with the rise of
hypervisors and abstract VMs, we are getting there in baby steps. Modularity
incurs a cost that is much higher than most people realize: if we see
something that _could_ be modular, we usually take it, but those decisions
force future decisions we aren't aware of. I think someday we will view it in
a similar light to OO.

I got to tour the Tera offices in Seattle in the late 90s, about all I
remember is that it was a torus and it used some finicky silicon process that
was leading to manufacturing delays. I was all into Beowulf and Mosix [2] at
the time using Alpha or x86, so I wasn't drawn to it that much.

[1]
[https://www.hpl.hp.com/hplabs/index/Tandem](https://www.hpl.hp.com/hplabs/index/Tandem)

[2] [https://en.wikipedia.org/wiki/MOSIX](https://en.wikipedia.org/wiki/MOSIX)

~~~
kragen
The other machines that I think of as being the siblings of the Tandem are the
Nova (a lovely instruction set crippled by its shitty OS), the HP 3000, and of
course the byte-addressed PDP-11. Despite their differences they all have a
very similar flavor, reflecting a CISCy Zeitgeist when minicomputers were just
beginning to cut their umbilical cords to PDP-10s and the like.

Tera’s first machine was bipolar (ECL I assume) and they finally squeezed out
a CMOS successor with a lot of assistance from their EDA vendor. Never knew
the story of why moving to CMOS was so urgent.

Amusingly, it was the Beowulf list where someone converted me to the Tera
religion (rgb I think). I was convinced that was the way all computers would
work soon. And, well, it's how GPUs work, kind of. But mostly I was wrong.

Not sure I agree about modularity. Galaxies, mammalian bodies, trees,
bacterial films, cars, books, and river systems are modular. It would be
surprising if we could make software non-modular. But we could make it only as
modular as a tree.

~~~
sitkack
I am happy you don't agree on modularity. I don't want to be correct, I want
to arrive at correct conclusions. :)

Composition is great; scale-free self-similarity is probably the basis for the
universe.

Modularity is a great _design_ technique; it can also make things weaker and
force other (unknowable) design choices, because the module boundary prevents
the flow of information/force. Overly constrained modular systems encourage
globals; under-constrained modular systems are asymptotic to mud/clay.

I don't want to use K8S as a strawman to attack modularity, but I think it is
an example of using this powerful design tool to solve the wrong problem using
mis-applied methods, all the while being _more_ complex and using _more_
resources. In the case of designing systems, modules/objects/processes (in the
Erlang sense) are critical, but not so much in building/engineering them.
Demodularizing or fusing a design can make it more robust and more efficient.

I don't _dislike_ modularity, I just think it is a bigger, more complex topic
than most give it credit for. Unix is highly non-modular and composes very
poorly. It sits on a molehill of a local maximum, itself sitting in the bottom
of a Caldera, a sort of Wizard Mt on Wizard Island.

Other things you might like: the research around "Collapsing Towers of
Interpreters" [1]

Or Dave Ackley's T2 Tile Project and Robust First Computing [2]

Would love to chat more, but internet access is spotty for the next week;
non-replies are not ignores.

[1]
[https://lobste.rs/s/yj31ty/collapsing_towers_interpreters](https://lobste.rs/s/yj31ty/collapsing_towers_interpreters)

[2]
[https://www.youtube.com/watch?v=7hwO8Q_TyCA](https://www.youtube.com/watch?v=7hwO8Q_TyCA)
[https://www.youtube.com/watch?v=Z5RUVyPKkUg](https://www.youtube.com/watch?v=Z5RUVyPKkUg)

------
tlb
It's disappointing that most machines today suck so badly. How did the
industry end up in this state, with so many smart people working so hard and
nobody liking their latest designs?

The last high-performance design I actually liked was the DEC Alpha. You could
write a useful JIT compiler in a couple hundred lines.

I suspect that nVidia's recent GPUs are wonderfully clever inside, but they
don't publish their ISA and the drivers are super-clunky. So I can't admire
them.

I appreciate the performance of Intel Core chips, but there's so much to
dislike. The ISA is literally too big to fully document. The kernel needs
thousands of workarounds for CPU weirdnesses. You have to apply security
patches to microcode, FFS.

RISC-V would be great if we had fast servers and laptops.

~~~
kjs3
What's wrong with Power 8 & 9? What's wrong with ARM64? What was wrong with
Sparc64 until Oracle screwed it up (well... register windows... ok)? How is
RISC-V intrinsically better than those architectures, considering it doesn't
exist in a form that performs anywhere near as fast?

------
bane
Surprised nobody picked the Atari 400/800 and Amiga 500 computers (which are
the 8-bit and 16-bit spiritual parent/child machines by the same people).

On the other end, pure CPU-only machines are kind of interesting as a study in
economy, like the ZX Spectrum, a horrible, limited architecture that managed
to hit the market at an unreasonably cheap price, make money, and end up with
tens of thousands of games.

~~~
cubano
OMG did I love my Atari 800 back in 1985 when it was clearly one of the best
price/performance machines available at that time.

Overlooked by many (but not all) were its built-in MIDI ports and its ability
to control all those early-model beatboxes and synths... unfortunately I
forget the name of the rather crappy software that I used to get things
talking and synced up, but it did work, and the bitmapped color graphics were
way ahead of their time.

Too bad Atari self-destructed with the cartridge business, and who knows what
other poor business decisions it made, but that computer was one of my
favorite things of my late teens.

~~~
rjsw
Are you mixing up the Atari 800 and the ST? The 800 didn't have MIDI ports.

------
CalChris
Interesting that the B5000 didn't make this list. Berkeley CS252 has been
reading the _Design of the B5000 System_ paper for years. The lecture slides
don't criticize it but _Computer Organization and Design_ sorta does:

 _The Burroughs B5000 was the commercial fountainhead of this philosophy
(High-Level-Language Computer Architectures), but today there is no
significant commercial descendant of this 1960s radical._

~~~
Aloha
I was also surprised - but I wonder if that's the computer architecture
language designers like, not computer architects.

------
oddity
The list seems biased towards pre-2001, so I’ll toss one in: Cell. I hold that
it was so ahead of its time, it dragged game devs, kicking and screaming, into
the future ahead of schedule when they were forced to support the PS3 for the
extended console cycle. :)

Larrabee was cute, but to this day I still have no idea what their target
workload was.

~~~
kjs3
Yup. Most of this was culled from a 2001 conference (so a small but
distinguished sample set), and you really need to read the detail to
understand what they were appreciative of. It's not a good/bad thing and
probably represented what they were thinking about at the time (e.g. Alpert
calls out Multiflow because it influenced a processor he built). Sites even
includes a backhand at VAX by calling it the example of what they didn't do in
Alpha; damning with faint praise.

I haven't fired up my Cell dev board (Mercury) in a while. Prolly should do
that. :-)

------
DonHopkins
I always thought of the 6809 as the Chrysler Cordoba of 8 bit microprocessors,
with soft Corinthian Leather upholstery and a luxurious automatic multiply
instruction.

[https://www.youtube.com/watch?v=Vsg97bxuJnc](https://www.youtube.com/watch?v=Vsg97bxuJnc)

------
erosenbe0
The CDC-6000 and Cray-1 designed by Seymour Cray are the most admired, hands
down.

It is also notable that quite a bit of R&D was done in Chippewa Falls, WI,
which is just a regular old town in America's Dairyland.

~~~
arminiusreturns
I'm no architect but I loved the Cray-2. I took over an old datacenter that
had one just sitting there, and a sense of awe hit me every time I saw it.
What cost $12M brand new and was a marvel of engineering (12x faster than the
Cray-1!) was just sitting there collecting dust. Crazy world this is. They
eventually sold it to a collector, I think.

~~~
kjs3
The Y/MP I got to use was pretty cool too...a multiprocessor Cray. But then
that's a Chen design and I don't imagine Seymour would approve of everything
that went into it.

~~~
LgWoodenBadger
In college our ACM chapter named its DEC machine (I think it was a Tru64, but
it's so long ago I don't remember) "cray-ymp." If you were a member of the ACM
you got an account on it, and I used to MUD from it.

One of the MUD admins was astounded when he noticed I was playing from what he
thought was a real Cray Y/MP. If I was smarter I would have played along.
Alas...

~~~
kjs3
Missed opportunity for amusement.

Funny enough, the Y/MP-48 was running Unicos (Cray's mostly-Unix) and someone
had compiled nethack or rogue or something similar and was playing it until he
had sucked up all the funny-money the department had budgeted for the year
(yeah... we got accounted for CPU time, and yeah... someone screwed up the
quotas). There was a kerfuffle....

------
gumby
Surprised the PDP-6/10 didn’t make the list as it was the dominant research
architecture for a certain period. Another Gordon Bell jewel.

~~~
kjs3
Alas...so little respect for the 36-bitters these days. The PDP-10 especially
was hugely influential.

~~~
cubano
I wrote my first non-BASIC program ever on a PDP-10 or 11, and as I remember,
it was one of those numerical-programming-for-engineers classes where we had
to figure out why, when mixing floating-point and integer math, 2 != 2.0
(because 2.0 was actually 2.0000000000001, of course).

The funny thing was... no one told me the secret, and I musta spent 5 long
hours pulling out my hair.

If I remember right, one of the RAs running the computer center finally had
pity on me and showed me what was going on...

I'm not really sure I can call them the "good ole days of programming", but
that was how things were done back then.
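
For anyone who never hit this: the machine in question predates IEEE 754, but
the same lesson reproduces on any modern box. A minimal C sketch (mine, not
anything from that class):

    #include <stdio.h>

    int main(void)
    {
        double a = 0.1 + 0.2;   /* neither 0.1 nor 0.2 is exact in binary */
        printf("a = %.1f\n", a);               /* prints 0.3            */
        printf("a == 0.3 -> %d\n", a == 0.3);  /* 0, i.e. false!        */
        printf("a   = %.17g\n", a);            /* 0.30000000000000004   */
        printf("0.3 = %.17g\n", 0.3);          /* 0.29999999999999999   */
        return 0;
    }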

------
PeterStuer
As far as processors are concerned, I loved the Zilog Z80 and the Motorola
68000. Oddly enough, I really disliked the MOS 6502 and the Intel 8086.

As total systems I loved the HP 41CX, the Sinclair ZX Spectrum, the Symbolics
Lisp Machine and the Apple Mac IIcx (or really just any Mac before the PowerPC
debacle).

After that era, I just started home-building x86 machines, and while there was
the odd preferred component, it never went beyond the 'A is better than B'
stage.

~~~
kazinator
I started on the 6502, but outgrew it.

Still, cult chip! I mean, something like the following shows obsessive
dedication to the thing:

[http://www.visual6502.org/JSSim/](http://www.visual6502.org/JSSim/)

------
fouc
Anyone admire Forth chips? Such as the 144-core chip from
[http://www.greenarraychips.com](http://www.greenarraychips.com)

------
Merrill
>Processor design pitfalls - Designing a high-level ISA to support a specific
language or language domain

Is there an equivalent pitfall in designing the ISA to support a specific
Virtual Machine?

For example, wouldn't the performance of a server processor when running the
Java Virtual Machine be a key factor in determining its commercial success?
I've always wondered whether the failure of Itanium wasn't at least partly
caused by the shift from binary executables to bytecode with the contemporary
success of the Java language. Even when JIT compilers were used, they were
probably too simple to take advantage of the VLIW architecture.

~~~
pdimitar
I don't feel that's the core reason, but you do bring up a good point: some
technologies are too good for their time and get swept into the history books
due to nobody having a clue how to utilise them properly.

Not sure if that's the exact case for Itanium but your argument fired a
neuron. :)

------
mtreis86
The machines I most admire are mechanical computers, like the ones used in
WWII-era battleships for targeting their long guns. Those machines performed
differentiation and curve matching using cams and gears.

~~~
rootbear
There is a fascinating series of videos on YouTube that describe the US Navy
analog fire control computers. I had no idea such things existed until I came
across those videos.

~~~
bloopernova
A friend's father back in the early 90s was convinced that analogue computers
were going to come back and kick the asses of these newfangled digital
pretenders :)

He was a little strange, but I loved seeing his workshop and his attempts at
building such computers. It's a pity I'm not in touch any more; I would like
to learn a bit more about what he was actually doing and whether it was
effective at all.

------
cptnapalm
I have a small IBM 390 about which I haven't been able to find out much, but I
did spot while searching that my 1999 S/390 has a 256-byte cache line. That's
4x over a 2020 i7.
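
(If you want to compare against your own machine: glibc on Linux can report
the line size. A quick sketch, assuming the `_SC_LEVEL1_DCACHE_LINESIZE`
extension is available:)

    #include <stdio.h>
    #include <unistd.h>

    /* Linux/glibc-specific: query the L1 data cache line size,
     * typically 64 bytes on recent x86 vs. the 256 bytes above. */
    int main(void)
    {
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (line > 0)
            printf("L1D cache line: %ld bytes\n", line);
        else
            puts("cache line size not reported on this system");
        return 0;
    }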

------
cameldrv
Pentium Pro should be on the list. The out-of-order execution, especially with
the micro-op translation, was a huge breakthrough.

~~~
titzer
The Pentium Pro pioneered neither of those concepts.

~~~
kjs3
Very true, but I believe it was the first to deliver them at consumer price
points and volumes.

------
squarefoot
M68k and Z80 IMO deserved to be in that list much more than x86.

~~~
bloopernova
The Z80 and its implementation in the ZX Spectrum 48k will always have a place
in my heart. So many BASIC games typed out, so many magazine tapes loading
strange programs and games.

And the M68K powered my Atari ST, my friends' Amigas, and so much more.

Makes me wish I could get small ZX Spectrum, Atari ST, and Amiga hardware kits
that would interface with USB HID stuff and output DisplayPort/HDMI (rather
than software emulation on a Raspberry Pi).

------
bshanks
The ones listed by 4 or more people (not including Bell) were:

- CDC-6600 and 7600 - listed by Fisher, Sites, Smith, Worley

- Cray-1 - listed by Hill, Patterson, Sites, Smith, Sohi, Wallach (also Bell,
sorta)

- IBM S/360 and S/370 - listed by Alpert, Hill, Patterson, Sites (also Bell)

- MIPS - listed by Alpert, Hill, Patterson, Sohi, Worley

Special mention:

- 6502 - only listed by Wilson, but she was the chief architect of ARM, so I
think her choice is important to note

- Itanium - mentioned in the top-ranked comment in this HN discussion

- DEC Alpha - mentioned in the second-ranked comment in this HN discussion

------
ChuckMcM
I was always partial to the DEC-10 architecture. That said, my first exposure
to a machine that had been really well thought out was the IBM 360.

------
dillonmckay
[https://en.m.wikipedia.org/wiki/VAX-11](https://en.m.wikipedia.org/wiki/VAX-11)

32 bit system from the late 1970s.

~~~
bitminer
The VAX was, I think, co-designed alongside VMS. The two together were an
innovative design: distinguishing architecture from implementation, a
comprehensive ISA, a roadmap for the future, etc. etc. VAXcluster was an
amazing integration of both.

I believe the design was influenced by Dijkstra's Structured Programming book,
but I have no evidence.

My epiphany on the issues with the ISA came when I discovered that the
checksum calculation used by the VMS backup utility was faster when done in a
short instruction loop than with the microcoded instruction. MicroVAX II.
Microcode was a huge barrier between the speed potential of the electronics
and the actual visible ISA. Duh!
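
(Something like the sketch below, my stand-in rather than BACKUP's actual
algorithm: each word costs a couple of simple register instructions, which the
hardware streams through faster than one big microcoded block instruction.)

    #include <stddef.h>
    #include <stdint.h>

    /* Stand-in for the kind of short checksum loop meant above; not
     * VMS BACKUP's actual algorithm. One load and one XOR per word. */
    uint32_t checksum(const uint32_t *words, size_t n)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < n; i++)
            sum ^= words[i];
        return sum;
    }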

Cray knew this, but he didn't build product lines, just single point products.
Sun built product lines with RISC and ate Digital Equipment's lunch.

------
rootbear
I'm tempted to suggest Babbage's Analytical Engine, on the basis of sheer
audacity alone. Babbage was just amazingly ahead of his time.

------
bathtub365
Well, pretty much anything will run AutoCAD these days.

~~~
choonway
same goes for CATIA V5...

------
tyingq
Are there any notable/not-just-academic "clean sheet" CPU architecture efforts
other than what Mill Computing is doing?

------
ragerino
I admire AMD's Zen architecture.

~~~
yjftsjthsd-h
For internal engineering beauty, or because it apparently made good trade-offs
and is taking the market to town right now? :-)

------
MaxBarraclough
No mention of the SuperH.

------
vi-mode
I've been into computers for decades. The first time in ages that a computer
blew my mind again was when I got deep into k8s.

If you haven't yet, do it now: check k8s out and get knee-deep into it, not
from a devops perspective but as one who admires computers.

~~~
bloopernova
More software than hardware, but I share the same wonder at Kubernetes. I love
the whole concept of: "this is a big blob that runs your stuff, don't worry
about networking and some other server-type considerations, just run your apps
and there's a bunch of magic"

It's a little sad to see my previous career disappear; I'd love nothing more
than to manage some VMware clusters running Linux VMs for the rest of my life,
but technology moves on. It's forcing me to become a coder and manager rather
than a sysadmin.

