
How much better was DEC Alpha than contemporary x86? - turrini
https://retrocomputing.stackexchange.com/questions/13611/how-much-better-was-dec-alpha-than-contemporary-x86
======
indymike
At the time that Alpha came out, the x86 was struggling to get more clock
speed, and the groupthink was that RISC was the way forward. RISC chips would
run at 4-8x the clock speed of an x86 and even when you adjusted for needing
2-3x more instructions, RISC was 50-200% faster.

DEC had a winner with Alpha. It had speed, and most importantly, you could run
Windows NT on it. NT mattered because most Unix vendors at the time wanted
$1,000/CPU for their licenses, and NT was $cheap (the OEM version was around
$300 if I recall correctly).

As other posters have said, DEC just could not get out of their own way and
let Alpha succeed. Weird sales policies, hostile partnerships, and intense
competition all really stymied DEC. A lot of the weirdness came from trying to
protect their legacy base of VAX and PDP midrange systems and a general hatred
of IBM (who was pushing OS/2).

BTW as I recall, Windows NT supported x86, Alpha and MIPS (another RISC
vendor) with the first commercially available version of NT. MS added a few
other RISC architectures in the following years (PowerPC most notably). x86 closed
the speed gap a few years later with the Pentium II (the Pentium Pro was
largely used in servers) and the rest is history.

~~~
maxerickson
In ~2000 I knew someone using NT on an Alpha. It wasn't a better experience
than my 2 year old P2, and they had paid more for it.

~~~
jessaustin
Shortly after I graduated in '98, I purchased an Alpha that ran both NT and
some Unix (I forget which?). This purchase was motivated by nostalgia for the
DECs in the computer lab at school, which were awesome, although I agree that
in the long run I wasn't completely satisfied with it.

~~~
hindsightbias
Ultrix

~~~
asveikau
I think alpha was also the first non-x86 architecture that Linux was ported
to, in the mid 1990s.

~~~
gpderetta
Linus was a big fan of alpha.

~~~
asveikau
Yeah, I remember writings and interviews from that time period, where he spoke
and wrote well on the subject.

Can't say I understand very well what was going on in his mind, but somehow I
got the impression that some of that was not totally specific to alpha in
particular, but that alpha represented non-x86 at large, and he was excited
and enthusiastic to tackle the technical problem of making the kernel
portable. And certainly it paid dividends; eg. linux on ARM is running on
billions of devices today, and that surely would have been more work had they
not decided to make the kernel portable in the first few years of its history.

------
rayiner
Which alpha? The 21264 was introduced in 1996. It was a 4-issue out of order
processor that could have 80 instructions in flight. It had 2 LSUs and could
clock up to 500 MHz initially.

The Pentium Pro, released in late 1995, was a 3-way design clocking up to 200
MHz. It was surprisingly competitive. It had all the modern requirements: out
of order execution, out of order memory pipeline, on board L1 and L2 caches.
It just had fewer resources in each dimension. Intel caught up to 500 MHz in
1999, with the PIII. That was still a 3-way design. Intel’s first four-way
design was the Core 2 in 2006. That could hold 96 instructions in flight.
Intel didn’t release a CPU with two load store units until 2011 (Sandy Bridge
could do two loads per cycle) or possibly 2019 (Sunny Cove was the first that
could do two stores per cycle).

Obviously, Intel caught up based on a combination of features long before
then. I’d guess around early 2000s, when clock speed hit 1 GHz+. Clock for
clock, the Pentium Pro wasn’t actually that much slower than Alpha on integer
code. It took much longer for intel to be competitive on floating point code.

(By the way, it blows my mind that we went from 200 MHz to 1 GHz in about 4
years. Meanwhile, it’s been 15 years since my first 2 GHz+ MacBook, and a new
MacBook still has a base clock around that range. Maybe it’ll turbo boost to 4
GHz for 10 seconds before it exceeds its thermal envelope.)

~~~
K0SM0S
> Maybe it’ll turbo boost to 4 GHz for 10 seconds before it exceeds its
> thermal envelope.)

Actually that's on Apple, the manufacturer, for choosing to undercool these
chips (probably thinking that most people won't care, which I tend to agree
with, but it's problematic when the machine is sold as a "Pro" device).

Anecdotal comparison, my Thinkpad sustains its 4 GHz for hours on end no
problem. There's no comparing the performance of these i7 / R7 with decade-old
dual cores at 2 gigs, it's just unfair.

~~~
jsjohnst
> but it's problematic when the machine is sold as a "Pro" device

I’m so tired of this trope. Do you think there’s a single definition of Pro
and it’s yours?

~~~
rbanffy
If you sell me a "Pro" machine, I expect to be able to fully utilize all its
resources for as long as I need to. While I don't need full power all the time
(once or twice a month I need it for more than 10 minutes), when I need it,
I'd prefer it to be available for more than a couple minutes at a time and not
make my laptop sound like a hair dryer and compute like an Atom laptop.

Having said that, my 15" MBP is sufficient for my needs, but I'd not recommend
it for anyone doing heavy numeric lifting. Or compiling large volumes of code.

------
zozbot234
The Alpha had an incredibly weird memory model that could create this
behavior:

 _Assume: p = &a, a = 1, b = 0_

      Thr. 1         | Thr. 2
      b = 1          |
      memory_barrier | i = *p
      p = &b         |

_The result can be i = 0_

Even though p=&b happens _after_ b=1 in Thread 1, it is seen as happening
_before_ it by Thread 2. This is basically because the b=1 part might get
ignored by Thread 2 unless that thread also goes through a memory barrier of
its own. When the i = *p is executed, Thr. 2 might read p from main memory
as written by Thr. 1, but then still rely on the old b=0 value sitting in its
cache. It's hard to describe that as being "superior", but maybe it enabled
some performance optimizations in the common case that made it worthwhile.
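To put the same pattern in modern terms, here is a minimal C11 sketch (the function names and the single-threaded framing are mine, for illustration): the release store stands in for the "memory_barrier; p = &b" sequence above, and the consumer-side acquire load is the part that Alpha uniquely required.

```c
#include <stdatomic.h>

static int a = 1, b = 0;
static _Atomic(int *) p = &a;

/* Thread 1: write b, then publish it through p. The release store plays
 * the role of "memory_barrier; p = &b" in the example above. */
void thread1(void) {
    b = 1;
    atomic_store_explicit(&p, &b, memory_order_release);
}

/* Thread 2: i = *p. The acquire pairs with the release above, so once we
 * observe p == &b we must also observe b == 1. With a relaxed load, Alpha
 * (alone among mainstream architectures) permits the dependent read *q to
 * return the stale 0. */
int thread2(void) {
    int *q = atomic_load_explicit(&p, memory_order_acquire);
    return *q;
}
```

Even memory_order_consume, which is essentially free on other architectures because it rides on the data dependency, has to be promoted to a real barrier on Alpha.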

~~~
ajross
I'm not sure I understand your example. The Alphas had a pretty standard MESI
cache architecture and a very typical cache coherence behavior. It's true that
individual CPUs require barriers to enforce ordering, which is _not_ true on
x86 (where the architecture guarantees consistent ordering of memory
operations between all entities on the bus, and spends significant die area to
do that). But honestly x86 is the odd architecture here -- almost no one else
felt that was a good tradeoff. (In hindsight, x86 was right, of course.)

Where Alpha was weird was in its floating point implementation (two versions
on chip! to support the crazy VAX formats that no one cared about even at the
time) and its alignment requirements (the hardware, not atypically for the
era, couldn't perform a misaligned load directly, but "could" by trapping into
a microcode handler that was 10x slower, leading to mysterious, insane
performance regressions).

~~~
wbl
"Why should the hardware do something software can take care of just as
well?" was the design philosophy here.

~~~
ajross
Because when it doesn't, it leads to "insane" results like the one upthread.
Reasoning about memory barriers is outrageously hard, and asymptotic software
quality is outrageously expensive. At the end of the day an x86 kernel is
going to have fewer race conditions than one for a traditional architecture,
and that's worth something.

This is a lesson that software folks have been pushing with tools like static
typing and memory lifetime analysis. The same applies at the hardware level.

~~~
zozbot234
You "just" need a memory barrier on every ptr indirection (including array
indexing, etc.) that might go through something that's being shared among
multiple threads. The more interesting question is whether requiring that
additional barrier buys more performance in the typical case where you're
_not_ touching shared data.

~~~
BeeOnRope
> The more interesting question is whether requiring that additional barrier
> buys more performance in the typical case where you're not touching shared
> data.

Probably yes, i.e., there was some hardware reason for it. You can find an
explanation [1] floating about attributing the effect to the cache banking
design, with separate invalidation queues for each bank. Presumably that was
better in some respect than the alternative designs available to implementors.

That said, the difference is probably fairly _small_ and if Alpha were around
today they would almost certainly want to abandon that particular reordering,
since they are the only ones doing it and there would be a lot of pressure on
them to make it fast (this comes up in all sorts of non-trivial scenarios like
double-checked locking, reads of final fields in Java, etc.).

That's generally the pattern, I think: if you are the odd man out with the
weakest model, and you aren't the dominant player, you will have a lot of
pressure to at least offer fast ways of lining up with the stronger models. So
even if a weak model is a better design point in a vacuum, it might not be
true when real-world implementation pressures are considered.

---

[1]
[http://lse.sourceforge.net/locking/wmbdd.html](http://lse.sourceforge.net/locking/wmbdd.html)

------
johnklos
I still run an AlphaServer DS25. It's amazingly quick, even in 2020. It
outperforms CPUs which came years after it.

People sometimes think that the market caused the Alpha to fail. Really, it
was Intel. They wanted Itanic to succeed so much that they made a deal with HP
to end the Alpha prematurely, even though demand for Alpha systems was high.

Even after HP announced that the Alpha would stop being developed, Alphas were
being sold as high-end systems for all sorts of uses and had many entries in
the Top 500 list of supercomputers - Alpha systems held four of the top ten
spots on the planet in November 2002, which was after HP announced they wanted
to transition from Alpha to Itanic.

My AlphaServer DS25 is beautiful hardware. Everything in the machine is
manufactured to a standard we just don't see any more.

~~~
mega_dingus
I interviewed at Intel & DEC in '95 and worked at DEC from '96 to '97.

Corporate infighting, incompetent sales, and misaligned vision pervaded the
culture. It didn't help that the PPro came out of left field and took
everybody by surprise.

When I interviewed at Intel in '95, they were absolutely giddy with how they
stole Alpha's thunder. DEC was a better geographic choice for me, and, well...

HP didn't come into the picture until long after - DEC sold to Compaq in '98,
Compaq to HP in '02. The race was over at least 7 years before that.

Edit: Not to mention the absolutely hostile stance DEC took to MS wrt NT. Due
to Cutler's involvement with NT, DEC sued MSFT and somehow thought that would
make MSFT become a loyal partner.

I remember Office 97 performing terribly on Alpha. We sent a compiler guy
there to figure out why; turns out the Office team had a single Alpha in their
pipeline, set to compile -O0, and pretty much said "this is there to check the
lawsuit checkmark"

~~~
stuartc842
>turns out the Office team had a single Alpha in their pipeline, set to
compile -O0, and pretty much said "this is there to check the lawsuit
checkmark"

what does this mean? compile -O0? lawsuit checkmark?

~~~
detaro
-O0 means compiling without optimization. Presumably they were contractually required to have a version of Office for Alpha, so they did the minimum-effort thing to produce one.

~~~
mega_dingus
Exactly this

------
scottlocklin
I actually bought myself an Alpha for my home machine as a grad student in
1996. I think they were remaindering low end machines; the Multia[1], a little
21066a machine probably designed to run WNT. What I remember about it in
particular is it came with a bum memory chip, and ... somehow they sent an
actual service engineer to come fiddle with my computer in the Berkeley hills.
He had to do this twice. Can't imagine what it cost for that kind of service;
I think I paid a few hundred bucks for the Multia - I was poor! Kind of guessed
DEC needed to adjust their business model.

The computer itself was fairly poorly designed, and I remember the memory bus
(and g77 compiler) in the thing kept it only about as fast as a contemporary
intel chip; it eventually killed itself during one of Berkeley's frequent
power outages.

[1]
[https://en.wikipedia.org/wiki/DEC_Multia](https://en.wikipedia.org/wiki/DEC_Multia)

~~~
saltcured
I had one of those too, though I feel like I might have gotten it in 1995. I
immediately put Linux on it and found that it was quite fast for some tasks
where the open source software compiled well on Alpha.

The biggest practical drawback for me was that Netscape wasn't available as an
Alpha binary. Running it with DEC's FX!32 x86 emulator was too slow. It
performed about as well as a Python-based browser at the time. I ended up
using ethernet to run Netscape and other x86-focused software on a 486DX4-100
with remote X display back to the Alpha which had my nice monitor.

I brought that Multia with me to my first job, and used it to do 64-bit
portability work (on Linux) for a middleware/communication project that we
were busy porting to many obscure 64-bit HPC platforms. It was fun to have a
64-bit Linux at home, long before x86_64 came onto the scene.

~~~
scottlocklin
Somehow I don't recall this limitation. Maybe someone compiled a version of
Mosaic for it? I probably wasn't very webby at the time; grew up on text based
interbutts. Was definitely ringing up the University internets using a 19k
modem.

I do remember playing Quake on the thing; that was pretty cool.

------
bluedino
It's also interesting that the Alpha CPU can be blamed for the poor file
compression in Windows -

[https://devblogs.microsoft.com/oldnewthing/?p=94615](https://devblogs.microsoft.com/oldnewthing/?p=94615)

 _One of my now-retired colleagues worked on real-time compression, and he
told me that the Alpha AXP processor was very weak on bit-twiddling
instructions. For the algorithm that was ultimately chosen, the smallest unit
of encoding in the compressed stream was the nibble; anything smaller would
slow things down by too much. This severely hampers your ability to get good
compression ratios._

More Alpha details from Raymond Chen:

[https://devblogs.microsoft.com/oldnewthing/20170807-00/?p=96...](https://devblogs.microsoft.com/oldnewthing/20170807-00/?p=96766)

~~~
dirkt
This story has been retracted by the author:

[https://devblogs.microsoft.com/oldnewthing/?p=96915](https://devblogs.microsoft.com/oldnewthing/?p=96915)

------
gdubs
The thing I remember about DEC Alpha was that they were used as renderfarm
processors by special effects artists working in Lightwave on Amiga (for shows
like Babylon 5, etc). I used to drool over ads for DEC Alpha “screamernet”
machines hooked up to Amiga 4000s. Alas, I was stuck with a Tandy 486 SX...

~~~
Wistar
The animation studio I worked for in the mid-late 90s was a Softimage shop
using SGI Indigo and Intel Pentium workstations with a 20-30 machine Alpha
(NT) render farm. It was a very good setup with one weird issue: depending on
the type of render being done, especially anything with ray marching, the SGI
or Intel workstations couldn't participate in the render because the results
looked subtly different.

~~~
rowanG077
Probably a bug in the render code that made it look different.

~~~
Wistar
It was explained to me as a difference in the way certain arithmetic
calculations were performed in the actual hardware. The render engine was
mental ray. It was really only apparent if there were atmospherics, such as
volumetric lighting, in the scene. Net renders with the workstations
participating would come back with some of the resulting tiles looking
noticeably different - the atmospheric effects (ray marching) would have
differing densities between the Alpha-rendered tiles and the SGI- or
Intel-rendered tiles.

------
thedance
It was no faster in practice because it was nearly impossible to program:
without the ability to load or store anything but natively aligned and sized
quads, and with a quite useless memory-order model (essentially anything could
be reordered past any other thing). Reading, modifying, and writing one byte
on this rig was basically impossible, and don’t even think about how hard it
would be to write a mutex.

~~~
dragontamer
> essentially anything could be reordered past any other thing

Except for memory barriers. The DEC Alpha command "mb".

x86 has a (relatively) strong memory model, but ARM and POWER9 both have weak
memory models. Not quite as weak as DEC Alpha, but weak enough that you need
to be very careful about memory-barrier placement in both ARM and POWER9.

Xbox 360, PS3, and ARM (cell phone) systems programmers would know about the
difficulties involved. Yeah, it's hard to learn, but totally possible to write
a mutex.

--------------

There's probably more code running on "weak memory models" (ie ARM) than on
"strong memory models" (aka: x86) today.

> without the ability to load or store anything but natively aligned and sized
> quads

Check out the assembly code generated at -O3 by GCC or LLVM. x86 uses
naturally aligned quads for performance reasons, and compilers have also
solved this problem. In fact, you'll see plenty of "nop"s generated in
compiler output, so that your assembly is aligned to the cache line to
maximize uop cache utilization on modern x86 machines.

x86 doesn't have any REQUIREMENT for aligned assembly code or data. But x86 is
far more efficient when reading/writing from cache-aligned locations. (In
particular: reading across a 64-byte boundary naturally results in 2x L1
cache reads instead of 1x L1 cache read).

Both the alignment and memory-barrier issues are solved today. Arguably, they
weren't solved back in the DEC Alpha days (I was too young to be programming
at that time)... but modern compilers and toolchains can certainly work with
memory barriers and "aligned only" memory.

~~~
pizlonator
ARM’s memory model is stronger than Alpha’s especially on ARM64.

It’s not about just whether it’s possible to write a mutex but also whether
you can write a really good one. The best concurrent algorithms - whether
mutexes or lock-free data structures - benefit from a careful mix of strength
and weakness in the memory model. I think Alpha is too weak. X86 may be too
strong - so say smart people - but it still manages to be fast as fuck.

No idea what you’re talking about wrt alignment. On x86, you can load/store
misaligned. On Alpha, you can’t. On ARM, you can on some but not on others.
The CPUs where you can are easier to program.

No idea what you’re talking about wrt memory barriers. It’s not a solved
problem. Memory barriers slow things down so it’s better not to have to use
them. X86 and ARM64 give you tricks to avoid using barriers, or to use cheap
barriers, in many important racy algorithms. Alpha gives you fewer tricks.
(Specifically x86 gives you lots of ordering “for free” while arm64 lets you
use the self-xor dependency trick to cheaply order loads and has a generous
buffet of half fences.) The memory model matters a lot - the weaker it is, the
fewer tricks programmers have to avoid doing expensive things to request
specific orderings.

~~~
dragontamer
> On x86, you can load/store misaligned.

At a performance penalty. Read or write across a 64-byte cacheline boundary,
and your CPU will be forced to perform 2x loads to implement your singular
unaligned load. An unaligned load across a cacheline is literally 1/2 the
speed of an aligned load, and is something the modern programmer (and
compiler) is trained to avoid.

The compiler basically avoids all misaligned loads/stores, even on x86. (Aside
from when the programmer really forces it: like
reinterpret_cast<int*>(0x800003f) or something.)

-------

> The best concurrent algorithms - whether mutexes or lock-free data
> structures - benefit from a careful mix of strength and weakness in the
> memory model.

And all of those algorithms would run CORRECTLY, if those half-barriers were
replaced with full barriers. Maybe slower, but they'd be correct.

That "generous buffet of half fences" can only exist on a weak memory-model
system (like ARM), because x86 AUTOMATICALLY performs those half-fences after
every load / before every store instruction. That's the thing about strong
memory models: once you're "too strong", it doesn't even make sense to have
those half-barrier instructions.

ARM and PowerPC were too weak 10-years ago. They've "strengthened" their
memory model by adding new half-barriers to their instruction set. That's the
real secret: to change your CPU over time to match programmer's preferences.
ARM / PowerPC started off too weak, but are now approaching "just right", with
the addition of new instructions.

DEC Alpha can't do that, because the DEC company died decades ago. It's only
fair to consider the programming environment and expectations of the time
period that the DEC Alpha existed in.

~~~
pizlonator
The perf penalty of misaligned loads and stores is incredibly low.
JavaScriptCore uses them quite a bit. The penalty is way lower than handling
the misalignment by way of a trap, which is what you’d do on Alpha if you had
to run some code that was designed to use misalignment. That’s just a fact -
the original comment was about how this fact made Alpha a worse target.

I think you’re glossing over a lot of details about the memory model. What the
model says about the ordering of dependent loads isn’t a matter of just adding
fences later - it’s more fundamental than that. If Alpha achieved its perf
advantage thanks to speculating loads, then it would have to lose that
advantage when it was modernized to current standards. Also, the original
comment was about Alpha back then versus Intel and others back then, so it’s
not interesting to say that Alpha could have just improved - that’s not really
responding to the original comment about how hard Alpha was to program.

~~~
BeeOnRope
> The perf penalty of misaligned loads and stores is incredibly low.

Yeah, this.

You might as well think of them as free in most scenarios involving small
reads and writes. If there is some small advantage to misalignment (often,
reduced memory use through better packing), do it!

The cross-line penalty would occur only for 3 out of 64 alignments for a
4-byte load: less than 5% of the time, and then the penalty is small.

That % is more or less worst case in well designed code [1]. Sometimes you can
do misaligned loads that you guarantee will never cross. A common example is a
misaligned LUT where the values overlap. E.g., given n, you want to load 8
bytes with:

    
    
      [n, n+1, n+2, ..., n+7]
    

Let's say n ranges from 0 to 15. A traditional aligned LUT would have 16x
8-byte values, one for each n (128 bytes). You could also do it with a single
23-byte LUT, running from 0 to 22, and a misaligned load into that LUT at byte
position n. As long as you align the LUT itself (e.g., to 32 bytes), you will
never cross a line.
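A small C sketch of that overlapping-LUT trick (the identity-byte table contents and the load8 helper name are mine, just to make the layout concrete):

```c
#include <stdint.h>
#include <string.h>

/* 23-byte table covering bytes 0..22. An 8-byte load at offset n
 * (n in 0..15) yields [n, n+1, ..., n+7] without a 16 x 8 = 128-byte
 * aligned table. Aligning the table itself to 32 bytes guarantees the
 * misaligned load never crosses a 64-byte line: the furthest access
 * ends at byte 15 + 8 = 23, well inside one line. */
static _Alignas(32) const uint8_t lut[23] = {
     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,
    12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
};

uint64_t load8(unsigned n) {
    uint64_t v;
    memcpy(&v, lut + n, sizeof v);  /* compiles to one (possibly misaligned) load */
    return v;
}
```

The memcpy is the portable way to express the misaligned read; on x86 the compiler lowers it to a single mov.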

For large accesses, things become less clear. After all, a 512-bit AVX-512
access is guaranteed to cross if it is misaligned, and randomly distributed
256-bit accesses cross half the time, etc. Vectorized code is also the type of code
that may be written to approach the 2/1 load/store per cycle limit, so it
really pays to try to get alignment for any type of non trivial loop.

---

[1] Specifically, this is the crossing % you would get if you could assume
nothing about the distribution of the accesses, i.e., they are uniformly
randomly distributed. If you know something about the expected alignment, then
you can shift everything to make crossing less likely.
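The 3-out-of-64 figure, and the roughly-half figure for 256-bit accesses, both fall out of a one-line offset count; a quick Python sketch (crossing_fraction is my name for it):

```python
LINE = 64  # cache-line size in bytes

def crossing_fraction(load_bytes: int, line: int = LINE) -> float:
    """Fraction of uniformly random start offsets at which a load of
    load_bytes bytes straddles a line boundary: it crosses exactly when
    it starts in the last (load_bytes - 1) offsets of a line."""
    return max(0, load_bytes - 1) / line

print(crossing_fraction(4))   # 3/64: the "3 out of 64 alignments" above
print(crossing_fraction(32))  # 31/64: misaligned 256-bit loads cross about half the time
print(crossing_fraction(64))  # 63/64: every misaligned 512-bit load crosses
```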

------
hylaride
One interesting note about Alpha is that the reason PuTTY exists is that its
creator had a Windows NT Alpha workstation, but there were no native telnet
clients with good terminal emulation for Alpha NT.

He supported the alpha build well into the 2000s.

~~~
aap_
Funny. I've run a recent x86 putty on Alpha.

~~~
hylaride
Are you compiling it yourself? Also, you're still using a windows workstation
on alpha? If so, that's interesting.

"We used to also provide executables for Windows for the Alpha processor, but
stopped after 0.58 due to lack of interest"

0.58 was released on 2005-04-05.

[https://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html#...](https://www.chiark.greenend.org.uk/~sgtatham/putty/faq.html#faq-ports-general)

------
GeorgeTirebiter
I highly suggest viewing the recent chat between Jim Keller, one of the
Alpha's original designers, and MIT's Lex Fridman:
[https://youtu.be/Nb2tebYAaOA](https://youtu.be/Nb2tebYAaOA) I was stunned by
Jim's total command of Computer Architecture, and his deep insights about the
subject. I will never again utter the phrase 'modern computer' without
requisite awe.

~~~
Tsiklon
Jim Keller and the teams he leads/enables have their fingerprints all over
modern CPU architecture - from AMD's K8 (Athlon 64) and Zen (their modern
competitive product) to Apple's A4 and A5.

------
gok
MIPS/SPARC/POWER chips were all meaningfully better than x86 chips at the time
too. Alpha had the trick that it could kind of run Windows NT, but it was a
waste to bother since 3rd party app support was so spotty.

------
sprash
DEC was at the forefront of so many technologies and set so many standards
(the first Unix machine, the VT100, the first viable 64-bit processor, the
first viable search engine, and much more even before that).

It is sad to see that this company and its products were brought down by pure
political maneuvers and hostile takeovers instead of competition or merit.

~~~
fulafel
MIPS beat them on the 64 bit CPU front. And I think some vector supers were 64
bit before that.

~~~
sprash
But MIPS 64 bit didn't outperform the fastest 32bit processor on the market by
huge margins. That is what I meant by "viable".

------
fmajid
In 1993 I had a DEC AXP 3000/300L workstation (Alpha with a 32-bit memory
bus). It was fast, but the DEC OSF/1 OS was pretty bad and hobbled the
machine.

There were lots of portability issues, for instance GNU Emacs wasn't 64-bit
clean whereas Lucid Emacs was, so for two decades I ran Lucid/XEmacs instead.

Alpha was the first microprocessor to hit the billion instructions per second
mark (BIPS) and was used to build a 50 Gbps router in 1998, using Alpha 21164s
with routing code running entirely in L1 cache:

[https://www.cs.princeton.edu/courses/archive/fall12/cos561/p...](https://www.cs.princeton.edu/courses/archive/fall12/cos561/papers/partridge-50gbs-98.pdf)

------
ohiovr
Lightwave 5.5 on Alpha was about 6x faster than on a Pentium 66 at the time I
used it, configured with 256 megs of memory. That was already faster than what
I used at home, which was just an Amiga 3000 at 25 MHz with 16 megs of memory.

------
jasoneckert
I had four Digital Ultimate Workstations (=AlphaServer 1200s) with dual 533MHz
21164 Alpha CPUs running Windows 2000 Server back in 2000. They ran Windows
2000 Server much faster than any PC hardware I could get my hands on at the
time.

------
bluedino
_One real-world look at Alpha vs x86 (and MIPS) performance was this .plan
post from John Carmack:_

----------------------------------------- John Carmack's .plan for Jun 25,
1997 -----------------------------------------

We got the new processors running in our big compute server today.

We are now running 16 180mhz r10000 processors in an origin2000. Six months
ago, that would have been on the list of the top 500 supercomputing systems in
the world. I bet they weren't expecting many game companies. :)

Some comparative timings (in seconds):

    
    
      mips = 180 mhz R10000, 1meg secondary cache
      intel = 200 mhz ppro, 512k secondary cache
      alpha = 433 mhz 21164a, 2meg secondary cache
     
      qvis3 on cashspace:
    
      cpus    mips    intel   alpha
      ----    ----    ----    ----
       1     608     905     470
       2     309     459
       3     208     308
       4     158     233
       8      81
      12      57
      16      43
    

(14 to 1 scalability on 16 cpus, and that's including the IO!)

The timings vary somewhat on other tools - qrad3 stresses the main memory a
lot harder, and the intel system doesn't scale as well, but I have found these
times to be fairly representative. Alpha is almost twice as fast as intel, and
mips is in between.

None of these processors are absolutely top of the line - you can get 195 mhz
r10k with 4meg L2, 300 mhz PII, and 600 mhz 21164a. Because my codes are
highly scalable, we were better off buying more processors at a lower price,
rather than the absolute fastest available.

Some comments on the cost of speed:

A 4 cpu pentium pro with plenty of memory can be had for around $20k from
bargain integrators. Most of our Quake licensees have one of these.

For about $60k you can get a 4 cpu, 466 mhz alphaserver 4100. Ion Storm has
one of these, and it is twice as fast as a quad intel, and a bit faster than
six of our mips processors.

That level of performance is where you run into a wall in terms of cost.

To go beyond that with intel processors, you need to go to one of the
"enterprise" systems from sequent, data general, ncr, tandem, etc. There are
several 8 and 16 processor systems available, and the NUMA systems from
sequent and DG theoretically scale to very large numbers of CPUS (32+). The
prices are totally fucked. Up to $40k PER CPU! Absolutely stupid.

The only larger alpha systems are the 8200/8400 series from dec, which go up
to 12 processors at around $30k per cpu. We almost bought an 8400 over a year
ago when there was talk of being able to run NT on it.

Other options are the high end sun servers (but sparc's aren't much faster
than intel) and the convex/hp systems (which wasn't shipping when we
purchased).

We settled on the SGI origin systems because it ran my codes well, is scalable
to very large numbers of processors (128), and the cost was only about $20k
per cpu. We can also add Infinite Reality graphics systems if we want to.

~~~
AnimalMuppet
Ah, the "good old days". $20-30k per CPU? Yeah, I don't miss that.

------
apricot
I remember reading a book describing the Alpha architecture as an undergrad.
It was so much cleaner than x86. I was sad to see it fail in the marketplace,
but it's a common story in tech.

------
based2
[http://alasir.com/articles/alpha_history/](http://alasir.com/articles/alpha_history/)

------
lsllc
This is what happens when you let business people near technology. They ruin
it (Compaq that is).

------
mr_toad
Things I remember about the DEC Alpha machines that were used at University
when I was a student:

They were expensive. We paid for CPU seconds. Not conducive to
experimentation.

The OS sucked. (From an end-user perspective, that is). It was both verbose
and unfriendly. At least Unix was succinct.

Most of the people using the system were engineers, mathematicians, and
comp-sci students, who all preferred either Unix or Windows.

~~~
mepian
The Alpha was running both Unix (Digital UNIX aka OSF/1 AXP aka Tru64 UNIX)
and Windows up until 2000 RC1. Even Linux was officially supported in the last
years.

~~~
tssva
Based upon the original comment I'm guessing the system he used was running
OpenVMS. Many universities replaced their previous VAX systems running VMS
with Alpha servers running OpenVMS.

------
turk73
The first one I used was a DEC UNIX based workstation. That thing was pretty
cool at the time, circa 1993. I learned a ton about UNIX on it and was really
my only access to that type of OS until Linux first came out a year or so
later.

Later on, I worked at a healthcare company that used a cluster of Alphas
running OpenVMS or whatever the hell that OS is called. This was circa
2011-2014. It was DEFINITELY NOT COOL ANYMORE. That hardware was really old
and the software on it was subject to frequent restarts due to memory leaks.
The company used it to try and operate order intake for a multi-site online
pharmacy. The system was impossible to interact with: it had a bizarre
TCP-socket-based API, one awful, buggy SOAP service, and otherwise data could
come out via reports generated in a binary file format. Not strictly DEC's
fault as the Alpha hardware did last a long time, but it was generally an
awful experience for me to have to deal with that particular system. The
company tried and failed to replace it so they doubled down, bought the source
code for a ridiculous sum in the millions of dollars, and proceeded to try and
maintain it themselves by hiring crusty old timers whose people skills were
either way out of date or never existed in the first place.

Oh, and the repairs to the hardware were through a local computer salvage firm
that basically bought boards and other bits off of eBay. This is a major
player in the online pharmacy space, mind you.

------
based2
[https://en.wikipedia.org/wiki/AltaVista](https://en.wikipedia.org/wiki/AltaVista)

