
Intel releases the last Itanium chip, the 9700 - bhouston
http://www.pcworld.com/article/3196080/data-center/intels-itanium-once-destined-to-replace-x86-in-pcs-hits-end-of-line.html
======
ChuckMcM
As an interesting data point (and the article doesn't do it justice): when I
saw the presentation on AMD's "Sledgehammer" architecture (the AMD64) at
Microprocessor Forum, I wrote to the CTO of NetApp at the time, "If you're
wondering, Itanium just died." And now, more than 16 years later, its
development is actually ceasing.

It also started a multi-year effort to convince NetApp to use an AMD chip in
their filer :-).

I think these events reinforce three good things to know:

1) Adapting the existing system to do new things can trump entirely new
systems, even if those new systems are 'better' in some way.

2) Hardware can be changed "overnight and for free" compared to how difficult
it is to migrate large software systems.

3) "Eat your young" \- Keep innovating in your own products, even if that
means it makes previous versions obsolete, because if you don't your
competitors will.

So long Itanium.

~~~
RcouF1uZ4gsC
I agree with your assessment. Intel seemed to be using the 32-bit limit of
x86 to push high-end computing toward Itanium. AMD built x86-64 and removed
that limit; Intel was forced to play along, x86 escaped its artificial
ceiling, and much of the justification for Itanium evaporated.

------
userbinator
The Itanium is a noteworthy example of what happens when one designs an
architecture for parallelism to the exclusion of all else and leaves all
instruction scheduling to the compiler. Performance was great when software
could take advantage of the parallelism, but horrible otherwise: the
processor would still fetch bundles of 3 instructions (16 bytes per
bundle!), but if only one slot per bundle held anything other than a NOP,
each useful instruction effectively cost a ridiculously cache-bloating 16
bytes instead of the ~5.3 bytes of a fully packed bundle.
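
A rough model makes the arithmetic concrete. This is only a sketch in C: the struct illustrates the layout (a real IA-64 bundle packs a 5-bit template and three 41-bit slots into 128 bits) and is not the actual bit-level encoding.

    #include <stdint.h>

    /* Illustrative model of an IA-64 bundle: 128 bits (16 bytes) total,
       holding a 5-bit template plus three 41-bit instruction slots. When
       the compiler can't fill a slot it emits a NOP, so serial code still
       pays the full 16 bytes of I-cache per bundle. */
    typedef struct {
        uint8_t  template_bits; /* 5 bits used: slot types and stops */
        uint64_t slot[3];       /* 41 bits used per slot */
    } bundle_model;

    /* Effective I-cache cost per useful instruction. */
    static double bytes_per_useful_insn(int useful_slots) {
        return 16.0 / useful_slots; /* 16.0 if only 1 of 3 slots is useful,
                                       ~5.3 if all 3 are useful */
    }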

I suppose a similar analogy would be doing everything in x86 with SIMD
instructions and not using the scalar set at all.

~~~
KenoFischer
> I suppose a similar analogy would be doing everything in x86 with SIMD
> instructions and not using the scalar set at all.

With ever-wider SIMD units this is actually happening to some extent. With
AVX-512 (e.g. on KNL) you can do 64 single-precision FLOPs in the vector
units (two 512-bit FMA units, 16 lanes each, 2 FLOPs per FMA) in the same
amount of time as 1 scalar operation. Combined with the low clock speed of
the KNL, you really don't want to be doing scalar operations if you care at
all about performance. Of course SIMD isn't the same as VLIW, since the
operation you're doing has to be the same in all vector lanes, but with
masking support in AVX-512 it is getting a little closer to that.
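
For instance, per-lane masking lets a branchy scalar loop become straight-line vector code. A minimal sketch with AVX-512 intrinsics (the function and array names are made up for illustration; the intrinsics are from <immintrin.h>):

    #include <immintrin.h>

    /* Branch-free "if (x[i] > 0) y[i] += x[i]" across 16 float lanes.
       The mask acts as per-lane predication, similar in spirit to
       Itanium's predicate registers. */
    void masked_add16(float *y, const float *x) {
        __m512 vx = _mm512_loadu_ps(x);
        __m512 vy = _mm512_loadu_ps(y);
        __mmask16 m = _mm512_cmp_ps_mask(vx, _mm512_setzero_ps(), _CMP_GT_OQ);
        /* Lanes whose mask bit is 0 keep their old value of vy. */
        vy = _mm512_mask_add_ps(vy, m, vy, vx);
        _mm512_storeu_ps(y, vy);
    }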

Also, compilers have gotten quite a bit smarter, and with hardware die
shrinks taking longer, I wouldn't be at all surprised if VLIW ISAs made a
comeback. I guess the Itanium was a bit ahead of its time.

~~~
deepnotderp
This. Compiler advances over the past two decades have actually made VLIW
somewhat tractable, but since the memory hierarchy and data movement consume
most of the power nowadays, the benefits of VLIW are muted.

~~~
CalChris
_Trace scheduling_ started with Fisher '81 [1] and Ellis '85 [2]. Trace
scheduling is what made VLIW even possible, if not exactly tractable. That
was 30+ years ago.

What _recent_ advances have made VLIW any more tractable than trace scheduling
already did? BTW, trace scheduling works for scheduling superscalar processors
as well.

VLIW never went away. It's used in embedded parts such as the TriMedia
processors, and in the REX Computing Neo chip. The Mill CPU is a VLIW of
sorts.

I'm not anti-VLIW, but I don't know of any _recent_ breakthroughs that make
it more tractable now for non-embedded, non-HPC general-purpose computing.

[1] _Trace Scheduling: A Technique for Global Microcode Compaction_

[https://pdfs.semanticscholar.org/5698/09af0fcbe5a42371cea8d3...](https://pdfs.semanticscholar.org/5698/09af0fcbe5a42371cea8d30932d7d8f4a933.pdf)

[2] _Bulldog: a compiler for VLIW architectures_

[http://dl.acm.org/citation.cfm?id=912347](http://dl.acm.org/citation.cfm?id=912347)

~~~
trsohmers
Designer of the Neo here, and owner of the California "VLIW" license plate.
As you can guess, I am a die-hard VLIW advocate and a strong believer in the
original promises of VLIW (drastically simpler decode logic, explicit
instruction-level parallelism, virtually no control/data hazards on chip).

VLIW has gotten an extremely bad rap outside of the embedded space due to
Itanium, which I strongly contend was not a VLIW in spirit (you can see me
talk about this in more depth here:
[https://youtu.be/ki6jVXZM2XU?t=441](https://youtu.be/ki6jVXZM2XU?t=441)).
Itanium introduced a ton of nondeterminism by having static-scheduling-
unfriendly things like branch prediction and variable-latency caches, while
still trying to keep some level of x86 support. The core problem for Itanium
was that it was impossible for the compiler to make good decisions for the
vast majority of applications, since the compiler could not know exactly
where data would be in memory or when it would arrive. I believe the
underlying failure point of failed VLIW architectures has been computer
architects not realizing that fancy bells and whistles and "cleverly
complex" hardware designs are the exact opposite of what you want when the
compiler needs to do purely static scheduling. Adding fancy dynamic pieces
in hardware to compensate (as seen in many places in Itanium) just makes the
problem worse.

The main improvement we have made with the Neo architecture is hard
(exact-cycle-count) guarantees on all memory movement on and off the chip,
which gives the compiler the information it needs to make good decisions and
emit very dense code. I think the jury is still out on the applicability of
VLIWs to "general purpose" code... it is not something we at REX really care
about at the moment, and I think RISC-V is a great solution there, with
improvements over x86 and ARM.
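
To make that concrete: with exact transfer latencies, the compiler can prove a buffer is ready without any runtime synchronization. A minimal double-buffering sketch in C, using a hypothetical rex_dma_start intrinsic (not REX's actual API), modeled here as a plain copy so it compiles:

    #include <stddef.h>
    #include <string.h>

    #define TILE 256

    /* Hypothetical stand-in for a scratchpad DMA that completes in a
       fixed, statically known number of cycles. */
    static void rex_dma_start(float *scratch, const float *src, size_t n) {
        memcpy(scratch, src, n * sizeof(float));
    }

    void scale(float *dst, const float *src, size_t n, float k) {
        static float buf[2][TILE];        /* scratchpad double buffer */
        rex_dma_start(buf[0], src, TILE); /* assume the compiler pads the
                                             preamble so this completes */
        for (size_t t = 0; t < n / TILE; t++) {
            if ((t + 2) * TILE <= n)      /* start fetching the next tile */
                rex_dma_start(buf[(t + 1) & 1], src + (t + 1) * TILE, TILE);
            /* No wait instruction: the loop body is statically known to
               take at least as many cycles as the DMA, so buf[t & 1] is
               guaranteed ready here. */
            for (size_t i = 0; i < TILE; i++)
                dst[t * TILE + i] = k * buf[t & 1][i];
        }
    }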

~~~
kijiki
> The main improvement we have made with the Neo architecture is we have hard
> (exact cycle count) guarantees on all memory movements on and off the chip
> ...

This is a fascinating idea, but I have a question.

What happens when the next process generation lets you improve cache timings?
What about newer, faster DRAM (or whatever) timings?

Do I have to recompile the world in that case, to see any improvement in
performance?

~~~
trsohmers
Our assumption is that you would recompile from at least the LLVM IR level
for each version of the chip. This isn't a real problem for the markets we
care about, as those customers have the source code for all of their
applications. For those who don't want to distribute source code, I am
interested in being able to distribute encrypted LLVM IR as a pseudo-binary,
similar to how silicon IP companies deliver encrypted Verilog and VHDL for
integration into other chips.

~~~
kijiki
Interesting. The traditional problem in this space is that the source code
is lost or otherwise unavailable. It is not clear that an encrypted-IR
analog of encrypted Verilog would be any more available.

------
tyingq
We would probably all be working with Itanium servers if not for AMD
introducing 64-bit x86. I believe that also accelerated Linux adoption and
the decline of all the commercial Unix platforms.

Good to see AMD on the rise again. I appreciate their role in heading off
Itanium.

~~~
bluejekyll
Had x86_64 not been released, I think we'd also have a more diverse set of
server-side chips, such as SPARC and POWER; there wouldn't have been an
obvious dominance of a single architecture in the marketplace.

Also, had it not been released, we'd probably have seen a larger fracturing
of the laptop CPU market, with a broader switch to ARM64.

In a sense, because AMD forced Intel's hand in supporting x86_64, they
actually helped Intel gain even more market dominance, with an architecture
that spans laptops to server workhorses. We developers barely need to lift a
finger to target either.

~~~
drewg123
If amd64 had not been released, I think Itanium would probably own the
server market. But it would be a very different server market... much more
like the late-90s market than today's.

Remember that Intel was twisting arms and making deals to kill off the
weaker RISC chips in the marketplace. Around the launch of the first
Itanium, HP acquired Compaq (which had acquired DEC), so HP held both Alpha
and PA-RISC. And SGI was still a fairly big player with MIPS64. Both HP and
SGI made deals to use Itanium going forward (with HP going so far as to port
HP-UX and VMS to Itanium).

I think the server marketplace would probably have been much higher-margin
without amd64 competition, as Itanium was always a high-margin player. Its
main competitors were the high-margin server chips above, plus POWER and
SPARC. The low end was x86, controlled by Intel, and they could certainly
have continued to artificially segment the market and keep margins high.
That's what any good monopoly does, and it's what they've been trying to do
today with Xeons.

I was a huge fan of Alpha (one of two people who ported FreeBSD to the
Alpha), and I was very sad to see it go. I've always blamed Itanium for its
downfall -- I think Alpha had a lot more headroom than PA-RISC, and HP would
have kept it going if not for Itanium. So I have to admit that I'm quite
happy to see Itanium die.

~~~
ehvatum
Hey, thanks for your work on the FreeBSD alpha port! We got a lot of mileage
out of it, while pointing and laughing at Itanium the whole time :)

------
trentnelson
Related: excellent article on IA64/x64 etc:

[https://github.com/tpn/pdfs/blob/master/A%20History%20of%20M...](https://github.com/tpn/pdfs/blob/master/A%20History%20of%20Modern%2064-bit%20Computing%20-%20Feb%202007%20\(CSEP590A\).pdf)

Features quotes from various interviews with folks like David Cutler.
Fascinating (and not surprising) to connect the dots between the NT x64
calling conventions + SEH + RIP-relative addressing and Dave Cutler's initial
input.

Regarding the Itanium itself, I've always found this excerpt quite
interesting:

"Davidson also pointed out two areas where academic research could create a
blind spot for architecture developers. First, most contemporary academic
research ignored CISC architectures, in part due to the appeal of RISC as an
architecture that could be taught in a semester-long course. Since graduate
students feed the research pipeline, their initial areas of learning
frequently define the future research agenda, which remained focused on RISC.
Second, VLIW research tended to be driven by instruction traces generated from
scientific or numerical applications. These traces are different in two key
ways from the average systemwide non-scientific trace: the numerical traces
often have more consistent sequential memory access patterns, and the
numerical traces often reflect a greater degree of instruction-level
parallelism (ILP). Assuming these traces were typical could lead architecture
designers to optimize for cases found more rarely in commercial computing
workloads. Fred Weber echoed this latter point in a phone interview.
Bhandarkar also speculated that the decision to pursue VLIW was driven by the
prejudices of a few researchers, rather than by sound technical analysis."

(Page 6 of the cited PDF.)

~~~
martinpw
> Assuming these traces were typical could lead architecture designers to
> optimize for cases found more rarely in commercial computing workloads.

At the time Itanium was being actively pushed by Intel, I was working for a
company building content-creation software. Intel was very interested in us
porting to Itanium and provided considerable help with that effort. One of
the big reasons was that they were interested in seeing our traces,
precisely for the reason you state: most of their existing test code was not
representative of typical large-scale commercial software.

We got it running but never released it: the binaries were huge (3x x86),
which was a big issue back then, and the performance just wasn't there.

------
bogomipz
From the article:

>"Intel can now focus on Xeon, which was rebranded last week to account for
new technologies like co-processors and faster interconnects."

I was confused by this; Xeon has been the mainstay of high-end workstations
and servers for what feels like forever now. Was the limited Itanium market
and its development really affecting that focus so much?

I am curious whether anyone knows which companies or verticals made big
investments in Itanium. I imagine Intel must have some pretty big Itanium
customers if it has been around this long. Maybe that answers my first
question?

~~~
zargon
Xeon wasn't losing focus to Itanium... that's just the author writing filler
material.

The main customer of Itanium that I'm aware of is the VMS operating system.
It was ported from Alpha in Itanium's early days, and anyone dependent on
VMS has been reliant on Itanium ever since. A few years ago HP paid Intel to
keep Itanium alive while VMS was being ported to x86.

~~~
bogomipz
Oh interesting, I did not know that. I have heard that the US government has
a lot of VMS systems. If that's true, then that alone would make it
worthwhile for HP to fund Itanium to some extent.

------
faragon
The AnandTech article is better [1], in my opinion.

[1] [http://www.anandtech.com/show/11372/intels-itanium-takes-one...](http://www.anandtech.com/show/11372/intels-itanium-takes-one-last-breath-9700-series-released)

------
sfifs
Question seeking to understand: why would anyone buy a tech component that
is explicitly called out as the last of its kind?

~~~
nickpsecurity
1. To keep your stuff running that's stuck on that chip. Replacements are
being developed but aren't finished for many customers. The article actually
covers this for HP's customers. OpenVMS was on Itanium as well, with an x86
port underway. SGI was on Itanium but moved to Xeon more quickly.

2. To get benefits that chip has that others don't. Secure64 is the main use
case I know of for this. For my part, I considered buying some SGI Altix
machines off eBay that were going for $100-200 each: a reliable chip and
server with built-in security features that, on top of everything, almost
nobody is targeting. Such an approach has always paid off for me in the
past.

[https://secure64.com/secure-operating-system/](https://secure64.com/secure-operating-system/)

------
chx
I much enjoyed this review of Itanium history:
[http://www.pcmag.com/article2/0,2817,2339629,00.asp](http://www.pcmag.com/article2/0,2817,2339629,00.asp)

------
my123
A 32nm chip in 2017, probably to reduce costs.

------
arnon
I'm extremely surprised Intel was working on this at all. They're probably
retiring a bunch of CPU people as a result.

------
jabl
Oh dear, won't somebody think of Paul DeMone? The poor soul must be going
through an existential crisis...

