
The Itanium processor, part 1: Warming up - ingve
http://blogs.msdn.com/b/oldnewthing/archive/2015/07/27/10630772.aspx
======
nickpsecurity
One of the best benefits of Itanium is its security features:

[http://www.intel.com/content/dam/www/public/us/en/documents/...](http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/intel-itanium-secure-white-paper.pdf)

These are rarely discussed. However, Secure64's SourceT OS got plenty of
mileage out of them. You might not be able to argue for Itanium on cost,
performance, or ease of use. However, one could argue for it as a better start
on secure OS's or appliances. Unlike academic prototypes, it's also in
production with high speed and reliability. Let's also not forget you can do
reverse stacks and other bug prevention tricks with less performance hit or
clunkiness on a RISC architecture vs x86.

Personally, I'd just rather them have modernized the i960MX, minus other BiiN
stuff:

[https://en.wikipedia.org/wiki/BiiN](https://en.wikipedia.org/wiki/BiiN)

Targeting a robust, UNIX-compatible OS and C toolchain to it might have let it
survive and get continually updated. Then, when HLL's got popular, we'd have a
good hardware target for them that supported POLA inside applications. As
usual, I'm speaking of great stuff in a past or never-happened tense. At least
Itanium made it this far.

~~~
wglb
Here is an interesting (not) security feature:
[https://news.ycombinator.com/item?id=9956044](https://news.ycombinator.com/item?id=9956044)

~~~
nickpsecurity
Not doing it fully during a procedure call is an optimization. I responded to
the comment with links from 2002 and 2005 that showed this along with the proper
way to save everything to memory. Shouldn't have been a surprise to someone
reading up on it.

~~~
wglb
So they used a securely designed component and got bit anyway.

~~~
nickpsecurity
That wasn't the security features. It was the features not designed for
security and targeting an audience wanting performance enhancements. The kind
of thing that often leads to unexpected problems. ;)

------
rwmj
You can buy old Itanium hardware super-cheap now on eBay. I got an HP
Integrity RX2620 for £58 which included tax and delivery.

As hardware goes it's .. interesting. It's fast. But it uses huge amounts of
power and requires massive cooling (if you disable any of the 4 or 5 fans in
my 2U machine, it overheats in 5 minutes). It has early EFI which should be
quite familiar if you've used UEFI on the command line. And it has excellent
iLO / remote / serial support so it's great practice for learning about
enterprise ops.

It's getting hard to find software that runs on it. The last Debian (Wheezy)
runs, but current Debian has dropped ia64 support. RHEL dropped support years
ago. You'll find there are lots of strange bugs because people no longer test
their software on this platform.

[https://rwmj.wordpress.com/2015/05/03/raise-the-itanic-part-...](https://rwmj.wordpress.com/2015/05/03/raise-the-itanic-part-2/#content)

[https://rwmj.wordpress.com/2014/09/08/raise-the-itanic/#cont...](https://rwmj.wordpress.com/2014/09/08/raise-the-itanic/#content)

~~~
zokier
> if you disable any of the 4 or 5 fans in my 2U machine, it overheats in 5
> minutes

Huh, I would have expected better redundancy given how expensive the hardware
must have been at the time.

~~~
rwmj
I agree it's strange. If you pull any fan, the machine hard shuts down a few
minutes later (it even warns you of this in a note printed inside the case).
Presumably if a fan fails and you don't manage to get to it in a few minutes,
then you're out of luck. The only good thing is that it _is_ possible to hot-
swap a fan in a few seconds if the fan is failing-but-not-failed (if that ever
happens - it seems unlikely).

Edit: Would love to know what the list price of my machine was back in 2006.
Probably thousands ...

~~~
jacquesm
> Probably thousands ...

Try 15K euros!

~~~
rwmj
Ouch! Thanks. I wonder if anyone ever paid list price for these? AIUI many
were given away for educational use and the like.

~~~
jacquesm
I never got further than a quote. The most exotic hardware I actually bought
were a DEC Alpha and a whole bunch of SGI gear (at considerable discount).

------
apaprocki
Brings back memories.. When running SpiderMonkey (interpreter, not JIT) on
IA64 it would randomly crash and burn with what looked like GC issues. Values
were being collected even though they were still in use. It turned out the
mark-and-sweep collector that would scan the registers and stack was working
properly, but was not aware that IA64 would not write out all of its registers
to the buffer provided to `setjmp()`. You would have to send the processor a
`flushrs` instruction to tell it to flush all stacked general registers in the
"dirty" partition (not yet written to the backing store) of the register stack
to the backing store. After that, you'd need to get the exact pointers to the
register backing store and then scan those. Fun times.
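
For anyone who hasn't run into this, here is a rough C sketch of the pattern. It is not SpiderMonkey's code: `scan_range`, `scan_register_roots`, and `rbs_base` are made-up names, and the IA-64 branch is an untested assumption about how one might flush and locate the backing store with GCC-style inline asm. On other targets the sketch only scans the `jmp_buf`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: count the word-sized slots in [lo, hi) that hold
 * `target`. A real collector would mark the object instead of counting. */
size_t scan_range(const void *lo, const void *hi, uintptr_t target)
{
    const unsigned char *p = (const unsigned char *)lo;
    const unsigned char *end = (const unsigned char *)hi;
    size_t hits = 0;

    assert(p <= end);
    while ((size_t)(end - p) >= sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, p, sizeof word);  /* read the slot without aliasing issues */
        if (word == target)
            hits++;
        p += sizeof(uintptr_t);
    }
    return hits;
}

#if defined(__ia64__)
/* Assumption, untested: flush the dirty partition of the register stack
 * to the backing store, then read ar.bsp to learn where the spilled
 * registers end. Kept out of line so the caller's frame belongs to the
 * dirty partition when flushrs executes. */
static __attribute__((noinline)) void *ia64_flush_register_stack(void)
{
    void *bsp;
    __asm__ __volatile__("flushrs" ::: "memory");
    __asm__ __volatile__("mov %0=ar.bsp" : "=r"(bsp));
    return bsp;
}
#endif

/* Conservative scan of register contents. setjmp() spills the registers
 * it saves into `regs`; on IA-64 that is not enough, so the register
 * backing store is scanned as well. `rbs_base` (start of the backing
 * store) is assumed to come from the OS or runtime. */
size_t scan_register_roots(const void *rbs_base, uintptr_t target)
{
    jmp_buf regs;
    size_t hits;

    memset(regs, 0, sizeof regs);
    setjmp(regs);
    hits = scan_range(regs, (const unsigned char *)regs + sizeof regs, target);
#if defined(__ia64__)
    hits += scan_range(rbs_base, ia64_flush_register_stack(), target);
#else
    (void)rbs_base;
#endif
    return hits;
}
```

A real collector would also scan the ordinary memory stack, and some C libraries mangle a few of the values they store in the `jmp_buf`, so treat this as an illustration of the idea only.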

~~~
nickpsecurity
That comes with the RSE it uses for performance enhancement. The flushrs
requirement, etc were mentioned here:

Microsoft on RSE (2002)
[https://web.archive.org/web/20021018050724/http://portals.de...](https://web.archive.org/web/20021018050724/http://portals.devx.com/Intel/Article/6834/0/page/1)

Smotherman's notes (2002)
[http://people.cs.clemson.edu/~mark/subroutines/itanium.html](http://people.cs.clemson.edu/~mark/subroutines/itanium.html)

USENIX presentation on Itanium (2005)
[https://www.usenix.org/legacy/events/usenix05/tech/general/g...](https://www.usenix.org/legacy/events/usenix05/tech/general/gray/gray_html/)

I could see how you wouldn't expect it coming from another ISA and they could
be a bit more explicit. It was weird. It was documented, though, by different
people building on Itanium. Different workloads use it in a different way for
efficiency.

~~~
apaprocki
Sure.. it was straightforward to track down once you see that a value is in a
register but is being collected anyway. I seem to remember the other fun bits
were the fact that function pointers were not actually function pointers..
they were pointers into a giant lookup table that contained a struct that
contained the actual address of the call. Also, unwinding the stack was so
complicated that you could not reasonably do it manually like on nearly any
other architecture -- you needed to link in an Intel library to do it. None of
these things were giant issues -- it just showed how much of a departure from
other prevalent systems it was.

~~~
nickpsecurity
Wow, that does sound like a painful learning curve. Curious, were the function
pointer problems inherent to the architecture or the tool/lib you used? Might
be worth documenting in case a reader stumbles upon this before going through
what you did.

~~~
apaprocki
If you want all the nasty details, this post covers it all:

"This also has a side-effect on function pointers. Since function pointers are
generally used at some distance from allocation, they might be used in a
module with a different gp value. The compiler gets around this by not
compiling a function pointer to a single pointer-sized value; it compiles to a
pair of pointer-sized values, one representing the address of the first
instruction (bundle on IA64) in the function, the other being the correct gp
value to use."

[http://mikedimmick.blogspot.com/2004/01/ia64s-global-pointer...](http://mikedimmick.blogspot.com/2004/01/ia64s-global-pointer.html?m=1)
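
A toy model of that in portable C, in case it helps. The struct and function names are invented for illustration. On real IA-64 the caller loads gp into a register (r1) before branching, whereas here it is passed as an ordinary argument so the idea can run anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a module's global data, reached through gp. */
struct module_globals {
    long bias;
};

/* Model of a function descriptor: what an IA-64 "function pointer"
 * points at. Field names are hypothetical. */
struct func_desc {
    long (*entry)(const void *gp, long arg);  /* address of the first bundle */
    const void *gp;                           /* gp value of the owning module */
};

/* A function that reads its module's globals through gp. */
long add_bias(const void *gp, long arg)
{
    const struct module_globals *g = (const struct module_globals *)gp;
    return arg + g->bias;
}

/* Indirect call: fetch both halves of the descriptor, install gp, then
 * jump to the entry point. */
long call_indirect(const struct func_desc *fd, long arg)
{
    assert(fd != NULL && fd->entry != NULL);
    return fd->entry(fd->gp, arg);
}
```

Two descriptors that share the same `entry` but carry different `gp` values give different results, which is why comparing or casting function pointers as if they were raw code addresses breaks on this kind of ABI.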

~~~
nickpsecurity
Wow. That's a huge mess of stuff to mentally track just for some optimization.
To be honest, I'd probably just end up writing a macro-assembler and coding it
directly against a C reference implementation just to avoid all the nonsense
lol.

------
nickpsecurity
HN readers interested in VLIW architectures might find the TRIPS project a
good read:

[http://www.cs.utexas.edu/~trips/overview.html](http://www.cs.utexas.edu/~trips/overview.html)

It tries to avoid some pitfalls of architectures such as Itanium. Seems fairly
complex to me, though. That might be inherent in EDGE and VLIW designs.

~~~
Symmetry
And there's the Mill if you're willing to be somewhat more exotic.
[http://millcomputing.com/](http://millcomputing.com/)

~~~
nickpsecurity
It's a very interesting architecture that I've heard about on paper for a
while now but never seen on silicon. Have they done anything with it on
ASIC/FPGA or is it vaporware for now?

Talking exotic, look at No Instruction Set Computing (NISC) which was at least
prototyped and published synthesis/compilers:

[https://web.archive.org/web/20080302041756/http://www.ics.uc...](https://web.archive.org/web/20080302041756/http://www.ics.uci.edu/~nisc/)

Really interesting stuff. Reminded me of Tensilica's tools that create a
custom processor for your application. Need to accelerate your Hadoop, etc
application? Run most of it on Intel CPU with an onboard FPGA & NISC tools
doing the critical path. Intel's Altera acquisition might make something like
that achievable in future.

Note: Used archive because their site is having a configuration error.

~~~
WallWextra
Last I heard, the Mill guys don't even have a compiler working, but do have a
simulator. Not sure what they're simulating.

~~~
wtallis
Their June talk was about their compiler and toolchain, which borrows heavily
from LLVM. It's almost certainly not done, but they do have something more
than just an assembler. They've also started working on implementing it for
FPGA, but only as a proof of concept rather than something intended to be
ready to make into an ASIC.

------
protomyth
There are some embedded chips that use VLIW. RISC chips need to do instruction
scheduling to really get their performance up, but for a lot of embedded tasks
those extra transistors are a waste compared to a VLIW chip where the software
can actually do the scheduling well (unlike what happened to the Itanium).
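
To make "the software does the scheduling" concrete, here is a toy greedy scheduler in C. It is only a sketch: `struct op`, `schedule`, and the two-dependency limit are invented for the example, every op is assumed to take one cycle, and real VLIW compilers also have to model latencies and functional-unit types.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 16
#define NO_DEP  (-1)

/* Hypothetical op: depends on at most two earlier ops, named by index. */
struct op {
    int dep[2];
};

/* Pack ops into bundles so that an op issues in a later bundle than
 * everything it depends on and no bundle holds more than `width` ops.
 * Ops must be listed with dependencies first. Writes each op's bundle
 * into bundle_of[] and returns the number of bundles used. */
int schedule(const struct op *ops, int n, int width, int *bundle_of)
{
    int used[MAX_OPS] = {0};  /* ops already placed in each bundle */
    int bundles = 0;

    assert(n >= 0 && n <= MAX_OPS && width > 0);
    for (int i = 0; i < n; i++) {
        int earliest = 0;
        for (int k = 0; k < 2; k++) {
            int d = ops[i].dep[k];
            if (d != NO_DEP) {
                assert(d >= 0 && d < i);
                if (bundle_of[d] + 1 > earliest)
                    earliest = bundle_of[d] + 1;
            }
        }
        int b = earliest;
        while (used[b] >= width)  /* bundle full: slide to the next one */
            b++;
        used[b]++;
        bundle_of[i] = b;
        if (b + 1 > bundles)
            bundles = b + 1;
    }
    return bundles;
}
```

For five ops where op 2 needs ops 0 and 1 and op 4 needs op 2, this fits everything into 3 bundles at width 2, versus 5 bundles at width 1. All of that is decided before the program runs, so the chip needs no scheduling hardware.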

~~~
pcwalton
In fact, you're quite possibly using a VLIW chip right now: your GPU.

~~~
protomyth
I thought most of them switched when the whole GPGPU movement happened?

~~~
jcranmer
AMD is a VLIW architecture, NVidia is not.

Correction: AMD GPUs were VLIW when I took a class on GPGPU in 2011.
Apparently, AMD subsequently switched from VLIW:
[https://en.wikipedia.org/wiki/Graphics_Core_Next](https://en.wikipedia.org/wiki/Graphics_Core_Next).

------
octatoan
What separates Itanium from other processors?

~~~
noipv4
VLIW architecture with predicated execution.

~~~
mappu
I don't think predicated execution is unheard of, ARM has the same concept
(although it's limited to flags instead of what seems like one-bit registers).
The feature has since been dropped in AArch64 though.
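
For readers who haven't seen predication, a rough C illustration of the idea (if-conversion). The function names are made up, and a compiler is free to turn either version into the same machine code. The point is only the shape: compute a predicate once, then let it select the result instead of branching.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy version: control flow depends on the comparison. */
uint32_t min_branch(uint32_t a, uint32_t b)
{
    if (a < b)
        return a;
    return b;
}

/* If-converted version: the comparison produces a predicate, both
 * candidate results are available, and the predicate picks one. */
uint32_t min_predicated(uint32_t a, uint32_t b)
{
    uint32_t p = 0u - (uint32_t)(a < b);  /* all ones if a < b, else zero */
    return (a & p) | (b & ~p);
}

/* Sanity check: both versions agree on a small range of inputs. */
void check_min_versions(void)
{
    for (uint32_t a = 0; a < 8; a++)
        for (uint32_t b = 0; b < 8; b++)
            assert(min_branch(a, b) == min_predicated(a, b));
}
```

On Itanium the predicate lives in one of the one-bit predicate registers and guards whole instructions, while 32-bit ARM keys conditional instructions off the condition flags.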

