
Hardware-accelerated text processing in SSE 4.2 - mbrubeck
http://blog.reverberate.org/2009/07/18/gazelle-is-going-to-love-sse-4-2/
======
mahmud
Hennessy, Sequin and Patterson must be rolling in their office chairs or
retirement hammocks right about now.

Whoever _needed_ this could have gotten it in custom hardware; Intel's
"instruction set is an API" is a really bad idea.

Can someone convince me why we need this in the processor?

~~~
DarkShikari
One thing you'll often find is that processor vendors like to release
instructions that expose already-existing capability within the chip that
previously wasn't directly accessible.

I doubt this takes a remotely significant amount of silicon.

Plus, it's all part of the XML hype: people include XML-related technologies
in software and now even hardware (they've marketed these as, among other
things, accelerators for XML parsing) solely for the purpose of being
"Enterprise-grade".

Here's a fun fact for you: a very very major network hardware vendor sells
extraordinarily expensive boards with "hardware XML parsing" (that's how they
can charge so much). In reality, it turned out to be a nightmare to implement,
so they just did it in software--and didn't tell their customers.

~~~
mahmud
It's the _exposure_ that's bad, not the existence of the capability in
silicon. Think of all the programmer-hours that will go into retrofitting
compilers, libraries and runtimes with "hardware-accelerated text processing",
just because some bonehead PHB heard this news and suddenly _wants_ it.

They could have done the right thing and started stripping x86-32 cruft from
x86-64 as it gained more traction. As time moved on, we would have had a clean
architecture with features being _deprecated_ instead of being added.

~~~
DarkShikari
_Think of all the programmer-hours that will go into retrofitting compilers,
libraries and runtimes with "hardware-accelerated text processing"_

Don't worry, odds are almost nobody will actually use it.

 _They could have done the right thing and started stripping x86-32 cruft from
x86-64 as it gained more traction._

There isn't really any "cruft" that matters; the old useless instructions do
nothing but waste instruction-encoding space, and that's not a big issue. The
real significant improvement would come from re-doing x86 as a three-operand
architecture (and making other similar changes that would be impossible to do
"bit by bit"). Another potential improvement would be to re-do the instruction
encoding to make it faster to parse; instruction decoding is already becoming
a significant bottleneck on x86. If Intel were going to do an overhaul like
that, though, they'd do it all at once.

AVX is actually beginning to go that way; we're going to have three-operand
for SIMD, even though we won't have it for regular instructions.

~~~
limmeau
What's the difference between "re-doing x86 as a three-operand architecture"
and switching to a different 64-bit RISC architecture?

~~~
DarkShikari
The former could probably be done without a complete retooling of the chip
designs.

~~~
limmeau
I don't think it will happen, though; the ratio of people who care about the
elegance of their processor's assembly language to people who buy computers is
just too low.

Not in the form of a clean cut, at least: x86-64 brought us eight additional
registers and removed the silly BCD instructions.

------
DarkShikari
Remember, the #1 rule of new instruction sets released by Intel (at least in
the past ~5 years) is that they are always close to useless in the first
architecture that supports them. For example, PHADDW took 6 cycles on the Core
2 Conroe, making it almost useless in real code. But the Penryn doubled its
speed, making it potentially useful.

The string operations are, last I saw, something on the order of 9 cycles
latency, making them rather unfortunately slow in practice... fitting
perfectly with the trend mentioned above.

