

SSE4.2 and the new CRC32 instruction - rbranson
http://byteworm.com/2010/10/13/crc32/

======
kia
VIA has had hardware AES encryption, SHA hashing, and a hardware RNG in its
CPUs for quite some time now.

<http://www.via.com.tw/en/initiatives/padlock/hardware.jsp>

~~~
yatsyk
<http://en.wikipedia.org/wiki/AES_instruction_set> It seems Intel also has a
similar instruction set. It could be useful if you need a lot of encryption
(TrueCrypt partition encryption, etc.).

~~~
pasbesoin
I've read varying comments regarding this with respect to the Intel CORE CPU
models that have it active. Early on, a review comment or two claimed to
observe little effect e.g. with TrueCrypt. More recent comments I've seen have
claimed to observe a much greater effect; IIRC TrueCrypt 7, recently released,
is the first version to use the AES instructions when available.

P.S. Note, if you are interested in this feature, that not all levels of the
CORE product line have it. For example, in the current mobile line, I believe
it's present/activated only at the 520M level and above (although my knowledge
is some months dated, at this point).

Ah, I see this information is echoed in the Wikipedia reference:

<http://en.wikipedia.org/wiki/AES_instruction_set#CPUs_with_AES_instruction_set>

------
there
also: <http://www.strchr.com/strcmp_and_strlen_using_sse_4.2>

_SSE 4.2 introduces four instructions (PcmpEstrI, PcmpEstrM, PcmpIstrI, and
PcmpIstrM) that can be used to speed up text processing code (including
strcmp, memcmp, strstr, and strspn functions)._

------
almost
Does anyone know of any systems where generating CRC32s is a bottleneck?

~~~
rbranson
Perhaps not a bottleneck, but for very high network throughput systems,
offloading the TCP checksumming can make more CPU available for other tasks.
Often the offload engines built into network cards are either slow or have
very poorly written drivers.

~~~
almost
I forgot about TCP's use of CRC32. Still, I wonder if the time spent in CRC32
is a noticeable fraction of the total time. I guess it must be in some
situations, otherwise why would they put it in the chip...

~~~
dfox
That might be because TCP does not use a CRC but its own checksum algorithm
(faster, less reliable).
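
For reference, here is a minimal sketch of that checksum as I understand it
from RFC 1071: a 16-bit ones'-complement sum over the data. The function name
is mine, and a real TCP implementation also sums a pseudo-header (addresses,
protocol, length), which this sketch leaves out.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071.

    Illustrative only: the TCP pseudo-header is not included here.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        # Treat each pair of bytes as one big-endian 16-bit word.
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Fold any carries above bit 15 back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

For the sample bytes 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071, this returns
0x220D. Because it is just a sum, swapping two 16-bit words in the data leaves
the result unchanged, which is one reason it is weaker than a CRC.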

~~~
vilda
...and very often it is calculated by the network card itself and left blank
by the OS. (See checksum errors in Wireshark.)

------
rbanffy
A sha256 would be handy for block-level deduplication.

------
andyv
He probably should have compared one assembler program with another, but CRC
computation is so simple that the C program probably generates optimal
assembly code.

~~~
lrm242
Even though the C program might generate optimal assembly, it doesn't generate
optimal assembly for calculating a CRC32 checksum -- that's the whole point of
the article: new Intel processors have a single instruction to do this.
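
To make concrete what the instruction replaces, here is a rough bit-at-a-time
software sketch. As far as I know, the SSE4.2 CRC32 instruction uses the
Castagnoli polynomial (CRC-32C), not the polynomial used by zlib and Ethernet,
so the sketch uses that one. The function and constant names are mine, and the
initial and final inversion is the usual software convention rather than
something the instruction does itself.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reversed form


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C; pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF  # conventional initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was set, subtract (XOR) the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # conventional final inversion
```

The inner eight-step loop is the work that the hardware instruction performs
for a whole byte (or a wider operand) at once. The standard check value for
the ASCII string "123456789" is 0xE3069283.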

