
The CRC Wild Goose Chase - nkurz
http://www.embeddedrelated.com/showarticle/669.php
======
mrb
It is strange that the author stops short of making a recommendation for a CRC
algorithm. He mentions he likes the one used in PKZIP and PNG, but out of the
dozens of options in
[http://en.wikipedia.org/wiki/Cyclic_redundancy_check](http://en.wikipedia.org/wiki/Cyclic_redundancy_check)
that is not the one I would recommend. Instead CRC-32C, named after its
designer Castagnoli, is probably the best choice:

CRC-32C is the algorithm that Intel implemented in its processors ("CRC32"
SSE4.2 instruction). CRC-32C is also used by both ext4 and btrfs for bit rot
detection. The first and most prominent user of CRC-32C was iSCSI.

Its mathematical definition (at least one of the most "official" definitions)
is found in
[http://tools.ietf.org/html/rfc3720#section-12.1](http://tools.ietf.org/html/rfc3720#section-12.1)
as a generator given in hex. The RFC gives 2 other references: an analysis
where it was first published and discussed
([http://ieeexplore.ieee.org/ielx1/26/5993/00231911.pdf?tp=&ar...](http://ieeexplore.ieee.org/ielx1/26/5993/00231911.pdf?tp=&arnumber=231911&isnumber=5993))
and another RFC
([http://tools.ietf.org/html/rfc3385](http://tools.ietf.org/html/rfc3385))
that contains test vectors and implementations. However, those are hardware
implementations, not software. Having to consult this many documents for a
complete spec (definition + code + test vectors) is annoying... I share the
author's frustration on this point.
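
For reference, the definition is small enough to sketch in software. Below is
a minimal bit-at-a-time Python version. It assumes the usual conventions for
CRC-32C: the reflected form (0x82F63B78) of the RFC's generator 0x11EDC6F41,
an all-ones initial value, and a final XOR with all ones. The name `crc32c`
is only illustrative and the code is not taken from either RFC.

```python
# Reflected (bit-reversed) form of the CRC-32C generator 0x1EDC6F41.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32C (Castagnoli); slow, but easy to check by hand.

    Pass the previous result as `crc` to continue over a further chunk.
    """
    crc ^= 0xFFFFFFFF  # apply the all-ones initial value (or undo the last final XOR)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    # Commonly quoted check value for the ASCII string "123456789".
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```

A real implementation would use a lookup table or the SSE4.2 instruction
instead of the inner bit loop.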

If you need more than 32 bits for an integrity check, a CRC is probably not
what you need. (If you want to prevent intentional, malicious tampering, use a
cryptographic hash function.)

------
Isamu
CRCs (and other error-checking codes) are routinely misused and misunderstood
by the engineers who implement systems with them.

For instance, how do you know a CRC will detect the errors you care about? For
that matter, do you have any idea of the most common errors your system will
have to deal with?

Error-detecting codes have an implicit model for the kinds of errors they will
effectively detect. They are valid for systems that fit this model, and less
effective (possibly staggeringly ineffective) for systems that don't fit that
model.

CRCs date from the days of serial communications, and they were well suited to
the kinds of errors you would see: bit errors that were independent and
randomly distributed, and short burst errors. If you conducted experiments to
measure the probability distribution of errors on your communication medium,
you could calculate the probability of an undetected error with a given code.
Without that you have _no model_ of the reliability of your communication
system.

For instance, with CDs (or DVDs) they didn't just slap a CRC on there. They
made a model of the kinds of errors they expected: in this case, big scratches
on a disc creating very long burst errors, which they needed to be able to not
just detect but also correct. Based on the recording density, they made a
reasonable estimate of the length of the errors they had to deal with.

To do this they used a powerful Reed-Solomon code, and interleaved the blocks
in order to handle the extremely long burst error requirement, and meet their
reliability goals.

Look at it another way. So you slap a CRC on a file. Great, now it can detect
one- or two-bit errors, and burst errors as big as the CRC. Are those the most
common errors in your file system? That is, what is the likelihood that the
first file corruption you see will be of a kind that your CRC will detect with
only a low probability? File corruptions are poorly characterized.
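
The "burst errors as big as the CRC" part is easy to check empirically. The
sketch below uses Python's `zlib.crc32` (the PKZIP/PNG CRC-32) and XORs random
error bursts of a given width into a message at every bit offset, counting how
many leave the CRC unchanged. The helper names `xor_burst` and
`undetected_bursts` are made up for this example, and the bit numbering
(least significant bit of each byte first) is chosen to match the order in
which this reflected CRC consumes bits.

```python
import random
import zlib


def xor_burst(data: bytes, start_bit: int, pattern: int, width: int) -> bytes:
    """XOR a `width`-bit error pattern into `data`, starting at `start_bit`."""
    buf = bytearray(data)
    for i in range(width):
        if (pattern >> i) & 1:
            pos = start_bit + i
            buf[pos // 8] ^= 1 << (pos % 8)  # LSB-first within each byte
    return bytes(buf)


def undetected_bursts(data: bytes, width: int,
                      trials_per_offset: int = 10, seed: int = 0) -> int:
    """Count random bursts of exactly `width` bits that the CRC fails to catch."""
    rng = random.Random(seed)
    good = zlib.crc32(data)
    missed = 0
    for start in range(len(data) * 8 - width + 1):
        for _ in range(trials_per_offset):
            # Force the first and last bits on so the burst spans `width` bits.
            pattern = rng.getrandbits(width) | 1 | (1 << (width - 1))
            if zlib.crc32(xor_burst(data, start, pattern, width)) == good:
                missed += 1
    return missed


if __name__ == "__main__":
    message = b"The quick brown fox jumps over the lazy dog"
    print(undetected_bursts(message, 1))   # single-bit flips: expected 0
    print(undetected_bursts(message, 32))  # 32-bit bursts: expected 0
```

This says nothing about wider or scattered corruption, which is exactly the
case where you need a model of your real errors.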

Still, a CRC is cheap and better than nothing, so we slap them on.

Anyway if you like math at all, I recommend delving into codes, it really is
fun and fascinating.

------
PhantomGremlin
CRCs make sense for a quick sanity check for short messages sent over serial
links. But nowadays there's a much better, very well specified, method for
checking the integrity of data blocks or files, and that's SHA-256. I don't
think that anyone has found even a single example of a collision for it. (Yes,
of course collisions exist; by the pigeonhole principle they're guaranteed
once inputs can be longer than the 32-byte digest.)

SHA-256 is truly a "no brainer" for a lot of different situations.

~~~
tyho
This is not what CRCs are used for. CRCs are faster to compute, which really
matters on Gbit network links. Also, CRCs can allow you to not only detect
errors but fix them within a certain threshold without retransmitting.

~~~
PhantomGremlin
You're of course correct that CRCs are very useful for low level operations
such as individual Ethernet packets.

But I was trying to respond to scenarios such as the one at the start of the
article:

    You want to send data from point A to point B.
    ...
    you should use a CRC so you can detect errors.

I think that's obsolete thinking. Yes, you can use a CRC for checking
individual blocks being transferred. But you're much better off also verifying
a complete data transfer using something like SHA-256.
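
A rough sketch of that layering, in Python: a CRC-32 per block to decide what
to retransmit, plus one SHA-256 over the whole transfer. The function names
`frame` and `receive` and the block size are hypothetical, chosen only for
this example.

```python
import hashlib
import zlib


def frame(payload: bytes, block_size: int = 4096):
    """Split the payload into (block, crc32) pairs and hash the whole thing."""
    blocks = []
    for offset in range(0, len(payload), block_size):
        block = payload[offset:offset + block_size]
        blocks.append((block, zlib.crc32(block)))
    return blocks, hashlib.sha256(payload).hexdigest()


def receive(blocks, expected_digest: str) -> bytes:
    """Check each block's CRC, then verify the complete transfer with SHA-256."""
    whole = hashlib.sha256()
    for index, (block, crc) in enumerate(blocks):
        if zlib.crc32(block) != crc:
            raise ValueError(f"block {index} failed its CRC; retransmit it")
        whole.update(block)
    if whole.hexdigest() != expected_digest:
        raise ValueError("whole-transfer SHA-256 mismatch")
    return b"".join(block for block, _ in blocks)
```

Note that if two blocks arrive swapped, every per-block CRC still passes and
only the whole-transfer hash catches it.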

When PKZIP stored a 32-bit CRC together with a file 25 years ago, that was
great for its time. But nowadays an archiver should use something more robust,
such as SHA-256.

I've personally encountered miscorrections on the old ST-506 disk drives
because the 32-bit CRC used was junk. People like Neal Glover gave us much
better computer-generated polynomials. The industry switched to better schemes
such as cross-interleaved Reed-Solomon. And that was the extent of the error
detection and correction.

But now we know better. Given how close to the bleeding edge disk drives are,
it's probably necessary that they continue to do error correction for each
sector (although that _has_ been debated). But a _system_ is better off
overlaying that with software such as ZFS, which can use SHA-256 checksums on
individual blocks and chains them up to the root node, thereby creating a
Merkle tree that provides very strong data integrity guarantees.
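
To illustrate the idea only (this is not ZFS's actual on-disk layout), here is
a minimal Merkle root over a list of blocks using SHA-256. The function name
`merkle_root` is made up, and duplicating the last hash on odd-sized levels is
a simplification assumed for this sketch.

```python
import hashlib


def merkle_root(blocks: list) -> bytes:
    """Hash each block, then hash pairs of hashes until one root remains."""
    if not blocks:
        raise ValueError("need at least one block")
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: pad odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Changing any single block changes the root, so checking the root checks
everything beneath it.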

That's the point I was trying to make. In the old days we used CRC for
everything, because we didn't know better. Nowadays we can continue to use CRC
at the lower levels, but we should add SHA-256.

------
ksherlock
The last time I needed a CRC, I used this
([http://www.tty1.net/pycrc/](http://www.tty1.net/pycrc/)) to generate the
code that computes it.
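
For anyone curious what the table-driven approach looks like, here is a
hand-written Python sketch for the PKZIP/PNG CRC-32. This is not pycrc's
output, and the names `make_table` and `crc32_table` are just illustrative;
the result is compared against Python's own `zlib.crc32`.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # bit-reversed form of 0x04C11DB7


def make_table(poly: int = CRC32_POLY_REFLECTED) -> list:
    """Precompute the CRC of every possible byte value."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
        table.append(crc)
    return table


_TABLE = make_table()


def crc32_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32: one lookup per input byte instead of eight shifts."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    sample = b"123456789"
    print(hex(crc32_table(sample)))                 # 0xcbf43926
    print(crc32_table(sample) == zlib.crc32(sample))  # True
```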

