
Reed Solomon codes are cool - djhworld
https://djhworld.github.io/post/2019/02/24/reed-solomon-codes-are-cool/
======
subjoriented
Wow. I love explaining Reed Solomon codes and this didn't do it justice.

The basic concept is that if I give you five points all on the same line, you
only need any two of them to reconstruct the line.

Reed-Solomon doesn't use lines (they aren't optimal), and the geometry it
uses isn't the Cartesian plane (that would require infinite precision; it
works over finite fields instead) - but this is what the codes do!

Just like it requires two points to reconstruct a line, three points are
required to reconstruct a simple polynomial (degree 2). Reed Solomon uses
"high degree polynomials" which require n+1 points to reconstruct the degree-n
polynomial.

The codes take data, make a polynomial out of it, and then send points from
the polynomial to the other side, which can interpolate it as soon as enough
points are gathered!

All of this looks like impenetrable discretization and matrix operations if
you just get a description of the algorithms, which makes it seem a lot less
approachable than it is.
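
A toy sketch of exactly this idea in Python, using exact rationals to dodge the precision problem (real Reed-Solomon does the same thing over a finite field; the data values here are arbitrary):

```python
from fractions import Fraction

def interpolate(points, x):
    """Lagrange interpolation: evaluate at x the unique polynomial of
    degree < len(points) passing through the given (xi, yi) points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

# "Data" = values at x = 0, 1, 2; redundancy = two extra points taken
# from the same degree-2 polynomial.
data = [(0, 2), (1, 3), (2, 5)]
extra = [(x, interpolate(data, x)) for x in (3, 4)]

# Any 3 of the 5 points reconstruct the polynomial, hence the data:
subset = [data[1], extra[0], extra[1]]
assert [interpolate(subset, x) for x in (0, 1, 2)] == [2, 3, 5]
```

Losing any two of the five points is harmless, which is the whole trick.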

~~~
wlesieutre
I remember playing a computer game in math class where you were presented with
an XY plane and had to "hit" points (really circles, it didn't need to be
exact) by plotting equations through them. With some trial and error you could
make a polynomial go through basically as many as you wanted; it's neat to see
this applied to a real problem!

~~~
kragen
Doing this kind of thing was my first introduction to Lagrange interpolation,
too, but it turns out that you can do Lagrange interpolation without trial and
error; you can just use linear algebra, since the points are a linear function
of the coefficients, regardless of the degree of the polynomial.
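
A toy sketch of that linear-algebra view (over the rationals for readability; an RS decoder does the same elimination in a finite field):

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over the rationals (no pivot heuristics)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]   # normalize pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [row[-1] for row in M]

# A point (x, y) on p(t) = c0 + c1*t + c2*t**2 is linear in (c0, c1, c2),
# so n points give the Vandermonde system V c = y -- no trial and error.
xs, ys = [0, 1, 2], [2, 3, 5]
V = [[x ** j for j in range(len(xs))] for x in xs]
coeffs = solve(V, ys)
assert coeffs == [2, Fraction(1, 2), Fraction(1, 2)]
```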

This was in fact the first decoding algorithm for Reed–Solomon codes, but it's
not very efficient when you don't know which of your points are the ones with
the corrupted data; you kind of have to guess, which gets expensive fast when
you're trying to tolerate more than one or two errors.

Although it turns out that there are better algorithms for that, the most
commonly used RS codes _don't_ actually encode a series of points on a
polynomial, as subjoriented's comment at the root of this thread suggests;
instead they are so-called "BCH codes", where you consider the data you
transmit (both the original data and the "parity" symbols) to be actually a
sequence of _coefficients_. So where does the redundancy come from? After all,
any set of numbers is valid as the coefficients of a polynomial.

BCH codes require the polynomial to be _divisible_ by a known _generator
polynomial_, and that's where you get the redundancy you need to correct
errors. Gorenstein and Zierler published an efficient error-correction
("decoding") algorithm for BCH codes in 1960; the first efficient decoding
algorithm for original-view RS codes (where you transmit a series of points)
wasn't found until 1986, and even today, BCH-code decoding algorithms are more
efficient than original-view decoding algorithms.

The Gorenstein–Zierler algorithm works by evaluating the received polynomial
at the roots of the generator polynomial. Since it's supposed to be a multiple
of the generator, it should be zero at those roots, as the generator is; if
it's nonzero, the values at those roots are due to some "error polynomial"
that's been conceptually _added_ to your message in transit. If you suppose
that it has only a few nonzero coefficients, there's a clever way to compute
the error polynomial from those values that should have been zero. This allows
you to subtract it from the received message to correct the errors.

At least, I think that's how it works. I haven't implemented an RS decoder
yet, so I might be getting some of this wrong.

But something like that was what I was actually hoping to read at the above
link.
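
To make the BCH view above concrete, here is a toy single-error corrector over the prime field GF(929) with α = 3 (the modulus, the generator with roots α and α², and the three-symbol message are all assumptions chosen for readability; real codecs work in GF(2^8) and handle multiple errors with Berlekamp-Massey-style machinery):

```python
P = 929      # a prime, so arithmetic mod P is a field
ALPHA = 3    # generator polynomial roots: ALPHA and ALPHA**2

def poly_eval(coeffs, x):
    """Evaluate sum(c[i] * x**i) mod P (Horner's rule, ascending coeffs)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def correct_single_error(received):
    # Syndromes: the received polynomial evaluated at the generator's roots.
    s1 = poly_eval(received, ALPHA)
    s2 = poly_eval(received, ALPHA ** 2)
    if s1 == s2 == 0:
        return received                        # divisible by g: no error seen
    loc = s2 * pow(s1, -1, P) % P              # for error e*x**i: s2/s1 = ALPHA**i
    i = next(k for k in range(len(received)) if pow(ALPHA, k, P) == loc)
    e = s1 * pow(pow(ALPHA, i, P), -1, P) % P  # error magnitude
    fixed = list(received)
    fixed[i] = (fixed[i] - e) % P
    return fixed

# Encode by multiplying the message by g(x) = (x - ALPHA)(x - ALPHA**2),
# so every valid codeword is divisible by g -- the BCH view described above.
g = poly_mul([-ALPHA % P, 1], [-ALPHA ** 2 % P, 1])
codeword = poly_mul([1, 2, 3], g)
received = list(codeword)
received[2] = (received[2] + 100) % P          # corrupt one coefficient
assert correct_single_error(received) == codeword
```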

~~~
nayuki
I explain all the linear algebra involved in the Peterson-Gorenstein-Zierler
decoder and offer a working implementation at
[https://www.nayuki.io/page/reed-solomon-error-correcting-code-decoder](https://www.nayuki.io/page/reed-solomon-error-correcting-code-decoder);
discussion:
[https://news.ycombinator.com/item?id=13565939](https://news.ycombinator.com/item?id=13565939)

It runs in O(n^3) time, and unfortunately is not the state of the art. What
actual Reed-Solomon implementations use are the Berlekamp-Massey and Forney
algorithms, which run in O(n^2) time.

~~~
abraae
Incredible. This is the sort of thing my non-programmer friends think that my
work entails. Thank god it doesn't.

------
317070
If this interests you, make sure to look into Golay codes:
[https://en.wikipedia.org/wiki/Binary_Golay_code](https://en.wikipedia.org/wiki/Binary_Golay_code)
This code is a mathematical miracle. Apparently, the number theory comes
together exactly so as to let you encode 12 bits into 23 bits such that you
can correct every possible 3-bit error, but not every possible 4-bit error
(which means none of the redundancy is wasted). AND, used purely for
detection, it catches every possible 6-bit error, though not every possible
7-bit error.

That is very efficient. So in exchange for storing all your data about twice,
you can detect any error where up to ~26% of the bits are flipped. You can
even _correct_ any error where up to ~13% of the bits were flipped!

I still consider it magic that such an efficient and 'perfect' code exists.
Unfortunately, there is a proof that it is the best one, and that there is no
bigger perfect one.
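
The 'perfect' claim is quick to check with a couple of lines of arithmetic: a radius-3 Hamming ball in 23-bit space holds exactly 2^11 words, so the balls around the 2^12 codewords tile the whole space with nothing left over:

```python
from math import comb

# Words within Hamming distance 3 of any fixed codeword:
ball = sum(comb(23, i) for i in range(4))  # 1 + 23 + 253 + 1771
assert ball == 2 ** 11
# 2**12 codewords, each owning a disjoint radius-3 ball, cover everything:
assert 2 ** 12 * ball == 2 ** 23
```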

~~~
jacobolus
Everything related to the sporadic groups is crazy
[https://en.wikipedia.org/wiki/Sporadic_group](https://en.wikipedia.org/wiki/Sporadic_group)

------
aaaaaaaaaaab
This post should be titled “Error correcting codes are cool”, as it contains
zero bits of information that’s specific to Reed-Solomon codes.

If you’re interested in Reed-Solomon codes (and other error correcting codes),
I recommend reading Huffman’s (NB: _not_ the same Huffman as in Huffman
codes!) textbook on the subject:

Fundamentals of Error Correcting Codes

[https://www.amazon.com/Fundamentals-Error-Correcting-Codes-Cary-Huffman/dp/0521131707](https://www.amazon.com/Fundamentals-Error-Correcting-Codes-Cary-Huffman/dp/0521131707)

~~~
djhworld
Fair comment; I'm new to this subject and wanted to write up what I'd been
reading, but I'm looking forward to exploring the topic further!

------
akalin
I got excited about erasure codes a while ago and tried my best to give an
accessible intro at
[https://www.akalin.com/intro-erasure-codes](https://www.akalin.com/intro-erasure-codes)
(with HN discussion at
[https://news.ycombinator.com/item?id=18702843](https://news.ycombinator.com/item?id=18702843)).

Glad to see other people getting excited by them too!

~~~
kragen
This looks great! I haven't finished reading it, but does it explain Gallager
decoding or RS decoding? At a first skim, it doesn't seem to, but that might
be my lack of knowledge of the topic.

~~~
akalin
Thanks! It doesn't explain Gallager codes since I don't know much about them
(but will look into them now!). It only briefly mentions RS decoding, since
the RS erasure code uses Vandermonde matrices, and Cauchy matrices seem more
foolproof for an intro article. I was planning on writing a follow-up article
on PAR1 (which uses RS erasure codes, but slightly incorrectly) but alas, my
free time isn't what it used to be...

------
gwbas1c
One of the things that I miss about analog electronics is how the signal
degrades nicely. For example, with FM, if I drive under a bridge the radio
gets a little fuzzy or switches to mono. With newer digital satellite radio,
the playback just stops.

Even back with old analog TV broadcasts, if there was a signal issue, the
picture would just get fuzzy. Now with HDTV, the picture goes away completely.

I know a lot of this has to do with analog formats being highly inefficient
and thus having lots of redundancy.

~~~
dfox
I distinctly remember that one of the marketing features of DVB-T was "perfect
picture or none at all", yet I haven't seen any consumer-grade DVB-T receiver
that actually behaves that way; instead, a marginal signal results in various
ugly MPEG artifacts.

~~~
toast0
I've not used DVB-T, but ATSC (which on Linux uses the DVB-T subsystem); the
range of conditions where you can get perfect picture is much larger than from
analog broadcast TV, but the experience with marginal signals is indeed much
less watchable. If you've got nothing else to watch, you can watch an analog
broadcast that's more snow than picture, as long as you're getting the sync
signals well enough.

------
major
After learning about Hamming codes (a predecessor of Reed-Solomon), I wrote a
tutorial that explains Hamming code and how to implement it in a simple
simulator:
[https://manuelfi.com/blog/hamming-code-simulator/](https://manuelfi.com/blog/hamming-code-simulator/)
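
For the flavor, a minimal Hamming(7,4) sketch (a toy illustration, not the tutorial's simulator code): parity bits sit at positions 1, 2 and 4 (1-indexed), and the syndrome, read as a number, names the flipped position directly.

```python
def hamming74_encode(d):
    """Encode 4 data bits; parity bits land at positions 1, 2, 4 (1-indexed)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4        # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4        # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):  # XOR together the positions of all 1-bits
        if c[pos - 1]:
            syndrome ^= pos
    if syndrome:             # nonzero syndrome names the flipped position
        c[syndrome - 1] ^= 1
    return c

code = hamming74_encode([1, 0, 1, 1])
bad = list(code)
bad[4] ^= 1                  # flip the bit at position 5
assert hamming74_correct(bad) == code
```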

~~~
FR10
Hi major, I just read your post and it's great, thank you; it really has
inspired me to learn more about erasure codes. I only have one doubt: in the
Example Problem, I understand the 4th parity bit should be at the 8th position
(counting from 1), but you placed it at the 9th. I think that's why, by my
reckoning, the bit in the 2nd parity check should be 1 instead of 0. Your
answer was still right, though, so could you tell me what the correct place
for the 4th parity bit is?

------
christophclarke
Computerphile did a video recently giving a 10,000 foot overview of Reed
Solomon codes
[https://www.youtube.com/watch?v=fBRMaEAFLE0](https://www.youtube.com/watch?v=fBRMaEAFLE0)

------
philsnow
My favorite class of error correcting codes is one I haven't seen mentioned
here: rateless / "fountain" codes.
[https://en.wikipedia.org/wiki/Fountain_code](https://en.wikipedia.org/wiki/Fountain_code)

You take any message and can generate an unlimited number of coding symbols
from it, and a receiver that gathers _any_ K of those coding symbols can
decode the message.

The mental image I have is using these to send messages into space -- a
receiver could pick up at any point and just wait until they have any K coding
symbols and they'll be able to decode the message.
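
That mental image can be sketched as a random linear fountain over GF(2) (an illustrative toy with made-up data; real rateless codes such as LT/Raptor use carefully chosen degree distributions so decoding stays fast):

```python
import random

K = 4
source = [0b1010, 0b0111, 0b1100, 0b0001]    # the K message blocks

def make_symbol(rng):
    """One coded symbol: (subset mask, XOR of that subset of source blocks)."""
    mask = rng.randrange(1, 1 << K)          # random nonempty subset
    val = 0
    for i in range(K):
        if mask >> i & 1:
            val ^= source[i]
    return mask, val

def try_decode(symbols):
    """Gaussian elimination over GF(2); returns None until rank reaches K."""
    pivots = {}                              # pivot column -> (mask, value)
    for m, v in symbols:
        for c, (pm, pv) in list(pivots.items()):
            if m >> c & 1:                   # reduce against existing pivots
                m ^= pm
                v ^= pv
        if m == 0:
            continue                         # linearly dependent: nothing new
        col = (m & -m).bit_length() - 1      # lowest remaining bit is the pivot
        for c, (pm, pv) in list(pivots.items()):
            if pm >> col & 1:                # keep earlier pivot rows reduced
                pivots[c] = (pm ^ m, pv ^ v)
        pivots[col] = (m, v)
    if len(pivots) < K:
        return None
    return [pivots[i][1] for i in range(K)]

# A receiver tuning in whenever it likes just collects symbols until any
# K independent ones have arrived:
rng = random.Random(42)
collected = []
while (result := try_decode(collected)) is None:
    collected.append(make_symbol(rng))
assert result == source
```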

My understanding of all this stuff comes from USENET and par/par2/yenc (where,
IIRC, the message gets packaged up in plaintext blocks and if your local
server is missing N blocks you have to collect N parity blocks and you can
recover from there, but my recollection is probably fuzzy), and I imagine
rateless codes would be quite useful for USENET binaries.

------
ignoramous
You know what? Not just Reed-Solomon codes, but Information Theory is cool.

People don't realise how ubiquitous the different forms of encoding schemes
are (I am leaving out compression/decompression schemes altogether here, but
it has always amazed me what a profound impact Information Theory has had,
and yet how far out of the way it stays):

Line codes at L2 (close cousins of FEC): IBM's elegant 8b/10b encoding
[https://en.wikipedia.org/wiki/8b/10b_encoding](https://en.wikipedia.org/wiki/8b/10b_encoding)
used in pre-10G-era Ethernet, its successor 64b/66b
[https://en.wikipedia.org/wiki/64b/66b_encoding](https://en.wikipedia.org/wiki/64b/66b_encoding)
used in 10G-era Ethernet / InfiniBand / RoCE, and the sister 128b/132b
encoding used by the PCIe 3 and USB 3 standards.

Linear error-correcting codes (LEC) for Volatile Memory:
[https://en.wikipedia.org/wiki/Hamming_code](https://en.wikipedia.org/wiki/Hamming_code)

FEC for HD voice transmission over cellular networks: AMR, used in 3G and VoIP
[https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_codec),
and AMR-WB for VoLTE aka HD Voice:
[https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband](https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband)

Fountain Codes for Digital TV on Mobile (DVB-H): Qualcomm's Raptors
[https://en.wikipedia.org/wiki/Raptor_code](https://en.wikipedia.org/wiki/Raptor_code)

Turbo Codes for 3G/4G Connectivity (OFDM), Satellite communication:
[https://en.wikipedia.org/wiki/Turbo_code](https://en.wikipedia.org/wiki/Turbo_code)

Optimal erasure coding (for data at rest, for mobile TV, and for 3G/4G):
Reed-Solomon and its variants are a clear winner here (used by Dropbox
[https://blogs.dropbox.com/tech/2016/07/pocket-watch/](https://blogs.dropbox.com/tech/2016/07/pocket-watch/)
and rumored to have been used by S3
[https://perspectives.mvdirona.com/2009/06/erasure-coding-and-cold-storage/](https://perspectives.mvdirona.com/2009/06/erasure-coding-and-cold-storage/))

Upcoming: LDPC codes
[https://en.wikipedia.org/wiki/Low-density_parity-check_code](https://en.wikipedia.org/wiki/Low-density_parity-check_code)
are adopted for the data channel in 5G, replacing LTE's turbo codes, whilst
polar codes are adopted for the control channel
[https://en.wikipedia.org/wiki/Polar_code_(coding_theory)](https://en.wikipedia.org/wiki/Polar_code_\(coding_theory\))

---

Ref:
[https://en.wikipedia.org/wiki/Error_correction_code](https://en.wikipedia.org/wiki/Error_correction_code)

~~~
pmarreck
Are any of the ones that are better than Reed-Solomon either open-source or
otherwise out of patent protection yet? A library I could play with, perhaps?
(would love to get my hands on a Raptor or Turbo implementation)

~~~
zokier
libraptorq (lgpl3+):
[https://fenrirproject.org/Luker/libRaptorQ/wikis/home](https://fenrirproject.org/Luker/libRaptorQ/wikis/home)

zfec (gpl2+): [https://github.com/tahoe-lafs/zfec](https://github.com/tahoe-lafs/zfec)
(discussion:
[https://news.ycombinator.com/item?id=12976168](https://news.ycombinator.com/item?id=12976168))

liberasurecode (bsdesque):
[https://github.com/openstack/liberasurecode](https://github.com/openstack/liberasurecode)

cm256 (3 clause bsd):
[https://github.com/catid/cm256](https://github.com/catid/cm256)

openrq (apache2.0): [http://openrq-team.github.io/openrq/](http://openrq-team.github.io/openrq/)

openfec: [http://openfec.org/](http://openfec.org/)

lt-codes-python: [https://github.com/Spriteware/lt-codes-python](https://github.com/Spriteware/lt-codes-python)

~~~
pmarreck
The libraptorq link isn't loading, but I will try to Google.

Regardless, thanks for these!!

------
ZoomZoomZoom
Would love to read popular explanations, as well as opinions, on the Mojette
Transform[1] as an alternative (superior?) technique for erasure coding.

1. [https://en.wikipedia.org/wiki/Mojette_Transform](https://en.wikipedia.org/wiki/Mojette_Transform)

------
userbinator
This is a very practical explanation of how to actually implement RS codes,
with a focus on QR codes, and one of the best ones I've come across that
doesn't immediately go into the deep theoretical maths of it:
[https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders](https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_for_coders)

It's very enlightening to see finite-field arithmetic implemented in a few
lines of Python --- it gives something concrete to look at and think about.

------
filipo
Last month I tried to use Reed-Solomon ECC to do error correction. I stopped
using it the moment I understood that error correction does not necessarily
imply error _detection_: if you don't know a bound on the number of errors in
the encoded data, RS ECC may "correct" to a wrong result (when the data
contains more errors than the code can correct). So checksums might be the
better approach (in certain scenarios) if data integrity has the highest
priority... I wish Reed-Solomon would handle that.

~~~
sliken
Or both. Much like encryption, to avoid various attacks you need to add a
checksum and then encrypt.

------
jl2718
RS codes give great reliability, but Galois-field polynomial solvers are slow.
Parity (XOR) bits are fast, but they can't fix more than one error, which
means they're really not reliable for fixing errors at all. So use parity bits
to check for an error, and then RS to fix it.
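
The layered idea can be as small as this (a sketch with made-up data; the slow RS decode only runs on blocks that fail the cheap parity check):

```python
def xor_parity(block: bytes) -> int:
    """One cheap parity byte: the XOR of every byte in the block.
    (Error patterns that XOR to zero slip through undetected.)"""
    p = 0
    for b in block:
        p ^= b
    return p

stored = b"hello world"
parity = xor_parity(stored)

read_back = b"hellp world"                  # one corrupted byte
damaged = xor_parity(read_back) != parity   # True: escalate to the RS decoder
```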

That Backblaze speed quoted seems really slow.

~~~
sliken
Yes, but it's Java. Read the comments for a version in Go that's "an order of
magnitude faster". For their use case, though, I suspect that handling GigE
per server is plenty.

------
Waterluvian
So is the idea basically that you can recover any missing n% of a dataset by
using an extra n% of disk space to store specially calculated data, rather
than 100% extra disk space (because you don't know which pieces are going to
go missing)?

------
rurban
I do use it in LibreDWG, but I don't find it cool or useful enough. For CDs or
lossy transports like WiFi, yes; but for hard discs (bitflips don't happen)
it's not needed, and it's not so easy to plug in.

~~~
planteen
Consumer HDDs get bit flips all the time. I have plenty of backup files over
100 GB. Put the file on the drive, compute its MD5 sum, check the MD5 sum, and
come back to check it again in 6 months. On many magnetic HDDs, you will get a
different value and no SMART errors.
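
That scrubbing routine is a few lines with Python's standard library ('backup.img' is a made-up filename):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Hash a large file in 1 MiB chunks so it needn't fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# At backup time:
#   expected = md5sum("backup.img")
# Six months later, on the same file:
#   assert md5sum("backup.img") == expected, "silent corruption!"
```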

~~~
rurban
Wow, thanks

------
albru123
I'm quite disappointed that nobody mentioned the Viterbi algorithm, as it's
often used together with RS and is really interesting.

------
glandium
Obligatory Computerphile video:
[https://www.youtube.com/watch?v=fBRMaEAFLE0](https://www.youtube.com/watch?v=fBRMaEAFLE0)

