Reed Solomon codes are cool

subjoriented · on Feb 25, 2019

Wow. I love explaining Reed Solomon codes and this didn't do it justice.

The basic concept is that if I give you five points all on the same line, you only need any two of them to reconstruct the line.

Reed Solomon doesn't use lines (it isn't optimal) and the geometry they use isn't the Cartesian plane (this requires infinite precision) - but this is what the codes do!

Just like it requires two points to reconstruct a line, three points are required to reconstruct a simple polynomial (degree 2). Reed Solomon uses "high degree polynomials" which require n+1 points to reconstruct the degree-n polynomial.

The codes take data, make a polynomial out of it, and then send points from the polynomial to the other side, which can interpolate it as soon as enough points are gathered!

All of this looks like impenetrable discretization and matrix operations if you just get a description of the algorithms, which makes it seem a lot less approachable than it is.

wlesieutre · on Feb 25, 2019

I remember playing a computer game in math class where you were presented with a XY plane and had to "hit" points (really circles, didn't need to be exact) by plotting equations through them. With some trial and error you could make a polynomial go through basically as many as you wanted, it's neat to see this applied to a real problem!

kragen · on Feb 25, 2019

Doing this kind of thing was my first introduction to Lagrange interpolation, too, but it turns out that you can do Lagrange interpolation without trial and error; you can just use linear algebra, since the points are a linear function of the coefficients, regardless of the degree of the polynomial.

This was in fact the first decoding algorithm for Reed–Solomon codes, but it's not very efficient when you don't know which of your points are the ones with the corrupted data; you kind of have to guess, which gets expensive fast when you're trying to tolerate more than one or two errors.

Although it turns out that there are better algorithms for that, the most commonly used RS codes don't actually encode a series of points on a polynomial, as subjoriented's comment at the root of this thread suggests; instead they are so-called "BCH codes", where you consider the data you transmit (both the original data and the "parity" symbols) to be actually a sequence of coefficients. So where does the redundancy come from? After all, any set of numbers is valid as the coefficients of a polynomial.

BCH codes require the polynomial to be divisible by a known generator polynomial, and that's where you get the redundancy you need to correct errors. Gorenstein and Zierler published an efficient error-correction ("decoding") algorithm for BCH codes in 1960; the first efficient decoding algorithm for original-view RS codes (where you transmit a series of points) wasn't found until 1986, and even today, BCH-code decoding algorithms are more efficient than original-view decoding algorithms.

The Gorenstein–Zierler algorithm works by evaluating the received polynomial at the roots of the generator polynomial. Since it's supposed to be a multiple of the generator, it should be zero at those roots, as the generator is; if it's nonzero, the values at those roots are due to some "error polynomial" that's been conceptually added to your message in transit. If you suppose that it has only a few nonzero coefficients, there's a clever way to compute the error polynomial from those values that should have been zero. This allows you to subtract it from the received message to correct the errors.

At least, I think that's how it works. I haven't implemented an RS decoder yet, so I might be getting some of this wrong.

But something like that was what I was actually hoping to read at the above link.

nayuki · on Feb 25, 2019

I explained all the linear algebra involved in the Peterson-Gorenstein-Zierler decoder and offer a working implementation in https://www.nayuki.io/page/reed-solomon-error-correcting-cod... ; https://news.ycombinator.com/item?id=13565939

It runs in O(n^3) time, and unfortunately is not the state of the art. What actual Reed-Solomon implementations use are the Berlekamp-Massey and Forney algorithms, which run in O(n^2) time.

abraae · on Feb 26, 2019

Incredible. This is the sort of thing my non-programmer friends think that my work entails. Thank god it doesn't.

kragen · on Feb 25, 2019

Oh, that's wonderful! Your explanations are always the best. Thank you, magical girl Nayuki!

aaaaaaaaaaab · on Feb 25, 2019

That’s an interesting duality between the coefficient and pointwise representation. I wonder whether this connects to the discrete Fourier transform, which can be viewed as the evaluation of a polynomial at the complex n-th roots of unity. (and the inverse DFT must be equivalent to Lagrange interpolation I guess - insert some handwaving here)

Also, the BCH encoder multiplies the input sequence with the generator polynomial, which is a convolution of their coefficients. Fourier-type transforms (i.e. number theoretic transform) relate convolution and pointwise multiplication, so I feel there’s an underlying connection here, but I don’t have enough experience with finite fields to connect the dots...

JadeNB · on Feb 25, 2019

You seem to have answered your own question! For any set of `n + 1` sample values for degree-at-most-`n` polynomials, there's a map from the coefficient representation to the pointwise representation (obvious), and one going the other way (Lagrange interpolation). The DFT is just one instance of this, but a particularly nice one, because the map is essentially involutive, and subject to considerable computational speed-up.

What dots do you want to connect? (Anyway you can connect them with a high-degree polynomial curve. :-) )

aaaaaaaaaaab · on Feb 25, 2019

Indeed, the wikipedia article on RS codes [1] mentions this equivalence between the DFT and RS encoding/decoding.

They mention that both procedures are basically a multiplication by a Vandermonde matrix, with the Fourier transform having the additional requirement of using the n-th roots of unity.

https://en.m.wikipedia.org/wiki/Reed–Solomon_error_correctio...

tempguy9999 · on Feb 25, 2019

I believe it was Green Globs. I read about it years ago as being a noteworthy pedagogical aid. Wish I'd had it when I was in school.

philsnow · on Feb 25, 2019

I played such a game, called Green Globs, in junior high school. I spent probably 20 minutes tweaking an ellipse equation to hit 7 or 8 globs.

choochootrain · on Feb 25, 2019

what a blast from the past - i remember this game and it blew my mind when i first realized i had the power to plot a polynomial through all those points with little effort required

scottlocklin · on Feb 25, 2019

You win at explaining Reed Solomon codes, forever.

Please do this for Berlekamp-Massey algorithm! Is it EM algorithm for this problem, or am I smoking banana peels?

djhworld · on Feb 25, 2019

I'm sure there are simpler ways to explain it, I'm new to the topic, sorry if you didn't like it.

subjoriented · on Feb 25, 2019

I don't think anything you wrote is wrong!

I have a background in error codes - hamming, and luby and raptor and, and, and. Reed-Solomon is particularly "beautiful" from a mathematics perspective (more so than Raptor) - something I can usually explain to anyone and get them interested.

kragen · on Feb 25, 2019

Do you want to give it a try? I just attempted to explain RS in https://news.ycombinator.com/item?id=19248444 but I am significantly impeded by the fact that I don't actually understand it myself, so I'm sort of summarizing the Wikipedia article. Am I focusing on the right algorithms? Did I explain them correctly, as far as my short explanation goes?

_underfl0w_ · on Feb 25, 2019

The explanation from BackBlaze that you linked to goes into more depth.

It looks to be built on matrix algebra. Kinda nifty how simple and elegant the underlying concept is, really. Reminds me of the surprising conceptual simplicity behind Diffie-Hellman.

speleo_engr · on Feb 25, 2019

I guess simple is relative, but in my opinion, coding theory is not simple. It requires some high level math to understand what is going on. Many people's eyes will glaze over when you go into Galois (finite) fields. Abstract algebra is not part of the typical undegrad CS/engineering math curriculum. And then the efficient algorithms for the decoding procedure were found much later than Reed & Solomon's original paper by Berlekamp.

JadeNB · on Feb 25, 2019

> Abstract algebra is not part of the typical undegrad CS/engineering math curriculum.

It should be (says the mathematician)! That it doesn't probably comes from confusing the elegant theoretical perspective on finite fields to the computational perspective, which (at least from my pure-math point of view) can get pretty hairy.

isoprophlex · on Feb 25, 2019

Well it might not be the most in depth explanation, but your enthusiasm is appreciated, and contagious too.

I'm down the Wikipedia rabbit hole too, now.

dnautics · on Feb 25, 2019

if you want to learn finite fields somewhat more formally, but not so formally, this is a pretty good introduction (yes, it's for AES, but the principles are the same): https://www.youtube.com/watch?v=x1v2tX4_dkQ

MereInterest · on Feb 26, 2019

Conceptually, that sounds the same as Shamir's Secret Sharing. Do these have similar implementations as well as similar underlying concepts?

jwerle · on Feb 26, 2019

I believe they both use Lagrange interpolation

dicroce · on Feb 25, 2019

This reminds me of DCT's...

317070 · on Feb 25, 2019

If this interests you, make sure to look into Golay codes: https://en.wikipedia.org/wiki/Binary_Golay_code This code is a mathematical miracle. Apparently, number theory comes together exactly such as to allow you to code 12 bits into 23 bits. Such that you can correct all possible 3 bit errors, but none of the possible 4 bit errors (which means it is efficient). AND it detects all of the possible 7 bit errors, but none of the possible 8 bit errors.

That is very efficient. So in exchange for storing all your data about twice, you can detect all errors where ~30% of the bits are flipped. You can even _correct_ all errors where only 12% of the bits were flipped!

I still consider it magic that such an efficient and 'perfect' code exists. There is unfortunately a proof that is the best one, and that there is no bigger perfect one.

jacobolus · on Feb 25, 2019

Everything related to the sporadic groups is crazy https://en.wikipedia.org/wiki/Sporadic_group

aaaaaaaaaaab · on Feb 25, 2019

This post should be titled “Error correcting codes are cool”, as it contains zero bits of information that’s specific to Reed-Solomon codes.

If you’re interested in Reed-Solomon codes (and other error correcting codes), I recommend reading Huffman’s (NB: not the same Huffman as in Huffman codes!) textbook on the subject:

Fundamentals of Error Correcting Codes

https://www.amazon.com/Fundamentals-Error-Correcting-Codes-C...

djhworld · on Feb 25, 2019

Fair comment, I'm new to this subject so wanted to write about what I was reading about, but looking forward to reading about this topic further!

aDfbrtVt · on Feb 25, 2019

I'd also recommend Error Control Coding by Lin/Costello

akalin · on Feb 25, 2019

I got excited about erasure codes a while ago and tried my best to give an accessible intro at https://www.akalin.com/intro-erasure-codes (with HN discussion at https://news.ycombinator.com/item?id=18702843 ).

Glad to see other people getting excited by them too!

kragen · on Feb 25, 2019

This looks great! I haven't finished reading it, but does it explain Gallager decoding or RS decoding? At a first skim, it doesn't seem to, but that might be my lack of knowledge of the topic.

akalin · on Feb 25, 2019

Thanks! It doesn't explain Gallager codes since I don't know much about them (but will look into them now!). It only briefly mentions RS decoding, since the RS erasure code uses Vandermonde matrices, and Cauchy matrices seem more foolproof for an intro article. I was planning on writing a follow-up article on PAR1 (which uses RS erasure codes, but slightly incorrectly) but alas, my free time isn't what it used to be...

gwbas1c · on Feb 25, 2019

One of the things that I miss about analog electronics is how the signal degrades nicely. For example, with FM, if I drive under a bridge the radio gets a little fuzzy or switches to mono. With newer digital satellite radio, the playback just stops.

Evan back with old analog TV broadcasts, if there was a signal issue, the picture would just get fuzzy. Now with HDTV, the picture goes away completely.

I know a lot of this has to do with analog formats being highly inefficient and thus having lots of redundancy.

dfox · on Feb 25, 2019

I distinctly remember that one of the marketing features of DVB-T was "perfect picture or none at all" while I haven't seen any consumer grade DVB-T receiver that actually behaves that way and instead marginal signal results in various ugly MPEG artifacts.

toast0 · on Feb 25, 2019

I've not used DVB-T, but ATSC (which on Linux uses the DVB-T subsystem); the range of conditions where you can get perfect picture is much larger than from analog broadcast TV, but the experience with marginal signals is indeed much less watchable. If you've got nothing else to watch, you can watch an analog broadcast that's more snow than picture, as long as you're getting the sync signals well enough.

quink · on Feb 25, 2019

Hierarchical Modulation. It’s in the standard, but essentially nobody used or used it.

In fact, Europe (or rather large parts of) seems to be on the verge of shutting down terrestrial TV entirely.

major · on Feb 25, 2019

After learning about Hamming code (the predecessor of Reed-Solomon), I wrote a tutorial that explains Hamming code and how to implement it in a simple simulator. https://manuelfi.com/blog/hamming-code-simulator/

FR10 · on Feb 26, 2019

Hi major, I just read your post and its great, thank you, it really has inspired me to learn more on erasure codes. I only have one doubt: in the Example Problem, I understand the 4th parity bit should be the 8th position (starting the first at 1), however you placed it in the 9th position, and I think this is the reason why (according to me) in the 2nd parity check the bit should be 1 instead of 0, however your answer was still right, so could you tell me what is the correct place for the 4th parity bit?

christophclarke · on Feb 25, 2019

Computerphile did a video recently giving a 10,000 foot overview of Reed Solomon codes https://www.youtube.com/watch?v=fBRMaEAFLE0

philsnow · on Feb 26, 2019

My favorite class of error correcting codes is one I haven't seen mentioned here: rateless / "fountain" codes. https://en.wikipedia.org/wiki/Fountain_code

You take any message and can generate any amount of coding symbols from the message, and a receiver that gathers _any_ K of those coding symbols can decode the message.

The mental image I have is using these to send messages into space -- a receiver could pick up at any point and just wait until they have any K coding symbols and they'll be able to decode the message.

My understanding of all this stuff comes from USENET and par/par2/yenc (where, IIRC, the message gets packaged up in plaintext blocks and if your local server is missing N blocks you have to collect N parity blocks and you can recover from there, but my recollection is probably fuzzy), and I imagine rateless codes would be quite useful for USENET binaries.

ignoramous · on Feb 25, 2019

You know what? Not just Reed-Solomon codes, but Information Theory is cool.

People don't realise how ubiquitous different forms of encoding schemes are (I am leaving out compression/decompression schemes altogether here, but it has always amazed me as to what profound impact Information Theory has, and yet it is so out of the way):

Forward error-correction codes (FEC) at L2: IBM's elegant 8b/10b encoding https://en.wikipedia.org/wiki/8b/10b_encoding used in pre-10g era Ethernet, and its successor 64b/66b https://en.wikipedia.org/wiki/64b/66b_encoding used in post-10g era Ethernet / Infiniband / RoCE, and the sister 128b/132b encoding used by PCI-e 3 and USB 3 standards.

Linear error-correcting codes (LEC) for Volatile Memory: https://en.wikipedia.org/wiki/Hamming_code

FEC for HD Voice Transmission over Cellular Networks: AMR used in 3G and VoIP https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_audio_code... and AMR WB for VoLTE aka HD Voice: https://en.wikipedia.org/wiki/Adaptive_Multi-Rate_Wideband

Fountain Codes for Digital TV on Mobile (DVB-H): Qualcomm's Raptors https://en.wikipedia.org/wiki/Raptor_code

Turbo Codes for 3G/4G Connectivity (OFDM), Satellite communication: https://en.wikipedia.org/wiki/Turbo_code

Optimal Erasure Coding (for data at rest, and for Mobile TV, and for 3G/4G): Reed-Solomon and its variants are a clear winner here (used by dropbox https://blogs.dropbox.com/tech/2016/07/pocket-watch/ and rumored to have been used by S3 https://perspectives.mvdirona.com/2009/06/erasure-coding-and... )

Upcoming: https://en.wikipedia.org/wiki/Low-density_parity-check_code LDPC codes are recommended for use by data-channel in 5G replacing Reed-Solomon whilst Polar Codes are adopted for the control-channel in 5G https://en.wikipedia.org/wiki/Polar_code_(coding_theory)

---

Ref: https://en.wikipedia.org/wiki/Error_correction_code

rando231 · on Feb 25, 2019

I am amazed by the brilliant application of 8B/10B encoding to recover clocks: https://en.wikipedia.org/wiki/Clock_recovery

Such a brilliant system.

pmarreck · on Feb 25, 2019

Are any of the ones that are better than Reed-Solomon either open-source or otherwise out of patent protection yet? A library I could play with, perhaps? (would love to get my hands on a Raptor or Turbo implementation)

zokier · on Feb 25, 2019

libraptorq (lgpl3+): https://fenrirproject.org/Luker/libRaptorQ/wikis/home

zfec (gpl2+): https://github.com/tahoe-lafs/zfec (discussion: https://news.ycombinator.com/item?id=12976168)

liberasurecode (bsdesque): https://github.com/openstack/liberasurecode

cm256 (3 clause bsd): https://github.com/catid/cm256

openrq (apache2.0): http://openrq-team.github.io/openrq/

openfec: http://openfec.org/

lt-codes-python: https://github.com/Spriteware/lt-codes-python

pmarreck · on Feb 27, 2019

The libraptorq link isn't loading, but I will try to Google.

Regardless, thanks for these!!

speleo_engr · on Feb 25, 2019

"Better than Reed-Solomon" depends on what you mean. R-S is an optimal erasure code and for storage applications, erasure code is usually what is desired.

In terms of turbo codes, I think the CCSDS turbo code is out of patent, at least 5,446,747 has been expired for over 5 years.

https://public.ccsds.org/Pubs/101x0b4s.pdf

ZoomZoomZoom · on Feb 25, 2019

Would love to read popular explanations as well as opinions on Mojette Transform[1] as and alternative (superior?) technique for erasure coding.

1. https://en.wikipedia.org/wiki/Mojette_Transform

userbinator · on Feb 26, 2019

This is a very practical explanation of how to actually implement RS codes with a focus on QR codes, and one of the best ones I've come across that doesn't immediately go into the deep theoretical maths of it: https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_f...

It's very enlightening to see finite-field arithmetic implemented in a few lines of Python --- it gives something concrete to look at and think about.

filipo · on Feb 25, 2019

Last month I tried to use Reed Solomon ECC in order to do error correction. I stopped using it, the moment I understand that error correction does not necessarily mean "error detection". If one does not exactly know the number of possible errors within encoded data, RS-ECC may "correct" to a wrong result (if the data contains more errors than the ECC is able to correct). So checksums might be the better approach (in certain scenarios) if data integrity has the highest priority... I wish, Reed-Solomon would handle that.

sliken · on Feb 27, 2019

Or both. Much like encryption, to avoid various attacks you need to add a checksum and then encrypt.

jl2718 · on Feb 25, 2019

RS codes give great reliability but galois polynomial solvers are slow. Parity (xor) bits are fast, but they can’t fix more than one error, which means it’s really not reliable to fix an error at all. Use parity bits to check for an error, and then RS to fix it.

That backblaze speed quoted seems really slow.

sliken · on Feb 27, 2019

Yes, but it's Java. Read the comments for a version in go that's "an order of magnitude faster". For their use case though I suspect that handling GigE per server is plenty.

Waterluvian · on Feb 26, 2019

So is the idea basically that you can recover any missing n% of a dataset by using an extra n% of disk space to store specially calculated data, rather than 100% extra disk space (because you don't know which pieces are going to go missing)?

rurban · on Feb 25, 2019

I do use it in LibreDWG, but I don't find it cool or useful enough. For CD's or WiFi with lossy transports yes, but for harddiscs (bitflips don't happen) not needed and not so easy to plug in.

mark-r · on Feb 25, 2019

Actually it does happen all the time for hard discs, but the disk itself uses error correcting codes so that it's rare to see an error exposed. The bigger use is in the Backblaze example where you have multiple discs and one may get flakey or fail entirely.

speleo_engr · on Feb 25, 2019

Consumer HDDs get bit flips all the time. I have plenty of backup files over 100 GB. Put the file on the drive, compute the MD5SUM of it, check the MD5SUM, and come back to check again in 6 months. On many magnetic HDDs, you will get a different value and no SMART errors.

rurban · on Feb 27, 2019

Whow, thanks

ithkuil · on Feb 26, 2019

When you have tons of disks you notice how often you have whole disk failures

albru123 · on Feb 26, 2019

I'm quite disappointed that nobody mentioned Viterbi algorithm, as it's often used together with RS and really interesting.

glandium · on Feb 25, 2019

Obligatory computerphile video: https://www.youtube.com/watch?v=fBRMaEAFLE0