Hacker News new | past | comments | ask | show | jobs | submit login
Reed-Solomon error-correcting code decoder (nayuki.io)
62 points by nayuki on Feb 4, 2017 | hide | past | web | favorite | 21 comments

My first real exposure to RS ECC was when I had to reverse-engineer a hardware ECC. There's plenty of material out there, and I spent a significant amount of time reading through various articles, but IMHO the maths/theory really got in the way of understanding what actually happens, i.e. all of them talk about finite fields and polynomials and such, but almost none answered the main question I had: "what does this mean for the bits and bytes I'm looking at?" Furthermore, there is also a scarcity of material discussing RE of RS (i.e. finding the parameters used.)

Eventually I came across https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_f... which was really enlightening, and I wrote some code to essentially bruteforce combinations of generator polynomials and bit/byte ordering --- and luckily discovered the details of the implementation on the first try. (The first polynomial in my list was 0x211, and it was the right one.)

The final implementation turned out to be relatively straightforward, less than 250 lines of C for a correcting decoder including the unusual (but probably trivial/circuit-simplifying) bit permutation/expansion scheme the original hardware used. The core algorithm itself is mostly composed of XORs and table lookups. I believe I used BkM and Forney for the correction part. If anyone is curious, I'll see if I can find and release some of the code I used.

Do you still have that bruteforce code? I've been trying on and off to reverse engineer the RS ECC in Chirp, a (now discontinued) app for sharing tokens over speaker/microphone.

My current progress is summarised in https://math.stackexchange.com/questions/663643/discover-par... and https://github.com/moreati/chirppy/blob/master/trials.py

A few references I've found suggest that is RS(9,5) and might not actually be using GF(32); although you have 18 "symbols" of 90 bits total, with 50 bits of data and 40 parity, I doubt those are the actual RS symbols.

In fact, those parameters line up with 9 * 10-bit symbols in each block, 5 10-bit data and 4 10-bit ECC. I would try a shortened RS(1023,1019) in GF(1023), taking pairs of the 5-bit "symbols".

Here is the code I originally used. I was bruteforcing a non-shortened RS(511,503) in GF(511) which had a somewhat complex transformation between bytes and symbols, fortunately partly documented at the time:

http://pastebin.com/6nK2GSBh (partly anonymised. public domain.)

Something else you may find useful, more Python code for bruteforcing RS parameters:


A good example of this being implemented is in parchives. Parchives were originally made as a way to recover from errors on unreliable usenet servers, but now I think they are mostly used for bitrot recovery. https://en.wikipedia.org/wiki/Parchive

I wonder why this kind of thing is not implemented transparently at the filesystem level.

Yup. I manually use par2 for my family photos on my Mac and it's a bit of a hassle.

That's what raid5/raidz basically is.

but RAID by definition involves multiple supports. So it is not very practical for long term archiving.

@nayuki In your Python code, the module 'fieldmath' imports 'numbers'. However it only uses a single reference - 'numbers.Integral'. Did you mean to remove this when you modified the code from your _Gauss-Jordan elimination over any field_ page?

ETA: Scratch that, numbers is a module in the stdlib https://docs.python.org/3/library/numbers.html#module-number... sorry for the confusion

This is why you can scratch a cd-rom and it still works fine. If I recall correctly the version used in cd-roms is an eleventh order (x^11 + x^10...) polynomial.

As someone who can go through algebra and linear math this was wayyyy too heavy me. It was very hard to follow and map it to how to code such an algorithm.

You really need to give a number of examples. Remember human brains are fast example based learners. We see patterns and make abstract concepts out of it. Here the abstract concepts is the crazy math.

You are not alone. Perhaps you may like https://en.wikiversity.org/wiki/Reed%E2%80%93Solomon_codes_f...

Thanks for the link. This was a great read.

Another thing - the linear algebra presented in the article is all at high school or early-university level: systems of linear equations, matrices, row reduction, unique solutions, solution space. It might be heavy in terms of quantity, but I would not consider it to be terribly advanced math.

I think the only part that most students have not seen is the finite field arithmetic, because that knowledge is pretty specialized for people who actually design math for computers to execute.

This is probably best learned from a good lecturer. There is a lot of background on field theory required to make this click.

Sorry about not meeting your expectations. I was planning to make an interactive JavaScript demo in the future to show all the steps of the computation with actual numbers as examples.

For the time being, try running the Python code and adding print statements to the places of interest.

What is linear math?

This stuff is taught in intro discrete mathematics courses.

Another example of Reed-Solomon codes in the wild: RS codes are used in QR codes because they're particularly resistant to localized errors, e.g. a physically-contiguous section of the code becoming damaged or unreadable.

physically-contiguous section of the code becoming damaged or unreadable.

This is because they operate on multi-bit symbols. To the algorithm, one bad bit in a symbol is no worse than all bad bits.

There's a lot more info on that here:


I was motivated to learn Reed-Solomon decoding precisely because it was the only technical aspect of QR Code decoding that I couldn't implement from scratch!

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact