Note that the size of the field only has to be as large as the size of the redundancy set (data + code). So if you are striping across 16 or fewer disks, you can use GF(2^4), but if you are striping across 17 disks you can’t use GF(2^4).
You're off by one. There's a "minus 1" because Reed-Solomon works over the cyclic multiplicative group, not the cyclic additive group. (TL;DR: the element 0 has no place in the multiplicative group, so it can't participate in that part of the formulation of the polynomials, hence the minus one.)
GF(2^3) gives a block size of 7 symbols. GF(2^4) gives 15. GF(2^8) gives 255.
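(A minimal Python sketch of that relationship; the block length tops out at 2^m - 1 because codeword positions correspond to the nonzero field elements:)

    # max Reed-Solomon block length over GF(2^m) is 2^m - 1 symbols
    for m in (3, 4, 8):
        print("GF(2^%d): max block length = %d symbols" % (m, 2 ** m - 1))
    # GF(2^3): max block length = 7 symbols
    # GF(2^4): max block length = 15 symbols
    # GF(2^8): max block length = 255 symbols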
---------
GF(2^8) is the smallest size used in practice, because most people's data is organized 8 bits at a time today (aka: a byte). But you're mostly right (once you account for the off-by-one error): there's a limit of 255 symbols per block for Reed-Solomon over GF(2^8).
But GF(2^3) or GF(2^4) are great for hand exercises; the smaller size makes the calculations far easier to do by hand and lets a human build understanding (aka: octal for 2^3 numbers, hexadecimal for 2^4 numbers). GF(2^2) is technically possible... but all the matrix math is over 1x1 matrices, and it's not really a good opportunity to learn from. GF(2^1) literally cannot support Reed-Solomon at all. So GF(2^3) is the smallest practical hand-exercise size. (RS(7, 5) over GF(2^3) has a 5x7 generator matrix and a 2x7 check matrix. Small enough to do by hand, though a bit tedious; see the sketch below.)
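(A minimal Python sketch of the setup for that hand exercise, assuming the common primitive polynomial x^3 + x + 1 and the Vandermonde-style evaluation construction — real systems often use a systematic form instead:)

    # GF(2^3) exp table: exp[i] = alpha^i, reduced by x^3 + x + 1 (0b1011)
    PRIM = 0b1011
    exp = [1]
    for _ in range(6):              # 2^3 - 1 = 7 nonzero elements
        v = exp[-1] << 1            # multiply by alpha (i.e., by x)
        if v & 0b1000:
            v ^= PRIM               # reduce modulo the primitive polynomial
        exp.append(v)

    # the 5x7 generator matrix for RS(7, 5): G[i][j] = alpha^(i*j)
    n, k = 7, 5
    G = [[exp[(i * j) % 7] for j in range(n)] for i in range(k)]
    for row in G:
        print(row)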
If you actually want to "split symbols up" to better detect bit errors, then LDPC or turbo codes are straight-up superior to Reed-Solomon for small symbol sets. Reed-Solomon is best used over large symbols (aka: 2^8 or larger).
----------
Any Reed-Solomon code can be punctured (aka: fewer parity symbols) or shortened (aka: fewer data symbols) by simply zeroing out the corresponding locations of the Reed-Solomon generator and/or check matrix.
As such, if you organize "smaller sets" into 2^8 (aka: 8-bit bytes), it's just very natural to use a punctured + shortened Reed-Solomon code instead of trying to get a 2^4 Reed-Solomon code to work.
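(A minimal Python sketch of shortening in the matrix view, assuming GF(2^4) with primitive polynomial x^4 + x + 1 and a Vandermonde-style generator: take RS(15, 11) down to an effective RS(12, 8) by pinning three data symbols to zero:)

    PRIM = 0b10011                      # x^4 + x + 1
    exp, log = [1], {1: 0}
    for i in range(1, 15):              # 2^4 - 1 = 15 nonzero elements
        v = exp[-1] << 1
        if v & 0b10000:
            v ^= PRIM
        exp.append(v)
        log[v] = i

    def gf_mul(a, b):
        return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % 15]

    n, k = 15, 11
    G = [[exp[(i * j) % 15] for j in range(n)] for i in range(k)]

    def encode(data):                   # data: k symbols, each 0..15
        cw = []
        for j in range(n):
            s = 0
            for i in range(k):
                s ^= gf_mul(data[i], G[i][j])
            cw.append(s)
        return cw

    # Pin the first 3 data symbols to zero: their rows of G contribute
    # nothing, which is the "set locations to 0" trick from above. In a
    # systematic code those zero symbols would never be stored or sent.
    # (Puncturing is the same idea on the other side: drop parity columns.)
    cw = encode([0, 0, 0] + [1, 2, 3, 4, 5, 6, 7, 8])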
> IIRC GF(2^4) is actually used in practice. Long story short, the CPU overhead of rebuilding data is a real concern.
Interesting. I don't know how these systems are implemented in practice, but I'd be surprised if GF(2^8) were "too big".
GF(2^8) was small enough that it was implemented in 1977 on the Voyager probe project. Yeah, it took a literal supercomputer to decode that RS code in the 1980s, but modern computers are basically as fast as 1980s-era supercomputers.
I mean, I've done GF(2^4) RS codes by hand. That's a very, very, very small system, calculable by a human with just pen and paper. So you're right that it's far easier to calculate. I'm just kind of surprised that anyone would bother with a computer RS code of that minuscule size.
Ex: RS(15, 9) has a 9x15 generator matrix and a 6x15 check matrix over GF(2^4). It's on the large side of hand-doable calculations, but absolutely doable with pen and paper (I recommend graph paper). Every symbol is a single 2^4 hexadecimal digit (0123456789ABCDEF), so each one fits in a single cell of the graph paper.
-------
Are you sure those systems are working in GF(16) fields? Not GF(2^16), aka GF(65536), fields? GF(2^16) is more the size I'd expect a computer to use, much more so than GF(2^4).
Yes, I am sure that it’s GF(2^4) and not GF(2^16). The problem is not that GF(2^8) or GF(2^16) are “too big”, it’s just that GF(2^4) is faster and uses less CPU.
The purpose of using GF(2^128) or other large sizes is usually cryptographic, which has different tradeoffs; for example, Galois/Counter Mode (GCM). The linked paper describes secure storage mechanisms.
Also note that you're not just doing one field operation per byte, but several. If I'm remembering correctly, say you're using an (11,8) code: then you're doing 24 multiplies and 21 additions to calculate the encoded message. You might think of various ways to accelerate this which benefit from a smaller field size, especially if you are doing the calculations on commodity CPUs. The only point I'm making here is that the computation consumes some nontrivial amount of CPU resources, and so it can be fruitful to optimize.
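(To spell out where those numbers come from, assuming a systematic code where each of the 3 parity symbols is a linear combination of all 8 data symbols — a quick sketch:)

    # each parity symbol = a linear combination of the k data symbols:
    # k multiplies and k-1 additions, times (n - k) parity symbols
    n, k = 11, 8
    print((n - k) * k, "multiplies")       # 3 * 8 = 24
    print((n - k) * (k - 1), "additions")  # 3 * 7 = 21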
The fact that space probes like Voyager use a large field size is just because the cost of resources is different. For space probes, bandwidth is a precious resource. If you spent $250M on a single space probe, you can afford to use a bit more CPU to encode and decode your puny 100 kHz downlink. On the other hand, if you are running a data center, the power consumption of CPUs is one of your largest costs.
> Yes, I am sure that it’s GF(2^4) and not GF(2^16). The problem is not that GF(2^8) or GF(2^16) are “too big”, it’s just that GF(2^4) is faster and uses less CPU.
A multiply in GF(2^8) is just a single multiplication lookup. 256 x 256 == 64kB lookup table. I'm not sure how much faster you can get than a single 8-bit lookup on a table small enough to fit in a typical CPU's cache.
GF(2^8) is so small that you don't even need to do the log-table / antilog-table trick. Just store the entire multiplication table, since the multiplication table fits in L2 cache.
A GF(2^4) multiply would be a single lookup into a 16 x 16, or 256-byte (0.25 kB), table. Very, very tiny, but probably no faster than the GF(2^8) lookup.
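(To make the comparison concrete, a rough Python sketch of both tables; the primitive polynomials are my assumption — 0x11D is a common Reed-Solomon choice for GF(2^8), and x^4 + x + 1 for GF(2^4):)

    def gf_mul_slow(a, b, prim, bits):
        # carry-less "Russian peasant" multiply, reduced by prim
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & (1 << bits):
                a ^= prim
        return r

    MUL8 = [[gf_mul_slow(a, b, 0x11D, 8) for b in range(256)]
            for a in range(256)]
    MUL4 = [[gf_mul_slow(a, b, 0x13, 4) for b in range(16)]
            for a in range(16)]
    print(len(MUL8) * 256, "entries vs", len(MUL4) * 16, "entries")
    # 65536 entries vs 256 entries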
> Also note that you're not just doing one field operation per byte, but several. If I'm remembering correctly, say you're using an (11,8) code: then you're doing 24 multiplies and 21 additions to calculate the encoded message.
That RS(11,8) code (probably shortened / punctured) could be a GF(2^8) code. The question is: do you want to organize your data in 8-bit bytes in GF(2^8), or 4-bit nibbles in GF(2^4)?
I expect that 8-bit bytes is actually faster for modern computers than 4-bit nibbles. Even if you are using a severely shortened / punctured code, there are strong benefits to sticking with 8-bit bytes on a modern computer.
That's the thing: computers usually have 8-bit bytes as the smallest pragmatic unit to use. GF(2^8) lines up with that perfectly, and is used in a variety of high-performance applications (including AES).
> The fact that space probes like Voyager use a large field size is just because the cost of resources is different. For space probes, bandwidth is a precious resource. If you spent $250M on a single space probe, you can afford to use a bit more CPU to encode and decode your 100 kHz downlink.
The Voyager probe launched in 1977 with an utterly crap CPU by today's standards. Your USB power cable has more CPU power and RAM than the Voyager probe... and that tiny USB CPU is just used to negotiate the voltage level (5V vs 12V USB power negotiation).
--------
Voyager launched with 32k 18-bit words of RAM and 4 MHz across its 6x parallel CPUs (handling different parts of the craft: flight vs. imaging unit vs. radio unit... etc. etc.). I think you're severely overestimating the strength of the Voyager probe, or the level of technology available in 1977.
> A multiply in GF(2^8) is just a single multiplication lookup. 256 x 256 == 64kB lookup table. I'm not sure how much faster you can get than a single 8-bit lookup on a table small enough to fit in a typical CPU's cache.
64 KB often won’t fit in L1 cache anyway.
> A GF(2^4) multiply would be a single lookup into a 16 x 16, or 256-byte (0.25 kB), table. Very, very tiny, but probably no faster than the GF(2^8) lookup.
Or you can do three operations using a single 16 x 16 x 16 x 16 lookup table, and introduce fewer data dependencies into your CPU pipeline.
The operations work on half as many bits, but complete three times as many steps. Sounds like a win to me.
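(The exact scheme isn't spelled out above, but here's one plausible reading as a sketch: pack two GF(2^4) symbols per byte and index a single 16 x 16 x 16 x 16 = 64k-entry table with two packed bytes, retiring two nibble multiplies per lookup:)

    def gf16_mul(a, b):
        # carry-less multiply, reduced by x^4 + x + 1 (0x13)
        r = 0
        while b:
            if b & 1:
                r ^= a
            b >>= 1
            a <<= 1
            if a & 0x10:
                a ^= 0x13
        return r

    # PACKED[a][b]: a and b each hold two nibbles; the result holds the
    # two independent GF(2^4) products, so one lookup = two multiplies.
    PACKED = [[(gf16_mul(a >> 4, b >> 4) << 4) | gf16_mul(a & 0xF, b & 0xF)
               for b in range(256)] for a in range(256)]

    print(hex(PACKED[0xA5][0x37]))   # (0xA * 0x3, 0x5 * 0x7) in GF(2^4)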
> I expect that 8-bit bytes is actually faster for modern computers than 4-bit nibbles. Even if you are using a severely shortened / punctured code, there are strong benefits to sticking with 8-bit bytes on a modern computer.
You keep saying this. I understand that CPUs are byte-addressed. When you operate on smaller sizes, you have to do packing / unpacking. If your CPU core spends most of its time doing table lookups, then a little bit of packing / unpacking is nearly free, since your ALUs are otherwise fairly idle, assuming you have a couple of registers free.
Well, if I remember the math correctly.
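(For concreteness, the packing/unpacking in question is just shifts and masks — a trivial sketch:)

    b = 0xA7
    hi, lo = b >> 4, b & 0x0F    # unpack two GF(2^4) symbols from one byte
    assert (hi << 4) | lo == b   # repack: a shift and an OR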