One difference is that with secret sharing, unless you have enough chunks, you have no information at all about the result. (That's why it's called secret sharing: you can share a secret with a bunch of people without any of them knowing it unless a minimum number get together.) The neat trick is that any sufficiently large subset of the chunks gets you all of the information.
In fact, the algorithm (at a high level) is exceptionally simple: encode your data as the coefficients of a polynomial of degree n and transmit it as a bunch of points on the polynomial. You then need n+1 points to recover the polynomial, but which particular points you have does not matter.
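For anyone who wants to see the polynomial trick concretely, here is a minimal Python sketch (the prime, the threshold, and the helper names are mine, not from any particular library):

```python
# Minimal sketch of Shamir-style secret sharing over a prime field.
# The prime, the threshold, and the helper names are illustrative.
import random

PRIME = 2**127 - 1  # a prime comfortably larger than the secret

def make_shares(secret, threshold, num_shares):
    # Hide the secret as the constant term of a random polynomial of
    # degree threshold-1; each share is a point (x, p(x)) on it.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def p(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, p(x)) for x in range(1, num_shares + 1)]

def recover(shares):
    # Lagrange interpolation at x=0 from any `threshold` distinct shares.
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(secret=42, threshold=3, num_shares=5)
print(recover(random.sample(shares, 3)))  # any 3 of the 5 shares -> 42
```

Note that with fewer than `threshold` shares, every candidate secret remains equally likely, which is exactly the "no information" property described above.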
What I love about this algorithm is that it is useful, nontrivial and can be explained in a single sentence (by a better writer than I :)).
An infinite number of Euclidean planes can be drawn through a single point, but you only need three of them (any three in general position) to recover the original point.
It's always bothered me to read about inventors who discovered beautiful and amazing things, only to die penniless while powerful interests reaped all of the financial benefits. Hopefully the people who discovered these codes have been richly rewarded for their work.
1. Is it non-obvious enough?
2. Would the patent realistically benefit the inventor?
3. Would the patent benefit the society at large?
1 - Physical devices often require lots of engineering work, prototyping, testing and refinement before they really work. Patents aim to protect your investment of money and time in all that. This is not applicable to pure ideas. Once you've thought of something like fountain codes, you instantly know it will work, and you even know how. There is no investment to protect.
2 - Just because something is useful doesn't mean people will use it, even for free, and the act of patenting a generic algorithm will certainly drive most potential users away. Not because licensing costs money, but because it's a hassle and a liability.
3 - This was covered in many other posts here. I'll add one thing. The fact that it's just an idea (and a generic one at that) means other bright people who need that idea are likely to come up with it on their own. In that case a patent would be clearly counterproductive for society. Why should the first 'inventor' of the algorithm have all the rights to it, while the second has none?
I put creation in quotes because of an issue that nicely demonstrates how fundamentally different ideas are from physical property: consider the many scientific and mathematical discoveries that were made in parallel around the world, with no communication between the parties. Who gains the right to "own" these ideas, and why should they? Didn't somebody else expend exactly the same effort, following the same process, to create the exact same idea? This even applies to very complex systems that are far from obvious, quite novel, and form the foundations of our modern world.
People need to get past the idea that somebody has a brilliant idea one day out of the blue and the world is changed forever because of it. Every idea anyone has ever had owes almost everything to what preceded it and the environment it arose in; co-discovery and co-invention are very solid proof of that.
"If I have seen a little further it is by standing on the shoulders of Giants."
When you take the time to realize that the very thing that propels us forward is the ability to build on existing ideas, you realize how patents and IP law in general impede progress rather than encourage it.
(There are several more, but I do not have the time just now to find them.)
I'd suggest that the usefulness of algorithms like fountain codes (and other innovative algorithms like RSA and many compression techniques) leads to the opposite conclusion: such algorithms prove so broadly useful that patents on them have even more of a negative effect than usual.
With 3), the inventor never gets a chance to commercialize his invention: as soon as he publishes the algorithm, he will be destroyed (or assimilated) by large players. With 2), we never know how many cool, useful algorithms are rotting away in commercial products; as far as society is concerned, the algorithm in question does not exist. With 1), the inventor benefits, AND so does society (in the long term).
IMO, the patent system should be reformed so that having a large portfolio of unused patents becomes a liability. For example, the patent holder would have to periodically (e.g., every 2 years) apply [with a fee, of course] for patent renewal. For this, the holder must be able to document that he is actively exploiting the patent (1). There should also probably be a minimum limit on the required income from patent exploitation. If the holder fails to do so, the patent automatically expires. The patent office should publish an on-line database of expired patents.
There should also be restrictions on what can be patented: for example, work funded by government money must not be patentable. The inventors (in this case, mostly researchers at universities) have a day job that by definition consists of designing things that benefit society; they are paid for that by society, and therefore have no further right to patent protection. (This is, IMO, fair, since research is [should be] free, so society has no firm guarantee that the researchers will produce something valuable, or anything at all.)
(1) By exploitation I mean either making a device covered by the patent, or licensing it to somebody who makes such a device.
Aside from that, I'd point out that in the rare case when an algorithm does get published as a patent and licensed, that makes it entirely unusable in Free and Open Source Software as well as in open standards, for a period of ~20 years. At least when the algorithm remains unpublished, someone can re-discover it (or something more useful that would build on it); patents force willful ignorance of an entire class of algorithms, as well as anything in the general vicinity of those algorithms.
So you would have liked every useful, non-obvious algorithm or data structure that was novel at some point in history to have been patented? Yikes.
1. A ^ B ^ C
2. A ^ B
3. B ^ C
The algorithm as described would in this situation cause the receiver to wait until it has received a singleton block (A, B, or C on its own) before decoding anything. This strikes me as inefficient.
Edited to add: I think this is analogous to forward vs. backwards chaining: you can either start by combining single blocks with composite blocks to simplify them, or start by combining blocks whose component block lists differ by only a single block. Or you could apply both, which should get the greatest simplification with the smallest amount of input.
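To make that concrete, here is a toy Python sketch (the payload values are made up) showing the three packets above being resolved purely by combining composite blocks, without ever receiving a singleton:

```python
# Toy sketch of the example above: recover A, B, C from A^B^C, A^B,
# and B^C by combining packets whose block-id sets differ by one
# block. Payload values are made up for the demo.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

A, B, C = b"aaaa", b"bbbb", b"cccc"
received = {
    frozenset("ABC"): xor(xor(A, B), C),
    frozenset("AB"):  xor(A, B),
    frozenset("BC"):  xor(B, C),
}

# (A^B^C) ^ (A^B) = C, then (B^C) ^ C = B, then (A^B) ^ B = A:
c = xor(received[frozenset("ABC")], received[frozenset("AB")])
b = xor(received[frozenset("BC")], c)
a = xor(received[frozenset("AB")], b)
print(a, b, c)  # b'aaaa' b'bbbb' b'cccc'
```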
The algorithm is also not so much about computational efficiency; checking at any point whether or not you can decode requires quite a bit of work anyhow. The main advantage of the algorithm is its relatively low complexity and high efficiency on the transmission line.
In practice, however, depending on the receiver, you may not want it to solve a general system of linear equations for every packet it receives when n is big. It might be better to stick to a simple algorithm (which has better time complexity if certain conditions are satisfied by the linear system) and just wait to accumulate enough equations for it to become applicable.
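For concreteness, here is a minimal sketch of that simple peeling decoder, assuming packets arrive as (set of block ids, XOR-ed payload) pairs; the structure and names are illustrative:

```python
# Minimal sketch of the simple "peeling" decoder: hold received
# equations, and whenever one involves a single still-unknown block,
# resolve it and substitute it into the rest.
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def peel(packets):
    # packets: list of (set_of_block_ids, payload). Returns whatever
    # blocks are decodable so far (possibly none, if decoding stalls).
    decoded = {}
    pending = [(set(ids), bytes(data)) for ids, data in packets]
    progress = True
    while progress:
        progress = False
        for ids, data in pending:
            if len(ids) == 1 and next(iter(ids)) not in decoded:
                decoded[next(iter(ids))] = data  # a singleton: solved
                progress = True
        remaining = []
        for ids, data in pending:  # substitute all known blocks
            for bid in ids & decoded.keys():
                data = xor(data, decoded[bid])
            ids = ids - decoded.keys()
            if ids:
                remaining.append((ids, data))
        pending = remaining
    return decoded

# On the A^B^C / A^B / B^C example upthread this decoder stalls (no
# singleton yet), which is exactly the inefficiency being pointed out:
pkts = [({"A", "B", "C"}, b"\x07"), ({"A", "B"}, b"\x03"), ({"B", "C"}, b"\x06")]
print(peel(pkts))                       # {} -- stuck
print(peel(pkts + [({"A"}, b"\x01")]))  # decodes all three blocks
```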
I can also see use cases for this in unicast transmissions, but even there the idea would be about simplifying the communications layer, for example using it for a fixed-rate transmission with complete disregard for the loss rate. I'm not sure what I'd think about bandwidth usage and the stopping criterion in this case.
Where do I find stuff like this?!
Edit: Discussion here: http://news.ycombinator.com/item?id=2141542
The D. J. C. MacKay paper linked from the article lists applications in broadcast and data storage, but Reed-Solomon would work equally well in both of those applications.
Hypothesis: With Reed-Solomon and friends, if you're unlucky and lose the /right/ n blocks, for small n, you can't recover your data. Is that n larger for LT? If I could maliciously corrupt your LT blocks, would I have to corrupt more with LT?
Hypothesis: If I randomly corrupt blocks with Reed-Solomon, does my likelihood of not being able to recover all my data drop faster than if I did so with LT?
Can someone say authoritatively?
By the way, I've been talking about channels with an erasure rate (the Binary Erasure Channel) because fountain codes were originally invented in that context. But you can find in the literature papers on "Rateless codes for noisy channels" that show that the fountain codes also work fine for bit-flipping channels like the binary symmetric channel and the additive white Gaussian noise channel.
So could we think of fountain codes as a kind of adaptive on-the-fly RS code in which one sends a block protected against one's best guess of the erasure rate and if the recipient fails to decode it one resends with a higher erasure rate, or if the recipient succeeds one sends the next block with a lower erasure rate?
If you just used RS, the only way to overcome really long bursts of errors (thousands or millions of bits long) would be either to request acknowledgements from the receivers and retransmit unacknowledged blocks, or, if acknowledgements weren't possible, to transmit everything multiple times. And if you had enough receivers, all missing random packets, you would end up retransmitting everything even if you had acknowledgements. By using fountain codes, you can transmit an extra few percent of blocks and still succeed in the face of long dropouts.
That's one way to think of it, but more generally they correct symbol errors by transmitting extra symbols. In some cases a symbol is a bit, but if you want a packet erasure code you just define a symbol to be a packet (or a disk block or whatever). A practical example of this is par2.
jsyedidia explained the benefits of ratelessness, so I won't repeat it.
Looks like they were acquired by Qualcomm a few years back:
To make the page readable, pull up the dev console and run `document.body.style.color='black'`
I expect I was first introduced to it through squarefree, but the Beatie page is a simpler place for a surfer to land for that one specific thing, which is why I chose it. I didn't realize it was also a straight-up, uncredited copy.
Why is this better than just broadcasting messages that always contain a single block? What do you gain by forming combinations? You still have an expectation of eventually getting all the information. I assume it just turns out to take longer on average because your expected gain in information is lower from single chunks once you already have many of them?
If you assume the same network environment that the article does, namely one where no feedback occurs from the recipient to the sender, then if you miss a single block in the message, you'll have to wait for the sender to send that one block again. Since the sender has no information about what block you missed, you'll have to wait on average for half the blocks to get retransmitted, resulting in 50% overhead or more. It gets worse if you miss enough blocks that you have a non-trivial chance of missing those blocks again. (On top of that, some transmission mechanisms have properties that make particular data patterns particularly prone to loss, which makes it significantly more likely that you'll lose particular blocks repeatedly.)
With fountain codes and similar mechanisms, on the other hand, every message provides several chances to obtain the missing blocks, resulting in much less overhead.
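Here is a rough Monte Carlo sketch of that overhead difference. The loss rate and block count are made-up parameters, and the uniform random combinations stand in for a properly tuned LT degree distribution (which is what makes decoding cheap in practice):

```python
# Rough Monte Carlo comparison: packets sent until n blocks are
# delivered over a lossy, feedback-free channel.
import random

def single_block_broadcast(n, loss, rng):
    # Sender cycles through blocks 0..n-1; receiver keeps what survives.
    have, sends, i = set(), 0, 0
    while len(have) < n:
        sends += 1
        if rng.random() >= loss:
            have.add(i)
        i = (i + 1) % n
    return sends

def random_combinations(n, loss, rng):
    # Each packet XORs a random nonempty subset of blocks (a bitmask);
    # the receiver is done once its equations reach rank n over GF(2).
    basis = [0] * n  # basis[i] holds a stored equation with leading bit i
    rank = sends = 0
    while rank < n:
        sends += 1
        if rng.random() < loss:
            continue
        x = rng.randrange(1, 1 << n)
        for i in reversed(range(n)):  # Gaussian elimination over GF(2)
            if not (x >> i) & 1:
                continue
            if basis[i] == 0:
                basis[i] = x
                rank += 1
                break
            x ^= basis[i]
    return sends

rng = random.Random(1)
n, loss, trials = 64, 0.1, 200
for scheme in (single_block_broadcast, random_combinations):
    avg = sum(scheme(n, loss, rng) for _ in range(trials)) / trials
    print(scheme.__name__, round(avg, 1))
```

With these (made-up) parameters, the repetition scheme typically needs well over twice as many sends as the combination scheme, since the last few missing blocks each take many cycles to arrive.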
In particular, if you intend to coexist fairly with TCP/IP on an internet, you might find your rate-limiting code carries most of the work and timing issues of TCP/IP anyway.
In the application I worked on, we used multicast and implemented a form of multicast congestion control that was orthogonal to the coding but exploited the fact that we could send as much data as we wanted, with the only thing changing being the time the transfer would take to complete. It worked beautifully with the coding and coexisted nicely with TCP.
For the curious, Knuth wrote an interesting letter on the topic back in 1994: http://www.pluto.it/files/meeting1999/atti/no-patents/brevet...
When filing an algorithm patent, a patent lawyer will tell you to remove every instance of the word "algorithm" and replace it with some other word. This seems to be necessary for getting the patent through.
[...] at some point you may find yourself pulling diagrams from thin air and thinking "why am I doing this?" (but then remembering that it's because of the bonus)
The complacency in the latter quote is (probably) why so many well-meaning engineers inadvertently exacerbate the problem.
Those same software patents also included, in addition to all the illustrations of the algorithm, a diagram of a stack of computer discs capable of containing the program, in order to demonstrate that (at least in some embodiments) the patent actually covered a physical thing rather than an algorithm.