Damn Cool Algorithms: Fountain Codes (notdot.net)
306 points by nl on Jan 6, 2012 | 62 comments

This sounds really similar to secret sharing[1]. While it doesn't do exactly the same thing, a simple secret sharing algorithm is also interesting to read about: it's a different algorithm that chunks up and reassembles data in a similar way, with different properties.

One difference is that with secret sharing, unless you have enough chunks you do not have any information about the result. (That's why it's called secret sharing: you can share a secret with a bunch of people without any of them knowing it unless a minimum number get together.) The neat trick is that any sufficiently large subset of the chunks will get you all of the information.

[1]: http://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing

In fact, the algorithm (at a high level) is exceptionally simple: encode your data as the coefficients of a polynomial of degree n and transmit it as a bunch of points on the polynomial. You then need n+1 points to recover the polynomial, but it does not matter which particular points you have.

What I love about this algorithm is that it is useful, nontrivial and can be explained in a single sentence (by a better writer than I :)).
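To make the comments above concrete, here is a minimal sketch of polynomial secret sharing of the kind described (a toy illustration, not a vetted implementation; the prime, the function names `split`/`recover`, and the parameters are my own choices). The secret is the constant term of a random polynomial over a prime field, and any k shares recover it by Lagrange interpolation at x = 0:

```python
import random

P = 2**61 - 1  # an arbitrary large prime to use as the field modulus

def split(secret, k, n):
    """Return n shares (x, y); any k of them suffice to recover the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange-interpolate the degree-(k-1) polynomial at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = split(42, k=3, n=5)
assert recover(shares[:3]) == 42   # any 3 shares work...
assert recover(shares[2:]) == 42   # ...and it doesn't matter which 3
```

Fewer than k shares reveal nothing: with k-1 points, every candidate secret is consistent with some polynomial, which is exactly the property the parent comment highlights.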

Another way of phrasing the same concept:

An infinite number of Euclidean planes can be drawn containing a single point, but you only need three of these—and it can be any three—to find the original point.

(if the original point is contained within a line that is also known. For example, if the original/desired point is on the x-axis)

Alas, digital fountains are tied up in patents, and people have been leery:


I know there are lots of anti-patent folks on HN, but if anything deserves patent protection, digital fountains certainly do. This is useful, novel, and certainly non-obvious stuff. And really quite beautiful, IMHO.

It's always bothered me to read about inventors who discovered beautiful and amazing things, only to die penniless while powerful interests reaped all of the financial benefits. Hopefully the people who discovered these codes have been richly rewarded for their work.

I would challenge this by asking the following questions:

1. Is it non-obvious enough?

2. Would the patent realistically benefit the inventor?

3. Would the patent benefit society at large?

1 - Physical devices often require lots of engineering work, prototyping, testing and refinement before they really work. Patents aim to protect your investment of money and time into all that. This is not applicable to pure ideas: once you've thought of something like fountain codes, you instantly know they will work, and you even know how. There is no investment to protect.

2 - Just because something is useful doesn't mean people will use it, even for free, and the act of patenting a generic algorithm will certainly drive most potential users away. Not because licensing costs money, but because it's a hassle and a liability.

3 - This was covered in many other posts here. I'll add one thing. The fact that it's just an idea (and a generic one at that) means other bright people who need that idea are likely to come up with it on their own. In that case a patent would be clearly counter-productive for society. Why should the first 'inventor' of the algorithm have all the rights to it, while the second has none?

You seem to be operating from the premise that the creator of an idea has some sort of intrinsic right to solely profit from ideas of his own "creation". There is no such intrinsic right: ideas are not physical property, and we should not treat them as if they were.

I put creation in quotes because of an issue that nicely demonstrates how fundamentally different ideas are from physical property: consider the many scientific/mathematical discoveries[1] that were made in parallel around the world, with no communication between the parties. Who gains the right to "own" these ideas, and why should they? Did not somebody else take exactly the same amount of effort and go through the same process to create the exact same idea? This even applies to very complex systems that are far from obvious, quite novel, and form the foundations of our modern world.

People need to get past the idea that somebody has a brilliant idea one day out of the blue, and the world is changed forever because of it. Every idea anyone has ever had owes almost everything to what preceded it and to the environment it arose in; co-discovery/invention is very solid proof of that.

"If I have seen a little further it is by standing on the shoulders of Giants."

When you take the time to realize that the very thing which propels us forward is the ability to build on existing ideas, you realize how patents and IP law in general impede it rather than encourage it.

[1] https://en.wikipedia.org/wiki/Calculus#Modern





(There are several more, but I do not have the time just now to find them.)

While I don't begrudge Luby his patents on the specific codes he invented, IIRC he has claimed that his patents cover the entire class of all possible rateless codes, which is trollish behavior. One startup that I know of has gone out of business due to Luby.

> I know there are lots of anti-patent folks on HN, but if anything deserves patent protection, digital fountains certainly do.

I'd suggest that the usefulness of algorithms like fountain codes (and other innovative algorithms like RSA and many compression techniques) leads to the opposite conclusion: such algorithms prove so broadly useful that patents on them have even more of a negative effect than usual.

How does it have a negative effect? Consider three cases:

1) an algorithm is designed and patented, and the inventor has the right to use or license it;

2) an algorithm is designed and never published, i.e., held secret for the proprietary use of the inventor;

3) an algorithm is designed and put into the public domain.

With 3), the inventor never gets a chance to commercialize his invention: as soon as he publishes the algorithm, he will be destroyed (or assimilated) by large players. With 2), we never know how many cool, useful algorithms are rotting away in commercial products; as far as society is concerned, the algorithm in question does not exist. With 1), the inventor benefits, and so does society (in the long term).

IMO, the patent system should be reformed so that having a large portfolio of unused patents becomes a liability. For example, the patent holder would have to periodically (e.g., every 2 years) apply [with a fee, of course] for patent renewal. For this, the holder must be able to document that he is actively exploiting the patent (1). There should also probably be a minimum limit on the required income from patent exploitation. If the holder fails to do so, the patent would automatically expire. The patent office should publish an on-line database of expired patents.

There should also be restrictions on what can be patented: for example, work funded by government money should not be patentable. The inventors' (in this case, mostly researchers at universities) day job is by definition designing things that will benefit society; they are paid for that by society, and therefore have no further claim to patent protection. (This is, IMO, fair, since research is [should be] free, so society has no firm guarantee that the researchers will produce anything valuable, or anything at all.)

(1) By exploitation I mean either making a device covered by the patent, or licensing it to somebody who makes such a device.

You seem to suggest that algorithms won't get published if their authors can't patent them and commercialize them. Given that the vast majority of algorithms don't get patented, and that the vast majority of patents never get licensed except in bulk cross-licensing deals, this seems almost entirely untrue.

Aside from that, I'd point out that in the rare case when an algorithm does get published as a patent and licensed, that makes it entirely unusable in Free and Open Source Software as well as in open standards, for a period of ~20 years. At least when the algorithm remains unpublished, someone can re-discover it (or something more useful that would build on it); patents force willful ignorance of an entire class of algorithms, as well as anything in the general vicinity of those algorithms.

This is useful, novel, and certainly non-obvious stuff.

So you would have liked, e.g., any useful, non-obvious algorithm or data structure that was at some point in history novel to have been patented? Yikes.

Why does the described algorithm only decode transmitted blocks when the receiver has decoded all but one of the constituent blocks? For example, if the receiver has received the following 3 blocks:

  1. A ^ B ^ C
  2. A ^ B
  3. B ^ C
it can combine #1 and #2 to decode C, then combine C and #3 to decode B, then combine B and #2 to decode A.

The algorithm as described would in this situation cause the receiver to wait until it's received a single block, either A or B or C, before decoding anything. This strikes me as inefficient.

Edited to add: I think this is analogous to forward vs. backwards chaining: you can either start by combining single blocks with composite blocks to simplify them, or start by combining blocks whose component block lists differ by only a single block. Or you could apply both, which should get the greatest simplification with the smallest amount of input.
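The combining described in the comments above can be sketched with a toy eliminator (my own illustration, not the article's algorithm): treat each received packet as a pair (set of source-block indices, XOR of those blocks), and whenever one packet's index set is a strict subset of another's, XOR them to produce a smaller packet. This goes beyond pure single-block peeling, exactly as the commenter suggests.

```python
# Toy sketch: each packet is (set of source-block indices, XOR payload).
# Whenever one packet's index set is a strict subset of another's,
# XOR-ing the two yields a packet covering fewer source blocks.
def decode(packets):
    """Return {index: block} for every block elimination can isolate."""
    packets = [(set(s), v) for s, v in packets]
    changed = True
    while changed:
        changed = False
        for i in range(len(packets)):
            for j in range(len(packets)):
                si, vi = packets[i]
                sj, vj = packets[j]
                if i != j and sj and sj < si:   # strict subset: eliminate
                    packets[i] = (si - sj, vi ^ vj)
                    changed = True
    return {next(iter(s)): v for s, v in packets if len(s) == 1}

A, B, C = 0x11, 0x22, 0x44
received = [({'A', 'B', 'C'}, A ^ B ^ C),
            ({'A', 'B'}, A ^ B),
            ({'B', 'C'}, B ^ C)]
# #1 ^ #2 isolates C; C with #3 isolates B; B with #2 isolates A.
assert decode(received) == {'A': A, 'B': B, 'C': C}
```

This quadratic subset search is far from efficient; a real decoder would run Gaussian elimination over GF(2), but the sketch shows the three-packet example above does decode without ever receiving a single-block packet.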

Normally one would use some distribution over the number of source blocks encoded into each transmitted block: say 30% single blocks, 30% double blocks, 15% triple, and so on, down to some relatively small number of high-order encoded blocks. This means that while you can find a solution the way you posted, it is a rare case that you have no single block available for decoding. And since in any case you need to receive a bit over 100% of the blocks before you can decode anything, there is no real reason to go to exquisite lengths to find the perfect solution when the trivial case will work 100% of the time (well, 99.9999% if you care to be accurate :-)

The algorithm is also not so much about computational efficiency; checking at any point whether you can decode requires quite a bit of work anyhow. The main advantage of the algorithm is its relatively low complexity and high efficiency on the transmission line.

My point isn't about computational efficiency, but bandwidth efficiency (or alternatively, decoding the full message as early as possible). As I understand it, the Moore's law doubling time for bandwidth is longer than for CPU speed, suggesting that processing is getting cheaper relative to bandwidth, so using more CPU in order to have to send fewer blocks is probably a long-term win.

The decoding amounts to solving a system of linear equations where each equation corresponds to one received packet. Assuming the file to be sent was broken into n parts, a minimum of n independent equations is required for reconstruction. To do that optimally, the receiver needs to keep trying to solve the linear system once it has received at least n packets. This means O(n^3) in the general case for each received packet. If the equations are generated randomly, the expected overhead seems to be a mere 1.61 extra packets! http://virtual.vtt.fi/virtual/abi/abi-seminar07/abiseminar07...

In practice however, depending on the receiver, you may not want it to solve a general system of linear equations for every packet it receives when n is big. It might be better to stick to a simple algorithm (which has a better time complexity if certain conditions are satisfied by the linear system) and just wait to accumulate enough equations for it to become applicable.
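The ~1.61-extra-packets figure for purely random equations can be checked with a small simulation (a sketch under my own assumptions: each packet is a uniformly random n-bit row over GF(2) indicating which source blocks were XOR-ed together, and we count packets until the rows reach full rank):

```python
import random

def packets_until_full_rank(n, rng=random):
    """Draw random n-bit GF(2) rows until they span GF(2)^n; return count."""
    basis = []          # rows kept with distinct leading bits, as ints
    count = 0
    while len(basis) < n:
        row = rng.getrandbits(n)
        count += 1
        for b in basis:             # reduce against the current basis
            row = min(row, row ^ b)
        if row:                     # independent: extend the basis
            basis.append(row)
            basis.sort(reverse=True)
    return count

random.seed(0)
n = 64
trials = [packets_until_full_rank(n) for _ in range(200)]
overhead = sum(trials) / len(trials) - n
# Expected overhead is sum_{j>=1} 2^-j / (1 - 2^-j), about 1.61,
# independent of n.
assert 0.5 < overhead < 3.0
```

Real LT/Raptor codes deliberately avoid this O(n^3) elimination by using sparse degree distributions and peeling, trading a little extra reception overhead for much cheaper decoding, which is the trade-off the parent comment describes.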

Raptor codes require O(1) time to generate an encoding symbol. Decoding a message of length k with a belief propagation decoding algorithm requires O(k) time for an appropriate choice of inner/outer codes. http://en.wikipedia.org/wiki/Raptor_code#Computational_compl... See also: http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/555.573....

If this was transmission to a single client you were right but when this is used for broadcast or multicast there is less interest to minimize the transmission for each client but rather to optimize it for the entire population at once, since in these cases feedback is scarce in the first place there is little reason to check at every message, it is sufficient to test it once in a while anyhow.

I can see also use cases for this in a unicast transmissions but even there the idea would be about simplifying the communications layer, for example in the case of usage for a fixed rate transmission and with complete disregard for the loss rate. Not sure what I'd think about bandwidth usage and the stop criteria in this case.

With large enough blocks and efficient coding, the noisy-channel coding theorem guarantees nearly error-free transmission at rates up to the maximum possible channel capacity. Modern LDPC, Raptor, and Turbo codes can get very close (to within 0.0045 dB) to this Shannon limit, meaning that far from being less efficient, FEC allows for maximal bandwidth efficiency.

RaptorQ codes, an efficient fountain code implementation, can decode from K+2 received symbols with probability 99.9999%. See: http://tools.ietf.org/html/draft-ietf-rmt-bb-fec-raptorq-06

"Systematic code" is the term of art in coding. There are systematic fountain codes. The raptor code adopted by 3GPP is systematic.

Where do people keep these algorithms? When I've asked in the past about reference guides for stuff like this, people throw Knuth's "Art of Computer Programming" at me, which is a neat book but doesn't have esoteric, useful, but hard-to-find stuff like this.

Where do I find stuff like this?!

http://en.wikipedia.org/wiki/List_of_algorithms (seriously! Scan it, choose something that looks interesting and you'll have hours of fun ahead of you.)

As much as it seems like common sense as that page would exist on Wikipedia, it had never occurred to me, Brilliant - Thank you.

Thanks, but neither fountain nor rateless codes appear on that page. :(

No, but it does link to Forward error correction, which links to fountain codes as well as rateless codes.

I don't pretend to understand most of it, but to my untrained eye, "Clever Algorithms" seems to be a phenomenal resource and available as a free PDF, or in print: http://www.cleveralgorithms.com/

Edit: Discussion here: http://news.ycombinator.com/item?id=2141542

Related, also recommended: _Essentials of Metaheuristics_ (http://cs.gmu.edu/~sean/book/metaheuristics/).

What's the advantage of fountain codes over more well-known, deterministic forward error correction techniques like Reed-Solomon codes?

The D.J.C. MacKay paper linked from the article lists applications in broadcast and data storage, but Reed-Solomon would work equally well in both of those applications.

Hypothesis: With Reed-Solomon and friends, if you're unlucky and lose the /right/ n blocks, for small n, you can't recover your data. Is that n larger for LT? If I could maliciously corrupt your blocks, would I have to corrupt more of them with LT?

Hypothesis: If I randomly corrupt blocks with Reed-Solomon, does my likelihood of not being able to recover all my data drop faster than if I did so with LT?

Can someone say authoritatively?

The advantage is you don't need to know the channel characteristics ahead of time. In particular, if you used a Reed-Solomon code, you'd use one designed for one particular erasure rate. But that code would be too weak for a channel with a higher erasure rate (it would fail to decode) and it would be too strong for a channel with a lower erasure rate (it would be wasteful and use too many parity bits). The advantage of fountain codes is that they work efficiently whatever the channel's erasure rate.

By the way, I've been talking about channels with an erasure rate (the Binary Erasure Channel) because fountain codes were originally invented in that context. But you can find in the literature papers on "Rateless codes for noisy channels" that show that the fountain codes also work fine for bit-flipping channels like the binary symmetric channel and the additive white Gaussian noise channel.
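The rate-matching point above can be illustrated with an idealized simulation (my own sketch; both codes are modeled abstractly, ignoring the real fountain overhead and RS block structure). A fixed-rate erasure code with r parity symbols fails once more than r symbols drop, while a rateless sender simply transmits until enough symbols arrive:

```python
import random

random.seed(1)
k = 100          # source symbols

def fixed_rate_ok(erasure_rate, parity):
    """Idealized fixed-rate erasure code with `parity` repair symbols:
    decoding succeeds iff at least k of the k+parity symbols arrive."""
    received = sum(random.random() > erasure_rate for _ in range(k + parity))
    return received >= k

def rateless_sent(erasure_rate):
    """Idealized rateless sender: transmit until k symbols arrive
    (ignoring the small real-world fountain reception overhead)."""
    sent = received = 0
    while received < k:
        sent += 1
        received += random.random() > erasure_rate
    return sent

# A code provisioned for ~10% loss (20 parity symbols):
assert fixed_rate_ok(0.05, parity=20)        # mild channel: decodes (wasting parity)
assert not fixed_rate_ok(0.40, parity=20)    # harsh channel: too weak, fails
# The rateless sender adapts automatically, sending roughly k/(1-p) symbols:
s_mild, s_harsh = rateless_sent(0.05), rateless_sent(0.40)
assert k <= s_mild < s_harsh < 3 * k
```

The fixed-rate code is simultaneously wasteful on the good channel and useless on the bad one, whereas the rateless sender pays only what each channel actually costs.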

Combine the adaptability of fountain codes with feedback, and you can vary the transmission rate as the channel quality changes. With applications like cellular radios, this lets you use just enough power to transmit messages intact, optimizing the power usage across a range of interference levels.

> But that code would be too weak for a channel with a higher erasure rate (it would fail to decode) and it would be too strong for a channel with a lower erasure rate (it would be wasteful and use too many parity bits). The advantage of fountain codes is that they work efficiently whatever the channel's erasure rate.

So could we think of fountain codes as a kind of adaptive on-the-fly RS code in which one sends a block protected against one's best guess of the erasure rate and if the recipient fails to decode it one resends with a higher erasure rate, or if the recipient succeeds one sends the next block with a lower erasure rate?

More or less, but it's really best not to think of them in terms of RS codes at all. The reason they're called "fountain codes" is that they're used like a water fountain. The transmitter just starts sending out bits for a file like a water fountain spews out water. The receiver(s) collect bits until they've got enough to reconstruct the file. It's like they're collecting water in a bucket, and when they've got enough water (good bits) they can reconstruct the file. The receivers with a good channel are like people standing close to the fountain who can catch lots of bits. Those with a bad channel are like people standing further away. But everybody can eventually fill up their bucket. And the transmitter is indifferent to what channel they've got.

Reed-Solomon and similar codes correct for bit errors in a particular block by transmitting extra bits in each block. Fountain codes correct for missing blocks by transmitting extra blocks.

If you just used RS, the only way to overcome really long bursts of errors (thousands or millions of bits long) would be to request acknowledgements from the receivers and retransmit unacknowledged blocks. If acknowledgements weren't possible, you would need to transmit everything multiple times. And if you had enough receivers, all missing random packets, you would end up retransmitting everything even with acknowledgements. By using fountain codes, you can transmit an extra few percent of blocks and still succeed in the face of long dropouts.

Reed-Solomon and similar codes correct for bit errors in a particular block by transmitting extra bits in each block.

That's one way to think of it, but more generally they correct symbol errors by transmitting extra symbols. In some cases a symbol is a bit, but if you want a packet erasure code you just define a symbol to be a packet (or a disk block or whatever). A practical example of this is par2.

Parity codes (of which fountains are a subset) are based on XOR, making them much faster than RS's Galois field arithmetic. Parity codes are generally less efficient than RS (you have to receive 5-10% extra data to recover), but the net performance is better due to faster decoding.

jsyedidia explained the benefits of ratelessness, so I won't repeat it.

What you said is correct, but just to be nit-picky: parity codes use Galois field arithmetic as well. XOR is the addition operation on polynomials in GF(2^n).

Didn't know about that algorithm, pretty cool. Also, the way the author explained it made it a joy to read.. clear, simple examples, meaningful graphics. Can't wait to start reading the previous blog posts (About other cool algorithms).

Seems the website has gone over quota. Shame, because the content is absolutely fantastic. I had it open and had begun reading it on my computer at home.

NYUD gives me a 324 error (no bytes returned). This one worked for me: http://webcache.googleusercontent.com/search?q=cache:blog.no...

It wasn't immediately clear to me the ideal soliton distribution always summed to one. There is a simple proof by induction. I wrote it up here: http://kevinlawler.com/soliton
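Besides the induction proof linked above, the sum can also be seen by telescoping, since 1/(d(d-1)) = 1/(d-1) - 1/d. A quick exact check (a small sketch of my own, using exact rationals to avoid floating-point error):

```python
from fractions import Fraction

def ideal_soliton(k):
    """Ideal soliton distribution over degrees 1..k:
    rho(1) = 1/k, rho(d) = 1/(d*(d-1)) for d = 2..k."""
    rho = {1: Fraction(1, k)}
    for d in range(2, k + 1):
        rho[d] = Fraction(1, d * (d - 1))
    return rho

# Telescoping: 1/k + sum_{d=2}^{k} (1/(d-1) - 1/d) = 1/k + 1 - 1/k = 1.
for k in (2, 10, 1000):
    assert sum(ideal_soliton(k).values()) == 1
```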

It is a damn cool algorithm. I was introduced to it at a startup I worked at, and we actually tried to make use of it. The algorithm as described has trouble converging fast enough, and we had to build some extra structure on top that made it converge very quickly and very consistently. We used it for multicast communications, where feedback to the server is impractical; such codes make for a very effective way to distribute the data.

Luby had a startup named Digital Fountain in SF in the late 90's where they attempted to productize this, primarily for video streaming.

Looks like they were acquired by Qualcomm a few years back: http://www.qualcomm.com/solutions/broadcast-streaming/media-...

Fountain codes look very interesting. There was a project at OLPC, I think written in Forth for OpenFirmware, that used something like fountain codes to reflash massive numbers of computers. It worked like this: turn one XO-1 on in OFW mode, insert a USB stick with the disk image you want on all machines, and turn it on as the source server. Boot the other XO-1s you want to reflash into OFW and tell them to receive a new image. I think there was a key shortcut of some kind that told a machine to receive. The server XO would continue to send packets until all other machines had received enough packets to make up a complete disk image. If I remember correctly, they used it to reflash 3000 machines in the field that I heard about. There may have been more elsewhere or since.

Gray on black text. :~(

To make the page readable, pull up the dev console and run `document.body.style.color='black'`

That bookmarklet comes from squarefree (and looking at his about page, Russell Beatie should learn to credit other people): https://www.squarefree.com/bookmarklets/zap.html

I guess it's an encoded version of the same JavaScript, all right.

I expect I was first introduced to it through squarefree, but the Beatie page is a simpler place for a surfer to land for that one specific thing, which is why I chose it. I didn't realize it was also a straight-up, uncredited copy.

It's somewhat intuitive, but I feel like there is some cool math or attempt at explanation behind this that was left out.

Why is this better than just broadcasting messages that always contain a single block? What do you gain by forming combinations? You still have an expectation of eventually getting all the information. I assume it just turns out to take longer on average because your expected gain in information from single chunks is lower once you already have many of them?

If you only broadcast messages that contain a single block, anyone who already has that block gets no value out of the message. On the other hand, if you broadcast messages generated via fountain codes, messages give you almost a full block worth of information on average.

If you assume the same network environment that the article does, namely one where no feedback occurs from the recipient to the sender, then if you miss a single block in the message, you'll have to wait for the sender to send that one block again. Since the sender has no information about what block you missed, you'll have to wait on average for half the blocks to get retransmitted, resulting in 50% overhead or more. It gets worse if you miss enough blocks that you have a non-trivial chance of missing those blocks again. (On top of that, some transmission mechanisms have properties that make particular data patterns particularly prone to loss, which makes it significantly more likely that you'll lose particular blocks repeatedly.)

With fountain codes and similar mechanisms, on the other hand, every message provides several chances to obtain the missing blocks, resulting in much less overhead.
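The penalty for blind single-block broadcast is in fact the classic coupon-collector problem: with no feedback, delivering all k blocks takes about k times the k-th harmonic number, roughly k ln k, messages. A small simulation (my own sketch of the uncoded case only) makes the gap concrete:

```python
import math
import random

random.seed(2)

def uncoded_broadcast(k):
    """Messages needed if the sender blindly repeats uniformly random
    single blocks until the receiver has seen all k (coupon collector)."""
    have, sent = set(), 0
    while len(have) < k:
        have.add(random.randrange(k))
        sent += 1
    return sent

k = 200
avg = sum(uncoded_broadcast(k) for _ in range(50)) / 50
# Coupon collector: about k * H_k ~ k * ln k messages -- far more than
# the roughly k coded messages a fountain code needs.
assert avg > 3 * k
assert abs(avg / (k * math.log(k)) - 1) < 0.3
```

For k = 200 that is over a thousand uncoded messages versus a few percent above 200 coded ones, which is the overhead reduction the parent comment describes.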

One wonders how it does rate limiting to avoid pointless resource consumption on faster, nearer links when the packets are going to be dropped by a slower, farther link.

In particular, if you intend to coexist fairly with TCP/IP on an internet, you might find your rate-limiting code carries most of the work and timing issues of TCP/IP anyway.

The fountain code itself is not concerned with that; it's merely the coding layer. There needs to be a communications layer that takes care of it, and the way it does so will need to consider cohabiting with TCP.

In the application I worked on, we used multicast and implemented a form of multicast congestion control that was orthogonal to the coding but exploited the fact that we could send as much data as we wanted; the only thing that changed was the time it would take to complete the transfer. It worked beautifully with the coding and cohabited nicely with TCP.

This is not relevant to fountain codes.

This seems most useful for broadcast. You could just broadcast the fountain code of a file out as long as you have listening receivers; the only feedback needed would be "send to me" and "done"

Thanks for this post. Patented or not, this is damn cool.

Arg. I thought algorithms weren't supposed to be patentable. I know there are workarounds (like referring to the "algorithm" as a "system"). But it's frustrating nonetheless.

For the curious, Knuth wrote an interesting letter on the topic back in 1994: http://www.pluto.it/files/meeting1999/atti/no-patents/brevet...

We had a patent officer visit our senior design class in engineering school and this question came up. We were all of the opinion that algorithms were purely mathematical expressions, as Dr. Knuth insists in the letter you cited. I don't recall the patent officer ever giving a precise answer on the subject, but I do recall considerable dancing as he attempted to explain why math isn't always math to a room full of engineers.

Actually, I ran into a relevant discussion on the topic with a few choice quotes (from [1]):

When filing an algorithm patent, a patent lawyer will tell you to remove every instance of the word "algorithm" and replace it with some other word. This seems to be necessary for getting the patent through.

[...] at some point you may find yourself pulling diagrams from thin air and thinking "why am I doing this?" (but then remembering that it's because of the bonus)

The complacency in the latter quote is (probably) why so many well-meaning engineers inadvertently exacerbate the problem.

[1] http://yaroslavvb.blogspot.com/2011/02/how-to-patent-algorit...

Having worked with a patent attorney in the past, I can give you specific wording examples. The software patents I've seen stated the claims in three different ways, to take advantage of different patentable areas: "A method for $foo", "A system for $foo" (as in a computer system), and "A computer program product for $foo".

Those same software patents also included, in addition to all the illustrations of the algorithm, a diagram of a stack of computer discs capable of containing the program, in order to demonstrate that (at least in some embodiments) the patent actually covered a physical thing rather than an algorithm.

heavily patented
