
Damn Cool Algorithms: Fountain Codes - nl
http://blog.notdot.net/2012/01/Damn-Cool-Algorithms-Fountain-Codes
======
tikhonj
This sounds really similar to secret sharing[1]. While it doesn't do _exactly_
the same thing, a simple secret-sharing algorithm is also interesting to read
about now--it's a different algorithm that chunks up and reassembles data in a
similar way, but with different properties.

One difference is that with secret sharing, unless you have enough chunks you
do not have _any_ information about the result. (That's why it's called secret
sharing--you can share a secret with a bunch of people without any of them
knowing it unless a minimum number get together.) The neat trick is that _any_
sufficiently large subset of the chunks will get you all of the information.

[1]: <http://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing>

In fact, the algorithm (at a high level) is exceptionally simple: encode your
data as the coefficients of a polynomial of degree n and transmit it as a
bunch of points on that polynomial. You then need any n+1 points to recover
the polynomial; which particular points you have does not matter.
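
A minimal sketch of that construction in Python, following the Shamir variant
from [1] (only the constant term carries the secret, the other coefficients
are random). The prime, function names, and parameters here are toy choices of
mine for illustration; a real implementation would use a larger field and a
cryptographically secure RNG:

    import random

    P = 2**31 - 1  # small prime field, illustration only

    def make_shares(secret, k, n):
        """Split `secret` into n shares; any k of them recover it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        def f(x):
            return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def recover(shares):
        """Lagrange interpolation at x = 0 over GF(P)."""
        secret = 0
        for xi, yi in shares:
            num, den = 1, 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * -xj % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    shares = make_shares(secret=123456789, k=3, n=5)
    print(recover(shares[:3]))   # 123456789 -- any three shares will do
    print(recover(shares[2:]))   # 123456789 again

The fountain-code setting swaps "any k points on a degree-(k-1) polynomial"
for "any slightly-more-than-n random XOR combinations of n blocks", which is
where the two ideas rhyme.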

What I love about this algorithm is that it is useful, nontrivial and can be
explained in a single sentence (by a better writer than I :)).

~~~
zacharyvoase
Another way of phrasing the same concept:

An infinite number of Euclidean planes can be drawn containing a single point,
but you only need three of these—and it can be any three—to find the original
point.

~~~
lurker14
(if the original point is contained within a line that is also known. For
example, if the original/desired point is on the x-axis)

------
epaulson
Alas, digital fountains are tied up in patents, and people have been leery:

[http://www.eecs.harvard.edu/~michaelm/postscripts/itw2004.pd...](http://www.eecs.harvard.edu/~michaelm/postscripts/itw2004.pdf)

~~~
michaelf
I know there are lots of anti-patent folks on HN, but if anything deserves
patent protection, digital fountains certainly do. This is useful, novel, and
certainly non-obvious stuff. And really quite beautiful, IMHO.

It's always bothered me to read about inventors who discovered beautiful and
amazing things, only to die penniless while powerful interests reaped all of
the financial benefits. Hopefully the people who discovered these codes have
been richly rewarded for their work.

~~~
JoshTriplett
> I know there are lots of anti-patent folks on HN, but if anything deserves
> patent protection, digital fountains certainly do.

I'd suggest that the usefulness of algorithms like fountain codes (and other
innovative algorithms like RSA and many compression techniques) leads to the
opposite conclusion: such algorithms prove so broadly useful that patents on
them have even more of a negative effect than usual.

~~~
zvrba
How does it have a negative effect? Consider three cases: 1) an algorithm is
designed and patented, and the inventor has the right to use or license it. 2)
an algorithm is designed and never published, i.e., held secret for
proprietary use of the inventor. 3) an algorithm is designed and put into
public domain.

With 3), the inventor never gets a chance to commercialize his invention: as
soon as he publishes the algorithm, he will be destroyed (or assimilated) by
large players. With 2), we never know how many cool, useful algorithms are
rotting away in commercial products. As far as society is concerned, the
algorithm in question does not exist. With 1), the inventor benefits, AND so
does society (in the long term).

IMO, the patent system should be reformed so that holding a large portfolio of
_unused_ patents becomes a liability. For example, the patent holder would
have to periodically (e.g., every 2 years) apply [with a fee, of course] for
patent renewal. To renew, the holder must be able to document that he is
actively exploiting the patent (1); there should probably also be a minimum
threshold for the income derived from that exploitation. If the holder fails
to do so, the patent automatically expires. The patent office should publish
an on-line database of expired patents.

There should also be restrictions on what can be patented: for example, work
funded by government money should not be patentable. The inventors' (in this
case, mostly university researchers') day job is _by definition_ to design
things that benefit society; they are already paid for that by society and
therefore have no further claim to patent protection. (This is, IMO, fair:
since research is [or should be] free, society has no firm guarantee that the
researchers will produce something valuable, or anything at all.)

(1) By exploitation I mean either making a device covered by the patent, or
licensing it to somebody who makes such a device.

~~~
JoshTriplett
You seem to suggest that algorithms won't get published if their authors can't
patent them and commercialize them. Given that the vast majority of algorithms
don't get patented, and that the vast majority of patents never get licensed
except in bulk cross-licensing deals, this seems almost entirely untrue.

Aside from that, I'd point out that in the rare case when an algorithm _does_
get published as a patent and licensed, that makes it entirely unusable in
Free and Open Source Software as well as in open standards, for a period of
~20 years. At least when the algorithm remains unpublished, someone can re-
discover it (or something more useful that would build on it); patents force
willful ignorance of an entire class of algorithms, as well as anything in the
general vicinity of those algorithms.

------
jleader
Why does the described algorithm only decode transmitted blocks when the
receiver has decoded all but one of the constituent blocks? For example, if
the receiver has received the following 3 blocks:

    
    
      1. A ^ B ^ C
      2. A ^ B
      3. B ^ C
    

it can combine #1 and #2 to decode C, then combine C and #3 to decode B, then
combine B and #2 to decode A.

The algorithm as described would, in this situation, make the receiver wait
until it has received a single-block packet (just A, B, or C on its own)
before decoding anything. This strikes me as inefficient.

Edited to add: I think this is analogous to forward vs. backwards chaining:
you can either start by combining single blocks with composite blocks to
simplify them, or start by combining blocks whose component block lists differ
by only a single block. Or you could apply both, which should get the greatest
simplification with the smallest amount of input.
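
To make that combining idea concrete, here's a toy sketch (the names and
structure are mine, not from the article): treat each received packet as a set
of block IDs plus the XOR of those blocks' payloads, and whenever one packet's
ID set is a proper subset of another's, XOR the two to get a strictly smaller
combination.

    def reduce_packets(packets):
        """Repeatedly cancel packets against each other until (ideally) only
        single-block packets remain. Exponential in the worst case; fine as a toy."""
        packets = set(packets)
        progress = True
        while progress:
            progress = False
            for big in list(packets):
                for small in list(packets):
                    if small[0] < big[0]:              # proper subset of block ids
                        reduced = (big[0] - small[0], big[1] ^ small[1])
                        if reduced not in packets:
                            packets.discard(big)
                            packets.add(reduced)
                            progress = True
        return {next(iter(ids)): data for ids, data in packets if len(ids) == 1}

    A, B, C = 1, 2, 4                                  # made-up payloads
    received = [
        (frozenset("ABC"), A ^ B ^ C),                 # packet #1
        (frozenset("AB"),  A ^ B),                     # packet #2
        (frozenset("BC"),  B ^ C),                     # packet #3
    ]
    print(reduce_packets(received))                    # {'A': 1, 'B': 2, 'C': 4}

Real LT decoders use the cheaper "peeling" strategy the article describes,
where only degree-one packets trigger work; the point here is just that more
aggressive combining can also make progress.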

~~~
baruch
Normally one would use some distribution over the number of source blocks
encoded in each packet. So one would send 30% single blocks, 30% double
blocks, 15% triple, and so on, down to some relatively small number of
high-order encoded blocks. This means that while you can find a solution the
way you posted, it's rare to have no single block available to start decoding
from. And since you need to receive a bit over 100% of the blocks before you
can decode the whole thing anyway, there is no real reason to go to exquisite
lengths to find the perfect solution when the trivial approach works 100% of
the time (well, 99.9999% if you care to be accurate :-)).
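
For what it's worth, an encoder with that kind of degree distribution is only
a few lines. This is a sketch with names of my own choosing, and the weights
are just the rough numbers above, not a tuned (robust) soliton distribution:

    import random

    def lt_encode_stream(blocks, degree_weights):
        """Yield an endless stream of (block_indices, xor_payload) packets."""
        degrees = sorted(degree_weights)
        weights = [degree_weights[d] for d in degrees]
        while True:
            d = random.choices(degrees, weights=weights)[0]
            chosen = random.sample(range(len(blocks)), min(d, len(blocks)))
            payload = 0
            for i in chosen:
                payload ^= blocks[i]
            yield frozenset(chosen), payload

    source = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
    stream = lt_encode_stream(source, {1: 0.30, 2: 0.30, 3: 0.15, 4: 0.15, 5: 0.10})
    for _, packet in zip(range(5), stream):
        print(packet)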

The algorithm is also not so much about computational efficiency; checking at
any point whether or not you can decode requires quite a bit of work anyhow.
The main advantage of the algorithm is its relatively low complexity and high
efficiency on the transmission line.

~~~
jleader
My point isn't about computational efficiency, but bandwidth efficiency (or
alternatively, decoding the full message as early as possible). As I
understand it, the Moore's-law doubling time for bandwidth is longer than for
CPU speed, suggesting that processing is getting cheaper relative to
bandwidth, so using more CPU in order to send fewer blocks is probably a
long-term win.

~~~
abhaga
The decoding amounts to solving a system of linear equations where each
equation corresponds to one received packet. Assuming the file was broken into
n parts, a minimum of n independent equations is required for reconstruction.
To do that optimally, the receiver needs to keep trying to solve the linear
system once it has received at least n packets, which means O(n^3) work in the
general case for each received packet. If the equations are generated
randomly, the expected overhead seems to be a mere 1.61 extra packets!
[http://virtual.vtt.fi/virtual/abi/abi-seminar07/abiseminar07...](http://virtual.vtt.fi/virtual/abi/abi-seminar07/abiseminar07-tuomas.pdf)
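
A rough sketch of that "solve the linear system" view, with each packet as a
GF(2) equation (coefficient bitmask plus XORed payload). The structure and
names are mine, not from the linked slides:

    def gf2_solve(equations, n):
        """Gauss-Jordan elimination over GF(2).
        equations: list of (coeff_bitmask, payload) pairs; bit i of the mask
        set means source block i was XORed into that packet. Returns the n
        decoded blocks, or None if more (independent) packets are needed."""
        pivot_rows = {}                               # pivot bit -> [mask, payload]
        for mask, payload in equations:
            row = [mask, payload]
            for bit, prow in pivot_rows.items():      # knock out existing pivots
                if row[0] >> bit & 1:
                    row[0] ^= prow[0]
                    row[1] ^= prow[1]
            if row[0] == 0:
                continue                              # linearly dependent packet
            bit = row[0].bit_length() - 1
            for prow in pivot_rows.values():          # clear the new pivot elsewhere
                if prow[0] >> bit & 1:
                    prow[0] ^= row[0]
                    prow[1] ^= row[1]
            pivot_rows[bit] = row
        if len(pivot_rows) < n:
            return None
        return [pivot_rows[i][1] for i in range(n)]

    # Same toy example as upthread: blocks A, B, C and packets A^B^C, A^B, B^C.
    A, B, C = 0x61, 0x62, 0x63
    packets = [(0b111, A ^ B ^ C), (0b011, A ^ B), (0b110, B ^ C)]
    print(gf2_solve(packets, 3))                      # [97, 98, 99], i.e. A, B, C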

In practice, however, depending on the receiver, you may not want it to solve
a general system of linear equations for every packet it receives when n is
big. It might be better to stick with a simpler algorithm (which has better
time complexity if the linear system satisfies certain conditions) and just
wait to accumulate enough equations for it to become applicable.

~~~
hadronzoo
Raptor codes require O(1) time to generate an encoding symbol. Decoding a
message of length k with a belief-propagation decoding algorithm requires O(k)
time for an appropriate choice of inner/outer codes.
[http://en.wikipedia.org/wiki/Raptor_code#Computational_compl...](http://en.wikipedia.org/wiki/Raptor_code#Computational_complexity)
See also:
[http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/555.573....](http://www.inference.phy.cam.ac.uk/mackay/itprnn/ps/555.573.pdf)

------
sown
Where do people keep these algorithms around? When I've asked in the past
about reference guides for stuff like this, people throw Knuth's "The Art of
Computer Programming" at me, which is a neat book but doesn't have esoteric,
useful, but hard-to-find stuff like this.

Where do I find stuff like this?!

~~~
nl
<http://en.wikipedia.org/wiki/List_of_algorithms> (seriously! Scan it, choose
something that looks interesting and you'll have hours of fun ahead of you.)

~~~
sown
Thanks, but neither fountain nor rateless codes appear on that page. :(

~~~
nl
No, but it does link to Forward error correction, which links to fountain
codes as well as rateless codes.

------
repiret
What's the advantage of fountain codes over more well-known and deterministic
forward error correction techniques like Reed-Solomon codes?

The D.J.C. MacKay paper linked from the article lists applications in
broadcast and data storage, but Reed-Solomon would work equally well in both
of those applications.

Hypothesis: With Reed-Solomon and friends, if you're unlucky and lose the
/right/ n blocks, for small n, you can't recover your data. Is that n larger
for LT? If I could maliciously corrupt your blocks, would I have to corrupt
more of them with LT?

Hypothesis: If I randomly corrupt blocks with Reed-Solomon, does my likelihood
of not being able to recover all my data drop faster than if I did so with LT?

Can someone say authoritatively?

~~~
jsyedidia
The advantage is you don't need to know the channel characteristics ahead of
time. In particular, if you used a Reed-Solomon code, you'd use one designed
for one particular erasure rate. But that code would be too weak for a channel
with a higher erasure rate (it would fail to decode) and it would be too
strong for a channel with a lower erasure rate (it would be wasteful and use
too many parity bits). The advantage of fountain codes is that they work
efficiently whatever the channel's erasure rate.

By the way, I've been talking about channels with an erasure rate (the Binary
Erasure Channel) because fountain codes were originally invented in that
context. But you can find in the literature papers on "Rateless codes for
noisy channels" that show that the fountain codes also work fine for bit-
flipping channels like the binary symmetric channel and the additive white
Gaussian noise channel.

~~~
gwern
> But that code would be too weak for a channel with a higher erasure rate (it
> would fail to decode) and it would be too strong for a channel with a lower
> erasure rate (it would be wasteful and use too many parity bits). The
> advantage of fountain codes is that they work efficiently whatever the
> channel's erasure rate.

So could we think of fountain codes as a kind of adaptive on-the-fly RS code
in which one sends a block protected against one's best guess of the erasure
rate and if the recipient fails to decode it one resends with a higher erasure
rate, or if the recipient succeeds one sends the next block with a lower
erasure rate?

~~~
jsyedidia
More or less, but it's really best not to think of them in terms of RS codes
at all. The reason they're called "fountain codes" is that they're used like a
water fountain. The transmitter just starts sending out bits for a file like a
water fountain spews out water. The receiver(s) collect bits until they've got
enough to reconstruct the file. It's like they're collecting water in a
bucket, and when they've got enough water (good bits) they can reconstruct the
file. The receivers with a good channel are like people standing close to the
fountain who can catch lots of bits. Those with a bad channel are like people
standing further away. But everybody can eventually fill up their bucket. And
the transmitter is indifferent to what channel they've got.

------
phzbOx
Didn't know about that algorithm; pretty cool. Also, the way the author
explained it made it a joy to read: clear, simple examples, meaningful
graphics. Can't wait to start reading the previous blog posts (about other
cool algorithms).

------
calloc
Seems the website has gone over quota. A shame, because the content is
absolutely fantastic. I had it open and had begun reading it on my computer at
home.

~~~
Edootjuh
Here's a mirror: [http://blog.notdot.net.nyud.net/2012/01/Damn-Cool-
Algorithms...](http://blog.notdot.net.nyud.net/2012/01/Damn-Cool-Algorithms-
Fountain-Codes)

~~~
InnocentB
NYUD gives me a 324 error (no bytes returned). This one worked for me:
[http://webcache.googleusercontent.com/search?q=cache:blog.no...](http://webcache.googleusercontent.com/search?q=cache:blog.notdot.net%2F2012%2F01%2FDamn-
Cool-Algorithms-Fountain-Codes)

------
kcl
It wasn't immediately clear to me that the ideal soliton distribution always
sums to one. There is a simple proof by induction; I wrote it up here:
<http://kevinlawler.com/soliton>
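
For anyone who just wants a quick sanity check: the ideal soliton distribution
puts probability 1/k on degree 1 and 1/(d(d-1)) on degrees 2..k, and the sum
telescopes to exactly 1. Easy to confirm with exact rationals (a quick sketch
of my own, not from the write-up above):

    from fractions import Fraction

    def ideal_soliton(k):
        rho = {1: Fraction(1, k)}
        for d in range(2, k + 1):
            rho[d] = Fraction(1, d * (d - 1))
        return rho

    for k in (2, 10, 1000):
        print(k, sum(ideal_soliton(k).values()))   # prints exactly 1 every time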

------
baruch
It is a damn cool algorithm. I was introduced to it at a startup I worked at,
and we actually made use of it. The algorithm as described has trouble
converging fast enough, so we had to add some extra structure on top that made
it converge very quickly and very consistently. We used it for multicast
communications, where feedback to the server is impractical and such codes
make for a very effective way to distribute the data.

------
derfclausen
Luby had a startup named Digital Fountain in SF in the late '90s where they
attempted to productize this, primarily for video streaming.

Looks like they were acquired by Qualcomm a few years back:
[http://www.qualcomm.com/solutions/broadcast-
streaming/media-...](http://www.qualcomm.com/solutions/broadcast-
streaming/media-delivery)

------
sethbuzz
Fountain codes look very interesting. There was a project at OLPC, I think
written in Forth for OpenFirmware, that used something like fountain codes to
reflash massive numbers of computers. It worked like this: turn one XO-1 on in
OFW mode, insert a USB drive with the disk image you want on all the machines,
and designate it as the source server. Then boot the other XO-1s you want to
reflash into OFW and tell them to receive a new image (I think there was a key
shortcut of some kind that told a machine to receive). The server XO would
continue to send packets until all the other machines had received enough
packets to make up a complete disk image. If I remember correctly, it was used
to reflash 3000 machines in the field in one deployment I heard about; there
may have been more elsewhere or since.

------
evmar
Gray on black text. :~(

To make the page readable, pull up the dev console and run
`document.body.style.color='black'`

~~~
Lagged2Death
Or you could zap colors:

[http://www.russellbeattie.com/blog/the-zap-colors-
bookmarkle...](http://www.russellbeattie.com/blog/the-zap-colors-bookmarklet)

~~~
obtu
That bookmarklet comes from squarefree (and looking at his about page, Russell
Beattie should learn to credit other people):
<https://www.squarefree.com/bookmarklets/zap.html>

~~~
Lagged2Death
I guess it's an encoded version of the same JavaScript, all right.

I expect I was first introduced to it through squarefree, but the Beattie page
is a simpler place for a surfer to land for that one specific thing, which is
why I chose it. I didn't realize it was also a straight-up, uncredited copy.

------
karpathy
It's somewhat intuitive, but I feel like there is some cool math, or at least
a deeper explanation, behind this that was left out.

Why is this better than just broadcasting messages that always contain a
single block? What do you gain by forming combinations? You still have an
expectation of eventually getting all the information. I assume it just turns
out to take longer on average because your expected gain in information is
lower from single chunks once you already have many of them?

~~~
JoshTriplett
If you only broadcast messages that contain a single block, anyone who already
has that block gets no value out of the message. On the other hand, if you
broadcast messages generated via fountain codes, each message gives you almost
a full block's worth of information on average.

If you assume the same network environment that the article does, namely one
where no feedback occurs from the recipient to the sender, then if you miss a
single block in the message, you'll have to wait for the sender to send that
one block again. Since the sender has no information about what block you
missed, you'll have to wait on average for half the blocks to get
retransmitted, resulting in 50% overhead or more. It gets worse if you miss
enough blocks that you have a non-trivial chance of missing those blocks
_again_. (On top of that, some transmission mechanisms have properties that
make particular data patterns particularly prone to loss, which makes it
significantly more likely that you'll lose particular blocks repeatedly.)

With fountain codes and similar mechanisms, on the other hand, every message
provides several chances to obtain the missing blocks, resulting in much less
overhead.
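
A quick back-of-the-envelope simulation of the single-block case (round-robin
broadcast of n blocks with independent loss, counting how many packets a
receiver must wait through before it has everything). The function name and
parameters are made up, and the numbers are illustrative only:

    import random

    def packets_until_complete(n, loss_rate, trials=2000):
        """Average packets a receiver sits through when the sender just
        cycles through single blocks and each packet is lost independently."""
        total = 0
        for _ in range(trials):
            have, sent = set(), 0
            while len(have) < n:
                block = sent % n           # round-robin, one block per packet
                sent += 1
                if random.random() > loss_rate:
                    have.add(block)
            total += sent
        return total / trials

    n = 100
    for loss in (0.01, 0.05, 0.20):
        print(loss, packets_until_complete(n, loss))

Even a small loss rate usually costs a large fraction of an extra pass,
whereas the randomly generated combinations quoted upthread need only about
1.61 extra packets beyond n, whichever particular packets happen to survive.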

------
jws
One wonders how it does rate limiting to avoid pointless resource consumption
on faster, nearer links when the packets are just going to be dropped by a
slower, farther link.

In particular, if you intend to coexist fairly with TCP/IP on an internet, you
might find your rate-limiting code carries most of the work and timing issues
of TCP/IP anyway.

~~~
baruch
The fountain code itself is not concerned with it, it's merely the coding
layer. There needs to be a communications layer that takes care of that and
the way it does that will need to consider co-habitating with TCP.

In the application I worked on we used multicast and implemented a form of
multicast congestion control that was orthogonal to the coding but exploited
the fact that we can send however much data as we wanted and the only thing
that changed was the time it will take to complete the transfer. It worked
beautifully with the coding and co-habitated nicely with TCP.

------
kareemsabri
Isn't this network coding?

<http://en.wikipedia.org/wiki/Network_coding>
<http://research.microsoft.com/pubs/67246/tr-2004-80.pdf>

------
aidenn0
This seems most useful for broadcast. You could just broadcast the fountain
code of a file for as long as you have listening receivers; the only feedback
needed would be "send to me" and "done".

------
invalidOrTaken
Thanks for this post. Patented or not, this is damn cool.

~~~
beambot
Arg. I thought algorithms weren't supposed to be patentable. I know there are
workarounds (like referring to the "algorithm" as a "system"). But it's
frustrating nonetheless.

For the curious, Knuth wrote an interesting letter on the topic back in 1994:
[http://www.pluto.it/files/meeting1999/atti/no-
patents/brevet...](http://www.pluto.it/files/meeting1999/atti/no-
patents/brevetti/docs/knuth_letter_en.html)

~~~
cydonian_monk
We had a patent officer visit our senior design class in engineering school
and this question came up. We were all of the opinion that algorithms were
purely mathematical expressions, as Dr. Knuth insists in the letter you cited.
I don't recall the patent officer ever giving a precise answer on the subject,
but I do recall considerable dancing as he attempted to explain why math isn't
always math to a room full of engineers.

~~~
beambot
Actually, I ran into a relevant discussion on the topic with a few choice
quotes (from [1]):

 _When filing an algorithm patent, patent lawyer will tell you to remove every
instance of the word "algorithm" and replace it with some other word. This
seems to be necessary for getting the patent through._

 _[...] at some point you may find yourself pulling diagrams from thin air and
thinking "why am I doing this?" (but then remembering that it's because of the
bonus)_

The complacency in the latter quote is (probably) why so many well-meaning
engineers inadvertently exacerbate the problem.

[1] [http://yaroslavvb.blogspot.com/2011/02/how-to-patent-
algorit...](http://yaroslavvb.blogspot.com/2011/02/how-to-patent-algorithm-in-
us.html)

~~~
JoshTriplett
Having worked with a patent attorney in the past, I can give you specific
wording examples. The software patents I've seen stated the claims in three
different ways, to take advantage of different patentable areas: "A method for
$foo", "A system for $foo" (as in a computer system), and "A computer program
product for $foo".

Those same software patents also included, in addition to all the
illustrations of the algorithm, a diagram of a stack of computer discs capable
of containing the program, in order to demonstrate that (at least in some
embodiments) the patent actually covered a physical _thing_ rather than an
algorithm.

------
cma
heavily patented

