The pieces on this blog are always entertaining and clearly convey the excitement and purpose behind the algorithm under discussion.
Computer science, and the STEM world in general, needs more stuff like this, and I'm happy that we are in fact seeing more and more of it. Well done.
If you don't like standard textbook treatments like CLRS, you might like these: Introduction to Algorithms: A Creative Approach by Udi Manber (out of print but worth getting). Algorithm Design by Kleinberg and Tardos. How to Think About Algorithms by Jeff Edmonds. Jeff Erickson's lecture notes on algorithms: http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/
Jon Bentley's Programming Pearls is about what I'd call algorithm engineering. The algorithms are never anything fancy but it's a good example of algorithms applied to real-world problems. It's possibly the best single book on programming I know. Make sure to do all the exercises.
You may want to peruse Aho and Ullman's excellent C edition of Foundations of Computer Science. (That's Aho as in awk.) It covers what you need for a good grounding, all in one place. You can still pick up a paper edition if you want, but they've also made it available as PDFs.
What's really cool is that matrix multiplication is associative, i.e. x(yz) = (xy)z, so you can perform the hashing in parallel: Hash the left half, hash the right half, and combine the results by multiplication. It also allows distributed multi-party hashing, as in this article's application.
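To make the split-and-combine trick concrete, here's a toy Python sketch. The modulus and the two per-bit matrices are placeholder choices of mine, not the real Tillich-Zemor parameters, but the associativity property works the same way:

    P = 1_000_003                      # toy modulus, not a real TZ parameter
    A = ((1, 1), (0, 1))               # placeholder matrix for a 0 bit
    B = ((1, 0), (1, 1))               # placeholder matrix for a 1 bit

    def matmul(x, y):
        return tuple(
            tuple(sum(x[i][k] * y[k][j] for k in range(2)) % P for j in range(2))
            for i in range(2)
        )

    def hash_bits(bits):
        h = ((1, 0), (0, 1))           # identity matrix
        for b in bits:
            h = matmul(h, B if b else A)
        return h

    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    left, right = bits[:4], bits[4:]
    # associativity: hash the halves, multiply, and you get the hash of the whole
    assert matmul(hash_bits(left), hash_bits(right)) == hash_bits(bits)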
Unfortunately, classical TZ hashing has fallen completely to practical collision attacks as well as first and second preimage attacks. However, those attacks all rely on very special properties of power-of-two Galois fields, so there's no reason generalized TZ-style matrix group hashing wouldn't be feasible.
It might help that I've recently been re-reading "The Mathematics of Ciphers" by S.C. Coutinho which has the best easy-on-ramp introduction to the number theory behind RSA that I've encountered.
The stated goal is to be able to verify pieces of a file that has been chunked up. But the first step in the algorithm is to get a list of hashes for each chunk from a central server - if you have that, aren't you done?
Edit: actually, the receiver should be able to use the homomorphism to generate hashes for the check blocks as well, so the list of authoritative hashes to download only needs to cover the original chunks. The goal of the technique is to be able to verify the check blocks before being able to reconstruct the original blocks.
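For anyone curious what that homomorphism looks like, here's a toy Python sketch of the discrete-log style construction (the parameters are tiny and insecure, and the exact group setup is my own stand-in rather than the paper's): the hash of a linear combination of chunks equals the same combination of the per-chunk hashes, which is what lets the receiver check a check block against hashes it already has.

    p, q = 1019, 509                   # toy primes with q dividing p - 1
    g = [4, 9, 16, 25]                 # toy generators of the order-q subgroup mod p

    def hash_chunk(chunk):
        # chunk is a vector of numbers mod q; its hash is prod g_i^chunk_i mod p
        h = 1
        for gi, bi in zip(g, chunk):
            h = h * pow(gi, bi, p) % p
        return h

    def combine_chunks(c1, x, c2, y):
        # the check block a sender would transmit: c1*x + c2*y, componentwise mod q
        return [(c1 * xi + c2 * yi) % q for xi, yi in zip(x, y)]

    def combine_hashes(c1, hx, c2, hy):
        # what the receiver computes from the authoritative per-chunk hashes alone
        return pow(hx, c1, p) * pow(hy, c2, p) % p

    x, y = [3, 1, 4, 1], [2, 7, 1, 8]
    c1, c2 = 5, 11
    check = combine_chunks(c1, x, c2, y)
    assert hash_chunk(check) == combine_hashes(c1, hash_chunk(x), c2, hash_chunk(y))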
In all other particulars you're dead right.
The basic idea is that, instead of sending N blocks in the content file f, you have a block generator gen_block(f) which can create an infinite number of blocks as random combinations of input blocks (plus a little "magic"). A peer who downloads any 1.01N or so blocks can reconstruct the original file (with high probability); because of the "magic," it doesn't matter which 1.01N blocks they are!
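A bare-bones sketch of that idea (my own toy code, not the scheme from the article): a random linear fountain over GF(2), where gen_block XORs a random subset of the original chunks and the receiver decodes by Gaussian elimination once it has enough independent blocks. Real fountain codes (LT/Raptor) draw the subsets from a carefully tuned degree distribution so decoding is much cheaper, but the "any sufficient set of blocks will do" behavior is the same.

    import random

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def gen_block(chunks, seed):
        # the seed stands in for the block header that tells the receiver
        # which original chunks were combined
        rng = random.Random(seed)
        mask = [rng.randrange(2) for _ in chunks]
        if not any(mask):
            mask[rng.randrange(len(chunks))] = 1
        block = bytes(len(chunks[0]))
        for bit, chunk in zip(mask, chunks):
            if bit:
                block = xor_bytes(block, chunk)
        return mask, block

    def decode(coded, n):
        # online Gaussian elimination over GF(2); returns None until the
        # received blocks span all n original chunks
        pivots = [None] * n
        for mask, block in coded:
            mask, block = list(mask), bytearray(block)
            for c in range(n):
                if not mask[c]:
                    continue
                if pivots[c] is None:
                    pivots[c] = (mask, block)
                    break
                pmask, pblock = pivots[c]
                for i in range(n):
                    mask[i] ^= pmask[i]
                block[:] = xor_bytes(block, pblock)
        if any(p is None for p in pivots):
            return None
        for c in reversed(range(n)):        # back-substitution
            pmask, pblock = pivots[c]
            for c2 in range(c + 1, n):
                if pmask[c2]:
                    pmask[c2] = 0
                    pblock[:] = xor_bytes(pblock, pivots[c2][1])
        return [bytes(pivots[c][1]) for c in range(n)]

    chunks = [bytes([i] * 8) for i in range(4)]
    coded, seed, recovered = [], 0, None
    while recovered is None:                # keep collecting blocks until decodable
        coded.append(gen_block(chunks, seed))
        seed += 1
        recovered = decode(coded, len(chunks))
    assert recovered == chunks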
Fountain codes work beautifully in broadcast protocols, where the receiver has no way of communicating back to the sender (which is rarely the case over TCP/IP, outside of exotic scenarios such as multicast).
For Bittorrent-like protocols (peer-to-peer large file distribution over TCP/IP), the practical problem addressed by fountain codes is making sure the entire file usually remains recoverable from data still present in the network even after all the seeders (peers with a complete copy) have left; in traditional Bittorrent, a few blocks often go missing in that situation.
The practical problem addressed by homomorphic hashing is how a receiver can rapidly identify peers that are attacking a fountain-code-based P2P network by deliberately sending corrupt blocks (and, as a side effect, it can of course detect accidental corruption as well).
Not-patenting a (practically, not legally) patentable algorithm doesn't come easily to academic researchers these days. The authors deserve kudos (unlike the rapacious Michael Luby, fountain code inventor) for trying to make sure people can use their algorithm sometime in the next 20 years.