
Damn Cool Algorithms: Homomorphic Hashing - moobirubi
http://blog.notdot.net/2012/08/Damn-Cool-Algorithms-Homomorphic-Hashing
======
breckinloggins
I just want to say how much I love this blog. I'm a self-taught programmer,
which means that I tend to learn algorithms as I feel I need them (a big
problem given that I have to know what I don't know before I can know it!).
I've recently gotten more into computer science for its own sake, but books
and papers about algorithms are often dry and unmotivating.

The pieces on this blog are always entertaining and clearly justify the
excitement and purpose of the algorithm under discussion.

Computer science and the STEM world in general needs more stuff like this, and
I'm happy that we are in fact seeing more and more of it. Well done.

~~~
psykotic
> books and papers about algorithms are often dry and unmotivating.

If you don't like standard textbook treatments like CLRS, you might like
these: Introduction to Algorithms - A Creative Approach by Udi Manber (out of
print but worth getting). Algorithm Design by Kleinberg and Tardos. How to
Think About Algorithms by Jeff Edmonds. Jeff Erickson's lecture notes on
algorithms: <http://www.cs.uiuc.edu/~jeffe/teaching/algorithms/>

Jon Bentley's Programming Pearls is about what I'd call algorithm engineering.
The algorithms are never anything fancy but it's a good example of algorithms
applied to real-world problems. It's possibly the best single book on
programming I know. Make sure to do all the exercises.

~~~
breckinloggins
Thanks! I'll take a look. It's not so much that I don't like them, it's that I
seem to be able to "get into" a paper or textbook _after_ I understand what's
so cool about what they're talking about.

~~~
petercooper
In that case Programming Pearls will probably be a breath of fresh air. It
doesn't start from the algorithm but from the problem to be solved and tends
to first dip into the most naive solution before rationalizing why specific
approaches are important.

~~~
abhijat
From that perspective Skiena's book is also good, as it offers war stories
where you can see firsthand how some of the algorithms mentioned were applied
in practice.

~~~
petercooper
Cool! I've now ordered it based on this comment alone... :-)

------
psykotic
If I'm understanding the application correctly, this would also work with
Tillich-Zemor-style hashing based on matrix groups. The idea is to map the 0
and 1 message bits onto a pair of generators of a subgroup of the invertible
2x2 matrices with entries in some GF(2^n). The message hash is just the
product of the generators corresponding to the message bits.

What's really cool is that matrix multiplication is associative, i.e. x(yz) =
(xy)z, so you can perform the hashing in parallel: Hash the left half, hash
the right half, and combine the results by multiplication. It also allows
distributed multi-party hashing, as in this article's application.
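The parallel-combine property is easy to see in code. Here's a toy sketch of
TZ-style hashing with the classic generator pair, using GF(2^8) so the field
arithmetic stays readable; a field this small is purely illustrative and
nowhere near secure:

```python
# Toy Tillich-Zemor-style hash: bits map to 2x2 matrices over GF(2^8),
# and the hash of a message is the product of those matrices in order.
# GF(2^8) with the AES polynomial is chosen here only for readability.

IRRED = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carryless multiplication in GF(2^8), reduced mod IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= IRRED
    return r

def mat_mul(m, n):
    """2x2 matrix product over GF(2^8); addition is XOR."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((gf_mul(a, e) ^ gf_mul(b, g), gf_mul(a, f) ^ gf_mul(b, h)),
            (gf_mul(c, e) ^ gf_mul(d, g), gf_mul(c, f) ^ gf_mul(d, h)))

# Classic TZ generators: A for a 0 bit, B for a 1 bit (0x02 is the element x).
GEN = {0: ((0x02, 0x01), (0x01, 0x00)),
       1: ((0x02, 0x03), (0x01, 0x01))}

IDENTITY = ((1, 0), (0, 1))

def tz_hash(bits):
    h = IDENTITY
    for bit in bits:
        h = mat_mul(h, GEN[bit])
    return h

# Associativity lets us hash the halves independently and combine:
msg = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
left, right = msg[:5], msg[5:]
assert mat_mul(tz_hash(left), tz_hash(right)) == tz_hash(msg)
```

The final assert is the whole point: any split of the message works, so the
hash can be computed in parallel or incrementally by different parties.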

Unfortunately, classical TZ hashing has fallen completely to practical
collision attacks as well as first and second preimage attacks. However, those
attacks all rely on very special properties of power-of-two Galois fields, so
there's no reason generalized TZ-style matrix group hashing wouldn't be
feasible.

------
pacaro
Cool stuff. My head hurts less than expected from the description; I can't
speak to the accuracy of the explanation, but the clarity was just about where
I needed it.

It might help that I've recently been re-reading "The Mathematics of Ciphers"
by S.C. Coutinho[1] which has the best easy-on-ramp introduction to the number
theory behind RSA that I've encountered.

[1] [http://www.amazon.com/The-Mathematics-Ciphers-Number-Cryptog...](http://www.amazon.com/The-Mathematics-Ciphers-Number-Cryptography/dp/1568810822/ref=sr_1_1?ie=UTF8&qid=1346174543&sr=8-1&keywords=mathematics+of+ciphers)

------
kenferry
This seems cool, but I don't understand how the given algorithm addresses the
use case he gives.

The stated goal is to be able to verify pieces of a file that has been chunked
up. But the first step in the algorithm is get a list of hashes for each chunk
from a central server - if you have that, aren't you done?

~~~
mindslight
The paper includes details that show the motivation a bit better. Broadcast
erasure codes will multiply the number of blocks by a large amount, say 16.
With traditional hashing, you have to generate all of these extra 'check
blocks' to hash them as part of the publishing process. Using this technique,
you only have to hash the actual file and can then generate all of the other
hashes using the homomorphism.

Edit: actually, the receiver should be able to use the homomorphism to
generate hashes for the check blocks as well, so the list of authoritative
hashes to download only needs to cover the original chunks. The goal of the
technique is to be able to verify check blocks before being able to
reconstruct the original blocks.
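To make that homomorphism concrete, here is a toy sketch in the style of the
paper's hash: a block is a vector of values mod q, hashed as a product of
fixed group generators raised to those values mod p. The tiny parameters
(p = 23, q = 11) and the particular generators are my own choices for
illustration, not anything from the paper:

```python
# Toy homomorphic hash sketch. Because H(b1 + b2) = H(b1) * H(b2) mod p,
# the hash of a check block (a sum of source blocks) can be derived from
# the source blocks' hashes alone. Toy-sized parameters, not secure.

p, q = 23, 11            # q divides p - 1, so an order-q subgroup exists
g = [4, 9, 3]            # quadratic residues mod 23: elements of that subgroup

def H(block):
    """Hash a block (a vector of values mod q) into the group mod p."""
    h = 1
    for gi, bi in zip(g, block):
        h = (h * pow(gi, bi, p)) % p
    return h

def add_blocks(b1, b2):
    """Check blocks here are per-element sums of source blocks, mod q."""
    return [(x + y) % q for x, y in zip(b1, b2)]

b1, b2 = [3, 1, 7], [5, 2, 0]
check = add_blocks(b1, b2)

# Verify the check block using only the source blocks' hashes:
assert H(check) == (H(b1) * H(b2)) % p
```

This is exactly the move that lets the publisher hash only the original file:
anyone holding the authoritative source-block hashes can multiply them to get
the expected hash of any linear combination.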

~~~
csense
Read up on fountain codes. A good starting point is the previous post on the
same blog [1], actually linked in the first sentence of the article.

The basic idea is that, instead of sending N blocks in the content file f, you
have a block generator gen_block(f) which can create an infinite number of
blocks as random combinations of input blocks (plus a little "magic"). A peer
who downloads _any_ 1.01N or so blocks can reconstruct the original file (with
high probability); because of the "magic," it doesn't matter _which_ 1.01N
blocks they are!
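A stripped-down sketch of that idea, with XOR as the combining operation and
Gaussian elimination over GF(2) as the decoder. Real fountain codes (LT,
Raptor) draw the subsets from a carefully tuned degree distribution; this toy
just picks uniformly random nonempty subsets:

```python
# Toy fountain code: each generated block is the XOR of a random subset of
# the N source blocks, tagged with a bitmask saying which subset was used.
# Once the receiver has N linearly independent blocks (any of them!), it
# can solve for the originals.

import random

def gen_block(source, rng):
    """XOR a uniformly random nonempty subset of the source blocks."""
    n = len(source)
    mask = 0
    while mask == 0:
        mask = rng.getrandbits(n)
    payload = 0
    for i in range(n):
        if mask >> i & 1:
            payload ^= source[i]
    return mask, payload

def decode(received, n):
    """Incremental Gauss-Jordan over GF(2); returns the n source blocks."""
    pivots = {}  # pivot column -> [mask, payload]
    for mask, payload in received:
        row = [mask, payload]
        for c, prow in pivots.items():
            if row[0] >> c & 1:         # reduce by existing pivots
                row[0] ^= prow[0]
                row[1] ^= prow[1]
        if row[0] == 0:
            continue                    # redundant block, discard
        c = row[0].bit_length() - 1     # new pivot column
        for prow in pivots.values():    # eliminate it from older rows
            if prow[0] >> c & 1:
                prow[0] ^= row[0]
                prow[1] ^= row[1]
        pivots[c] = row
        if len(pivots) == n:
            return [pivots[i][1] for i in range(n)]
    return None                         # not enough independent blocks yet

rng = random.Random(1)
source = [0xDEAD, 0xBEEF, 0xCAFE, 0xF00D]
stream = (gen_block(source, rng) for _ in range(100))
assert decode(stream, len(source)) == source
```

With uniform random subsets you need slightly more than N blocks on average
before the system becomes full rank; tuned degree distributions get that
overhead down to the ~1% mentioned above while also allowing much cheaper
peeling decoders instead of full elimination.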

Fountain codes work beautifully in broadcast protocols, where the receiver
has no way of communicating back to the sender (a situation that is rare for
TCP/IP outside of exotic scenarios such as multicast).

For Bittorrent-like protocols (peer-to-peer large file distribution over
TCP/IP), the practical problem addressed by fountain codes is making sure the
entire file usually remains recoverable from data within the network even
after all the seeders (peers with a complete copy) leave; in traditional
Bittorrent, this situation often leaves a few blocks missing.

The practical problem addressed by homomorphic hashing is how a receiver can
rapidly identify peers that are attacking a fountain-code-based P2P network by
deliberately sending corrupt data (and, as a side effect, it can of course
detect accidental corruption as well).

[1] [http://blog.notdot.net/2012/01/Damn-Cool-Algorithms-Fountain...](http://blog.notdot.net/2012/01/Damn-Cool-Algorithms-Fountain-Codes)

~~~
mindslight
I had written my comment after reading the performance section of the paper,
where they point out that homomorphic hashing uses much less disk I/O than
hashing the entire output of a fixed-rate code. I missed the point that the
receiver takes advantage of the homomorphism as well, and so arrived at the
actual capabilities somewhat circuitously.

------
simcop2387
One thing probably worth keeping in mind is that you can keep the CPU cost
much saner by using modular exponentiation [1]. This is also how most
implementations of RSA handle large exponents. Reducing at every step is far
more efficient than carrying multi-thousand-bit intermediate results through
the multiplications.

[1] <http://en.wikipedia.org/wiki/Modular_exponentiation>
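For reference, the square-and-multiply loop behind that link looks like this
(Python's built-in three-argument pow does the same thing natively):

```python
# Right-to-left square-and-multiply: every intermediate value stays below
# mod**2, instead of letting base**exp grow into an astronomically large
# integer that only gets reduced at the end.

def mod_pow(base, exp, mod):
    result = 1
    base %= mod
    while exp:
        if exp & 1:                     # this exponent bit is set:
            result = result * base % mod  # fold the current square in
        base = base * base % mod        # square for the next bit
        exp >>= 1
    return result

assert mod_pow(7, 2**100 + 3, 10**9 + 7) == pow(7, 2**100 + 3, 10**9 + 7)
```

The loop runs once per exponent bit, so a 257-bit exponent costs a few
hundred modular multiplications rather than one gigantic one.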

~~~
nicksdjohnson
Absolutely. Even with modular exponentiation, though, it's still pretty
expensive to raise a 1024-bit number to a 257-bit power.

------
biomcgary
Could this be used to secure a majority vote by asking people to submit their
chunk of one of two files? The only way to get either file is by having the
majority of the votes.

------
cyarvin
Best of all, the authors seem to have somehow evaded MIT's IPR office. (Shh!)
At least, I get squat for the lead author, Maxwell Krohn:

[https://www.google.com/search?tbo=p&tbm=pts&hl=en...](https://www.google.com/search?tbo=p&tbm=pts&hl=en&q=ininventor:krohn+ininventor:maxwell&num=10)

Not patenting a (practically, not legally) patentable algorithm doesn't come
easily to academic researchers these days. The authors deserve kudos (unlike
the rapacious Michael Luby, fountain code inventor) for trying to make sure
people can use their algorithm sometime in the next 20 years.

------
sproketboy
Not that there's anything wrong with it.

------
sshconnection
Listening to the RNC in another tab, read this as homophobic hashing. Sigh.

~~~
mindslight
^ This is your brain on mass media. Any questions?

~~~
xianshou
Is it morally wrong for two threads to be writing files out of deadlock?

