
Bob outsmarts Alice's one way function - jgrahamc
http://blog.jgc.org/2013/04/bob-outsmarts-alices-one-way-function.html
======
rlpb
"In the real, mathematical world of one way functions Bob's reverse dictionary
is called a 'rainbow table'..."

No; it's just a reverse lookup table. A "rainbow table" is a specific space
optimization on top of a reverse lookup table. See the Wikipedia article you
linked to:

"To address this issue of scale, reverse lookup tables were generated that
stored only a smaller selection of hashes that when reversed could generate
long chains of passwords. Although the reverse lookup of a hash in a chained
table takes more computational time, the lookup table itself can be much
smaller, so hashes of longer passwords can be stored. Rainbow tables are a
refinement of this chaining technique and provide a solution to a problem
called chain collisions."

~~~
jgrahamc
Yes, that was sloppy of me. I will correct.

~~~
withoutthis
Link before edit: <http://archive.is/3i48g>

~~~
lucb1e
Didn't know archive.is, cool site

------
drostie
In fact I think this would be more informative as a description of rainbow
tables, so here goes:

Okay, so Bob in principle can write down a lookup table, but there's a problem
-- that lookup table is going to be as long as the dictionary! What if he has
to take it somewhere, or if he wants to make a copy for someone? That's
serious work each time. We'd like to take the 100,000 words in the dictionary
and convert them to only, say, 1,000 words.

Fortunately for Bob, and unfortunately for Alice, one-way functions come with
a really simple _compression_ method, where you can reduce the lookup table
as much as you want. This will make extracting the actual word slower, but you
don't have to store a huge book anywhere.

Here is how it works. We work with 'chains' of one-way computations. See, our
one-way function maps ORNATE → CEASE. Now we 'chain' the function by looking
up CEASE and we find that it maps CEASE → SNIDE. We can "chain" it again to
find SNIDE → JOYRIDE.

After a hundred of these chained together, we find that ORNATE ⇒ HOMOPHONE. In
our lookup table we store the entry 'homophone: chain starts with ornate.' And
I claim that now the lookup table can be about a hundred times smaller. Why?
Because CEASE and SNIDE now do not need to start chains of their own! They are
covered by the chain which starts with ORNATE. Chain tables like this are the
basis of "rainbow tables," which refine the same idea.

It might help to see how we use the rainbow table to look up some word near
the end of this chain. Let's say that Alice has the word VEGAN and calculates
that VEGAN → ORATION, so she sends us ORATION. Bob can't easily reverse this,
but he can do the one-way function to ORATION and find some other word,
ORATION → PERSECUTOR. But Bob can't find that in our rainbow table! So he does
the one-way function again, PERSECUTOR → PAY. He can't find that either! So he
does the one-way function again, PAY → CREATIVE. Nope, still no help! But
then, CREATIVE → HOMOPHONE, and we find in our rainbow table that yes, we know
how to make HOMOPHONE, because the rainbow table, remember, stored the fact
that HOMOPHONE ⇐ ORNATE.

Now we use the one-way function again, starting with ORNATE → CEASE → SNIDE →
... → VEGAN → ORATION. Hey look, we found ourselves back at ORATION, but now we
know a word (VEGAN) which maps to ORATION! It's a long process of about 100
queries of the one-way function, but it's not so long -- because remember, the
one-way function was fast.
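
A minimal Python sketch of the chain building and lookup described above, for
anyone who wants to try it. Everything here is an assumption made for
illustration: the word list is made up, `one_way` is a toy stand-in built on
SHA-256 rather than Alice's dictionary function, and `build_table` / `invert`
are hypothetical names. It is a plain chain table, without the refinement that
real rainbow tables add to deal with chain collisions.

```python
import hashlib

# Hypothetical toy dictionary standing in for the real one.
WORDS = [f"word{i}" for i in range(1000)]


def one_way(word):
    """Toy one-way function: maps a word to some word in WORDS."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return WORDS[int.from_bytes(digest, "big") % len(WORDS)]


def build_table(starts, length):
    """Walk each chain `length` steps and store only end -> start."""
    table = {}
    for start in starts:
        word = start
        for _ in range(length):
            word = one_way(word)
        table[word] = start  # the words in between are thrown away
    return table


def invert(target, table, length):
    """Return some word w with one_way(w) == target, or None if not covered."""
    candidate = target
    for _ in range(length):
        start = table.get(candidate)
        if start is not None:
            # We reached a stored chain end: replay that chain from its start.
            word = start
            for _ in range(length):
                nxt = one_way(word)
                if nxt == target:
                    return word
                word = nxt
            # False alarm (two chains merged); keep walking forward.
        candidate = one_way(candidate)
    return None
```

With `table = build_table(["word0"], 100)`, a call like
`invert(some_output, table, 100)` costs at most a couple of hundred calls to
`one_way`, while the table itself holds a single entry for the whole chain.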

There is a one-way function called SHA1 sometimes used to protect passwords.
Today you can download a big file -- 86 GB -- which is just a rainbow table
for passwords hashed with it. This rainbow table covers all passwords from 1
to 7 characters -- any combination of upper- or lower-case letters, numbers,
spaces, special symbols, anything. That's about 95^7 = 70 trillion passwords!
But it doesn't take ~70 trillion bytes, it only takes ~70 billion bytes,
because it's been compressed by a factor of 1000 this way.

There's also a lookup table for 1-10 character passwords which are made of
lowercase letters and numbers only, so about 36^10 = 3.6 quadrillion
passwords. The file size is 588 GB; it's been compressed by a factor of 5,000
to make it possible to store on a modern hard drive. So if your password is
all lower-case letters and numbers, it had better be a very long password, at
least 12 characters long.
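
The arithmetic above can be checked with a few lines of Python. This is only a
rough sketch: it assumes 1 GB means 10^9 bytes, counts only the longest
password length in each set (shorter ones add about 1% and 3% respectively),
and uses "passwords covered per byte of table" as a loose stand-in for the
compression factor. The variable names are made up.

```python
# Number of passwords at the maximum length in each character set.
full_ascii = 95 ** 7      # 7 characters, 95 printable ASCII symbols
lower_digits = 36 ** 10   # 10 characters, lowercase letters and digits

# File sizes quoted above, assuming 1 GB = 10**9 bytes.
size_ascii_bytes = 86 * 10 ** 9
size_lower_bytes = 588 * 10 ** 9

# Passwords covered per byte of table.
ratio_ascii = full_ascii / size_ascii_bytes
ratio_lower = lower_digits / size_lower_bytes

print(f"{full_ascii:.2e} passwords, about {ratio_ascii:.0f} per byte")
print(f"{lower_digits:.2e} passwords, about {ratio_lower:.0f} per byte")
```

Both ratios land in the same ballpark as the factors of 1,000 and 5,000
mentioned above.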

~~~
cs702
drostie: this is the most accessible plain-English explanation of rainbow
tables I've ever seen, and it's short to boot. From now on, whenever anyone
asks me how stolen lists of hashed passwords get hacked, I will point them to
your comment. Thank you.

~~~
aidenn0
Of course a lot of hashed passwords have been hacked not because of rainbow
tables, but by brute force because the site used a single round of a fast hash
function.

They think they are good with a 4-byte salt and one round of SHA-1, since that
is effectively immune to rainbow-table attacks, but it's not immune to "I have
a massively powerful processor in my computer called a 'video card'."
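
For concreteness, a sketch of the kind of scheme I mean, in Python. This shows
the weak setup, not a recommendation; the function names and the choice to
prepend the salt to the password are assumptions for illustration.

```python
import hashlib
import os


def hash_password(password, salt=None):
    """One round of SHA-1 over a 4-byte salt plus the password (weak!)."""
    if salt is None:
        salt = os.urandom(4)  # random 4-byte salt, stored next to the hash
    digest = hashlib.sha1(salt + password.encode("utf-8")).hexdigest()
    return salt, digest


def verify(password, salt, digest):
    """Recompute the hash with the stored salt and compare."""
    return hash_password(password, salt)[1] == digest
```

The salt means the same password hashes differently for each user, so a table
precomputed for unsalted SHA-1 doesn't apply. But each guess still costs only
one fast hash, which is exactly what brute force on a GPU is good at.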

------
dkwak
I think a more salient point than "Bob only has to do the work once" is "Bob
has done the work ahead of time", no? In fact, even in the long run, Bob is
doing a lot more work than necessary (computing inverse one-way functions for
all/most possible inputs). It's just that he's preemptively computing them.

~~~
Retric
The point is he wants to look up more than one example because the crossword
contains several words. Going through the full dictionary N times is harder
than doing it once, even if on average he only needs to try half of it each
time.

------
marcin_kw
The author did not realize that the "repetitive dictionary lookup" function is
not a 1-to-1 mapping. There could be, for example, multiple words whose
definition starts with "a type". All these words map to the same ciphertext,
eventually.

The "repetitive dictionary lookup" function is very useful for validating
Alice's claims of solving parts of the puzzle, but imperfect for deriving the
solution from a ciphertext.

~~~
jgrahamc
I didn't? Or perhaps you didn't read my blog post where at the end I say...

"PS In reality, Alice's one way function is not 'bijective'. That means that
some starting words will end up at the same word. For example, when FOLIO,
PIECE and WORLD are passed through Alice's one way function they all end up at
THINGS. In a follow up post I'll talk about this and its implications."

~~~
peteretep
That property is that it's not _injective_. It is also ("implies") not
bijective, but that's a different thing.
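
A small Python sketch of the distinction, using the three collisions quoted
above. The ORNATE entry is a made-up extra for illustration, and the helper
names are hypothetical. Because the function is not injective, a reverse
lookup table has to map each output to a list of possible inputs.

```python
from collections import defaultdict

# FOLIO, PIECE and WORLD -> THINGS come from the quoted PS;
# ORNATE -> CEASE is a hypothetical extra entry.
ALICE = {
    "FOLIO": "THINGS",
    "PIECE": "THINGS",
    "WORLD": "THINGS",
    "ORNATE": "CEASE",
}


def is_injective(mapping):
    """True if no two inputs share the same output."""
    return len(set(mapping.values())) == len(mapping)


def reverse_lookup(mapping):
    """Build output -> list of inputs, keeping every colliding input."""
    table = defaultdict(list)
    for word, image in mapping.items():
        table[image].append(word)
    return dict(table)


print(is_injective(ALICE))               # False
print(reverse_lookup(ALICE)["THINGS"])   # ['FOLIO', 'PIECE', 'WORLD']
```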

~~~
jgrahamc
True. Given the desire not to talk about mathematics in these blog posts I am
opting to excise that rather than fix.

------
lucb1e
I'm not sure if I'd call this brute force attack "outsmarting", but I see the
point. Good way to explain how reverse lookup tables work.

------
petsos
> In the real, mathematical world of one way functions something similar to
> Bob's reverse dictionary can be created (they are called 'rainbow tables')
> and they are part of the reason passwords get broken easily when companies'
> password databases get stolen.

In real life I don't think rainbow tables play any part at all.

~~~
B-Con
They very likely come into play whenever a password hash database is leaked
and the hashes were not salted/iterated properly.

------
sophacles
jgrahamc - I think you should introduce the concept of collision here. In the
last article when you got to "things", my thought was: there have to be a few
words that get to "things". In your reverse lookup table of the dictionary
algorithm, there will be a lot of words for, e.g. "the", "or", "and" and so
on. This is actually nice - because you start getting to other security
concerns from here. For the crossword, you can talk about search spaces:
instantly eliminate all words not of length N. For passwords, of course keys
with lots of collisions reduce the search space, so that passwords with a high
collision probability are less secure. And so on.

I'm not exactly sure what direction you're planning on taking this series, but
astute readers, even without a solid grounding in crypto concepts, will
notice.

~~~
jgrahamc
Yes. I mention this in the first PS. I plan to address this in a separate blog
post but have been trying to keep these bite-sized.

~~~
sophacles
Oh right on. I either missed the PS on my first reading, or read it before you
added it. Either way, cool :)

