
Abnormal String Hashing - r4um
http://pzemtsov.github.io/2019/10/07/abnormal-string-hashing.html
======
saagarjha
If you're curious about how to avoid this in your applications, take a look at
universal hashing:
[https://en.wikipedia.org/wiki/Universal_hashing](https://en.wikipedia.org/wiki/Universal_hashing)
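As a rough illustration (not taken from the article), here is a minimal Java sketch of one almost-universal family: polynomial hashing modulo the prime 2^31 - 1 with a secret, randomly chosen base. The class name `UniversalStringHash`, the choice of prime, and the char-to-(c + 1) mapping are assumptions made for this example.

```java
import java.security.SecureRandom;

// Hypothetical sketch: seeded polynomial string hash modulo p = 2^31 - 1.
// With the base `a` drawn at random, two distinct strings of length <= n
// collide with probability at most n / p (an almost-universal family).
final class UniversalStringHash {
    private static final long P = (1L << 31) - 1; // Mersenne prime 2^31 - 1
    private final long a;                          // secret base in [1, P - 1]

    UniversalStringHash(long a) {
        if (a < 1 || a >= P) {
            throw new IllegalArgumentException("a must be in [1, P - 1]");
        }
        this.a = a;
    }

    // Draws a fresh secret base; without knowing it, an attacker cannot
    // precompute a list of colliding keys offline.
    static UniversalStringHash random() {
        SecureRandom rnd = new SecureRandom();
        return new UniversalStringHash(1 + (long) rnd.nextInt((int) (P - 1)));
    }

    long hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Map each char to c + 1 (range 1..65536) so '\0' is not a no-op
            // and strings of different lengths give different polynomials.
            long c = s.charAt(i) + 1L;
            h = (h * a + c) % P; // h < P and a < P, so h * a < 2^62: no overflow
        }
        return h;
    }
}
```

Because the base is secret and can be chosen per table or per process, precomputed collision lists stop working; the price is a slower hash than `String.hashCode()`.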

~~~
jontro
How does that prevent the result from being 0? It seems like an implementation
detail that is easily fixed, as the author explains.

~~~
saagarjha
Well, there are two issues here. One is that zero is both a valid hash and a
sentinel, which is a Java implementation problem. The other is that an
attacker can pick a bunch of strings that hash to zero and gum up your
application's performance, which is what universal hashing solves.
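A minimal sketch of the sentinel problem, assuming a `String`-like class that caches its hash lazily in an `int` field where 0 means "not computed yet". The class `CachedKey`, its fields, and the extra boolean flag are hypothetical; the flag is just one way to tell "the hash is 0" apart from "not computed".

```java
// Hypothetical sketch of lazy hash caching with 0 as the "not computed" sentinel.
final class CachedKey {
    private final String value;
    private int hash;            // 0 means "not computed yet", or a real hash of 0
    private boolean hashIsZero;  // extra flag that breaks the ambiguity
    int computations;            // counts how often the hash is actually computed

    CachedKey(String value) {
        this.value = value;
    }

    // Naive version: a key whose real hash is 0 is recomputed on every call.
    int naiveHash() {
        int h = hash;
        if (h == 0) {
            h = compute();
            hash = h;
        }
        return h;
    }

    // Flagged version: remembers separately that the computed hash was 0.
    int flaggedHash() {
        int h = hash;
        if (h == 0 && !hashIsZero) {
            h = compute();
            if (h == 0) {
                hashIsZero = true;
            } else {
                hash = h;
            }
        }
        return h;
    }

    private int compute() {
        computations++;
        return value.hashCode(); // e.g. "\0\0\0".hashCode() is 0
    }
}
```

Calling `naiveHash()` ten times on `new CachedKey("\0\0\0")` computes the hash ten times; `flaggedHash()` computes it once.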

------
hinkley
Do all of those lookup tables for powers of 31 work once integer overflow
happens?

~~~
tom_mellior
What lookup tables for powers of 31 do you mean?

The HashMap implementation internally uses lookup tables (arrays) with sizes
that are powers of _2_, which might be what you mean? The incoming hash code
can happily use all 32 bits of the int type, and its calculation is allowed to
overflow and wrap. When indexing into the hash bucket array, HashMap masks off
the high bits that would otherwise cause an index-out-of-bounds exception.
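A sketch of the masking described above, assuming a power-of-two table. OpenJDK 8+ `HashMap` also XORs the high 16 bits of the hash into the low 16 before masking; the names `BucketIndex`, `spread`, and `indexFor` here are illustrative, not the JDK's API.

```java
// Illustrative sketch: turning a full 32-bit (possibly negative) hash code
// into a bucket index for a power-of-two table.
final class BucketIndex {
    // Fold the high 16 bits into the low 16 so they still matter in small tables.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // For tableSize = 2^k, (tableSize - 1) is a mask of the low k bits, so the
    // result is always in [0, tableSize), even for negative (wrapped) hashes.
    static int indexFor(int hash, int tableSize) {
        if (tableSize <= 0 || Integer.bitCount(tableSize) != 1) {
            throw new IllegalArgumentException("table size must be a power of two");
        }
        return spread(hash) & (tableSize - 1);
    }
}
```

Masking instead of using `%` is also why overflowed, negative hash codes never produce a negative index.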

~~~
hinkley
Look at the github repository.

He's practically turned the hash code calculations inside out to do
memoization, and one of the things he's doing is precalculating the hashes and
merging them.

He's making the observation that:

  ((n * 31 + x) * 31 + y) * 31 + z = 31^3 * n + 31^2 * x + 31 * y + z

His code calculates (31^3 * n) + (31^2 * x + 31 * y + z), memoizing the first
and second terms and combining them for each pairing of n, x, y, and z.

That's algebra, and of course it holds for all real numbers... if you have
infinite-precision arithmetic. We do not. We have 32-bit integers, and
overflow behaves oddly in some cases. My question is whether the overflow
behaves _uniformly_ for all combinations of those two terms. I thought the
answer was 'no', so I'm wondering if he's missing some matches.
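This worry can be checked concretely. The sketch below (hypothetical helper names, not the repository's code) memoizes a prefix hash and a suffix hash and combines them as hash(p + s) = hash(p) * 31^len(s) + hash(s), letting every multiplication wrap exactly as `String.hashCode()` does.

```java
// Hypothetical sketch of combining memoized hashes under int wraparound.
final class HashConcat {
    // 31^k with the same silent 32-bit wraparound as String.hashCode().
    static int pow31(int k) {
        int r = 1;
        for (int i = 0; i < k; i++) {
            r *= 31;
        }
        return r;
    }

    // Combine a memoized prefix hash with a memoized suffix hash.
    static int combine(int prefixHash, int suffixHash, int suffixLength) {
        return prefixHash * pow31(suffixLength) + suffixHash;
    }
}
```

With a prefix and suffix each a few dozen characters long, so that the intermediate values wrap many times, `combine(p.hashCode(), s.hashCode(), s.length())` still equals `(p + s).hashCode()`.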

~~~
tom_mellior
Thanks, I missed the point that this was related to the GitHub repository.

Integer addition and multiplication are associative and commutative, and
multiplication distributes over addition, even in the presence of two's
complement wraparound. So this transformation is valid. I don't have a source
to point you to, though. I'd be interested in one, if anyone has one. (Or just
a proof, if it fits here.)
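A short proof sketch. The JLS specifies that an overflowing `int` addition or multiplication yields the low-order 32 bits of the exact result, so Java `int` arithmetic is arithmetic in the ring Z/2^32 Z. Reduction mod 2^32, written π here, preserves sums and products, so any identity that holds exactly over the integers also holds after wrapping:

```latex
\begin{align*}
  \pi(k) &= k \bmod 2^{32}, \qquad
  \pi(a + b) = \pi(a) + \pi(b), \qquad \pi(ab) = \pi(a)\,\pi(b) \\
  ((n \cdot 31 + x) \cdot 31 + y) \cdot 31 + z
    &= 31^3 n + 31^2 x + 31 y + z \qquad \text{(exact, in } \mathbb{Z}\text{)} \\
  \Longrightarrow\quad
  \pi\bigl(((n \cdot 31 + x) \cdot 31 + y) \cdot 31 + z\bigr)
    &= \pi\bigl(31^3 n + 31^2 x + 31 y + z\bigr)
\end{align*}
```

Since π commutes with + and ·, the left side of the last line is what the nested Java expression evaluates to and the right side is what the memoized (31^3 * n) + (31^2 * x + 31 * y + z) evaluates to, so no matches are lost. Signedness does not matter: a negative `int` is just another representative of the same residue class.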

------
rurban
He really didn't understand Java's take on this. Being zero-insensitive is
obviously insecure, and Java knew that. But Java chose to defend against these
kinds of attacks better than most others do. Java still has a trivially
insecure hash function, which it decided to keep because of an API blunder.
But when a bucket gets too many collisions, which indicates an active attack,
HashMap converts it from a linked list to a tree. Those attacks are rare, so
the common case is still fast.

Zero-insensitivity would have been trivially fixable (Perl fixed it in 5.18),
but Java couldn't fix it, so they came up with a proper and much better fix,
unlike Perl and everyone else.

~~~
fanf2
This isn’t about hash collisions; it’s about unnecessarily recalculating the
hash value.

------
hinkley
I'm looking at all of the lookup tables in that code and wondering how much
slower it would be to do a depth-first search and calculate the hash as you go.

And then with a trie thrown in.
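A minimal version of the depth-first idea, with hypothetical names: enumerate strings over a given alphabet and carry the running hash down the recursion, since hash(prefix + c) = 31 * hash(prefix) + c in wrapping `int` arithmetic. It uses no lookup tables and no trie; it is a plain brute-force baseline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: depth-first search for strings with a given hash code,
// extending each prefix's hash with one multiply-add per character.
final class HashSearch {
    static List<String> find(String alphabet, int length, int target) {
        List<String> out = new ArrayList<>();
        dfs(alphabet, new char[length], 0, 0, target, out);
        return out;
    }

    private static void dfs(String alphabet, char[] buf, int depth, int hash,
                            int target, List<String> out) {
        if (depth == buf.length) {
            if (hash == target) {
                out.add(new String(buf)); // copy the current candidate
            }
            return;
        }
        for (int i = 0; i < alphabet.length(); i++) {
            char c = alphabet.charAt(i);
            buf[depth] = c;
            // Same recurrence as String.hashCode(), wrapping on overflow.
            dfs(alphabet, buf, depth + 1, 31 * hash + c, target, out);
        }
    }
}
```

For example, `HashSearch.find("ABab", 2, "Aa".hashCode())` returns `[Aa, BB]`. The search visits alphabet.length()^length leaves, so it only illustrates the idea; the trie is not shown.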

------
sorokod
This is fun, but with the test code running 10 million iterations to generate
"slower" numbers, is this of practical interest?

~~~
jbapple
It's a deterministic attack vector, so significant offline computation prep
work is not unusual.

[https://en.wikipedia.org/wiki/Algorithmic_complexity_attack](https://en.wikipedia.org/wiki/Algorithmic_complexity_attack)

[https://www.freecodecamp.org/news/hash-table-attack-8e4371fc...](https://www.freecodecamp.org/news/hash-table-attack-8e4371fc5261/)

~~~
sorokod
A DAV and a SOV huh? Can you share a practical implication?

------
jepcommenter
A string of null characters of any length also produces a zero hash, e.g.:

  System.out.println("\0\0\0\0\0".hashCode()); // prints 0
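Why this happens, as a sketch: `String.hashCode()` is documented as s[0]*31^(n-1) + ... + s[n-1], which is computed as h = 31 * h + c starting from h = 0. A '\0' character read while h is 0 leaves h at 0, so leading NULs never change the hash; this is the zero-insensitivity mentioned earlier in the thread. `LeadingZeros` is a hypothetical class name.

```java
// Hypothetical sketch: leading '\0' characters never change String.hashCode().
final class LeadingZeros {
    // Recomputes the documented String.hashCode() formula explicitly.
    static int javaStringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // a '\0' while h == 0 keeps h at 0
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println("\0\0\0\0\0".hashCode());                        // 0
        System.out.println("\0\0abc".hashCode() == "abc".hashCode());      // true
        System.out.println(javaStringHash("hello") == "hello".hashCode()); // true
    }
}
```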

