Hacker News new | past | comments | ask | show | jobs | submit login
Reversing an integer hash function (taxicat1.github.io)
87 points by flebron 4 days ago | hide | past | favorite | 16 comments

I'm collecting such inverters at https://github.com/rurban/smhasher/tree/inverse/inverse

But only 3 so far.

For the cryptopals challenge, you have to invert the Mersenne Twister scramble function:



I wrote an inverter for XXHASH64, in python: https://gist.github.com/DavidBuchanan314/13020be9e2a251e2a2a...

Is there a category of hash functions that hash a 64/32 bit input to exactly 64/32 bits output, such that all inputs are uniquely preserved? This could be an interesting property for a hash table of integers, because a hash match implies a key match.

Even if you used such a bijective hash, your hash table would have to have 2^32 available buckets in order for the hash match to be all you need for lookup. Or 2^64 for 64-bit… which is why nobody really does this. (And if you did this, why even hash the input? The input integer could just be the key, and you’re basically using a really big sparse array.)

I agree that for standard hash tables this won’t work. However, I recently read about “Ideal Hash Trees” [0], that preserves the entire hash. That design also requires all bits to be “random” because it lacks the usual prime division.

[0] https://lampwww.epfl.ch/papers/idealhashtrees.pdf

I think the word you are looking for is "permutation".

However, if you can use a permutation as a hash function, you might be fine with just using the identity function.

> However, if you can use a permutation as a hash function, you might be fine with just using the identity function.

This certainly isn't true for 8-bit characters using ASCII.

The top-bit of all ASCII strings is always zero, its effectively a wasted bit. Permuting all ASCII characters to randomly use all 8-bits means a better distribution in virtually any hash-based data-structure. (ex: 64 slot hashtable will have fewer collisions after you permute the 7-bit ASCII into an 8-bit random permutation)

This would be a “bijective function” or a “perfect hash function”, although the latter is usually used when the input is much smaller than the output.

Linear congruential generators in some cases can do that, though more often they are used to convert n bits of input data into a uniform distribution over a range from 0-m. For instance simulating chance in a game (dice rolls, or % probability).

Basically every cipher works like this.

XXHASH64 is bijective for 64-bit input strings.

For 64bit: (triple)DES should do the trick.

A hash is typically used in the context of mapping arbitrary size input to fixed size output.

> otherwise you could simply trace backwards and generate an input that produces a specific hash

This is when I know for sure that I'm not included in that "you".

Excellent tutorial for bitwise arithmetic this is. The key is the motivation you receive from the prospect of being able to do something that seems "l33t".

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact