
Show HN: Tifuhash - Tiny Fast Universal Hash, using 64-bit continued fractions - 19eightyfour
https://www.npmjs.com/package/tifuhash
======
dchest

       > t.hash(0.00000000000000001)
       '0000000000000000'
       > t.hash(0.00000000000000002)
       '00000000ffffffff'
       > t.hash(0.00000000000000003)
       '00000000ffffffff'
       > t.hash(0.00000000000000004)
       '00000000ffffffff'
       > t.hash(0.00000000000000005)
       '00000000fdffffff'
       > 0.00000000000000001
       > t.hash(1e17)
       '5555555597d44646'
       > t.hash(1e18)
       '55555555ac43d2d1'
       > t.hash(1e19)
       '55555555acd2b64f'
       > t.hash('\000')
       '0000000000000000'
       > t.hash('\000\000')
       '5555555592244992'
       > t.hash('\000\000\000')
       '0000000000000000'
       > t.hash('\001')
       '9224499200000000'
       > t.hash('\002')
       '33333333dedddddd'
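
Collisions like the ones above can be hunted mechanically by grouping inputs by digest. Here is a minimal sketch; `findCollisions` and `toyHash` are hypothetical names, and `toyHash` is a deliberately weak stand-in, not tifuhash.

```javascript
// Group inputs by digest and keep every digest shared by two or more inputs.
// `hashFn` is any function that maps an input to a string digest.
function findCollisions(hashFn, inputs) {
  const byDigest = new Map();
  for (const input of inputs) {
    const digest = hashFn(input);
    if (!byDigest.has(digest)) byDigest.set(digest, []);
    byDigest.get(digest).push(input);
  }
  return [...byDigest.entries()].filter(([, group]) => group.length > 1);
}

// Hypothetical stand-in, NOT tifuhash: sums character codes mod 256,
// so any two strings with the same character sum collide.
function toyHash(s) {
  let sum = 0;
  for (const ch of String(s)) sum = (sum + ch.charCodeAt(0)) % 256;
  return sum.toString(16).padStart(2, "0");
}

const collisions = findCollisions(toyHash, ["ab", "ba", "abc", "cab", "z"]);
console.log(collisions); // two groups: "ab"/"ba" and "abc"/"cab"
```

With the real package one would presumably pass something like `(x) => t.hash(x)` as `hashFn`, assuming `t` is the loaded module as in the session above.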

~~~
19eightyfour
_Edit: Update. I pushed a new version that addresses the collisions presented
by dchest. I invite you to take a look. Thanks!_

--

Thanks for the feedback. I really appreciate it. Well done finding all those
collisions. This is serious! Here's an issue:
[https://github.com/dosaygo-coder-0/tifuhash/issues/3](https://github.com/dosaygo-coder-0/tifuhash/issues/3)

Pointing this out is already valuable enough, but if you have any ideas on
how to improve for these cases, please let me know!

Note to self: This community is so good. If something technical gets a bit of
traction, you can count on people to find some holes in it. Very valuable.
Just wanna say words can't express my gratitude at people's efforts at this.
It's very helpful.

~~~
pawadu
Dumb question: why is this a problem?

       t.hash(0.00000000000000002) == t.hash(0.00000000000000003)

How often would this cause issues in the real world?

~~~
elevensies
If you use it for a hash table, every collision has to be "resolved" in a way
that takes time:
[https://en.wikipedia.org/wiki/Hash_table#Collision_resolutio...](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution).

So if a hash function has lots of collisions it will use the memory of the
hash table less efficiently and cause inserts and lookups to be slower. I
think collisions are undesirable for other reasons too, none of which are
coming to me right now.
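
To put a rough number on "slower", here is a minimal sketch of a separate-chaining table that counts key comparisons during lookups. `ChainedTable` and both hash functions are made up for illustration; the second one is the worst case, where every key collides.

```javascript
// Minimal separate-chaining table that counts key comparisons made by get().
class ChainedTable {
  constructor(bucketCount, hashFn) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
    this.hashFn = hashFn;
    this.comparisons = 0;
  }
  indexOf(key) {
    return this.hashFn(key) % this.buckets.length;
  }
  set(key, value) {
    const chain = this.buckets[this.indexOf(key)];
    const entry = chain.find((e) => e.key === key);
    if (entry) entry.value = value;
    else chain.push({ key, value });
  }
  get(key) {
    const chain = this.buckets[this.indexOf(key)];
    for (const e of chain) {
      this.comparisons++; // one comparison per chain entry visited
      if (e.key === key) return e.value;
    }
    return undefined;
  }
}

const spread = new ChainedTable(64, (k) => k); // keys 0..63 land in 64 different buckets
const clumped = new ChainedTable(64, () => 0); // every key lands in bucket 0

for (let k = 0; k < 64; k++) { spread.set(k, k); clumped.set(k, k); }
for (let k = 0; k < 64; k++) { spread.get(k); clumped.get(k); }

console.log(spread.comparisons, clumped.comparisons); // 64 2080
```

Same keys, same table size: 64 comparisons when the keys spread out, 1 + 2 + ... + 64 = 2080 when they all share one chain.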

~~~
pawadu
But doesn't every hash function have some collisions? Why would this
particular example be very bad?

~~~
19eightyfour
I'm of the same mindset. I mean, surely there are always some collisions. And
I actually liked the property of the original version that a zero value hashed
to all zeroes while the rest of the pre-image was shuffled. But I suppose that
being able to compute collisions is not a great thing. You may think that it
does not matter if the hash is not intended to be used in a cryptographic
sense. However, if you can easily compute collisions you could, theoretically
of course, launch denial-of-service attacks on a service (cache denial) by
sending lots of values to the same hash slot. With probing and so on there are
ways around this, and in the end maybe the conventional wisdom ought to be
revised: maybe there isn't so much to worry about from a few collisions, even
easy-to-compute ones. But, as it stands right now, the market demands
collision resistance. So I damn well gave it to them.

------
lorenzhs
In addition to the things others have pointed out, note that "universality" is
a well-defined concept w.r.t. hash functions:
[https://en.wikipedia.org/wiki/Universal_hashing](https://en.wikipedia.org/wiki/Universal_hashing).
You should probably not use it to describe your hash function if you can't
show universality.

~~~
19eightyfour
I think it is probably universal, but I have to show it to know for sure. I'm
going to keep calling it universal until I know more.

I think I note it in the README, but some reasons I think it is probably
universal are because tifuhash can be parameterized, and it has good
independence properties ( passes PractRand ).

Showing it is universal or not is another step. I suppose I could experiment,
to see if the collision probabilities match the criteria for universal or
k-universal. Or maybe I could show it. For showing it, my next step is to read
over this paper[1] to see if I can use its methods to show universality. I
think it is using properties of the bits under multiplication. And I believe
one avenue would be to show it using properties of the bits under division. I
don't know how to approach the bits under division right now.

[1]: [https://arxiv.org/abs/1504.06804](https://arxiv.org/abs/1504.06804)
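
For reference, a family of hash functions into m buckets is universal if, for every pair of distinct keys, at most a 1/m fraction of the functions in the family map both keys to the same bucket. The experiment mentioned above could look roughly like this sketch, which checks that bound exhaustively for the small textbook family h(x) = ((a*x + b) mod p) mod m. That family is only a stand-in, since how tifuhash is parameterized is not spelled out here; `p`, `m`, and the function names are assumptions.

```javascript
// Textbook Carter-Wegman style family over a small prime p, used as a stand-in.
// It is NOT tifuhash; p, m and the function names are illustrative assumptions.
const p = 31; // prime; keys are integers in [0, p)
const m = 4;  // number of buckets

function makeHash(a, b) {
  return (x) => ((a * x + b) % p) % m;
}

// Fraction of family members (a in 1..p-1, b in 0..p-1) under which x and y collide.
function collisionFraction(x, y) {
  let collisions = 0;
  let total = 0;
  for (let a = 1; a < p; a++) {
    for (let b = 0; b < p; b++) {
      const h = makeHash(a, b);
      if (h(x) === h(y)) collisions++;
      total++;
    }
  }
  return collisions / total;
}

// Worst pair over all distinct keys; universality requires this to be <= 1/m.
let worst = 0;
for (let x = 0; x < p; x++) {
  for (let y = x + 1; y < p; y++) {
    worst = Math.max(worst, collisionFraction(x, y));
  }
}
console.log(worst, 1 / m, worst <= 1 / m);
```

For a real hash with a large parameter space the exhaustive loops would have to become random sampling, and at that point the experiment can only produce counterexamples or supporting evidence, not a proof.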

~~~
lorenzhs
I admire your confidence but I don't share it - showing universality for a
complex construct like yours is rather nontrivial. You should really start by
testing with SMHasher, though. Best of luck.

~~~
19eightyfour
Thanks for encouraging me to use SMHasher.

I posted the SMHasher results here:
[https://github.com/dosaygo-coder-0/smhasher/blob/master/tifu...](https://github.com/dosaygo-coder-0/smhasher/blob/master/tifuhash.results.txt)

And my SMHasher fork testing tifuhash is here:
[https://github.com/dosaygo-coder-0/smhasher](https://github.com/dosaygo-coder-0/smhasher)

------
rjeli
Sorry, I can only read this as "Today I Fucked Up hash"

~~~
AstralStorm
It probably is. So what is wrong with, say, MurmurHash that you have to hack
your own?

~~~
jimktrains2
Yeah, it's one thing to create it for fun, but why publish it as a library for
general consumption?

~~~
rjeli
I strongly disagree with this idea - code should always be released! If it's
good, we can avoid rewriting code and have more options for hashing, and if
some people point out problems with the code, the author can learn from their
mistakes and fix them if possible - they wouldn't want to use it in their
private code with the mistakes.

I think this thread is more negative than the author deserves. The method of
hashing is novel and could possibly lead to exploration in the area, and if
not that, where else could the author post to get feedback from the author of
murmurhash??

I think the only problem here is that they advertise it as production
ready/universal without evidence or rigorous testing, rather than "hey, here's
a cool hash I came up with."

~~~
jimktrains2
> I strongly disagree with this idea - code should always be released!

There is a difference between releasing code, say on github, and putting it
out there like it's ready for use in a production system, like in a package
manager. I only said the latter shouldn't be done, not the former.

Rereading, I guess that wasn't exactly what I said. I can't seem to edit now,
though.

------
richdougherty
> Novel: using division, or using floating point division, and discarding the
> high-order-bits ( as we do here ) is not used nor studied so much, if at
> all, in hash construction

Is that because division is slow compared to bit operations?

~~~
19eightyfour
I think it's something like that, that division is slower than multiplication
in the chip.

Also, a lot of hashing has to do with prime fields, and polynomials in them,
and people seem to be able to do most of the maths without division. Or when
they do divide, they are using powers of two, so they can just use bit shifts,
which are faster.
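
A small sketch of that power-of-two trick, for unsigned 32-bit values (the helper names `divPow2` and `modPow2` are made up for illustration):

```javascript
// For 0 <= x < 2^32 and 0 <= k <= 31, dividing by 2^k is a right shift
// and the remainder mod 2^k is a bit mask. JavaScript shift counts wrap
// at 32, so k = 32 is deliberately out of range here.
function divPow2(x, k) {
  return x >>> k; // same as Math.floor(x / 2 ** k)
}

function modPow2(x, k) {
  return (x & (2 ** k - 1)) >>> 0; // same as x % 2 ** k
}

const x = 0xdeadbeef;
console.log(divPow2(x, 5) === Math.floor(x / 32)); // true
console.log(modPow2(x, 5) === x % 32);             // true
```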

So there's a lot of "discrete" type math in hashing... I don't see why we
couldn't have more continuous or fractional math.

After all, Huffman coding is good, but arithmetic coding, which I believe uses
fractional math, is better. Probably more hashing can be done the way that I'm
doing here. So happy to find this new thing.

I'm thinking FPGA... They do floating point right? GPU? Same right? I think
continued Egyptian fraction hashing could be on the up.

------
johnklos
"Uses two 64 bit floating point integers for calculation"? Floating point
integers?

~~~
bhaak
My snarky self woke up and said "That's an appropriate name for the integer
simulation of JavaScript".
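
For anyone who hasn't run into it: every JavaScript Number is a 64-bit IEEE-754 double, so "integers" are just doubles that happen to be whole, and they stop being exact past 2^53. A quick sketch (the variable name `limit` is only for illustration):

```javascript
// Every JavaScript Number is an IEEE-754 double; integers are exact only up to 2^53.
const limit = Number.MAX_SAFE_INTEGER;      // 2^53 - 1
console.log(limit === 2 ** 53 - 1);         // true
console.log(Number.isInteger(2 ** 53));     // true: still integer-valued
console.log(Number.isSafeInteger(2 ** 53)); // false: neighbours are no longer distinct
console.log(2 ** 53 + 1 === 2 ** 53);       // true: the +1 is rounded away
console.log(typeof 1 === typeof 1.5);       // true: both are "number"
```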

------
davman
This is connected with a truly amazing website that does not make my eyes
bleed in any way.

[https://dosaygo.com/](https://dosaygo.com/)

~~~
Asooka
I'll say this though - the site is absolutely 1000% more usable than the
single-page-app-that's-actually-a-document style du jour. Just tone down the
colours and remove the horizontal scrollbar (put the two things one below the
other). Other than that, this is definitely a motherfucking website :) (
[http://motherfuckingwebsite.com/](http://motherfuckingwebsite.com/) )

~~~
19eightyfour
Thank you for saying this. I am fond of my robust as nails site. I like that
it can work on lynx. I shall take your improvement ideas under advisement.

