Hacker News new | past | comments | ask | show | jobs | submit login

> 9-16 was a better spot and it simplified from 2 multiplications to just one and a rotation

I'm very confused as to why rotation was at all useful. Xoring with a random-ish constant makes sense, because the constant has high entropy and is likely to decorrelate bits from the input (also can use a different constant per hash table). But rotating by a constant—and a fixed one at that—seems like it just accounts for expected input distribution. Especially (assuming this is intended for text) if shifting by a value >8 makes a significant difference (vs shifting by the same value mod 8), it smells like serious overfit. Could be useful for something like a perfect hash, but seems problematic and prone to issues as a general hash.

Edit: to make my objection clearer: the hash simply replaces lo with rotr(lo, 53). If rotr(lo, 53) performs significantly better than rotr(lo, 53 mod 8), then that implies the following. I can take a set of strings, and I can apply the same permutation to the characters of all of the strings in the set, and this will significantly affect the quality of the hash function. That seems like an undesirable property, even for a non-cryptographic hash.




Does that end up moving high bits into the low bits? That could possibly be very helpful for all sets of strings, since the equality test will start with the first byte, so it could better to have the rotr on the hash so that the hash is less affected by the first byte (and more affected by later bytes). Just hypothetically speaking that is where the implication could break down, since it doesn’t consider the non-uniform cost of the equality.


> the equality test will start with the first byte

I would expect the equality test to compare at least a full word at a time, just as the hash hashes at least a full word at a time.


I didn’t look at their implementation, but in general, strings don’t have to be aligned so you can only peek one byte at a time looking for the end, besides not wanting to annoy valgrind and other similar UB detection tools by reading past the end of the string.


Strawman; nul-terminated strings are horribly slow for nearly every application. Hence I assume (especially given that they are using c++) that they are using length-accompanied strings rather than nul-terminated ones.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: