
New fastest portable hash: wyhash - rurban
https://github.com/rurban/smhasher/
======
panic
Here's a link to wyhash itself: [https://github.com/wangyi-fudan/wyhash](https://github.com/wangyi-fudan/wyhash)

~~~
WallWextra
Maybe the link could be updated to point to this instead?

~~~
throwawaymath
Yeah, the submitter didn't link to wyhash; they linked to their own
project. Without the title there'd be no context here. SMHasher is well known
(at least the original Google project is, maybe not this fork), but why not
just submit wyhash directly? There's no explanation of how wyhash works here,
and I can't find a paper.

------
haberman
I would love for this to be solid, but something here seems too good to be
true. Modern hashing is well-studied, with lots of smart people attacking the
problem. Modern hashes like Google's FarmHash are thousands of lines of code
with platform-specific intrinsics for acceleration. This hash claims to beat
other modern hashes in speed, without using platform-specific intrinsics, and
without quality problems, and in a mere 100 lines of code to boot.

I would love for this to be true, but I'd love to see a more thorough
explanation of how it manages to be both smaller and faster than other
competing hashes.

~~~
stcredzero
_I would love for this to be solid, but something here seems too good to be
true._

How so? It's much slower than AES-based hashes. It has about the same
throughput as xxHash64, and it uses fewer cycles/hash than xxHash64.

~~~
haberman
> It's much slower than AES based hashes.

I must have misinterpreted the following:

> So the fastest hash functions on x86_64 without quality problems are:
>
> \- wyhash
> \- t1ha
> \- [...]

I interpreted that to be an ordered list, saying that wyhash was the fastest.

But looking at the actual table I am confused. There are MB/s numbers and
cycles/hash numbers. In cycles/hash wyhash appears to beat all FarmHash
variants, but in MiB/s it is slower than some of them. I don't understand why
these two performance measures would not track perfectly, since the CPUs
should be running a constant number of cycles/second.

~~~
stcredzero
_I don't understand why these two performance measures would not track
perfectly, since the CPUs should be running a constant number of
cycles/second._

x86 isn't RISC, and not all operations take the same number of cycles. Also,
the code might have different levels of instruction-level parallelism and
might impact the pipeline differently.

Comp Sci education for the low level basics is often completely neglected
nowadays. This should be freshman year stuff. I'm certainly not an expert,
myself, just familiar with the issues. (Know enough to know what you don't
know.)

~~~
haberman
> x86 isn't RISC and not all operations take the same number of cycles.

Yes but it doesn't say _instructions_ per hash, it says _cycles_ per hash.
Unless some kind of frequency scaling is going on, the number of cycles per
second should be very consistent. If bytes/hash is held constant and
cycles/second is constant, then MiB/second and cycles/hash should be exact
inverses. I don't understand why this is not the case in these tables.

> Also, the code might have different levels of possible parallelism and might
> impact the pipeline differently.

Again this can affect the number of instructions being retired, but not the
number of cycles. A cycle is a cycle, regardless of how much work is actually
being accomplished.

> Comp Sci education for the low level basics is often completely neglected
> nowadays. This should be freshman year stuff.

I'm a low-level junkie who lives in godbolt.org and Agner Fog's tables, writes
JIT compilers, and does FPGA design for fun on the side. It's possible that
I'm mistaken here, but I do have a fair amount of background in this.

~~~
ncmncm
I wonder if the first number is sustained, pipelined rate for big values, and
the second number is cycles for the first iteration, still assuming everything
is in L1 cache.

Those would be the useful numbers.

------
WallWextra
The small key performance is very nice. It would be good to have more written
about this new hash function, and less misinformed ranting about SipHash.

~~~
zingmars
For someone who isn't following the hashing function 'scene', what's
misinformed about this rant?

~~~
WallWextra
Most of his arguments simply do not apply to the relevant use case and
attacks. The only one that does, the claim that the seed is recoverable, is
not an argument at all but a vague, unsubstantiated claim.

------
shin_lao
From what I see, FarmHash is still faster on IA32/AMD64 architectures
(although it needs to leverage the AES instructions for that, granted).

~~~
jandrewrogers
Algorithms using AES intrinsics are generally fastest at all key sizes now.
For small keys, the AquaHash[0] algorithm is extremely fast, and several
algorithms (e.g. MeowHash[1] and the aforementioned AquaHash) are essentially
at the hardware limits for large keys. Algorithms like wyhash are useful for
things like embedded CPUs without AES support, but if you are targeting
x86/ARM then AES is often a better choice.

[0]
[https://github.com/jandrewrogers/AquaHash](https://github.com/jandrewrogers/AquaHash)
[1]
[https://github.com/cmuratori/meow_hash](https://github.com/cmuratori/meow_hash)

------
orasis
I’d really like to see progress on hashes that fit into a few lines of code.
It seems FNV-1a is still the best option in that regard.
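
For reference, FNV-1a really is only a few lines. A sketch of the 64-bit
variant (standard offset basis and prime; the function name is mine):

```c
#include <stdint.h>
#include <stddef.h>

/* 64-bit FNV-1a: XOR each byte into the state, then multiply by the
 * FNV prime.  That's the whole algorithm. */
static uint64_t fnv1a64(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}
```

The tradeoff is that it processes one byte per iteration with a serial
dependency through the multiply, which is why the faster modern hashes read
8+ bytes at a time.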

~~~
bradleyjg
If this is all there is to it: [https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h](https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h)

That doesn’t seem like much.

~~~
haberman
Agreed, if that's the whole hash function I think small code size is a selling
point of this hash compared with other modern fast hashes.

This hash function looks like it will do unaligned reads if you're hashing an
unaligned string. This doesn't matter on x86, but portable code should avoid
it. It would be helpful to have a wrapper function that could handle unaligned
strings by special-casing the beginning and end of the buffer.

------
majke
> SipHash is not secure enough for security purposes and not fast enough for
> general usage.

I beg your pardon? The author seems very confused.

~~~
klodolph
Oof.

To expand on this, SipHash is designed to mitigate a certain kind of DOS
attack. It’s not a cryptographic hash. SipHash will still be a top choice for
general purpose hashing with inputs that can be chosen by an adversary.

------
switch33
[https://github.com/switch33/sha2592](https://github.com/switch33/sha2592)
Something I just coded, wondering how it measures up.

And I made another one too:
[https://github.com/switch33/sha29893](https://github.com/switch33/sha29893)

and a third one:
[https://github.com/switch33/sha5987](https://github.com/switch33/sha5987)

and an even better one:
[https://github.com/switch33/sha2999999](https://github.com/switch33/sha2999999)

and maybe the best for quite some time:
[https://github.com/switch33/sha130000000-](https://github.com/switch33/sha130000000-)

------
GoblinSlayer
> The fast hash functions tested here are recommendable as fast for file
> digests and maybe bigger databases, but not for 32bit hash tables.

Do people really try to optimize file hashing this way?

------
winnin_the_game
To me, the most interesting part:

>> Even if [...] worse hash functions will lead to more collisions, the
>> overall speed advantage beats the slightly worse quality.

~~~
mises
I'm not sure if that part is correct. If I use something like SHA1 or MD5
(yes, I know they're no longer considered secure), I _know_ with essentially
100% certainty that I will not get a collision. This means I don't have to
implement chaining, quadratic probing, or another method of collision
resolution. Though it might cost some speed on the end use of the app, it's
often easier for initial development because I simply have to write less code.

Maybe a more experienced dev can comment on this? Still learning here, and
haven't spent time in a lot of large codebases.

~~~
winnin_the_game
The strong collision avoidance guarantee is mostly a result of the large space
of hashes those functions generate. So, yes, if you have a hashmap with 2^128
slots, you're technically right. But you don't have a hashmap with 2^128
slots. So you have to smash that output space into a smaller space somehow.
And then you're back to considering what to do when you have collisions.

------
ape4
Not an expert, but I would have thought that collision avoidance would be the
most important criterion.

~~~
rurban
Not in hash tables. There the smaller, the faster. The selection criteria
are: not bad (not failing any test), small, and fast. Especially, it needs to
be inlinable.

There will always be collisions on dynamic workloads. Otherwise you would
choose perfect hashes. In the usual programming language case SPOOKY32 has the
least collisions, and is pretty fast too. But it has no chance against the
small hash functions.

The smhasher speed test doesn't tell you which hash will be the fastest in a
hash table with small key lengths, only when used as a digest, e.g. for
bigger files, db or network blobs. The icache footprint in comparison to all
the hash table code is very important.

~~~
asdkhadsj
> Not in hash tables. There the smaller, the faster. The selection criteria
> are: not bad (not failing any test), small, and fast. Especially, it needs
> to be inlinable.

Can you explain why? Like what use cases are there where you don't actually
care how likely something is to collide, you just want it to be fast?

I've always thought being able to predict the chance of collision was the
most important factor for a hasher. When is it not?

~~~
dagss
First off, for small tables, you always take the hash value modulo some small
number like 10000. Who cares how good the hash function is, if you end up with
1/10000 chance of collision in the end anyway? Speed is obviously more
important in this setting.

Second, assume very large hash tables. Assume that without a collision, using
the data structure takes time T1(H), that with a collision it takes T2(H) for
hash function H, and that the probability of collision is P(H).

So your total cost is then

P(H) T2(H) + (1 - P(H)) T1(H)

which, for small P(H), is approximately

P(H) T2(H) + T1(H)

It's easy to play with numbers so that a worse but cheaper hash has a lower
total cost. In fact I would expect this to usually be the case: T2 is
multiplied by P and becomes negligible for all but the very worst hashes or
an extremely expensive T2.

------
NicoJuicy
For security purposes, I thought -> slower performance = better.

This would delay rainbow tables.

~~~
shin_lao
These kinds of hashes are used for hash tables, where you want a good
compromise between speed and low collision probability. They can also be used
as checksums to detect accidental data modification.

They are not meant for cryptographically strong fingerprinting.

