I wish there was a consensus non-cryptographic hash algorithm. Something that hashes arbitrary data to 128 or 256 bit keys with good randomness, few collisions on typical input data, and universal implementation.
Most programmers I know reach for SHA-1 (or if we're being honest, MD5). But a cryptohash is really wasteful if you don't need the cryptographic properties.
Last I looked at this, Murmurhash was the closest to widely used, with some folks saying Metrohash was better. But that was years ago. Judging by this discussion, xxHash is the new hotness?
You want XXH128 for that, right? 128 bit, portable, virtually impossible to collide, and only slightly slower than XXH3 while still way faster than most options.
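To put a rough number on "virtually impossible to collide", here's a back-of-the-envelope birthday-bound sketch in Python. It assumes the hash behaves like an ideal uniform random function, which is an assumption (no non-cryptographic hash guarantees that, and adversarial inputs are a separate problem). The function name is made up for illustration.

```python
import math

def collision_probability(n_items: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_items
    outputs of an idealized, uniformly random `bits`-bit hash."""
    pairs = n_items * (n_items - 1) / 2
    # Birthday approximation: p ~= 1 - exp(-pairs / 2**bits).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / 2.0 ** bits)

# A billion keys in a 128-bit space vs. a hundred thousand in a 32-bit space.
print(collision_probability(10**9, 128))
print(collision_probability(10**5, 32))
```

Under that idealized model, a 128-bit output leaves accidental collisions negligible at any realistic data size, while a 32-bit output starts colliding at fairly modest key counts.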
> I wish there was a consensus non-cryptographic hash algorithm
I think Siphash serves that role pretty well. It's one of the most if not the most secure among non-cryptographic hashes, and quite speedy, especially with SIMD implementations. One need only consider alternatives if one wants to trade off a bit of security for even higher speed.
Siphash is more like a crypto hash, right? According to smhasher its performance is somewhere between a non-crypto hash like xxHash or Meow and something like SHA-1.
Although designed for use as a hash function to ensure security, SipHash is fundamentally different from cryptographic hash functions like SHA in that it is only suitable as a message authentication code: a keyed hash function like HMAC. That is, SHA is designed so that it is difficult for an attacker to find two messages X and Y such that SHA(X) = SHA(Y), even though anyone may compute SHA(X). SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.
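A minimal Python sketch of the keyed vs. unkeyed distinction described above. SipHash isn't exposed directly in the Python standard library, so this uses keyed BLAKE2b from `hashlib` as a stand-in for "a keyed hash"; that substitution is mine, as are the function names. It shows the interface difference only, not SipHash itself.

```python
import hashlib
import hmac
import secrets

def unkeyed_digest(message: bytes) -> bytes:
    # Anyone can compute this; security rests on collisions being hard to find.
    return hashlib.sha256(message).digest()

def keyed_tag(message: bytes, key: bytes) -> bytes:
    # Stand-in for SipHash(message, key): only holders of `key` can compute it.
    # digest_size=8 mirrors SipHash's 64-bit output.
    return hashlib.blake2b(message, key=key, digest_size=8).digest()

key = secrets.token_bytes(16)          # 128-bit secret key, like SipHash's k
tag = keyed_tag(b"hello", key)

# The same key reproduces the tag; compare in constant time.
assert hmac.compare_digest(tag, keyed_tag(b"hello", key))

# Without the key, an attacker who has seen (message, tag) pairs still
# can't predict the tag of a new message, which is the MAC property.
print(unkeyed_digest(b"hello").hex())
print(tag.hex())
```

This is why a keyed hash with a secret per-process key works for hash tables fed untrusted input: the attacker can't precompute colliding keys.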
Consider that the people working for you are experts, but the rest of the world on average is not. If one cannot make a judgement whether <https://eprint.iacr.org/2004/207> weakens SHA-2 for a particular use case or not, then it is safer to assume the worst, and simply use SHA-3 instead.
This is tangential, but your facts about tptacek might be a bit out of date. He's no longer at Latacora; he's now working at fly.io. So I think it's safe to say he has recent experience working with people who, while they're skilled developers, aren't security experts.
The production of seed-independent collisions for various versions of murmurhash (e.g., http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash...) motivated siphash. In general, when there's no positive proof of collision bounds, I would assume (un)lucky inputs can collide much more often than usual, even when you change the seed.
The difference between murmurhash and xxhash in that regard is that anyone can download code to generate murmurhash collisions, while code to make xxhash cry is still unknown/secret.
XXH3 is also a lot faster for long strings, and I believe comparable to murmurhash for short inputs.