I wish there was a consensus non-cryptographic hash algorithm. Something that ha...

chronogram · on Oct 29, 2021

You want to have XXH128 for that right? 128 bit, portable, virtually impossible to collide, and only slightly slower than XXH3 while still way faster than most options.

pepoluan · on Oct 29, 2021

xxHash is not really "new", it's been available since 2014. At least, that's the oldest Python bindings available on PyPI:

https://pypi.org/project/xxhash/0.0.1/

Sure, as it developed, xxHash variants appear. The latest in the family is XXH3 and it's already being used at least since 2019:

http://fastcompression.blogspot.com/2019/03/presenting-xxh3....

tromp · on Oct 29, 2021

> I wish there was a consensus non-cryptographic hash algorithm

I think Siphash serves that role pretty well. It's one of the most if not the most secure among non-cryptographic hashes, and quite speedy, especially with SIMD implementations. One need only consider alternatives if one wants to trade off a bit of security for even higher speed.

NelsonMinar · on Oct 29, 2021

Siphash is more like a crypto has right? According to smhasher its performance is somewhere between a non-crypto hash like xxHash or Meow and something like SHA-1.

tromp · on Oct 29, 2021

As its Wikipedia page says:

Although designed for use as a hash function to ensure security, SipHash is fundamentally different from cryptographic hash functions like SHA in that it is only suitable as a message authentication code: a keyed hash function like HMAC. That is, SHA is designed so that it is difficult for an attacker to find two messages X and Y such that SHA(X) = SHA(Y), even though anyone may compute SHA(X). SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.

bmn__ · on Oct 29, 2021

> good randomness, few collisions on typical input data, and universal implementation

The relevant standards body NIST has issued SHA-3 in 2015.

Spread the word that programmers should not reach for outdated hash algos. <https://valerieaurora.org/hash.html>

tptacek · on Oct 29, 2021

The chart on that page has always squicked me out a bit, because SHA-2 is certainly still "considered strong" by cryptography engineers.

bmn__ · on Oct 30, 2021

Consider that the people working for you are experts, but the rest of the world on average is not. If one cannot make a judgement whether <https://eprint.iacr.org/2004/207> weakens SHA-2 for a particular use case or not, then it is safer to assume the worse, and simply use SHA-3 instead.

mwcampbell · on Oct 31, 2021

> the people working for you are experts

This is tangential, but your facts about tptacek might be a bit out of date. He's no longer at Latacora; he's now working at fly.io. So I think it's safe to say he has recent experience working with people who, while they're skilled developers, aren't security experts.

wolf550e · on Oct 30, 2021

Marc Stevens who broke sha1 has said he doesn't think sha2 will get broken any time soon. See https://twitter.com/veorq/status/1128273048367398912

tptacek · on Oct 30, 2021

See, case in point. She should change the chart so SHA-2 has an all-green line. It's clearly misleading people.

throwaway984393 · on Oct 30, 2021

Which do you want? Something non-wasteful, or something good with consensus? Something that everybody agrees on isn't necessarily fast or new or hot.

pspeter3 · on Oct 29, 2021

I’m curious too about the differences between Murmurhash and xxHash. I’ve been using Murmur in my side projects.

pkhuong · on Oct 29, 2021

The production of seed-independent collisions for various versions of murmurhash (e.g., http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash...) motivated siphash. In general, when there's no positive proof of collision bounds, I would assume (un)lucky inputs can collide much more often than usual, even when you change the seed.

The difference between murmurhash and xxhash in that regard is that anyone can download code to generate murmurhash collisions, while code to make xxhash cry is still unknown/secret.

XXH3 is also a lot faster for long strings, and I believe comparable to murmurhash for short inputs.