Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I wish there was a consensus non-cryptographic hash algorithm. Something that hashes arbitrary data to 128 or 256 bit keys with good randomness, few collisions on typical input data, and universal implementation.

Most programmers I know reach for SHA-1 (or if we're being honest, MD5). But a cryptohash is really wasteful if you don't need the cryptographic properties.

Last I looked at this Murmurhash was the closest to widely used with some folks saying Metrohash was better. But that was years ago. Judging by this discussion xxHash is the new hotness?



You want to have XXH128 for that right? 128 bit, portable, virtually impossible to collide, and only slightly slower than XXH3 while still way faster than most options.


xxHash is not really "new", it's been available since 2014. At least, that's the oldest Python bindings available on PyPI:

https://pypi.org/project/xxhash/0.0.1/

Sure, as it developed, xxHash variants appear. The latest in the family is XXH3 and it's already being used at least since 2019:

http://fastcompression.blogspot.com/2019/03/presenting-xxh3....


> I wish there was a consensus non-cryptographic hash algorithm

I think Siphash serves that role pretty well. It's one of the most if not the most secure among non-cryptographic hashes, and quite speedy, especially with SIMD implementations. One need only consider alternatives if one wants to trade off a bit of security for even higher speed.


Siphash is more like a crypto has right? According to smhasher its performance is somewhere between a non-crypto hash like xxHash or Meow and something like SHA-1.


As its Wikipedia page says:

Although designed for use as a hash function to ensure security, SipHash is fundamentally different from cryptographic hash functions like SHA in that it is only suitable as a message authentication code: a keyed hash function like HMAC. That is, SHA is designed so that it is difficult for an attacker to find two messages X and Y such that SHA(X) = SHA(Y), even though anyone may compute SHA(X). SipHash instead guarantees that, having seen Xi and SipHash(Xi, k), an attacker who does not know the key k cannot find (any information about) k or SipHash(Y, k) for any message Y ∉ {Xi} which they have not seen before.


> good randomness, few collisions on typical input data, and universal implementation

The relevant standards body NIST has issued SHA-3 in 2015.

Spread the word that programmers should not reach for outdated hash algos. <https://valerieaurora.org/hash.html>


The chart on that page has always squicked me out a bit, because SHA-2 is certainly still "considered strong" by cryptography engineers.


Consider that the people working for you are experts, but the rest of the world on average is not. If one cannot make a judgement whether <https://eprint.iacr.org/2004/207> weakens SHA-2 for a particular use case or not, then it is safer to assume the worse, and simply use SHA-3 instead.


> the people working for you are experts

This is tangential, but your facts about tptacek might be a bit out of date. He's no longer at Latacora; he's now working at fly.io. So I think it's safe to say he has recent experience working with people who, while they're skilled developers, aren't security experts.


Marc Stevens who broke sha1 has said he doesn't think sha2 will get broken any time soon. See https://twitter.com/veorq/status/1128273048367398912


See, case in point. She should change the chart so SHA-2 has an all-green line. It's clearly misleading people.


Which do you want? Something non-wasteful, or something good with consensus? Something that everybody agrees on isn't necessarily fast or new or hot.


I’m curious too about the differences between Murmurhash and xxHash. I’ve been using Murmur in my side projects.


The production of seed-independent collisions for various versions of murmurhash (e.g., http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash...) motivated siphash. In general, when there's no positive proof of collision bounds, I would assume (un)lucky inputs can collide much more often than usual, even when you change the seed.

The difference between murmurhash and xxhash in that regard is that anyone can download code to generate murmurhash collisions, while code to make xxhash cry is still unknown/secret.

XXH3 is also a lot faster for long strings, and I believe comparable to murmurhash for short inputs.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: