
SeaHash: Explained - bretthoerner
https://ticki.github.io/blog/seahash-explained/
======
gbrown_
Discussion that came up the other week for those that missed it.

[https://news.ycombinator.com/item?id=13057797](https://news.ycombinator.com/item?id=13057797)

------
thomasahle
> `f : {0,1}^n → {0,1}^n` is a perfect PRF if and only if given a distribution
> `d : {0,1}^n → [0,1]`, `f` maps inputs following the distribution `d` to the
> uniform distribution.

Is this even possible? If the distribution `d` always returns 0, how can a
function make it uniform?

It would be nice if the article went into more detail on how SeaHash
obtains this property, and how it relates to collision avoidance.

~~~
petters
It's not clear what the author means here. This is not a very mathematical
article.

(Nit: d is a distribution so it can not always return 0 -- it has to sum to 1.
But your point stands.)

~~~
pdpi
I believe the point is that you get a uniform distribution of 1s and 0s after
going through the PRF, irrespective of the distribution of 1s and 0s in the
input. This is certainly a desirable property for a hash function.
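
A rough way to see this property empirically is to hash heavily skewed inputs and count output bits. The sketch below uses BLAKE2b from Python's standard library as a stand-in (SeaHash itself is not in the standard library, so this is an assumption for illustration only); `bit_one_fraction` and `skewed_inputs` are made-up names.

```python
import hashlib

def bit_one_fraction(values, digest_bytes=8):
    """Fraction of 1 bits across the digests of all values."""
    ones = 0
    total = 0
    for v in values:
        # Stand-in hash: BLAKE2b truncated to 64 bits, not SeaHash.
        digest = hashlib.blake2b(v, digest_size=digest_bytes).digest()
        ones += sum(bin(b).count("1") for b in digest)
        total += 8 * len(digest)
    return ones / total

# Skewed inputs: small integers stored in 8 bytes, so most input bits are 0.
skewed_inputs = [i.to_bytes(8, "little") for i in range(10_000)]

input_ones = sum(bin(b).count("1") for v in skewed_inputs for b in v)
input_fraction = input_ones / (8 * 8 * len(skewed_inputs))
output_fraction = bit_one_fraction(skewed_inputs)

print(f"fraction of 1 bits in inputs:  {input_fraction:.3f}")
print(f"fraction of 1 bits in digests: {output_fraction:.3f}")
```

The inputs are far from balanced, while the digests should come out close to half ones. This only checks one simple statistic; it says nothing about collisions or keyed security.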

------
johnnycarcin
ticki and the rest of the crew working on Redox (jackpot51) are really doing
some neat stuff. a lot of it is over my head but it is cool to see something
being built from the ground up using some new techniques and ideas.

------
libeclipse
Could someone explain what the point is? Is there a use-case for this for
"general hashing" and such, where sha256 is _genuinely_ insufficient?

~~~
llimllib
A good example is for a bloom filter: using sha256 is much too slow for good
filter performance, you want something like siphash or some other non-
cryptographic hash.

Here's a good story of the performance benefits of switching from
cryptographic to non-crypto hashes:
[https://github.com/bitly/dablooms/pull/19](https://github.com/bitly/dablooms/pull/19)

(But I don't recommend you use murmur anymore:
[https://emboss.github.io/blog/2012/12/14/breaking-murmur-has...](https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/)
(although tbh I could be wrong on this one, not an expert))

(Shameless plug for my bloom filter tutorial
[https://llimllib.github.io/bloomfilter-tutorial/](https://llimllib.github.io/bloomfilter-tutorial/) )
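
To make the bloom filter case concrete, here is a minimal sketch that derives its bit positions from a cheap non-cryptographic hash. It uses a pure-Python 64-bit FNV-1a purely because it is short; the `seed` parameter (XORed into the offset basis) and the double-hashing scheme are assumptions for illustration, not anything taken from dablooms or SeaHash.

```python
FNV_OFFSET = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3        # 64-bit FNV prime
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes, seed: int = 0) -> int:
    """64-bit FNV-1a; the seed is a non-standard tweak used for illustration."""
    h = (FNV_OFFSET ^ seed) & MASK64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h

class BloomFilter:
    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        # num_bits is assumed to be a multiple of 8.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Double hashing: two base hashes generate all k probe positions.
        h1 = fnv1a_64(item)
        h2 = fnv1a_64(item, seed=0x9E3779B97F4A7C15) | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"seahash")
print(b"seahash" in bf)  # an added item is always reported as present
```

Every `add` and lookup costs `num_hashes` probes, so the hash sits on the hot path; that is why a slow cryptographic hash hurts here. A filter like this can report false positives but never false negatives.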

~~~
loeg
> But I don't recommend you use murmur anymore

I think xxHash was/is the fastest good non-crypto hash, and now SeaHash may be
best. Although I'd like to see a bit more data on that (small keys? large
keys? benchmarking methodology) than SeaHash's author is providing.

~~~
llimllib
yeah I should look into that more. aapleby seems to have given some pretty
good arguments against SeaHash in the previous discussion:
[https://news.ycombinator.com/item?id=13058652](https://news.ycombinator.com/item?id=13058652)

~~~
Null-Set
That argument depends on the initial values being the same; you can easily
make sure they are not.

------
wyldfire
> SeaHash has mathematically provable statistical guarantees

I'm all for proofs but would love to see an empirical head-to-head against
metroHash. Is it as good or better output distribution?

~~~
ComputerGuru
Should be easy to plug it into smhasher.

~~~
eutectic
According to the author, it passes.

------
0x6c6f6c
> there is a major difference between cryptographic and non-cryptographic hash
> functions. SeaHash is not cryptographic

Is this in reference to most used hash functions not leaving any way to trace
the contents of what was hashed, or not being able to reliably reconstruct
contents to generate a certain hash? Or something else entirely?

~~~
rudolf0
The latter. Even a very simple hash algorithm will almost always make it
impossible to get the original contents back. You'd have to write an
intentionally pathological algorithm to achieve that.

~~~
xorxornop
Well, yeah, unless the input is smaller than the digest, hashing is
effectively a very lossy compression!
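
The pigeonhole argument behind this can be shown with a deliberately tiny digest. A sketch, assuming an 8-bit hash built by truncating BLAKE2b (the names `tiny_hash` and `first_collision` are made up for this example): with only 256 possible outputs, any 257 distinct inputs must contain a collision, so the input cannot be recovered from the digest alone.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """8-bit digest: only 256 possible outputs."""
    return hashlib.blake2b(data, digest_size=1).digest()[0]

def first_collision(num_inputs: int):
    """Return the first pair of distinct inputs sharing a digest, or None."""
    seen = {}
    for i in range(num_inputs):
        msg = i.to_bytes(4, "big")
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    return None

# 257 distinct inputs into 256 buckets: a collision is guaranteed.
pair = first_collision(257)
print(pair)
```

In practice the first collision shows up long before input 257, which is the birthday effect at work.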

------
notheguyouthink
I've been using blake2(b) for file hashing in some content addressable db
stuff. What might be a scenario where I would choose SeaHash over Blake2?

My main concern was speed and assurance that I would not see collisions.
Beyond that, I am clearly naive on the subject.

~~~
dom0
The main question, aside from performance, is...

- Is there an (abstract) attack model? (Assuming that you have one if you
ought to)

- Then: Can an attacker insert collisions into the DB, and is that
problematic?

- Then: A non-cryptographic hash might be much easier to "reverse",
especially for short inputs. Is that problematic?

If none of these are problematic you probably don't need a cryptographic hash.

Regarding performance: BLAKE2b on a Haswell gives you, in a "naive", pure C
implementation (compiled to pure, non-vectorized AMD64 assembly), about 230
MB/s / GHz. (Referring to
[https://github.com/borgbackup/borg/issues/45#issuecomment-22...](https://github.com/borgbackup/borg/issues/45#issuecomment-221234832)
), ie. something like 850-1000 MB/s on a desktop SKU. There are
implementations that are around 10-30 % faster than that.

AFAIK all these newer n-c hash functions that popped up in the last couple
of years reach upwards of ~10 GB/s on desktop SKUs.

~~~
leeoniya
[https://github.com/ticki/tfs/issues/5#issuecomment-266031657](https://github.com/ticki/tfs/issues/5#issuecomment-266031657)

~~~
ticki_
SeaHash is obviously not cryptographic (nor is SipHash), but I hope it is a
secure PRF (i.e. the keys cannot be extracted), and this was the best attack I
was able to construct. Still, it isn't a practical attack, but I suppose it is
possible to improve.

Note that I am not a cryptographer, and my only piece of advice is: For the
sake of god, don't use hash functions not designed for cryptographic security,
if you need cryptographic security. It's that simple.

~~~
tptacek
SipHash is a cryptographic hash with a pretty good pedigree. The distinction
between SipHash and Blake2 is more subtle than "cryptographic vs not".

~~~
ticki_
No, it's not a cryptographic hash function. It's a MAC function.

The paper clearly states that it is not collision resistant.

~~~
kibwen
I believe that you and tptacek may be using differing definitions of
"cryptographic", because he tends to know what he's talking about when it
comes to cryptography (e.g.
[https://gist.github.com/tqbf/be58d2d39690c3b366ad](https://gist.github.com/tqbf/be58d2d39690c3b366ad)).

~~~
ticki_
Oh, well.

What I think of as "cryptographic hash function" is a function resistant to
pre-image attacks, second pre-image attacks, and collision generation.

None of those is satisfied by SipHash, so it cannot be classified as a
cryptographic hash function by the normal definition.

~~~
tptacek
It's a cryptographic hash function, but not one suitable for all of the same
applications as SHA2. It has security characteristics that other hash-table
hashes (for instance) lack.

~~~
ticki_
If you know the key, it is as weak as it gets (as the paper notes too, you can
construct collisions easily if the key is known), so I disagree.

~~~
tptacek
Honestly, I really don't care about this debate, except to the extent that
it's about whether SipHash "isn't cryptographic and therefore you might as
well use CityHash or SeaHash", which just isn't true.

~~~
ticki_
I agree. SipHash is certainly strong if you don't know the key.
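
The "strong if you don't know the key" idea can be sketched for a hash table. Python's `hashlib` does not expose SipHash, so the example below uses keyed BLAKE2b as a stand-in keyed function; that substitution, and the names `keyed_hash` and `bucket_for`, are assumptions for illustration only.

```python
import hashlib
import secrets

def keyed_hash(key: bytes, data: bytes) -> int:
    """64-bit keyed hash; keyed BLAKE2b stands in for a SipHash-style PRF."""
    digest = hashlib.blake2b(data, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")

def bucket_for(key: bytes, data: bytes, num_buckets: int) -> int:
    """Bucket index a hash table would use for this entry."""
    return keyed_hash(key, data) % num_buckets

# A fresh secret key per process (or per table): an attacker who cannot
# learn it cannot precompute inputs that all land in the same bucket.
table_key = secrets.token_bytes(16)

print(bucket_for(table_key, b"user-supplied-string", 1024))
```

With an unkeyed hash, or a key the attacker knows, the bucket of every input is predictable and collisions can be prepared offline, which is the hash-flooding scenario discussed in this thread.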

------
eutectic
Previous discussion:
[https://news.ycombinator.com/item?id=13057797](https://news.ycombinator.com/item?id=13057797)

------
_RPM
How does this compare to murmurhash3?

~~~
valarauca1
SeaHash is about 3x faster than murmur3

[https://docs.rs/seahash/3.0.3/seahash/](https://docs.rs/seahash/3.0.3/seahash/)

