> You can get higher quality and throughput for short strings from a hash that uses AES instructions or wyhash, or for long strings maybe from xxhash.
What would you consider a short string? The latency of AESENC is 5 if I remember correctly. In that time you can FNV-1a a "short" string.
For longer strings I agree that WyHash and xxHash is better. I have a benchmark of both further down on FastHash's github page[1]. Of course, that is only a measure of speed, but both are high quality as well[2].
From the top of my head, t1ha, falkhash, meowhash and metrohash are using AES-NI and none of them are particularly fast on short inputs, and at least two of them have severe issues, despite guarding against lots of vulnerabilities, which your construction does not.
You are right regarding bulk-keying with SIMD outperforming any kind of scalar hashing one-at-the-time. I've build a database that does just that (requires keys to be available in bulk - like when doing bulk inserts) using SIMD, and the latency becomes negligible.
When I recently went shopping for fast hashes for short strings, I settled on wyhash, but ahash[0] seemed like it would have been better if I had bothered to port it from Rust.
> In that time you can FNV-1a a "short" string.
Not if you read it one byte at a time like in TFA!
It looks like the best FNV for short strings in smhasher[1] is comparably fast to ahash[2] on short strings.
> From the top of my head, t1ha, falkhash, meowhash and metrohash are using AES-NI and none of them are particularly fast on short inputs, and at least two of them have severe issues, despite guarding against lots of vulnerabilities, which your construction does not.
For issues like reading past the ends of buffers and discarding the extra values, it would be nice if programmers could arrange to have buffers that could be used this way. I posted a thing for hashing strings of a fixed length though, to compare with the thing for hashing strings of a fixed length in TFA.
Edit: It looks like your fastest FNV[3] has some issues where it repeatedly increments `p` but reads relative to `uPtr` which is not being incremented. It also seems to expect C# to have some unusual properties like (*uPtr + 4) being parsed as (*(uPtr + 4)) and being equivalent to uPtr[1] when uPtr is a pointer to a type that's 4 bytes wide (I don't know if C# actually has these properties). This may explain why your fastest FNV is much, much faster than the fastest FNV at smhasher.
FNV1a-YT also employs a good trick to avoid the issue of long data dependency chains made of multiplies: it just has three of them and only does a multiply for half of the input words in each of those. While this is fine, I'm not sure if it's really FNV-1a anymore.
> It looks like your fastest FNV[3] has some issues where it repeatedly increments `p` but reads relative to `uPtr` which is not being incremented.
That is entirely possible. Some hashes like the FNV-YoshimitsuTRIAD variant is not fit for real-world use and lack test-vectors to ensure that ports like mine is correct. I'm slowly but surely working my way through the C/C++/Rust/Go implementations out there and verifying them against my implementation to ensure it is correct. Luckily, a few authors provide extensive test vectors[1]
Thanks for the feedback, it is very much appreciated.
> FNV1a-YT also employs a good trick [...] While this is fine, I'm not sure if it's really FNV-1a anymore.
It was made (by Sanmayce) to optimize for instruction-level pipelining, and use the fact that modern CPUs have multiple execution ports. But due to those changes, it is not compatible with FNV1a anymore.
The trick of reading in stripes is employed by many of the fastest hashes. It is kinda funny to see how one author prefers a switch case over for loops, where others prefer while loops. The differences can sometimes have a big impact on what optimizations the compiler decides to use.
What would you consider a short string? The latency of AESENC is 5 if I remember correctly. In that time you can FNV-1a a "short" string.
For longer strings I agree that WyHash and xxHash is better. I have a benchmark of both further down on FastHash's github page[1]. Of course, that is only a measure of speed, but both are high quality as well[2].
From the top of my head, t1ha, falkhash, meowhash and metrohash are using AES-NI and none of them are particularly fast on short inputs, and at least two of them have severe issues, despite guarding against lots of vulnerabilities, which your construction does not.
You are right regarding bulk-keying with SIMD outperforming any kind of scalar hashing one-at-the-time. I've build a database that does just that (requires keys to be available in bulk - like when doing bulk inserts) using SIMD, and the latency becomes negligible.
[1] https://github.com/Genbox/FastHash
[2] https://github.com/rurban/smhasher