Hacker News new | past | comments | ask | show | jobs | submit login

To our surprise, we found a lack of published, well-optimized, large-data hash functions. Most hash work seems to focus on small input sizes (for things like dictionary lookup) or on cryptographic quality. We wanted the fastest possible hash that would be collision-free in practice (like SHA-1 was), and we didn’t need any cryptograhic [sic] security.

This is an important passage that could be explained in more detail. I can look up a list of non-cryptographic hash functions, nine of which support 128-bit output:

https://en.wikipedia.org/wiki/List_of_hash_functions#Non-cry...

Given that the claim is that Meow Hash is fast, where's the benchmarking to prove it's faster than the alternatives?




16 bytes per cycle is what they quote. Imo, that's pretty impressive.

For comparison, xxhash is the closest thing to this that I'm aware of and clocks in at around 1.7 bytes per cycle using the official implementation.


There are variants of CityHash that use the x86 CRC intrinsics (CityHashCrc128 and CityHashCrc256). They quote 4.3-5.5 bytes/cycle.[0]

(Better than xxhash on x86, worse than Meowhash -- assuming the bytes/cycle numbers can be compared apples-to-apples, which is a big if.)

[0]: https://github.com/google/cityhash


I think the big issue with bytes/cycle comparisons right now is that once you get to multiple bytes per cycle, most people are going to be limited on I/O, even to RAM probably. I assume Meow Hash was designed to run on high end workstations, but if you are developing software for end users my guess is that Meow Hash will end up being not significantly faster than xxHash.

Of course, if you are running a lot of hashes in parallel, I bet things get more interesting. (Example: would AES-NI instructions scale in parallel as fast as ordinary bitwise/vector instructions?)


ram throughput isn't that different on a cheap to high end pc.


When I say high end workstation I really do mean workstation and not gaming PC. Most people do not have quad channel memory on their home PCs but it is in fact a pretty good step up in throughput.


That is also for x86-64, I'd be interested in learning what the performance would look like for ARMv8-A


goddamn it, I just spent a fair amount of time getting xxhash to work.. now I need to reevaluate lol


xxHash is still very fast. In fact, I am sure that some implementations of xxHash are faster than that. In fact, after Googling it, I can see that xxHash64 is quoted as having 3.1 bytes per cycle, which is not bad.

xxHash is also very simple, and can be written easily in high or low level languages, which I like. Someone even implemented it in C++ templates I believe. I don't know how Meow Hash is but I imagine the default implementation heavily exploits AMD64 CPUs and based on the article probably makes use of AES acceleration.

Basically, if you do this, at least be sure you're not bottlenecked on I/O already :)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: