Hacker News new | comments | show | ask | jobs | submit login
CityHash, a family of hash functions for strings. (code.google.com)
23 points by atdt on Nov 27, 2012 | hide | past | web | favorite | 7 comments

You know, everytime I see articles about CityHash/MurmurHash I always wonder why more people don't know about xxHash, which is 3 times faster than MurmurHash.



On a side note, same goes with LZ4 being twice as fast as Speedy (from the same author).


I did a quick skim of the source for xxHash and CityHash. They appear to rely heavily on 32- and 64-bit multiply operations, and are likely "fast" to the extent that these operations are vectorized. The benchmarks I noticed are all run on recent x86_64 processors with big vector units and caches.

It would be interesting to see how they stack up on other platforms like ARM with and without NEON, PPC with and without Altivec, and MIPS. Without that data, the algorithms look pretty tailored for SSE3, and so I'd hesitate to build them into any application or protocol that needs to be portable or implemented in embedded devices.

Previously on HN with some good discussions:



Lots of comparison to MurmurHash3, which is my preferred hashing function for generic (read: non-string data). I have since switched to CityHash for string hashing.

The code looks pretty big, I'm afraid this will affect negatively the instruction cache.

Actually, it's rather ironic in that past a certain point, in order to optimize for cache size your lines of code will grow to a huge amount.

Case in point: Judy arrays [0] [1]. That is some of the most heavily-optimized code I have ever seen, optimized through-and-through for each and every possible CPU-related factor including the various cache levels, instruction pipeline optimizations, and more. Yet the codebase is positively gargantuan compared to any other implementation of a dynamic array or hash table, mainly as a result of these optimizations.

Don't be so quick to judge by LoC.

0: http://judy.sourceforge.net/ 1: http://linux.die.net/man/3/judy

The announcement claims:

On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 5 to 5.5 bytes/cycle.

That is pretty impressive, I don't think there's room for a lot of cache missing in that performance envelope.

You have to understand that you can't measure the impact of an algorithm on the icache by just benchmarking it in a loop. If the algorithm being benchmarked fits into icache, it'll perform great in that loop, but when you drop it into your application it could blow everything out of the icache each time you call it and degrade the performance of the rest of your application. It'll also perform worse in your application since it's not remaining in icache either.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact