

CityHash, a family of hash functions for strings. - atdt
https://code.google.com/p/cityhash/source/browse/trunk/README

======
steeve
You know, every time I see articles about CityHash/MurmurHash, I wonder why
more people don't know about xxHash, which is 3 times faster than
MurmurHash.

<http://code.google.com/p/xxhash/>

<http://fastcompression.blogspot.fr/2012/04/selecting-checksum-algorithm.html>

On a side note, the same goes for LZ4, which is twice as fast as Snappy (LZ4
is by the same author as xxHash).

<http://code.google.com/p/lz4/>

~~~
tryp
I did a quick skim of the source for xxHash and CityHash. They appear to rely
heavily on 32- and 64-bit multiply operations, and are likely "fast" to the
extent that these operations are vectorized. The benchmarks I noticed are all
run on recent x86_64 processors with big vector units and caches.

It would be interesting to see how they stack up on other platforms like ARM
with and without NEON, PPC with and without Altivec, and MIPS. Without that
data, the algorithms look pretty tailored for SSE3, and so I'd hesitate to
build them into any application or protocol that needs to be portable or
implemented in embedded devices.
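To make the point about multiplies concrete, here is a minimal, unoptimized Python sketch of the 32-bit xxHash variant (XXH32), written from the publicly documented algorithm rather than copied from the project; the helper names `rotl32` and `round32` are my own, so treat it as illustrative. Each 4-byte lane costs one 32-bit multiply-add, a rotate and a second multiply. Note that the reference XXH32 loop is plain scalar code: its speed on x86-64 comes largely from the four independent accumulators `v1`..`v4`, which let the CPU keep several multiplies in flight, so how it fares on a given ARM, PPC or MIPS core depends mostly on that core's 32-bit multiply throughput.

```python
import struct

# XXH32 constants from the published algorithm.
PRIME32_1 = 0x9E3779B1
PRIME32_2 = 0x85EBCA77
PRIME32_3 = 0xC2B2AE3D
PRIME32_4 = 0x27D4EB2F
PRIME32_5 = 0x165667B1
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits (1 <= r <= 31)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def round32(acc: int, lane: int) -> int:
    """One accumulator step: multiply-add, rotate, multiply."""
    acc = (acc + lane * PRIME32_2) & MASK32
    acc = rotl32(acc, 13)
    return (acc * PRIME32_1) & MASK32


def xxh32(data: bytes, seed: int = 0) -> int:
    n = len(data)
    i = 0
    if n >= 16:
        # Four independent accumulators, one per 4-byte lane of a 16-byte stripe.
        v1 = (seed + PRIME32_1 + PRIME32_2) & MASK32
        v2 = (seed + PRIME32_2) & MASK32
        v3 = seed & MASK32
        v4 = (seed - PRIME32_1) & MASK32
        while i <= n - 16:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1 = round32(v1, a)
            v2 = round32(v2, b)
            v3 = round32(v3, c)
            v4 = round32(v4, d)
            i += 16
        h = (rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18)) & MASK32
    else:
        h = (seed + PRIME32_5) & MASK32
    h = (h + n) & MASK32
    # Remaining whole 4-byte words.
    while i + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, i)
        h = (h + lane * PRIME32_3) & MASK32
        h = (rotl32(h, 17) * PRIME32_4) & MASK32
        i += 4
    # Remaining single bytes.
    while i < n:
        h = (h + data[i] * PRIME32_5) & MASK32
        h = (rotl32(h, 11) * PRIME32_1) & MASK32
        i += 1
    # Final avalanche so every input bit influences every output bit.
    h ^= h >> 15
    h = (h * PRIME32_2) & MASK32
    h ^= h >> 13
    h = (h * PRIME32_3) & MASK32
    h ^= h >> 16
    return h
```

For example, `xxh32(b"")` returns `0x02CC5D05`, the usual XXH32 check value for an empty input with seed 0.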

------
ComputerGuru
Previously on HN with some good discussions:

<http://news.ycombinator.com/item?id=3521551>

<http://news.ycombinator.com/item?id=2434547>

Lots of comparisons to MurmurHash3, which is my preferred hash function for
generic (read: non-string) data. I have since switched to CityHash for string
hashing.
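For comparison, a minimal Python sketch of MurmurHash3's 32-bit x86 variant (MurmurHash3_x86_32), written from the published reference algorithm; the function name `murmur3_32` is hypothetical. It consumes arbitrary bytes, which is why it suits non-string data: serialize the value (here with `struct.pack`) and hash the resulting bytes.

```python
import struct

C1 = 0xCC9E2D51
C2 = 0x1B873593
MASK32 = 0xFFFFFFFF


def rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits (1 <= r <= 31)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def fmix32(h: int) -> int:
    """Final avalanche step of MurmurHash3."""
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def murmur3_32(data: bytes, seed: int = 0) -> int:
    h1 = seed & MASK32
    n = len(data)
    nblocks = n // 4
    # Body: mix each little-endian 4-byte block into h1.
    for off in range(0, nblocks * 4, 4):
        (k1,) = struct.unpack_from("<I", data, off)
        k1 = (k1 * C1) & MASK32
        k1 = rotl32(k1, 15)
        k1 = (k1 * C2) & MASK32
        h1 ^= k1
        h1 = rotl32(h1, 13)
        h1 = (h1 * 5 + 0xE6546B64) & MASK32
    # Tail: the 0-3 leftover bytes, assembled little-endian.
    tail = data[nblocks * 4:]
    if tail:
        k1 = 0
        for i, b in enumerate(tail):
            k1 |= b << (8 * i)
        k1 = (k1 * C1) & MASK32
        k1 = rotl32(k1, 15)
        k1 = (k1 * C2) & MASK32
        h1 ^= k1
    # Fold in the length, then avalanche.
    h1 ^= n & MASK32
    return fmix32(h1)
```

Usage on non-string data might look like `murmur3_32(struct.pack("<q", 123456789))` to hash a 64-bit integer; the byte order chosen for serialization is part of the hash's definition, so pick one and keep it fixed.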

------
shin_lao
The code looks pretty big; I'm afraid it will negatively affect the
instruction cache.

~~~
unwind
The announcement claims:

 _On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at
about 5 to 5.5 bytes/cycle._

That is pretty impressive; I don't think there's room for many cache misses
within that performance envelope.
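For scale, the quoted per-cycle figure converts to throughput as follows (plain arithmetic on the announcement's numbers, assuming the core runs at its nominal 2.67 GHz during the measurement):

$$
5\ \frac{\text{bytes}}{\text{cycle}} \times 2.67\times10^{9}\ \frac{\text{cycles}}{\text{s}} = 13.35\times10^{9}\ \frac{\text{bytes}}{\text{s}},
\qquad
5.5\ \frac{\text{bytes}}{\text{cycle}} \times 2.67\times10^{9}\ \frac{\text{cycles}}{\text{s}} \approx 14.7\times10^{9}\ \frac{\text{bytes}}{\text{s}}
$$

That is roughly 13 to 15 GB/s, or about 0.18 to 0.2 cycles per byte hashed.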

~~~
kevingadd
You have to understand that you can't measure an algorithm's impact on the
icache just by benchmarking it in a loop. If the code being benchmarked fits
in the icache, it'll perform great in that loop, but when you drop it into
your application it could evict everything else from the icache each time you
call it and degrade the performance of the rest of your application. It'll
also perform worse itself in your application, since it won't stay resident
in the icache either.

