From the readme: "On a single core of a 2.67GHz Intel Xeon X5550, CityHashCrc256 peaks at about 6.09 bytes/cycle. The other CityHashCrc functions are wrappers around CityHashCrc256 and should have similar performance on long strings. CityHash128 peaks at about 4.31 bytes/cycle."
I think it's fascinating that they're able to determine the number of bytes their code processes during a single cycle. Does anyone know where to start looking to learn more about how to do that? It reminds me of a comment I think I read on here a month or two ago about the number of pixels that could be processed per cycle with an algorithm implemented on a 500MHz FPGA.
People have been suggesting the time stamp counter, but that's actually not ideal, because it has a lot of overhead. On my desktop at work (a very beefy Intel Xeon) it adds about 30 CPU cycles, and it also drains all the pipelines.
For a microbenchmark like this, I find it's usually better to call the function in a loop 1,000,000 times and compute the total time. That's often a "best case" scenario, where e.g. the CPU doesn't need to decode the instructions on every iteration because they fit in the appropriate cache, but it avoids the overhead of the time stamp counter.
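If you want a rough bytes/cycle number out of that, here's a minimal sketch of the loop approach in C++. Hash64 is just a placeholder (FNV-1a) standing in for whatever function you actually want to measure, and 2.67e9 is simply the nominal clock quoted in the README, used to convert the measured throughput back into bytes/cycle:

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Placeholder hash (FNV-1a); swap in the real function under test.
    static uint64_t Hash64(const char* p, size_t n) {
      uint64_t h = 14695981039346656037ull;            // FNV-1a offset basis
      for (size_t i = 0; i < n; ++i) { h ^= (unsigned char)p[i]; h *= 1099511628211ull; }
      return h;
    }

    int main() {
      static char buf[64 * 1024];
      std::memset(buf, 'x', sizeof(buf));

      const int iters = 100000;
      uint64_t sink = 0;
      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < iters; ++i)
        sink ^= Hash64(buf, sizeof(buf));              // hot loop: data stays in cache
      auto t1 = std::chrono::steady_clock::now();

      double secs  = std::chrono::duration<double>(t1 - t0).count();
      double bytes = double(iters) * sizeof(buf);
      std::printf("%.2f GB/s, %.2f bytes/cycle (sink=%llu)\n",
                  bytes / secs / 1e9, bytes / (secs * 2.67e9),
                  (unsigned long long)sink);           // printing sink keeps the loop alive
      return 0;
    }

A real harness would also vary the input between iterations so the compiler can't hoist the hash out of the loop, pin the thread to a core, and account for frequency scaling.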
The question is how representative this measurement really is. Performance of a hash function depends heavily on memory caches. This looks like a test with many iterations where all the data is as close to the processor as possible - hardly a real-life scenario.
This is meant for high-speed hashing, so the comparison is this algorithm vs. another high-speed hash algorithm. In that case, how does memory cache affect the comparison between the two algorithms?
That is, if the data isn't "as close to the processor" then won't another hash algorithm in this class be affected the same way? The only thing is that it would deemphasize the performance differences, but it wouldn't change that one is faster than another, would it?
Note that this chip, when fitted with DDR3-1333, has a theoretical peak system bandwidth of 32 GB/s (achievable peak will be somewhat lower). This hash, running on a single core, claims to process 16.2 GB/s. There are 4 cores (8 threads) on the X5550, so the hash is already limited by memory bandwidth.
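Spelling out the arithmetic behind those figures (using the numbers quoted above, so treat them as approximate):

    6.09 bytes/cycle * 2.67e9 cycles/s ≈ 16.3 GB/s per core
    4 cores * 16.3 GB/s ≈ 65 GB/s of demand, vs. ~32 GB/s of theoretical DDR3-1333 bandwidth

So for data that isn't already sitting in cache, roughly two cores hashing flat out would saturate the memory bus.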
I think it ends up being simple arithmetic once you access your processor's cycle counter. For example, on x86 you used to do it via http://en.wikipedia.org/wiki/Time_Stamp_Counter and that may still work given the right circumstances.
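On current GCC or Clang targeting x86 the counter is exposed as the __rdtsc() intrinsic, so a rough sketch looks like the following (the caveats above about rdtsc overhead, and about the TSC not necessarily ticking at the core clock on modern parts, still apply; the loop body is just stand-in work):

    #include <x86intrin.h>   // __rdtsc(), GCC/Clang on x86
    #include <cstdint>
    #include <cstdio>

    int main() {
      uint64_t acc = 1;
      uint64_t start = __rdtsc();
      for (int i = 0; i < 1000; ++i)
        acc = acc * 6364136223846793005ull + 1442695040888963407ull;  // stand-in work to measure
      uint64_t end = __rdtsc();
      std::printf("%llu cycles elapsed (acc=%llu)\n",
                  (unsigned long long)(end - start),
                  (unsigned long long)acc);            // printing acc keeps the loop alive
      return 0;
    }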
Relevant comment on reddit by the author of MurmurHash:
Hi, I'm the author of MurmurHash (let me know if verification is needed) and helped review the design of CityHash (I work at Google now).
CityHash has higher throughput and is more complex, MurmurHash has lower latency for very short keys and is relatively simple.
Both are more than adequate for any sort of hashing needs you might have and should be virtually identical to a random oracle for any keysets you might throw at them.
I'm happy to answer questions if you have any.
Is there an article that describes the algorithm in more abstract terms?
The C++ code is quite readable, but I'd say it contains too many magic numbers. Code is no substitute for a clear description by the authors of the algorithm, its tradeoffs, and its design choices.
I wonder: isn't this the kind of thing that's better handled by creating a function, and maybe explicitly telling the compiler to inline it? I mean, you can do this (from here: http://gcc.gnu.org/onlinedocs/gcc/Inline.html ):
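(a sketch of the kind of thing the manual describes; Rotate is just an illustrative name here, not CityHash's actual code)

    #include <cstdint>

    // Ask GCC to inline this everywhere, instead of relying on a macro.
    static inline __attribute__((always_inline))
    uint64_t Rotate(uint64_t val, int shift) {
      // The shift == 0 check avoids shifting by 64, which is undefined.
      return shift == 0 ? val : (val >> shift) | (val << (64 - shift));
    }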
or you could simply let your optimizer take care of it. It may be faster to not inline it in some circumstances, but you've gone out of your way to prevent it from being able to make that decision, and to make your code harder to read and more prone to problems due to macro expansion.
Why? Is it habit? Or is there a good reason? (Currently, that is; historical reasons are not good reasons to continue doing things.)
People who write hash functions are both exceedingly anal about performance concerns and generally interested in having their work compile with any and all reasonable C compilers (yes, even those that aren't gcc) and still have at least good performance.
They also tend to be things that are not designed in code: they are designed in math and then translated to code as the last step, preferably in a way that is able to remain static forever (they should not need maintenance by an engineer, even/especially to update to support random new C compiler builds).
(Some of my friends entered the recent SHA3 NIST hash competition and lasted long enough to have random other mathematicians scrutinize, improve, and finally defeat their math. http://en.wikipedia.org/wiki/Spectral_Hash)
The most important thing to take away from this is "These functions aren't suitable for cryptography". The requirements for a cryptographically strong hash function are much stronger than for one just used for hash-table performance. That's not to say that you shouldn't use it; rather, be careful that there's no scope creep.
The page outright says that it is not a cryptographic hash, but the Hash Quality section doesn't expound on why (except for a minor flaw on short strings for one of the functions). Does anyone know what the expected collision rate for this hash is?
In other words, I don't care if the hash doesn't avalanche perfectly (half the output bits changed for one flipped input bit; a requirement for cryptographic hashing so changes are "obvious") as long as the collision rate is about the same as SHA256's. Or, if I should care, explanations would be wonderful.
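For whatever it's worth, the avalanche property is easy to eyeball yourself: flip one input bit, rehash, and count how many output bits changed; for a good 64-bit mixer the average should be close to 32. A self-contained sketch (Hash64 is again just a placeholder FNV-1a, not CityHash):

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Placeholder hash (FNV-1a) standing in for the function under test.
    static uint64_t Hash64(const char* p, size_t n) {
      uint64_t h = 14695981039346656037ull;
      for (size_t i = 0; i < n; ++i) { h ^= (unsigned char)p[i]; h *= 1099511628211ull; }
      return h;
    }

    int main() {
      char key[16] = "example key 123";
      const uint64_t base = Hash64(key, sizeof(key));
      double flipped = 0;
      int trials = 0;
      for (size_t byte = 0; byte < sizeof(key); ++byte) {
        for (int bit = 0; bit < 8; ++bit) {
          key[byte] ^= (char)(1 << bit);                         // flip one input bit
          flipped += __builtin_popcountll(base ^ Hash64(key, sizeof(key)));
          key[byte] ^= (char)(1 << bit);                         // restore it
          ++trials;
        }
      }
      std::printf("average output bits changed: %.1f of 64\n", flipped / trials);
      return 0;
    }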
True. But in the case I'm evaluating, that isn't my worry. All I care about is the expected number of collisions over some extremely large number of hashes.
Exactly, it'd be interesting to have the average number of hashes needed before you hit a collision.
Crypto hashes are made so that collisions are as rare as possible.
Sometimes you need something very, very fast with as few collisions as possible (i.e. they "never" happen, as with crypto hashes), yet these are not hashes of passwords and the like, so if a collision did happen it wouldn't be a security issue.
For that reason, this question is in fact interesting. Not that I have the answer to that :-)
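For what it's worth, if the output really does behave like a uniform random function (which is the claim here for non-adversarial keys), the usual birthday estimate answers this:

    expected colliding pairs among k hashes ≈ k * (k - 1) / 2^(b + 1)   for a b-bit hash
    first collision expected after roughly 2^(b / 2) hashes

So on the order of 2^32 hashes for a 64-bit output and 2^64 for a 128-bit one. The "not suitable for cryptography" caveat is mainly about adversarially chosen keys, where someone can deliberately engineer collisions, not about random data.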
That's not the definition of a cryptographic hash. One of the four requirements of a function being a cryptographic hash function is that "it is infeasible to find two different messages with the same hash."
I understand. I worded my question poorly: what I was looking for was what they found that invalidated it as a cryptographic hash, so I could figure out whether it would fit my (non-crypto) requirements.
Sure, it's faster, but is it well distributed, with no known immediate flaws? I understand it's not cryptographically secure, but does it perform as well as MurmurHash3 in those areas?
Symmetric keys?? Which part of "not suitable for cryptographic purposes" did you not understand?
Neither FNV, Jenkins, Murmur nor this new City Hash are supposed to be used for cryptography.
Besides, given the way hash functions are generally used in cryptography, performance is the least of your worries (unless you're trying to crack something), so there's no reason whatsoever not to go with a proper cryptographically strong hash such as SHA256.
That said, Jenkins has worse avalanche behaviour than FNV. I'm sure that could be fixed with a few tweaks; non-crypto hash functions aren't exactly rocket science (check out where FNV's constants came from). But as it is, Jenkins is more of an academic example.
Spooky (also developed by Bob Jenkins) seems better. Oddly enough, Spooky claims 3 bytes per cycle, but the author cannot confirm CityHash's 6 bytes per cycle ( http://burtleburtle.net/bob/hash/spooky.html ). Whatever the exact figures, both are obviously blazingly fast. Picking one over the other for performance reasons should only be done once you've established that the speed of the hash function really is a bottleneck, and then you can compare their performance on the specific hardware it's supposed to run on.
> Symmetric keys?? Which part of "not suitable for cryptographic purposes" did you not understand?
I don't mean symmetric in the cryptographic sense. I mean as in "ABBA" mapping to the same hash value as "YZZY". I can't remember the specific case, but I think it was with 8 byte keys, and it might be like the above or it might be more like "ABAB" and "YZYZ". Either way, it caused some pretty catastrophic failures in one case due to the unusually high probability of such keys existing in our dataset.
I've done tests with pretty huge dictionaries on both Jenkins & FNV, and despite FNV's enviable characteristics, empirically Jenkins did much better at avoiding collisions. Sometimes the stuff that looks good on paper has unfortunate characteristics with real world data sets (which typically aren't that random).