HighwayHash: Strong hashing at 0.3 clock cycles per byte

nkurz · on March 3, 2016

Are there any better instructions for how to build this? I'm interested in it, but not having any luck with Bazel. First I failed with a permissions issue (https://github.com/bazelbuild/bazel/issues/739), then because I didn't have a "WORKSPACE" file, and now it gives me an unhelpful error message:

  nate@skylake:~/git/highwayhash$ bazel build :all -c opt --copt=-mavx2
  .....
  ERROR: /home/nate/git/highwayhash/BUILD:3:1: unexpected   
  keyword 'default_hdrs_check' in call to package(*, 
  default_deprecation = ?, distribs = ?, licenses = ?, 
  default_testonly = ?, default_visibility = ?, features = ?, 
  default_compatible_with = ?, default_restricted_to = ?).
  ERROR: no such package '': Package '' contains errors.

amock · on March 3, 2016

I'm guessing it was built and tested internally at Google and no one tested it with bazel. It also depends on smasher, which I believe should be declared in the WORKSPACE file. So unless you want to manually build it you'll have to wait for someone to fix the build.

nkurz · on March 3, 2016

I went ahead and created a Makefile:

  CXXFLAGS := -g -std=c++11 -Wall -O3 -march=native -Wno-sign-compare
  OBJS := highway_tree_hash.o \
  	scalar_highway_tree_hash.o \
  	scalar_sip_tree_hash.o \
  	sip_hash.o \
  	sip_tree_hash.o
  MAIN := sip_hash_main
  all: $(OBJS)
  	$(CXX) $(CXXFLAGS) $(OBJS) $(MAIN).cc -o $(MAIN)
  clean:
  	rm -f $(OBJS) $(MAIN)

Then I commented out two lines in "sip_hash_main.cc":

   // #include "smhasher.h"
   // RunAllTests(argc, argv);

Then "make" and run:

  nate@skylake:~/git/highwayhash$ ./sip_hash_main
  ScalarSipTreeHash ...	    GBps=2.44  c/b=1.44
  ScalarHighwayTreeHash ... GBps=1.16  c/b=3.02
  SipHash ...               GBps=1.54  c/b=2.27
  SipTreeHash ...           GBps=4.56  c/b=0.77
  HighwayTreeHash ...       GBps=11.49 c/b=0.30
  ...

The inner loop for HighwayTreeHash looks like this:

   0.89 │ 60:┌─→vpor   %ymm5,%ymm0,%ymm0                                                            
   4.47 │    │  vpaddq (%rcx),%ymm1,%ymm1                                                           
   6.04 │    │  add    $0x4,%rax                                                                    
   0.22 │    │  add    $0x20,%rcx                                                                   
   2.24 │    │  cmp    %rax,%rdi                                                                    
   3.13 │    │  vpsrlq $0x20,%ymm1,%ymm2                                                            
   3.36 │    │  vpmulu %ymm1,%ymm0,%ymm4                                                            
  13.20 │    │  vpshuf %ymm3,%ymm4,%ymm4                                                            
  11.19 │    │  vpmulu %ymm2,%ymm0,%ymm2                                                            
        │    │  vpaddq %ymm1,%ymm2,%ymm1                                                            
  31.32 │    │  vpaddq %ymm4,%ymm0,%ymm0                                                            
   7.61 │    └──ja     60

forgotpwtomain · on March 3, 2016

Do you know if there is a reference for a comparative number of collisions anywhere e.g. (http://programmers.stackexchange.com/questions/49550/which-h...) ?

janwas · on March 3, 2016

With a 64-bit hash, you shouldn't be seeing collisions at all on small datasets. A test such as smhasher can verify this.

janwas · on March 3, 2016

Thanks for mentioning, I have removed the smhasher dependency. We have an enhanced version of it which cannot be yet be released due to a dependency on threading libraries.

janwas · on March 3, 2016

Thank you for mentioning this. I have added a WORKSPACE file and removed the default_hdrs_check (apparently only available in Google Blaze).

nkurz · on March 3, 2016

Great, the new version now works for me with Bazel. I pushed you an improved Makefile and some fixes for conversion warnings on Github. Up to you if you want to include the Makefile, but I think it would make it much easier for outsiders to test your code. Thanks for providing this!

janwas · on March 4, 2016

You're welcome, thank you for suggesting these :) We do need a contributor agreement click-through first, please see CONTRIBUTING.

rurban · on March 3, 2016

Oh my, again the folks who think that a better hash function saves you from DDOS attacks on collision resolution with linked lists. This is pure nonsense! Get over it please. Hashes for hash tables need to be small and fast, that's all.

First, from any hash only a few bits are use for a hash table lookup, typically between 5-8. It depends on the size of the table. You can make your hash function long and strong, but only the last few bits count, and this does not help you at all against DDOS.

Second, against DDOS only uhash, or using no linked lists or arrays with O(n/2) lookup costs help. A random seed is good, but can be detected. E.g. a timeout as used by djb in his nameserver does help better than a stronger hash function. Or using a hash table as in glibc, gcc or the linux kernel. siphash or HighwayHash does not help at all. python and perl both fell into this trap.

But I appreciate the effort to create a fast metro like hash function, without resorting to the internal crc32 builtin. Will be included in smhasher soon.

Summary: https://github.com/rurban/smhasher#summary

jbapple · on March 3, 2016

> First, from any hash only a few bits are use for a hash table lookup, typically between 5-8.

This is not my experience. Frequently, hash tables will have a size that is a power of 2, and the address into the table will be all of the low order bits. A table with 1 million slots will use 20 bits of hash.

> Second, against DDOS only uhash, or using no linked lists or arrays with O(n/2) lookup costs help.

uhash is just one example of hash function designs based on almost-universal hashing. There are many others, including VHASH and Poly1305. Is HighwayHash almost universal?

GFK_of_xmaspast · on March 3, 2016

> Frequently, hash tables will have a size that is a power of 2 ...

It looked like that was what rurban was arguing? Pull out the low 5-8 bits, get a table of size 32-256?

rurban · on March 9, 2016

Exactly. You attack an existing table in an application via POST request of colliding keys. This table has mostly 10-100 elements, and you need to find a collision of say ~10.000-20.000 elements to have an effect. So it starts with ~5 bits on the avg. case and gets into 13-14 bits. This is all too trivial and doesn't need a "secure" hash function, which is not secure at all and can be precomputed. It's plain security theatre by rookies. The "security" may come from the collision resolution or from the seed if you insist on cache-unfriendly linked lists or better strategies, but never from a hash function per se. With 5-14 bits you have no chance at any "security", which starts at 256 bits.

And on top of that I guess nobody ever heard of "Cache-Conscious Collision Resolution in String Hash Tables" by Nikolas Askitis and Justin Zobel, who measured that a linear search in an array for collisions outperforms a linked lists by 2-4x, even if that can do move-to-front much easier.

People still do worry about the hash function, where they do should worry about their hash tables instead, which can easily be made 4x faster and not 2x slower.

jbapple · on March 12, 2016

> This is all too trivial and doesn't need a "secure" hash function, which is not secure at all and can be precomputed. It's plain security theatre by rookies. The "security" may come from the collision resolution or from the seed

Hash "functions" like UHASH, Poly1305, and SipHash are actually families of hash functions, parameterized by seed.

> And on top of that I guess nobody ever heard of "Cache-Conscious Collision Resolution in String Hash Tables" by Nikolas Askitis and Justin Zobel, who measured that a linear search in an array for collisions outperforms a linked lists by 2-4x, even if that can do move-to-front much easier.

If an implementation is vulnerable to hash-flooding attacks because it doesn't have a randomly-seeded almost-universal hash function, then changing the collision mechanism from linked lists to arrays isn't going to fix the problem.

Good collision resolution and good hash functions are both issues worth addressing.

KMag · on March 3, 2016

> A random seed is good, but can be detected.

and from the linked smasher page:

> You can attack every single hash function, even the best, if you detect the seed, ...

But the most efficient known way to recover the SipHash key is still bruit force.[0] Good luck making an average of 2^127 guesses.

[0] http://link.springer.com/chapter/10.1007/978-3-319-13051-4_1...

jbapple · on March 3, 2016

Where can I read what they mean by "Strong" hashing for HighwayHash? Also, is there a peer-reviewed paper on SipTreeHash? I am not a cryptographer, so I wouldn't trust my own instincts in terms of evaluating its strength compared to SipHash.

For non-cryptographic applications, CLHASH produces 64 bits of result and is 2.004/2^64 almost delta universal, at a cost of 0.16 cycles per byte on Haswell and 0.10 cycles per byte on Skylake.

https://github.com/lemire/StronglyUniversalStringHashing

http://arxiv.org/abs/1503.03465

janwas · on March 4, 2016

For the record, S. Gueron has some papers on similar tree hashes (http://dx.doi.org/10.4236/jis.2014.53010) and there's more theory in http://sponge.noekeon.org/TreeHashing.pdf .

janwas · on March 7, 2016

I hadn't seen CLHASH yet, thank you for mentioning it. Looks even faster but only XOR universal (meaning XORed hashes are uniformly distributed, not the hashes themselves). In particular, it fails smhasher's avalanche test (because bit flips have predictable effects), and their proposed fix is more expensive than a HighwayTreeHash round.

jbapple · on March 13, 2016

> their proposed fix is more expensive than a HighwayTreeHash round.

For longish-strings, most of the cycles are in the preliminary rounds, so appending whatever you want onto the end should add negligible cost. This includes bitmixing (their proposed smhasher fix) or a HighwayTreeHash round.

janwas · on March 15, 2016

Agreed :) We (and SipHash developers) do care about short strings, though. Scripting language hash table inputs are typically around 10 bytes, so we can't ignore finalization overhead.

AYBABTME · on March 3, 2016

I'm a newb at this, but how does this compare to SHA3/blake2 in term of applications? Is it more/less/same secure?

Or better: how can I educate myself better on the proper usage of certain hashing functions?

qrmn · on March 3, 2016

This is a hash function (as used in hash tables/Bloom filters and so forth), but it is not a cryptographic hash function. It is designed to be very fast for things like table lookups, but it is not designed to be strongly resistant to preimages, second-preimages, or collisions. Don't use it for security-relevant purposes.

I think BLAKE2 is probably the fastest and best cryptographic hash function you're likely to see for the foreseeable future?

AYBABTME · on March 3, 2016

Thanks, that's basically what I was looking for. I have apps that are bottlenecked by blake2, so finding a faster cryptographic hash function would be pretty interesting to me.

haneefmubarak · on March 3, 2016

If BLAKE2 (I assume you're using blake2b) is your bottleneck, perhaps you should consider the blake2bp (four way parallelism) or the blake2sp (eight way parallelism) variants. If even those are too slow for you (lots of continuous data and you have multiple cores available), then you might be able to try using tree hashing on top of one of the mentioned BLAKE2 variants.

I'm curious though, what are you working on that saturates ~1 GiB/s/core?

AYBABTME · on March 4, 2016

A content addressable distributed cache, where the disk + network IO capacity is ~40Gb/s. A bit similar to IPFS.

janwas · on March 7, 2016

Interesting. How big do you need the hashes to be?

panic · on March 3, 2016

This isn't a cryptographic hash function. It's meant for use in things like hash tables where speed is an important factor.

mulle_nat · on March 3, 2016

Practice, practice, practice :)