
HighwayHash: Strong hashing at 0.3 clock cycles per byte - jsnell
https://github.com/google/highwayhash/
======
nkurz
Are there any better instructions for how to build this? I'm interested in it,
but not having any luck with Bazel. First I failed with a permissions issue
([https://github.com/bazelbuild/bazel/issues/739](https://github.com/bazelbuild/bazel/issues/739)),
then because I didn't have a "WORKSPACE" file, and now it gives me an
unhelpful error message:

    
    
      nate@skylake:~/git/highwayhash$ bazel build :all -c opt --copt=-mavx2
      .....
      ERROR: /home/nate/git/highwayhash/BUILD:3:1: unexpected   
      keyword 'default_hdrs_check' in call to package(*, 
      default_deprecation = ?, distribs = ?, licenses = ?, 
      default_testonly = ?, default_visibility = ?, features = ?, 
      default_compatible_with = ?, default_restricted_to = ?).
      ERROR: no such package '': Package '' contains errors.

~~~
amock
I'm guessing it was built and tested internally at Google and no one tested it
with bazel. It also depends on smhasher, which I believe should be declared in
the WORKSPACE file. So unless you want to build it manually you'll have to
wait for someone to fix the build.

~~~
nkurz
I went ahead and created a Makefile:

    
    
      CXXFLAGS := -g -std=c++11 -Wall -O3 -march=native -Wno-sign-compare
      OBJS := highway_tree_hash.o \
      	scalar_highway_tree_hash.o \
      	scalar_sip_tree_hash.o \
      	sip_hash.o \
      	sip_tree_hash.o
      MAIN := sip_hash_main
      all: $(OBJS)
      	$(CXX) $(CXXFLAGS) $(OBJS) $(MAIN).cc -o $(MAIN)
      clean:
      	rm -f $(OBJS) $(MAIN)
    

Then I commented out two lines in "sip_hash_main.cc":

    
    
       // #include "smhasher.h"
       // RunAllTests(argc, argv);
    

Then "make" and run:

    
    
      nate@skylake:~/git/highwayhash$ ./sip_hash_main
      ScalarSipTreeHash ...     GBps=2.44  c/b=1.44
      ScalarHighwayTreeHash ... GBps=1.16  c/b=3.02
      SipHash ...               GBps=1.54  c/b=2.27
      SipTreeHash ...           GBps=4.56  c/b=0.77
      HighwayTreeHash ...       GBps=11.49 c/b=0.30
      ...
    

The inner loop for HighwayTreeHash looks like this:

    
    
       0.89 │ 60:┌─→vpor   %ymm5,%ymm0,%ymm0                                                            
       4.47 │    │  vpaddq (%rcx),%ymm1,%ymm1                                                           
       6.04 │    │  add    $0x4,%rax                                                                    
       0.22 │    │  add    $0x20,%rcx                                                                   
       2.24 │    │  cmp    %rax,%rdi                                                                    
       3.13 │    │  vpsrlq $0x20,%ymm1,%ymm2                                                            
       3.36 │    │  vpmulu %ymm1,%ymm0,%ymm4                                                            
      13.20 │    │  vpshuf %ymm3,%ymm4,%ymm4                                                            
      11.19 │    │  vpmulu %ymm2,%ymm0,%ymm2                                                            
            │    │  vpaddq %ymm1,%ymm2,%ymm1                                                            
      31.32 │    │  vpaddq %ymm4,%ymm0,%ymm0                                                            
       7.61 │    └──ja     60
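
For readers who don't speak AVX2: each 256-bit ymm register holds four 64-bit
lanes, and the hot loop is an add / 32x32->64 multiply / shuffle update per
lane. Below is a rough scalar sketch of one lane's update (variable names are
mine, not the library's; `Rot32` is a crude stand-in for the `vpshufd` lane
permute, and the loop-invariant `vpor` with ymm5 plus the loop bookkeeping are
omitted):

```cpp
#include <cassert>
#include <cstdint>

// Scalar analogue of vpmuludq: multiply the low 32 bits of each
// 64-bit lane, producing a full 64-bit product.
static inline uint64_t MulLow32(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFu) * (b & 0xFFFFFFFFu);
}

// Crude stand-in for the vpshufd permute: swap 32-bit halves.
static inline uint64_t Rot32(uint64_t x) {
  return (x >> 32) | (x << 32);
}

// One 64-bit lane of the inner loop, approximated in scalar code.
void UpdateLane(uint64_t packet, uint64_t& v0, uint64_t& v1) {
  v1 += packet;                         // vpaddq (%rcx),%ymm1,%ymm1
  const uint64_t hi = v1 >> 32;         // vpsrlq $0x20,%ymm1,%ymm2
  const uint64_t mul0 = MulLow32(v1, v0);  // vpmuludq %ymm1,%ymm0
  v1 += MulLow32(hi, v0);               // vpmuludq + vpaddq
  v0 += Rot32(mul0);                    // vpshufd + vpaddq
}
```

The point is that the critical path per lane is short: two independent
multiplies feeding adds, across four lanes at once, which is how it sustains
~0.3 cycles per byte.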

~~~
forgotpwtomain
Do you know if there is a reference for a comparative number of collisions
anywhere? E.g.
[http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed](http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed)

~~~
janwas
With a 64-bit hash, you shouldn't be seeing collisions at all on small
datasets. A test such as smhasher can verify this.

------
rurban
Oh my, again the folks who think that a better hash function saves you from
DDOS attacks on collision resolution with linked lists. This is pure nonsense!
Get over it please. Hashes for hash tables need to be small and fast, that's
all.

First, from any hash only a few bits are used for a hash table lookup,
typically between 5-8, depending on the size of the table. You can make your
hash function long and strong, but only the last few bits count, and this does
not help you at all against DDOS.

Second, against DDOS only uhash helps, or using no linked lists but arrays
with O(n/2) lookup costs. A random seed is good, but can be detected. E.g. a
timeout as used by djb in his nameserver helps better than a stronger hash
function would. Or using a hash table as in glibc, gcc or the linux kernel.
siphash or HighwayHash do not help at all. python and perl both fell into this
trap.

But I appreciate the effort to create a fast metro-like hash function without
resorting to the internal crc32 builtin. It will be included in smhasher soon.

Summary:
[https://github.com/rurban/smhasher#summary](https://github.com/rurban/smhasher#summary)

~~~
jbapple
> First, from any hash only a few bits are used for a hash table lookup,
> typically between 5-8.

This is not my experience. Frequently, hash tables will have a size that is a
power of 2, and the address into the table will be all of the low order bits.
A table with 1 million slots will use 20 bits of hash.
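
The indexing scheme in question can be sketched in a couple of lines (names
are illustrative, not from any particular hash table implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// Map a 64-bit hash into a power-of-2 table: the slot index is simply
// all of the low-order bits. A table with 2^20 (~1M) slots consumes
// the low 20 bits of the hash; the upper 44 bits are ignored.
size_t SlotFor(uint64_t hash, size_t num_slots) {
  assert((num_slots & (num_slots - 1)) == 0);  // must be a power of 2
  return hash & (num_slots - 1);
}
```

With `num_slots = 1 << 20`, any two hashes that agree in their low 20 bits
land in the same slot, no matter how the remaining bits differ.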

> Second, against DDOS only uhash, or using no linked lists or arrays with
> O(n/2) lookup costs help.

uhash is just one example of hash function designs based on almost-universal
hashing. There are many others, including VHASH and Poly1305. Is HighwayHash
almost universal?
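
For readers unfamiliar with the term: a family H of hash functions is
epsilon-almost universal when, for every pair of distinct inputs, a randomly
chosen member collides on them with probability at most epsilon:

```latex
\Pr_{h \in H}\bigl[\, h(x) = h(y) \,\bigr] \le \epsilon
\qquad \text{for all } x \ne y
```

For a good 64-bit family, epsilon is close to 2^-64; this is the guarantee
that uhash-style constructions prove and that ad-hoc hash functions lack.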

~~~
GFK_of_xmaspast
> Frequently, hash tables will have a size that is a power of 2 ...

It looked like that was what rurban was arguing? Pull out the low 5-8 bits,
get a table of size 32-256?

~~~
rurban
Exactly. You attack an existing table in an application via a POST request of
colliding keys. This table mostly has 10-100 elements, and you need to find a
collision of say ~10,000-20,000 elements to have an effect. So it starts with
~5 bits in the average case and gets up to 13-14 bits. This is all too trivial
and doesn't need a "secure" hash function, which is not secure at all and can
be precomputed. It's plain security theatre by rookies. The "security" may
come from the collision resolution or from the seed if you insist on cache-
unfriendly linked lists or better strategies, but never from a hash function
per se. With 5-14 bits you have no chance at any "security", which starts at
256 bits.

And on top of that I guess nobody ever heard of "Cache-Conscious Collision
Resolution in String Hash Tables" by Nikolas Askitis and Justin Zobel, who
measured that a linear search in an array for collisions outperforms linked
lists by 2-4x, even though linked lists can do move-to-front much more easily.
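
The cache argument can be illustrated with a toy chained set that stores each
bucket as a contiguous array instead of a linked list, so a collision scan
walks adjacent memory rather than chasing pointers (this is only a sketch of
the idea, not the paper's implementation, and it assumes a power-of-2 bucket
count):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy chained hash set with contiguous-array buckets.
class ArrayChainedSet {
 public:
  explicit ArrayChainedSet(size_t num_buckets)  // num_buckets: power of 2
      : buckets_(num_buckets) {}

  // Returns true if the key was newly inserted.
  bool Insert(const std::string& key) {
    auto& bucket = buckets_[Index(key)];
    for (const auto& k : bucket)   // cache-friendly linear scan
      if (k == key) return false;
    bucket.push_back(key);
    return true;
  }

  bool Contains(const std::string& key) const {
    const auto& bucket = buckets_[Index(key)];
    for (const auto& k : bucket)
      if (k == key) return true;
    return false;
  }

 private:
  size_t Index(const std::string& key) const {
    return std::hash<std::string>{}(key) & (buckets_.size() - 1);
  }
  std::vector<std::vector<std::string>> buckets_;
};
```

A linked-list version would perform one potential cache miss per node visited;
here the colliding keys in a bucket sit back-to-back.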

People still worry about the hash function, where they should worry about
their hash tables instead, which can easily be made 4x faster rather than 2x
slower.

~~~
jbapple
> This is all too trivial and doesn't need a "secure" hash function, which is
> not secure at all and can be precomputed. It's plain security theatre by
> rookies. The "security" may come from the collision resolution or from the
> seed

Hash "functions" like UHASH, Poly1305, and SipHash are actually families of
hash functions, parameterized by seed.
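
The distinction matters: a seed selects one member of the family at random per
process or per table, so an attacker who knows only the family cannot
precompute collisions. A minimal illustration using the classic multiply-shift
family for fixed-width integers (this shows the seeding idea, not how SipHash
itself works):

```cpp
#include <cassert>
#include <cstdint>

// Multiply-shift hash family for 64-bit keys: each odd 64-bit
// multiplier `a` selects one member of the family. Different seeds
// produce different collision patterns.
class MultiplyShiftHash {
 public:
  explicit MultiplyShiftHash(uint64_t seed) : a_(seed | 1) {}  // a must be odd

  // Hash `key` into `bits` output bits (the top bits of the product).
  uint64_t operator()(uint64_t key, unsigned bits) const {
    return (a_ * key) >> (64 - bits);
  }

 private:
  uint64_t a_;
};
```

A real table would draw the seed from a CSPRNG at startup; two processes then
disagree about which keys collide, which is exactly what defeats precomputed
collision lists.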

> And on top of that I guess nobody ever heard of "Cache-Conscious Collision
> Resolution in String Hash Tables" by Nikolas Askitis and Justin Zobel, who
> measured that a linear search in an array for collisions outperforms linked
> lists by 2-4x, even though linked lists can do move-to-front much more
> easily.

If an implementation is vulnerable to hash-flooding attacks because it doesn't
have a randomly-seeded almost-universal hash function, then changing the
collision mechanism from linked lists to arrays isn't going to fix the
problem.

Good collision resolution and good hash functions are both issues worth
addressing.

------
jbapple
Where can I read what they mean by "Strong" hashing for HighwayHash? Also, is
there a peer-reviewed paper on SipTreeHash? I am not a cryptographer, so I
wouldn't trust my own instincts in terms of evaluating its strength compared
to SipHash.

For non-cryptographic applications, CLHASH produces 64 bits of result and is
2.004/2^64 almost delta universal, at a cost of 0.16 cycles per byte on
Haswell and 0.10 cycles per byte on Skylake.

[https://github.com/lemire/StronglyUniversalStringHashing](https://github.com/lemire/StronglyUniversalStringHashing)

[http://arxiv.org/abs/1503.03465](http://arxiv.org/abs/1503.03465)

~~~
janwas
I hadn't seen CLHASH yet, thank you for mentioning it. Looks even faster but
only XOR universal (meaning XORed hashes are uniformly distributed, not the
hashes themselves). In particular, it fails smhasher's avalanche test (because
bit flips have predictable effects), and their proposed fix is more expensive
than a HighwayTreeHash round.

~~~
jbapple
> their proposed fix is more expensive than a HighwayTreeHash round.

For longish strings, most of the cycles are in the preliminary rounds, so
appending whatever you want onto the end should add negligible cost. This
includes bit mixing (their proposed smhasher fix) or a HighwayTreeHash round.

~~~
janwas
Agreed :) We (and SipHash developers) do care about short strings, though.
Scripting language hash table inputs are typically around 10 bytes, so we
can't ignore finalization overhead.

------
AYBABTME
I'm a newb at this, but how does this compare to SHA3/BLAKE2 in terms of
applications? Is it more, less, or equally secure?

Or better: how can I educate myself better on the proper usage of certain
hashing functions?

~~~
qrmn
This is a hash function (as used in hash tables/Bloom filters and so forth),
but it is not a cryptographic hash function. It is designed to be very fast
for things like table lookups, but it is not designed to be strongly resistant
to preimages, second-preimages, or collisions. Don't use it for security-
relevant purposes.

I think BLAKE2 is probably the fastest and best cryptographic hash function
you're likely to see for the foreseeable future.

~~~
AYBABTME
Thanks, that's basically what I was looking for. I have apps that are
bottlenecked by blake2, so finding a faster cryptographic hash function would
be pretty interesting to me.

~~~
haneefmubarak
If BLAKE2 (I assume you're using blake2b) is your bottleneck, perhaps you
should consider the blake2bp (four way parallelism) or the blake2sp (eight way
parallelism) variants. If even those are too slow for you (lots of continuous
data and you have multiple cores available), then you might be able to try
using tree hashing on top of one of the mentioned BLAKE2 variants.
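
The tree-hashing idea mentioned above, sketched below. The leaf function here
is a non-cryptographic stand-in purely to show the structure; in real use you
would run blake2b over each chunk (or simply use blake2bp/blake2sp, which do
this internally):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Stand-in leaf hash: NOT cryptographic, for structure only.
uint64_t LeafHash(const std::string& chunk) {
  return std::hash<std::string>{}(chunk);
}

// Two-level tree hash: each chunk's leaf digest is independent of the
// others, so the loop below can be farmed out across cores; the root
// hash then covers the concatenated leaf digests.
uint64_t TreeHash(const std::string& data, size_t chunk_size) {
  std::string digests;
  for (size_t i = 0; i < data.size(); i += chunk_size) {
    const uint64_t d = LeafHash(data.substr(i, chunk_size));
    digests.append(reinterpret_cast<const char*>(&d), sizeof(d));
  }
  return LeafHash(digests);  // root digest over all leaf digests
}
```

This trades a little extra hashing (the root pass over the leaf digests) for
near-linear speedup with core count on long streams.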

I'm curious though, what are you working on that saturates ~1 GiB/s/core?

~~~
AYBABTME
A content addressable distributed cache, where the disk + network IO capacity
is ~40Gb/s. A bit similar to IPFS.

~~~
janwas
Interesting. How big do you need the hashes to be?

