
SipHash and HalfSipHash Added to Linux Kernel - zx2c4
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/plain/Documentation/siphash.txt
======
mrigor
[https://en.wikipedia.org/wiki/SipHash](https://en.wikipedia.org/wiki/SipHash)

~~~
zer0t3ch
As someone who's not (yet) at all versed in cryptography: anyone want to put
this in layman's terms? Maybe compare it to an existing cryptographic method
I might understand or at least know of?

~~~
majke
Hash tables are very important data structures in the computer science world.
Hash tables allow you to have amortized O(1) access cost to arbitrary elements
- like key/value (often called: map, dict).

To implement a hash table, one needs to map a key onto an index in the
table. This is done with a hash function: hash(string) ---> number.

Here's a problem. If an attacker knows the hash function, she can produce many
strings that will give the same number in return. This usually wasn't a
problem, but in the web world it is. It is possible to flood the server
(usually in python, ruby, perl) with such crafted requests that, for example,
all headers will end up with precisely the same hash value:
hash(any_given_header_in_request) ---> fixed value.

This will result in hash table collisions: every colliding key lands in the
same bucket, so lookups degrade from O(1) toward O(n). Normal hash functions
can't solve this. This problem of maliciously creating hash collisions is
called "hash flooding".
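
A minimal sketch of the attack, assuming a toy unkeyed hash (`weak_hash` is hypothetical and far weaker than anything a real language runtime used). Because the function is public and fixed, colliding keys can be generated offline:

```python
from itertools import permutations


def weak_hash(key: str) -> int:
    """Toy unkeyed hash (illustrative assumption): sum of the byte values."""
    return sum(key.encode())


def bucket_of(key: str, table_size: int = 256) -> int:
    """Map a key to a bucket index, as a hash table would."""
    return weak_hash(key) % table_size


# Every permutation of the same characters has the same byte sum,
# so the attacker gets as many colliding keys as they like.
colliding_keys = ["".join(p) for p in permutations("abcdef")]
buckets = {bucket_of(k) for k in colliding_keys}

print(len(colliding_keys), "distinct keys land in", len(buckets), "bucket(s)")
```

Real attacks against stronger unkeyed hashes need more work to find the collisions, but the principle is the same: nothing secret stands between the attacker and the bucket index.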

Siphash is an attempt to solve the problem. It is more than a hash function:
it's a cryptographic PRF (pseudorandom function), and that gives you more
guarantees than a dumb hash function. Most importantly it takes two values: a
"string to hash" and a "crypto key": siphash(string, crypto_key) --> number.

The idea is to generate this "crypto_key" randomly on each program execution,
to make sure the attacker can't predict it.
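
As a rough illustration, here is a pure-Python sketch of SipHash-2-4 following the published algorithm description. The names `siphash24` and `_sipround` are my own, and this is not the kernel's C implementation. The key is drawn fresh from `os.urandom` on each run, as described above:

```python
import os
import struct

MASK = 0xFFFFFFFFFFFFFFFF  # arithmetic is done modulo 2**64


def _rotl(x: int, b: int) -> int:
    """Rotate a 64-bit value left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v0, v1, v2, v3):
    """One SipRound: add, rotate, xor on the four state words."""
    v0 = (v0 + v1) & MASK
    v1 = _rotl(v1, 13) ^ v0
    v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK
    v3 = _rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK
    v3 = _rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK
    v1 = _rotl(v1, 17) ^ v2
    v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of data under a 16-byte key, returned as a 64-bit int."""
    if len(key) != 16:
        raise ValueError("key must be 16 bytes")
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    # Pad with zeros so the final 8-byte word ends in (length mod 256).
    padded = data + b"\x00" * (7 - len(data) % 8) + bytes([len(data) & 0xFF])
    for (m,) in struct.iter_unpack("<Q", padded):
        v3 ^= m
        for _ in range(2):  # 2 compression rounds
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m

    v2 ^= 0xFF
    for _ in range(4):  # 4 finalization rounds
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


# A fresh random key per process: the attacker cannot precompute collisions.
process_key = os.urandom(16)
bucket = siphash24(process_key, b"Content-Type") % 256
```

The same header name maps to a different bucket on each run of the program, which is what defeats precomputed collision sets.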

Cryptographically speaking, ordinary hash functions may be reversible. There
is nothing guaranteeing that they are not. But Siphash is a PRF, and in
crypto-speak this means it's not reversible. If you can produce an efficient
algorithm to reverse Siphash (i.e., given the crypto key and hash value,
predict the input string), you can write a good paper and be famous.

~~~
koolba
> Siphash is an attempt to solve the problem. It is more than a hash function
> - it's a crypto PRF function and that gives you more guarantees than dumb
> hash function. _Most importantly it takes two values: a "string to hash" and
> a "crypto key": siphash(string, crypto_key) --> number._

That sounds suspiciously like an HMAC.

~~~
Arnt
The difference is in the detail.

An HMAC gives you something very nice, but what it gives you is of its own
choosing. If you want a 256-entry cache, the security properties of a 256-bit
HMAC result are not a good fit for the needs of a cache key.

If you want an infinitely big cache, an HMAC gives you zero collisions, but in
this case what you want is an 8-bit result that spreads nicely over the 256
values, doesn't let an attacker fill one bucket, and doesn't let an attacker
learn anything about the other users and their cache entries.

~~~
eternalban
> 8-bit result that spreads nicely over the 256 values, doesn't let an
> attacker fill one bucket, and doesn't let an attacker learn anything about
> the other users and their cache entries.

Willing to be educated, but won't I get that by just masking off 8 bits of,
say, SHA-x?

[edit: h(s + secret-bits) & 0xff ]
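
The expression in the edit can be sketched as follows, assuming SHA-256 stands in for "SHA-x" and a hypothetical per-process `SECRET`. Whether the low 8 bits have the properties a cache needs is exactly what the replies discuss; the code only shows the construction:

```python
import hashlib
import os

SECRET = os.urandom(16)  # hypothetical per-process secret bits


def bucket(s: bytes, secret: bytes = SECRET) -> int:
    """h(s + secret-bits) & 0xff, with SHA-256 assumed as h."""
    digest = hashlib.sha256(s + secret).digest()
    return int.from_bytes(digest, "big") & 0xFF


# Example: spread some keys over the 256 buckets.
indices = [bucket(str(i).encode()) for i in range(1000)]
```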

~~~
Arnt
People do that. I've done it myself. But all the math about SHA's security
properties is about absence of collisions in the full result, not about
relative frequency in a small part of the result.

Maybe SHA-8 works fine. I don't know. I've never seen any real mathematical
investigation of that, and that's the point.

~~~
dchest
A secure cryptographic hash function's collision resistance should be
MIN(output_length/2, claimed_security). For example, SHA-256 collision
security is 128 bits with 256-bit output, but if you truncate output to 128
bits, collision resistance will be 64 bits.
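
A small experiment makes the halving visible. This sketch assumes SHA-256 truncated to 24 bits (the helper names are illustrative); by the birthday bound a collision is expected after roughly 2^12 attempts rather than 2^24:

```python
import hashlib


def truncated_sha256(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of the SHA-256 digest."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int):
    """Hash successive counters until two share a truncated digest."""
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        tag = truncated_sha256(msg, n_bytes)
        if tag in seen:
            return seen[tag], msg, i + 1
        seen[tag] = msg
        i += 1


first, second, tries = find_collision(3)  # 3 bytes = 24-bit output
print(f"{first!r} and {second!r} collide after {tries} hashes")
```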

------
nerdponx
_I have never come anywhere near developing a kernel in my life._

What's the purpose of including this in the kernel itself? Why couldn't
developers just use some kind of "libsiphash" instead?

~~~
mfukar
The primary users of siphash, as seen in lkml, are ipv[4|6] syncookies
(/net/ipv4/syncookies.c), and TCP sequence numbers (/net/core/secure_seq.c).

~~~
zx2c4
Hopefully not "primary" for long. These are just the first uses I saw to
immediately convert to siphash with the least amount of controversy to
actually get siphash in the kernel. Now that these have landed, other
developers can gradually start using it in more and more places. Who knows
what the "primary" use will be after 4.11.

------
rurban
Counter argument:
[http://perl11.org/blog/seed.html](http://perl11.org/blog/seed.html)

Now even the kernel devs are drinking the Kool-Aid, copying wrong claims
verbatim from the SipHash authors.

~~~
wolf550e
Per that argument, AES is insecure because if I know the key, I can decrypt
your messages and even fake a message with valid CMAC authentication tag.

Siphash is secure in the same sense as AES, HMAC and a CSPRNG: seeing many
consecutive values does not reveal the key and does not allow predicting the
next value.

~~~
rurban
Oh my. It's trivial to brute force SipHash to attack hash tables with a known
seed. It's part of my hash table flood test for cperl.

Timings:

    BITS  SIZE      TIME
    8     255       0.003s
    9     511       0.006s
    10    1023      0.033s
    11    2047      0.12s
    12    4095      0.45s
    13    8191      1.82s
    14    16383     7.6s
    15    32767     31s
    16    65535     2m2s
    18    262143    4m3s 23770
    

18-30 bit: ~4m for an attackable subset (linear time). The typical size in a
kernel is 13bit.
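
The kind of search being timed can be sketched roughly as below. This is not the cperl test: keyed BLAKE2b from `hashlib` is swapped in as a stand-in keyed hash, the table size is assumed to be a power of 2, and all names are illustrative. The point is only that with a known key, finding keys for one bucket is a plain linear search:

```python
import hashlib


def keyed_hash(key: bytes, data: bytes) -> int:
    """Stand-in keyed 64-bit hash (BLAKE2b, not SipHash)."""
    digest = hashlib.blake2b(data, digest_size=8, key=key).digest()
    return int.from_bytes(digest, "little")


def find_bucket_collisions(key: bytes, bits: int, wanted: int, target: int = 0):
    """Enumerate candidate keys until `wanted` of them hit bucket `target`."""
    mask = (1 << bits) - 1
    found = []
    tries = 0
    while len(found) < wanted:
        candidate = b"k%d" % tries
        if keyed_hash(key, candidate) & mask == target:
            found.append(candidate)
        tries += 1
    return found, tries


known_key = bytes(16)  # assumption: the attacker has already learned the key
found, tries = find_bucket_collisions(known_key, bits=8, wanted=50)
print(f"{len(found)} colliding keys after {tries} candidates")
```

On average each hit costs about 2^bits candidates, so the work grows with the table size, which is consistent with the shape of the timings above.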

The only SipHash security you get is seed hiding. Once an attacker has the
seed, it's insecure. You get the seed by various means, usually by poking into
memory or by solving via order-exposure and timing attacks. This commit
doesn't mention any of it, because it followed the flawed siphash chapter 6,
which only knows about chained hash tables.

So far the Linux kernel, glibc and Java had the only hash tables immune to
such nonsense. Now only Java and glibc remain. I should have given that CCC
talk this year about the disturbing siphash security theatre out there.

Thank god the kernel still uses primes, so you need a full 32-bit attack. But
with this commit message I fear this will also erode to powers of 2 sooner or
later, to use a simple bit test instead of mod (or the mult. trick).
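
To illustrate the distinction being drawn, here is a small sketch. The function names and sizes (2^13 slots versus the prime 8191) are assumptions for the example, not taken from kernel code. With a power-of-2 table only the low bits of the hash select the slot, while a prime modulus mixes in the high bits too:

```python
def index_pow2(h: int, bits: int) -> int:
    """Slot for a table of 2**bits entries: a bit mask keeps only the low bits."""
    return h & ((1 << bits) - 1)


def index_prime(h: int, prime: int) -> int:
    """Slot for a prime-sized table: the whole hash value affects the result."""
    return h % prime


a = 0x12345678
b = 0xABCD5678  # differs from a only in the high 16 bits

same_slot_pow2 = index_pow2(a, 13) == index_pow2(b, 13)
same_slot_prime = index_prime(a, 8191) == index_prime(b, 8191)
print(same_slot_pow2, same_slot_prime)
```

Matching the low 13 bits is enough to collide in the masked table, whereas the two values land in different slots under the prime modulus.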

~~~
zx2c4
"Poking into kernel memory" is not possible without a serious security
vulnerability. That's not within this threat model.

"Order-exposure and timing attacks" -- which types of attacks against SipHash
do you have in mind? Could you elaborate, instead of your misinformed
hand-waving?

