Hacker News new | past | comments | ask | show | jobs | submit login
Multiple programming languages vulnerable to DoS via hash algorithm collision (ocert.org)
152 points by kmfrk on Dec 28, 2011 | hide | past | web | favorite | 39 comments

This is a pretty old attack class. A great starting point for it is Crosby/Wallach from 2003:


... but this was also one of Tim Newsham's (many) contributions to our joint research project on IDS in '97:


Glad to see it getting more attention. It seems to pop up every couple years. (This might be the first generalization of the technique to arbitrary web requests, which, yeah, will probably get a lot more attention).

An interesting technique - construct requests that produce pathological hashes (which degrade to linked lists, which have O(n^2) insertions). Make it big enough, and you tank the server. It may be useful to note their link to a previous article, which implies this has been known since 2003 in Perl.[1] Any news of actual attacks using this technique?

The great part is that, as long as you convert request params into hashes (a reasonable step, I'd imagine the vast majority of web frameworks do this), you're screwed if your language is vulnerable to this. Of course, you can tweak your framework to not use native hashes (or detect such attacks and handle them separately), but that's not generally an option as it's a pretty involved operation.

I wonder: would a hashtable that uses cuckoo hashes be vulnerable? Technically, sure, but would it make the attack infeasible due to the difficulty in creating double-collisions? Or is that not as hard as I'm thinking it could be?

[1]: http://www.cs.rice.edu/~scrosby/hash/CrosbyWallach_UsenixSec...

Note that it would also likely work on something accepting json, not just post params.

> which implies this has been known since 2003 in Perl

Known and also fixed because since then Perl has been using a random hash function.

ref: http://perldoc.perl.org/perlsec.html#Algorithmic-Complexity-... (http://news.ycombinator.com/item?id=3402187)

Good to know, thanks!

To fix it at the framework level, assuming the language gives me access to the hash values, I would probably keep a second table mapping hashes to keys. Then I can tell the difference between collisions and dupes. Just kill the request once there are more than a few collisions.

Dupes aren't the problem, it's duplicate hashes of different data. Dupes should just override the previous value, which matches other hashtable behavior, and works in O(1). Some frameworks may add them to an array, but that's likely to be a real array, or a double-ended queue, or something easier to append to than a plain linked list like hashtables use (because they don't need anything else - to append, they need to check equality on every entry in the list, so they must traverse the whole thing every time, causing the O(n^2) time).

That's why I would keep a hash->key table. If the hash matches but the key doesn't, it's a collision.

That's exactly what hashtables do. When they find collisions, they add it to a list of keys that hash to the same value. They do this to be able to return valid data when you request a key - otherwise, you'd find them rarely and seemingly randomly forgetting data or overriding with unrelated values. It's a useful behavior, and it's the very source of the problem.

One way to prevent this attack is to use a hashtable that has a limit on the number of collisions per table, or per key, and throw an exception when it's exceeded. The problem is that doing so by default would break valid uses in very surprising ways in favor of protecting against an unlikely and probably unknown (when designed) attack, which you won't do, because that shouldn't be handled by basic datatypes.

While it is true that traversing the entire list takes O(n^2) time, it also takes O(n) time.

Isn't it trivially easy to defend against by adding a secret character/number to all of your keys, eg.: "dog" becomes "dogu" and "julia" becomes "juliau"?

In fact, why not do this at the language level? python could pick a random character at startup, and use it for the duration of the process.

From the relevant Python-dev thread, it's probably a bad idea for more than one reasons to make this the default behavior for all dicts; a special dict subclass to be used when necessary makes more sense: http://mail.python.org/pipermail/python-dev/2011-December/11...

A single character wouldn't be any good, since it would be trivial to discover which one was in use by brute force.

A longer salt would probably work, but there may well be ways for an attacker to discover the salt in use, and longer salts mean more overhead.

A single pseudorandom byte chosen at container initialisation would work here, because url hashes are ephemeral. Persistent structures on the other hand are already DOS-prone, but need attacks more tailored to the application.

Even with a random per-initialization byte, you could still brute force it. Pick a random byte on your side, generate colliding keys with it, and send the request. If it doesn't work, just try again until it does. You'd expect to only have to try this ~256 times before succeeding, still seems pretty practical.

Of course, bump this up to 2-4 bytes and it no longer is.

256 attempts is the upper bound. The expected value of attempts should be 128.

I believe this is similar to what Ruby 1.9.x does, and why it's not vulnerable - it uses a varying initialization value for the hashes, so you can't collide every time. When they initialize it (per process, per initialization?) and what exactly it does, I don't know, but yeah. Greater overhead, less predictability, greater complexity, though all of them are essentially minor. It's not the sort of thing that tends to make it into a library until there's demonstrated need.

This is precisely why I agree with C++'s choice of a balanced binary tree as the default map implementation for its standard library. There's no denying that in the vast majority of cases hashing is faster, but that's an optimization, unnecessary except in a few specific instances, and while balanced binary trees provide many other advantages in compensation for their slight performance penalty: predictable performance (via non-amortized worst-case bounds), minimized reallocation, insertion/deletion without invalidating iterators, ordered iteration, range queries, and so on.

This is the video from the technical demonstration and explanation:


Although not listed, Microsoft's ASP.Net is also susceptable to this attack.


I'm late to the party, but I found the details interesting so thought I'd do the best technical summary I could: An attacker precomputes a lot of different values that hash to the same value, then posts them in a web request and it eats CPU on an N^2 operation of processing the form variables due to its hashtable now being N^2 since all hashes are the same. There are some optimizations that make it easier than brute force to calculate collisions against the hash algorithms in use by almost everybody today, see [1].

The interesting part is this is a precomputed attack which relies on hash algorithms being well known and invariant.

Ruby has fixed this by initializing their hash function with a random seed each time an app runs, so an attacker can't do the pre-compute step - a collision calculated on his system won't necessarily be a collision on the server. [2]

Other workarounds include limiting the size of post requests so the dictionary can't get too big, limiting the CPU time available to a single web request, or using a balanced tree instead of a hash table.

[1]: This is a great link on the problem: http://www.nruns.com/_downloads/advisory28122011.pdf

[2]: Ruby's fix: http://osdir.com/ml/ruby-talk/2009-05/msg00167.html

Since there is no patch available for the latest Ruby Enterprise Edition, we've made a quick port of the official ruby solution: http://kovyrin.net/2011/12/29/ree-hash-collision-patch/

Lua isn't mentioned there, so I'd like to ask: since Lua interns every string, isn't it vulnerable even if a framework doesn't put request parameters into the table?

Lua's string hash function looks roughly like this:

  hash = length of string
  for each char c in string
    hash = hash ^ ((hash << 5) + (h >> 2) + (unsigned char) c)
It folds up to 32 chars into the hash. If the string is longer, it skips some characters.

This is a well known hash function and it's very different from the one being used by all the other vulnerable languages, but I don't know how hard it is to find collisions for this one.

It looks like it would be trivial to find collisions; all the functions used (xor, bitshifts, and addition) are trivially reversible. Just replace the nondeterminism of reversing the binary operators (xor and addition) with a random number generator and you've got yourself a collision generator.

If it starts skipping characters in longer strings then I'd say it's trivial to find collisions by simply constructing strings that are identical in the bytes that are examined and differ in the bytes that aren't.

The full source (as of 5.2) is here (http://www.lua.org/source/5.2/lstring.c.html#luaS_newlstr).

Yes, and it also uses a weak hash function by design (for performance reasons it only looks at a few characters not the whole string.)

C#/.NET is also not mentioned. Clearly I'm out of touch, I hadn't realized that Rubinius and JRuby were more popular than C# and VB.NET. Anyone have any idea if/how .NET might be impacted by this?

Vulnerable, but a patch is being released tomorrow night: http://weblogs.asp.net/scottgu/archive/2011/12/28/asp-net-se...

From that link: On Dec 28th 2011, details were published at a security conference describing a new method to exploit hash-table data-structures

From what I've been reading, this sounds like something that's not new at all and in fact has been an open vulnerability for years.

They mentioned .NET in the video that sonnym linked here: http://news.ycombinator.com/item?id=3402152

Sorry, I was skipping through it at that point (2/3 of the way through?), I don't know what their conclusion was. I think they said that it was partially different but still likely attackable, but I didn't catch it all.

Very vulnerable, it uses a variant of DJB's common hash functions. Check out the youtube video posted elsewhere in the thread for a great summary

There's ASP.NET.

So I'm digging around in the Rack source right now trying to see what's up, I haven't seen anywhere that it doesn't just rely on the native Ruby Hash class.

Does anybody closer to the matter know if Rack's actually vulnerable in 1.9?

As far as I understand, Rack is not vulnerable unless the underlaying Ruby interpreter is vulnerable, like old 1.8, Rubinius, older JRuby.

Rack introduced a workaround in https://github.com/rack/rack/commit/5b9d09a81a9fdc9475f0ab00...

this may help in some cases when an interpreter fix is not available/installable but not fix the hashing problem in general, especially outside of POST params.

TFA says Ruby 1.9.x isn't vulnerable because its hash function is randomized

Emergency out-of-cycle Patch being issued today by Microsoft:


Related to this earlier HN post: http://news.ycombinator.com/item?id=3401254

Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact