

[btrfs] is vulnerable to a hash-DoS attack - giis
http://crypto.junod.info/2012/12/13/hash-dos-and-btrfs/

======
finnw
Another job for siphash?

<https://131002.net/siphash/>
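
SipHash is keyed, which is the point. A minimal sketch (in Rust, whose
standard hasher is currently a SipHash variant; this is not btrfs code) of
what keying buys: the same name hashes differently under different secret
keys, so an attacker cannot precompute colliding file names offline.

    use std::collections::hash_map::RandomState;
    use std::hash::{BuildHasher, Hash, Hasher};

    // Each RandomState draws a fresh secret SipHash key.
    fn keyed_hash(state: &RandomState, name: &str) -> u64 {
        let mut h = state.build_hasher();
        name.hash(&mut h);
        h.finish()
    }

    fn main() {
        let key_a = RandomState::new();
        let key_b = RandomState::new();
        // Same input, different keys -> unrelated outputs, so a set of
        // names that collide under one key is useless under another.
        println!("{:016x}", keyed_hash(&key_a, "somefile.txt"));
        println!("{:016x}", keyed_hash(&key_b, "somefile.txt"));
    }

For an on-disk format, the key would presumably have to be generated at mkfs
time and stored in the superblock, or hashes wouldn't be stable across
mounts.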

------
ibotty
On LWN, Chris Mason is responding:

<https://lwn.net/Articles/529077>

------
dkhenry
This is a misleading title. The limits of this attack mean you're better off
fork bombing the system than trying to "hack" it this way. The two attacks
described are making it impossible to create a specifically named file in a
shared directory, and making deletes take a long time. Neither is a real DoS
attack, and Chris appears to be taking the reasonable approach of
acknowledging that it can be made better and scheduling a fix for the next
pull window.

This whole article smells of grandstanding.

~~~
darkarmani
That's extremely uncharitable. The major difference that you gloss over is
that one has tools to stop fork bombs, but there are no tools to stop this
kind of attack.

This attack can easily be used to disrupt systems. I can imagine every naive
implementation of file upload out there being vulnerable to this. When the
system goes to delete old uploaded files, it halts.

I'm not sure how "making deletes take a long time" is not a DoS attack.
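
To put a number on how cheap this is for the attacker: btrfs hashes names
with an unkeyed 32-bit crc32c, so a plain birthday search already finds
colliding names after roughly 2^16 tries. A rough sketch (this is the stock
CRC32C; btrfs seeds it a little differently, which doesn't change the cost):

    use std::collections::HashMap;

    // Bitwise CRC32C (Castagnoli polynomial, reflected form).
    fn crc32c(data: &[u8]) -> u32 {
        let mut crc = !0u32;
        for &byte in data {
            crc ^= byte as u32;
            for _ in 0..8 {
                crc = if crc & 1 != 0 {
                    (crc >> 1) ^ 0x82F6_3B78
                } else {
                    crc >> 1
                };
            }
        }
        !crc
    }

    fn main() {
        // Birthday search: with a 32-bit hash, a collision is expected
        // after ~2^16 names -- a fraction of a second of work.
        let mut seen: HashMap<u32, String> = HashMap::new();
        for i in 0u64.. {
            let name = format!("upload-{i}");
            let h = crc32c(name.as_bytes());
            if let Some(other) = seen.get(&h) {
                println!("collision: {other} / {name} -> {h:08x}");
                break;
            }
            seen.insert(h, name);
        }
    }

And because CRC is linear, an attacker doesn't even need to search: a 4-byte
suffix can steer any name to one chosen hash value, which is how you
mass-produce names that all land in the same bucket.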

------
leif
Not a new attack. There is a decent history, going back to the TOCTTOU arms
race around access/open, of exactly this kind of algorithmic-complexity
attack against bad hashing. Maybe btrfs does worse things when you do this,
but it's not alone.

See <http://www.cs.stonybrook.edu/~xcai/races2.pdf> and some of its
references.

~~~
CJefferson
Doesn't the first paragraph of the article talk about the history of hash
collision attacks? Or are you talking about some other way in which this is
not a new attack?

Also, looking at his numbers, btrfs is doing something very wrong with simple
hash collisions. A file system that takes over 5 seconds to create 61 files
is simply broken (in this situation; I'm not making general comments about
btrfs).

~~~
leif
You're right, I skimmed and that's my fault.

However, it's worth noting that these sorts of attacks can give privilege
escalation, not just DoS.

------
apawloski
So it seems like fixing this is as simple as using a hash algorithm that's
more robust to collisions? That's not _too_ bad, am I missing something more
severe?

~~~
CJefferson
The hash issues aren't just causing a slight slowdown; they seem to be
bringing the fs to a complete halt. Also, changing the hash, unless done with
care, will cause compatibility issues.

~~~
apawloski
CRC32 used to be pretty common -- is there a typical replacement being used
instead? Of course the usual mantra is "it depends on what you're using it
for", but I'm curious if there's a typical drop-in.

>Also, changing hash, unless done with care, will cause compatibility issues.

I'm not calling you out, but I'm interested in what kinds of compatibility
issues you're thinking of?

~~~
CJefferson
I was mainly thinking about the disk image not being mountable by older
kernels. File systems usually aim to change as little as possible.

------
PaulHoule
I don't like innovation in filesystems. ext4 might be boring, but ext4 wrecks
don't make the evening news the way wrecks with ZFS do. I worked on a system
that used reiserfs in production and we were always dealing with problems
caused by its weak reliability guarantees. For instance, when the system
crashed we'd find our filesystem full of files full of trash data (it had
allocated space for the file but never wrote the data, so this is what was
left behind).

Since the system would try to read the trash data, this was a problem.

I was looking at the btrfs documentation the other day and noticed that (1) it
behaves worse when space runs out than ext4 does, and (2) there's no accurate
way to measure free space on a btrfs volume so it's hard to avoid running out
of space.

Whatever benefits btrfs has are erased if you have to deal with wrecked and
full filesystems all the time.

~~~
dunecn
Regarding (1): yes, a COW file system will (generally) behave "worse" than a
non-COW file system when space is limited. But this is a design choice that
makes sense for many use cases. Just to point out a few:

* Snapshots are virtually free.

* Writes are sequential.

* It's "harder" to destroy existing data (blocks are not overwritten in
place).

These design decisions are not made lightly.

------
sgt
I wonder if ZFS is vulnerable to something similar?

~~~
cokernel_hacker
Sadly, this article gets some stuff about btrfs wrong. Allow me to clarify:

* First off, btrfs does not use "hash tables". Instead, it uses hashes to
create keys that index into B-trees. The problem is that hash collisions are
not handled efficiently; btrfs is forced to do a lot of work to deal with
them.

* ZFS uses two distinct data structures to represent the contents of a
directory: the so-called "micro-ZAP" and the so-called "fat-ZAP".

micro-ZAPs are for small directories. This is OK, as the number of collisions
is limited by the relatively small size of the micro-ZAP.

fat-ZAPs are like on-disk extensible hash tables.

They both use CRC64; I think fat-ZAPs might have a problem.
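
A toy model of the btrfs side (my sketch, not the real item layout):
directory entries keyed by (name hash, sequence number) in an ordered tree,
so a lookup has to walk every entry sharing the hash and compare full names.
Seed a directory with thousands of deliberate collisions and each lookup
degrades into a linear scan.

    use std::collections::BTreeMap;

    // Toy directory: ordered tree keyed by (name_hash, insertion_seq),
    // value is (full name, inode number). Not the real btrfs format.
    type Dir = BTreeMap<(u32, u64), (String, u64)>;

    fn lookup(dir: &Dir, name: &str, hash: u32) -> Option<u64> {
        // All names colliding on `hash` occupy one contiguous key
        // range; we must walk it and compare stored names one by one.
        dir.range((hash, 0)..=(hash, u64::MAX))
            .find(|(_, (stored, _))| stored == name)
            .map(|(_, (_, ino))| *ino)
    }

    fn main() {
        let mut dir: Dir = BTreeMap::new();
        dir.insert((0xdead_beef, 0), ("a".into(), 257));
        dir.insert((0xdead_beef, 1), ("b".into(), 258)); // same hash
        assert_eq!(lookup(&dir, "b", 0xdead_beef), Some(258));
    }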

~~~
ajross
That's interesting. Is there any guidance as to why the tree keys are derived
hashes instead of the actual file names? Obviously the original space is
invulnerable to collisions by definition. Was the point simply to make the
keys smaller to fit in a single machine word? Is that really helpful for
performance vs. an optimized strcmp?

I'm generally a big btrfs fan, but here it seems like they got caught due to a
senseless overoptimization...

~~~
cokernel_hacker
Variable-length keys are difficult to implement efficiently. The alternative
is not doing file name lookups by key query, which would hurt performance in
non-pathological cases.
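
To see why fixed-width keys are so convenient, here is a sketch loosely
modeled on btrfs's disk key (objectid/type/offset, with the name hash in
offset for directory items, as I understand the format):

    use std::cmp::Ordering;

    // Every key is the same 17 bytes no matter how long the file name
    // is, so node layout, binary search, and splits stay simple.
    #[repr(C, packed)]
    struct DiskKey {
        objectid: u64, // e.g. the directory's inode number
        item_type: u8, // e.g. a "directory item" tag
        offset: u64,   // for directory items: the hash of the name
    }

    // Comparison is a couple of integer compares; variable-length name
    // keys would mean memcmps of differing lengths and variable-size
    // slots inside every tree node.
    fn cmp_key(a: &DiskKey, b: &DiskKey) -> Ordering {
        (a.objectid, a.item_type, a.offset)
            .cmp(&(b.objectid, b.item_type, b.offset))
    }

    fn main() {
        let a = DiskKey { objectid: 256, item_type: 1, offset: 0x1234 };
        let b = DiskKey { objectid: 256, item_type: 1, offset: 0x5678 };
        assert_eq!(cmp_key(&a, &b), Ordering::Less);
    }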

~~~
ajross
Really? I'm not sure I buy that. B-tree traversal in real world cases is
virtually always going to be I/O or memory bandwidth bound. That's just not
going to be sensitive to the handful of cycles you save except in the case of
_tiny_ directories that are already in L1/L2 cache. But there, you're paying
the up-front cost of hashing the input file name as "extra" and it's not even
clear to me you'd save anything overall.

Basically, this just smells like a premature optimization to me. If it were
my project, the hashing feature didn't exist, and someone wanted to add it,
I'd _really_ want to see some numbers before accepting it.

~~~
Someone
_"B-tree traversal in real world cases is virtually always going to be I/O or
memory bandwidth bound."_

I am not an expert on file systems, but have you thought about the following:

- Using file names in disk blocks rather than file name hashes means fewer
entries per disk block. That, in turn, changes the constant of your B-tree
traversal (rough numbers below).

- With variable-length keys, keeping your B-trees balanced is tricky, if not
practically impossible.

EDIT: a disadvantage of using hashes is that you need to read the filename
proper (with small hashes, you cannot ignore hash collisions). That would be
an extra I/O. So I guess this would not be beneficial for small directories.
Maybe you could start out by having in-block hashes and filenames, and only
move the names to a separate block when you need a second one?
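
Rough numbers for the first point, under assumed sizes (4 KiB nodes, keys
only, ignoring node headers and per-entry overhead):

    fn main() {
        let node_bytes = 4096.0_f64;
        let files = 1_000_000.0_f64;
        // Assumed entry sizes: an 8-byte hash key vs. a worst-case
        // 255-byte name key. Real formats add per-entry overhead.
        for (label, entry) in [("hash key", 8.0), ("name key", 255.0)] {
            let fanout = (node_bytes / entry).floor();
            let depth = files.log(fanout).ceil();
            println!("{label}: fanout ~{fanout}, depth ~{depth}");
        }
    }

That works out to a fanout of ~512 and depth 3 for hash keys versus a fanout
of ~16 and depth 5 for full-name keys, i.e. hashing saves about two node
reads per lookup in this toy model.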

~~~
cokernel_hacker
The most common implementation technique that I know of is to include a small
constant number (~4) of bytes of the actual name next to the hash, as in the
sketch below.
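
Something like this, say (a hypothetical layout, not any particular
filesystem's format):

    // Hypothetical key: the hash plus the first 4 bytes of the name.
    // The prefix doesn't make keys collision-free; it just lets most
    // lookups reject non-matching entries without fetching the full
    // name (and the extra I/O that can imply). A surviving candidate
    // still has to be confirmed against the complete stored name.
    #[repr(C, packed)]
    struct PrefixedKey {
        hash: u32,
        name_prefix: [u8; 4], // first bytes of the name, zero-padded
    }

    fn make_key(hash: u32, name: &str) -> PrefixedKey {
        let mut prefix = [0u8; 4];
        let n = name.len().min(4);
        prefix[..n].copy_from_slice(&name.as_bytes()[..n]);
        PrefixedKey { hash, name_prefix: prefix }
    }

    fn main() {
        let k = make_key(0x1234_5678, "virtualenv");
        let (h, p) = (k.hash, k.name_prefix); // copy out of packed
        println!("hash={h:08x} prefix={p:?}");
        assert_eq!(p, *b"virt");
    }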

~~~
Someone
I can see how that would help if your filenames are bin, dev, opt, usr, and
var, but in the general case, I do not see how that beats having a longer
hash (or a second, independent hash).

Four bytes of the name will have less entropy than such data, and having part
of the filename around will not help ensure that a matching hash implies a
matching filename.

Can you explain this, or give the name of a filesystem that does it?

