Linux Filesystem Fuzzing with American Fuzzy Lop [pdf] (linuxfoundation.org)
159 points by grhmc on April 11, 2016 | 45 comments

Fuzzing is an incredibly useful technique for finding bugs in a codebase. But it should be noted that BUG() is the only valid response if your state is invalid. Filesystems should never try to soldier on if their internal state becomes corrupted -- there lie dragons.

> Fuzzing is an incredibly useful technique for finding bugs in a codebase

Very much so. Even really common code has issues found via fuzzing on a regular basis. In the past six months I've reported NULL pointer dereferences when printing fingerprints of SSH keys, segfaults in the GNU awk parser, and most recently segfaults when parsing HTML files with w3m.

Most of the time setting up fuzzing is pretty simple, and even updating programs to read from STDIN rather than sockets is usually possible if you're careful and patient.

The hardest part is waiting for the damn things to run. Right now I've had a fuzzing session going against the text-based browser lynx for 4 days, and still counting. (Specifically looking to see if it can be crashed when converting HTML to text, as is often done in mutt, etc, via 'lynx -dump').

> The hardest part is waiting for the damn things to run.

Would it be possible to distribute the fuzzing on a small cluster or some cloud platform thing?

If you wrote your own fuzzer, using a queue to distribute work, then almost certainly. But if you're relying on things like AFL then I think the answer is generally "no".

There is some provision for parallelization in afl: https://github.com/mirrorer/afl/blob/master/docs/parallel_fu...
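The scheme described in that doc is one -M "master" instance running the deterministic stages plus any number of -S "secondary" instances doing random havoc, all syncing findings through a shared output directory. Roughly (paths, instance names, and the target binary are illustrative):

```shell
# Master: runs deterministic mutation stages, syncs via sync_dir/
afl-fuzz -i testcases -o sync_dir -M fuzzer01 -- ./target @@

# Secondaries: random havoc, periodically importing each other's queues
afl-fuzz -i testcases -o sync_dir -S fuzzer02 -- ./target @@
afl-fuzz -i testcases -o sync_dir -S fuzzer03 -- ./target @@
```

For multiple machines, the same doc suggests periodically mirroring sync_dir between boxes (e.g. with rsync over ssh), so "cluster" distribution is possible, just not automatic.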

The correct response is to stop writing to the filesystem. They mean BUG() as in it crashes the kernel. That said, most Linux machines will not let you mount filesystems unless you are root or physically present, so these don't seem like major concerns, although they should be fixed.

> The correct response is to stop writing to the filesystem.

Stop writing AND READING, since what's read may be corrupted and then written back or used elsewhere.

> They mean BUG as in it crashes the kernel.

I know what BUG means. There's a reason it exists: to make sure that code in an invalid state doesn't do something really dangerous. assert() is very useful.

It should be noted that we've hit that BUG() assertion on ext4 using the mount option errors=remount-ro - it simply should not be possible for an invalid state to Oops your kernel when you've configured it like this.

See Vegard's discussion with Theodore Ts'o about this: http://marc.info/?l=linux-ext4&m=144898400422842&w=2

I'm awed by your work but this is really an abomination:

"Unfortunately, company policy prohibits me from sharing the actual code." (http://marc.info/?l=linux-ext4&m=145007745502639&w=2)

This was true before the advent of mount namespaces, but is now incorrect: see namespaces(7) or something like:
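A minimal illustration of the mechanics (sketch only; an unprivileged user still can't attach loop devices or mount arbitrary on-disk filesystems this way, but mounts inside the namespace work and are invisible outside it):

```shell
# -r: new user namespace mapping our uid to root inside it
# -m: new mount namespace, so mounts don't leak out
unshare -r -m sh -c '
  mount -t tmpfs none /mnt   # only visible inside this namespace
  findmnt /mnt
'
```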



Super interesting, but...

We have "Time to first bug" for a lot of file systems (ext4, btrfs, hfsplus, NTFS) covering a wide range of OSes & platforms ... and yet not a single word about ZFS?

Come on Oracle... you can do better than that.

I am not surprised to see them cover every major filesystem option except ZFS. Oracle does not like to talk about OpenZFS. Not to mention, they could say that they were only testing in-tree file systems.

My understanding is that Oracle's Linux devs were behind btrfs before the Sun takeover and are thus boosting it, and also that Oracle's lawyers are highly litigious on licensing issues, so they probably view ZFS on Linux like lions circling a carcass.


Chris Mason wrote btrfs when he started at Oracle (not Sun):


Conversely, it seems that btrfs lasts only 5s before hitting an issue, so I wonder whether this will drive improvements in that area, and how it might affect NASes that use btrfs, like Synology.

> and yet not a single word about ZFS?

ZFS is not included in the Linux kernel. Much of this work was about building the infrastructure to be able to perform this kind of test.

Trying to do it for file systems that aren't included in Linux is a lot of effort for much less gain.

Wow, I did the same thing as part of my Bachelor's thesis recently. I'm glad they ran into similar issues I did, although I didn't spend much time on that part of the work. (It still got me the best results, afl is great like that.)

I guess I should bump reporting/fixing the issues I found on my todo list.

Awesome! I work on an embedded project that could benefit _enormously_ from having AFL run against it, but I've never taken the time to do it, because it would take several engineer-weeks to even investigate if it could be done. Their approach to "porting AFL to the kernel" makes me think that yes, it would cost perhaps an engineer-month but at least the outcome wouldn't be "nope, not practical". Thumbs up.

If you know the build system for your project, have some way of getting input via stdin, and have an existing test corpus, it's surprisingly simple.

I was able to set it up in a few hours for a moderately sized library. Along with Valgrind, I was able to find and fix all of the bugs it uncovered after more than a CPU-month of testing.

Arguably llvm's libFuzzer is around the same magnitude of complexity and delivers similar results.

I used it to create a fuzzer for CPython [1] and it didn't take terribly long to get something going. The majority of my time has been spent on new test cases.

[1] https://bitbucket.org/ebadf/fuzzpy

Once again Ext(4) shows that its praise for a clean, robust code base is well deserved...

Have we lowered our standards so much that "it took longer to crash it with a fuzzer" already qualifies for "clean, robust codebase"?

Our standards were higher in the past? [citation needed]

There's a trade off between development cost, performance, functionality, and correctness. Writing a filesystem with reasonable performance and that never crashes and never does the wrong thing is hard.

> Our standards were higher in the past?

No, but our fuzzers were worse, so we didn't know :-)

It has just been used more than the others, I expect.

XFS has been around for longer than ext4, and is used in quite a lot of enterprise deployments – and is, not incidentally, roughly in the same stability class as ext4.

Randomly modifying a filesystem image and then fixing up the checksums seems a little unfair. Would it not be reasonable to write code that assumes data that matches its checksum is valid? Isn't that the point of a checksum?

> Would it not be reasonable to write code that assumes data that matches its checksum is valid? Isn't that the point of a checksum?

That would be absolutely unreasonable. Filesystems can be malicious; for example, Stuxnet was transmitted via USB flash drives.

The checksum only tells you when the value is probably incorrect; it doesn't show that the value is correct or meaningful.

For example (on a completely hypothetical filesystem/disk), let's say you want to open a file. You read the entry in the filesystem metadata that tells you on what sector that file begins and it gives you a value of 5005. Even if the checksum is correct, the filesystem code still needs to check that a) 5005 is actually within the range of valid sectors, b) that it actually makes sense as a sector value (maybe files must always start on a multiple of 4), c) that the data at that sector actually looks like a file, d) etc.
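A toy sketch of that point, with an invented on-disk record layout: the checksum check runs first, but the range and alignment checks still have to happen afterwards, because a valid CRC says nothing about the values being sane.

```python
import struct
import zlib

TOTAL_SECTORS = 1_000_000   # invented device size
SECTOR_ALIGN = 4            # invented rule: files start on a multiple of 4

def parse_extent(record: bytes):
    """Parse an invented 12-byte record: u32 start, u32 length, u32 crc32."""
    start, length, stored_crc = struct.unpack("<III", record)
    # Step 1: checksum tells us the bytes are probably what was written...
    if zlib.crc32(record[:8]) != stored_crc:
        raise ValueError("checksum mismatch: data probably corrupted")
    # Step 2: ...but the values still need semantic validation.
    if start >= TOTAL_SECTORS or start + length > TOTAL_SECTORS:
        raise ValueError("extent points outside the device")
    if start % SECTOR_ALIGN != 0:
        raise ValueError("misaligned extent start")
    return start, length
```

A fuzzer that fixes up checksums is exactly what exercises step 2.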

If you don't update the checksum as well when generating the test data, then you limit the depth of the testing, since most errors would get caught immediately at the checksum-validation stage.
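As a sketch (block size, CRC placement, and the mutation strategy are all invented here, not taken from the slides), the fixup is just: mutate the block, then recompute its trailing checksum so the mutation survives the checksum gate.

```python
import random
import zlib

BLOCK_SIZE = 4096
CRC_OFF = BLOCK_SIZE - 4  # assume each block stores a CRC32 in its last 4 bytes

def mutate_block(block: bytearray, nflips: int = 4) -> bytearray:
    """Flip a few random bits in a block, then fix up its checksum."""
    for _ in range(nflips):
        i = random.randrange(CRC_OFF)          # never stomp the CRC field itself
        block[i] ^= 1 << random.randrange(8)   # single-bit flip
    crc = zlib.crc32(bytes(block[:CRC_OFF]))
    block[CRC_OFF:] = crc.to_bytes(4, "little")  # recompute the trailing CRC
    return block
```

Without that last step, nearly every mutated image dies at mount time in the checksum check, and the interesting code paths never run.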

Having read that slide, I can only guess that they were fixing up checksums for the pages they modified in known-valid but wrong ways. In other words, if there's a page on disk containing a timestamp and the test wants to bump the timestamp downwards to see what happens later on, then you need to fix that page's checksum, otherwise the test will not do what you wanted.

Of course that means your mutations have to be "fair" in the sense that your new values should still be valid, because, as you stated, just stomping on blocks and fixing up their checksums seems like a silly test.

> Would it not be reasonable to write code that assumes data that matches its checksum is valid?

If you assume that, you don't need checksums in the first place.

How's that? If you go read your data, re-checksum it, and that matches the original checksum, then you have confidence (to the strength of your checksum function) that the data is not corrupted.

Something could generate corrupted data (cosmic rays!), then checksum it and write it to disk.

That is very true (and a valid concern) but it's hard to understand what the filesystem should do in this case. Seems to me that the best you can hope for is that the data is corrupted in a way that doesn't make "sense" and the filesystem has a check for that, and quits.

Other than duplicating the pages (which is certainly a valid approach) how else can you recover from a data stomp?

I expected ext4 to last longer than the others.

It does, by a fairly large margin (alongside XFS)? The "time to first bug" table is sorted alphabetically, and the times are humanized, so they're not all in the same unit.

A reverse time-sort would be

    ext4 (2h)
    XFS (1h45)
    GFS2 (8m)
    NTFS (4m)
    NILFS2 (1m)
    HFS (30s)
    HFS+ & ReiserFS (25s)
    OCFS2 (15s)
    F2FS (10s)
    BTRFS (5s)

I think the main purpose of the time-to-BUG list is demonstrating the value of filesystem fuzzing. That filesystems like NTFS, HFS+, and XFS, which have been around a long time, have had lots of money spent on their development, and have been subjected to massive piles of real-world testing, can still hit a BUG in just a couple of hours says something good about the technique, not something bad about the filesystems.

Even in the Btrfs case, it may just be a function of the code having lots of awareness of where the bodies are buried, so it hits BUG rather than continuing in a bad way and blowing up later. But as yet there's not been enough testing to figure out all the pathways for hitting those bugs. It may explain the edge cases that still pop up with Btrfs, where I go "umm yeah I use this every day and never experience anything like that" but another user suddenly hits something that's a "wow" moment.

Why isn't Oracle running this on the JVM or Oracle DB?

Who says they aren't? Oracle DB results would be internal. Where would you start with AFL on the JVM? Class loader verifier?

Using a gcc plugin to instrument code for AFL sounds interesting (and generally useful for speed). Does anyone know if this plugin's code is available anywhere?

I don't know about their implementation, but I wrote exactly this plugin for GCC several months ago and announced it on the afl mailing list, as a patch to the source. The lack of replies led me to believe it was mostly uninteresting to people - but maybe I should have advertised it more.

You can find the source code here: https://github.com/thoughtpolice/afl/commit/e54c0237e934d734...

It should not be difficult to update this to work on more GCC versions (I only tested on GCC 4.8.x), but that will take some #ifdef'ery. Porting to newer AFLs should be relatively trivial.

EDIT: I initially wrote this for no particular reason, mind you, other than to play around with writing GCC plugins, and the result wasn't so bad, modulo non-existent documentation. I also thought it would be nice to have an exact equivalent of 'afl-clang-fast' for GCC ('afl-gcc-fast'), in the hope that one day the hacky, sed-inspired backends could be removed from afl. I initially wanted to use this on a POWER machine as proof of a portable GCC plugin for afl, but I lost interest in porting it to a newer GCC before losing access to the machine. Watching afl fly on 176 cores was fun, though.

Any link to the video presentation?

Just noticed this presentation's date is in the future (or a typo).

Nice to see that the law firm of Oracle still has a few engineers on staff.
