
Pulling JPEGs out of thin air (2014) - shubhamjain
https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html
======
pjc50
A couple of years ago a colleague built this for H.264/H.265:
[http://www.argondesign.com/products/argon-streams-hevc/](http://www.argondesign.com/products/argon-streams-hevc/)

It's not just a fuzzer: it is guaranteed to hit every part of the spec (subject
to which "profile" you're implementing). It's not free; it's a product sold to
HEVC implementers for verification purposes.

~~~
schoen
AFL might get significantly more code coverage than these test streams do
because it actively seeks out more code coverage by observing the behavior of
an individual binary on various inputs.

You could imagine a parser that deals correctly with every single one of the
test streams and implements every single feature in the spec, yet also has an
undetected exploitable vulnerability because it made an assumption about
objects' sizes (which the spec permitted it to make, but which an attacker
could take advantage of).

(On the other hand, maybe I don't understand enough about H.264 to appreciate
a reason why this isn't possible in this specific context.)

~~~
DougBTX
So you're saying that a fuzzer would be more likely to hit every code path.
That makes a lot of sense when testing for vulnerabilities, crashes, etc.

Validating against a spec is almost the opposite, since it first has to check
that there is a code path for each part of the spec.

~~~
schoen
Right.

One extreme might be if you had a backdoor where the presence of a particular
byte sequence in the input intentionally triggers some kind of malicious
activity. The test streams can't detect this because they presumably don't
contain that exact byte sequence, whereas something like AFL can find it
because it can (potentially, depending on the nature of the test that
recognizes the backdoor sequence) deduce what input would trigger coverage of
that code path.
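To make this concrete, here is a toy Python sketch (not AFL itself) of why coverage feedback can find a trigger sequence that blind random testing would almost never hit. The target, its 4-byte "backdoor" value, and all function names are hypothetical, and a set of branch IDs stands in for AFL's coverage bitmap. It only works because the check compares one byte at a time, so each correct byte unlocks a new branch; with a single 4-byte comparison the fuzzer would get no intermediate signal, which is a limitation classic AFL shares.

```python
import random

MAGIC = b"EVIL"  # hypothetical backdoor trigger, for illustration only


def target(data: bytes) -> set:
    """Toy parser that returns the set of branch IDs it executed.

    The backdoor check compares one byte at a time, so every correct
    prefix byte reaches a new branch (the 'coverage' a fuzzer observes).
    """
    covered = {0}
    for i, expected in enumerate(MAGIC):
        if i >= len(data) or data[i] != expected:
            return covered
        covered.add(i + 1)
    covered.add("backdoor")
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, max_execs: int = 200_000, rng_seed: int = 1):
    """Coverage-guided loop: keep any mutant that reaches new branches."""
    rng = random.Random(rng_seed)
    queue = [seed]
    seen = set(target(seed))
    for execs in range(1, max_execs + 1):
        child = mutate(rng.choice(queue), rng)
        cov = target(child)
        if not cov <= seen:  # new coverage: add this input to the queue
            seen |= cov
            queue.append(child)
            if "backdoor" in cov:
                return child, execs
    return None, max_execs
```

Calling `fuzz(b"AAAA")` returns `b"EVIL"` together with the number of executions it needed. Each correct byte costs roughly a few thousand tries, whereas guessing all four bytes blindly would take about 2^32 attempts on average.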

------
acdha
Previous discussion:
[https://news.ycombinator.com/item?id=8571879](https://news.ycombinator.com/item?id=8571879)

Since then the bug list has grown impressively:
[http://lcamtuf.coredump.cx/afl/#bugs](http://lcamtuf.coredump.cx/afl/#bugs)

~~~
stevekemp
Fuzzing is fun, and still there are easy-to-discover issues lurking in even
widely used tools.

For example, I set up a site where I require users to upload an SSH key (for
access to a git repository), and figured I'd do what GitHub and others do in
their UI: show the key's fingerprint.

Given an SSH key you can get a fingerprint like so:

    
    
         deagol ~ $ ssh-keygen -l -f ~/.ssh/id_rsa
         2048 4d:19:f2:de:ba:f6:06:31:98:af:9e:2a:di:ce:ca:b2 ~/.ssh/id_rsa.pub (RSA)
    

Can you imagine an SSH key causing ssh-keygen or ssh to segfault? I found one
over a weekend:

[https://blog.steve.fi/so_about_that_idea_of_using_ssh_keygen...](https://blog.steve.fi/so_about_that_idea_of_using_ssh_keygen_on_untrusted_input_.html)
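For context, the older MD5-style fingerprint shown above is the MD5 digest of the base64-decoded key blob, printed as colon-separated hex pairs (newer OpenSSH releases default to SHA256 fingerprints). Below is a minimal Python sketch, assuming a one-line OpenSSH public key of the form `type base64 [comment]`. It treats the key as untrusted input and bounds-checks the length-prefixed fields of the SSH wire format instead of trusting them, which is exactly the kind of assumption a fuzzer probes. The function names are hypothetical, and this is not how ssh-keygen itself is implemented.

```python
import base64
import binascii
import hashlib
import struct


def read_ssh_string(blob: bytes, offset: int):
    """Read one SSH wire-format string: a 4-byte big-endian length, then
    that many bytes. Reject lengths that run past the end of the buffer
    instead of trusting the attacker-controlled length field."""
    if offset + 4 > len(blob):
        raise ValueError("truncated length field")
    (length,) = struct.unpack(">I", blob[offset:offset + 4])
    start = offset + 4
    if length > len(blob) - start:
        raise ValueError("length field overruns key blob")
    return blob[start:start + length], start + length


def md5_fingerprint(pubkey_line: str) -> str:
    """Legacy-style fingerprint: MD5 over the decoded key blob."""
    parts = pubkey_line.split()
    if len(parts) < 2:
        raise ValueError("expected '<type> <base64> [comment]'")
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64") from exc
    # The blob should begin with a string naming the same key type.
    key_type, _ = read_ssh_string(blob, 0)
    if key_type != parts[0].encode():
        raise ValueError("key type mismatch")
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

A malformed key then produces a `ValueError` the web app can turn into a polite error message, rather than a crash in a helper process.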

I found similar issues with other well-known tools, for example a program that
would cause GNU awk to segfault.

Really, I should do more.

~~~
acdha
Yeah, AFL is an incredibly useful bit of work. I ran OpenJPEG through it and
had a number of reasonably actionable bug reports for the maintainers after a
few hours. That class of tool used to be a LOT noisier.

------
vwcx
At first I thought that this title was referring to the artist who intercepted
satellite internet transmissions and reconstituted the JPEGs contained in the
requests: [http://time.com/3791841/the-green-book-project-by-jehad-nga/](http://time.com/3791841/the-green-book-project-by-jehad-nga/)

~~~
mwambua
I don't think that's what that article is about. He intercepted images from
Libyan internet traffic and, by some method akin to steganography, encoded
Gaddafi's book, 'The Green Book', into the images.

------
exabrial
This is mind-bogglingly cool and simple. I've been looking for a "fuzzing for
noobs" article and tool for a long time!

~~~
JadeNB
Thank you for being the one to express the childlike wonder that I felt. The
other comments are informative, but they focus on the technical aspects of
this post (entirely appropriate on Hacker News!) rather than my simple,
open-mouthed "wow!"

------
some1else
So afl-fuzz could technically be considered the most advanced universal
software license key generator? :) I suppose the days are numbered for offline
validation.

~~~
pjc50
Not if there's a proper crypto primitive in the license key system and it's
reasonably secure against birthday attacks.
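To put rough numbers on that: if a key check comes down to matching a k-bit value the attacker can't predict, a random (or fuzzed) key passes with probability 2^-k, so a search needs about 2^k attempts. The toy Python sketch below uses a deliberately weak, hypothetical 8-bit checksum scheme to show how quickly such a check falls. A check backed by a proper public-key signature puts k at 128 bits or more, which no search can reach.

```python
import random

HEX = "0123456789ABCDEF"


def weak_key_ok(key: str) -> bool:
    """Hypothetical weak scheme: the last two hex digits must equal the sum
    of the first eight characters' code points mod 256 (an 8-bit check)."""
    if len(key) != 10:
        return False
    body, check = key[:8], key[8:]
    try:
        return int(check, 16) == sum(map(ord, body)) % 256
    except ValueError:
        return False


def random_search(is_valid, rng_seed=0, max_tries=1_000_000):
    """Try random 10-character hex keys until one passes `is_valid`."""
    rng = random.Random(rng_seed)
    for tries in range(1, max_tries + 1):
        key = "".join(rng.choice(HEX) for _ in range(10))
        if is_valid(key):
            return key, tries
    return None, max_tries
```

Because roughly one random key in 256 passes the 8-bit check, `random_search(weak_key_ok)` typically succeeds within a few hundred attempts.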

------
derefr
Ignoring the actual "fuzzing" going on here, wouldn't it be possible to use
this approach (or something like it) to make a sort of 'universal
wire-protocol auto-discovery-and-negotiation library'? Picture a function like
the following:

    
    
       def client_for(arbitrary_socket)
         valid_requests_discovered = repeatedly_fuzz_probe(arbitrary_socket)
         valid_request_formats = cluster_and_generalize(valid_requests_discovered)
         peer_idl = formalize(valid_request_formats)
         client_module_path = idl_codegen(peer_idl)
         compile_tree(client_module_path)
         require(client_module_path + "/client.so")
       end
    

I imagine that if we're ever doing the Star Trek thing, gallivanting around in
starships encountering random alien species with their own technology bases,
this would be the key to anyone being able to meaningfully signal anyone else.

~~~
roywiggins
The approach needs to be able to instrument the executable and inspect which
code paths it uses, so it won't work across a network.

~~~
munin
Technically, you just need fine-grained enough measurement of the
request/response delay, since you could probably use the response time to
estimate how many instructions had been executed in response to your fuzzed
input.

Over the internet, this probably gets lost in the noise, but maybe with
_enough_ inputs, it wouldn't.
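As a rough back-of-the-envelope estimate of how many inputs "enough" is, assume each measured response time is the true processing time plus independent noise with standard deviation σ, and you want to distinguish two inputs whose processing times differ by Δ. Averaging n samples per input shrinks the noise on each mean by a factor of √n, so the difference of the two means has standard deviation σ√(2/n). Requiring the gap to be at least z of those standard deviations gives:

$$
\frac{\Delta}{\sigma\sqrt{2/n}} \ge z
\quad\Longrightarrow\quad
n \ge 2\left(\frac{z\,\sigma}{\Delta}\right)^{2}
$$

For illustration only (these are not measured values): with σ = 1 ms of jitter, Δ = 100 ns of extra work, and z = 3, this gives n ≥ 2 · (3 · 10^4)^2 = 1.8 × 10^9 samples per input. The estimate grows with the square of σ/Δ, which is why larger timing differences are so much easier to detect remotely.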

~~~
regularfry
This attack has been done, over the internet. Here's a paper about it from
2003: [https://crypto.stanford.edu/~dabo/pubs/abstracts/ssl-timing....](https://crypto.stanford.edu/~dabo/pubs/abstracts/ssl-timing.html)

------
xhrpost
Anyone having any luck compiling libjpeg-turbo with instrumentation?

~~~
snerbles
Running this configure command before make worked for me on Linux:

    
    
        ./configure --disable-shared CC="afl-gcc"
    

It makes the build use afl-gcc, AFL's drop-in wrapper around GCC, which
injects the instrumentation at compile time.

~~~
xhrpost
Thanks! Got that part working. How can you tell if djpeg exits 0 for a given
queue item?

~~~
jagger11
I'm not sure it's required to know the exit code. Inputs, both valid and
invalid, will activate various paths in djpeg/libjpeg, and that's the basic
goal here.

In case you'd like to try persistent fuzzing (it should be faster for a few
reasons), you can try modifying
[https://github.com/google/honggfuzz/blob/master/examples/lib...](https://github.com/google/honggfuzz/blob/master/examples/libjpeg/persistent-jpeg.c).
It will probably require writing a main() function that reads input inside an
__AFL_LOOP() loop and calls LLVMFuzzerTestOneInput() with that input.

~~~
xhrpost
My goal with the exit code is that (I think) an exit of 0 would mean that a
valid JPEG image was found.
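One way to check that outside the fuzzer is to replay the queue afterward. Here is a minimal Python sketch, assuming AFL's usual output layout (discovered test cases under `<out_dir>/queue/`, plus a `.state` subdirectory) and that the target reads the JPEG from stdin, as djpeg does when no input file is given. The function name and example paths are hypothetical. Exit status 0 is only a proxy for "valid JPEG": it means the decoder didn't report failure, not that the image is meaningful.

```python
import subprocess
from pathlib import Path


def exit_zero_inputs(queue_dir, cmd, timeout=5.0):
    """Replay every file in an AFL queue directory through `cmd`
    (file fed on stdin, output discarded) and return the paths of the
    inputs for which the command exited with status 0."""
    ok = []
    for path in sorted(Path(queue_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories such as .state
        with path.open("rb") as f:
            try:
                result = subprocess.run(cmd, stdin=f,
                                        stdout=subprocess.DEVNULL,
                                        stderr=subprocess.DEVNULL,
                                        timeout=timeout)
            except subprocess.TimeoutExpired:
                continue  # treat hangs as failures
        if result.returncode == 0:
            ok.append(path)
    return ok
```

For example, `exit_zero_inputs("out/queue", ["./djpeg"])` would list the queue entries that djpeg decoded without error, assuming the fuzzer's output directory was `out`.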

------
p4bl0
Is that feasible with source code instead of JPEG? Compilers and interpreters
also have tons of warnings and errors that could help the fuzzer. Is anyone
aware of such an experiment?

------
amelius
What if the fuzzer were more likely to generate an image too large to fit in
memory than a normal-sized picture? Would it still be able to produce
something in practice?

~~~
f-
It's very common when fuzzing. That's why you normally want to place memory
limits on the target process, to avoid bringing the system down. AFL does that
automatically, most other fuzzers have a config option.

------
JabavuAdams
Crazy how quickly Gabor-like patterns start to show up.

------
diyseguy
god I wish this would build on cygwin

~~~
logicallee
why? the performance hit on VMs isn't that big. If you really care, why don't
you just run one? 7 hours or 27 isn't that huge of a difference in the grand
scope of things . . .

~~~
andrewflnr
Maybe they want to run it on a Windows app.

~~~
logicallee
Oh, right. For some reason I thought they just wanted to follow along with the
example but of course this makes more sense.

