How are you measuring that?

moyix · on June 25, 2015

It's based on dynamic taint analysis. We assign each byte of the input file a unique label and then track the propagation of those labels throughout a computation. Additionally, we track the "compute number" (TCN); basically, if you have an operation involving tainted data like

    A = B + C

Where at least one of B or C is tainted, the compute number is:

    TCN(A) = max(TCN(B),TCN(C)) + 1
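To make that rule concrete, here is a toy sketch in Python. It is not how PANDA implements taint tracking; the `Tainted` class and the helpers `label_input` and `tcn_by_label` are hypothetical names made up for illustration, and only addition is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tainted:
    """A value plus the input-byte labels it depends on and its compute number."""
    value: int
    labels: frozenset = frozenset()
    tcn: int = 0

    def __add__(self, other):
        # Plain ints are treated as untainted operands.
        other = other if isinstance(other, Tainted) else Tainted(other)
        labels = self.labels | other.labels
        # TCN(A) = max(TCN(B), TCN(C)) + 1, but only if some operand is tainted.
        tcn = max(self.tcn, other.tcn) + 1 if labels else 0
        return Tainted(self.value + other.value, labels, tcn)


def label_input(data: bytes):
    """Give each input byte a unique label (its offset) and a compute number of 0."""
    return [Tainted(b, frozenset({i})) for i, b in enumerate(data)]


def tcn_by_label(results):
    """Map each byte label to the highest compute number seen on data derived from it."""
    out = {}
    for r in results:
        for label in r.labels:
            out[label] = max(out.get(label, 0), r.tcn)
    return out
```

For example, with `inp = label_input(b"\x01\x02\x03")`, the value `inp[0] + inp[1]` carries labels 0 and 1 with compute number 1, and adding `inp[2]` to that result raises the compute number to 2.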

So if you track the propagation of tainted data and the associated compute number, you can make a map between the byte label and compute number. Then the only remaining step is to map back from bytes in the file to pixels in the image; I used an input file in BMP format so that the correspondence between pixels <-> bytes would be fairly direct.
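For the bytes-to-pixels step, a rough sketch of the mapping might look like the following. It assumes an uncompressed 24-bit BMP with a BITMAPINFOHEADER (the thread does not say which bit depth was used), and `bmp_layout` and `byte_to_pixel` are hypothetical helper names, not part of PANDA.

```python
import struct


def bmp_layout(header: bytes):
    """Read pixel-data offset, width, height, and bits per pixel from a BMP header."""
    data_offset = struct.unpack_from("<I", header, 10)[0]   # bfOffBits
    width, height = struct.unpack_from("<ii", header, 18)   # biWidth, biHeight
    bpp = struct.unpack_from("<H", header, 28)[0]           # biBitCount
    return data_offset, width, height, bpp


def byte_to_pixel(offset, data_offset, width, height, bpp=24):
    """Map a file byte offset to (x, y), or None for header/padding bytes."""
    if bpp != 24:
        raise ValueError("this sketch only handles 24-bit BMPs")
    bytes_per_pixel = 3
    # Each row is padded up to a multiple of 4 bytes.
    row_size = (width * bytes_per_pixel + 3) // 4 * 4
    rel = offset - data_offset
    if rel < 0:
        return None  # byte belongs to the file or DIB header
    row, col_byte = divmod(rel, row_size)
    if row >= abs(height) or col_byte >= width * bytes_per_pixel:
        return None  # row padding, or past the end of the pixel data
    x = col_byte // bytes_per_pixel
    # Positive height means rows are stored bottom-up.
    y = height - 1 - row if height > 0 else row
    return x, y
```

Combining this with a label-to-compute-number map gives a per-pixel value, for instance by taking the maximum over the three bytes of each pixel (that choice is also an assumption).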

All this is implemented using PANDA [1], which has nice facilities for dynamic taint analysis. I just recorded Paint Shop Pro opening the currency image, and then replayed with the taint analysis to generate the raw data used for that image.

The limitation currently is that the granularity is not great (you can see that there are relatively large rectangular sections of the image with the same compute number). My guess is that this is because a lot of image processing algorithms at the lowest level end up being "multiply the image data matrix by this other matrix and pick out the nonzero entries" or some such -- so all the data there has the same amount of computation done on it. But apparently there is some sort of multi-pass algorithm at work here, which gives rise to the differing amounts of computation done on each region. I'm looking at ways to improve the granularity right now, possibly by incorporating the magnitude of the transformation on the data somehow.

[1] https://github.com/moyix/panda

tacone · on June 25, 2015

I found these videos on his website:

http://laredo-13.mit.edu/~brendan/eurion_attention.mp4

http://laredo-13.mit.edu/~brendan/eurion_tcn.mp4

moyix · on June 25, 2015

Yup. These weren't as exciting as I'd hoped. The former (eurion_attention.mp4) basically shows a sliding window of which pixels had computation done on them over time; the most recent 1000 bytes (~330 pixels) get "lit up" in each frame of the video. You can see in the early part of the video how it scans over the image multiple times, but the later parts are not very interesting (and the whole thing is absurdly long; I recommend watching at 50X).
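The sliding-window idea can be sketched roughly like this. It assumes the input is a chronological stream of byte labels counted from the start of the pixel data, 3 bytes per pixel with no row padding, and one frame every `step` events; `attention_frames` and its parameters are hypothetical, not the script used for the video.

```python
from collections import deque


def attention_frames(events, window=1000, step=1000, bytes_per_pixel=3):
    """Yield, once every `step` events, the set of pixel indices to light up.

    `events` is an iterable of byte labels in the order computation touched them.
    Only the most recent `window` events contribute to each frame.
    """
    recent = deque(maxlen=window)  # old events fall off the left automatically
    for i, label in enumerate(events, 1):
        recent.append(label)
        if i % step == 0:
            yield {lab // bytes_per_pixel for lab in recent}
```

With the defaults, each frame covers at most 1000 byte events, which corresponds to roughly 330 pixels when the bytes are contiguous.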

The second one (eurion_tcn.mp4) is simpler, and just tracks the mapping between pixel and compute number [1] over time.

[1] https://news.ycombinator.com/item?id=9778871