Why blurring sensitive information is a bad idea (2007) (dheera.net)
265 points by Schiphol 733 days ago | 124 comments

I feel this is also very relevant http://en.wikipedia.org/wiki/Christopher_Paul_Neil

Police took a photo of a paedophile's face that had a "swirl" effect applied and reversed it to reveal a very usable picture. So good, in fact, that he was found and arrested.

He wasn't just a pedophile, he was a child molester.

"just a pedophile"

A phrase I never thought I'd see.

A pedophile is someone who is sexually attracted to children, whether or not he/she acts on that desire in any way.

A child molester is someone who actually engages in inappropriate sexual activity with a child.

A child molester may or may not be a pedophile, for what it's worth (one may molest a child for reasons other than sexual attraction - even when you look at rape of adults, in many/most cases sexual attraction is not the primary motivation of the rapist[0]).

[0] https://en.wikipedia.org/wiki/Motivation_for_rape

Someone who is homosexual isn't any more inclined to molest or rape someone of their own sex than a pedophile is to molest or rape a child.

More on topic about censoring important information. I'm not entirely sure about this, but I feel like I've seen images where the thumbnail was retained from a previous version. Even at 32x32 there might be some way to expand it and gather some kind of information from its pixelated form even after the actual image was censored.

"Someone who is homosexual isn't any more inclined to molest or rape someone of their own sex than a pedophile is to molest or rape a child."

I'm sure you meant something about the proclivity to commit rape or molestation, but all this statement does is say there is a positive correlation between the two groups (and no citation or explanation of reasoning either).

Your statement would still be true if both groups had a large increase in probability of rape or molestation, vs the non-homosexual and/or non-pedophile.

Your pedophile / homosexual comparison seems to imply there's such a thing as consensual sex with a child??

A pedophile is much more likely to molest a child, because that's the only way their desires can be realized in the physical world. A homosexual could, I dunno, go have consensual gay sex? That's why people find this comparison offensive.

I think the distinction is between what a pedophile finds themselves drawn to and what they do. And, I might add, there are many ways for people to find gratification without actually having sex, or in fact ethically needing the consent of anyone. Also, pedophiles are absolutely comparable to homosexuals—both are individuals classified for their sexuality, regardless of how they express it or wish to be classified.

(To clarify, I absolutely do not condone any interactions whatsoever to do with pedophilia... but come on, people, let's not be blind here to the existence of people with full self control, ethical behavior, and private thoughts and desires.)

> To clarify, I absolutely do not condone any interactions whatsoever to do with pedophilia... but come on, people, let's not be blind here to the existence of people with full self control, ethical behavior, and private thoughts and desires.

Completely agree. Sad to think that there are people out there that have done nothing wrong, and cannot control their desires, and yet people (like many in this thread) would judge them without a second thought.

Let's hope none of them commit adultery by simply desiring another person. /s

> Your pedophile / homosexual comparison seems to imply there's such a thing as consensual sex with a child??

Western law currently says no. But allowed sexual behavior (especially in this area) has historically been all over the map. One or two generations ago we were castrating homosexuals for their unnatural behavior. Pederasty was an important part of Greek culture. Research the age of consent around the world and prepare for surprises.

I'll agree that a pedophile is much more likely to molest a child, though. We define it that way.

Given that most thumbnails are done with resampling to scale, I imagine this attack would work similarly there as well.

We were using ImageMagick to resize images down to a smaller size. Turns out that, unless passed certain flags, ImageMagick would retain all metadata.

A few times, we'd end up with a 32x32 pixel image that was 20 megabytes in size - because that metadata included the original image.

What is the relation between homosexuals and pedophiles?

They are both people with no relation to whether or not they will molest someone else they are attracted to. You could also just throw heterosexuals in there to be safe. I suppose if you wanted to avoid comparing a sexual orientation to something that isn't you could maybe use something like being attracted to people who wear glasses. Finding people who wear glasses attractive has no relation to whether or not someone will molest them.

I'm with you 100% on the whole issue, and believe that pedophilia isn't any less ethical than homosexuality is, but I do feel that a sexually repressed person who cannot release their sexual urges is more likely to rape someone than a person who regularly has sex/relationships/etc, regardless of sexual preference.

Maybe not very much more likely, but it's definitely not less, and probably not equal.

> but I do feel that a sexually repressed person who cannot release their sexual urges is more likely to rape someone than a person who regularly has sex/relationships/etc, regardless of sexual preference.

Agreed. Sadly, society (like many in this thread) would rather create an environment where people like that cannot get help. Rather, they must stay repressed, stay isolated, and further the likelihood of uncontrollable behavior.

Rehabilitation is something most don't want to discuss these days. God help those who lose the genetic lottery.

Please do not treat pedophilia like it is a sexual orientation. Also, please do not try to compare homosexuality to pedophilia — you clearly don't know what you're talking about and the underlying implication is quite offensive.

I don't think OP was trying to offend. It was a purely logical assertion that sexual attraction does not correlate with rape/molestation. You could have put any form of sexual preference there and it wouldn't change the meaning.

I think the point here was that [and I could be wrong] we shouldn't assume pedophile == child molester, because pedophilia is currently classified as a mental disorder but, more importantly, as a sexual attraction to prepubescent children, which does not in any way indicate a tendency to cause harm. For all you know, in 60 years, we might start seeing moves to accept pedophilia as a socially acceptable sexual orientation as we did with homosexuality. Let's not forget the dark history of homosexuality's acceptance into society too soon, as traces still linger even today.

Again, purely logical statement and not intended to insult.

In what sense is pedophilia not a sexual orientation?

It's generally deemed a socially unacceptable orientation, just as homosexuality was in many societies until only recently, but it's an orientation nonetheless. It's a useful metaphor with a lot of parallels to be drawn.

In any case, the parent comment was only using the metaphor to illustrate an important point; that being sexually attracted to a certain group of people, and actually molesting members of that group, are two wholly separate things.

Referring to someone convicted of molesting children using the term for someone who is merely sexually attracted to children is a significant misrepresentation.

No. Pedophilia is a psychiatric disorder or paraphilia; not a sexual orientation. This is thoroughly discussed and documented. The APA has changed any such notation in the DSM.


This is just splitting hairs over definitions.

The APA are not the ultimate arbiters of what is and is not a sexual orientation, and if anything, the fact DSM-5 referred to it as a sexual orientation lends credence to the view that it is as such.

Sexual orientations and psychiatric disorders are not mutually exclusive.

Bear in mind for much of history society treated homosexuality akin to a psychiatric disorder (Edit: It was in fact classified as a mental illness in both the first and second editions of the DSM, released in 1952 and 1968, so let's not treat the DSM as gospel)

I'm going to reply here singly, rather than to each of the three that replied to my post above.

Please be clear that, as the article explains, the description of pedophilia as a sexual orientation in the DSM was an error, not a change of mind. There was no "credence" lent to it being viewed as such; it was a mistake, was admitted as such, and was corrected. It was intended to read "sexual interest", not "sexual orientation".

“In fact, APA considers pedophilic disorder a ‘paraphilia,’ not a ‘sexual orientation.’ This error will be corrected in the electronic version of DSM-5 and the next printing of the manual,” the organization said. The error appeared on page 698, said a spokeswoman.

The fluidity of the APA DSM is not something that is worth arguing; we can all agree that definitions change and have changed. I'm operating under the current set of definitions and primarily wanted to make the point that likening paraphilic disorders to "sexual orientations" is typically hurtful for reasons that probably don't require explaining.

You realize these documents are not carved in stone like holy scriptures, right? DSM is in its 5th edition already. Homosexuality used to be treated (or still is in other cultures) as paraphilia or worse. Who knows, maybe in 50 years from now pedophilia will follow suit.

To quote from your link:

> According to the DSM-5, pedophilia “refers to a sexual orientation or profession of sexual preference devoid of consummation, whereas pedophilic disorder is defined as a compulsion and is used in reference to individuals who act on their sexuality,” NeonTommy wrote.

Basically, the term that they use in the DSM is "pedophilic disorder" (changed from "Pedophilia"), which is a classified disorder.

It never fails: You can always find some pedantic creepy internet weirdo to start tut-tutting "now, now, let's think of the poor pedophiles..."

So you'd rather judge them without action? How sad a situation this is.

Rather than creating an environment where someone with desires (that they cannot control, mind you) must hide and isolate themselves, rather than Getting help!

You realize that you're only making pedophiles more likely to harm children right? They need help, not damnation.

Pedophilia in itself isn't a crime.

Well, unless we've started prosecuting thought-crime. I could have missed that update.

Prosecution of downloading pedo material is as close as it gets to thought-crime.

Similar to how downloading someone else's bank account into yours is practically thought-crime?

"But officer, I was only shuffling bits!" isn't a defense.

I don't see how robbing someone's bank account is comparable to downloading outlawed pornography.

Hoarding illegal material is a crime (and not a thought crime.) We call it 'recel' in French. Not sure what is the legal term in English, 'fencing' is argotic, right?

Possessing illegal bits is kind of a basic example of thought crime.

Right, that's what I'm saying about the bank account. If child pornography really is just bits, then the same should go for your bank account. When you "illegally" increase your bank account it's just bits, right?

Or how about classified information. Is it thought crime to have classified information you aren't allowed to have? "But it's only bits, how can bits be illegal?"

Also relevant?



Police == 4chan?

Yeah, basically. 4chan's admins have been working with the police for many years because of the questionable shit that gets posted on /b/ (child porn, murder threats/evidence, suicidal posts, etc.) They would have been shut down years ago if they weren't cooperating.

The way I've heard this story in the past was specifically that anon did the unswirl and then tipped off the police. But seeing as that isn't mentioned on wiki (and not uber keen to google about), I guess it's probably not the case.

Someone linked the relevant case elsewhere in the discussion, and apparently it was a computer expert working for the German police that did the unswirling (which is about what I remember from the news at the time): http://en.wikipedia.org/wiki/Christopher_Paul_Neil

In the modern world we have a new class of difficult to obtain information. I know how to find the answer to my question, but not without exposing myself to information that I do not want.

Going on around '09 it was pretty common to have pornography posted with the same type of swirl or other manipulations. It was a game to undo it, using a particular piece of software.

So if something is 1337 days old it gets autoreposted? Could live with that.

Previous discussion: https://news.ycombinator.com/item?id=1939607

To be a bit more precise, this post is even older, and was first discussed prominently in 2007 in these two places: http://www.reddit.com/comments/xaae/how_to_extract_personal_...


(please refrain from responding with XKCD references, I'm aware of that, just want to link to older discussions)

My favourite comment from the old discussion came from andrewreds:

> WHAT... why would you completely black out the number, where you could instead use random coloured squares, that look like it is a blurring, so someone can go through all the effort, decoding your white noise, and thinking in the end they have your number... when they don't ;)


I wrote this a lot more than 1337 days ago. IIRC it first appeared on Slashdot.


Not sure what prompted someone to repost it to HN today. But that's cool because I see people continue doing this on a daily basis.

To be fair, that's a pretty numerically apropos age for HN.

Their answer is to "color over it", but BE CAREFUL if you do that. I'm sure that some of us remember the US government document(s) in the late 90s/early 2000s that were released in redacted form, but the person didn't realize/understand that people with the Acrobat editor program could remove the black bars.

Also, image formats like JPEG have a thumbnail stored in them, which may not be updated when you edit the image.

Also, when blacking out data from JPEGs, leave generous margins, because there may be compression artifacts in the nearby pixels that also act effectively as a one-way hash. I haven't tested how consistent these artifacts are, but it's better to be safe than sorry.

Do they? First I’ve heard of that…

The thumbnail is part of the EXIF metadata. It's reasonably well-known now, but it was not always the case... http://graphicssoft.about.com/b/2003/07/26/techtvs-cat-schwa...

Huh, interesting. Thanks!

A few years ago a TV presenter got caught out by this, she posted a cropped photo of herself online, but the original, uncropped version could be recovered from the file which showed her topless.

There was a trend for a while of people posting images to imgur that had a totally different thumbnail. A rickroll of sorts.

I believe it's a Photoshop thing, rather than something common to all images. Photoshop in general does unusual things.

See also: http://u88.n24.queensu.ca/exiftool/forum/index.php?topic=451...

You can use 'exiftool' on Linux to manipulate the EXIF data, get rid of the thumbnail, or set it to whatever you want. Imagemagick's 'convert' command also has a '-strip' option that will get rid of all EXIF data including any thumbnails.
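For anyone curious what "getting rid of the EXIF data" amounts to at the byte level, here is a rough sketch in plain Python. It is not how exiftool or ImageMagick do it; it only walks the JPEG segment list and drops APP1 segments, which is where the EXIF block (and its embedded thumbnail) lives. The function name and the fake test file are made up for illustration, and the parser assumes a well-formed file with no padding bytes between segments. Previews stored in other segments are not touched.

```python
import struct

def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte stream with every APP1 (EXIF/XMP) segment removed."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xDA:
            # Start of scan: compressed image data follows, copy the rest verbatim.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker != 0xE1:  # keep every segment except APP1
            out += jpeg[pos:pos + 2 + length]
        pos += 2 + length
    out += jpeg[pos:]
    return bytes(out)

def _segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment (helper for the fake file below)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload

# A fake, minimal JPEG-shaped byte string: JFIF header, EXIF block, scan data.
EXIF_PAYLOAD = b"Exif\x00\x00THUMBNAIL-OF-ORIGINAL"
fake = (b"\xff\xd8"
        + _segment(0xE0, b"JFIF\x00")
        + _segment(0xE1, EXIF_PAYLOAD)
        + _segment(0xDA, b"\x00")
        + b"scan-data\xff\xd9")
stripped = strip_app1(fake)
```

In practice the tools mentioned above are the safer choice; the point is only that the thumbnail is a separate chunk of bytes that editing the visible pixels never touches.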

You don't have to go all the way back to the early 2000s to find poorly redacted federal documents. Rachel Maddow had a great segment on botched redactions, like removing the black lines from a PDF.

The best solution was to actually cut out the parts of the page you don't want seen and then scan in the result. All other methods are prone to mistakes.

Edit: corrected link to segment http://www.msnbc.com/rachel-maddow-show/watch/redaction-rule...

Along the same lines, I remember an MIT Mystery Hunt puzzle once where there were hidden clues in a PostScript document. If you didn't think to look at the PostScript source you probably had no hope of solving it.

Would a screenshot work?

Exactly my thought. A BMP is too simple to go wrong. There's so many pitfalls with digital files, because they can have much more information than the viewer will show you in an obvious manner or at all. Not only layers, but metadata (EXIF) and whatnot.

I've always drawn over sensitive parts in Paint or something, then taken a screenshot of it. It makes sure that everything is compressed to a single layer and can't be undone.

Just be sure to get all of it. It's pretty common that people leave a few pixels. How not to do it: https://imgur.com/ZaCIY5P

Leave generous margins. JPEGs can have compression artifacts that leak information outside the boundaries of the object.

It shouldn't matter if the original image (the screenshot) didn't contain any information to leak.

For a fun variant of this problem, see the 2008 Underhanded C Contest: http://underhanded.xcott.com/?page_id=17

Also great: HTML pages with sensitive passages using #000 for background and foreground. You just need to select and copy to Notepad - voila.

I think I once saw that on the site of a home security company, citing an email from customers who thanked them for burglarproofing their home in XX XXXXXXXX Street. Quite ironic.

There is also "Leaking Sensitive Information in Complex Document Files--and How to Prevent It" from IEEE Security & Privacy earlier this year. All copies might be behind paywalls, however.

In some cases, coloring over parts of the image might still not be enough. Specifically, when all of the following are true: (i) the domain of possible entries is reasonably small (e.g., a number or a name and surname), (ii) the text is printed in a proportional (not fixed-width) typeface, and (iii) enough of the rest of the line is visible to infer the font size and kerning settings.

I think this form of analysis is amazing. It breaks the "The best solution was to actually cut out the parts of the page you don't want seen and then scan in the result." approach from elsewhere in this thread as well. Even cutting out each portion of each line isn't enough because you can reconstruct line sizes. You could cut out each individual word and string them together, with a single size "redaction" block perhaps, but that's a lot of work.

>You could cut out each individual word and string them together, with a single size "redaction" block perhaps, but that's a lot of work.

Not really. OCR is plenty good enough to scan a document and then you can just replace anything you wanted redacted with a common string: [REDACTED]

Even if OCR wasn't an option, it would still (probably) be easier to type up the document than it would be to physically cut out all the things you want to redact.

Actually, using fixed-width would be even easier since it reveals the length.

I'd be curious to see some numbers backing one or the other, but I suspect proportional would be better. Fixed width narrows to a number of characters. So it depends how many possible values have x characters. Proportional narrows to a particular width, which could be more granular than character count. So it depends how many possible values turn out to be exactly x pixels across given the type settings.

Proportional is easier to try to decode. See http://cryptome.org/cia-decrypt.htm for an example from 2004 about inferring words from a redacted US document.

From the Le Monde article: The space occupied by an "I" differs from that taken by "W," which can give additional clues, compared to the text-spacing known as "monospace," like that often used in e-mail, where all the letters have the same spacing.

From the NYT article: In January, the State Department required that its documents use a more modern font, Times New Roman, instead of Courier, Mr. Naccache said. Because Courier is a monospace font, in which all letters are of the same width, it is harder to decipher with the computer technique. There is no indication that the State Department knew that.

It doesn't really matter how many values turn out to be exactly x pixels across given the type settings, you still have to do an exhaustive search on different numbers of characters to find them. That problem is simplified by a fixed width font.

In general, though, a proportional face will give a larger number of possible widths, meaning more information to work with (fewer possible strings will map to any particular width).

Additionally, depending on the type of information you are trying to recover, you may already know the number of characters (for example, a US SSN), in which case fixed-width typefaces mean you get no information you didn't already have, whereas a proportional one would at least give you a little bit of info. But on the other hand I would also guess fixed-length strings such as an SSN are more likely to appear in a more "table"-like part of a document than in the middle of a paragraph of text.

I suspect it'd still be a bloody hard analysis either way, but at least according to this back-of-the-envelope theory a proportional typeface should preserve more information.
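To put a toy number on the back-of-the-envelope theory above, here is a small sketch. The per-character widths are invented (narrow, normal, wide) and the candidate list is arbitrary; a real attack would need the actual font metrics. It just groups same-length candidates by rendered width under a fixed-width and a proportional face.

```python
from collections import defaultdict

# Hypothetical advance widths in pixels; real values depend on font and size.
def char_width(ch: str, proportional: bool) -> int:
    if not proportional:
        return 6          # every character is the same width
    if ch in "ijlt":
        return 3          # narrow letters
    if ch in "mw":
        return 9          # wide letters
    return 6

def rendered_width(word: str, proportional: bool) -> int:
    return sum(char_width(ch, proportional) for ch in word)

def group_by_width(candidates, proportional: bool):
    """Map each possible redaction-bar width to the candidates that would fit it."""
    groups = defaultdict(list)
    for word in candidates:
        groups[rendered_width(word, proportional)].append(word)
    return dict(groups)

# All candidates have four letters, so character count alone tells us nothing.
CANDIDATES = ["iran", "iraq", "oman", "mali", "peru", "cuba", "chad", "togo"]
mono_groups = group_by_width(CANDIDATES, proportional=False)
prop_groups = group_by_width(CANDIDATES, proportional=True)
```

With these made-up widths the fixed-width face puts every candidate in one bucket, while the proportional face splits them into several, which is the "more possible widths, fewer strings per width" effect described above.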

Oh yeah, in the case where you already know the number of characters, a proportional case font definitely gives you more information. That's a good point. I was thinking more of the general case where any number of characters might be redacted from a document.

Better yet: change the text to "you think you're clever, don't you" and then blur the image.

It's a little easter egg for the people with an unblur plugin.

Has this attack actually been proved possible? He writes that he thinks it should work but doesn't have the time or inclination to prove it. If anyone wants to take a stab at it, I'll gladly submit a mosaiced photo of my credit card because I don't think the attack is practical. If you crack it, you're free to keep whatever you can get. :)

In North America, all the checks (cheques) use the same fonts. See http://en.wikipedia.org/wiki/Magnetic_ink_character_recognit...

You have to know the original font the numbers were in, and the original algorithm or program used to do the blurring. Then it's just a matter of brute forcing it.

Using a 'smear' brush would be a more secure way to blur, there are various sizes and you introduce some randomness in the path of your mouse/stylus when blurring, and you can go over the numbers multiple times. Blurring to much higher factors is a better idea but it is simpler to just use black bars.

You have to be careful doing this. If your blur tool happens to be of the variety that conserves total brightness (i.e. the blurring is accomplished by 2D convolution with a kernel that has a volume of 1, so sum of all pixel values remains constant before and after blurring), it can still be dictionary attacked pretty easily no matter which random ways you blur, since each digit in the cheque font probably has a different amount of total black area, and your blurring preserves that information perfectly.

If your blur tool is of the variety that doesn't conserve total brightness, such as most "smudge" tools, and you use human randomness to blur it, then it would probably be pretty hard to reverse.

I still recommend cutting out the sensitive information rather than blurring it, just to be safe. Also, leave a generous margin, (1) to avoid giving information about length, and (2) because lossy compressors may have left tiny artifacts of the sensitive information in the areas around it.
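A minimal sketch of the brightness-conservation point, assuming toy 3x5 glyphs that happen to have distinct ink totals (they are not the real cheque font) and a plain 3x3 box blur with a kernel that sums to 1. However many passes you apply, the total ink survives, so a lookup on the sum alone picks out the digit.

```python
# Toy glyphs, chosen so that each has a different number of inked pixels.
GLYPHS = {
    "1": [".#.", ".#.", ".#.", ".#.", ".#."],
    "7": ["###", "..#", "..#", "..#", "..#"],
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "8": ["###", "#.#", "###", "#.#", "###"],
}

def render(digit: str, pad: int = 4):
    """Draw a glyph on a canvas with a blank margin so blur cannot spill off the edge."""
    rows = GLYPHS[digit]
    width = len(rows[0]) + 2 * pad
    canvas = [[0.0] * width for _ in range(pad)]
    for row in rows:
        canvas.append([0.0] * pad + [1.0 if c == "#" else 0.0 for c in row] + [0.0] * pad)
    canvas += [[0.0] * width for _ in range(pad)]
    return canvas

def box_blur(img):
    """One pass of a 3x3 box blur: each pixel hands 1/9 of its value to each neighbour."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            share = img[y][x] / 9.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += share
    return out

def total_ink(img) -> float:
    return sum(sum(row) for row in img)

def guess_digit(blurred) -> str:
    """Dictionary attack: pick the glyph whose ink total is closest to the blurred total."""
    ink = total_ink(blurred)
    return min(GLYPHS, key=lambda d: abs(total_ink(render(d)) - ink))

def blur_times(img, passes: int):
    for _ in range(passes):
        img = box_blur(img)
    return img
```

The margin of 4 pixels is enough for up to three passes here; once ink spills off the canvas, or the tool is a smudge-style brush that does not conserve the sum, this particular shortcut stops working.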

Most people want to blur because they think it makes their photo flow better than gaudy black highlights, but you can also use highlights that match the background color/image and make it blend in, leaving a big white space on a white background instead of a big black space. Most people won't notice anything is missing at all.

If you want that pixel mosaic look for extra futuristic feel, remove the original content by making it fade into background, replace with new, irrelevant content, and then pixelize.

I entered an underhanded contest one year where the challenge was recoverable-but-correct-looking redaction of jpegs. I used an insecure random seed based on the time (you could brute force an unredacted image based on the rough time it was generated). The winning solutions were more clever.

You (and I) were among the many who implemented that exact technique for http://underhanded.xcott.com/?page_id=17 . :)

Not really. You would need to have a copy of an uncovered version to know how many lines of text, font size, kerning et al.

Some documents are trivially easy to get near copies.

Other documents provide their own context - they have a lot of other text - that you can use to get font, font size and spacing and etc.

Wouldn't cutting that hidden text out entirely with a photo editing tool be better? I've heard people talking about recovering text f. Well, in the case of edges sticking out from under the black bar (this happens a lot when people aren't careful, e.g. when using Paint), I believe there is a chance to recover part of the content, if not all of it.

Also, doesn't adding a black bar on top of a text mean just adding more bytes to the file, instead of removing the bytes belonging to the now hidden text?

I think a lot of people are unaware of how easily you can achieve blind deconvolution on many images blurred with most blur algorithms and even real-world blurring effects (including motion, out-of-focus, etc blurs).

The results won't be perfect, but they are usually close enough to see much of the detail that appeared to be lost.

I never use blur to obscure sensitive information; black that shit out (and then also make sure you aren't saving it as metadata or in a layer) or just replace it with fake data.

http://blog.mailgun.com/open-sourcing-our-email-signature-pa... (https://news.ycombinator.com/item?id=8081532) has some screenshots with blurred email addresses you can read without fancy deciphering.

In some cases even blurring faces might be a bad idea. Just because we are unable to unblur a face today doesn't mean we are unable in 10 years or 100 years. In many cases this might not be a problem but in some cases this might lead to trouble later on which can be avoided just as easily.

You cannot unblur mosaic blurring. It's the equivalent of a hash. The best you can do is brute force possible input vectors. There will be many collisions. This technique only works because digits/numbers limit the input space for a credit card or bank number. For faces, the best you can do is validate whether someone you already suspect, or someone you have in a database, is the origin of the mosaic. If you had a picture of every person in your country, you could run them all through and find the origin, but you're not really unblurring so much as extracting information to do process of elimination on an 'image hash'.
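A toy version of the "mosaic as a hash, brute force the inputs" idea, for digits only. The 3x5 glyphs, the one-pixel gap, and the block size are all assumptions for illustration; a real attack also has to guess the font, the offset, and the mosaic tool. The search renders every candidate number, mosaics it the same way, and keeps the ones whose blocks match the observed image.

```python
from itertools import product

# Hypothetical 3x5 digit glyphs, rows separated by "|".
GLYPHS = {
    "0": "###|#.#|#.#|#.#|###", "1": ".#.|##.|.#.|.#.|###",
    "2": "###|..#|###|#..|###", "3": "###|..#|###|..#|###",
    "4": "#.#|#.#|###|..#|..#", "5": "###|#..|###|..#|###",
    "6": "###|#..|###|#.#|###", "7": "###|..#|..#|..#|..#",
    "8": "###|#.#|###|#.#|###", "9": "###|#.#|###|..#|###",
}

def render(text: str):
    """Draw the digits side by side, each followed by a one-pixel gap."""
    rows = [[] for _ in range(5)]
    for ch in text:
        for r, line in enumerate(GLYPHS[ch].split("|")):
            rows[r].extend(1.0 if c == "#" else 0.0 for c in line)
            rows[r].append(0.0)
    return rows

def mosaic(img, block: int):
    """Replace each block x block tile with its average (edge tiles may be smaller)."""
    h, w = len(img), len(img[0])
    out = []
    for y0 in range(0, h, block):
        row = []
        for x0 in range(0, w, block):
            cells = [img[y][x]
                     for y in range(y0, min(y0 + block, h))
                     for x in range(x0, min(x0 + block, w))]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

def distance(a, b) -> float:
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def candidates_matching(observed, length: int, block: int, tol: float = 1e-9):
    """Brute force: every digit string whose mosaic equals the observed one."""
    hits = []
    for digits in product(GLYPHS, repeat=length):
        text = "".join(digits)
        if distance(mosaic(render(text), block), observed) <= tol:
            hits.append(text)
    return hits

SECRET = "407"
observed = mosaic(render(SECRET), block=4)
hits = candidates_matching(observed, length=3, block=4)
```

The result is a short list that contains the secret, possibly alongside a few colliding candidates, which matches the "there will be many collisions" caveat: the coarser the blocks, the more inputs share a mosaic.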

I can imagine you can computationally unblur mosaic on video, if the object (e.g. face) is not changing, just the camera is moving around a little.

You can in fact do this, it is a technique known as Super-Resolution and has been around for at least a decade. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5...

The most precise maps of Pluto were made with a similar technique: http://www.boulder.swri.edu/~buie/pluto/mapstory.html

That's really cool. I wonder if that could be used to get images of other planets.

gus_massa posted a link about Pluto:


Maybe the scientifically correct definition of "to unblur" is reverting the process directly by applying a mathematical algorithm. In a normal conversation like what we have here, bruteforcing a good enough result can also be considered "to unblur" because for the person it's the same result: Everybody knows who it is.

The un prefix means to reverse. If you have no database of people you're not going to be able to do anything in the way of reversing the mosaic. It's not a reversal as much as it is a heuristic brute force.

You can't use the brute force method to find any "person" you have in the database. You can only find any "photograph" that you have in the database.

If you have a photograph of me, and I just let my hair and beard grow for a month, then you won't match the mosaic version of my hairy version with the old photograph.

Perhaps you can add some filters to adapt the contrast, luminosity, hue, illumination, rotation, pixel shifts and noise. It would be more difficult to fix the head orientation, open/close mouth/eyes.

Imagine a future where 3D depth-sensitive cameras are prevalent - this means that a 3D model of a person's head can be easily obtained.

Then futuristic analysis software that, from a picture, can determine all the light-sources in that image and 3D positions thereof.

So take a photo, analyze it for light-sources, then brute-force it by applying 3D models of every face in your database with the known light-sources and run it through the mosaic filter.

Futuristic technology could help here. Probably wouldn't work since I'm sure one person's mosaic would match too closely somebody with similar features and a similarly-shaped head.

I agree with erikb. In theory at least, it seems possible to me that unblurring a mosaic could happen if you have multiple mosaic images from different angles (like a mosaic in a video), even without a database of the possible faces.

Even with that, the amount of information is reduced from 1000s of pixels to less than 100. It's possible to extract some information but not enough to do a reversal

If you have trial faces at the right angle, you can unblur faces, it's just a bit harder to generate. But nothing major is going to change in 10 or 100 years; it's just information theory as to whether you can identify the person/number or not.

Indeed. See http://en.wikipedia.org/wiki/Christopher_Paul_Neil

Swirl is not a lossy process. If he had blurred his face he would have been much harder to catch.

It's partially lossy, from the look of it (you can see circular artifacts), but you're right that enough is recoverable to make it meaningless in terms of hiding data.

Web archive version: https://web.archive.org/web/20140714183916/http://dheera.net...

Where did they find such a crappy font for body? SimHei for Latin letters looks absolutely horrendous with disabled font blurring.

They advise you not to blur, yet require blurring for comfortable reading? How ironic!

And when you colour in the picture always use a pen of 100% opacity or the colour can be removed from the image to reveal the data underneath!
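To see why a less-than-100% pen is a problem, here is a small sketch assuming ordinary "normal" alpha blending, a single stroke, and an attacker who knows (or guesses) the pen colour and opacity. The function names are made up for illustration.

```python
def paint_over(pixel, pen, opacity: float):
    """Standard alpha blend of a pen colour over an existing RGB pixel."""
    return tuple(opacity * p + (1.0 - opacity) * o for o, p in zip(pixel, pen))

def undo_paint(painted, pen, opacity: float):
    """Solve the blend equation for the original pixel; impossible at opacity 1.0."""
    if opacity >= 1.0:
        raise ValueError("fully opaque paint leaves nothing to recover")
    return tuple((c - opacity * p) / (1.0 - opacity) for c, p in zip(painted, pen))

original = (203, 37, 91)
black = (0, 0, 0)

painted = paint_over(original, black, 0.9)        # looks almost black on screen
recovered = undo_paint(painted, black, 0.9)       # exact up to float error

# After saving as 8-bit values some precision is lost, but not much.
painted_8bit = tuple(round(c) for c in painted)
recovered_8bit = undo_paint(painted_8bit, black, 0.9)
```

At 90% opacity the 8-bit rounding error is only magnified by a factor of 10, so the recovered pixel is off by a few levels at most; simply boosting contrast in an image editor does roughly the same job.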

I'll just leave this here:


do those pixel blocks actually represent accumulated samples from the image?

any number of very obvious methods can be used to avoid this besides using a black box...


Are you a professional cryptographer? Can you prove that your random shuffling is impossible to extract sensitive data from? No? Then you better stick with completely deleting the data from the image entirely.

Brings back nausea from arguments past, that went something like this (dramatized for your pleasure):

Me: Your protocol has a serious size-side channel that leaks all the important data as sizes. Please use a constant length encoding.

Duh: Thanks! I added RANDOM padding. Totally secure now(tm).

Me: Your random was rand(), I recovered the LCG state from a few packets with known sizes and then recovered the original sizes, it's still totally insecure. /Please/ make it constant size.

Duh: I made the random better and got rid of the known length packets. Now it's extra completely secure.

Me: This will just take more statistical analysis to break, please just make it constant— the overhead is negligible! This is critical and anything short of constant is leaking information. We can't make assumptions about how powerful the attacker's statistical reasoning is, so even a small leak could be fatal.

Duh: I tried for two hours and couldn't break it. You're wasting my time.

Me: Argh. After a week of analysis, I've created this sampling and averaging script which completely recovers the secret data. Please. Just. Make. The. Encoding. Constant. Length.

Duh: Oh come on, that requires the same user to use it four times in a row. But fine, I now also quantize the size to a multiple of 2. The script you gave me no longer works, so now it's secure.

Me: <jumps off building>

The adage that anyone can make a cryptosystem he himself can't break— should have a sister rule: Most people can make a cryptosystem which isn't cost effective to review by an honest party but which may be very economical to attack once it's protecting something of value.
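A tiny sketch of the difference being argued about, with made-up numbers: a frame size of 256 bytes, random padding of 0 to 15 bytes, and a 41-byte secret. It does not reproduce the actual protocol or the LCG attack from the story; it only shows that random padding still leaks the length to anyone who can watch repeated sends, while constant-length framing gives every message the same size.

```python
import random

FRAME_SIZE = 256  # assumed fixed frame size, must exceed the longest message + 2

def pad_constant(message: bytes, size: int = FRAME_SIZE) -> bytes:
    """Length-prefix the message and zero-fill so every frame is exactly `size` bytes."""
    if len(message) > size - 2:
        raise ValueError("message too long for one frame")
    return len(message).to_bytes(2, "big") + message + b"\x00" * (size - 2 - len(message))

def unpad_constant(frame: bytes) -> bytes:
    n = int.from_bytes(frame[:2], "big")
    return frame[2:2 + n]

def randomly_padded_size(true_len: int, rng: random.Random) -> int:
    """What an eavesdropper sees when 0..15 random padding bytes are appended."""
    return true_len + rng.randrange(16)

# The eavesdropper watches the same 41-byte message go by many times.
rng = random.Random(0)
observed_sizes = [randomly_padded_size(41, rng) for _ in range(2000)]
leaked_length = min(observed_sizes)   # the smallest size seen is the unpadded length

frame_a = pad_constant(b"yes")
frame_b = pad_constant(b"n" * 200)
```

Taking the minimum is about the laziest statistic possible; the story above involved a much stronger attacker, which is the argument for constant length over any "better" randomness.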

Don't forget that Duh also lights up the twitter and the hubs with their SuperSecretCoderRing. On the internet anyone can pretend to do brain surgery.

why not just use a solid color instead?

A nice Gaussian blur would probably be fine, it's specifically the pixelation technique that's leaking data.

It will be worse than that. Pixelation destroys information; it's not reversible in general. But in this case, what is left is enough for recovery, as the source domain (check numbers) is small. Gaussian blur is theoretically lossless (in practice there is rounding loss and loss at the image edges): you can take an arbitrary image, blur it, reverse the blur, and get the original plus noise.
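Here is a one-dimensional sketch of that reversal, assuming a known 5-tap Gaussian kernel (sigma 1) and wrap-around edges so the edge loss disappears. Blurring is a circular convolution, which becomes a pointwise multiplication after a DFT, so dividing by the kernel's spectrum undoes it. The slow DFT is written out by hand to keep the example dependency-free; the signal values are arbitrary.

```python
import cmath
import math

def dft(xs, inverse: bool = False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(xs)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, x in enumerate(xs))
        out.append(acc / n if inverse else acc)
    return out

def gaussian_kernel(n: int, sigma: float = 1.0, radius: int = 2):
    """Sampled, normalised Gaussian laid out circularly on a length-n grid."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in offsets]
    total = sum(weights)
    kernel = [0.0] * n
    for d, w in zip(offsets, weights):
        kernel[d % n] += w / total
    return kernel

def blur(signal, kernel):
    """Circular convolution of the signal with the kernel."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n)) for i in range(n)]

def deblur(blurred, kernel):
    """Divide spectra: valid here because this kernel's spectrum has no zeros."""
    spectrum = [b / k for b, k in zip(dft(blurred), dft(kernel))]
    return [v.real for v in dft(spectrum, inverse=True)]

signal = [0, 0, 255, 255, 0, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 0]
kernel = gaussian_kernel(len(signal))
blurred = blur(signal, kernel)
recovered = deblur(blurred, kernel)
max_error = max(abs(r - s) for r, s in zip(recovered, signal))
```

With floating-point values the round trip is exact to within rounding error. If the blurred values are first rounded to 8-bit integers, the division amplifies that rounding noise, which is where the "original plus noise" comes from; wider blurs make the amplification worse.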

Thanks for making me look deeper into what's going on with common blurring algorithms. My instincts were clearly wrong here, for years I thought reversing blurs wasn't feasible in practice. I was very, very wrong.

Unknown Partial Knowledge is a huge problem in human cognition.

This blew my mind, https://www.youtube.com/watch?v=tyHBnT4PmTE

This is less surprising to me than the gaussian reversal, because this motion-deblurring technique relies on several images of the same object. It's still cool, though :)

Gaussian blur is 100% reversible! You just sharpen using the same Gaussian settings for a Gaussian sharpen. It's only the rounding to 8 bits per colour that causes it to lose any information.

Isn't that still forwardly deterministic i.e. you can apply the same algorithm to it but at a higher resolution?

Yes, it is reversible for practical purposes. There is some data loss at the edges, which will manifest as noise in the recovered image.

You think the technique mentioned in the article doesn't leak data? It actually leaks more data than a gaussian blur.

Gaussian blur suffers from exactly the same problem, although a different difference function is needed for it.

Gaussian blur is equivalent to an auto key cipher [0] without a key. Each pixel is composed of a couple pixels from the original. With some simple algebra, cross-examining pixels that have values determined by the same origin pixel, you can reverse the operation.

[0] http://en.m.wikipedia.org/wiki/Autokey_cipher

I see that I deserved a schooling on gaussian blur, but

  > You think the technique mentioned in the article doesn't leak data? 
yes, I do think it leaks data, which was the correct half of my comment, actually.

Not that it matters now, but I always assumed blurring distorted the data in a way that would make it hard to recover in practice.
