
Why blurring sensitive information is a bad idea (2007) - Schiphol
http://dheera.net/projects/blur
======
bithush
I feel this is also very relevant
[http://en.wikipedia.org/wiki/Christopher_Paul_Neil](http://en.wikipedia.org/wiki/Christopher_Paul_Neil)

Police took a photo in which a paedophile's face was hidden by a "swirl" effect
and reversed it to reveal a very usable picture. So good, in fact, that he was
found and arrested.

~~~
danepowell
He wasn't just a pedophile, he was a child molester.

~~~
frogpelt
"just a pedophile"

A phrase I never thought I'd see.

~~~
izzydata
Someone who is homosexual isn't any more inclined to molest or rape someone of
their own sex than a pedophile is to molest or rape a child.

More on topic about censoring important information. I'm not entirely sure
about this, but I feel like I've seen images where the thumbnail was retained
from a previous version. Even at 32x32 there might be some way to expand it
and gather some kind of information from its pixelated form even after the
actual image was censored.

~~~
g887nt
Your pedophile / homosexual comparison seems to imply there's such a thing as
consensual sex with a child??

A pedophile is much more likely to molest a child, because that's the only way
their desires can be realized in the physical world. A homosexual could, I
dunno, go have consensual gay sex? That's why people find this comparison
offensive.

~~~
duaneb
I think the distinction is between what a pedophile finds themselves drawn to
and what they do. And, I might add, there are many ways for people to find
gratification without actually having sex, or in fact ethically needing the
consent of anyone. Also, pedophiles are absolutely comparable to
homosexuals—both are individuals classified for their sexuality, regardless of
how they express it or wish to be classified.

(To clarify, I absolutely do not condone any interactions whatsoever to do
with pedophilia... but come on, people, let's not be blind here to the
existence of people with full self control, ethical behavior, and private
thoughts and desires.)

~~~
alaoiigha
> To clarify, I absolutely do not condone any interactions whatsoever to do
> with pedophilia... but come on, people, let's not be blind here to the
> existence of people with full self control, ethical behavior, and private
> thoughts and desires.

Completely agree. Sad to think that there are people out there that have done
nothing wrong, and cannot control their desires, and yet people (like many in
this thread) would judge them without a second thought.

Let's hope none of them commit adultery by simply desiring another person. /s

------
mxfh
So if something is 1337 days old it gets autoreposted? Could live with that.

Previous discussion:
[https://news.ycombinator.com/item?id=1939607](https://news.ycombinator.com/item?id=1939607)

To be more precise, this post is even older, and was first discussed in 2007
prominently in these two places:
[http://www.reddit.com/comments/xaae/how_to_extract_personal_...](http://www.reddit.com/comments/xaae/how_to_extract_personal_information_and_account/cxbgy)

[https://www.schneier.com/blog/archives/2007/01/how_to_recove...](https://www.schneier.com/blog/archives/2007/01/how_to_recover.html)

(please refrain from responding with XKCD references, I'm aware of that, just
want to link to older discussions)

~~~
vog
My favourite comment from the old discussion came from andrewreds:

 _> WHAT... why would you completely black out the number, where you could
instead use random coloured squares, that look like it is a blurring, so
someone can go through all the effort, decoding your white noise, and thinking
in the end they have your number... when they don't ;)_

[https://news.ycombinator.com/item?id=1939807](https://news.ycombinator.com/item?id=1939807)

------
nnnnni
Their answer is to "color over it", but BE CAREFUL if you do that. I'm sure
that some of us remember the US government document(s) in the late 90s/early
2000s that were released in redacted form, but the person didn't
realize/understand that people with the Acrobat editor program could remove
the black bars.

~~~
shinigami
Also, image formats like JPEG have a thumbnail stored in them, which may not
be updated when you edit the image.

~~~
robin_reala
Do they? First I’ve heard of that…

~~~
akadruid1
The thumbnail is part of the EXIF metadata. It's reasonably well-known now,
but it was not always the case...
[http://graphicssoft.about.com/b/2003/07/26/techtvs-cat-schwa...](http://graphicssoft.about.com/b/2003/07/26/techtvs-cat-schwartz-exposed-is-photoshop-to-blame.htm)
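
A rough way to check a file for a leftover thumbnail is sketched below. It is a heuristic, not a full Exif parser: it walks the JPEG segments and looks for a nested JPEG start marker inside the Exif APP1 segment. The function name `find_exif_thumbnail` is made up for this example; a proper tool would read the thumbnail offset and length tags from the Exif data instead.

```python
import struct

def find_exif_thumbnail(jpeg: bytes):
    """Return the bytes of an embedded JPEG thumbnail, or None.

    Heuristic: walk the segments that precede the image data and, inside
    an Exif APP1 segment, look for a nested start-of-image marker.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            start = payload.find(b"\xff\xd8")
            if start != -1:
                end = payload.rfind(b"\xff\xd9")
                return payload[start:end + 2] if end > start else payload[start:]
        pos += 2 + length
    return None
```

If this returns something for an image you have just redacted, open the returned bytes as a JPEG and compare them with the edited picture before publishing.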

~~~
robin_reala
Huh, interesting. Thanks!

------
nathell
In some cases, coloring over parts of the image might still not be enough.
Specifically, when all of the following are true: (i) the domain of possible
entries is reasonably small (e.g., a number or a name and surname), (ii) the
text is printed in a proportional (not fixed-width) typeface, and (iii) enough
of the rest of the line is visible to infer the font size and kerning
settings.
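
A toy version of that width attack might look like the following. The `ADVANCE` table is invented for illustration (real widths would be measured from the visible text on the same line), and kerning is ignored.

```python
# Hypothetical advance widths, in arbitrary units, for a proportional font.
ADVANCE = {
    "a": 5, "b": 6, "c": 5, "d": 6, "e": 5, "h": 6, "i": 2, "l": 2,
    "m": 9, "n": 6, "o": 6, "r": 4, "s": 5, "t": 3, "w": 8, " ": 3,
}

def text_width(text):
    """Width of `text` if it were set in the hypothetical font above."""
    return sum(ADVANCE[ch] for ch in text)

def candidates_matching(bar_width, names, tolerance=1):
    """Keep only the names whose rendered width fits the redaction bar."""
    return [n for n in names if abs(text_width(n) - bar_width) <= tolerance]

names = ["tom", "adam", "sarah", "william", "ian"]
print(candidates_matching(30, names))  # a 30-unit bar leaves only "william"
print(candidates_matching(25, names))  # a 25-unit bar leaves "adam" and "sarah"
```

With a small candidate list, the bar width alone can narrow things down to one or two entries, which is why a fixed-size redaction block is safer.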

~~~
ColinDabritz
I think this form of analysis is amazing. It breaks the "The best solution was
to actually cut out the parts of the page you don't want seen and then scan in
the result." approach from elsewhere in this thread as well. Even cutting out
each portion of each line isn't enough because you can reconstruct line sizes.
You could cut out each individual word and string them together, with a single
size "redaction" block perhaps, but that's a lot of work.

~~~
tetrep
>You could cut out each individual word and string them together, with a
single size "redaction" block perhaps, but that's a lot of work.

Not really. OCR is plenty good enough to scan a document and then you can just
replace anything you wanted redacted with a common string: [REDACTED]

Even if OCR wasn't an option, it would still (probably) be easier to type up
the document than it would be to physically cut out all the things you want to
redact.

------
IvyMike
Better yet: change the text to "you think you're clever, don't you" and then
blur the image.

It's a little easter egg for the people with an unblur plugin.

------
bjourne
Has this attack actually been proved possible? He writes that he _thinks_ it
should work but doesn't have the time or inclination to prove it. If anyone
wants to take a stab at it, I'll gladly submit a mosaiced photo of my credit
card because I don't think the attack is practical. If you crack it, you're
free to keep whatever you can get. :)

~~~
bluedino
You have to know the original font the numbers were in, and the original
algorithm or program used to do the blurring. Then it's just a matter of brute
forcing it.

Using a 'smear' brush would be a more secure way to blur: there are various
sizes, you introduce some randomness in the path of your mouse/stylus when
blurring, and you can go over the numbers multiple times. Blurring to much
higher factors is a better idea, but it is simpler to just use black bars.

~~~
dheera
You have to be careful doing this. If your blur tool happens to be of the
variety that conserves total brightness (i.e. the blurring is accomplished by
2D convolution with a kernel that has a volume of 1, so sum of all pixel
values remains constant before and after blurring), it can still be dictionary
attacked pretty easily no matter which random ways you blur, since each digit
in the cheque font probably has a different amount of total black area, and
your blurring preserves that information perfectly.

If your blur tool is of the variety that doesn't conserve total brightness,
such as most "smudge" tools, and you use human randomness to blur it, then it
would probably be pretty hard to reverse.

I still recommend cutting out the sensitive information rather than blurring
it, just to be safe. Also, leave a generous margin, (1) to avoid giving
information about length, and (2) because lossy compressors may have left tiny
artifacts of the sensitive information in the areas around it.
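
The brightness-conservation point can be sketched in a few lines. This assumes made-up 3x5 glyph bitmaps and a box blur whose kernel sums to 1, with the output padded so nothing falls off the edge; real fonts may have digits with equal ink, in which case total brightness alone would only narrow the candidates.

```python
# Hypothetical 3x5 bitmaps; '#' is ink.
GLYPHS = {
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
    "8": ["###", "#.#", "###", "#.#", "###"],
}

def to_pixels(rows):
    return [[1.0 if c == "#" else 0.0 for c in row] for row in rows]

def blur_full(img, k=3):
    """Box blur with a k x k kernel summing to 1, padded by k - 1."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w + k - 1) for _ in range(h + k - 1)]
    weight = 1.0 / (k * k)
    for y in range(h):
        for x in range(w):
            for dy in range(k):
                for dx in range(k):
                    out[y + dy][x + dx] += img[y][x] * weight
    return out

def total_ink(img):
    return sum(map(sum, img))

INK = {digit: total_ink(to_pixels(rows)) for digit, rows in GLYPHS.items()}

def guess_digit(blurred):
    """Pick the glyph whose ink total is closest to the blurred image's."""
    ink = total_ink(blurred)
    return min(INK, key=lambda d: abs(INK[d] - ink))

# Blur twice with different kernel sizes; the ink total still gives it away.
smeared = blur_full(blur_full(to_pixels(GLYPHS["8"]), 3), 5)
print(guess_digit(smeared))
```

However many times the normalized blur is applied, the sum of pixel values stays the same, so the guess does not depend on the blur path.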

------
cookiecaper
Most people want to blur because they think it makes their photo flow better
than gaudy black highlights, but you can also use highlights that match the
background color/image and make it blend in, leaving a big white space on a
white background instead of a big black space. Most people won't notice
anything is missing at all.

If you want that pixel mosaic look for an extra futuristic feel, remove the
original content by making it fade into the background, replace it with new,
irrelevant content, and then pixelize.

------
wcummings
I entered an underhanded contest one year where the challenge was recoverable-
but-correct-looking redaction of jpegs. I used an insecure random seed based
on the time (you could brute force an unredacted image based on the rough time
it was generated). The winning solutions were more clever.
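
For the curious, the time-seed trick can be sketched roughly like this. It is not the actual contest entry: the XOR scheme, the function names, and the "pixels are pure black or white" plausibility check are all assumptions made for the example.

```python
import random

def redact(pixels, seed):
    """'Redact' by XOR-ing each byte with a PRNG stream.

    The result looks like noise, but anyone who recovers the seed can
    undo it, because XOR with the same stream is its own inverse.
    """
    rng = random.Random(seed)
    return bytes(p ^ rng.randrange(256) for p in pixels)

def recover(redacted, approx_time, window=3600):
    """Try every whole-second seed near approx_time.

    A candidate is accepted if every pixel is pure black or white, as
    scanned text might be (an assumption of this sketch).
    """
    for seed in range(approx_time - window, approx_time + window + 1):
        candidate = redact(redacted, seed)
        if all(b in (0, 255) for b in candidate):
            return seed, candidate
    return None

secret = bytes([0, 255] * 16)          # stand-in for a strip of black/white text
stamp = 1_400_000_000                  # hypothetical creation time, in seconds
noisy = redact(secret, stamp)
print(recover(noisy, stamp + 1234))    # attacker only knows the rough time
```

An hour of uncertainty in the timestamp is only a few thousand seeds to try, which is nothing for a brute-force search.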

~~~
dalke
You (and I) were among the many who implemented that exact technique for
[http://underhanded.xcott.com/?page_id=17](http://underhanded.xcott.com/?page_id=17)
. :)

------
hawleyal
Not really. You would need to have a copy of an uncovered version to know how
many lines of text, font size, kerning, etc.

~~~
DanBC
For some documents, it is trivially easy to get near copies.

Other documents provide their own context - they have a lot of other text -
that you can use to get the font, font size, spacing, etc.

------
yeukhon
Wouldn't cutting that hidden text out entirely with a photo editing tool be
better? I've heard people talking about recovering text f. Well, in the case
of edges sticking out of the black bar (this happens a lot for people who
aren't careful, e.g. when using Paint), I believe there is a chance to recover
part, if not all, of the content.

Also, doesn't adding a black bar on top of text just mean adding more bytes
to the file, instead of removing the bytes belonging to the now hidden text?

------
georgemcbay
I think a lot of people are unaware of how easily you can achieve blind
deconvolution on many images blurred with most blur algorithms and even real-
world blurring effects (including motion, out-of-focus, etc blurs).

The results won't be perfect, but they are usually close enough to see much of
the detail that appeared to be lost.

I never use blur to obscure sensitive information; black that shit out (and
then also make sure you aren't saving it as metadata or in a layer) or just
replace it with fake data.

------
JoshTheGeek
[http://blog.mailgun.com/open-sourcing-our-email-signature-pa...](http://blog.mailgun.com/open-sourcing-our-email-signature-parsing-library/)
([https://news.ycombinator.com/item?id=8081532](https://news.ycombinator.com/item?id=8081532))
has some screenshots with blurred email addresses you can read without fancy
deciphering.

------
erikb
In some cases even blurring faces might be a bad idea. Just because we are
unable to unblur a face today doesn't mean we will be unable to in 10 or 100
years. In many cases this might not be a problem, but in some cases it might
lead to trouble later on, which can be avoided just as easily.

~~~
goldenkey
You cannot unblur mosaic blurring. It's the equivalent of a hash. The best you
can do is brute force possible input vectors. There will be many collisions.
This technique only works because digits/numbers limit the input space for a
credit card or bank number. For faces, the best you can do is validate if
someone you already suspect or someone you have in a database, is the origin
of the mosaic. If you had a picture of every person in your country, you could
run them all through and find the origin - but you're not really unblurring as
much as extracting information to do a process of elimination on an 'image hash'.
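
As a minimal sketch of that "image hash" lookup, assuming tiny grayscale images stored as lists of rows and a made-up three-person database:

```python
def mosaic(img, block):
    """Replace each block x block tile with its mean (one value per tile)."""
    h, w = len(img), len(img[0])
    rows = []
    for ty in range(0, h, block):
        row = []
        for tx in range(0, w, block):
            tile = [img[y][x]
                    for y in range(ty, min(ty + block, h))
                    for x in range(tx, min(tx + block, w))]
            row.append(sum(tile) / len(tile))
        rows.append(row)
    return rows

def matches(target, database, block, tol=1e-9):
    """Names of all same-sized database images whose mosaic equals the target."""
    hits = []
    for name, img in database.items():
        m = mosaic(img, block)
        if all(abs(a - b) <= tol
               for row_m, row_t in zip(m, target)
               for a, b in zip(row_m, row_t)):
            hits.append(name)
    return hits

# Hypothetical 4x4 "photos"; alice and bob differ only inside one tile.
database = {
    "alice": [[0, 0, 9, 9], [0, 0, 9, 9], [5, 5, 1, 1], [5, 5, 1, 1]],
    "bob":   [[0, 0, 9, 9], [0, 0, 9, 9], [1, 9, 1, 1], [9, 1, 1, 1]],
    "carol": [[9, 9, 0, 0], [9, 9, 0, 0], [1, 1, 5, 5], [1, 1, 5, 5]],
}
published = mosaic(database["alice"], 2)
print(matches(published, database, 2))  # alice and bob collide; carol is ruled out
```

The mosaic cannot tell alice from bob, which is the collision problem, but it does eliminate carol, which is the process of elimination.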

~~~
erikb
Maybe the scientifically correct definition of "to unblur" is reverting the
process directly by applying a mathematical algorithm. In a normal
conversation like what we have here, bruteforcing a good enough result can
also be considered "to unblur" because for the person it's the same result:
Everybody knows who it is.

~~~
goldenkey
The un prefix means to reverse. If you have no database of people you're not
going to be able to do anything in the way of reversing the mosaic. It's not a
reversal as much as it is a heuristic brute force.

------
Houshalter
Web archive version:
[https://web.archive.org/web/20140714183916/http://dheera.net...](https://web.archive.org/web/20140714183916/http://dheera.net/projects/blur)

------
cnst
Where did they find such a crappy font for the body text? SimHei for Latin
letters looks absolutely horrendous with disabled font blurring.

They advise you not to blur, yet require blurring for comfortable reading? How
ironic!

------
tshadwell
And when you colour in the picture always use a pen of 100% opacity or the
colour can be removed from the image to reveal the data underneath!
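
A quick sketch of why partial opacity is reversible, assuming the editor uses ordinary "over" compositing on each colour channel and that the pen colour and opacity are known or can be guessed:

```python
def paint_over(pixel, pen, opacity):
    """Standard 'over' compositing of a pen colour onto one channel value."""
    return opacity * pen + (1.0 - opacity) * pixel

def undo_paint(result, pen, opacity):
    """Invert paint_over; only possible while opacity < 1."""
    if opacity >= 1.0:
        raise ValueError("fully opaque paint destroys the original value")
    return (result - opacity * pen) / (1.0 - opacity)

original = [12.0, 200.0, 255.0, 0.0]
covered = [paint_over(p, pen=0.0, opacity=0.9) for p in original]
restored = [undo_paint(c, pen=0.0, opacity=0.9) for c in covered]
print(covered)   # looks almost black
print(restored)  # close to the original values again
```

In a real 8-bit image the covered values are rounded, so the restored picture is coarser, but a simple contrast stretch is often enough to read text through a 90% opaque pen.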

------
ohazi
I'll just leave this here:

[http://refocus-it.sourceforge.net/](http://refocus-it.sourceforge.net/)

------
jheriko
do those pixel blocks actually represent accumulated samples from the image?

any number of very obvious methods can be used to avoid this besides using a
black box...

------
Udo
A nice Gaussian blur would probably be fine; it's specifically the pixelation
technique that's leaking data.

~~~
ekr
You think the technique mentioned in the article doesn't leak data? It
actually leaks more data than a gaussian blur.

Gaussian blur suffers from exactly the same problem, although a different
difference function is needed for it.

~~~
goldenkey
Gaussian blur is equivalent to an auto key cipher [0] without a key. Each
pixel is composed of a couple pixels from the original. With some simple
algebra, cross-examining pixels that have values determined by the same origin
pixel, you can reverse the operation.

[0]
[http://en.m.wikipedia.org/wiki/Autokey_cipher](http://en.m.wikipedia.org/wiki/Autokey_cipher)
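
The "simple algebra" can be sketched in one dimension. This assumes a 3-tap kernel [1/4, 1/2, 1/4] as a stand-in for a Gaussian, zero padding at the edges, and no rounding; with those assumptions the blur is a tridiagonal linear system that can be solved exactly. Real blurs use wider kernels and 8-bit rounding, which makes the inversion noisier.

```python
def blur(signal):
    """Blur with kernel [1/4, 1/2, 1/4]; out-of-range samples count as 0."""
    n = len(signal)

    def get(i):
        return signal[i] if 0 <= i < n else 0.0

    return [0.25 * get(i - 1) + 0.5 * get(i) + 0.25 * get(i + 1)
            for i in range(n)]

def unblur(blurred):
    """Solve blur(x) == blurred with the Thomas (tridiagonal) algorithm."""
    n = len(blurred)
    a, b, c = 0.25, 0.5, 0.25      # sub-, main and super-diagonal entries
    cp = [0.0] * n                 # modified super-diagonal
    dp = [0.0] * n                 # modified right-hand side
    cp[0] = c / b
    dp[0] = blurred[0] / b
    for i in range(1, n):          # forward elimination
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (blurred[i] - a * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

row = [0.0, 0.0, 255.0, 255.0, 0.0, 255.0, 0.0, 0.0]
print([round(v) for v in unblur(blur(row))])
```

Each blurred sample is a known weighted sum of neighbouring originals, so there are as many equations as unknowns.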

