
TSA redaction fail: hidden text easily readable via copy & paste - anigbrowl
http://www.wanderingaramean.com/2009/12/tsa-makes-another-stupid-move.html
======
cduan
It's helpful to understand a little bit of how a PDF file works.

A PDF file stores a sequence of instructions for what to draw on a page. The
instructions are commands, such as "write this text here" or "draw a circle
there." When you draw black redaction boxes, you are just appending an
instruction to the end of the list ("draw a box here"), but most of the time
the previous instructions are all still there.

Really, the only way to redact a PDF with certainty, short of manually reading
the file itself (yes, PDF files are text-readable; you just have to decompress
them) is to rasterize the file to a large image. Otherwise, the instructions
for writing the text are all still in the file, and there are any number of
ways to extract them (for example, simply by deleting the instruction to draw
the redaction box).
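
To make that concrete, here is a rough sketch in Python using only the standard library. It is a toy, not a real PDF parser: the content stream is a hand-written simplification, the sample text and the helper names `extract_text` and `strip_boxes` are made up for illustration, and it assumes the stream is stored Flate (zlib) compressed, which is common but not universal. The operators themselves (`Tj` to show text, `rg` to set the fill color, `re` and `f` to draw a filled rectangle) are ordinary PDF drawing operators.

```python
import re
import zlib

# Simplified page content stream: the text is drawn first, then a black
# rectangle is appended on top of it (the "redaction").
content = b"\n".join([
    b"BT /F1 12 Tf 72 720 Td (Passenger list: SECRET) Tj ET",  # write text
    b"0 0 0 rg",            # set the fill color to black
    b"70 715 200 16 re f",  # draw a filled rectangle over the text
])

# Assumption: the stream is stored zlib-compressed, as many PDF streams are.
stored = zlib.compress(content)


def extract_text(compressed_stream):
    """Return the strings shown by Tj operators, ignoring all drawing."""
    raw = zlib.decompress(compressed_stream)
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", raw)
    return [m.decode("latin-1") for m in matches]


def strip_boxes(compressed_stream):
    """Drop the rectangle-fill instructions and keep everything else."""
    raw = zlib.decompress(compressed_stream)
    kept = [line for line in raw.split(b"\n")
            if not line.rstrip().endswith(b"re f")]
    return b"\n".join(kept)


print(extract_text(stored))
print(strip_boxes(stored).decode("latin-1"))
```

Both helpers recover the "hidden" text: the first never looks at the box at all, and the second simply deletes the instruction that draws it. Rasterizing avoids this because the output contains only pixels, with no text instructions left to recover.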

~~~
anigbrowl
Mrs Browl works at an e-discovery company where documents are often redacted
depending on which bits are admissible in court; I gather they use TIFF files
for exactly the reasons you describe.

I suppose the (extremely limited) redaction features in Acrobat stem from the
belief that the document is ultimately destined for printing rather than
computer reading.

~~~
pasbesoin
I have no personal experience and at best perhaps a vague recollection of
having read or heard something, but would document images be or in the past
have been used also to make discovery as inefficient as permitted for the
opposition? Images require a person to analyze, or pre-processing before
electronic search and analysis can be performed against them.

~~~
anigbrowl
OCR is very good nowadays, so that's not so big a problem - and for big
lawsuits (eg patent fights) there are literally hundreds of lawyers hired to
comb through the vast array of documents deciding whether they're responsive
or not.

The pre-processing is largely automated but there's certainly a portion (maybe
5%?) of documents that need to be hand-classified in the database before they
get to the lawyers. It's an interesting field - lots of money to be made,
intense competition for it, relatively simple technology requirements but a
legal industry which has been resistant to technology for quite a long time.
Autonomy seems to be the leading software company in this space.

~~~
pmorici
I can't speak for their other products but in my experience their enterprise
search product is a useless piece of crap.

------
wgj
_Want to know which twelve passports will instantly get you shunted over for
secondary screening, simply by showing them to the ID-checking agent?_

Cuba, Iran, North Korea, Libya, Syria, Sudan, Afghanistan, Lebanon, Somalia,
Iraq, Yemen, or Algeria

~~~
staunch
That's such an obvious list I'm not sure if it's from the document itself, or
if you just wrote down what it would probably be.

~~~
endtime
I was a little surprised about Cuba. They're not our favorite government, but
I don't see them as a security risk.

~~~
oiuytgfhn
Are you serious!!!!

America has been living under the shadow of imminent Cuban aggression for
50 years. There have been Cuban attempts to invade America and numerous
assassination attempts on the US president by Cuban agents.

And their chemical weapons kill 1000s of Americans after being secretly
delivered from Canada.

~~~
anigbrowl
Apparently your sense of humor is too subtle for some other HNers!

~~~
Beanblabber
Or perhaps a bit too blunt.

------
chaosmachine
This isn't the first time this has happened. Several years back, another
government agency made the same mistake. On a slow computer, the text would
render first, and could be seen for several seconds before the black boxes
would appear.

Googling around, it seems to be a fairly common mistake, going back to at
least 2000.

<http://www.securityfocus.com/news/7272>

<http://cryptome.org/iran-cia/cia-iran-pdf.htm>

<http://blogs.zdnet.com/BTL/?p=12907>

~~~
pasbesoin
In the same vein, if not the same format: Distributing a Word document from
which you've not removed change tracking (and/or other metadata). (Or whatever
Word calls it; it's been a while.) I had to correct my teammates on this one,
a while back (and yes, the documents were going to external clients). Nothing
novel; the problem's been in the news repeatedly. Nonetheless, people -- even
"technical" -- still don't get it right.

(I encouraged them to go further and switch to PDF format for the
distribution, but they wouldn't.)

------
zitterbewegung
This should be required reading on redaction in the government.
<http://www.fas.org/sgp/othergov/dod/nsa-redact.pdf>

~~~
anigbrowl
Nice one. I had to giggle at the ubiquity of the MS Office assistant down in
the corner... _It looks like you're trying to conceal some information from
prying eyes. Would you like help with that?_

------
grinich
Here's a version with the redactions lifted and the areas highlighted.
<http://cryptome.org/tsa-screening.zip>

~~~
clistctrl
It would be really interesting if the government attempts to go after you for
hosting that document.

~~~
carbon8
That's actually hosted by cryptome: <http://en.wikipedia.org/wiki/Cryptome>

------
rms
On Scribd:
<http://www.scribd.com/doc/23757640/Screening-Management-SOP-Redacted>

~~~
sage_joch
It's a lengthy document. Luckily, they've highlighted the areas of interest.

~~~
pmorici
They aren't all that remarkable.

------
keenerd
A responsible citizen would not google the following:

"SENSITIVE SECURITY INFORMATION" site:.gov filetype:pdf

Fewer than 2000 such PDFs. A very determined and very irresponsible citizen
could have already mirrored all of them.

~~~
skorgu
Well, not from their home IP they wouldn't anyway.

------
noonespecial
I like watching Security Theater as much as the next guy, but this season's
plot is so thin, I'm having trouble suspending disbelief.

------
zacku235
I downloaded it earlier but it seems to have been taken down now. Very
interesting. I bet that will lose some people their jobs - but I think it
definitely highlights an underlying issue within the government bureaucracy
that shows why people getting paid 30k with degrees from the University of
Phoenix shouldn't be entrusted with national security... I wonder how many
security people that passed through...

~~~
pmorici
"bet that will lose some people their jobs"

I doubt it. The more likely outcome is they mandate all their employees attend
document redaction training developed and administered at great taxpayer cost
by a government contracting company. It's also quite likely that they use the
incident to justify spending millions to have a government contractor write
document redaction software.

------
ryanwaggoner
Most interesting thing I learned from the document: members of Congress get
special ID cards. It makes sense, I just never thought about it before.

------
sorbus
Using the link in the article, it appears that it's already been taken down
(link leads to a 404 page). However, from the comments in the article, it's
been mirrored at <http://cryptome.org> (at the top of the list, in fact,
filename is tsa-screening.zip, in case that makes it easier to find it).

~~~
anigbrowl
Looking at the dates on the document, it was last revised in May 2008, so it
seems like the horse left the stable some 18 months ago as far as actual bad
guys and spies (who presumably seek out this information more actively than
you or I) are concerned.

------
julio_the_squid
Pretty ridiculous. Why not just make one version for internal distribution,
and a completely separate document for the public?

~~~
jrockway
Because that would cut into your Solitaire time.

------
milkshakes
searchable too. are we feeling safer yet?

------
dhyasama
My sister is an attorney and asked me about software that can catch things
like this, remove metadata, etc. Does anyone have experience with something
along these lines?

------
tlrobinson
Schneier is going to have a field day with this one.

