This is standard procedure...the emails go through a vetting process to redact things that are not meant to be released to the public. This can mean anything from classified information to private information that was sent to an official (medical records, personal information in a forwarded application, etc.). The PDF workflow is probably the most efficient way to do this...I don't mean efficient in terms of what programmers think of as efficient...I mean efficient in that the vetting/censoring process involves a good number of officials reviewing the documents, which probably involves some back-and-forth, for which paper is a pretty decent medium. Also, not all, or even many, of these officials are versed in digital editing workflows [1]. I think in some situations, the emails are printed, physically redacted (i.e. black marker), then scanned and converted to PDF. Keep in mind that many requesters of government documents are perfectly fine with printable documents, even if such documents are not suited for machine parsing.
[1] Government officials have been burned before when using what they thought was a PDF redaction tool...i.e. using the box-drawing-tool and drawing a black box...was not actually redaction...Governor Blagojevich's case comes to mind: http://www.wbez.org/blog/update-blagojevich-lawyers-created-...
How is it not easier to programmatically find specific names/emails and replace them with (redacted)?
In this day and age it takes a lot more work to print, manually redact and scan than it does to digitally redact permanently. Your story about redacting PDFs is interesting, but these were not originally PDFs; they were emails.
I can't see any other reason for delivering documents as printed/scanned PDF than to make it more difficult to analyse, considering they were originally not paper or PDF. Occam's razor applies here.
> How is it not easier to programmatically find specific names/emails and replace them with (redacted)?
Because this isn't possible. How are you going to account for any possible misspellings, accidental spacing or other unknown unknowns? You can't. To properly disseminate things, the documents have to run through actual people. Sure, people fuck things up too, but they can be much more precise when it comes to processing real, human language than any computer right now.
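To make the misspelling/spacing point concrete, here is a minimal sketch of a naive find/replace redactor. The function name `naive_redact`, the name "Jane Doe" and the sample text are all made up for illustration; this is not anyone's actual redaction tool.

```python
import re

def naive_redact(text, names):
    """Replace exact (case-insensitive) occurrences of each name with (redacted)."""
    for name in names:
        text = re.sub(re.escape(name), "(redacted)", text, flags=re.IGNORECASE)
    return text

# Hypothetical sample; every variant below refers to the same invented person.
sample = (
    "Jane Doe called. "                      # exact match
    "Forwarding Jane  Doe's application. "   # accidental double space
    "Per Jane Deo, the meeting moved. "      # misspelling
    "J. Doe will follow up."                 # abbreviation
)

redacted = naive_redact(sample, ["Jane Doe"])
print(redacted)
```

Only the exact match gets replaced; the double-spaced, misspelled and abbreviated variants pass straight through. You can keep adding patterns, but you only catch the variants you thought of in advance.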
This is the process the government takes. Convert to PDF, redact, release.
> I can't see any other reason for delivering documents as printed/scanned PDF than to make it more difficult to analyse, considering they were originally not paper or PDF. Occam's razor applies here.
You are misusing Occam's razor. This is actually the easiest, most correct way available to the government to disseminate possibly sensitive information. It sucks. If you can write something better, then do it; you can make a ton of money and save the government a whole lot of time. But it better be really, really accurate.
How sure would you be of a tool that let you do this?
"Flattening" the document through a paper medium has the benefit of reducing the digital trail to just the tools used to convert from paper to PDF, which minimizes unintended data leaks.
You're preaching to the wrong person...I think the world would be a better place if everyone could grep/sed/awk their way through textfiles. That said, imagine the agency's situation: They have thousands of documents to review before releasing. Many/most of these emails are releasable as is. But some potentially contain classified information that cannot be known a priori...i.e. it's not as simple as doing a grep for "CLASSIFIED" or "OSAMA". In fact, one of the current controversies with the Clinton emails is that they're finding information that should have been treated as classified, but at the time of sending, was not...the conditions for classification often aren't based on keywords, but on the context and who the sender is.
Occam's razor falls on the side of print-to-paper-then-scan-to-PDF. When you print an email or Word document to paper, you have effectively destroyed all non-visible metadata. When you use a Sharpie to rub out a name, you can be sure that mark is going to propagate to the electronic scan. You have none of those assurances if you work electronically...and imagine being someone who _isn't_ a programmer. Not too long ago the NSA was clusterfucked by a low-level employee who ran wget without anyone noticing. If you're the non-programming bureaucrat in charge of this bespoke redaction process, software seems very arcane and insecure...paper sounds pretty nice in comparison.
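A rough illustration of the non-visible metadata problem, using Python's standard `email` package. The addresses, the name "John Smith", the `X-Originating-IP` value and the `redact_body_only` helper are all hypothetical; the sketch only shows that scrubbing the visible body leaves the headers untouched.

```python
from email.message import EmailMessage

# Build a made-up message; every value here is invented for illustration.
msg = EmailMessage()
msg["From"] = "alice@example.gov"
msg["To"] = "bob@example.gov"
msg["Subject"] = "Application from John Smith"
msg["X-Originating-IP"] = "10.0.0.5"
msg.set_content("Please review the attached application from John Smith.")

def redact_body_only(message, name):
    """Replace `name` in the text body only; headers are left exactly as they were."""
    body = message.get_content()
    message.set_content(body.replace(name, "(redacted)"))
    return message

redact_body_only(msg, "John Smith")

body_after = msg.get_content()
subject_after = str(msg["Subject"])
ip_after = str(msg["X-Originating-IP"])
raw_after = msg.as_string()

print(body_after)      # body no longer contains the name
print(subject_after)   # the Subject header still does
print(ip_after)        # routing-style metadata is still there too
```

The body is clean, but the name survives in the Subject header and the serialized message still carries the extra header. A printout only carries what was rendered on the page, so whatever headers weren't printed never make it into the scan.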
But yeah, I do wonder about the amount of human labor and late nights that go into this. But apparently, a lot of our legal system has revolved around legions of paralegals and lawyers sifting through uncountable volumes of paper...and a lot of the people doing the current redaction probably came from that field of work.
I meant Occam's razor as a way of explaining why they were handed over as paper: the simplest explanation is that they were intentionally making it harder.
I get why the agency might print it out, but I don't get why the Clinton staffers would print them out. If the government is going to subpoena or raid me, they're just taking my computer. Sure, if it gets to public release, then they might print/redact. But they sure are not going to wait for me to print out stuff and hand it over.
Given the original 'two phones are too difficult' excuse for having the private emails, I can't see how printing out the paper and handing that over can be seen as anything but deliberate delaying. Unless I'm missing something here, it's not like the Clinton staff redacted the mails themselves - they just deleted the ones they didn't want to hand over.
If the agency staff are redacting for public release, then that is a different thing altogether; they could have done that from the original emails.