
The Mueller Report, Searchable and Accessible on the Archive - sohkamyung
https://blog.archive.org/2019/04/26/the-mueller-report-searchable-and-accessible-on-the-archive/
======
muxator
> The government initially released the document in a PDF format which renders
> it like an image, impossible to search. When PDF files are uploaded to the
> Archive, we automatically run them through an Optical Character Recognition
> (OCR) process. This turns those images into text, making it much easier to
> move between sections and search for specific words or phrases. This allows
> journalists and the public to more easily parse through volumes of
> information contained within these massive documents.

Shouldn't an accessible, searchable and digital version of government
documents be the default form in which they are released?

Why do citizens have to resort to the Internet Archive to see such a basic
right fulfilled?

In my country this happens frequently, as well. Too often, documents that
should be accessible and widely publicized are hidden somewhere on the
internet with practically no incoming links and in a form which is the digital
equivalent of a grainy fax.

Sometimes the intention of superficially fulfilling a transparency obligation
without actually communicating anything is palpable.

~~~
sohkamyung
One possibility is that they want to make sure the redacted sections are
really redacted. You can do a web search and discover that others have tried
to redact sections of PDF documents, only to later discover that people can
uncover the redacted sections via software tools.

Scanning a redacted document is the simplest way of making sure redacted
sections stay redacted as there is no underlying information to reveal.

~~~
ALittleLight
Releasing the plain text is all that's needed. I fail to see how redacting by
overwriting the relevant text is at all reversible - e.g. replace every
redacted word with five X's. Replace multiple redactions with ten X's. That
actually seems more secure than what's currently done, where you can guess at
the words by their length.
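
A minimal sketch of that fixed-width idea, assuming the redactor already has a list of single-word sensitive terms (the `redact` function, its parameters, and the marker constants are hypothetical names for illustration, not anything a real redaction tool is known to use):

```python
import re

SINGLE_MARK = "X" * 5    # stands in for one redacted word
MULTI_MARK = "X" * 10    # stands in for a run of consecutive redacted words


def redact(text, sensitive_terms):
    """Replace sensitive words with fixed-width markers.

    The marker length never depends on the length of the original word,
    so the output leaks nothing about word length. Assumes each term is
    a single word made of ordinary word characters.
    """
    if not sensitive_terms:
        return text

    # Longest terms first so a longer term wins over a shorter prefix.
    ordered = sorted(sensitive_terms, key=len, reverse=True)
    term = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    term_re = re.compile(term, re.IGNORECASE)

    # A run is one or more sensitive terms separated only by whitespace.
    run_re = re.compile(term + r"(?:\s+" + term + r")*", re.IGNORECASE)

    def replace(match):
        count = len(term_re.findall(match.group(0)))
        return SINGLE_MARK if count == 1 else MULTI_MARK

    return run_re.sub(replace, text)


if __name__ == "__main__":
    print(redact("Alice met Bob Smith in Paris", {"Alice", "Bob", "Smith"}))
```

Because the original characters are replaced rather than covered, there is nothing underneath the markers for a tool to recover.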

~~~
wahern
> Releasing the plain text is all that's needed.

I doubt Mueller's team wrote the report in plain text. They probably used
Microsoft Word and exported a PDF document.

When I'm president, all government agencies will be forced to use LaTeX. Then
we can have both plaintext source, PDFs with actual text, and PDFs that don't
leak redacted material.

~~~
caseysoftware
LaTeX? That will effectively prevent most government agencies from writing
reports... oh, that was your goal, wasn't it? Well played.

------
yingw787
I really like this OCR version! I did a search for "Manafort" and it not only
tells you where in the docs the keyword pops up with nice location markers,
but also shows a snippet of surrounding text to give the keyword context. I do wish
there was a low-level cache of commonly used keywords so that some searches
would be faster, but I don't know what archive.org's position is on tracking
user behavior (probably not a fan).
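
A rough sketch of what I mean, assuming the OCR text is available as one string per page. The `PAGES` data, `find_snippets`, and `cached_search` are hypothetical names, and this is not how archive.org actually implements search. The cache is keyed only on the query string, so it does not need to record who searched:

```python
from functools import lru_cache

# Hypothetical stand-in for the OCR output: one string per page.
PAGES = (
    "The report discusses Manafort at length.",
    "No match on this page.",
    "Later, Manafort appears again; Manafort twice.",
)


def find_snippets(pages, keyword, context=30):
    """Return (page_number, snippet) for every case-insensitive hit."""
    needle = keyword.lower()
    if not needle:
        return []  # an empty query would otherwise match everywhere

    results = []
    for page_number, text in enumerate(pages, start=1):
        lowered = text.lower()
        start = lowered.find(needle)
        while start != -1:
            # Clamp the context window to the bounds of the page.
            left = max(0, start - context)
            right = min(len(text), start + len(needle) + context)
            results.append((page_number, text[left:right]))
            start = lowered.find(needle, start + len(needle))
    return results


@lru_cache(maxsize=256)
def cached_search(keyword):
    """Memoize results per keyword; no user information is stored."""
    return tuple(find_snippets(PAGES, keyword))


if __name__ == "__main__":
    for page, snippet in cached_search("manafort"):
        print(page, repr(snippet))
```

Repeating the same query is then served from memory, and `cached_search.cache_info()` reports the hit count.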

~~~
justinjlynn
One can't be forced to divulge data on users one does not record in the first
place.

------
rayrrr
Thanks to this OCR'ed version, I was able to put an NLP text summarizer of the
document into production yesterday. Check it!
[https://news.ycombinator.com/item?id=19815506](https://news.ycombinator.com/item?id=19815506)

------
WheelsAtLarge
Here are a few other options:

Web based and searchable

[https://www.nytimes.com/interactive/2019/04/18/us/politics/m...](https://www.nytimes.com/interactive/2019/04/18/us/politics/mueller-report-document.html)

Original document has been updated so it can be searched

[https://qz.com/1601873/the-pdf-of-the-mueller-report-has-bee...](https://qz.com/1601873/the-pdf-of-the-mueller-report-has-been-updated-to-be-more-accessible/)

------
sm4rk0
Only 7 views since April 22?

------
fxfan
At this point the whole thing feels like Iraq 2.0 and I wonder if I'm getting
suckered by media.

The only thing that worries me, as a progressively disenchanted but firm and
lifelong supporter of Bernie Sanders, is that even Sanders seems to have been
wrong on this one, unlike Iraq, where I would shout from the rooftops that he
was the only politician who got it right. I would (and still do) quote his
'What are you afraid of' video to all who would listen...

The question is just why?

~~~
rdtsc
I'd be interested in what Chomsky would have to say about it. Manufacturing
Consent seems more relevant than ever.

~~~
colordrops
He's said several things about it, and has trashed the Russiagate conspiracy.
He emphasizes both that the US interferes constantly in other elections, and
also points out that Israeli collusion and interference in the US is orders of
magnitude worse than Russian influence:

[https://www.realclearpolitics.com/video/2019/04/01/noam_chom...](https://www.realclearpolitics.com/video/2019/04/01/noam_chomsky_trump-russia_collusion_claims_a_joke.html)

~~~
CriticalCathed
I think you're misinterpreting him. He doesn't say that it's not a bad thing,
or that it's a hoax. He simply says that the US does the same thing abroad and
it's hypocritical to not care about that part of it.

~~~
colordrops
No, I don't believe I'm misinterpreting him. He is clearly showing heavy
disdain for the assertion that the Russians had any material effect on US
elections. He is also pointing out that there is the much greater issue of
Israeli collusion that everyone, including yourself, is ignoring when the
Russia collusion narrative is a "joke" (his words). Everyone please watch the
video for yourselves. It's short.

------
mattnewport
The search function seems to be broken. I searched for instances of "Russian
collusion" and waited for two years for the results and it came back with zero
instances of Russian collusion found...

~~~
valine
The word collusion has no meaning in a legal sense. Try searching for
obstruction of justice.

~~~
s_y_n_t_a_x
[self-censored]

~~~
somebehemoth
False.

"The Mueller report does establish that, in fact, members of the Trump
campaign conspired or coordinated with the Russian government in its election
interference activities."

[https://www.nytimes.com/2019/04/25/opinion/mueller-trump-cam...](https://www.nytimes.com/2019/04/25/opinion/mueller-trump-campaign-russia-conpiracy-.html)

Edit: Since the comment I am replying to is no longer present and I got
downvotes for my comment: the person claimed there was "no evidence [anyone]
conspired with Russia" so I linked to an article in the NYT by a law professor
that contradicts this claim in detail. No evidence _is_ a false claim.

~~~
zaroth
Note this link points to an OpEd.

At least I believe the NYT isn’t so far gone as to publish such a claim
outside of the editorial section, but I may be wrong.

For a different perspective, read what Glenn Greenwald has to say on the
subject, in a piece titled “Robert Mueller Did Not Merely Reject the Trump-
Russia Conspiracy Theories. He Obliterated Them.” [1]

Edit: I actually took the time to read the editorial. It is a collection, in
my opinion, of dubious claims about standards of evidence being too high as
the reason Mueller and his massive team could not bring a single solitary
charge against Trump or the campaign related to Russian interference in the
election.

Please remember how this all started, with a dirty dossier obtained from the
Russians by the DNC, with a counter-intelligence operation claiming without
evidence the President could be a Russian patsy. And yet here we are, still
entirely unable to expose truth to the lie.

[1] - [https://theintercept.com/2019/04/18/robert-mueller-did-not-m...](https://theintercept.com/2019/04/18/robert-mueller-did-not-merely-reject-the-trumprussia-conspiracy-theories-he-obliterated-them/)

~~~
jpfed
>Please remember how this all started, with a dirty dossier

Nope, it began when some foreign diplomats gave us a heads up that George
Papadopoulos was bragging abroad that the Russians were going to help Trump by
releasing dirt on Clinton. That the investigation was predicated on the
dossier is a common misconception. If you are getting news from sources that
get this basic fact wrong, you may wish to downgrade those sources'
credibility in your estimation.

~~~
zaroth
That happened in “late May 2016” right around the time that Fusion GPS hired
Steele.

Note that Steele and Bruce Ohr were already in cahoots for several months by
then, and Steele was already established as a CI with the FBI (a cover he
later blew).

Papadopoulos met Professor Mifsud in London in March, and later Mifsud said he
had ties to the Kremlin and he claimed Russia had dirt on Clinton.
Papadopoulos sent an email two weeks after the London bar meeting saying
Mifsud introduced him to the Russian Ambassador to England and that the
Ambassador might be able to help set up a meeting between Trump and Putin.

It was what Mifsud told Papadopoulos that he would later talk about with the
Greek or Australian diplomat that would be reported back to the FBI.

Interestingly, there are reports that Mifsud is actually a western
intelligence officer, and has no links to the Kremlin whatsoever.

~~~
jpfed
Yes, Steele had good relationships with our intelligence community. I don't
see how that bears on the beginning of the investigation.

I haven't seen anything actually credible that Mifsud was western
intelligence.

~~~
zaroth
Chris Steele (an agent of a foreign government) was dead-set on preventing
Trump from becoming President.

He had high-level contacts at the FBI and DOJ, with whom he was in touch
throughout early 2016.

Hopefully one day it will become fully clear what Steele was telling Bruce and
Nellie Ohr, up until the very day before the FBI opened their investigation,
when Nellie met Chris at the Mayflower Hotel on July 29, 2016.

But we do know at least that it was lies, mostly (Russian?) lies, that Steele
was feeding the FBI. And we know certainly those lies had a bearing on the
character and direction of the investigation, and on the methods employed to
attempt to gather evidence (FISA), even if it is not the publicly admitted
initial basis for starting the investigation.

Mifsud, for his part, has gone to ground and isn’t taking interviews.

