
Google PDF Search: “not for public release” - webmonkeyuk
https://www.google.com/search?as_q=&as_epq=not+for+public+release&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&as_filetype=pdf&as_rights=&gws_rd=ssl
======
ErikRogneby
This was one of the most ironic things I found:
[https://s3.amazonaws.com/reviz-tutorials/The_Pirates_Code.pd...](https://s3.amazonaws.com/reviz-tutorials/The_Pirates_Code.pdf)

~~~
pp19dd
Trying "Index of /backup" gave me a heart attack.

~~~
Apofis
Holy shit... you weren't kidding. The first result is a BANK for Christ's
sake... man.

------
Forbo
There's a whole collection of this kind of search engine query at Hackers for
Charity:

[http://www.hackersforcharity.org/ghdb/](http://www.hackersforcharity.org/ghdb/)

------
mindcrime
You can find neat stuff by adding site: qualifiers as well, like:

[https://www.google.com/search?q=not+for+public+release+filet...](https://www.google.com/search?q=not+for+public+release+filetype:pdf+site:.mil)

or

[https://www.google.com/search?q=top+secret+filetype:pdf+site...](https://www.google.com/search?q=top+secret+filetype:pdf+site:.gov)

and

[https://www.google.com/search?q="five+eyes"+filetype:pdf+sit...](https://www.google.com/search?q="five+eyes"+filetype:pdf+site:.mil)

etc.
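These operators all end up in a single `q` parameter. Here is a minimal Python sketch of composing them, assuming only the `search?q=...` URL form used in the links above. `google_search_url` is a hypothetical helper, not a Google API, and how Google actually interprets the operators is up to Google. It also covers the `-"..."` exclusion operator that comes up later in the thread.

```python
from urllib.parse import urlencode

def google_search_url(phrase, filetype=None, site=None, exclude=None):
    """Build a search URL from an exact phrase plus optional operators.

    Hypothetical helper: filetype, site and exclude map onto the
    filetype:, site: and -"..." operators used in this thread.
    """
    terms = [f'"{phrase}"']
    if filetype:
        terms.append(f"filetype:{filetype}")
    if site:
        terms.append(f"site:{site}")
    if exclude:
        terms.append(f'-"{exclude}"')  # leading minus excludes the phrase
    # urlencode percent-escapes quotes and colons and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})

print(google_search_url("not for public release", filetype="pdf", site=".mil"))
# https://www.google.com/search?q=%22not+for+public+release%22+filetype%3Apdf+site%3A.mil
print(google_search_url("not for public release", filetype="pdf",
                        exclude="not for public release until"))
```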

~~~
pakled_engineer
"For official use only" or "U//FOUO" brings up interesting results. The PDF
"U//FOUO Sovereign citizens extremist ideology" by the FBI was a good read, and
so were all the recent internal Interpol reports about their weapons that have
been "misplaced" or stolen.

------
oskarth
There are even more "Top Secret" documents.

[https://www.google.com/search?as_q=&as_epq=not+for+public+re...](https://www.google.com/search?as_q=&as_epq=not+for+public+release&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&as_filetype=pdf&as_rights=&gws_rd=ssl#as_qdr=all&q=%22top+secret%22+filetype:pdf)

(The above is sarcasm).

~~~
rthomas6
One of them was. It was just from 1960 and probably declassified.

------
PeterWhittaker
Much more interesting if you limit the search by time.

Less than a month old? A single screenful, mostly Australian.

------
Mahn
Off-topic, but am I the only one seeing this?
[http://i.imgur.com/3pnOXot.png](http://i.imgur.com/3pnOXot.png) I thought
Google had moved away from that black bar.

~~~
strictnein
I'm seeing it occasionally.

~~~
Thasc
I seem to remember that they were using it to punish out-of-date browsers. But
since I'm getting it with up-to-date Chrome, it doesn't seem to be very well
targeted.

------
maudineormsby
A lot of these are redacted and appear to be FOIA or similar requests that
have been fulfilled.

~~~
maudineormsby
Then again, after looking, some clearly are not.

------
shanemhansen
This one was quite sad. The suicide of an inmate:
[http://www.drc.ohio.gov/public/after_action_castroA643371.pd...](http://www.drc.ohio.gov/public/after_action_castroA643371.pdf)

~~~
pp19dd
Uh. Unless I'm mistaken, that particular inmate pleaded guilty and was
convicted of 937 various counts raised against him, including murder and rape.
He kidnapped three women (in 2002, 2003 and 2004) and kept them imprisoned in
his basement for nearly 11 years during which time he did horrible,
unspeakable things to them.

~~~
deadalus
morality is subjective

~~~
pp19dd
Alright, I'll bite. People like this guy (serial killers, malignant sociopaths)
operate outside the moral boundaries of society that you're talking about. How
can we possibly evaluate them within those boundaries?

------
rohan404
Isn't this one supposed to be public information anyway?
[http://www.oema.us/files/FBI-OFFICES.pdf](http://www.oema.us/files/FBI-OFFICES.pdf)

~~~
pdxandi
Hmm, it doesn't appear so. The header says, "NOT FOR PUBLIC RELEASE - PUBLIC
SAFETY AGENCY USE ONLY"

------
rusbus
I wonder if those are the sort of links that can leave you on the wrong side
of the Computer Fraud and Abuse Act...

~~~
jdavis703
IANAL, but CFAA violations typically revolve around crafting special URLs, as
in a forced browsing attack. Simply following a URL is, AFAIK, not (yet) a
crime.

~~~
ryanlol
Intent is key, not the technical approach. If you're intentionally trying to
access files you clearly aren't intended to be accessing, you're probably
guilty of unauthorized access.

------
ck2
Now subtract -"not for public release until"

------
ChuckMcM
Priceless:
[https://www.amherst.edu/system/files/media/1349/Finalproj_s1...](https://www.amherst.edu/system/files/media/1349/Finalproj_s10.pdf)

Kind of makes me want to take Geology there, sounds like a fun place.

------
ikeboy
Not quite the same, but the results for
[https://www.google.com/search?q=Hyperlinking+to+the+Site+fro...](https://www.google.com/search?q=Hyperlinking+to+the+Site+from+any+other+website+without+our+initial+and+ongoing+consent)
are interesting.

------
cvsv
Assuming filetype:docx is even worse?

------
misiti3780
wow:
[http://www.deathpenaltyinfo.org/TENNlethinjec.pdf](http://www.deathpenaltyinfo.org/TENNlethinjec.pdf)

~~~
schoen
The "not for public release" portion of that document (pp. 72-75) is not
included in the PDF.

------
feld
Tennessee execution procedures? lovely

------
ocdtrekkie
This is pretty interesting. One did say "Not for Public Release UNTIL", so
that one was presumably intentional, but in a lot of cases webmasters probably
didn't think something would be found and indexed by Google wherever they put
it. And they were wrong.

~~~
seiji
This is a great example of the house of cards all our network systems are
built on top of.

Imagine this scenario: you maintain a network of web servers, database
servers, file servers, etc. They all combine to generate a large website used
by tens of millions of users every month. One day you are just doing a cursory
look over a certain server, but you see something strange. Someone is logged
in to your server. And they have a Russian IP address.

What do you do? Obviously, the first step is you log in to your edge routers
and null route all of Russia. GTFO. Next, you've got an idle session on one
server. What were they doing?

How can you reconstruct what they were doing? bash history? maybe. Network
forensics? Your network probably isn't recording every historical connection
between servers—99.9999% of the time useless—but critical in this case. File
system access? Your file system probably isn't logging every historical
access—useless 99.99999% of the time—but would be really freaking useful in
this case.

So, you investigate their history, double-check some database logs, check
netstat, check lsof, and in the end, you really have no idea what they were
doing at all. Our systems don't leave enough breadcrumbs around to reconstruct
even internal hostile activity, much less to semi-intelligently prevent Google
from indexing confidential information when it's accidentally left exposed.

~~~
jonathonf
WRT detecting Google doing indexing, it's actually trivial. Web server logs
will clearly show Google's web spider(s), and if you want you can set up some
monitoring (lots of methods here, all the way up from a cron job running a
grep).

I can't remember the quote exactly, but if you're reacting to a breach it's
too late.
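A minimal Python sketch of that "cron job running a grep", assuming the Apache/Nginx combined log format. `crawler_hits`, the `/internal/` prefix, and the sample log lines are hypothetical. Matching on the User-Agent string is only a heuristic, since any client can claim to be Googlebot.

```python
import re

# Apache/Nginx "combined" log format (an assumption about the server config):
# host ident user [time] "METHOD path protocol" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, path_prefix, agent_token="Googlebot"):
    """Yield (time, path, status) for requests under path_prefix whose
    User-Agent contains agent_token. User-Agent strings can be spoofed,
    so a match is a hint that a crawler fetched the file, not proof."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["path"].startswith(path_prefix) and agent_token in m["agent"]:
            yield m["time"], m["path"], int(m["status"])

# Made-up log lines using documentation IP addresses.
sample = [
    '192.0.2.10 - - [10/Oct/2015:13:55:36 -0700] "GET /internal/q3-report.pdf HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2015:13:56:01 -0700] "GET /internal/q3-report.pdf HTTP/1.1" 200 2326 "-" "curl/7.43.0"',
]

for hit in crawler_hits(sample, "/internal/"):
    print(hit)  # ('10/Oct/2015:13:55:36 -0700', '/internal/q3-report.pdf', 200)
```

Pointed at an open log file instead of `sample` and run periodically from cron, any printed output can be mailed out by a cron setup that mails job output to its owner, which is about as crude (or simple) as alerting gets.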

~~~
seiji
Obviously this case is detectable, but it's detectable after it happens since
permissions weren't correct in the first place.

Who keeps web logs these days? It's all spyware javascript tracking for pretty
graph printing.

Plus, any notifications depend on actually _instrumenting_ any monitoring or
triggers or processing to even notice your "sensitive" content has been
accessed out of context.

(and this is just web stuff. imagine how impossible it is to track who
forwards your confidential emails or other internal documents around without
your permission.)

~~~
jonathonf
> Who keeps web logs these days? It's all spyware javascript tracking for
> pretty graph printing.

Anyone who needs records of what has been accessed, so larger companies and
organisations.

> Plus, any notifications depend on actually instrumenting any monitoring or
> triggers or processing to even notice your "sensitive" content has been
> accessed out of context.

Yup. Hence a cron job automatically emailing its result (crude (or simple?)
but it would work).

> (and this is just web stuff. imagine how impossible it is to track who
> forwards your confidential emails or other internal documents around without
> your permission.)

I don't have to imagine that. This is why DRM exists; document/knowledge
management systems should have the ability to allow access to information but
not further dissemination. There's still the user education aspect though (and
users don't like change...).

Oh, and the insistence on using external services like Dropbox...
gah. "But, but, everyone else uses it!"

~~~
seiji
You are technically right on all counts.

But we live in a new world. A world of BYOD and now, in 2015, Bring-Your-Own-
SaaS. Employees put content up on company platforms, on third party platforms,
on high heel platforms.

The problem of solving data privacy at a _competent_ level across every
organization is intractable with so many "just do whatever you want" vibes in
the air.

Now, that obviously doesn't happen everywhere, but it happens everywhere until
it doesn't. Biggest offenders are usually non-technical offices: sales using 8
hosted platforms for metrics, email, surveys, project management, job hiring,
etc. All impossible to actually control at any sane level outside of 340 UI
clicks of the mouse across webby webby land.

tl;dr give up and go live in a cave for the next 30 years until all this gets
sorted

------
unsignedint
They should at least have set an owner password on these documents. (In
practice, owner passwords aren't effective at stopping people from ignoring the
restrictions you set on a document, but at least it'll exclude the documents
from indexing, at least by Google.)

~~~
r3bl
I think the bare minimum would be to put them all in one directory and use
robots.txt to hide them from Google.

Sure, it's weak, but at least it won't be accessible through Google.
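Roughly what that would look like, checked with Python's standard `urllib.robotparser`. The `/not-public/` directory and the example.com URLs are hypothetical. Keep in mind that robots.txt is itself publicly readable, so it advertises the directory it names, and it only asks well-behaved crawlers not to fetch those URLs; Google's own documentation notes that a disallowed URL can still appear in results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of one directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /not-public/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("https://example.com/not-public/report.pdf",
            "https://example.com/press/release.pdf"):
    # can_fetch() answers the question a compliant crawler would ask
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(url, verdict)
```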

------
peterwwillis
What's especially crazy about these is that so many have been cached by
Google. Anyone can read these docs and only Google would ever have a record.

------
hellbanner
Ironically, a lot of the top results now are about this phenomenon. Reminds me
of that page that deleted itself when indexed.

------
dan-silver
Combine with site:[url] to narrow the scope. Example: "not for public release"
filetype:pdf site:house.gov

------
qupear
Why use "filetype" and not "ext"? The results are identical.

~~~
monochromatic
If the results are identical, then who cares? If it's just a matter of saving
five keystrokes, I wonder if your 60-keystroke comment was a good use of
time...

~~~
wut42
It may return more results with filetype:, since some server-generated PDFs
don't have the extension.
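To illustrate the distinction being drawn here, a PDF is identified by its `%PDF-` header, not by its name, so a URL's extension and its actual content can disagree. This Python sketch makes no claim about how Google's `filetype:` or `ext:` operators classify files; the URL and payload are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

PDF_HEADER = b"%PDF-"

def looks_like_pdf(body: bytes) -> bool:
    """Check the content: PDF files begin with a %PDF- header.
    (Some readers tolerate leading junk; this sketch only checks the first bytes.)"""
    return body.startswith(PDF_HEADER)

def has_pdf_extension(url: str) -> bool:
    """Check the name: does the URL path end in .pdf (case-insensitive)?"""
    return PurePosixPath(urlsplit(url).path).suffix.lower() == ".pdf"

# Hypothetical server-generated report: no .pdf in the URL, but a PDF body.
url = "https://example.com/reports/download?id=42"
body = b"%PDF-1.4\n% rest of the document..."

print("extension says PDF:", has_pdf_extension(url))  # False
print("content says PDF:  ", looks_like_pdf(body))    # True
```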

~~~
qupear
Isn't ext just an alias for filetype?
[http://www.googleguide.com/advanced_operators_reference.html](http://www.googleguide.com/advanced_operators_reference.html)

~~~
knd775
Then what in the world are you arguing?

