
Google PDF Search: “not for public release” - casefields
https://www.google.com/search?as_q=&as_epq=not+for+public+release&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&as_filetype=pdf&as_rights=&gws_rd=ssl
======
derefr
Sometimes a document is actually no longer confidential, and there’s a newer
version without the watermark, but accidentally the older version is the one
that gets shared/linked to. (Usually the only difference is the watermark;
we’re not talking about documents that get redacted for public consumption
here.)

I encounter this problem pretty frequently with System-on-Chip spec sheets.
They’re confidential while under development; and still confidential while
being floated for sale by private negotiation to private device makers; but
once they’re _in_ a production device, those devices will have their own
public PR pages describing exactly the properties of the SoC, so the SoC can’t
really hide its specs any more at that point.

So, once the SoC manufacturer makes a high-profile sale, they will usually
“declassify” the spec... in the sense of _allowing_ the release of the spec
sheet. But they don’t usually bother to ever _publish_ a copy of the de-
watermarked spec sheet at any canonical URL; they only begin sending the de-
watermarked sheet in place of the watermarked one to new customers.

So, when you find one of these spec sheets online, they’re just as likely to
be a copy that a device maker got from the SoC manufacturer from _before_ the
specs were declassified, which they released once they were told it was okay
to do so. And such documents still have the “confidential” / “not for public
release” watermark.

~~~
dreamcompiler
So it's analogous to the key revocation problem, but more like a key
_invocation_ problem.

------
guessmyname
Not surprising. In all my years as a “Security Analyst” I found tons of
sensitive information like this.

Here’s a database that I always keep at hand whenever I need ideas for my
projects [1].

[1] [https://www.exploit-db.com/google-hacking-database](https://www.exploit-db.com/google-hacking-database)

~~~
hartator
That's interesting. We have a couple of security companies that do a ton of
requests like this on serpapi.com. I always wondered if that approach was
organized somewhere.

~~~
erikig
Smooth placement for a pretty sweet-looking product. How do you get around
Google's ToS?

~~~
hartator
Thank you! Always trying to do an organic placement when there is a relevant
opportunity.

IANAL. Regarding the ToS: in the US, scraping public data is a fair use
exemption under the First Amendment.

~~~
lgats
Could you elaborate on what you mean by "public data"?

I'm not sure that just because some data is in Google search it becomes any
sort of 'public'.

~~~
icedchai
How do you think Google got their data in the first place? Nobody complains
about Google scraping and indexing everyone.

------
jcims
There's a bunch of equivalent statements: 'not for public disclosure', 'for
internal use only', 'company confidential', etc.

If you add 2019 to the search term you'll get a bit more relevant content. You
can also use doc, docx, etc for the filetype.

Be careful. The fact that it's possible to download something doesn't
necessarily give you the legal right to.
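
As a rough sketch of the variations above, the following Python only builds the search URLs and fetches nothing. The parameter names (`as_epq`, `as_filetype`, `as_sitesearch`, `as_qdr`) are copied from the submitted URL; the function and list names are made up for illustration, and any `as_qdr` value other than `all` is an assumption about how the advanced-search form encodes recency.

```python
from urllib.parse import urlencode

# Phrases and file types mentioned in this thread.
PHRASES = [
    "not for public release",
    "not for public disclosure",
    "for internal use only",
    "company confidential",
]
FILETYPES = ["pdf", "doc", "docx"]


def build_search_url(phrase, filetype="pdf", site="", recency="all"):
    """Build an advanced-search URL for an exact phrase and a file type."""
    params = {
        "as_epq": phrase,         # exact phrase
        "as_filetype": filetype,  # e.g. pdf, doc, docx
        "as_sitesearch": site,    # optional domain restriction, empty for none
        "as_qdr": recency,        # "all" is the value used in the submitted URL
    }
    # urlencode escapes spaces as "+", matching the submitted URL.
    return "https://www.google.com/search?" + urlencode(params)


def all_urls():
    """One URL per (phrase, file type) combination."""
    return [build_search_url(p, f) for p in PHRASES for f in FILETYPES]
```

Calling `all_urls()` gives one URL per phrase and file type, which can be opened by hand; the legal caveat above applies to whatever the results link to.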

~~~
no_identd
Especially if you have any government security clearances.

~~~
crankylinuxuser
And if you do it accidentally, you also have a duty to report.

And if it's sensitive stuff (not classified, secret, top secret, or scif) you
can still be forbidden to even provide the link other than to your approved
security contact.

It's really better not to know if you have any sort of clearance.

------
ashleyn
One of my favourite things to do as a teenager was search google for
`intitle:"VNC Viewer for Java"`

~~~
pault
I'll bite: what did that find?

~~~
jasonjayr
VNC Servers on the public internet, that typically used weak passwords. You
could monitor or remote control other people's computers.

~~~
api
There was a site posted here a while back that scanned for wide open VNC and
found a lot of things that looked disturbingly like industrial control
consoles. Want to blow up an oil refinery?

------
oxfordmale
I didn't dare to click any of the links returned by this query:

"top secret" filetype:pdf

The Google summary for my first hit starts with this:

The attached material contains TOP SECRET information which bears directly
upon the effectiveness of our national defense or the conduct.

~~~
guessmyname
Do you mean this —
[https://i.imgur.com/y2z2Zva.png](https://i.imgur.com/y2z2Zva.png) ?

It’s just an empty template —
[https://i.imgur.com/H9u5sYi.png](https://i.imgur.com/H9u5sYi.png)

~~~
blobmaster
[https://github.com/umbrae/reddit-top-2.5-million/blob/master...](https://github.com/umbrae/reddit-top-2.5-million/blob/master/data/tf2.csv/&/p@pinterest.com/p0)

------
KirinDave
I was stupid/bold and downloaded a few. Every hit I got was a heavily redacted
document.

So maybe this isn't as big a deal as folks want to make it out to be? Overly
dramatic, maybe.

------
cmroanirgo
Wow. I added 'site:.au' and on the first page was an old doc related to the
Australian Defence Force's position on the Wikileaks release (2010):

[http://www.defence.gov.au/foi/docs/disclosures/054_1213_Docu...](http://www.defence.gov.au/foi/docs/disclosures/054_1213_Documents.pdf)

It's both marked as "Declassified" and "UNCLASSIFIED - Document not for public
release". Hilarious!

That said, there's some great commentary:

> _The potential for the leaked documents to adversely affect the safety of
> deployed forces, or operational security more broadly, is being assessed by
> the Defence task force. It appears unlikely at this stage that ADF or
> coalition forces will be directly endangered by the leaks. The information
> examined so far is tactical information that is now sufficiently aged that
> it poses minimal threat_

It's interesting to see how despite the apparent "minimal threat" Julian
Assange is still under quite a lot of heat from strong political forces over
this (& other releases).

EDIT: formatting

------
rtkwe
Wonder how many of these are actually still confidential. Some from just
randomly clicking were once confidential and have since been redacted or
otherwise prepared to be made public.

~~~
scott_s
Yes - the second in my list is a private company's internal report, but it's
hosted on Australian Energy Regulator servers. I assume the document was made
public during some discovery process by the government.

------
codingdave
I write content and document workflow products for public government, so was
half-expecting many of these to be from my products, but was pleasantly
surprised. Only 3 documents came up when I searched on our domains. 2 were OK,
just defining which types of document should contain the phrase "not for
public release", and the 3rd had already been corrected.

These kinds of things are absolutely in the hands of the organizations who are
authoring and publishing the documents, not the vendors like myself who
provide the tools, but it is still a relief.

~~~
dragonwriter
> These kinds of things are absolutely in the hands of the organizations who
> are authoring and publishing the documents, not the vendors like myself who
> provide the tools,

If you can detect “publishing smells” like this with a Google search without
access to the tools, you also ought to be able to detect and raise warnings
about them in tools; quite a lot of tooling exists in software development
workflows to catch the programming equivalents of this kind of oversight and
prevent them from getting into deployed code. (And arguably even more similar,
there are a wide array of tools designed to identify and protect against
internal information being exfiltrated, intentionally or accidentally, via
email, etc.)

So while responsibility for the results of what gets published is always the
responsibility of the tool users and not the tool vendors, I don't think it's
accurate to say that this is completely outside the hands of tool vendors.
And, I'm pretty sure customers (especially in the enterprise space) would see
support from publishing toolchains for preventing this as a competitive
advantage.
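
A minimal sketch of what such a warning could look like, assuming the publishing tool can hand over a document's extracted text as a string. The phrase list comes from this thread; `find_markings` and the other names are hypothetical, not part of any real product.

```python
import re

# Markings mentioned in this thread; a real tool would make this configurable.
MARKINGS = [
    "not for public release",
    "not for public disclosure",
    "for internal use only",
    "company confidential",
]

# Allow any whitespace (including line breaks) between the words of a marking,
# since extracted text often wraps in the middle of a phrase.
_PATTERN = re.compile(
    "|".join(r"\s+".join(re.escape(w) for w in m.split()) for m in MARKINGS),
    re.IGNORECASE,
)


def find_markings(text):
    """Return (line_number, matched_text) for every marking found in text."""
    hits = []
    for match in _PATTERN.finditer(text):
        # Line number of the first character of the match, 1-based.
        line_no = text.count("\n", 0, match.start()) + 1
        # Collapse internal whitespace so wrapped matches read as one phrase.
        hits.append((line_no, " ".join(match.group(0).split())))
    return hits
```

A tool could run this just before publishing and ask for confirmation when the list is non-empty. It only flags text; it does not decide whether the document is actually still confidential.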

~~~
codingdave
Let me clarify - it is a legal/strategic decision not to put in tools that
monitor their content, because keeping the line distinct between the content
they author and the tools we provide also keeps the line distinct between who
is responsible for whether that content meets all legislative and regulatory
requirements.

------
dgellow
Just FYI:

Downloading confidential documents, even when they are publicly available
through google, can be a legal liability.

There is at least one case in France where a security researcher/blogger faced
legal issues after downloading confidential documents that were publicly
available when using the correct search terms (see Bluetouff vs French
justice).

I don’t know about the situation in the US for that kind of thing though.

~~~
tantalor
That can't be right. How can you know if the document is confidential unless
you download it?

~~~
anomaloustho
I think the reasoning that it is generally not legal (including in the US) is
that it’s a slippery slope into the ambiguities of hacking. E.g.: “I found a
public link that listed a bunch of addresses marked ‘private.foo.com’. I then
pinged those addresses and they were also publicly available. I then scanned
for documents and other links. I found more links publicly available named
‘classified-do-not-open.pdf’. I downloaded them all and distributed them to
others.”

Or for example when Dropbox disclosed a brief moment when you could sign into
anyone else’s account. What if all their links had just broken and had auth
removed at that time? If you were to exploit that, are you hacking - or was it
the fault of Dropbox? Are you wrong for taking publicly available files? They
were publicly available when you downloaded them - but you knew you shouldn’t
have.

How do you draw the line?

~~~
tantalor
I think a "reasonable person" would believe it's ok to download the file, but
then discard it once they realize the contents are not actually public.

By analogy, suppose you receive a letter in the mail; you reasonably believe
the contents are yours, so you open it. Oops, it was actually mailed to you in
error; the contents belong to another person. Obviously you did not commit a
crime, but you are obligated now to discard the contents.

------
muhammadn
Nothing new. You can find .env files that have database passwords and secrets
all over by typing "filetype:env" into Google.

~~~
ellisv
Personally I like "filetype:id_rsa"

~~~
Someone
I know it wouldn’t fix the leak, but files like those would be extremely
simple for Google to filter out of their index.

For github/gitlab/etc links, they probably also could inform project owners
about the problem.
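
To show how simple the filename side of that filter could be, here is a small sketch. It is an illustration only, not a description of anything Google or a code host actually does; the names are hypothetical and the list covers just the two file types mentioned above.

```python
from pathlib import PurePosixPath

# File names from this thread that commonly hold secrets.
SENSITIVE_NAMES = {".env", "id_rsa"}


def looks_sensitive(path):
    """Return True if the final path component looks like a secrets file."""
    name = PurePosixPath(path).name
    # Exact names such as ".env" or "id_rsa", plus variants like "prod.env".
    return name in SENSITIVE_NAMES or name.endswith(".env")
```

A crawler or a repository hook could call `looks_sensitive` on each path and skip or flag the hits. It looks at names only, so a renamed key file would slip through.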

------
joelx
For even more fun, try filtering by the past year or past month or past week.

------
segfaultbuserr
The Signal-to-Noise ratio is quite low. There are actually quite a lot of
false positives, including...

1\. Declassified/Released documents.

Many historical documents are no longer classified, yet some still carry their original
classification markings. The same goes for commercial documents. For example,
the Trump-Russia dossier was a previously classified document of a private
company.

2\. Whistleblowing documents.

These documents were released by whistleblowers, thus no longer private and
classified. You can find many NSA documents leaked by Edward Snowden hosted on
Amazon AWS owned by news websites.

3\. Boilerplate classification.

Many commercial documents have boilerplate classification, regardless of
whether they are truly sensitive or not. It's very common to find technical
documents and datasheets with "Confidential" markings, which should have been
removed long ago, probably related to (1).

In my experience, these documents make up 85% of the search results; less
than 15% of the documents are truly confidential and represent security
breaches.

------
iliketosleep
I have a feeling this could get people into serious trouble for "hacking" in
some jurisdictions if they were to download some of the files found in the
search. Especially when adding additional search filters such as site:gov

------
lozzo
funny. I think I would even spice up the search with "site:example.com"

------
chauyan
Search for "not for public release" and you get: About 81,600 results
(0.31 seconds)

------
markstos
Now you are a hacker.

------
saemil
I needed a good laugh today. Thanks.

