
Bucket Stream: Finding S3 Buckets by watching certificate transparency logs - Chris911
https://github.com/eth0izzle/bucket-stream
======
matt_wulfeck
> _Randomise your bucket names! There is no need to use company-
> backup.s3.amazonaws.com_

This is really poor advice. It offers no real benefit, especially since any
asset you access will betray your bucket name, because it's part of the DNS
resolution. Bucket names are emphatically public, just as DNS names are
public.
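
A minimal sketch of why randomizing the name doesn't hide the bucket: an unauthenticated HTTP request to any candidate name already reveals whether the bucket exists, purely from the status code. The `probe` helper is illustrative (and needs network access); the status-code mapping is the point.

```python
# Sketch: bucket names are effectively public. An unauthenticated request
# to <name>.s3.amazonaws.com reveals whether the bucket exists, no matter
# how random the name is. probe() is illustrative and needs network access.
import urllib.error
import urllib.request


def classify(status: int) -> str:
    """Map an S3 HTTP status code to what it reveals about the bucket."""
    if status == 404:
        return "no such bucket"
    if status == 403:
        return "bucket exists (access denied)"
    if status == 200:
        return "bucket exists and is publicly listable"
    return "unknown"


def probe(bucket: str) -> str:
    """Anonymously probe a candidate bucket name (hypothetical usage)."""
    url = f"https://{bucket}.s3.amazonaws.com/"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as e:
        return classify(e.code)
```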

~~~
jcims
I wouldn't call it poor advice. It isn't a control, more security by
obscurity, but it doesn't exactly hurt anything either. I saw a situation
recently where a bucket was accidentally opened to the world, but the name was
a UUID and in the entire history of the bucket no request was logged other
than from the intended clients.

~~~
IanCal
> but it doesn't exactly hurt anything either.

It hurts me if I'm trying to remember the bucket I'm after.

Is fc20d856-2a7e-41ab-b072-9bb9a68c6bda production or
193565ac-9121-4071-8aeb-62f3111c4c97 or is that the dev setup or the staging
data for the other service or...

To me the big question here is why these names have to be global. Why can't I
have a UUID externally but a name and an account internally? Honest question,
I assume there may be a significant issue as smarter people than me decided
not to do it that way.
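
One common workaround for the memorability problem, sketched below under the assumption that you control your own deployment config: keep the random names external and maintain a human-readable alias map on your side (the alias names are hypothetical; the UUIDs are the ones from the comment above).

```python
# Hypothetical workaround: random bucket names externally, readable
# aliases internally. The mapping lives in your own config, not in AWS.
BUCKET_ALIASES = {
    "prod-backups": "fc20d856-2a7e-41ab-b072-9bb9a68c6bda",
    "staging-data": "193565ac-9121-4071-8aeb-62f3111c4c97",
}


def bucket_for(alias: str) -> str:
    """Resolve a human-readable alias to the real (random) bucket name."""
    try:
        return BUCKET_ALIASES[alias]
    except KeyError:
        raise KeyError(f"unknown bucket alias: {alias}") from None
```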

~~~
TheDong
I've heard many aws employees lament the global namespace of s3 bucket names.
They think it's a mistake too.

Though if they weren't global, they'd probably be "name.accountid.s3...."
which isn't really obscure either since aws account ids are semi-public.

------
feelin_googley
Could be more general: finding subdomains by watching CT logs.

So what is the problem here?

How to "hide" private subdomains?

How to "securely" configure S3 buckets?

IMO, the problem is in the use of the CA system, where control over "names"
(e.g. subdomains) is shared with third parties (certificate issuers) instead
of being solely with the user who wants to reserve names.

It is possible to have a non-CA PKI system where the user controls both the
issuance of the public key _and_ the associated name she will use. In such a
system, no third party has control over names. People learn the user's name
and the user's key from the same source: the user.

Thus there is no issue of trust re: using third parties, and thus no need for
monitoring what names the third parties are issuing, e.g. via "certificate
transparency" logs. CT logs do not need to exist.

This is not a new idea and it has been proven to work. I can prepare a post
with examples if anyone is interested.

~~~
cortesoft
If you have a wildcard cert, you don't have to share the subdomains with the
CA.

~~~
noway421
More importantly: why doesn't S3 use a wildcard SSL cert? I find it strange
that they would queue DNS changes on a simple bucket provision.

~~~
rocqua
Because then amazon would have trivial access to all connections to s3
buckets.

------
notyourwork
> Randomise your bucket names! There is no need to use company-
> backup.s3.amazonaws.com.

I don't think this is a globally true statement. Random bucket names are hard
to work with; not everyone uses S3 through code and configuration, so a
memorable bucket name actually matters.

------
jstanley
Passive DNS might be another good way to get S3 bucket names.

There doesn't seem to be a Wikipedia article on Passive DNS, but this article
explains it quite well:
[https://help.passivetotal.org/passive_dns.html](https://help.passivetotal.org/passive_dns.html)

Basically some resolvers submit all (some?) of their DNS query responses to a
central database so that it can be searched later. It seems you can also
install a passive "sensor" in your network that (presumably) passively MITMs
DNS queries and then sends off the responses.

I don't know how hard it is to get access to the data, but:

> programs like RiskIQ's DNSIQ allow organizations to install a sensor on
> their network that reports back to RiskIQ and in exchange, the organization
> gains access to all the passive DNS traffic inside the central repository.

EDIT: VirusTotal has some passive DNS data publicly available: e.g. look in
"observed subdomains" [https://www.virustotal.com/en/domain/s3-us-
west-2.amazonaws....](https://www.virustotal.com/en/domain/s3-us-
west-2.amazonaws.com/information/)

EDIT2: And a bunch of them appear to be unprotected...

------
jcims
I did some analysis a few months ago and collected the names of approximately
100,000 buckets in the wild. Rough numbers, about 5% are open to the public
for anonymous read, and about 5% of those are open for anonymous write.

I'm convinced that Chris Vickery, the guy behind a good many of the open
bucket finds this year, has access to enterprise firewall/proxy logs. Not
because the buckets would have been hard to find, but because you could spend
a lifetime looking through thousands upon thousands of open buckets before you
find anything interesting.

------
kaivi
Love stuff like that! I've quickly wrapped a prettifier for S3 xml listings in
a userscript, so you can use it with Tampermonkey Beta. Tested on Chrome under
OS X.

[https://gist.github.com/kaivi/8114cbc2080da78d67c94238af64210d](https://gist.github.com/kaivi/8114cbc2080da78d67c94238af64210d)

Edit: Okay, the userscript won't run on larger XML files, gotta figure it out
later.
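
For reference, the same extraction in Python rather than a userscript: S3 bucket listings are `ListBucketResult` XML documents with object names in `Contents/Key` elements. The sample XML below is made up; the namespace is the standard S3 one.

```python
# Sketch: pull object keys out of an S3 ListBucketResult XML listing
# (the same data the userscript prettifies). Sample XML is made up.
import xml.etree.ElementTree as ET

S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_keys(listing_xml: str) -> list[str]:
    """Return the object keys from a ListBucketResult document."""
    root = ET.fromstring(listing_xml)
    return [c.findtext(f"{S3_NS}Key") for c in root.iter(f"{S3_NS}Contents")]


sample = """<?xml version="1.0"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>example-bucket</Name>
  <Contents><Key>backups/db.sql.gz</Key></Contents>
  <Contents><Key>images/logo.png</Key></Contents>
</ListBucketResult>"""

print(list_keys(sample))  # ['backups/db.sql.gz', 'images/logo.png']
```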

------
michaelbuckbee
This is concerning because there have been a number of high-profile data
breaches that occurred due to over-reliance on S3 bucket obscurity, where
buckets were left with minimal or misconfigured permissions and GBs of data
there for the downloading.

~~~
finnn
How is this concerning? This is very good, because it makes finding these
buckets easy, which makes the risk much harder to dismiss as "something that
will never happen".

~~~
michaelbuckbee
Concerning in the sense of "if you aren't sure why this is a story on HN" ->
that you may be unaware that many large and generally technically competent
firms are screwing this up and this repo/tool is yet one more reason to take
this seriously.

------
realusername
I was curious, so I tried to see whether I could find anything compromising
with it. It's mostly just public buckets of images used by websites, so
nothing strange. Maybe the README is a bit too dramatic.

------
ceejayoz
I'm confused. Aren't S3 buckets secured by pre-existing wildcard certs?

~~~
tptacek
Ignore any direct connection between S3 buckets themselves and particular
certificates, and just think of the stream of domain names you get from CT as
the seed for a dictionary to grind against S3.
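
The "dictionary seed" idea can be sketched as follows: take each domain seen in a CT log and derive candidate bucket names from its labels. The suffix list here is illustrative, not bucket-stream's actual permutation list.

```python
# Sketch: seed a bucket-name dictionary from a domain seen in a CT log.
# The suffix permutations are illustrative, not bucket-stream's real list.
def bucket_candidates(domain: str) -> list[str]:
    """Derive candidate S3 bucket names from a CT-logged domain name."""
    # "*.shop.example.com" -> labels ["shop", "example", "com"]
    labels = domain.lower().lstrip("*.").split(".")
    # Skip a leading "www" label; use the first meaningful label as the base.
    base = labels[1] if labels[0] == "www" else labels[0]
    suffixes = ["", "-backup", "-dev", "-staging", "-assets"]
    return [f"{base}{s}" for s in suffixes]


print(bucket_candidates("www.example.com"))
# ['example', 'example-backup', 'example-dev', 'example-staging', 'example-assets']
```

Each candidate would then be checked anonymously against S3, which is exactly the "grind" step.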

~~~
mynewtb
But why do we get those domain names if there (supposedly) is an existing
wildcard certificate?

~~~
simcop2387
To put the s3 bucket under another domain. Such as static.example.com instead
of abcdef01123451523245.s3.amazonaws.com (or whatever it is).
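
This also closes the loop with the CT angle: in the plain CNAME-style setup (no CloudFront), the bucket is conventionally named after the custom hostname, so a cert for that hostname in a CT log directly names a bucket. A small sketch, with hypothetical hostnames:

```python
# Sketch: for CNAME-style S3 hosting, the bucket is named after the custom
# host, so a CT-logged cert for static.example.com directly names a bucket.
# Hostnames here are hypothetical.
def s3_endpoint_for_cname(hostname: str) -> str:
    """The S3 target a CNAME like static.example.com points at."""
    return f"{hostname}.s3.amazonaws.com"


print(s3_endpoint_for_cname("static.example.com"))
# static.example.com.s3.amazonaws.com
```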

