Bucket Stream: Finding S3 Buckets by watching certificate transparency logs (github.com)
122 points by Chris911 11 months ago | 48 comments



>Randomise your bucket names! There is no need to use company-backup.s3.amazonaws.com

This is really poor advice. It offers no real benefit, especially since any asset you access will betray your bucket name because it's part of the DNS resolution. Bucket names are emphatically public as much as a DNS name is public.


True.

It can also create more problems. If you name something like companyname-production vs companyname-qa, you pretty much know right off the bat which environment you are about to mess up. Not so with random names or UUIDs.

This is also security by obscurity. If all one needs to know is the bucket name, you have already lost.

EDIT: As an exception to this, I randomize a portion of the bucket name when it is created by automation. But this is solely to avoid name clashes across separate clusters. The prefix will still be the same.
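A minimal sketch of that naming approach: a fixed, readable prefix plus a short random suffix to avoid clashes. The function name `make_bucket_name` and the 4-byte suffix are illustrative assumptions, not something taken from the comment above.

```python
import secrets


def make_bucket_name(prefix: str, suffix_bytes: int = 4) -> str:
    """Return '<prefix>-<random hex>' so the environment stays readable."""
    # token_hex(n) returns 2*n lowercase hex characters.
    return f"{prefix}-{secrets.token_hex(suffix_bytes)}"


name = make_bucket_name("companyname-qa")
print(name)  # the suffix differs on every run
```

The prefix still tells you which environment you are about to touch; the suffix only exists to keep names unique, not to hide anything.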


> This is also security by obscurity.

I see this being claimed a lot, but isn’t all security by obscurity at the end of the day?

A simplistic example, compare (A) with (B).

A) I run telnet with no password on a random port. The chance of an attacker guessing my port is 1/65k.

B) I run telnet on port 25 with the password being a random number from 1 to 65k.

How do A and B differ in security?


> I see this being claimed a lot, but isn’t all security by obscurity at the end of the day?

I Do Not Think It Means What You Think It Means [1].

To elaborate, the concept is not formal/mathematical, it's a design concept. You can distinguish between a security implementation that explicitly depends on a secret key or password, and an implementation that implicitly relies upon secret implementation details for its security. The latter is not intentionally designed as a carefully-controlled secret, and therefore much easier to accidentally leak.

[1] https://en.wikipedia.org/wiki/Security_through_obscurity


You are right, I did, but I think so does the parent of my original reply.

The GP of the original reply said "Randomise your bucket names" and the parent said this is "Security by obscurity".

The point I was trying to make, was that using a random name, as the GP suggested, is as good as using some kind of security with a password of the same strength.

Assuming there is no way for somebody to get a list of all the buckets, and therefore not having to "guess" the name.

But yeah, it has nothing to do with security through obscurity. Sorry.


Whilst they don't in your example, choosing a password between 1 and 65K is a very bad decision to begin with (assuming the attacker knows this; if they don't, the password search space is far larger than the port search space).

In general, A does not improve your security 65K times, since a single attempt will tell you whether there is telnet on the port or not, whereas with B all you know is that you got the wrong password.

Now, if you ran a dummy telnet that always gave slow 'wrong password' responses on the other (65K-1) ports, that would potentially increase the security 65K times, but it still isn't really a meaningful thing to do.


You are completely missing the point of the example.


I hope the example was not implying that the “security” was just lack of knowledge of the random telnet port. Running a port scan to find out is incredibly easy (or just use Shodan).


This comment is a great example of one of the worst aspects of HN.


A is a realistic case of security by obscurity - there's a sizable amount of people who believe that to be secure.

B is in my opinion much less realistic: very few people believe a password two bytes long (or better, with two bytes of entropy) to be secure. Even a trivial password like "TelnetSucks" scores 31 bits of entropy with https://apps.cygnius.net/passtest/.
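To put numbers on "two bytes of entropy", here is a small sketch that assumes every option is equally likely. The helper name `entropy_bits` is made up for illustration, and the second figure is only an upper bound for a truly random 11-letter string, not an estimate for a real password like "TelnetSucks".

```python
import math


def entropy_bits(choices: int) -> float:
    """Bits of entropy in one uniformly random pick out of `choices` options."""
    return math.log2(choices)


# A random port or a random number out of 65536: 16 bits either way.
print(entropy_bits(65536))  # 16.0

# Upper bound for 11 characters drawn uniformly from 52 letters.
print(round(entropy_bits(52 ** 11), 1))
```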


That app doesn't seem to be great. It complains when a password contains spaces or is longer than 16 words.


A) brute force takes 65K max attempts

B) unless you know the password MUST be a number and MUST be between 1 and 65K (which is a terrible password requirement, e.g. a password of max length 65000 using only digits 0-9 is as good as no password), you need to brute force the entire known character space up to some finite number. The sun will die first.


> unless you know the password MUST be a number and MUST be between 1 and 65K

But with A you don't get in unless you know the password MUST be empty and there MUST be a telnet server on a random port. What's the difference?


> What's the difference?

The problem is one of probabilities: even the most basic script-kiddie scanner is set up to find your telnet server. Right now there are hundreds, if not thousands, of machines scanning the entire IPv4 space over and over for exactly this kind of silly configuration. If you do something like this, it will eventually be found and used.


You are missing the point of the example.


If you've had to say this to both of the commentators who've replied, perhaps you should try to come up with a different example that better conveys the point you are trying to make...


No, the responses to the comment are a common fallacy on display, where rather than addressing the point of the thought experiment, which is clear enough, people attack the premise. There is no amount of defensive writing[1] that can bring relief to this situation.

1. https://pchiusano.github.io/2014-10-11/defensive-writing.htm...


I actually somewhat agree with your point; your example is simply not realistic. Your point is correct because people are using the term security by obscurity wrong. Security by obscurity means that you rely on the secret implementation of your algorithm. Our best encryption algorithms are public so they can be poked and peer reviewed. You are right that, through enough obscurity of the key, you attain security, as it's statistically infeasible to brute force.

If you have a public & unlisted endpoint that looks like

https://example.com/VERYLONGANDRANDOMKEY

You might argue it's as good as a request to

https://example.com

with an Authorization header containing this key for example.

(Well, not exactly the same, as most access logs will include the first and not the second, but for the sake of the argument)
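A rough sketch of the two variants, built with Python's standard `urllib.request.Request` and never actually sent. Using the raw key as the `Authorization` value is a simplification for illustration; a real service would define its own scheme.

```python
from urllib.request import Request

KEY = "VERYLONGANDRANDOMKEY"  # placeholder from the comment above

# Variant 1: the secret is part of the URL path.
in_path = Request(f"https://example.com/{KEY}")

# Variant 2: the secret travels in a header; the URL itself reveals nothing.
in_header = Request("https://example.com", headers={"Authorization": KEY})

print(in_path.full_url)
print(in_header.full_url, in_header.get_header("Authorization"))
```

The key is equally hard to guess in both cases; what differs is where it can leak, since URLs tend to end up in access logs while headers usually do not.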

p.s. I don't agree for example that

VERYLONGANDRANDOMKEY.example.com is the same, as if I'm not mistaken, if you just scan the entire IP range, then try to do a reverse DNS lookup, you'll end up finding it anyway.


Ahh.. OK. Yeah, I wasn't trying to make a realistic example. Yes, completely agree with your reply.

By the way I think the reason that people, including myself, are confused about what exactly security by obscurity means, is that even the experts don't explain it very clearly.

An example that always comes to my mind when we talk about security by obscurity is the one given in the "Applied Cryptography" book:

"If I take a letter, lock it in a safe, hide the safe somewhere in New York, then tell you to read the letter, that's not security. That's obscurity."


At the most abstract level, security is RISK management which is related to SECRETS management. So, on some level it is true that security is equivalent to obscurity. But, that's like saying that cars are molecules. It is true, but it is not a useful statement.

There are two operative principles of security that you should research. 1) Defense in depth, where there is more than one layer of security that must be pierced. 2) Assume that the attacker knows absolutely everything about your system, design, ports, and so on - except for the key material.


I can think of one advantage: it makes it difficult for somebody to attack you with a typo attack. If all your buckets have a consistent naming scheme that is very strict, then somebody else could make a bucket very similar to one of yours where a typo would be likely, and your data starts going to them.


I wouldn't call it poor advice. It isn't a control, more security by obscurity, but it doesn't exactly hurt anything either. I saw a situation recently where a bucket was accidentally opened to the world, but the name was a UUID and in the entire history of the bucket no request was logged other than from the intended clients.


> but it doesn't exactly hurt anything either.

It hurts me if I'm trying to remember the bucket I'm after.

Is fc20d856-2a7e-41ab-b072-9bb9a68c6bda production or 193565ac-9121-4071-8aeb-62f3111c4c97 or is that the dev setup or the staging data for the other service or...

To me the big question here is why these names have to be global. Why can't I have a UUID externally but a name and an account internally? Honest question, I assume there may be a significant issue as smarter people than me decided not to do it that way.


I've heard many aws employees lament the global namespace of s3 bucket names. They think it's a mistake too.

Though if they weren't global, they'd probably be "name.accountid.s3...." which isn't really obscure either since aws account ids are semi-public.


> in the entire history of the bucket no request was logged other than from the intended clients

This sounds sort of like dumb luck. It just means no one was looking for it; that doesn't mean it's secure. This all reminds me of the xkcd about making passwords that are easy for computers to guess and hard for people to remember[0].

Your security on buckets should be the bucket policy/permissions themselves, not the arbitrary naming of them. Security by obscurity is rarely secure and more about the illusion of security.

[0] https://xkcd.com/936/


I couldn't agree more with your second point, but risk is usually considered the product of likelihood and impact. If I name my bucket 'bestbuy' vs '4fc6-43b0-bc19-75fe07e06133', the likelihood that some random is going to find my bucket increases dramatically.


The chance of it being found by someone guessing the name would increase dramatically. The chance of it being found by someone running a script that searches for buckets using DNS logs, code searches, etc would be the same.

Hackers don't often try to guess things. They run scripts. That's why it doesn't matter what you call the bucket.


Could be more general: finding subdomains by watching CT logs.

So what is the problem here?

How to "hide" private subdomains?

How to "securely" configure S3 buckets?

IMO, the problem is in the use of the CA system, where control over "names" (e.g. subdomains) is shared with third parties (certificate issuers) instead of being solely with the user who wants to reserve names.

It is possible to have a non-CA PKI system where the user controls both the issuance of the public key and the associated name she will use. In such a system, no third party has control over names. People learn the user's name and the user's key from the same source: the user.

Thus there is no issue of trust re: using third parties, and thus no need for monitoring what names the third parties are issuing, e.g. via "certificate transparency" logs. CT logs do not need to exist.

This is not a new idea and it has been proven to work. I can prepare a post with examples if anyone is interested.


> Could be more general: finding subdomains by watching CT logs.

Yep. Can use crt.sh for this on a per domain level, I also wrote ausdomainledger.net as an experiment to index all subdomains in the .au TLD, querying the CT logs directly, which was a bunch of fun.

> How to "hide" private subdomains?

Symantec provides the option of label redaction (using the '?' symbol) for CT precerts with the certificates they issue. For example: https://crt.sh/?q=?.amazon.com.au . However, I'm pretty sure it's not supported by the CT RFC ...

Otherwise, I'd say wildcards.

Replacing the CA PKI with something else is very drastic and if possible, will probably take a very long time ...


If you have a wildcard cert, you don't have to share the subdomains with the CA.


More importantly: why doesn't S3 use a wildcard SSL cert? I find it strange that they would queue DNS changes on a simple bucket provision.


Because then amazon would have trivial access to all connections to s3 buckets.


I think support for wildcards is coming next year (to let's encrypt)


> Randomise your bucket names! There is no need to use company-backup.s3.amazonaws.com.

I don't think this is a globally true statement. Random bucket names are hard to work with; not everyone is using S3 through code configuration, and therefore remembering the bucket name is actually important.


Passive DNS might be another good way to get S3 bucket names.

There doesn't seem to be a Wikipedia article on Passive DNS, but this article explains it quite well: https://help.passivetotal.org/passive_dns.html

Basically some resolvers submit all (some?) of their DNS query responses to a central database so that it can be searched later. It seems you can also install a passive "sensor" in your network that (presumably) passively MITMs DNS queries and then sends off the responses.

I don't know how hard it is to get access to the data, but:

> programs like RiskIQ's DNSIQ allow organizations to install a sensor on their network that reports back to RiskIQ and in exchange, the organization gains access to all the passive DNS traffic inside the central repository.

EDIT: VirusTotal has some passive DNS data publicly available: e.g. look in "observed subdomains" https://www.virustotal.com/en/domain/s3-us-west-2.amazonaws....

EDIT2: And a bunch of them appear to be unprotected...


I did some analysis a few months ago and collected the names of approximately 100,000 buckets in the wild. Rough numbers, about 5% are open to the public for anonymous read, and about 5% of those are open for anonymous write.

I'm convinced that Chris Vickery, the guy behind a good many of the open bucket finds this year, has access to enterprise firewall/proxy logs. Not because the buckets would have been hard to find, but because you could spend a lifetime looking through thousands upon thousands of open buckets before you find anything interesting.


Love stuff like that! I've quickly wrapped a prettifier for S3 xml listings in a userscript, so you can use it with Tampermonkey Beta. Tested on Chrome under OS X.

https://gist.github.com/kaivi/8114cbc2080da78d67c94238af6421...

Edit: Okay, the userscript won't run on larger XML files, gotta figure it out later.


This is concerning because there have been a number of high-profile data breaches that occurred due to over-reliance on S3 bucket obscurity, where the buckets were left with minimal or misconfigured permissions and GBs of data there for the downloading.


How is this concerning? This is very good, because it makes it easy to do that, which means it's much harder to dismiss as "something that will never happen".


Concerning in the sense of "if you aren't sure why this is a story on HN" -> that you may be unaware that many large and generally technically competent firms are screwing this up and this repo/tool is yet one more reason to take this seriously.


At some point an organization living in the cloud needs to properly secure their cloud resources. This makes it easier to justify that effort up front.


Correct me if I’m wrong but last time I tried to make a new bucket’s contents public it was a real PITA. The default configuration is very locked down. So I think it’s never a case of minimal configuration and always misconfiguration.


I was curious, so I tried to see if I could find anything compromising with it, and it's mostly just public buckets of images used for websites, so nothing strange. Maybe the README is a bit too dramatic.


I'm confused. Aren't S3 buckets secured by pre-existing wildcard certs?


Ignore any direct connection between S3 buckets themselves and particular certificates, and just think of the stream of domain names you get from CT as the seed for a dictionary to grind against S3.


But why do we get those domain names if there (supposedly) is an existing wildcard certificate?


To put the s3 bucket under another domain. Such as static.example.com instead of abcdef01123451523245.s3.amazon.com (or whatever it is).


The code takes the CT hostname and tries to access a bunch of different buckets that might exist related to that hostname. So if you get a cert for foo.example.com it will ask s3 if foo.example.com.s3.amazonaws.com and www-foo.example.com.s3.amazonaws.com exist.
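As a rough sketch of that permutation step: the function name `candidate_buckets` is hypothetical, and the prefix list contains only the two variants mentioned above, not whatever full list the actual tool uses.

```python
def candidate_buckets(hostname: str, prefixes=("", "www-")) -> list[str]:
    """Build S3 hostnames to probe for a name seen in a CT log."""
    return [f"{p}{hostname}.s3.amazonaws.com" for p in prefixes]


for url in candidate_buckets("foo.example.com"):
    print(url)
# foo.example.com.s3.amazonaws.com
# www-foo.example.com.s3.amazonaws.com
```

Each candidate would then be requested to see whether such a bucket exists; that network step is left out here.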



