This is really poor advice. It offers no real benefit, especially since any asset you access will betray your bucket name because it's part of the DNS resolution. Bucket names are emphatically public as much as a DNS name is public.
It can also create more problems. If you name something like companyname-production vs companyname-qa, you pretty much know right off the bat which environment you are about to mess up. Not so with random names or UUIDs.
This is also security by obscurity. If all one needs to know is the bucket name, you have already lost.
EDIT: As an exception to this, I randomize a portion of the bucket name when it is created by automation. But this is solely to avoid name clashes across separate clusters. The prefix will still be the same.
I see this being claimed a lot, but isn’t all security by obscurity at the end of the day?
A simplistic example, compare (A) with (B).
A) I run telnet with no password on a random port. The chances of an attacker guessing my port are 1/65k.
B) I run telnet on port 25 with the password being a random number from 1 to 65k.
How do A and B differ in security?
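Purely in terms of guessing odds, the two schemes above really are the same; a minimal sketch (the 65k figure is the commenter's own framing):

```python
# Both schemes hide exactly one secret drawn uniformly from ~65k values,
# so a single blind guess succeeds with the same probability in each.
PORTS = 65_535          # scheme A: the secret is the port number
PASSWORDS = 65_535      # scheme B: the secret is a number from 1 to 65k

p_guess_a = 1 / PORTS
p_guess_b = 1 / PASSWORDS
assert p_guess_a == p_guess_b  # identical single-guess odds

# The practical difference, raised further down the thread: probing a port
# is cheap and unambiguous (a scanner learns "open" or "closed"), while a
# wrong password attempt can be rate-limited, logged, or answered slowly.
```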
I Do Not Think It Means What You Think It Means.
To elaborate, the concept is not formal/mathematical, it's a design concept. You can distinguish between a security implementation that explicitly depends on a secret key or password, and an implementation that implicitly relies upon secret implementation details for its security. The latter is not intentionally designed as a carefully-controlled secret, and therefore much easier to accidentally leak.
The GP of the original reply said "Randomise your bucket names" and the parent said this is "Security by obscurity".
The point I was trying to make was that using a random name, as the GP suggested, is as good as using some kind of security with a password of the same strength.
That assumes there is no way for somebody to get a list of all the buckets, which would let them skip "guessing" the name entirely.
But yeah, it has nothing to do with security through obscurity. Sorry.
In general, A does not improve your security 65K-fold, since a single attempt tells you whether there is a telnet server on that port, whereas with B all you learn is that you got the wrong password.
Now, if you ran dummy telnet servers that always give slow 'wrong password' responses on the other (65K-1) ports, that would potentially increase the security 65K-fold, but it still isn't really a meaningful thing to do.
B is in my opinion much less realistic: very few people believe a password two bytes long (or better, with two bytes of entropy) to be secure. Even a trivial password like "TelnetSucks" scores 31 bits of entropy with https://apps.cygnius.net/passtest/.
B) Unless you know the password MUST be a number and MUST be between 1 and 65K (which is a terrible password requirement; a numeric password with a value of at most 65,000 is as good as no password), you need to brute-force the entire known character space up to some finite length. The sun will die first.
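The gap the comment above describes can be made concrete with entropy figures; a small sketch (the example keyspaces beyond the 65k one are illustrative choices, not from the thread):

```python
import math

def entropy_bits(keyspace: int) -> float:
    """Entropy of a secret drawn uniformly from `keyspace` values."""
    return math.log2(keyspace)

print(round(entropy_bits(65_535), 1))    # ~16.0 bits: a port number, or a number in 1..65k
print(round(entropy_bits(10 ** 11), 1))  # ~36.5 bits: 11 random decimal digits
print(round(entropy_bits(62 ** 20), 1))  # ~119.1 bits: 20 random alphanumeric characters
```

At ~16 bits, either secret falls to a trivial online brute force; an unconstrained character space is what pushes the search past any attacker's lifetime.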
But with A you don't get in unless you know the password MUST be empty and there MUST be a telnet server on a random port. What's the difference?
The problem is one of probabilities: even the most basic script-kiddie scanner is set up to find your telnet server. Right now there are hundreds, if not thousands, of machines scanning the entire IPv4 space over and over for exactly this kind of silly configuration. If you do something like this, it will eventually be found and used.
If you have a public & unlisted endpoint that looks like
You might argue it's as good as a request to
with an Authorization header containing this key for example.
(Well, not exactly the same, as most access logs will include the first but not the second, but for the sake of argument.)
P.S. I don't agree, for example, that VERYLONGANDRANDOMKEY.example.com is the same, because, if I'm not mistaken, you could scan the entire IP range, do a reverse DNS lookup on each address, and end up finding it anyway.
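The key-in-URL vs. key-in-header distinction above can be sketched with the standard library (the hostname and key are the hypothetical ones from the example):

```python
from urllib.request import Request

KEY = "VERYLONGANDRANDOMKEY"  # hypothetical secret from the example above

# Variant 1: secret embedded in the path. It lands in access logs, browser
# history, and Referer headers, and in DNS if used as a subdomain instead.
in_url = Request(f"https://example.com/{KEY}")

# Variant 2: the same secret carried in a header. It is typically absent
# from standard access logs, though equivalent against pure guessing.
in_header = Request("https://example.com/",
                    headers={"Authorization": f"Bearer {KEY}"})

assert KEY in in_url.full_url
assert KEY not in in_header.full_url
assert KEY in in_header.get_header("Authorization")
```

Against a blind brute force the two are equally strong; they differ only in how many places the secret gets written down.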
By the way I think the reason that people, including myself, are confused about what exactly security by obscurity means, is that even the experts don't explain it very clearly.
An example that always comes to my mind when we talk about security by obscurity is the one given in the "Applied Cryptography" book:
"If I take a letter, lock it in a safe, hide the safe somewhere in New York, then tell you to read the letter, that's not security. That's obscurity."
There are two operative principles of security that you should research. 1) Defense in depth, where there is more than one layer of security that must be pierced. 2) Assume that the attacker knows absolutely everything about your system, design, ports, and so on - except for the key material.
It hurts me if I'm trying to remember the bucket I'm after.
Is fc20d856-2a7e-41ab-b072-9bb9a68c6bda production, or is it 193565ac-9121-4071-8aeb-62f3111c4c97? Or is that the dev setup, or the staging data for the other service, or...
To me the big question here is why these names have to be global. Why can't I have a UUID externally but a name and an account internally? Honest question, I assume there may be a significant issue as smarter people than me decided not to do it that way.
Though if they weren't global, they'd probably be "name.accountid.s3...." which isn't really obscure either since aws account ids are semi-public.
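A middle ground between memorable names and pure UUIDs is the prefix-plus-random-suffix scheme mentioned in the edit near the top of the thread; a small sketch (the project/environment names are hypothetical):

```python
import secrets

def bucket_name(project: str, env: str, random_bytes: int = 4) -> str:
    """Readable prefix for humans, random suffix to avoid global name
    clashes across clusters. The suffix adds uniqueness, not security:
    access control still belongs in the bucket policy."""
    suffix = secrets.token_hex(random_bytes)  # 8 hex characters
    return f"{project}-{env}-{suffix}"

name = bucket_name("companyname", "production")
# e.g. 'companyname-production-9f3a1c2e': the environment stays obvious
# at a glance, so you still know which one you're about to mess up.
```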
This sounds sort of like dumb luck. It just means no one was looking for it; that doesn't mean it's secure. This all reminds me of the xkcd about making passwords that are easy for computers to guess and hard for people to remember.
Your security on buckets should be the bucket policy/permissions themselves, not the arbitrary naming of them. Security by obscurity is rarely secure and more about the illusion of security.
Hackers don't often try to guess things. They run scripts. That's why it doesn't matter what you call the bucket.
So what is the problem here?
How to "hide" private subdomains?
How to "securely" configure S3 buckets?
IMO, the problem is in the use of the CA system, where control over "names" (e.g. subdomains) is shared with third parties (certificate issuers) instead of being solely with the user who wants to reserve names.
It is possible to have a non-CA PKI system where the user controls both the issuance of the public key and the associated name she will use. In such a system, no third party has control over names. People learn the user's name and the user's key from the same source: the user.
Thus there is no issue of trust re: using third parties, and thus no need for monitoring what names the third parties are issuing, e.g. via "certificate transparency" logs. CT logs do not need to exist.
This is not a new idea and it has been proven to work. I can prepare a post with examples if anyone is interested.
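The "same source" model described above works roughly like SSH-style key pinning; a hypothetical minimal sketch (names and storage are invented for illustration, not the commenter's design):

```python
import hashlib

# The user hands out (name, public key) together; a client pins the key's
# fingerprint under that name. No third party ever vouches for the name,
# so there is nothing for a CT log to monitor.

def fingerprint(public_key: bytes) -> str:
    return hashlib.sha256(public_key).hexdigest()

pinned: dict[str, str] = {}  # client-side store, analogous to SSH's known_hosts

def learn(name: str, public_key: bytes) -> None:
    """Record the key fingerprint the user distributed with their name."""
    pinned[name] = fingerprint(public_key)

def verify(name: str, presented_key: bytes) -> bool:
    """Accept a connection only if the presented key matches the pin."""
    return pinned.get(name) == fingerprint(presented_key)

learn("alice.example", b"alice-public-key-bytes")
assert verify("alice.example", b"alice-public-key-bytes")
assert not verify("alice.example", b"attacker-key-bytes")
```

The open problem such schemes face is key distribution and rotation, which the CA system (for all its flaws) handles for strangers who have never met.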
Yep. Can use crt.sh for this on a per domain level, I also wrote ausdomainledger.net as an experiment to index all subdomains in the .au TLD, querying the CT logs directly, which was a bunch of fun.
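Enumerating subdomains from CT logs via crt.sh looks roughly like this; a sketch that parses a trimmed sample response offline (the `output=json` query form and the `name_value` field follow crt.sh's commonly observed interface, which is an assumption here, not documented API stability):

```python
import json
from urllib.parse import urlencode

def crtsh_url(domain: str) -> str:
    """Build a crt.sh wildcard query URL for all certs under `domain`."""
    return "https://crt.sh/?" + urlencode({"q": f"%.{domain}", "output": "json"})

# Parse a trimmed sample response offline; in crt.sh's JSON output,
# `name_value` holds the certificate's DNS names, one per line.
sample = json.loads('[{"name_value": "dev.example.com\\nstaging.example.com"}]')
subdomains = sorted({name
                     for entry in sample
                     for name in entry["name_value"].split("\n")})
print(subdomains)  # ['dev.example.com', 'staging.example.com']
```

This is exactly why randomized subdomains are not secrets once a certificate is issued for them: the CT logs publish every name.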
> How to "hide" private subdomains?
Symantec provides the option of label redaction (using the '?' symbol) for CT precerts with the certificates they issue. For example: https://crt.sh/?q=?.amazon.com.au . However, I'm pretty sure it's not supported by the CT RFC ...
Otherwise, I'd say wildcards.
Replacing the CA PKI with something else is very drastic and if possible, will probably take a very long time ...
I don't think this is a globally true statement. Random bucket names are hard to remember; not everyone uses S3 with configuration in code, so remembering the bucket name actually matters.
There doesn't seem to be a Wikipedia article on Passive DNS, but this article explains it quite well: https://help.passivetotal.org/passive_dns.html
Basically, some resolvers submit all (some?) of their DNS query responses to a central database so that they can be searched later. It seems you can also install a passive "sensor" in your network that (presumably) passively MITMs DNS queries and then sends off the responses.
I don't know how hard it is to get access to the data, but:
> programs like RiskIQ's DNSIQ allow organizations to install a sensor on their network that reports back to RiskIQ and in exchange, the organization gains access to all the passive DNS traffic inside the central repository.
EDIT: VirusTotal has some passive DNS data publicly available: e.g. look in "observed subdomains" https://www.virustotal.com/en/domain/s3-us-west-2.amazonaws....
EDIT2: And a bunch of them appear to be unprotected...
I'm convinced that Chris Vickery, the guy behind a good many of the open bucket finds this year, has access to enterprise firewall/proxy logs. Not because the buckets would have been hard to find, but because you could spend a lifetime looking through thousands upon thousands of open buckets before you find anything interesting.
Edit: Okay, the userscript won't run on larger XML files, gotta figure it out later.