
Ask HN: Is security by obfuscation sufficient? - milestinsley
I am currently working on a web application that allows users to upload files. There is much more to it than this of course, but I am asking the following question solely in relation to the storage of files.<p>I am planning on using a CDN (such as Amazon's S3) and my question is simple: How effective is obfuscating the names of publicly available files using a UUID for security purposes? For example, naming a file something like this <i>4b013ca21ba608373efb4717.jpg</i>.<p>I would be fascinated to hear any thoughts from the HN users. Certain questions spring to mind, like:<p>- How easy would these be to guess? Can there be any guarantees of uniqueness?<p>- What is the best algorithm to create them?<p>- Should I just make them all private and use authenticated access controls?<p>I understand that this is an application-specific question that depends on what level of security I require, among other factors. But, for purposes of this discussion, let's just say that it needs to be high, but not Fort Knox: if a file <i>was</i> compromised it would not be critical.<p>Thank you in advance for any help. :)
======
MichaelApproved
Random file name: You can create an md5 hash of the original file name and use
that for the public file name. From what I understand it's extremely unlikely
to have two strings hash into the same value. It's also long and nearly
impossible to guess.

Security: I had a similar need for my website and figured that if I'm the only
one that knows about the URL then it's secure. I was dead wrong. Some browser
plug-ins look at each url you enter and spider them. I know this is true
because I started to see Alexa hit unpublished admin URLs on my website.

Unpublished URLs != Security.

~~~
andrewljohnson
If you compute a string that is say 32 characters long using just letters and
numbers, then there are 62^32 possible combinations. This number is 58 digits
long.

In this case, extremely unlikely just means never unless you are being
attacked by say Russians with secret technology and a room full of
supercomputers. The world will almost certainly end before anyone guesses one
of these numbers or before you generate two that are the same.

You can look at the probability table for the Birthday Attack in this
Wikipedia article and decide how unlikely you would like this event to be:
<http://en.wikipedia.org/wiki/Birthday_problem>

Also from the article: "In theory, MD5, 128 bits, should stay within that
range until about 820 billion documents, even if its possible outputs are many
more."
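
To make the arithmetic above concrete, here is a minimal Python sketch. The helper names (`random_name`, `keyspace_digits`, `keyspace_bits`) are made up for illustration, and it assumes the names are drawn from a cryptographically secure source (`secrets`) rather than derived from the original file name.

```python
import math
import secrets
import string

# 26 lowercase + 26 uppercase + 10 digits = 62 symbols
ALPHABET = string.ascii_letters + string.digits


def random_name(length: int = 32) -> str:
    """Return a random alphanumeric name drawn from the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def keyspace_digits(alphabet_size: int = 62, length: int = 32) -> int:
    """Number of decimal digits in alphabet_size ** length."""
    return len(str(alphabet_size ** length))


def keyspace_bits(alphabet_size: int = 62, length: int = 32) -> float:
    """Size of the keyspace expressed in bits."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(random_name())
    print(keyspace_digits(), "decimal digits")
    print(round(keyspace_bits(), 1), "bits")
```

The size of the keyspace only helps if the string really is random; a hash of something an attacker can guess does not get the full benefit.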

~~~
tome
You're also relying on the unpredictability of MD5 here, which is not so
watertight as it was once believed to be:

<http://en.wikipedia.org/wiki/Md5#Vulnerability>
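
Separately from the collision weaknesses, a hash of the original file name is only as secret as the file name itself: anyone who can guess the name can recompute the hash. A small sketch of the difference, with hypothetical helper names and a made-up file name:

```python
import hashlib
import secrets


def md5_name(original_filename: str) -> str:
    """Public name derived from the original name: deterministic, so guessable."""
    return hashlib.md5(original_filename.encode("utf-8")).hexdigest()


def random_name() -> str:
    """Public name with 128 random bits, unrelated to the original name."""
    return secrets.token_hex(16)


if __name__ == "__main__":
    # An attacker who guesses "holiday.jpg" gets exactly the same string.
    print(md5_name("holiday.jpg"))
    print(md5_name("holiday.jpg"))
    # Two calls give two unrelated names.
    print(random_name())
    print(random_name())
```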

------
eli
It's not clear what you're trying to secure against. Are you worried about
securing a particular image so that only certain designated users can see it?
Are you worried about the original name "leaking"? Are you worried about
someone iterating through all of your images?

In general, relying on a "secret" URL is not a good way to keep things secret.
Google has a nasty habit of finding URLs you thought had no links. Definitely
tune your robots.txt to keep the images off legit search engines.

~~~
chris100
_Definitely tune your robots.txt_

But if you put exact URLs in your robots.txt, then a normal person could pick
them up from there and access your content.

Make sure you use a directory structure and forbid all files from the content
directory :-)
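
A quick way to check that a directory-level rule covers every file without listing any of them is Python's standard `urllib.robotparser`. This is only a sketch: the `/content/` directory and the example.com URLs are assumptions, and it only tells you what a well-behaved crawler would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one rule for the whole directory, no file names.
ROBOTS_TXT = """\
User-agent: *
Disallow: /content/
"""


def allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if a crawler honouring ROBOTS_TXT may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed("https://example.com/content/4b013ca21ba608373efb4717.jpg"))
    print(allowed("https://example.com/index.html"))
```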

~~~
andrewljohnson
It's also worth pointing out that only the likes of Google and Yahoo will even
look at robots.txt.

Some other crawler may walk in there, then create pages that link your files,
and then Google could link those pages, and it's game over.

------
gojomo
Obfuscation via a suitably long, unpredictable URL _can_ work, but as others
have noted, there are a number of ways such a URL can leak --
toolbars/browser-plugins being one of them. Also, that a user can easily
forward the URL to others to grant access may be a bug or a feature, depending
on your preferences.

One often overlooked leak: the 'Referer' [sic] header. If your document is
hypertext, and includes outlinks to elsewhere, and the authorized user(s)
click those links, those outlink-target sites may receive your confidential
URL as a 'Referer' header. As some sites then publish their referrers, in one
way or another, the 'secret' URL could wind up in public.

Remember this before creating a 'Competitors' page with outlinks on a 'login-
required' but plain-HTTP wiki!

~~~
milestinsley
Yes. This concept of 'leaking' the URL is becoming apparent to me. I naively
assumed you could keep them semi-private (by simply not telling anyone), but
this approach doesn't appear to hold up.

It's a great point about the referrer header too. Thanks!

------
falsestprophet
First of all, Amazon S3 is absolutely not a CDN.

 _How easy would these be to guess?_

There is a 1:3e38 chance of two randomly generated UUIDs colliding.

 _What is the best algorithm to create them?_

[http://en.wikipedia.org/wiki/Universally_Unique_Identifier#I...](http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Implementations)

 _Should I just make them all private and use authenticated access controls?_

If your users don't want other people peeking at their files (DropBox): yes
absolutely. If your users don't care (HotorNot), then no.

~~~
dustingetz
"Randomly generated UUIDs ... To put these numbers into perspective, one's
annual risk of being hit by a meteorite is estimated to be one chance in 17
billion [24], that means the probability is about 0.00000000006 (6 × 10^-11),
equivalent to the odds of creating a few tens of trillions of UUIDs in a year
and having one duplicate. In other words, only after generating 1 billion
UUIDs every second for the next 100 years, the probability of creating just
one duplicate would be about 50%. The probability of one duplicate would be
about 50% if every person on earth owns 600 million UUIDs."

for smarter than brute-force random UUID algs, the odds of collision are
lower.
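
For anyone who wants to redo the numbers in that quote, here is a rough sketch using the usual birthday approximation p ≈ 1 - exp(-n² / 2N). It assumes version 4 UUIDs with 122 random bits; the function names are made up, and since this is an approximation the result should only be expected to land in the same range as the quoted figure.

```python
import math

RANDOM_BITS = 122  # assumption: a version 4 UUID carries 122 random bits


def collision_probability(n: float, bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among n random values."""
    space = 2.0 ** bits
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x
    return -math.expm1(-(n * n) / (2.0 * space))


def count_for_probability(p: float, bits: int = RANDOM_BITS) -> float:
    """Approximate number of values needed to reach collision probability p."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    seconds_in_100_years = 100 * 365.25 * 86400
    n = 1e9 * seconds_in_100_years  # one billion UUIDs per second
    print(f"UUIDs generated: {n:.3e}")
    print(f"collision probability: {collision_probability(n):.2f}")
    print(f"UUIDs for a 50% chance: {count_for_probability(0.5):.3e}")
```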

------
soult
There are two things to consider: First, you need to make sure that your
filename is a long, unique, hard-to-guess string, that is easy to generate.
This rules out all 5 UUID specifications:

Version 1: MAC + timestamp => easy to predict

Version 2: MAC + some other static data + partial timestamp: => Also easy to
predict

Version 3: MD5 of some file or random string => If an attacker has a file, he
can generate the MD5 hash himself and see if you also have this file.

Version 4: Random data => slow

Version 5: Same as version 3, but with SHA1.
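
A short sketch with Python's standard `uuid` module shows what the versions expose. The `describe` helper and the "holiday.jpg" name are made up for illustration.

```python
import uuid


def describe(u: uuid.UUID) -> dict:
    """Return the fields of a UUID that an outsider can read back out."""
    info = {"version": u.version}
    if u.version == 1:
        info["node"] = u.node  # 48-bit node id, usually the MAC address
        info["time"] = u.time  # 60-bit timestamp
    return info


v1 = uuid.uuid1()  # MAC + timestamp
v4 = uuid.uuid4()  # random
# Name-based UUIDs are deterministic: same namespace + name, same UUID.
v5_a = uuid.uuid5(uuid.NAMESPACE_URL, "holiday.jpg")
v5_b = uuid.uuid5(uuid.NAMESPACE_URL, "holiday.jpg")

if __name__ == "__main__":
    print(describe(v1))
    print(describe(v4))
    print(v5_a == v5_b)
```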

Your best bet IMHO is to use the HMAC of the file. This will defend you
against all the flaws that using a unique ID would have.
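
A minimal sketch of what that could look like in Python, assuming HMAC-SHA256 and a server-side secret. `SECRET_KEY` and `hmac_name` are hypothetical names; in a real deployment the key would be loaded from configuration so that it stays the same across restarts.

```python
import hashlib
import hmac
import secrets

# Hypothetical key for the demo only; a real one must be stored and reused.
SECRET_KEY = secrets.token_bytes(32)


def hmac_name(file_bytes: bytes, key: bytes = SECRET_KEY) -> str:
    """Derive a public file name from the file contents and a secret key."""
    return hmac.new(key, file_bytes, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    data = b"example file contents"
    print(hmac_name(data) + ".jpg")
```

Without the key, having a copy of the file is not enough to compute the name, which is the difference from a plain MD5 or SHA1 of the file.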

The second step is to ensure that your secret links don't leak. You can employ
robots.txt to disallow robots, use dereferer.org or anonym.to to hide away
referers, but you still won't be secure, as someone can still copy and paste
the link. If that is ok with you, then you can stop reading now.

You could of course add an EC2 machine between the user and your S3 storage
that makes sure each link only works once, but this would be expensive and
counterproductive. However, Amazon S3 allows you to create a request that can
be made via HTTP GET and that is only valid until a specific time. (See the
API documentation, chapter "Authentication and Access Control"). This will
allow you to generate a new URL every time you want to serve a file to your
client. The URL will only be valid for a specific time period. The downside
is that this again costs time on every request, and that all caching on the
user side will be useless.
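
To illustrate the idea of a time-limited link, here is a generic sketch of an expiring, HMAC-signed URL. This is not S3's actual signing scheme (see the API documentation for that). The parameter names `expires` and `signature`, the message layout, and all function names are assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(path: str, expires: int, key: bytes) -> str:
    """HMAC over the path and expiry time, so neither can be altered."""
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path: str, key: bytes, ttl_seconds: int,
             now: Optional[float] = None) -> str:
    """Return path plus query parameters that stop working after ttl_seconds."""
    start = time.time() if now is None else now
    expires = int(start) + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _signature(path, expires, key)})
    return f"{path}?{query}"


def parse_signed_url(url: str) -> Tuple[str, int, str]:
    """Split a signed URL back into (path, expires, signature)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return parts.path, int(query["expires"][0]), query["signature"][0]


def verify(path: str, expires: int, signature: str, key: bytes,
           now: Optional[float] = None) -> bool:
    """Accept the request only if it is unexpired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(signature, _signature(path, expires, key))


if __name__ == "__main__":
    key = b"demo-key"  # hypothetical secret
    url = sign_url("/files/4b013ca21ba608373efb4717.jpg", key, ttl_seconds=300)
    print(url)
    print(verify(*parse_signed_url(url), key))
```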

Good luck!

------
yumraj
Let me try to answer your question in a different way.

There are two things, security and the perception of security. For argument's
sake, even if we assume that the method of simply creating UUIDs for file
names is secure (which as several people have said is not a valid assumption),
I would argue that it does not provide a good enough perception of security.

So, if your users are really concerned about security and afraid that others
will look at their files, a simple solution as above would just not cut it.
You will have to really convince them that your system is really secure. Now,
no matter what you use, UUID, MD5, SHA1/2/256 etc. to them it won't make a
difference and that may mean that you will lose users.

Based on that, my suggestion would be to do what you said above, make them all
private and use authenticated access control. This will provide security as
well as perception of security and get you more satisfied users.

------
prodigal_erik
The secret filename is not only on the wire but will show up in history and
logs, which makes it a step down from basic HTTP authorization. I'd use at
least digest authorization for anything that matters in any way.

------
rbrcurtis
If you are suggesting that you leave these secret URLs open to the entire
internet, and just rely on them being inaccessible, you're looking for
trouble. Depending on the type of documents, at the least your users may not
be happy about them being unauthenticated, and I personally would worry about
legal issues. Add authentication.

------
clutchski
It depends on your risk profile. For a bank, no. For an lolcatz photo site,
sure.

