

What naming convention do you use for S3 keys? - sam

I'm (re)doing the naming convention for our S3 keys. We have millions of objects to store on S3. I'm planning on putting them all into one bucket and naming each key after the MD5 or SHA-1 digest of the file. I'll keep this synced with a database table which maps an auto-increment GUID (globally unique id) to the digest of the file.<p>Then I read this:
http://paltman.com/2007/05/29/amazon-s3-and-filename-magic
but I don't really understand the advantage of storing an object on S3 whose key is the hash and which points to the GUID.
One thing I don't want to do is store the GUID as the S3 key (to prevent massive scraping of all the assets).<p>How are you all dealing with this? Is anyone using an MD5 or SHA-1 digest as the key? A salted hash of the GUID as the key?
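
For reference, a minimal sketch of the digest-as-key idea described above, using only Python's standard `hashlib`. The function name `digest_key` and the chunk size are illustrative assumptions, not anything S3 requires; the file is read in chunks so large objects don't have to fit in memory.

```python
import hashlib
import io


def digest_key(fileobj, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a binary file object, usable as an S3 key.

    `digest_key` is a hypothetical helper name; `algorithm` can be any
    name accepted by hashlib.new(), e.g. "md5" or "sha1".
    """
    h = hashlib.new(algorithm)
    # Read fixed-size chunks until read() returns an empty bytes object.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Example with an in-memory stand-in for a real file opened in "rb" mode.
key = digest_key(io.BytesIO(b"example file contents"))
```

The resulting hex string would be the S3 key, and the database row would store it next to the GUID.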
======
uruzseven
I think this is dependent on what you're doing with S3. I'm only using it for
backup and use the tarball name and date/timestamp as the key name. This
ensures that it's unique and easy to understand when you need to restore.

If I were going to use them for hosting my videos or something, I'd probably
just use the GUID without the need for the MD5. Just make sure there's no
chance of duplicates in the database schema and it should work.

I use Bacula for network backups and it does the same thing. All paths and
filenames are stored as unique ID numbers.

~~~
sam
Yeah, using the GUID works, but then it opens us up to someone coming in and
just scraping every file we have on S3 which would be bad for us. I'm leaning
towards encrypting/obfuscating the ids... any suggestions for Python libraries
suited to this task?
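
One way to obfuscate the ids without a third-party library might be a keyed hash (HMAC) from the standard library. Note this is a swapped-in technique, not encryption: the result can't be decrypted back to the id, so the id-to-key mapping either lives in the database or is recomputed from the id. `SECRET` and `obfuscated_key` are hypothetical names for illustration.

```python
import hashlib
import hmac

# Hypothetical placeholder: in practice a long random value kept out of
# source control. Anyone who knows it can recompute every key.
SECRET = b"replace-with-a-long-random-secret"


def obfuscated_key(object_id: int, secret: bytes = SECRET) -> str:
    """Derive an unguessable but repeatable S3 key from a sequential id."""
    msg = str(object_id).encode("ascii")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


key_for_1 = obfuscated_key(1)
key_for_2 = obfuscated_key(2)
```

Consecutive ids give unrelated-looking keys, so someone who sees one key can't walk the rest of the bucket by incrementing a number.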

------
sam
Hmm. I just realized that if the key is the MD5 digest, then I can't
independently store/delete identical files. Maybe the best bet is to encrypt
the GUID.
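
A small sketch of the problem, with a plain dict standing in for the bucket (an assumption for illustration; no real S3 calls are made). `put_by_digest` is a hypothetical helper name.

```python
import hashlib

bucket = {}  # stand-in for an S3 bucket: key -> object bytes


def put_by_digest(data: bytes) -> str:
    """Store data under its MD5 hex digest and return that key."""
    key = hashlib.md5(data).hexdigest()
    bucket[key] = data
    return key


key_1 = put_by_digest(b"identical contents")  # upload for record 1
key_2 = put_by_digest(b"identical contents")  # upload for record 2

# Identical bytes hash to the same key, so only one object is stored.
shared = key_1 == key_2

# Deleting the object for record 1 also removes it for record 2.
del bucket[key_1]
record_2_still_there = key_2 in bucket
```

Here `shared` ends up true and `record_2_still_there` ends up false, which is exactly the store/delete coupling described above. Keys derived from the GUID avoid it because every record gets its own key.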

