
S3 will automatically shard your bucket evenly across your sub-namespace if you use pre-shardable names whenever possible.

This sounds complex but is actually fairly easy to do. For example, if you are not dealing with randomly generated IDs, try using Base64-encoded key names rather than the key names themselves. (A more limited character set is helpful, though.)
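
One way to read that suggestion is to Base64-encode the key name before using it as the S3 key. A minimal sketch, assuming Python; `b64_keyname` is a hypothetical helper name, not something from the comment:

```python
import base64

def b64_keyname(name: str) -> str:
    # URL-safe Base64 of the raw key name; padding stripped for tidier keys.
    return base64.urlsafe_b64encode(name.encode()).decode().rstrip("=")

print(b64_keyname("user_f38c9123"))
```

Note that keys sharing a literal prefix still share an encoded prefix, so this helps most when the underlying names already differ early on.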

A better name than user_f38c9123 is f38c9123_user, which allows a statistically even shard distribution across the full 16-value range of the first (hexadecimal) character. As more performance is needed, S3 will automatically shard on the second character as well, giving 16x16 (256) possible shards, and so on.
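
A quick sketch of why the reversed form spreads load, assuming Python (the 10,000-key simulation and the `shard_friendly` helper are mine for illustration):

```python
import uuid
from collections import Counter

def shard_friendly(object_id: str, type_tag: str) -> str:
    # Put the high-entropy ID first so partitioning on the key prefix
    # distributes objects evenly.
    return f"{object_id}_{type_tag}"

# With hex IDs, first characters spread evenly across the 16 hex digits.
keys = [shard_friendly(uuid.uuid4().hex[:8], "user") for _ in range(10_000)]
first_chars = Counter(key[0] for key in keys)
print(sorted(first_chars))
```

Running this, every one of the 16 hex digits shows up as a leading character with roughly equal counts, whereas every `user_...` key would share the same five-byte prefix.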

Also, using a more limited character set such as hexadecimal (that is, [0-9a-f]*) (or just numeric digits) for the first characters of a filename will shard more evenly than the full alphanumeric set [A-Za-z0-9].

Jeff Barr had a blog post on this a while ago; here it is:

https://aws.amazon.com/blogs/aws/amazon-s3-performance-tips-...

"By the way: two or three prefix characters in your hash are really all you need: here’s why. If we target conservative targets of 100 operations per second and 20 million stored objects per partition, a four character hex hash partition set in a bucket or sub-bucket namespace could theoretically grow to support millions of operations per second and over a trillion unique keys before we’d need a fifth character in the hash."
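
A minimal sketch of that hash-prefix idea, assuming Python; the quoted post doesn't mandate a specific hash function, so MD5 here and the `hashed_key` name are my assumptions:

```python
import hashlib

def hashed_key(original_key: str, prefix_len: int = 3) -> str:
    # Two or three hex characters of a hash are enough to spread load
    # across partitions, per the quote above.
    digest = hashlib.md5(original_key.encode()).hexdigest()
    return f"{digest[:prefix_len]}-{original_key}"

print(hashed_key("user_f38c9123"))
```

The original key stays embedded after the hash prefix, so lookups only need to recompute the same hash to reconstruct the full key.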

We actually do exactly this in Userify (blatant plug: SSH key management, sudo, etc. https://userify.com) by moving the ID type to the end of the string: company_[shortuuid] becomes [shortuuid]_company. Using separate buckets for each type of data makes full bucket scans for a single key name a bit easier and faster, but you will actually get better sharding overall by mixing all of your data types together in a single bucket. The trade-off is worth it for the general case.
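
That rename can be sketched as below, assuming Python, a single underscore separating the type tag from the ID, and an ID alphabet with no underscores (the sample UUID is hypothetical):

```python
def reorder_key(key: str) -> str:
    # Swap "type_id" to "id_type" so the high-entropy part leads.
    type_tag, _, short_id = key.partition("_")
    return f"{short_id}_{type_tag}"

print(reorder_key("company_k3Jf9aQz"))  # k3Jf9aQz_company
```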


