The versioning is cool in and of itself, but auto-archiving to Glacier after an expiration date is a fantastic feature. The value-add on that is going to be really great for a lot of teams I work with.
What I'd really love to see on S3, though, is the ability to make partial or append-only updates without needing to re-upload the entire bitstream. As of the last time I checked this wasn't directly possible at the API level (it seems like most people store deltas and "reconstruct" the file programmatically when they need to do this).
This would enable an entire class of use cases that's otherwise not easy to do on S3, but which I think would be useful (streaming logs, sensor data, etc.).
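For what it's worth, here's roughly what I mean by the delta workaround, sketched with boto3 (the bucket name, prefix, and helper names are all made up): each "append" becomes its own small delta object under a common prefix, and readers reconstruct the logical file by listing and concatenating the deltas in key order.

```python
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"   # hypothetical bucket
PREFIX = "logs/server-1/"  # one logical "file" = all deltas under this prefix

def append_delta(data: bytes) -> None:
    """Simulate an append by writing a new, timestamped delta object."""
    # Zero-padded nanosecond timestamps keep S3's lexicographic listing
    # order consistent with chronological order.
    key = f"{PREFIX}{time.time_ns():020d}.delta"
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)

def reconstruct() -> bytes:
    """Rebuild the logical file by concatenating deltas in key order."""
    chunks = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            chunks.append(body.read())
    return b"".join(chunks)
```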
Repeatedly updating S3 files on a stream that produced a lot of updates (e.g. 1-second-resolution server logs) would get very expensive, very quickly!
Something like that might work if you only periodically rewrote the S3 file (e.g. once an hour). But it's still much less efficient than just allowing delta updates.
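Something like this sketch, say, assuming boto3 and made-up names: log lines accumulate in memory, and only the periodic flush re-uploads the object, so you pay the full-rewrite cost once an hour instead of once per log event.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"   # hypothetical bucket
KEY = "logs/server-1.log"  # hypothetical key

_buffer = []  # lines accumulated since the last flush

def log(line: str) -> None:
    """Cheap: just append to the in-memory buffer."""
    _buffer.append(line)

def flush() -> None:
    """Expensive: re-uploads the whole object, so call this rarely
    (e.g. from a cron job or timer once an hour)."""
    if not _buffer:
        return
    try:
        existing = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    except s3.exceptions.NoSuchKey:
        existing = b""
    s3.put_object(Bucket=BUCKET, Key=KEY,
                  Body=existing + "\n".join(_buffer).encode() + b"\n")
    _buffer.clear()
```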
You'd want to write your logs to S3, and then process them and cache the results, either in something like Redis or in memory in your analytics app on an EC2 instance.
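As a rough sketch of that split (boto3 plus redis-py, with invented bucket/key names and an invented log format): S3 keeps the raw log object, a batch job aggregates it, and the aggregate sits in Redis so queries never have to re-read the whole object.

```python
from collections import Counter

import boto3
import redis

s3 = boto3.client("s3")
cache = redis.Redis(host="localhost", port=6379)

BUCKET = "my-log-bucket"   # hypothetical bucket
KEY = "logs/server-1.log"  # hypothetical key

def refresh_status_counts() -> None:
    """Pull the raw log from S3, aggregate it, and cache the result in Redis."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode()
    # Assumes the status code is the last whitespace-separated field
    # (a made-up log format, purely for illustration).
    counts = Counter(line.split()[-1] for line in body.splitlines() if line.strip())
    cache.hset("status_counts", mapping=dict(counts))

def status_count(code: str) -> int:
    """Serve queries out of Redis rather than re-reading S3."""
    value = cache.hget("status_counts", code)
    return int(value) if value else 0
```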
That'd seriously put logging startups in a tough position, but I guess processing the logs is still the main differentiator, and that's something you'd still have to do by hand on S3.
Amazon keeps blowing me away with the speed of new feature introductions and the breadth of their cloud platform.
I really hope that Rackspace and other players figure out how to make an open standards version of Amazon's platform, or else we are all basically going to be locked into Amazon for the foreseeable future.
Once you get hooked into all these awesome features, it's hard to migrate to another provider.
Which is of course rubbish; events like the sudden fall of civilization, large-scale nuclear war, or all governments on earth suddenly declaring very large computing or storage facilities illegal out of paranoia about runaway AI are surely much more probable than 10^-11 per year, and they have the potential to wipe out S3 entirely.
It's arguable that all of these are sufficiently major events that, if any of them happens, you won't care that you just lost all your data stored in S3, because you'll be too busy fighting off wolves, dying of radiation sickness, or whatever.
More to the point, though, I take it the purpose of these many-9s guarantees is that if you store a very large number of objects on S3, the danger of losing any one of them is still very small. So, for instance, if you store 10^8 objects in S3, then the probability of losing any of them in a given year is allegedly about 0.1%, which is actually fairly credible. (But when that happens, a substantial fraction of the time it's because of a really major disaster, and you probably lose a lot more than one object.)
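Spelling out that arithmetic (and treating object losses as independent, which, as noted, they really aren't):

```python
p_loss_per_object = 1e-11  # the advertised 99.999999999% annual durability
n_objects = 10**8

# Probability of losing at least one of the 10^8 objects in a year,
# assuming losses are independent.
p_any_loss = 1 - (1 - p_loss_per_object) ** n_objects
print(p_any_loss)  # ~0.001, i.e. roughly 0.1%
```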
We could do with more precise ways of describing durability guarantees, to distinguish between a small probability of losing a lot of data and a large probability of losing a tiny amount of data.
That argument applies to any uptime guarantee from any vendor. When Rackspace says "99.9% uptime!" no one complains that they should have to say "unless nuclear war happens".
If the figure is 99.9% then I don't think it needs any such disclaimer; large-scale nuclear war is pretty improbable these days. It's only once you start getting to large numbers of 9s that these spectacular low-probability events begin to matter.
There's a difference between committing to 99.9% uptime on a monthly basis and implying that my data will be safe for a period of time exceeding the existence of humans as a species.
They stand up to it, too. I know a number of large companies in town that store a ton of objects in S3 and have never lost a single one. They have seen some wonky API behavior a few times, but that's still probably a lot better than your internal IT guys can provide.
Durability and availability, however, are different things. AWS did go down for 8 hours in 2012, but, as the durability claims imply, all the objects were still there when things came back up.
Primary S3 (s3.amazonaws.com) is bi-coastally redundant: if us-east-1 goes down, your objects will continue to be served transparently from their Oregon location.
This doesn't apply if you're using S3 as a website endpoint, though, unless you're doing your own health checks and DNS failover.
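If you do want to roll that yourself, this is the general shape of it, sketched with boto3 and Route 53 (the hosted zone ID, domain, and bucket endpoints are all invented, and this only shows the DNS side; the S3 website side has its own bucket-naming constraints): a health check on the primary website endpoint plus a PRIMARY/SECONDARY failover record pair pointing at buckets in two regions.

```python
import boto3

route53 = boto3.client("route53")

# Health check against the primary bucket's website endpoint (placeholder names).
hc = route53.create_health_check(
    CallerReference="s3-site-primary-check-1",
    HealthCheckConfig={
        "Type": "HTTP",
        "FullyQualifiedDomainName": "my-site.s3-website-us-east-1.amazonaws.com",
        "Port": 80,
        "ResourcePath": "/health.html",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)

def failover_record(identifier, role, target, health_check_id=None):
    """Build one half of a Route 53 failover record pair."""
    record = {
        "Name": "www.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": identifier,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

# Primary in us-east-1, secondary in us-west-2; Route 53 flips the answer
# to the secondary record when the health check fails.
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Changes": [
            failover_record("primary", "PRIMARY",
                            "my-site.s3-website-us-east-1.amazonaws.com",
                            hc["HealthCheck"]["Id"]),
            failover_record("secondary", "SECONDARY",
                            "my-site-failover.s3-website-us-west-2.amazonaws.com"),
        ]
    },
)
```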
Finally! This will save us a lot of money; we wrote tools to do this ourselves, but the cost keeps increasing, especially when you have to scan the whole bucket.
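For anyone else retiring a hand-rolled archiver: the lifecycle rules are declared once on the bucket, no scanning required. A sketch with boto3, using an invented bucket name and prefix, that moves objects to Glacier after 30 days and expires them after a year:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Move objects to Glacier 30 days after creation...
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # ...and delete them entirely after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```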