The versioning is cool in and of itself, but auto-archiving to Glacier after an expiration date is a fantastic feature. The value-add on that is going to be really great for a lot of teams I work with.
What I'd really love to see on S3, though, is the ability to make partial or append-only updates without needing to re-upload the entire bitstream. As of the last time I checked this wasn't directly possible at the API level (it seems like most people store deltas and "reconstruct" the file programmatically when they need to do this).
This would enable an entire class of use cases that's otherwise not easy to do on S3, but which I think would be useful (streaming logs, sensor data, etc.).
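For what it's worth, here's roughly what I mean by the delta workaround, sketched with boto3 (the bucket name, prefix, and helper names are all made up): each "append" becomes its own small delta object under a common prefix, and readers reconstruct the logical file by listing and concatenating the deltas in key order.

```python
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"   # hypothetical bucket
PREFIX = "logs/server-1/"  # one logical "file" = all deltas under this prefix

def append_delta(data: bytes) -> None:
    """Simulate an append by writing a new, timestamped delta object."""
    # Zero-padded nanosecond timestamps keep S3's lexicographic listing
    # order consistent with chronological order.
    key = f"{PREFIX}{time.time_ns():020d}.delta"
    s3.put_object(Bucket=BUCKET, Key=key, Body=data)

def reconstruct() -> bytes:
    """Rebuild the logical file by concatenating deltas in key order."""
    chunks = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            chunks.append(body.read())
    return b"".join(chunks)
```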
Repeatedly updating S3 files on a stream that produced a lot of updates (e.g. 1-second-resolution server logs) would get very expensive, very quickly!
Something like that might work if you only periodically rewrote the S3 file (e.g. once an hour). But it's still much less efficient than just allowing delta updates.
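Something like this sketch, say, assuming boto3 and made-up names: log lines accumulate in memory, and only the periodic flush re-uploads the object, so you pay the full-rewrite cost once an hour instead of once per log event.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-log-bucket"   # hypothetical bucket
KEY = "logs/server-1.log"  # hypothetical key

_buffer = []  # lines accumulated since the last flush

def log(line: str) -> None:
    """Cheap: just append to the in-memory buffer."""
    _buffer.append(line)

def flush() -> None:
    """Expensive: re-uploads the whole object, so call this rarely
    (e.g. from a cron job or timer once an hour)."""
    if not _buffer:
        return
    try:
        existing = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    except s3.exceptions.NoSuchKey:
        existing = b""
    s3.put_object(Bucket=BUCKET, Key=KEY,
                  Body=existing + "\n".join(_buffer).encode() + b"\n")
    _buffer.clear()
```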
You'd want to write your logs to S3, and then process them and cache the results, either in something like Redis or in memory in your analytics app on an EC2 instance.
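As a rough sketch of that split (boto3 plus redis-py, with invented bucket/key names and an invented log format): S3 keeps the raw log object, a batch job aggregates it, and the aggregate sits in Redis so queries never have to re-read the whole object.

```python
from collections import Counter

import boto3
import redis

s3 = boto3.client("s3")
cache = redis.Redis(host="localhost", port=6379)

BUCKET = "my-log-bucket"   # hypothetical bucket
KEY = "logs/server-1.log"  # hypothetical key

def refresh_status_counts() -> None:
    """Pull the raw log from S3, aggregate it, and cache the result in Redis."""
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode()
    # Assumes the status code is the last whitespace-separated field
    # (a made-up log format, purely for illustration).
    counts = Counter(line.split()[-1] for line in body.splitlines() if line.strip())
    cache.hset("status_counts", mapping=dict(counts))

def status_count(code: str) -> int:
    """Serve queries out of Redis rather than re-reading S3."""
    value = cache.hget("status_counts", code)
    return int(value) if value else 0
```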
That'd seriously put logging startups in a tough position, but I guess processing the logs is still the main differentiator, and that's something you'd still have to do by hand on S3.
Amazon keeps blowing me away with the speed of new feature introductions and the breadth of their cloud platform.
I really hope that Rackspace and other players figure out how to make an open standards version of Amazon's platform, or else we are all basically going to be locked into Amazon for the foreseeable future.
Once you get hooked into all these awesome features, it's hard to migrate to another provider.
Which is of course rubbish; events like the sudden fall of civilization, large-scale nuclear war, or all governments on earth suddenly declaring very large computing or storage facilities illegal out of paranoia about runaway AI are surely much more probable than 10^-11 per year, and they have the potential to wipe out S3 entirely.
It's arguable that all of these are sufficiently major events that, if any of them happens, you won't care that you just lost all your data stored in S3, because you'll be too busy fighting off wolves, dying of radiation sickness, or whatever.
More to the point, though, I take it the purpose of these many-9s guarantees is that if you store a very large number of objects on S3, the danger of losing any one of them is still very small. So, for instance, if you store 10^8 objects in S3, then the probability of losing any of them in a given year is allegedly about 0.1%, which is actually fairly credible. (But when that happens, a substantial fraction of the time it's because of a really major disaster, and you probably lose a lot more than one object.)
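Spelling out that arithmetic (and treating object losses as independent, which, as noted, they really aren't):

```python
p_loss_per_object = 1e-11  # the advertised 99.999999999% annual durability
n_objects = 10**8

# Probability of losing at least one of the 10^8 objects in a year,
# assuming losses are independent.
p_any_loss = 1 - (1 - p_loss_per_object) ** n_objects
print(p_any_loss)  # ~0.001, i.e. roughly 0.1%
```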
We could do with more precise ways of describing durability guarantees, to distinguish between a small probability of losing a lot of data and a large probability of losing a tiny amount of data.
That argument applies to any uptime guarantee from any vendor. When Rackspace says "99.9% uptime!" no one complains that they should have to say "unless nuclear war happens".
If the figure is 99.9% then I don't think it needs any such disclaimer; large-scale nuclear war is pretty improbable these days. It's only once you start getting to large numbers of 9s that these spectacular low-probability events begin to matter.
There's a difference between committing to 99.9% uptime on a monthly basis and implying that my data will be safe for a period of time exceeding the existence of humans as a species.
They stand up to it, too. I know a number of large companies in town that store a ton of objects in S3 and have never lost a single one. They have seen some wonky API behavior a few times, but that's still probably a lot better than your internal IT guys can provide.
Durability and availability, however, are different things. AWS did go down for 8 hours in 2012, but, as the durability claims imply, all the objects were still there when things came back up.
Primary S3 (s3.amazonaws.com) is bi-coastally redundant: if us-east-1 goes down, your objects will continue to be served transparently from their Oregon location.
This doesn't apply if you're using S3 as a website endpoint, though, unless you're doing your own health checks and DNS failover.
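If you do want to roll that yourself, this is the general shape of it, sketched with boto3 and Route 53 (the hosted zone ID, domain, and bucket endpoints are all invented, and this only shows the DNS side; the S3 website side has its own bucket-naming constraints): a health check on the primary website endpoint plus a PRIMARY/SECONDARY failover record pair pointing at buckets in two regions.

```python
import boto3

route53 = boto3.client("route53")

# Health check against the primary bucket's website endpoint (placeholder names).
hc = route53.create_health_check(
    CallerReference="s3-site-primary-check-1",
    HealthCheckConfig={
        "Type": "HTTP",
        "FullyQualifiedDomainName": "my-site.s3-website-us-east-1.amazonaws.com",
        "Port": 80,
        "ResourcePath": "/health.html",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)

def failover_record(identifier, role, target, health_check_id=None):
    """Build one half of a Route 53 failover record pair."""
    record = {
        "Name": "www.example.com",
        "Type": "CNAME",
        "TTL": 60,
        "SetIdentifier": identifier,
        "Failover": role,  # "PRIMARY" or "SECONDARY"
        "ResourceRecords": [{"Value": target}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

# Primary in us-east-1, secondary in us-west-2; Route 53 flips the answer
# to the secondary record when the health check fails.
route53.change_resource_record_sets(
    HostedZoneId="Z123EXAMPLE",  # hypothetical hosted zone
    ChangeBatch={
        "Changes": [
            failover_record("primary", "PRIMARY",
                            "my-site.s3-website-us-east-1.amazonaws.com",
                            hc["HealthCheck"]["Id"]),
            failover_record("secondary", "SECONDARY",
                            "my-site-failover.s3-website-us-west-2.amazonaws.com"),
        ]
    },
)
```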
Finally! This will save us a lot of money; we wrote tools to do this ourselves, but the cost keeps increasing, especially when you have to scan the whole bucket.
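For anyone else retiring a hand-rolled archiver: the lifecycle rules are declared once on the bucket, no scanning required. A sketch with boto3, using an invented bucket name and prefix, that moves objects to Glacier after 30 days and expires them after a year:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-log-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire-logs",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                # Move objects to Glacier 30 days after creation...
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # ...and delete them entirely after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```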