As a general rule, important data should be backed up to (at least) two separate places, and in this scenario I'd consider S3 to be one place.
(basically, if you have similarly named images, they'll all get deleted by the "Delete AMI parts and deregister" feature)
If you think the humans at Amazon are somehow above making mistakes or bad decisions, then there is no need to back up your data.
If you don't fully trust the humans at Amazon, and want to be able to access your data, on your terms, at any time, you should back it up...
#!/bin/sh
# Pull a dated snapshot of an S3 bucket down to local disk.
# The variable values below are examples; adjust them for your setup.
DIR=/backups/s3
BUCKET=mybucket
S3SYNC=/usr/local/bin/s3sync.rb
DAYS_TO_KEEP_BACKUPS=30
# colons in the path confuse s3sync, so use a colon-free timestamp
DATE=$(date +%Y%m%d-%H%M%S)
# find the newest existing snapshot (skipping the "current" symlink)
NEWEST=$(/bin/ls -r "$DIR" | grep -v current | head -1)
# copy the newest snapshot into a new directory, creating hardlinks instead of duplicate files
cp -al "$DIR/$NEWEST/" "$DIR/$DATE/"
# sync the s3 bucket against the new directory
$S3SYNC --recursive --make-dirs --delete --no-md5 -v "$BUCKET:" "$DIR/$DATE/" 2>&1 | grep -v 'Could not change owner'
# update the "current" symlink
test -e "$DIR/current" && rm "$DIR/current"
ln -sf "$DIR/$DATE" "$DIR/current"
# remove snapshots older than DAYS_TO_KEEP_BACKUPS days
find "$DIR" -maxdepth 1 -type d -mtime +"$DAYS_TO_KEEP_BACKUPS" -print0 | xargs -0 --no-run-if-empty rm -rv
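To run this nightly, a crontab entry like the following works; the script path /usr/local/bin/s3snapshot.sh is an assumption, so point it at wherever you saved the script above:

# run the snapshot script every night at 03:15, logging all output
15 3 * * * /usr/local/bin/s3snapshot.sh >> /var/log/s3snapshot.log 2>&1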
There are already documented cases of Amazon losing data in S3.
The Amazon S3 SLA does not cover data loss at all, only unavailability, which they cover by crediting your account.
And even if none of this were the case, you have a responsibility to your customers; you cannot outsource that responsibility.
I have recommended to some customers that they back up S3 data to the S3 service running in another region. The new export service also provides a way to get physical copies of your data but, depending on how much data you have, it might not be practical.
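As a rough sketch of the cross-region idea, assuming the modern AWS CLI and placeholder bucket names (my-data in us-east-1, my-data-backup in eu-west-1):

# copy everything from one bucket into a bucket in another region;
# --source-region tells the CLI which region the source bucket lives in
aws s3 sync s3://my-data s3://my-data-backup \
    --source-region us-east-1 --region eu-west-1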
Do you need more, or do you think that is enough?
We've been here before, by the way:
I'd love to see REAL documented cases of data loss on Amazon, and anything from the last year would be great too.
I'm not saying it's infallible, and you should always back up your data (Amazon can't prevent things like natural disasters - that unpredictability is the point of backing up in general), but if you're going to act like you gave good examples, at least give good examples.
To categorically deny this because you think the cases are not 'good enough' is to argue that only specific documentation of S3 losing data in the last few months or a year would convince you that Amazon S3 can indeed lose data. Even if Amazon S3 had never lost data before, there would still be no reason to assume that it could not happen.
S3 is built from hardware, by people. It can - and most likely will - fail again; it has already done so in the past. When the last incident happened is not really relevant, just as when the last earthquake happened is not really relevant when you're living on a fault line.
Earthquakes - and data loss - are a fact of life in the IT business. You plan for them, or you weigh the economics of the risk and decide that re-creating your data would cost less than backing it up over the average time to failure.
Amazon will not be able to magically recreate your data so if you have a business incentive to keep your data (such as a responsibility to third parties) then you should back it up.
It's that simple.
Oh, and regarding Amazon customer service: note that it took them 11 days to pinpoint the fault, and customer data actually was lost.
Check Allan's post at Jun 23, 2008 6:28 AM for a pretty good insight into how easily S3 can break.
What also bothers me is that apparently all traffic for these customers was passing through the same SPOF: a single load balancer.
Another thing to take away from this is to ALWAYS supply an MD5 of your data and keep a record of the MD5 of what you sent.
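Here's a minimal sketch of doing that with the modern AWS CLI and openssl (bucket and file names are placeholders); S3 rejects the upload outright if the body doesn't match the Content-MD5 header you supply:

FILE=backup.tar.gz
# the Content-MD5 header wants the base64 encoding of the binary MD5 digest
MD5_B64=$(openssl dgst -md5 -binary "$FILE" | base64)
aws s3api put-object --bucket my-data --key "$FILE" \
    --body "$FILE" --content-md5 "$MD5_B64"
# keep your own record of what was sent
md5sum "$FILE" >> sent-checksums.md5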
Gmail, another example of a large body of data that end users have some attachment to, has also occasionally lost data; see:
http://www.thebitguru.com/blog/view/252-Have you lost email on gmail
Sure, you could argue that Gmail is not S3, but that is not relevant; the things they have in common (type of architecture, kind of hardware, run by very fallible people) are what matter.
If that's what you call categorically denying that it can happen...
Again, please find a case even from the last 2 years.
I think we both agree you should back up your data, and as an IT policy it's obviously incorrect to ever think you're 100% safe, and if you use S3 you should still be redundant if you want to get closer to that 99.9% limit. But you'll never be 100% - that's life.
The only reason I defend S3 so heavily is that, compared to the other options you'd be using instead of (or better: in conjunction with) S3, it's probably among the safest in terms of data loss.
Very, very important caveat there. If you have three copies of the data but all of them are in the same S3 account, a prankster who steals your S3 credentials can delete all of them in about ten seconds.
Or, if you make a typing mistake, you can do that to yourself. Boy oh boy, will that be an unhappy day.
Diversify, diversify, diversify.
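Two concrete mitigations, sketched with the modern AWS CLI (the bucket names, MFA serial, and the second-account profile are all assumptions): turn on versioning with MFA Delete so a leaked access key can't silently purge objects, and keep the second copy under credentials the first account never touches:

# with versioning on, deletes just add recoverable delete markers;
# MFA Delete additionally requires a physical token to purge versions
aws s3api put-bucket-versioning --bucket my-data \
    --versioning-configuration Status=Enabled,MFADelete=Enabled \
    --mfa "arn:aws:iam::123456789012:mfa/root-device 123456"
# and sync the second copy through a completely separate account
aws s3 sync s3://my-data s3://my-data-offsite --profile second-account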
I lost 1TB of data several months ago due to some backend issues with EBS and S3. Fortunately for me, it was just a backup of a backup of a backup. ;-)
Ultimately, my EBS device became unusable by any operating system, and Amazon support stated that the data was lost due to several backend systems failing.
Would you put all of your eggs in a bookstore that just recently decided to become an egg-storage vendor?