Ask HN: Do you trust Amazon S3 or Mosso Cloudfiles not to lose or corrupt your data?
15 points by bk on Mar 23, 2009 | 16 comments
Do you treat these systems as "reliable" and "safe" backends (since they're physically and geographically replicated) or do you feel it's still necessary to back up data hosted on their services?

How/where do you back up that data to, especially with large amounts of data (e.g. 100s of GB+)?

S3 does this for a living. I don't.

S3's entire business revolves around never losing anybody's stuff. They have hordes of smart people working on the problem, and they have an architecture that makes it really hard to lose anything by accident.

My business, on the other hand, revolves around letting people draw cartoon testicles onto other people's PowerPoint presentations in the pretense of a "web meeting". Which of us would you rather trust to keep hold of your valuable data?

S3 does this for a living. I don't.

So does Carbonite, and, as you can see in today's news, it also lost people's data. I strongly suspect Amazon does a better job than Carbonite with both software and infrastructure, but still. Mistakes happen, bugs happen, and data gets lost even by smart people working for solid and reputable companies.

S3-only backups = eggs in one basket. It's a terrific, strong basket made of titanium and suspended on aircraft cable, but it's still just one basket.

So does Carbonite...

Good example - when I read the story I was not surprised. It was bound to happen to at least a company or two. I used Carbonite for a bit before I went back to my own server (w/nightly backup). I thought I had let my paranoid tendencies take over, but I just never felt good backing up my data to Carbonite et al.

I also agree that S3 is much more failproof, but I would still rather keep things on my own server. I may do things differently in the future, but for now there is no place like my own server!

The point here is not about picking winners between S3 and you/me; that's probably a no-brainer at this point. The thing to worry about is whether S3 is reliable enough that you can trust your users' data to S3 alone.

While S3 might be doing this for a living, Amazon doesn't. AFAIK, revenues from cloud services are not at all significant given Amazon's scale. What does the S3 license say? Is Amazon liable if it loses data stored in S3?

I trust S3 and Mosso Cloudfiles more than my own single-point-of-failure HDD. They have a lot of redundancy built into their systems, and although every system has risks, theirs are far lower than those of our personal, undistributed implementations of file storage.

It pretty much comes down to this: no one can guarantee 100% reliability, not even Amazon. Still, their setup and infrastructure will probably increase overall MTTF (mean time to failure) compared with a small hosting company or your personal backup solution. Very simple math can show that their system is probably more reliable than others, yours or mine.
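
As a rough illustration of that "very simple math", here is a back-of-the-envelope sketch. It assumes every copy fails independently with the same probability, which real systems only approximate (a software bug or an operator mistake can take out several copies at once). The 5% figure is a made-up placeholder, not a measured number for any drive or provider.

```python
def prob_all_copies_lost(p_single: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent failures."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # Independent events: multiply the per-copy loss probabilities together.
    return p_single ** copies


if __name__ == "__main__":
    p = 0.05  # hypothetical chance of losing one copy in a given year
    for n in (1, 2, 3):
        print(f"{n} independent copies: {prob_all_copies_lost(p, n):.6f}")
```

Under those assumptions each extra independent copy multiplies the chance of total loss by p, which is why replicated storage tends to beat a single disk. It also shows why correlated failures matter so much: they are exactly what the independence assumption leaves out.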

Sure? It's like anything else: don't trust a single solution. In addition to local backups, S3 is good; in addition to S3, Mosso. There will always be accidents, data loss, corruption, etc. The only way to mitigate that risk is simply to cover your bases and avoid relying on a single thing.

Agreed, I have backups on my laptop and external hard drive, so if one of my drives fails I'm covered. If both fail, well, that's why I've got the really important stuff backed up on my FTP server.

Every few months I email my hotmail account with all my writing. As text is ridiculously compressible I haven't even hit the 10MB attachment limit. I also have all my more current documents saved in Google Docs, mostly for the portability but also for the very slim chance of an "Oh my god I broke my laptop, holy crap my house burnt down, and dammit I forgot how to connect to my FTP server and for the love of god I forgot the password to my hotmail account."

Just use both at the same time. Use S3 for active stuff, and Mosso as your secondary backup. The chances of S3 and Mosso crapping out at the same time are pretty much nil. And the cost of hosting something on S3/Mosso as a one-time backup is dirt cheap.

They are as safe as any other service provider. Ultimately, it's always good practice to:

1. Do your own backups

2. Routinely test that you can recover from these backups

I would argue that point (2) is much more important than point (1). I do try to do that at least once a month to ensure that there aren't any bugs in the backups, including any missing parts of the infrastructure.
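
One way to make point (2) routine is to restore into a scratch directory and compare checksums against the original. The sketch below is only an assumption about how such a check could look; `verify_restore` and its report keys are names made up for illustration. It also assumes the original tree has not changed since the backup was taken; otherwise you would compare against a manifest of digests saved at backup time.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(original: Path, restored: Path) -> Dict[str, List[str]]:
    """Report files missing from, altered in, or unexpected in the restore."""
    want, got = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(want.keys() - got.keys()),
        "corrupt": sorted(k for k in want.keys() & got.keys() if want[k] != got[k]),
        "extra": sorted(got.keys() - want.keys()),
    }
```

An empty list under all three keys means the restore matched byte for byte; anything else is worth investigating before you actually need the backup.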

For mirroring really large data, rsync is a viable solution.

Edit: I do want to add that performing your own backups is really subjective, and you might need to ask yourself: what's the cost to me/my business/my users in the event that I can't recover from backups and/or my provider fails in its own reliability (e.g. Carbonite)?

I have a lot of faith in S3, more so than any local storage I may have.

My site, cuuute.com, is hosted on EC2. I use Elastic Block Store for database storage, and I back up both that and my EC2 instance to S3 on a regular basis.

I've been using that as my hosting for 2-3 months, and I couldn't be happier.

S3, Mosso and other cloud providers are not an appropriate backup mechanism. They are good for sharing files and as temporary storage. The only legitimate backup is a physically stored disk. S3 doesn't provide versioning or deletion protection.

The problem is that I hear about these new "Cloud" storage companies claiming backup, but when asked what they actually do, they say they rely on Amazon to move the files into different data centers. But anyone can delete a file or directory by accident, and poof, the files are gone forever.

If the storage provider doesn't have permanent physical storage in their backup plans, don't think your files are forever.

Yes, that's a good example of why not to use S3 as a simple backup destination. However, software like Jungle Disk running on top of S3 adds versioning and deleted file retention to make it act like a real backup system.
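
To give a feel for what "versioning and deleted file retention" on top of a flat key/value store can mean, here is a minimal sketch. It is not how Jungle Disk actually works; the `VersionedStore` class, the `key@vN` naming scheme, and the in-memory dict standing in for a bucket are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Keeps every version of a key instead of overwriting or deleting in place."""

    def __init__(self) -> None:
        # Stand-in for a bucket: object name -> bytes. Nothing is ever removed.
        self._objects: Dict[str, bytes] = {}
        # Per-key history of object names; None marks a deletion (tombstone).
        self._history: Dict[str, List[Optional[str]]] = {}

    def put(self, key: str, data: bytes) -> int:
        """Store a new version and return its version number."""
        versions = self._history.setdefault(key, [])
        name = f"{key}@v{len(versions)}"
        self._objects[name] = data
        versions.append(name)
        return len(versions) - 1

    def delete(self, key: str) -> None:
        """Record a tombstone; earlier versions stay retrievable."""
        if key not in self._history:
            raise KeyError(key)
        self._history[key].append(None)

    def get(self, key: str, version: Optional[int] = None) -> bytes:
        """Return the latest version, or a specific one if version is given."""
        versions = self._history[key]  # raises KeyError if never stored
        name = versions[-1] if version is None else versions[version]
        if name is None:
            raise KeyError(f"{key} is deleted at this version")
        return self._objects[name]
```

With this layout an accidental overwrite or delete only adds a new entry to the history, so `get(key, version=0)` still returns the original bytes. A real system would also need a retention policy so old versions do not pile up forever.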

Having a local backup is great too, but that won't protect you from fire, flood, or theft in many cases.

I wasn't talking about zipping up your files on a CD and putting them in a drawer. There are companies like Iron Mountain that give you off-site physical storage.

I am sure the deletion protection Jungle Disk provides is good, but what if somebody internally deletes an S3 file or a directory of files? They are gone forever. This isn't an S3 problem, just an issue with the strategy.

Someone at Iron Mountain or FedEx could lose your backups too (accidentally or maliciously). Neither is too likely, nor is someone internally deleting data at S3. That said, if you have data you can't ever take a chance on losing, don't store it in only one place ever - not just on S3, not just on a USB drive, not just in off-site physical storage. Keep at least two copies, maybe more. One of the things we're working on is allowing you to back up to multiple cloud providers - so even in the unlikely event of a catastrophic cloud failure you'd have other copies available.
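
The "keep at least two copies" idea can be sketched in a few lines. This is not a description of any product's implementation; `InMemoryBackend` is a hypothetical stand-in for a real provider client, and the function names are made up for the example.

```python
import hashlib
from typing import Dict, List


class InMemoryBackend:
    """Hypothetical stand-in for one storage provider."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.blobs[key] = data

    def get(self, key: str) -> bytes:
        return self.blobs[key]  # raises KeyError if the copy is gone


def backup_everywhere(backends: List[InMemoryBackend], key: str, data: bytes) -> str:
    """Write data to every backend, read it back to check, return its digest."""
    digest = hashlib.sha256(data).hexdigest()
    for backend in backends:
        backend.put(key, data)
        if hashlib.sha256(backend.get(key)).hexdigest() != digest:
            raise IOError(f"read-back mismatch on {backend.name}")
    return digest


def restore_from_any(backends: List[InMemoryBackend], key: str, digest: str) -> bytes:
    """Return the first copy whose digest matches, skipping lost or corrupt ones."""
    for backend in backends:
        try:
            data = backend.get(key)
        except KeyError:
            continue  # this provider lost the copy; try the next one
        if hashlib.sha256(data).hexdigest() == digest:
            return data
    raise LookupError(f"no intact copy of {key!r} found")
```

Keeping the digest somewhere separate from the copies is what lets a restore tell an intact copy from a silently corrupted one.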
