At $0.150/GB/month, storing 1TB of data on Amazon S3 costs around $150/month. But this is a recurring charge, so the same amount of data costs $1800/year and $3600 over two years. And this doesn't even include the data transfer costs.
Consider the alternative: with colocation, the hardware cost of storing 1TB of data on two machines (for redundancy) would be around $1500/year. But this is fixed, and the storage capacity on each machine can be increased at a price of $0.1/GB. That means RAID-1 plus redundant copies of data on multiple servers for 4TB of data could be achieved at $3000/year and $6000 over two years in a colocation facility, whereas on S3 the same would cost $7200/year and $14,400 over two years.
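As a rough check of the arithmetic above, here is a small Python sketch. The function name `s3_storage_cost` and the 1TB = 1000GB convention are my own assumptions for illustration, and the colocation numbers are simply the estimates quoted above rather than anything derived.

```python
# Rough check of the storage-only figures quoted above.
# Assumptions: 1TB = 1000GB, price fixed at $0.15/GB/month,
# no data transfer or request fees included.
S3_PRICE_PER_GB_MONTH = 0.15


def s3_storage_cost(gb, months, price=S3_PRICE_PER_GB_MONTH):
    """Return the S3 storage-only cost in dollars, rounded to cents."""
    return round(gb * price * months, 2)


# Colocation estimates for 4TB, copied from the post (months -> dollars).
COLO_4TB = {12: 3000, 24: 6000}

for months, colo in COLO_4TB.items():
    s3 = s3_storage_cost(4000, months)
    print(f"{months} months: S3 ${s3:,.0f} vs colo ${colo:,} ({s3 / colo:.1f}x)")
```

With these inputs, S3 comes out at 2.4 times the quoted colocation figure for both the one-year and two-year cases.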
Even after adding bandwidth, power, and hardware replacement costs, a colocation facility would still come out significantly cheaper than Amazon S3.
Given this math, what is the rationale behind going with Amazon S3? The SmugMug case study of 600TB of data stored on S3 seems misleading.
I do see several services offering unlimited storage that is actually hosted on S3. For example, SmugMug, Carbonite, etc. all offer unlimited storage for a fixed annual fee. Wouldn't this send their costs through the roof on Amazon S3?
If your startup is using Amazon S3 for its storage needs, can you please elaborate, for the benefit of the startup community, on your rationale for choosing this service?
However, spraying files everywhere is a pain! MogileFS makes it a lot better, but you're still in charge of monitoring it and making sure it's healthy. With only two boxes, you always have to be on call so that you can order another box from your provider fast.
Plus, there's the issue of multiple data centers. S3 doesn't just make redundant copies of your data. It makes copies across data centers. So, you're paying $0.10/GB for data in, but you don't pay anything when it replicates copies to several data centers.
You also have to realize that you have to pay for excess capacity anytime you're doing your own storage system. If you like to keep a 50% buffer (a reasonable size), you're going to be paying 1.5x the base cost of $0.10/GB that you've come up with.
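To make the buffer math concrete, here's a minimal sketch. The function name `effective_cost_per_gb` is made up for illustration, and it assumes the buffer is expressed as a fraction of the capacity you actually use.

```python
def effective_cost_per_gb(base_cost_per_gb, buffer_fraction):
    """Cost per GB actually stored when you also buy spare capacity.

    buffer_fraction is spare capacity relative to used capacity,
    e.g. 0.5 means buying 1.5GB of disk for every 1GB stored.
    """
    if buffer_fraction < 0:
        raise ValueError("buffer_fraction must be non-negative")
    return base_cost_per_gb * (1 + buffer_fraction)


# The $0.10/GB hardware figure from the question with a 50% buffer.
print(f"${effective_cost_per_gb(0.10, 0.5):.2f} per GB stored")
```

That turns $0.10/GB into $0.15 for every GB you're actually storing.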
And then there's the issue of having to make sure you're monitoring it and that if you see a spike in storage usage you can add drives fast enough. . .
You pay for a bit of convenience with S3. I'm not going to argue that it's cheaper, but it's definitely a lot less headache. Are you going to colo several boxes in different data centers, constantly monitor the storage, make sure that they serve the files properly, make sure that more copies get replicated if one server dies, replace drives as they fail, and add more servers as needed?
If you're operating at a large scale, I'd say you should do your own storage because you can justify making that someone's job (or a large enough portion of their job). I'm not sure I agree with SmugMug using S3, but I'm not sure I disagree either - it allows them to concentrate on what they want to do. Remember, for every tech person on HN, there are 100 who will say they're doing backups and aren't (ok, maybe not true, but you have to find an employee to manage your storage who you trust as much as Amazon).
However, most people don't have that much to store. If you're storing 100GB of data, you'd then be paying for multiple servers all with RAID and managing MogileFS or the like for what? 20% savings? $150/year? I'm as cheap as the next person, but I also like sleep. I don't want a pager going off to tell me that one of my two file stores is down and that I need to provision and configure a new box at 2am. And do you want to focus your time on creating a compelling product that your customers think is awesome, or do you want to spend your time creating an awesome file store that works really well? Life has tradeoffs. You're not wrong, but I don't see Amazon as ripping people off with their pricing and I don't mind someone profiting from giving me a hassle-free, no-lock-in solution.
EDIT: I personally think your estimate for buying boxes and colo'ing them is a tad low, so my 20% might be your 50%, and by your numbers doing it yourself might make more sense. Maybe I've just seen crappy colo offers. Link if you know good ones! I love being proved wrong.