
Does Amazon S3 really save money? - psaccounts
With a price tag of $0.150/GB/month, storing 1TB of data costs around $150/month on Amazon S3. But this is a recurring amount. So, for the _same_ amount of data it would cost $1800/year and $3600/2-years. And this doesn't even include the data transfer costs.

Consider the alternative: with colocation, the hardware cost of storing 1TB of data on two machines (for redundancy) would be around $1500/year. But this is fixed. And increasing the storage capacity on each machine can be done at the price of $0.1/GB. Which means that RAID-1 plus redundant copies of data on multiple servers for 4TB of data could be achieved at $3000/year and $6000/2-years in a colocation facility. Whereas on S3 the same would cost $7200/year and $14,400/2-years.

Also, adding bandwidth+power+h/w replacement costs at a colocation facility would still keep the costs significantly lower than Amazon S3.

Given this math, what is the rationale behind going with Amazon S3? The Smugmug case study of 600TB of data stored on S3 seems misleading.

I do see several services that offer unlimited storage which is actually hosted on S3. For example, Smugmug, Carbonite etc. all offer unlimited storage for a fixed annual fee. Wouldn't this send the costs through the roof on Amazon S3?

If your startup is using Amazon S3 for its storage needs, for the benefit of the startup community, can you please elaborate your rationale for choosing this service?
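
For anyone who wants to check the S3 side of this arithmetic, here is a minimal Python sketch. It assumes decimal units (1 TB = 1000 GB) and a flat $0.15/GB/month with no transfer fees; the colo figures are simply the estimates quoted in the post, not independently verified, and the function and constant names are made up for illustration.

```python
# Flat S3 storage rate quoted in the post (USD per GB per month).
S3_PRICE_PER_GB_MONTH = 0.15
# Assumption: the post uses decimal units, i.e. 1 TB = 1000 GB.
GB_PER_TB = 1000


def s3_storage_cost(terabytes, months, price_per_gb_month=S3_PRICE_PER_GB_MONTH):
    """Storage-only S3 cost in USD; data transfer and request fees are ignored."""
    return terabytes * GB_PER_TB * price_per_gb_month * months


# Colo estimates exactly as quoted in the post, keyed by (terabytes, months).
COLO_QUOTED = {(1, 12): 1500, (4, 12): 3000, (4, 24): 6000}

for (tb, months), colo in COLO_QUOTED.items():
    s3 = s3_storage_cost(tb, months)
    print(f"{tb} TB for {months} months: S3 ${s3:,.0f} vs colo ${colo:,} ({s3 / colo:.1f}x)")
```

On the quoted numbers alone, S3 comes out between 1.2x and 2.4x the colo estimate, which is the gap the rest of the thread argues about.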
======
mdasen
The reason is that you don't have to deal with it. Amazon isn't magic and they
use the same hardware available to everyone else. Sure, there's some scale
involved when it comes to labor and power and bandwidth, but they can't
undercut what you can do yourself.

However, spraying files everywhere is a pain! MogileFS makes it a lot better,
but you're still in charge of monitoring it and making sure it's healthy. With
only two boxes, you have to be always on call so that you can order another
box from your provider fast.

Plus, there's the issue of multiple data centers. S3 doesn't just make
redundant copies of your data. It makes copies across data centers. So, you're
paying $0.10/GB for data in, but you don't have to pay for when it replicates
copies into several data centers.

You also have to realize that you have to pay for excess capacity anytime
you're doing your own storage system. If you like to keep a 50% buffer (a
reasonable size), you're going to be paying 1.5x the base cost of $0.10/GB
that you've come up with.
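
As a rough sketch of that headroom arithmetic (the function name is hypothetical, and this only models idle spare capacity, not replication or labor):

```python
def effective_cost_per_gb(base_cost_per_gb, buffer_fraction):
    """Cost per GB actually stored when extra capacity sits idle as headroom."""
    return base_cost_per_gb * (1 + buffer_fraction)


# $0.10/GB hardware with a 50% buffer, as in the comment above.
print(f"${effective_cost_per_gb(0.10, 0.50):.2f} per stored GB")
```

With a 50% buffer, the $0.10/GB hardware figure is already at $0.15 per stored GB.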

And then there's the issue of having to make sure you're monitoring it and
that if you see a spike in storage usage you can add drives fast enough. . .

You pay for a bit of convenience with S3. I'm not going to argue that it's
cheaper, but it's definitely a lot less headache. Are you going to colo
several boxes in different data centers, constantly monitor the storage, make
sure that they serve the files properly, make sure that more copies get
replicated if one server dies, replace drives as they fail, add more
servers as needed. . .

If you're on a large scale, I'd say you should do your own storage because you
can justify making that someone's job (or a large enough portion of their
job). I'm not sure I agree with SmugMug using S3, but I'm not sure I disagree
either - it allows them to concentrate on what they want to do. Remember, for
every tech person on HN, there's 100 that will say they're doing backups and
aren't (ok, maybe not true, but you have to find an employee to manage your
storage who you trust as much as Amazon).

However, most people don't have that much to store. If you're storing 100GB of
data, you'd then be paying for multiple servers all with RAID and managing
MogileFS or the like for what? 20% savings? $150/year? I'm as cheap as the
next person, but I also like sleep. I don't want a pager calling me telling me
that one of my two file stores is down and that I need to provision and
configure a new box at 2am. And do you want to focus your time on creating a
compelling product that your customers think is awesome or do you want to
spend your time creating an awesome file store that works really well? Life
has tradeoffs. You're not wrong, but I don't see Amazon as ripping people off
with their pricing and I don't mind someone profiting from giving me a hassle-
free, no-lock-in solution.

EDIT: I personally think your estimate of buying boxes and colo'ing them is a
tad low so my 20% might be your 50% and so it might make sense by your numbers
more. Maybe I've just seen crappy colo offers. Link if you know good ones! I
love being proved wrong.

~~~
jedc
Thank you for such a detailed and reasoned answer. I'm learning a hell of a
lot from this thread already.

~~~
mdasen
Well, I think what it really comes down to is what your business is. If your
business is storage, get good at doing your own storage. If your business is
web applications, get good at web applications. Then there are companies on
the cusp. Flickr is clearly storage heavy enough that they need to be a
storage company. 37signals, on the other hand, stores attachments and some
pictures, but their primary function isn't storage - it's the interaction
with that stored content.

Is storage such a thing for your business that you're willing to put a lot of
labor behind its solution? Or is storage important to your business, but as
long as you get something reliable it doesn't have to be the most efficient
possible because it's a small part of your business relative to the other
things you do (like HTML, CSS, Ruby/Python/Perl/et al., MySQL/PostgreSQL)?
Your time might be better spent on other company work than on storage.

------
onethumb
Hey there, I'm the CEO & Chief Geek at SmugMug. You're overlooking a few
things:

\- Amazon keeps at least _3_ copies of your data (which is what you need for
high reliability) in at least _2_ different geographical locations. That's
what we'd do ourselves, too, if we continued to use our own storage
internally. So your math is off both on the storage costs and then the costs
of maintaining two or more datacenters and the networks between them.

\- When Amazon reduces their prices, you instantly get all your storage
cheaper. This isn't something you get with your capital expenditure of disks -
your costs are always fixed. This has upsides and downsides, but you certainly
don't get instant price breaks to your OpEx costs. When they added cheaper,
tiered storage, our bill with Amazon dropped hugely.

\- There's built-in price pressure with Amazon, too. The cost of one month's
rent is roughly the same as the cost of leaving. So if it gets too expensive
(or unreliable or slow or whatever your metrics are), you can easily leave.
And Amazon has incentive to keep lowering prices and improving speed &
reliability to ensure you don't leave.

\- CapEx sucks. It's hard on your cashflow, it's hard on your debt position if
you need to lease or finance (we don't, but that just means it's even harder
on our cashflow), it's hard on taxes (amortization sucks), etc etc. I vastly
prefer reasonable OpEx costs, with no debt load, which is what Amazon gets us.

\- Free data transfer in/out of EC2 can be a big win, too. It is for us,
anyway.

\- Our biggest win is simply that it's easy. We have a simpler architecture, a
lot less people, and a lot less worry. We get to focus on our product (sharing
photos) rather than the necessary evils of doing so (managing storage). We
have _two_ ops guys for a Top 500 website with over a petabyte of storage.
That's pretty awesome.

Hope that helps!

~~~
dhotson
Just curious, do you keep your own backups as well? Do you have any
contingency plans for if Amazon's services go down?

I'm not trying to be cynical, but I'd hate to be in the position where Amazon
has your business by the balls if something goes wrong. How do you guys deal
with this?

~~~
socmoth
amazon did go down once. smugmug has a blog post about it and what they did.
go google it. the entire blog is amazing and informative and i highly
recommend it.

i'd link you, but i'm lazy

~~~
songism
[http://blogs.smugmug.com/don/2008/02/15/s3-outage-we-
werent-...](http://blogs.smugmug.com/don/2008/02/15/s3-outage-we-werent-
affected/)

or maybe this: [http://blogs.smugmug.com/don/2007/01/30/amazon-s3-outages-
sl...](http://blogs.smugmug.com/don/2007/01/30/amazon-s3-outages-slowdowns-
and-problems/)

------
aristus
You are paying a premium for scaling, bandwidth, operations and lower capital
cost.

Being able to smoothly scale from 1TB to 2TB (or down to 500GB) is nothing to
sneeze at. Nor is having a metered 250mbps connection (shop around -- hard to
get less than $30/mbps at low volume), or having someone else handle the pager
24x7x365, or paying $150 at the _end_ of the month instead of $5,000 up front.

There are systems and scales for which S3 is actually too expensive, and of
course Amazon is making a profit off of all of this.

But there are a lot of hidden costs to DIY. Ask anyone who's tried to get 6
more servers flown in on the weekend, or had to cut short a holiday to drive
to the damned colo at 3am, or overbought capacity, etc.

(edit) As for the "unlimited" option -- SM et al know to the byte what their
average user uses so they price the unlimited option to make a profit on the
average case.

------
andr
Hypothetical case study:

You want to host a liveblog for the Apple keynote at Macworld. No matter if
you are a small site or Engadget, your traffic for that 90 minutes would be 2x
to 100x bigger than your daily average.

So do you:

a) buy or rent extra servers for the whole month, spend a few man-weeks (or
pay somebody for) setting them up, working on their synchronization, etc.

b) write a small script that regularly uploads the static liveblog HTML (or
JSON) to an S3 subdomain and rely on Amazon's thousands of servers and flexible
scaling (Dynamo, which powers S3, will allocate as many servers as needed to
handle your load) to do the work?

Granted, option b)'s per-GB cost would be a bit higher, but your fixed cost
for labor and hardware in a) would be even bigger.

------
jodrellblank
Don't forget time to spec servers, install the OS/Apps/Backup system,
configure, test, drive to the datacenter, install the kit, document it. Also
you will need to spend time setting up an account with a rackspace provider
and arranging all the DNS, public IPs, any necessary firewalls, etc.

Also, server warranties, UPSs and the sheer hassle of specifying, ordering,
configuring servers and taking them to somewhere and fiddling with them and
regularly patching them and so on.

 _And increasing the storage capacity on each machine can be done at the price
of $0.1/GB_

No it can't, you need to pay a competent admin to go to the servers, shut them
down, install new drives, and start them up again and expand the RAID onto
them. 1hr minimum. Assuming the RAID is expandable and there is physical space
in the server - if you need to add more mounted disks the app must be adjusted
to support that. If there is no space, you may need to replace some drives and
handle moving the data onto them (several hours?), or worse buy a new extra
server.

Assuming your backup system can just take another TB of data without any
changes.

And what if something does go wrong? You're talking of at least half a day
from the time you find out until you get someone to go, wait for the travel
time, diagnose and repair and restart, then dealing with problem reports and
complaints.

I hope Amazon has better monitoring on S3 than anything I have set up as well.

------
barryrandall
The problem is that storage costs are a step function. Once you cross a
certain threshold (the threshold depends on your performance requirements
a.k.a. IOPS), storage gets Really Freaking Expensive (tm). The steps start to
get really, really steep as your capacity increases, and that's just for the
primary copy of your data.

Once you get into truly large data amounts, other things start to break (RAID
5, RAID 6, tape backup, disk backup, synchronization, the ability to replace
storage systems without massive outages). The good news is that they're almost
all solved problems, but you're usually stuck with buying overpriced crap from
EMC, Hitachi, NetApp, 3PAR and IBM (storage is a protection racket). All of
this combines to explain why a good storage admin pulls down 6 figures a year.

I may be a bit myopic, but I see a world coming where technology startups
trade capital costs for operating costs. S3 is pricey if you're dealing with
small quantities of data, but once the step increase in your per-GB storage
costs goes over 30%, you might want to reconsider. The steps only get bigger.

~~~
sokoloff
Something about your argument is not making sense. You say that as storage
needs increase, the costs go up dramatically (that there are significant
DISeconomies of scale).

Yet Amazon is acting as an aggregator of all those ever-increasing storage
needs, operating at even higher scale, yet is able to turn a profit providing
a service at scale. Amazon almost certainly isn't building their own hardware.

It seems you're arguing in circles somewhere there...

~~~
barryrandall
Sorry, I forgot to mention one of my assumptions...

There are tremendous diseconomies of scale if: you're buying the storage
through traditional storage vendors. When your storage needs get really big,
you'll realize that stuff is 1) a waste of money, and 2) not meeting your
needs (you shape your needs to fit the available products, not vice versa). At
that point, it makes enough sense to roll your own storage solution (write
your own S3), tailored to your very specific needs.

------
dmv
James Hamilton posted a fairly robust model of how to think about this (and
evaluate S3), just prior to joining Amazon.

[http://perspectives.mvdirona.com/2008/12/22/TheCostOfBulkCol...](http://perspectives.mvdirona.com/2008/12/22/TheCostOfBulkColdStorage.aspx)

His is a more datacenter-level view (considering $/GB, from the perspective of
arbitrary GB), but builds a convincing case why S3 @ $0.18/GB is fair.

------
brk
Whenever you outsource anything to an "as a service" company, there are a
couple of sweet spots where it makes sense, and a (generally) larger area
where it does not make sense to pay someone else to do the service for you.

Amazon makes sense for companies that are too small to dedicate an admin to a
server and/or to load that server and bandwidth to 60%+ capacity. At some
point, when you have a need for a small server farm and the uptime of those
servers is core to your business you will likely find it is more cost
effective to do the operations in-house.

Think about companies like Salesforce.com: for their datacenter needs, I don't
think you could ever justify Amazon (or a similar service).

As a rule of thumb, I charge customers in my colo facility $60/U/mo as a
starting price. You get a shared pipe and relatively unmetered/unlimited
bandwidth (for "business" use, no torrent hosting or stupid shit). For a
little more you can get dedicated bandwidth, 95% billing, etc. I've got
someone who is looking at bringing in a 3U backup server, they'll pay $200/mo
to have 3mb dedicated bandwidth and the colo space. That server will have a
few TB of data replicated from their main site, I don't think S3 would make
sense for them.

------
callmeed
Regarding those services that offer "unlimited storage", I think they are
really banking on the fact that many people _won't_ use enough space to cause
the company a loss. Sure, there are heavy users, but when it all averages out,
the company (hopefully) makes money.

Also, you have to read the terms of some services closely. IIRC, some of the
online backup services are limited to "personal use only". They have separate
pricing tiers for business backup (because they know those customers will use
more).

~~~
rs
That's called "over-selling" - it's a pretty common model amongst any resource
intensive businesses, which reminded me of a blog entry on the Dreamhost blog
where Josh Jones makes a connection between the recent finance market debacle
and over-selling: <http://blog.dreamhost.com/2008/10/22/how-to-make-money/>

Edit: the blog post is meant for light reading

------
donna
With AWS i don't have to buy, set up, maintain or upgrade the hardware... or
manage employees who do. Nor do i own the hardware. Nor do i have to travel to
and from a datacenter.

For that service, the cost is very reasonable.

~~~
trezor
This goes for any hosting service and not just AWS as long as we are not
talking about VPSes though.

I just did the math on what using S3 instead of my current business class
shared hosting plan would cost me for one of my audio streaming sites.

I have around 10-15GBs of data which gets rotated regularly, so nothing is
kept around for longer than around two months. With the amount of traffic I
get, around 70GBs per month, and the number of hits I get, I still pay less for
my business class shared hosting plan than I would pay for S3.

And I also have full shell access and scriptability.

To top it off my plan actually covers 1TB traffic a month, so my site could
scale up to 12 times traffic-wise before I would need to upgrade my account.
At that cross-over point Amazon S3 would still be 54% more expensive than my
current solution.

This might not apply to everyone, but in my case getting S3 would be plain
dumb.

~~~
donna
just be aware because you're on a shared host, they may kick you off to a more
expensive machine if you start using all that bandwidth.

------
callmeed
SmugMug's cheapest plan is $40/year, which is $3.33 a month

S3's most expensive tier is $0.15/GB/mo

So, not counting bandwidth, a user would have to store roughly 22GB of images
to eat up that cost.

Being conservative, a high-res JPEG out of camera (12mp) is probably 3mb on
average. So, we're talking about over 7,000 images.
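
A quick sketch of that break-even estimate, using only the numbers in this comment ($40/year plan, $0.15/GB/month, 3 MB per image). Decimal units (1 GB = 1000 MB) are an assumption and the function names are made up for illustration:

```python
def break_even_gb(plan_price_per_year, s3_price_per_gb_month):
    """GB a user must store before S3 storage alone eats the whole plan fee."""
    monthly_revenue = plan_price_per_year / 12
    return monthly_revenue / s3_price_per_gb_month


def images_at_break_even(gb, mb_per_image):
    """Whole images that fit in `gb`, assuming 1 GB = 1000 MB."""
    return int(gb * 1000 / mb_per_image)


gb = break_even_gb(40.0, 0.15)
print(f"break-even storage: {gb:.1f} GB")
print(f"images at 3 MB each: {images_at_break_even(gb, 3):,}")
```

The per-image size is the sensitive input here: pass a larger `mb_per_image` and the image count at break-even drops proportionally.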

~~~
brk
FYI, most people using high-res DSLRs are storing data in .raw format,
instead of a lossy (jpeg) format.

My 12.8MP Canon EOS5D produces .RAW files of around 13MB each. I don't store
in .jpg, but generally post-process a jpg thumbnail and mid-size image to make
browsing my photo catalog easier (or for emailing samples).

Edits to pics are always saved as a sub-rev, so it's not uncommon for all the
files associated with an image to add up to 20-40MB. This is fairly standard
for anyone who shoots for $$ (full time, or for a hobby as I do).

Anyway, kind of off topic, but your 3mb number is too low :)

~~~
callmeed
I'm fully aware of the raw file aspect (I shoot professionally with various
gear, including a 5D). I didn't consider it in my calculations for two
reasons:

\- SmugMug doesn't include RAW file storage as part of their accounts.
You have to pay extra for it: <http://www.smugmug.com/help/smugvault>

\- Most pro photogs are using services like SM after first editing their raw
files locally (in Aperture, Lightroom, etc.) and then exporting high-res
JPEGs.

~~~
ojbyrne
I have a 5d and use flickr, where I typically upload jpegs at full res and the
highest quality settings. It's not uncommon for me to run into flickr's 10 meg
limit. I'd say the average jpeg size that I upload there is more like 7 mb.

~~~
callmeed
I personally haven't seen a lot of 7+mb JPEGs from a 5D but I'm sure it's
possible. When I output images from a wedding at level 10, they generally
range from 2 to 5mb. My editorial work usually gets exported as tiffs, so
those are huge.

I'd consider saving your JPEGs at level 10 in Photoshop (max is 12). It's
unlikely you'd ever need or notice the difference. In fact, my print lab
specifically requests this.

~~~
ojbyrne
I do always use 12, in fact I generally crop first before reducing the
quality. I probably don't notice the difference, but since flickr rescales to
make several copies I always felt that uploading at the highest quality was a
good idea.

~~~
ojbyrne
My 5D Mk II just shipped. Right around the time I posted this comment. I feel
serendipitous.

------
umangjaipuria
Besides all the other points in favour of S3, I'd like to add one small note:
in any business, you want to shift your fixed costs over to variable costs.

------
lsc
I use S3 for small things... it can't be beat if you only need to back up a
few gigs. I agree, though, that once you scale up it gets pretty expensive. I
host my own backups for the heavy stuff. (there are other providers willing to
host your backups for cheaper as well, if you are willing to buy in
terabyte-size chunks)

------
zaidf
That Amazon S3 is cheaper than traditional dedicated servers is a myth that's
been spreading for a while now.

You can argue all you want about how reliable and convenient it is. But it's
not cheaper. In fact, it can be very expensive especially as you consume more
bandwidth and storage.

~~~
jodrellblank
It's not so much arguing that it is cheaper, but that there is still rationale
for choosing it even though it's more expensive.

Of course it gets more expensive the more you use it - but so does doing it
yourself. And not on a nice smooth scale, either.

~~~
zaidf
There are certainly many cases for using S3 even though it is more expensive.
There is nothing wrong with that.

It's just that I have met way too many people whose first response, after I
describe my situation, is "have you tried S3? shouldn't it be cheaper?"

------
jwr
You forget management costs -- someone has to deal with this storage, service
the computers, replace the drives, add them, add additional computers, etc.

~~~
jasonkester
Indeed. If your business relies on keeping 1TB of data alive, you're going to
have dedicated staff keeping an eye on it.

So now, you're looking at ~$2k hardware + $150k salary for your dedicated box
watcher.

Compare that to $3k for S3 where it's their problem to keep your stuff backed
up and deal with hardware problems.

It's a no-brainer if you ask me.

------
timf
I haven't looked back on the smugmug case since that came out, but just wanted
to note that 600TB would cost $0.12/GB/mo not $0.15.

~~~
dcurtis
It's a stepped rate. You pay .15 for the first 600, then .12 for all additional
data beyond that.

~~~
timf
Ah, I forgot about the stepped thing, thanks. It's 0.15 only for the first
50TB though:

$0.150 per GB – first 50 TB / month of storage used

$0.140 per GB – next 50 TB / month of storage used

$0.130 per GB – next 400 TB /month of storage used

$0.120 per GB – storage used / month over 500 TB
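
To see what the stepped rate means for the 600TB case, here is a minimal sketch that applies the four tiers listed above. It assumes 1 TB = 1000 GB and covers storage only (no transfer or request fees); the `TIERS` table and function name are illustrative, not anything Amazon publishes in this form.

```python
# (tier size in GB, USD per GB per month); None marks the open-ended last tier.
TIERS = [
    (50_000, 0.150),   # first 50 TB
    (50_000, 0.140),   # next 50 TB
    (400_000, 0.130),  # next 400 TB
    (None, 0.120),     # everything over 500 TB
]


def monthly_storage_cost(gb):
    """Monthly storage cost in USD, charging each slice of data at its tier's rate."""
    remaining = gb
    total = 0.0
    for size, rate in TIERS:
        if remaining <= 0:
            break
        # The last tier has no cap, so it absorbs whatever is left.
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
    return total


cost = monthly_storage_cost(600_000)  # 600 TB
print(f"600 TB: ${cost:,.0f}/month, effective ${cost / 600_000:.4f}/GB")
```

Under these assumptions 600TB works out to about $78,500/month, an effective rate of roughly $0.131/GB, which sits between the flat $0.12 and $0.15 figures discussed above.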

