

Ask YC: 40Tb in a year. Would you use Amazon? - inovica

Hi there. We're building a system which stores voice audio files. We're looking at good compression codecs for it (Speex), but we're looking at 30 million minutes of audio a month, which works out to roughly 40TB of data storage a year. These kinds of figures scare me a little!! Wondering if you would use Amazon for this or go another route? Just interested to hear from anyone who's doing anything at a similar scale.
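
For what it's worth, here's my back-of-the-envelope check on that 40TB figure, assuming Speex at roughly 16 kbit/s (the actual bitrate depends on the quality mode we settle on):

```python
# Sanity check of the 40TB/year estimate.
# Assumption: Speex at ~16 kbit/s; real bitrates vary with quality mode.
MINUTES_PER_MONTH = 30_000_000
BITRATE_KBPS = 16  # assumed encoding bitrate

bytes_per_month = MINUTES_PER_MONTH * 60 * (BITRATE_KBPS * 1000 / 8)
tb_per_year = bytes_per_month * 12 / 1e12
print(f"~{tb_per_year:.0f} TB/year")  # ~43 TB/year, in line with the estimate
```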
======
dmix
SmugMug currently hosts 600TB of pictures on Amazon S3.

[http://gigaom.com/2008/06/25/structure-08-werner-vogels-amaz...](http://gigaom.com/2008/06/25/structure-08-werner-vogels-amazon-cto/)

So yeah I'd probably use them.

~~~
bprater
Here's a S3 calculator that can help you figure out costs:

<http://calculator.s3.amazonaws.com/calc5.html>

A quick check tells me that ~40,000GB storage is about $6k/month, not
including xfer costs.
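
The arithmetic behind that, assuming a flat ~$0.15/GB-month (2008-era S3 pricing; the calculator has the actual tiers):

```python
# Rough S3 storage bill; $0.15/GB-month is an assumed flat rate and
# ignores volume tiering, request charges, and transfer costs.
storage_gb = 40_000
price_per_gb_month = 0.15
monthly_bill = storage_gb * price_per_gb_month
print(f"${monthly_bill:,.0f}/month")  # ~$6,000/month before transfer
```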

~~~
dmix
My first concern would be access to capital.

There are plenty of options with cloud storage.

I hope you have a way to balance that cash flow.

------
markbao
40TB?!

Look into other CDNs like Akamai or Limelight. I think the bulk pricing deal
you're going to get with an established CDN is better than Amazon's flat
rates.

~~~
agotterer
If you aren't doing that much traffic, Amazon is a nice option. Otherwise CDNs
will offer you a better price. Check out BitGravity; they are smaller than
Limelight and Akamai but may be able to give you better attention. They also
have some nice features the other CDNs aren't offering.

~~~
markbao
Ah, yes, BitGravity was the one I forgot. Diggnation uses them, I believe.

~~~
justin
Justin.tv uses BitGravity to store over 60TB (I think it's much higher, but I
haven't checked personally in a while). I can't recommend them highly enough;
we've had problems at 9pm on a Sunday night and gotten the CTO on the phone
personally in 5 minutes. BG is flat out the best of any CDN/storage/server
provider we've ever dealt with.

~~~
agotterer
Vimeo currently uses them, and CollegeHumor/Today's Big Thing is in the
process of moving over there.

------
secorp
We run a small specialized storage company and the things that seem to matter
most are: storage capacity, availability, reliability, transfer rates for both
current data usage and new data addition.

40TB can be handled pretty well by S3 and other storage services, and they
have pretty good pricing information for modeling your costs. Note that they
don't (yet) provide very specific SLAs for data availability, so keep that in
mind when designing your system.

Maintaining your own drives with some sort of redundancy (RAID, automatic
copies, etc.) or using something like (bias alert) our open-source project
<http://allmydata.org> which is effectively a software RAID layer both require
some IT and systems energy, so this has to be bundled into your operational
costs if you choose that route.

Just to emphasize what others have mentioned, it is important to incorporate
the new-data influx rate into your model. If you are successful, 40TB this
year might turn into 120TB next year, so make sure that your cashflow model
can support the underlying cost of whatever system you choose.
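
One way to sketch that cashflow model, assuming the yearly influx doubles (which gets you from 40TB in year one to 120TB total by year two; the $150/TB-month rate is a placeholder, not a quote):

```python
# Hypothetical cumulative-storage cost model. Data only ever
# accumulates, and the yearly influx doubles each year.
# price_per_tb_month is an assumed placeholder rate.
def cumulative_cost(first_year_tb, influx_growth, years, price_per_tb_month=150.0):
    total_tb, influx, yearly_bills = 0.0, float(first_year_tb), []
    for _ in range(years):
        total_tb += influx                       # new data lands on top of old
        yearly_bills.append(round(total_tb * price_per_tb_month * 12))
        influx *= influx_growth                  # influx itself grows
    return yearly_bills

print(cumulative_cost(40, 2.0, 3))  # [72000, 216000, 504000]
```

Even with made-up rates, the point stands: the bill compounds, because you pay every year for everything you've ever stored.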

------
ComputerGuru
Depending on your projections for future growth and how much cash you have,
I'd consider opening my own data center for that kind of storage...

~~~
Tichy
Isn't 40TB just a paltry 40 hard drives (maybe 50 with redundancies)? A
dedicated data center seems a bit overblown for that?

~~~
ComputerGuru
Actually, it's more like 50 hard drives without redundancy (keeping in mind
TiB vs TB) and at least double that if we're talking quality redundancy
(assuming this is a for-profit company hosting people's sound clips then they
better have duplicates of everything).

I don't know enough about their model and what services they provide, though.
I'm thinking 40TB a year of storage, but ~100TB of transfer per month or more -
which is no small amount.

But my biggest point is future expansion. 40TB in year one... how many in year
2?
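
The drive-count arithmetic, assuming 1TB drives (about the largest available in 2008) and ~10% of each drive lost to formatting and filesystem overhead - both figures are assumptions:

```python
import math

# Drives are marketed in decimal TB; after formatting and filesystem
# overhead you get somewhat less usable space. The 10% overhead
# figure is a rough assumption.
DATA_TB = 40
DRIVE_TB = 1.0
USABLE_FRACTION = 0.9  # assumed usable share per drive

drives = math.ceil(DATA_TB / (DRIVE_TB * USABLE_FRACTION))
print(drives, "drives bare,", drives * 2, "with simple mirroring")
```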

~~~
Tichy
I have to admit I don't know anything about operating in those dimensions
(ianaa - I am not an admin). But HD capacities are growing fast - maybe faster
than the data needs of that company? So perhaps they would not have to buy
more and more hard drives, only replace the old ones with bigger ones?

All hypothetical, though - personally I think I would go for something like
S3.

------
bigbang
I haven't done anything similar. But just my 2c - see if you can make use of
existing file-sharing systems like Rapidshare/Megaupload etc. and link to
them. I believe the hotornot guys used free Yahoo photo hosting and just
linked to the images to save on bandwidth (it worked back then).

~~~
markbao
Rapidshare, Megaupload, etc. don't allow audio streaming. Nor do I want to
wait 120 seconds and solve a captcha that has to do with finding which letters
have cats attached to them as opposed to dogs to get an audio file.

~~~
bigbang
They have premium accounts which don't have the waiting issues (not sure about
the captcha part). You could buy a premium service and, in your streaming
player, just point to the link on Megaupload?

------
nasser
Do not underestimate the transfer costs. The information you have here - size
of data "stored" - is only one factor. You need to have some estimates about
your transfer in and out, and that will tell you whether Amazon makes sense or
you have to go with a CDN.
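
A rough split of storage versus transfer, using the ~100TB/month transfer guess floated earlier in this thread (both per-GB rates are assumptions, not quotes):

```python
# Storage vs. transfer share of a monthly bill. Rates are assumed
# placeholders; plug in real S3 or CDN pricing before deciding.
storage_gb = 40_000       # stored data after year one
transfer_gb = 100_000     # assumed monthly egress (~100TB)
storage_cost = storage_gb * 0.15    # assumed $/GB-month stored
transfer_cost = transfer_gb * 0.17  # assumed $/GB transferred out
print(f"storage ${storage_cost:,.0f}/mo vs transfer ${transfer_cost:,.0f}/mo")
```

Even with these rough numbers, transfer can easily swamp storage - which is exactly why the CDN-versus-S3 question matters.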

