
Ask HN: Strategy for Dealing with 0.5PB of Data and AWS - esalman
I work at a neuroimaging lab which is in the middle of a transition. We have about 0.5PB of data which may grow to 1PB in the next 1-1.5 years. We have limited funding to utilize AWS services to store the data.

At any time we need maybe around 100TB of this data to be available locally for computation on a high-performance computing cluster. We have ample local storage.

Funding is a big concern, so what kind of solution (based on AWS) will be most cost-effective for this scenario? Do we need to think about offsite backup when AWS is available? Is there a resource or case study we can learn from?
======
kevinsimper
There are upsides and downsides to both choosing AWS and hosting it yourself.

You should choose AWS with Glacier if you and your organization value agility.
You will spend far less time securing, backing up, and maintaining your data.
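
Rough sketch of what pushing a file into a Glacier-class storage tier looks
like with boto3 (the bucket and key names here are made up, and at this scale
you would want multipart uploads and retries on top of it):

    import boto3

    s3 = boto3.client("s3")

    # Upload a single file directly into the GLACIER storage class.
    # "neuro-archive" and the key prefix are placeholders.
    s3.upload_file(
        "subj001_T1w.nii.gz",
        "neuro-archive",
        "raw/subj001_T1w.nii.gz",
        ExtraArgs={"StorageClass": "GLACIER"},
    )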

You should self-host if your organization values price, but be prepared to
spend much more time and effort keeping it cheap, at the cost of agility and
innovation speed.

------
QuinnyPig
Depending on how frequently you need this data and what retrieval latency you
can tolerate, Deep Archive would cost roughly $1,000 a month for a petabyte.
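
As a sanity check on that number: Deep Archive list pricing is about
$0.00099 per GB-month in us-east-1, so a petabyte (roughly 1,048,576 GB)
works out to about $1,040/month before request and retrieval charges; worth
re-checking current pricing for your region. If the data lands in S3 Standard
first, a lifecycle rule can sweep it down to Deep Archive automatically.
A minimal boto3 sketch (bucket name and prefix are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Transition everything under raw/ to Deep Archive as soon as possible.
    s3.put_bucket_lifecycle_configuration(
        Bucket="neuro-archive",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "to-deep-archive",
                    "Filter": {"Prefix": "raw/"},
                    "Status": "Enabled",
                    "Transitions": [
                        {"Days": 0, "StorageClass": "DEEP_ARCHIVE"}
                    ],
                }
            ]
        },
    )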

You might also look into Storage Gateway as a viable local caching solution
for any of the S3 storage tiers...
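
One caveat on retrieval, whichever caching layer sits in front: objects in
Deep Archive are not directly readable, you have to restore them first, and
the cheap Bulk tier can take up to roughly 48 hours. A rough boto3 sketch of
staging a single object back (placeholder names again):

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to stage a Deep Archive object back into a readable copy
    # for 7 days, using the cheapest (Bulk) retrieval tier.
    s3.restore_object(
        Bucket="neuro-archive",
        Key="raw/subj001_T1w.nii.gz",
        RestoreRequest={
            "Days": 7,
            "GlacierJobParameters": {"Tier": "Bulk"},
        },
    )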

