At that kind of scale, S3 makes zero sense. You should definitely be rolling your own.

10PB costs more than $210,000 per month at S3, or more than $12M after five years.

RackMountPro offers a 4U server with 102 bays, similar to the BackBlaze servers, which fully configured with 12GB drives is around $11k total and stores 1.2 PB per server. (https://www.rackmountpro.com/product.php?pid=3154)

That means that you could fit all 15TB (for erasure encoding with Minio) in less than two racks for around $150k up-front.

Figure another $5k/mo for monthly opex as well (power, bandwidth, etc.)

Instead of $12M spent after five years, you'd be at less than $500k, including traffic (also far cheaper than AWS.) Even if you got AWS to cut their price in half (good luck with that), you'd still be saving more than $5 million.
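
For anyone who wants to sanity-check the arithmetic, here's a back-of-envelope sketch in Python (the $0.021/GB-month figure is an assumed S3 Standard list price at the largest volume tier; negotiated rates will differ):

    # Rough 5-year S3 Standard bill for 10 PB (all prices assumed, pricing tiers ignored)
    capacity_gb = 10 * 1_000_000          # 10 PB in (decimal) GB
    price_per_gb_month = 0.021            # assumed S3 Standard rate above 500 TB
    monthly = capacity_gb * price_per_gb_month
    print(f"${monthly:,.0f}/month, ${monthly * 60:,.0f} over 5 years")
    # -> roughly $210,000/month and ~$12.6M over five years, matching the figures above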

Getting the data out of AWS won't be cheap, but check out the snowball options for that: https://aws.amazon.com/snowball/pricing/




[disclaimer: while I have some small experience putting things in DC, including big GPU servers, I have never been anywhere near that scale, certainly not storage]

$10k is for a server with no hard drives. With 12 TB disks, and with enough RAM, we're talking closer to $40-50k per server. Let's say for simplicity you're going to need to buy 15 of those, and let's say you only need to replace 2 of them per year. You need 25 over five years; that's already $1M or more over 5 years.

And then you need to factor in the network equipment, the hosting in a colocation space, and if storage is your core value, you need to think about disaster recovery.

You will need at least 2 people full time on this, in the US that means minimum 2x 150k$ of costs per year: over 5 years, that's 1.5m$. If you use software-defined storage, that's likely gonna cost you much more because of the skill demand.

Altogether that's all gonna cost you much more than 500k$ over 5 years. I would say you would need at least 5x to 10x this.
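
A rough sketch of that 5-year total under the assumptions above (server price, replacement rate, colo and staffing figures are all guesses from this thread, not quotes):

    # Rough 5-year DIY TCO; every figure here is an assumption from the comment above
    years = 5
    servers = (15 + 2 * years) * 50_000     # 15 up front, ~2 replacements/year, ~$50k each
    colo = 5_000 * 12 * years               # colocation, power, bandwidth at ~$5k/month
    staff = 2 * 150_000 * years             # two full-time admins at ~$150k/year each
    total = servers + colo + staff
    print(f"~${total:,.0f} over {years} years")   # ~$3.05M, several times the $500k estimate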


yes, the TCO needs consideration, not just the metal


After a certain size, AWS et al simply don't make sense, unless you have infinitely deep pockets. For storage that you pull from, AWS et al charge bandwidth costs. These costs are non-trivial for non-trivial IO. I worked up financial operational models for one of my previous employers when we were looking at the costs of remaining on S3 versus rolling it into our own DCs. The download costs, the DC space, staff, etc. were far less per year (and the download cost is a one-time cost) than the cold storage costs.

Up to about 1PB with infrequent use, AWS et al might be better. When you look at 10-100PB and beyond (we were at 500PB usable or so last I remembered) the costs are strongly biased towards in-house (or in-DC) vs cloud. That is, unless you have infinitely deep pockets.


I should add to this comment, as it may give the impression that I'm anti cloud. I'm not. Quite pro-cloud for a number of things.

The important point to understand in all of this is that there are cross-over points in the economics at which one becomes better than the other. Part of the economics is the speed of standing up new bits (the opportunity cost of not having those new bits instantly available). This flexibility and velocity is where cloud generally wins on design, for small projects (well below 10PB).

This said, if your use case appears to be rapidly blasting through these cross-over points, the economics usually dictates a hybrid strategy (best case) or a migration strategy (worst case).

And while your use case may be rapidly approaching these limits (you need to determine where they are if you are growing/shrinking), there are things you can do to reduce the risk and cost of transitions ahead of this.

Hybrid as a strategy can work well, as long as your hot tier is outside of the cloud. Hybrid makes sense also if you have to include the possibility of deplatforming from cloud providers (which, sadly, appears to be a real, and significant, risk to some business models and people).

None of this analysis is trivial. You may not even need to do it, if you are below 1PB and your cloud bills are reasonable. This is the approach that works best for many folks, though as you grow, it is as if you are a frog in water of ever-increasing temperature (with regard to costs). Figuring out the pain point where you need to make changes to get spending on a different (better) trajectory for your business is important then.


And at an even larger size it makes sense again, with >80% discounts on compute and $0 egress.


We had taken the discounts into account (we had qualified for them). The $0 egress was not a thing when we did our analysis. And we were moving 10's of PB/month. BW costs were running into sizable fractions of millions of dollars per month.


The thing about fitting everything in one rack, potentially, is vibration. There have been several studies into drive performance degradation from vibration, and there's noticeable impact in some scenarios. The Open Compute "Knox" design as used by Facebook spins drives up when needed and then back down, though whether that's to limit vibration impact, I don't know (it's used for their cold storage [0]).

0: https://datacenterfrontier.com/inside-facebooks-blu-ray-cold...

https://www.dtc.umn.edu/publications/reports/2005_08.pdf

https://digitalcommons.mtu.edu/cgi/viewcontent.cgi?article=1...


Here is Brendan Gregg showing how vibrations can affect disk latency:

https://www.youtube.com/watch?v=tDacjrSCeq4


I'm an absolute noob here, but is using SSD racks for storage a feasible option cost wise and for this issue in particular?


Absolutely it's an option

It's gonna cost more

But it's also going to be nearly vibration-free (just the PSU fans), and stupidly-fast


No, SSDs are still way too expensive if you don't need the performance.


10PB costs more than $210,000 per month at S3, or more than $12M after five years.

Your pricing is off by 2x: he said he's OK with infrequent access, one zone, which is $0.01/GB, or $100K/month.

If he rarely needs to read most of the data, he can cut the price to a tenth by using Deep Archive, at $0.00099 per GB, so $10K/month, or around $600K over 5 years, not including retrieval costs.


Nope, can't use Deep Archive as he specified max retrieval time of 1000ms. But you're correct with S3-IA


For a 10X reduction in cost, things that are impossible often become possible.


> Nope, can't use Deep Archive as he specified max retrieval time of 1000ms.

If accesses can be anticipated, pre-loading data from cold storage to something warmer might make it viable.
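
A minimal sketch of what that pre-loading could look like with boto3 (bucket and key names are placeholders). Deep Archive restores still take hours, so accesses really do have to be anticipated well in advance:

    import boto3

    s3 = boto3.client("s3")

    def preload(bucket, key, days=7):
        """Kick off a restore from Glacier/Deep Archive into the warm tier."""
        s3.restore_object(
            Bucket=bucket,
            Key=key,
            RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": "Bulk"}},
        )

    def is_ready(bucket, key):
        """True once the restored copy can be fetched with a normal GET."""
        head = s3.head_object(Bucket=bucket, Key=key)
        return 'ongoing-request="false"' in head.get("Restore", "")

    # hypothetical usage: warm up tomorrow's working set ahead of time
    preload("my-archive-bucket", "images/2021/03/frame-000123.jpg")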


>RackMountPro offers a 4U server with 102 bays, similar to the BackBlaze servers, which fully configured with 12GB drives is around $11k total and stores 1.2 PB per server. (https://www.rackmountpro.com/product.php?pid=3154)

I dare you to buy 102 12TB drives for $11k

The cheapest consumer-class 12TB HDD is ~$275 a pop

That's $28k just for the drives


If you have a PBs of data that you rarely access, it seems to make sense to compress it first.

I've rarely seen any non-giants with PBs of data properly compressed. For example, small JSON files converted into larger, compressed Parquet files will use 10-100x less space. I am not familiar with images, but I see no reason why encoding batches of similar images shouldn't achieve similar or even better compression ratios.

Also, if you decide to move off later on, your transfer costs will also be cheaper if you can move it off in a compressed form first.
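
A minimal sketch of that kind of consolidation with pandas/pyarrow, assuming newline-delimited JSON files that share a schema (paths are placeholders):

    # Roll many small JSON files into one compressed Parquet file
    import glob
    import pandas as pd

    frames = [pd.read_json(path, lines=True) for path in glob.glob("events/*.json")]
    df = pd.concat(frames, ignore_index=True)

    # Columnar layout plus zstd usually shrinks repetitive JSON dramatically
    df.to_parquet("events.parquet", compression="zstd")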


Could be wrong, but I don't believe batches of already-compressed images compress well

but I'd be very interested to hear about techniques for this because I have a lot of space eaten up by timelapses myself


It's not about space reduction, it's about handling the small file problem. HDFS can handle up to 500M files without issue but the amount of RAM needed to store the files' metadata starts to go beyond what you'd typically find in a single server these days.

When you store multiple images and/or videos inside of a single PQ file, you'll end up keeping fewer files on your server.

I believe Uber store JPEG data in PQ files and Spotify store audio files in PQ or a similar format on their backend.
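
I don't know Uber's or Spotify's actual pipelines, but a minimal sketch of the idea with pyarrow (paths are placeholders) looks like this:

    # Pack many small JPEGs into one Parquet file as a binary column
    import glob, pathlib
    import pyarrow as pa
    import pyarrow.parquet as pq

    paths = sorted(glob.glob("images/*.jpg"))
    table = pa.table({
        "filename": [pathlib.Path(p).name for p in paths],
        "data": pa.array([pathlib.Path(p).read_bytes() for p in paths], type=pa.binary()),
    })

    # JPEG bytes are already compressed, so this mostly addresses the small-file
    # problem rather than saving space; skip recompressing the binary column.
    pq.write_table(table, "images.parquet", compression="none")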


On the contrary, batches of images with a high degree of similarity compress _very_ well. You have to use an algorithm specifically designed for that task though. Video codecs are a real-world example of such - consider that H.265 is really compressing a stream of (potentially) completely independent frames under the hood.

I'm not sure what the state of lossless algorithms might be for that though.
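
One hedged option for the lossless case: libx264 has a lossless mode (-qp 0) that still does inter-frame prediction, so near-identical frames get cheap. A sketch driving ffmpeg from Python (filenames are placeholders; it's only truly lossless if you avoid pixel-format conversion along the way):

    # Exploit inter-image similarity with a lossless video encode via ffmpeg
    import subprocess

    subprocess.run(
        [
            "ffmpeg",
            "-framerate", "1",
            "-i", "timelapse/frame-%05d.png",   # placeholder image sequence
            "-c:v", "libx264",
            "-qp", "0",                         # lossless mode
            "-preset", "veryslow",
            "timelapse.mkv",
        ],
        check=True,
    )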


Best I know of for that is something like lrzip still, but even then it's probably not state of the art. https://github.com/ckolivas/lrzip

It'll also take a hell of a long time to do the compression and decompression. It'd probably be better to do some kind of chunking and deduplication instead of compression itself, simply because I don't think you're ever going to have enough RAM to store any kind of dictionary that would effectively handle so much data. You'd also not want to have to re-read and reconstruct that dictionary just to get at some random image.
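
A toy illustration of the chunk-and-dedup idea (fixed-size chunks for simplicity; real systems use content-defined chunking so one insert doesn't shift every boundary):

    # Store each unique chunk once, keyed by its hash; files become lists of hashes.
    import hashlib

    CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB, arbitrary

    def store(path, chunk_store, manifests):
        """Split a file into chunks, keeping only chunks we haven't seen before."""
        manifest = []
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK_SIZE):
                digest = hashlib.sha256(chunk).hexdigest()
                chunk_store.setdefault(digest, chunk)   # dedup happens here
                manifest.append(digest)
        manifests[path] = manifest

    def restore(path, chunk_store, manifests):
        """Rebuild a file's bytes from its manifest."""
        return b"".join(chunk_store[d] for d in manifests[path])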


A movie is a series of similar images, and while it does allow temporal compression along a third axis on top of the 2D raster, H.265 is about as good as it gets at the moment. But it's also lossy, which might not be tolerable.


H.266/VVC looks impressive. Waiting to get my hands on an FPGA codec for testing.


Right, but we're not talking about compressing a video stream; we're talking about compressing individually compressed pictures. Big difference.


I’ve heard reports that MinIO gets slow beyond the hundreds-of-millions-of-objects threshold


You are mixing up your units, with 12GB drives and 15TB in a rack.


You didn't take personnel cost into account. You will need at least two system administrators to look after those racks (even if remote hands to change faulty drives are in the monthly opex). That quickly takes you upwards of $200k/year at current prices (which will rise another 50% over 5 years).

On the other hand, you may negotiate a very sizable discount from AWS for 10PB of storage over 5 years.


Does Snowball let you exfiltrate data from AWS? I was under the impression it was only for bulk ingestion.


First sentence on the linked page: "With AWS Snowball, you pay only for your use of the device and for data transfer out of AWS."


Wow that’s up to $500,000 just to export 10PB (depending on region).


According to https://aws.amazon.com/snowball/pricing/, egress fees depend on the region, ranging from $0.03/GB (North America & parts of Europe) to $0.05/GB (parts of Asia and Africa).

So US$300K to US$500K for egress fees + cost of Snowball devices.

The major downside of Snowball in this export case is the size limit of 80TB per device - from https://aws.amazon.com/snowball/features/ :

"Snowball Edge Storage Optimized provides 80 TB of HDD capacity for block volumes and Amazon S3-compatible object storage, and 1 TB of SATA SSD for block volumes."

That'd be around 125 Snowball devices to get 10PB out.

If OP actually has 10PB on S3 currently, the OP may want to fall back to leaving the existing data on S3 and accessing new data in the new location.


> If OP actually has 10PB on S3 currently, the OP may want to fall back to leaving the existing data on S3 and accessing new data in the new location.

I remember asking an Amazon executive in London when AWS was very new and they were evangelising it to developers; I asked him what the cost would be of getting data out of AWS if I wanted to move it to another service provider, and how easy that would be. He avoided giving a straight, simple answer. I realised then that the business model from the start was to lock developers/startups/companies into the AWS ecosystem.


> If OP actually has 10PB on S3 currently, the OP may want to fall back to leaving the existing data on S3 and accessing new data in the new location.

Another option would be to leave data on S3, store new data locally, and proxy all S3 download requests, ie, all requests go to the local system first. If an object is on S3, download it, store it locally, then pass it on to your customer. That way your data will gradually migrate away from S3. Of course you can speed this up to any degree you want by copying objects from S3 without a customer request.

An advantage of doing this is that you can phase in your solution gradually, for example:

Phase 1: direct all requests to local proxies, always get the data from S3, send it to customers. You can do this before any local storage servers are set up.

Phase 2: configure a local storage server, send all requests to S3, store the S3 data before sending to customers. If the local storage server is full, skip the store.

Phase 3: send requests to S3, if local servers have the data, verify it matches, send to customer

Phase 4: if local servers have the data, send it w/o S3 request. If not, make S3 request, store it locally, send data

Phase 5: store new data both locally and on S3

At this point you are still storing data on S3, so it can be considered your master copy and your local copy is basically a cache. If you lose your entire local store, everything will still work, assuming your proxies work. For the next phase, your local copy becomes the master, so you need to make sure backups, replication, etc are all working before proceeding.

Phase 6: start storing new content locally only.

Phase 7: as a background maintenance task, start sending list requests to S3. For objects that are stored locally, issue S3 delete requests for the biggest objects first, at whatever rate you want. If an object isn't stored locally, make a note that you need to sync it sometime.

Phase 8: using the sync list, copy S3 objects locally, biggest objects first, and remove them from S3.

The advantage IMO is that it's a gradual cutover, so you don't have to have a complete, perfect local solution before you start gaining experience with new technology.
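
A minimal sketch of the pull-through proxy covering roughly phases 2-4 (bucket name and cache directory are placeholders, error handling omitted):

    import pathlib
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "legacy-bucket"             # hypothetical bucket being migrated away from
    CACHE = pathlib.Path("/data/cache")  # hypothetical local storage root

    def get_object(key: str) -> bytes:
        local = CACHE / key
        if local.exists():               # phase 4: serve locally, no S3 request
            return local.read_bytes()

        body = s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

        local.parent.mkdir(parents=True, exist_ok=True)
        local.write_bytes(body)          # phase 2: populate local storage as a side effect
        return body                      # data migrates off S3 one request at a time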


There's also the snowmobile https://aws.amazon.com/snowmobile/


The AWS Snowmobile pages only talk about migrating INTO AWS, not OUT OF.

from https://aws.amazon.com/snowmobile/ :

AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100PB per Snowmobile, a 45-foot long ruggedized shipping container, pulled by a semi-trailer truck. Snowmobile makes it easy to move massive volumes of data to the cloud, including video libraries, image repositories, or even a complete data center migration.

from https://aws.amazon.com/snowmobile/faqs/ :

Q: What is AWS Snowmobile?

AWS Snowmobile is the first exabyte-scale data migration service that allows you to move very large datasets from on-premises to AWS.


I mean, the title on the snowmobile page says:

> Migrate or transport exabyte-scale data sets into and out of AWS


Unfortunately the header is misleading. The FAQ says:

Q: Can I export data from AWS with Snowmobile?

Snowmobile does not support data export. It is designed to let you quickly, easily, and more securely migrate exabytes of data to AWS. When you need to export data from AWS, you can use AWS Snowball Edge to quickly export up to 100TB per appliance and run multiple export jobs in parallel as necessary


That wording is not inconsistent with the interpretation that Snowmobile is for in only.


You realize you can't fit 10 appliances of 4U in a rack? (A rack is 42U)

There's network equipment and power equipment that requires space in the rack. There are power limitations and weight limitations on the rack that prevent you from filling it to the brim.


I've put 39U of drives in a rack before. You only need 1U for a network switch, and you can get power that attaches vertically to the back, so it doesn't take up any space. If you have a cabinet with rack in front and back and all the servers have rails, the weight shouldn't be an issue.

The biggest issue will be cooling depending on how hot your servers run.

Specifically, it was a rack full of Xserve RAIDs, which are 3U each and about 100lbs each. So that was over 1300lbs.


Looked up some specs.

* A typical rack is rated for somewhere between 450 and 900 kg (your mileage may vary).

* A disk is about 720g.

* A 4U Quanta enclosure is 36 kg empty.

* With 10 enclosures of 60 disks, that's a total of 792 kg inside the rack.

You will want to check what rack you have exactly and weight things up.

The rack itself is another 100 to 200 kg. You will want to double check whether the floor was designed to carry 1 ton per square meter. It might not be.

My personal tip: definitely do NOT put a row of that in an improvised room in an average office building. You might have a bad surprise with the floor. ;)

Anyway. The project will probably be abandoned after the OP tries to assemble the first enclosure (80kg fully loaded) and realizes he's not going to move that.


You run a single network switch for a rack filled to the brim with drives?


Sure. A single rack is a common failure domain so you make sure to replicate across racks.

E.g. Dunno about anyone else, but Facebook racks (generally) have a single switch.


That seems like a rather unnecessary risk. Sure, you stripe across the racks, but another ToR switch in an MLAG configuration is a minuscule expense compared to the costs involved here.


You could easily run two switches, there would be enough room. But normally yes, I'd run one switch per rack. Switch failure is pretty rare, and when it does happen it's pretty easy to switch it out for a spare.


Gold standard APC PDUs are all 0U side mount.



