
> Measuring only one-and-half-feet by three-and-half-feet by six inches, each Diskotech box holds as much as a petabyte of data

This number is very interesting. Basically Diskotech stores 1PB in 18" × 6" × 42" = 4,536 cubic inches of volume, which is about 10% bigger than a standard 7U (17" × 12.2" × 19.8" = 4,107 cubic inches).

124 days ago Dropbox Storage Engineer jamwt posted here (https://news.ycombinator.com/item?id=10541052) stating that Dropbox is "packing up to an order of magnitude more storage into 4U" compared to Backblaze Storage Pod 5.0, which is 180TB in 4U (assuming it's the deeper 4U at 19" × 7" × 26.4" = 3,511 cubic inches). Many doubted that what jamwt claimed was physically possible, but doing the math shows that Dropbox is basically packing 793TB into 4U if we naively scale linearly (of course it's not that simple in practice). Not exactly an order of magnitude more, but still.

Put another way, Diskotech is about 30% bigger in volume than Storage Pod 5.0 but has about 470% more storage capacity.
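A quick sanity check of the figures in these comments (dimensions are taken from the thread; 1PB is treated as 1024TB, which is what makes the 793TB number come out):

```python
# Sanity-check the density math above. Dimensions are from the thread;
# 1 PB is taken as 1024 TB.

diskotech_vol = 18 * 6 * 42      # in^3: 1.5 ft x 6 in x 3.5 ft
pod5_vol      = 19 * 7 * 26.4    # in^3: the deeper 4U chassis
pod5_tb       = 180              # Backblaze Storage Pod 5.0
diskotech_tb  = 1024             # "as much as a petabyte"

# Naively scale Diskotech's capacity down to Pod 5.0's volume
scaled_tb = diskotech_tb * pod5_vol / diskotech_vol
print(f"Diskotech volume:        {diskotech_vol} in^3")
print(f"Pod 5.0 volume:          {pod5_vol:.0f} in^3")
print(f"Diskotech scaled to 4U:  {scaled_tb:.0f} TB")       # ~793 TB
print(f"Volume ratio:   {diskotech_vol / pod5_vol:.2f}x")   # ~1.29x
print(f"Capacity ratio: {diskotech_tb / pod5_tb:.2f}x")     # ~5.69x
```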

That was indeed some amazing engineering.

Yev from Backblaze here -> Yup! It's pretty entertaining! Granted we only store 180TB b/c we use primarily 4TB drives and it's inexpensive for us to do so. If we had 10TB drives in there we'd be pushing up to 450TB per pod, but the price for us would increase dramatically. Pod 6.0 will be a bit more dense!
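For reference, both per-pod figures above follow directly from the Storage Pod's 45 drive bays:

```python
# Backblaze Storage Pod 5.0 holds 45 drives.
drives = 45
print(drives * 4)    # TB with 4 TB drives  -> 180
print(drives * 10)   # TB with 10 TB drives -> 450
```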

To be fair to Backblaze this level of storage density is really only possible with recent advances in disk technology (higher densities, SMR storage, etc).

Also not everyone wants to be packing a petabyte into a box. At that level of density you need to invest a lot of effort in replication strategies, tooling, network fabric etc to handle failures with high levels of availability/durability.
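To make the durability trade-off concrete, here is a sketch comparing the raw-storage overhead of plain replication with an (n, k) erasure code. The thread doesn't say which scheme Dropbox actually uses, so the RS(14, 10) parameters below are purely illustrative:

```python
# Raw-storage overhead of two redundancy schemes (illustrative only;
# the actual scheme Magic Pocket uses isn't stated in this thread).

def replication_overhead(copies):
    """Raw bytes stored per logical byte with simple replication."""
    return float(copies)

def erasure_overhead(n, k):
    """(n, k) code: k data shards plus n - k parity shards,
    tolerating the loss of any n - k shards."""
    return n / k

print(f"3x replication: {replication_overhead(3):.2f}x raw storage")
print(f"RS(14, 10):     {erasure_overhead(14, 10):.2f}x raw storage")
```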

Yes, a 1PB+ failure domain only made sense because Magic Pocket is very good at automatic cluster management and repair.

Are you using spindle, SSD, or flash? (Admittedly I don't know if you consider flash and SSD to be the same.)

What is your price per GB raw?

> Are you using spindle or ssd or flash? (admittedly I dont know if you consider flash and ssd to be the same)

We have flash caches (NVMe) in the machines, but the long-term storage devices are spindle (PMR and SMR).

> What is your price per GB raw?

I cannot disclose, exactly, but it is well below any other public price I've seen.

So NVMe is "ready for production"? Do you think you'd use it at home, or only in the enterprise?

I mean, if at home you need 400k IOPS, 1GB/s of writes, and 2.2GB/s of reads... go for it! I sure don't, but more power to you. :-)

NVMe's lower latency is great. I never really knew how "slow" regular SATA/SAS SSDs were until I used NVMe ones.

I'm tempted to get one and set it up as some kind of cache in my NAS. I'm already in silly territory with it now, anyway. 1GB/s of writes sounds crazy though - I haven't got anything that can write to it that fast!

I have a PCIe NVMe card (with a Samsung 950 pro M.2 in it) in my home server, which serves as an L2arc device for a ZFS pool. It is pretty nice. Runs a bit warm though.

We are now well offtopic but I've seen some advice recently that advises using an SLOG rather than L2ARC for a home NAS.

I think a home NAS has a lot of async writes and few or no sync writes, so a dedicated ZIL device (the SLOG) is not really useful or helpful. I'm not even sure you really need an L2ARC device; just slap 8 or 16GB of RAM in it and you will be happy...

Obviously if you are playing seriously with VMs, databases and whatever else, both of those (SLOG/L2ARC) may become important for you. I'm going for the "I'm just using this to store my raw-format photos, backups for the taxes and other big files" kind of usage here. :)

You don't have any contact details but I'd definitely like to talk to you more about home NAS.

Aside, Google Cloud has NVMe available on almost all machine types if you want to play with it [1].

[1] https://cloud.google.com/compute/docs/disks/local-ssd

I put it in production, and it's something else. Whether I'd use it at home depends entirely on how rich I feel I am ;)

We have been primarily using Intel NVMe storage for our database servers since fall of 2014 with no major complaints. Our high end desktop/laptop systems are also using the latest Samsung NVMe M.2 drives which are screaming fast.

Just to put a number to "screaming fast", 2.5 GB/s sequential reads averaged across the entire drive on a Samsung 950 Pro NVMe SSD: https://pbs.twimg.com/media/CaMhc2oVAAE4q3q.jpg:large

Surely it's the other way round?

That is, the better your software can handle repair, the smaller your failure domain can be?

A larger failure domain would only make sense if you wanted to minimize the compute required per unit stored?

We do want to minimize that: compute costs money, and storage is irreducible. But you can only cut compute aggressively if your larger distributed system is great at repairs, since the failure of a single box kicks off 1PB of repair activity!
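A back-of-the-envelope estimate of why repair speed dominates at this density: re-replicating 1 PB is bounded by aggregate network bandwidth, which is why repair has to fan out across the cluster. The link speeds and peer counts below are assumptions for illustration, not Dropbox's numbers:

```python
# Illustrative repair-time estimate for a 1 PB failure domain.
# Bandwidth figures are assumptions, not Dropbox's actual numbers.

PB_GB = 1024 * 1024  # 1 PB expressed in GB

def repair_hours(data_gb, aggregate_gbps):
    """Hours to move data_gb over aggregate_gbps of usable bandwidth."""
    seconds = data_gb * 8 / aggregate_gbps  # GB -> gigabits
    return seconds / 3600

# Repair from a single 10 Gb/s peer vs. fanned out across 100 peers
print(f"single 10 Gb/s link:  {repair_hours(PB_GB, 10):.0f} h")   # ~233 h
print(f"100 peers at 10 Gb/s: {repair_hours(PB_GB, 1000):.1f} h") # ~2.3 h
```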

Thanks, do you have a time lapse backup of data in Magic Pocket? For example, in case of software error?

Not sure what is more amazing, the project of this scale (love the disk drawers!) or that infrastructure for managing the fleet of drives gets top billing!

In datacenters, space is cheap compared to power, so it's often simply not economical to cram more into less space.

Sometimes; but depreciation costs are usually more than either of those (for storage).

Did you build your own data centers? Running our hosting, we had far more constraints on power than anything else. Even though we only had 20U of equipment, we ended up taking a 44U rack, with our disk arrays being the largest consumers. This made the case for moving to SSDs even stronger.

That was the problem I always had in my pre-cloud days. We were usually running racks half-full because of power restrictions in the DC. It was convenient because you could keep your cold spares and tools in your expensive space, but it always seemed like you should be putting more useful stuff in the racks.

You can substantially change the economics if you design your own power distribution systems.

Aye, but even off the shelf 3ph 50A 208v circuits gets you a lot of power. That's enough for 7 or 8 4U storage nodes with 70 drives each.

Then again, I don't really like putting too much weight that high up in a rack that already has 2600lbs of gear in it.
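The circuit math above checks out, assuming the usual 80% continuous-load derating and a rough per-drive wattage (both are my assumptions, not figures from the thread):

```python
import math

# Usable power on a 3-phase 208 V, 50 A circuit. The 80% continuous
# derate is standard NEC practice; the per-drive wattage is a rough
# assumption for a 3.5" spinning drive.
watts = math.sqrt(3) * 208 * 50 * 0.8
print(f"usable circuit power: {watts:.0f} W")   # ~14.4 kW

nodes = 8
print(f"per 4U node: {watts / nodes:.0f} W")    # ~1.8 kW

drives_per_node = 70
drive_w = 8  # assumed average draw per drive
print(f"drive budget per node: {drives_per_node * drive_w} W")
```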

Is Dropbox using SMR? (I've never used Dropbox. Do they offer incremental backup? Otherwise I would have thought they would need to be able to do random writes, unless SMR is so cheap that it is economical to keep some dead data around when a user uploads a new version of a file.)

There was a post a while back here from a Dropbox employee explaining how they manage SMR disks manually via a direct HBA, basically doing what the drive firmware would otherwise do. I can't find the post, but it was about working around the architectural deficiencies of SMR while still getting to use the extra disk space.

Magic Pocket is a block storage system, and all storage in it is append only. We'll go into more detail on the on-disk format in a blog post. But, yes, we use the SMR disks on an HBA, and we directly address the disk on an LBA (and zone) basis.

What lets us go even higher is that our system supports host-managed SMR disks, which come in 10TB and 14TB sizes.
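A toy model of what host-managed SMR pushes onto the host software: each zone has a write pointer, writes append at that pointer, and rewriting anything means resetting the whole zone. The class name and zone size below are invented for illustration; real zones are driven via ZAC/ZBC commands over the HBA:

```python
# Toy model of host-managed SMR zones: append-only writes at a per-zone
# write pointer; random overwrite requires resetting the whole zone.
# Names and sizes here are invented for illustration.

class SMRZone:
    def __init__(self, zone_lba, size_blocks):
        self.zone_lba = zone_lba    # first LBA of the zone
        self.write_ptr = zone_lba   # next writable LBA
        self.size = size_blocks

    def append(self, n_blocks):
        """Append n_blocks at the write pointer; returns the start LBA."""
        if self.write_ptr + n_blocks > self.zone_lba + self.size:
            raise IOError("zone full")
        start = self.write_ptr
        self.write_ptr += n_blocks
        return start

    def reset(self):
        """Reset the zone's write pointer; all data in it is discarded."""
        self.write_ptr = self.zone_lba

zone = SMRZone(zone_lba=0, size_blocks=65536)  # e.g. 256 MiB of 4K blocks
lba = zone.append(128)
print(lba, zone.write_ptr)   # 0 128
```

This is why an append-only store like Magic Pocket maps so naturally onto SMR: it never needs the random overwrites that zones forbid.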

I've never heard of a 14TB disk drive. Is that what you mean? Who makes these?

10 TB drives can be had off the shelf -- http://www.hgst.com/products/hard-drives/ultrastar-he10 -- and 14 TB drives are probably already available to large customers; it's not unusual for drive makers to offer drives that are coming down the pipe to certain customers before general availability.

True, it's just that they also tend to shout about upcoming drives and boast that they are previewing them to select customers. Case in point: HGST marketing is in full swing on that He10, but good luck finding any stock at a mainstream retailer.

Yep, vendors do sometimes provide stock to large customers ahead of general availability.


2.5" 15TB ... and SSD, and expected to get a lot bigger too!

But that's not an SMR drive.

What interface and protocol do your drives have/use? Is it SATA, SAS or a custom interface/protocol?

SATA SMR disks, zone management is ZAC.

Is that in a production system now? W-O-W! :O

Not to knock Dropbox's engineering, because it is sexy, but there are off-the-shelf enclosures right now that fit 90 drives in a 4U case. Given that those enclosures have full rack rails and are narrower, I don't doubt in the least that an even denser server like Dropbox's design is feasible.

These are standard-width rackmount chassis btw. They're 1.5x depth but otherwise it's standard 4U.

The dimensions given are just an approximation.

There are other concerns to keep in mind, eg. the upgradability of storage to use next gen cards that use 3d nand, etc. I'm sure they thought through a list of concerns before going this route.
