
The only thing I need (and which is sorely missing from Restic) is for the metadata to be kept separate from the actual data. That way I can store the data in AWS S3 Deep Glacier at next to no cost per year and still do incremental backups. Restic's current architecture, for instance, requires all data to be quickly and cheaply accessible, which makes this impossible.

I have terabytes of data that I'd be happy to dump encrypted and compressed in Deep Glacier and happy to pay $500 to retrieve if I were to mess up my hard drives, but otherwise don't want to pay for the costs of normal S3.

Does Kopia separate metadata from the actual encrypted/compressed blobs?

Hi, Kopia author here.

Yes, Kopia does segregate metadata from data. Directory listings, snapshot manifests and a few other minor data pieces are stored in pack files starting with "q", while the bulk of data is stored in pack files starting with "p". Indexes into both kinds of pack files are stored in files whose names start with "n".
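To illustrate how that naming scheme could be used to split a repository between hot and cold storage, here is a minimal sketch. It relies only on the prefixes described above; the function names, role labels and blob names are hypothetical and are not part of Kopia's API.

```python
# Hypothetical helper: the prefix-to-role mapping follows the description above,
# but the names and labels here are illustrative, not Kopia's API.
PREFIX_ROLES = {
    "p": "data",      # bulk file contents
    "q": "metadata",  # directory listings, snapshot manifests, etc.
    "n": "index",     # indexes into both kinds of pack files
}


def classify_blob(name: str) -> str:
    """Return the role of a repository blob based on its first character."""
    return PREFIX_ROLES.get(name[:1], "other")


def blobs_to_cache(names):
    """Pick the blobs worth keeping on fast storage: everything except bulk data."""
    return [n for n in names if classify_blob(n) in ("metadata", "index")]


# Made-up blob names, for illustration only.
example = ["p1a2b3", "q4c5d6", "n7e8f9", "other-file"]
cached = blobs_to_cache(example)
```

With a split like this, only the "p" blobs would need to live in a cold tier, while the "q" and "n" blobs stay somewhere cheap to read.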

This has some interesting properties. For example we can easily cache all metadata files locally, which provides fast directory listing, manifest listing and very fast verification without having to download actual data.

Thanks! I'll have to investigate Kopia properly.

Would you mind sharing how much you pay for S3? I assume you’ve considered options like Backblaze B2 and Wasabi?

Google Workspace is still around $10/month for unlimited. 50TB here and counting, uploaded a few more TBs just this week. Incredible value proposition! Only some limits on traffic, 750GB/day ingress. Works very well with rclone.

Legally that price is supposed to be for 1TB of storage, but the quota is still not being technically enforced, at least for me with a grandfathered gsuite account. Not sure about new accounts; it seems something may have changed a couple of weeks ago with the new ToS.

Wow... this is insane, nothing can compete with that, I'm pretty sure! 750GB/day ingress is huge.

Surely they've closed this "loophole" on new accounts.

It's been an open secret for many years that they don't enforce quotas. It's not profitable, but Google has really deep pockets and can afford to not care. Not sure if they really closed it this time - like I said, I'm still able to upload multiple extra terabytes onto my supposedly 1 TB account even now. So it doesn't seem to me like they have started enforcing quotas.

Look/ask around in r/DataHoarder for recent experiences of other people, they also discuss other storage services in general a lot.

Oh yeah I know, I just didn't think that people were doing that much with it! I definitely lurk r/datahoarder, absolutely love seeing people that are so excited about storage (and haven't been found by mainstream reddit for the most part yet). r/zfs is also pretty good for nerdy drive stuff from time to time.

Suppose I have ~4 TB of data. If I dump it into Deep Glacier, it'll be ~$50/year (free ingress); if I ever need to retrieve the data, it's like $370.

Normal S3 would be ~$1.1k/yr, or around half of that for the infrequent access tier, both of which are way too expensive.
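Those yearly figures can be roughly reproduced with a few lines of arithmetic. The per-TB prices below are assumptions chosen to be consistent with the numbers quoted in this thread (1 TB taken as 1000 GB); they are not authoritative, so check current AWS pricing before relying on them.

```python
# Assumed prices in USD per TB per month; illustrative, not authoritative.
PRICE_PER_TB_MONTH = {
    "deep_archive": 1.00,
    "standard": 23.00,
    "infrequent_access": 12.50,
}


def annual_storage_cost(tb: float, tier: str) -> float:
    """Storage-only cost per year, ignoring request, retrieval and egress fees."""
    return tb * PRICE_PER_TB_MONTH[tier] * 12


# Yearly storage cost for 4 TB in each tier.
costs = {tier: annual_storage_cost(4, tier) for tier in PRICE_PER_TB_MONTH}
```

Under these assumptions 4 TB comes to about $48/year in Deep Archive, about $1.1k/year in standard S3 and about $600/year in the infrequent access tier.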

Thanks for sharing! This is why I asked... Wasabi prices 4TB of data, with 100% of it downloaded every month @ $287/year according to their price calculator.

Backblaze B2's calculator is a little more sophisticated, and putting in the numbers for an absolutely pathological use case where you start with 4TB, then download, delete and upload that same amount every month puts you at $720/year. A much less pathological use case (I think) that assumes you upload, delete and download 1TB/month comes out at around $360/year.

Hetzner storage boxes offer 10TB for ~$48/month, which is $576 a year -- free ingress/egress, no hidden fees for operations or whatever else, but you do have to set up a node with minio (or use FTP, etc).

Amortized over the happy time (time where you don't need to rely on your backups) this does make sense, but I wonder what the percentages look like on that kind of metric. To be fair I haven't had to restore from backup for years so this probably makes a lot of sense. I guess there's no need to test your backups/restore either if you're using a tool like borg/restic/etc and have tested it with local hardware.
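One way to put a number on that amortization is a break-even restore rate, sketched below. It uses only figures quoted in this thread (~$50/year storage and ~$370 per full restore for Deep Glacier, $287/year for Wasabi); the function names are made up for illustration, and the model ignores everything except storage and full restores.

```python
def expected_annual_cost(storage_per_year: float, restore_cost: float,
                         restores_per_year: float) -> float:
    """Yearly storage plus the expected yearly spend on full restores."""
    return storage_per_year + restore_cost * restores_per_year


def break_even_restores(cold_storage: float, restore_cost: float,
                        flat_price: float) -> float:
    """Restores per year at which cold storage costs as much as a flat-price service."""
    return (flat_price - cold_storage) / restore_cost


# Figures quoted in the thread, in USD.
rate = break_even_restores(cold_storage=50, restore_cost=370, flat_price=287)
```

With these inputs the break-even is a bit over 0.6 full restores per year, so Deep Glacier stays cheaper unless you expect to restore everything more often than roughly once every year and a half.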

Also, what happens if you have to retrieve data twice from Glacier? You've got access to it for 24 hours so I assume you're planning on just keeping the data on some non-glacier storage medium for a while after the initial post-disaster pull?

This wouldn't be the primary backup, but Deep Glacier is just such a good deal that I'd be happy to pay the $50 per year for a call option on my data. It'd make me sleep better at night!

Part of my calculus is that I have quite strong confidence in AWS in terms of business continuity and reliability/availability. If I dump my files on AWS, I have high confidence in the files (and AWS) being around in 10 years and retrievable for roughly the same price (or at least no more).

Hetzner would have much lower durability. I'm a bit suss on Backblaze, though I do trust them to be more durable than my self-managed disks (and uncorrelated to my failures). I don't know much about Wasabi; but it's not a good sign for me that their landing page touts their latest funding round at the top: seems young and you never know if the price is subsidized with VC money (and won't be in n years) or similar.

> Also, what happens if you have to retrieve data twice from Glacier?

The killer is the egress. I'd just buy a new set of disks and download it straight there.
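A rough breakdown of a full 4 TB restore shows how much of the bill is egress. The per-GB rates below are assumptions (internet data transfer out and a bulk-tier retrieval fee) that happen to add up to the ~$370 quoted earlier in the thread; they ignore request fees and any free allowance, so treat this as a sketch rather than a quote.

```python
# Assumed rates in USD per GB; not stated in the thread, check current AWS pricing.
EGRESS_PER_GB = 0.09             # data transfer out to the internet
BULK_RETRIEVAL_PER_GB = 0.0025   # Deep Archive bulk retrieval fee


def restore_cost(tb: float) -> dict:
    """Split the cost of restoring `tb` terabytes (1 TB = 1000 GB) into its parts."""
    gb = tb * 1000
    egress = gb * EGRESS_PER_GB
    retrieval = gb * BULK_RETRIEVAL_PER_GB
    return {"egress": egress, "retrieval": retrieval, "total": egress + retrieval}


breakdown = restore_cost(4)
egress_share = breakdown["egress"] / breakdown["total"]
```

Under these assumptions egress is well over 90% of the restore bill, and the retrieval fee itself is only around $10.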

I suppose you can't check for backup data integrity inside Glacier.

S3 Glacier Deep Archive is $1/TB/month. Super cheap storage costs, but retrieval costs are insane.

My question was more about just how many TBs and how much ingress/egress was making AWS S3 cost prohibitive -- Wasabi's sticker price is $5.99/TB/month (so 6x glacier but ~0.2x regular S3), and I know that Hetzner will give you a 1TB storage box for 9.40EUR (but the kicker there is that 10TB of traffic is included, which is amazing), and there are no API/operation fees when you run your own Minio (or just use FTP/all the other built-in access methods).

Network is one thing but what am I missing here? Maybe I just think $10/month is reasonable for 1TB (because I don't have enough TBs? or use remote storage enough?), and that's different from most people who are interested in this.
