Show HN: Cloud Storage with ZFS send and receive over SSH (rsync.net)
59 points by rsync on July 28, 2015 | 32 comments

I've used ZFS as my primary data workstation FS (as opposed to NAS), first under OS X and then FreeBSD, for over 4 years now, and amongst the many benefits backup/replication definitely rates highly for me. But while I'm interested to see native send/receive in the cloud, particularly once resumable support makes it into mainline OpenZFS, I'm actually not sure how useful it'd be in general for plain backup purposes. One of the most valuable choices Sun made in send/receive was that it uses normal stdin/stdout, and that in turn can give ZFS immense flexibility in terms of cloud storage targets. While I do local online replication to a NAS, for remote backup I just dump to files. Simplifying here without all the exact options:

  zfs send tank@snapshot | lz4 | openssl enc | par2 > file
OpenSSL is of course using AES-256 and a key, and par2 is adjustable based on the characteristics of the target storage. This is part of a simple script to produce and cycle incrementals. The result is an original and then a set of deltas that are client-encrypted and have some redundancy even in the face of errors in the foreign filesystem or transmission process, and thus can be used practically anywhere with minimal thought towards remote security, no need to trust any sort of closed client or service, etc. It works nicely with Amazon's offerings (including Glacier), particularly since they allow physically sending in a hard drive, which is really handy for freelancers or small businesses with significant datasets but highly mediocre netlinks (ADSL 5/1 or some equivalent is still depressingly common in America at least). Since par2 can also split output to arbitrary sizes and numbers, it'd be possible to use this workflow for anything from optical media to storage lockers as well (dataset size permitting).

Really, with stdin/stdout the sky is the limit, and there are minimal ties to any specific service since only the most generic raw cloud storage features are being used.
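For readers who want to see the shape of such a script, here is a minimal sketch of one dump step. It is not the commenter's actual script: the function name dump_snapshot, the key file, the snapshot names, the -pbkdf2 flag (OpenSSL 1.1.1 or later) and the 10% par2 redundancy are all assumptions. par2 works on files rather than stdin, so in this sketch it runs as a separate step once the encrypted file has been written.

```shell
# Sketch only: dump one snapshot (full or incremental) to an encrypted file.
# All names and option values here are illustrative assumptions.
dump_snapshot() {
    prev=$1      # previous snapshot, or "" for a full dump
    cur=$2       # snapshot to send, e.g. tank@2015-07-28
    keyfile=$3   # file holding the passphrase
    out=$4       # output file for the encrypted stream

    if [ -n "$prev" ]; then
        zfs send -i "$prev" "$cur"    # delta from prev to cur
    else
        zfs send "$cur"               # full stream
    fi |
        lz4 -c |
        openssl enc -aes-256-cbc -pbkdf2 -pass "file:$keyfile" -out "$out" || return 1

    # Add recovery data (10% redundancy) next to the encrypted dump.
    par2 create -r10 "$out"
}

# Example (not run here):
#   dump_snapshot "" tank@monday /root/backup.key /backup/monday.full.enc
#   dump_snapshot tank@monday tank@tuesday /root/backup.key /backup/tuesday.inc.enc
```

Note that the pipeline's exit status only reflects the last command, so a real script would also want to check that zfs send itself succeeded.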

> openssl enc

Careful. Openssl enc is basically a poor toy example, it's not a robust authenticated encryption tool for real world use.

It explicitly doesn't support modern AEAD ciphers or even a simple MAC (so the ciphertext can be silently tampered with - note anyone who can do that can also replace your par2 files), and it doesn't use a robust key derivation algorithm for password use (single-round of a hash, basically, default MD5 - about as weak as it comes).

hpenc looks like a nice high performance modern alternative: https://github.com/vstakhov/hpenc
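To make the missing-MAC point concrete, here is a rough sketch of what a separate integrity tag over the ciphertext could look like using openssl dgst with an HMAC. This is an illustration of the idea only, not a recommendation over a proper AEAD tool such as the one linked above; the helper names mac_tag and mac_verify are made up, and passing the key on the command line exposes it to other local users.

```shell
# Hypothetical helper: print an HMAC-SHA256 tag for the data on stdin.
# $1 is a MAC key, which should be independent of the encryption key.
mac_tag() {
    openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

# Hypothetical helper: succeed only if file $2 still matches tag $3 under key $1.
mac_verify() {
    [ -n "$3" ] && [ "$(mac_tag "$1" < "$2")" = "$3" ]
}

# Example (not run here):
#   tag=$(mac_tag "$MAC_KEY" < dump.enc)       # store the tag with the dump
#   mac_verify "$MAC_KEY" dump.enc "$tag" || echo "ciphertext was modified"
```

Without the key an attacker cannot recompute a matching tag, which is what plain openssl enc output (or a par2 set) does not give you.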

"Really, with stdin/stdout the sky is the limit, and there are minimal ties to any specific service since only the most generic raw cloud storage features are being used."

That's the key and that's why it's here. It's a very, very stretched analogy, but from the very beginning, we at rsync.net have tried to run rsync.net itself as a unix primitive.

After already supporting actions like this, for years:

  ssh user@rsync.net sha256 some/file

  ssh user@rsync.net s3cmd get s3://rsync/mscdex.exe

  mysqldump -u mysql db | ssh jo@rsync.net "dd of=db_dump"
It made perfect sense to support workflows just like the one you describe above. And yes, you can indeed do just what you've written (with lz4 and openssl encryption) at rsync.net (with or without breakout to files).
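In the same spirit as the one-liners above, a native send/receive over SSH might be wrapped roughly like this. It is a sketch under assumptions: the function name replicate, the snapshot names, the account and the remote dataset are placeholders, and the exact receive options an account needs may differ.

```shell
# Sketch only: stream a snapshot to a remote zfs receive over ssh.
# tank/data@..., user@rsync.net and data1/backup are placeholder names.
replicate() {
    prev=$1      # previous snapshot already on the remote, or "" for a full send
    cur=$2       # snapshot to send
    remote=$3    # ssh destination
    target=$4    # dataset on the remote side

    if [ -n "$prev" ]; then
        zfs send -i "$prev" "$cur"
    else
        zfs send "$cur"
    fi | ssh "$remote" zfs receive "$target"
}

# Example (not run here):
#   replicate "" tank/data@monday user@rsync.net data1/backup
#   replicate tank/data@monday tank/data@tuesday user@rsync.net data1/backup
```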

Are you doing a full dump each time and syncing that? Or are you dumping and syncing incremental streams (i.e. zfs send -i [...])? If the latter, how do you deal with expiring old dumps, without having to re-sync a full dump periodically?

That's what I thought as well. It's impossible to expire old backups that way since all backup sets depend on each other.
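The dependency is easy to see if you list what a restore needs. A small sketch, assuming dumps are stored as one full stream followed by incrementals that each build on the previous one (the function and file names are made up):

```shell
# Print every dump needed to restore TARGET, given the dumps in creation
# order (the first one being the full stream). Returns 1 if TARGET is absent.
restore_chain() {
    target=$1
    shift
    for dump in "$@"; do
        printf '%s\n' "$dump"
        [ "$dump" = "$target" ] && return 0
    done
    return 1
}

restore_chain inc2 full inc1 inc2 inc3
```

Restoring inc2 needs full, inc1 and inc2, so deleting any earlier file breaks everything after it. The usual way out is to start a fresh full dump periodically and expire an old chain as a whole.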

Something I don't see in the docs, is it possible to mount rsync.net storage with something like sshfs and use it like a filesystem? If possible, is there anything stopping it from being a good idea?

I'm looking for an EBS style mountable storage for digitalocean instances, since they don't scale up disk space to price very well.

"is it possible to mount rsync.net storage with something like sshfs and use it like a filesystem?"

Yes. Thousands of people do this, and it is in fact our recommended "browsing" recipe for OSX.[1]

Feel free to email us to discuss this further - we would be very happy to extend the long-standing "HN discount" to you.

[1] http://www.rsync.net/resources/howto/mac_sshfs.html

Yep, it's entirely possible to do that. It's how I actually back up to them: rather than use scp or rsync, I mount via sshfs and just cp the data to the mount.

Latency is really the only down side I can think of... if this is for backup it shouldn't be an issue. If this is for interactive data, I'd suggest looking deeper into the rsync.net locations and DO's, to see if they are close... or even if they have a direct uplink at any locations.
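The mount-then-cp approach described above could look something like the following. This is only a sketch: the function name, the account and the mount point are placeholders, and the unmount command depends on the platform.

```shell
# Sketch only: mount a remote directory with sshfs, copy into it, unmount.
backup_via_sshfs() {
    src=$1       # local file or directory to copy
    remote=$2    # e.g. user@rsync.net: (trailing colon means the remote home)
    mnt=$3       # local mount point

    mkdir -p "$mnt" || return 1
    sshfs "$remote" "$mnt" || return 1
    cp -R "$src" "$mnt"/
    status=$?
    # Unmount: fusermount -u on Linux, plain umount on macOS and the BSDs.
    fusermount -u "$mnt" 2>/dev/null || umount "$mnt"
    return "$status"
}

# Example (not run here):
#   backup_via_sshfs ~/Documents user@rsync.net: ~/mnt/rsync
```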

One possibility if you want mountable storage, is to use our ObjectiveFS[1] filesystem on your digitalocean instances. It is a bit different from EBS in that it is a shared filesystem instead of just a cloud block device, so it depends on your use case if it might work for you.

[1] https://objectivefs.com

Shameless plug:

ExpanDrive @ http://www.expandrive.com enables a very fast/robust version of this use case.

Don't want to post off topic but you should really update UserVoice. I bought Expandrive V4 but haven't (and likely won't) upgrade to V5 since your communication has been so limited.

The specific issues I had:

- You promised transparent client-side encryption in V5 [0]. V5 is now here and encryption is nowhere to be found.

- You advertise a Linux beta yet when contacted about participation you haven't responded [1]

[0]: https://expandrive.uservoice.com/forums/205560-expandrive-fe...

[1]: https://expandrive.uservoice.com/forums/205560-expandrive-fe...

Just in case people see your plug and decide to purchase based on what they see in UserVoice.

I've been a happy rsync.net customer for some time now. I don't use it much but it's there when I need it. When maintenance or issues arise, notifications come with good warnings and explanations – the cost is reasonable and the service good.

Thanks! We're very glad to be serving you.

As a reminder to folks reading this, email us about the long-standing "HN readers' discount".

For all the great ingestion capabilities of rsync.net, their pricing still isn't nearly competitive with any of the cloud storage services.

Could someone from rsync.net explain why? Am I looking at rsync.net the wrong way? Is it meant to serve a different use case?

The front page of the website says "Cloud Storage for Offsite Backups" but the pricing shows that it costs 8-20c/GB/month depending on usage. Meanwhile GCS Nearline and Amazon Glacier (also offsite backup products) are at 1c/GB and even their regular storage is at ~3c.

Sell me. What does rsync.net offer me that justifies an 8x-20x price bump?

I don't have anything to do with rsync.net, but having looked at both AWS and them:

How cheap Amazon is heavily depends on your retrieval needs. Glacier is basically for gigantic data vaults where you will never need to retrieve more than a small fraction of it. It's very cheap for that, but has retrieval fees if you need to retrieve >5% of your data at any given time, and they can be very high if you need to retrieve a significant amount of the data quickly (also, there's a 4-hour minimum retrieval latency, so you wouldn't want any possibly-needed-for-operations backups there). S3 allows your storage to be "online" full-time, but adds a $0.09/GB bandwidth charge for outgoing data, in addition to the $0.03/GB storage fee, so overall price depends heavily on what you're pulling out of it.

The $0.06/GB promotional offer in this rsync.net post actually seems surprisingly cheap, for always-online storage with no additional bandwidth fees. Even their normal prices seem pretty fair to me, for something that comes with full phone/email support, provides a regular POSIX filesystem with SSH access instead of a weird custom API, etc. If I were warehousing petabytes of never-to-be-needed data, the price difference over Glacier would be hard to ignore. But for a lot of needs it seems quite competitive with S3.
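To put rough numbers on that, here is a back-of-the-envelope calculation using only the prices quoted in this comment ($0.03/GB storage plus $0.09/GB outgoing for S3, $0.06/GB flat for the promotional offer). The workload of 1000 GB stored and 200 GB retrieved per month is invented for illustration, and request fees are ignored.

```shell
# monthly_cost STORED_GB EGRESS_GB STORAGE_PRICE EGRESS_PRICE
# Prints the monthly bill in dollars.
monthly_cost() {
    awk -v s="$1" -v e="$2" -v ps="$3" -v pe="$4" \
        'BEGIN { printf "%.2f\n", s * ps + e * pe }'
}

monthly_cost 1000 200 0.03 0.09   # S3 at the quoted prices: 48.00
monthly_cost 1000 200 0.06 0      # promotional rate, no egress fee: 60.00
```

With these prices the two meet when about a third of the stored data is pulled out each month (0.03 / 0.09); retrieve more than that and the flat rate is cheaper.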

Thanks for asking - I like having the chance to sketch out our value.

First, glacier and nearline are not online storage. They are not random access datastores that you can interact with in real time with arbitrary applications. There's not a comparison to be made there.

I think the proper comparison is with Amazon S3 and I hope you'll allow me to approximate their pricing, on average, to 4-5 cents per GB, per month. The headline price for storage, of course, is ~3 cents, but you are charged for all outbound data as well as certain IO operations. I think 4-5 cents is a decent approximation.

So ...

If you're just walking in off the street and getting a small quantity of rsync.net storage, yes, it is 4-5x the price of S3 and you are paying that premium for UNIX-native storage and a phone number / email address that connects you to a real engineer (sometimes me, actually). Given that, at these smaller quantities, you're paying $10 or $20 per month, I think that's a very, very high value for the money.

If you're using larger amounts of storage, the pricing gets down to ~6 cents per GB (assuming annual discount) or even 3 cents per GB at petabyte levels. So at 10+ TB quantities we are just barely more expensive than S3 and at petabyte quantities we are decidedly cheaper.[1]

As always, if you're a small end user, email us about the HN discount which is quite substantial.

[1] Actually, it's even more competitive relative to S3 since, by default, all accounts have 7 daily ZFS snapshots enabled and 1+ TB accounts have 7 daily + 4 weekly. Those typically add between 20-40% space usage on our end, but you don't pay for that ... on S3 you would indeed pay for that retention and you'd have to roll your own snapshot/retention script/logic. At rsync.net, Apple-style "time machine" snapshots are "just there".
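As a rough check of the footnote's arithmetic: if keeping the same snapshots on S3 meant paying for 20-40% extra space, the 4-5 cent approximation above would scale as follows. The function name is made up and the figures are just the ones quoted in this comment.

```shell
# effective_price PRICE_PER_GB OVERHEAD_PERCENT
# Price per GB of live data once snapshot space is billed as well.
effective_price() {
    awk -v p="$1" -v o="$2" 'BEGIN { printf "%.3f\n", p * (1 + o / 100) }'
}

effective_price 0.04 20   # low end:  0.048
effective_price 0.05 40   # high end: 0.070
```

That puts the comparable S3 figure at roughly 4.8 to 7 cents per GB, against the ~6 cents quoted for larger rsync.net accounts.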

I've been considering starting a ZFS storage service for a long time, but two technical problems have held me back so far:

  * ZFS send/receive does not support pause/resume

  * SSH overhead may be overkill, especially if the ZFS data is already encrypted

Do you have any plans to solve these issues? Have you found them to be a problem in production? Do you support encrypted ZFS?

OpenZFS resumable send/receive is coming soon http://blog.delphix.com/matt/2015/03/25/resumable-zfs-sendre...
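For reference, in OpenZFS releases that include this feature, a resumable transfer is driven by zfs receive -s, the receive_resume_token property and zfs send -t. A minimal sketch of resuming an interrupted transfer over SSH, with a made-up function name and placeholder host and dataset names:

```shell
# Sketch only: resume an interrupted transfer whose receive side was started
# with "zfs receive -s". Host and dataset names are placeholders.
resume_send() {
    remote=$1    # ssh destination
    target=$2    # partially received dataset on the remote side

    # The receiver records where it stopped in the receive_resume_token property.
    token=$(ssh "$remote" zfs get -H -o value receive_resume_token "$target")
    if [ -z "$token" ] || [ "$token" = "-" ]; then
        echo "nothing to resume for $target" >&2
        return 1
    fi

    # Regenerate the stream from the token and keep the receive resumable.
    zfs send -t "$token" | ssh "$remote" zfs receive -s "$target"
}

# Example (not run here):
#   resume_send user@rsync.net data1/backup
```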

However, the tank/home/darren@snap1 stream is not encrypted during the send process.


That's for Solaris only though. I wouldn't expect any other implementations to have on-the-fly encryption of that kind. So OpenSSH is unlikely to be adding overhead on that front.

Nobody outside of Oracle supports encrypted ZFS, and that's probably not going to change any time soon (I'd love to be wrong, though).

Their ZFS encryption is vulnerable to watermarking attacks, so most people don't want to see it publicly used anyway.

While a bit cumbersome and arguably suboptimal, at least on FreeBSD you can use geli+zfs to get full disk encryption.

Sure, but that doesn't help in the context of zfs send, which serializes at the filesystem level for transport. The stream is still going to be in plain text. In fact, Oracle's ZFS still sends the stream decrypted and decompressed, even if those properties are enabled.

When I scroll down the page on my iPad, it appears to instantly reload and snaps back to the top of the page again! What on earth is the page doing?

Sorry... It's a new website design and we are working out the kinks. I don't like fancy hip scrolling either, but somehow it seemed very nice as we prototyped.

Custom scrolling seems to be the new Flash.

I don't mind the parallax effect nearly as much as their homepage... it's just a weird UX.

Same on android here.

I may use this if it supports built-in encryption. Are encryption keys saved on the client or are they saved on your servers?

We give you an empty filesystem to do what you want with.

So you can indeed encrypt your data "at rest" and you would indeed control your own keys. We recommend the 'duplicity' tool which works very well for encrypted offsite backups at rsync.net:



Do you plan on giving an encrypted filesystem?
