If you're interested in point-in-time snapshots, you're probably also intrigued by our ZFS platform that gives you day/week/month/quarter/year snapshots that you don't have to configure or maintain - you just do a "dumb" rsync (or whatever) to us and the snapshots just appear.
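In practice that looks like a plain rsync over SSH, e.g. (the account name and hostname below are just placeholders):

    rsync -avz --delete /home/me/documents/ user@hostname.rsync.net:documents/
    # the dated snapshots of the destination are taken server-side;
    # there is nothing to configure or rotate on the client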
If you're interested in encrypted backups you should look into the 'borg backup' tool which has become the de facto standard for remote, encrypted, changes-only-upload backups.
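A minimal borg workflow for that is roughly the following (repository location and paths are placeholders):

    # one-time: create an encrypted repository on the remote side
    borg init --encryption=repokey user@remotehost:backups/main

    # each run: only chunks that changed since the last archive are uploaded
    borg create --stats --compression lz4 \
        user@remotehost:backups/main::{hostname}-{now} /home/me/documents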
Finally, if S3 pricing is important, you should email us about our "HN Readers'" discount.
 Examples: http://www.rsync.net/resources/howto/remote_commands.html
Their unencrypted pricing page links to that encrypted order form page. We all agree there should be no http-to-https transitions like that, right?
If you'll note, that encrypted order page is on the same host as their unencrypted pages. Both rsync.net and www.rsync.net are covered by the cert. They have SSL set up already, and they just purposely redirect away to http for their static pages. That is a well-known SSL antipattern.
To be clear: I don't like transitions like that either, but it's a concern I've previously only had with sites that do e-commerce or have a login portal that isn't on a different (sub)domain. Apple and some banking sites are notable examples that used to concern me (though I doubt they are still like that).
The model of “surrogate origin server” is sometimes more helpful than “middlebox” and similar.
The reason that the vast majority of rsync.net is not covered by https is because I personally have not yet decided how I feel, philosophically, about the notion of https-only websites.
As silly as it may sound to you, I very much like the fact that you can read rsync.net HOWTO pages with netcat. Or telnet'ing to port 80. Or grab them with old versions of 'fetch'.
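Something along these lines still works fine against the plain-HTTP pages (just a sketch, using the HOWTO URL mentioned elsewhere in the thread):

    printf 'GET /resources/howto/remote_commands.html HTTP/1.0\r\nHost: www.rsync.net\r\n\r\n' \
        | nc www.rsync.net 80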
I used to do all kinds of interesting and time-saving tasks on the pre-https internet with simple, static UNIX tools. As you can imagine, almost all of those workflows are now either broken or require a big huge pile of python libraries.
So in summary, I am sorry to deprive you of your righteous indignation. You can see from our properly implemented order form that we understand https just fine, thank you.
The real answer is that https-everywhere sounds like it's probably a good idea ... but also sounds like you should maybe get off of my lawn ... but I'm not sure yet.
In any case, modern software toolchains have no trouble with HTTPS. I can interact with it from the command line pretty easily with `wget` or `curl` and similar.
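For instance (the URL is only an illustration):

    # both handle TLS, certificate validation and redirects for you
    curl -sSL https://www.rsync.net/ | head -n 20
    wget -qO- https://www.rsync.net/ | head -n 20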
When going from an HTTP page to an HTTPS page, what guarantees can you give that I'm seeing the original content, that I clicked the proper link, and that I wasn't MITM-ed?
A potential attacker can MITM and easily redirect me to any order page they want...
This is security 101.
What you don't understand is that people that don't notice being redirected to a different domain are not smart enough to be using rsync.net in the first place.
So it's not an issue.
Or, I should say, in the fifteen years that we have been providing "cloud storage" it hasn't been an issue.
 It wasn't called "cloud storage" back then - our service predates the term.
The link could even read https://order.rsync.net and YOU (the sysadmin of the service) might not even notice, because I'm pretty sure you don't check/monitor your DNS records every couple of minutes.
The reason "it did not happened yet", is not valid, because if could happen anytime in your service's lifetime. It's like an open door and no robbery happened yet, but the likelihood of it is happening is worse than if you at least close the door. It would be silly to complain "It has been open for a long time and there were no robbery." after it happened.
> "people that don't notice being redirected to a different domain are not smart enough to be using rsync.net in the first place."
This is just an assumption, and not one I would make; you could be surprised. Sometimes even web developers don't understand how x509 certs and https work.
Obvious disclaimer: the openssl binary may or may not be part of your system (although the same can be said about netcat and telnet).
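For what it's worth, the rough TLS equivalent of the netcat trick is (assuming a reasonably recent openssl that sends SNI by default):

    printf 'GET / HTTP/1.0\r\nHost: www.rsync.net\r\n\r\n' \
        | openssl s_client -quiet -connect www.rsync.net:443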
How can they be trusted with backups when they can't even run an HTTPS website?
However, despite the outages, I have to say I've been genuinely impressed at how smooth the process of signing up was, and how well the (borg) product works.
I have my own local backups, so I'd only ever need to restore from your copies in the event of a catastrophic failure, and while I hope that never happens, I do test restores now and again and things have always been great.
Their rates are incredible, and their support is excellent. I highly recommend it.
(I have no affiliation with Tarsnap other than Colin seems like a nice guy and I am a customer.)
I have full backups configured every 12 hours. Just for example, on my last backup, the total logical size was 12 GB; the compressed size of the undeduplicated blocks was 7.2 GB; the total size of actual new data, uncompressed, was 181 MB; and the final sum uploaded for this new full backup was 72 MB. Logically, Tarsnap stores 2.9 TB of my backups, but after compression and deduplication the "physical" requirement is only 16 GB.
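Those figures are essentially what tarsnap reports itself when you pass --print-stats; the invocation is something like this (archive name and paths are just examples):

    tarsnap -c -f "full-$(date +%Y%m%d-%H%M)" --print-stats /etc /home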
For this I pay about 17¢ USD/day, or $62/year. I could probably lower my storage use somewhat (the largest component of that cost, at 13.4¢/day) but it hasn't been worth my time yet.
Systems that split metadata and data streams do a lot better (restic, attic, borg, bup).
Today I'm using zfs with real snapshots. For systems with no zfs support (my wife's iMac, for instance), I have a zfs filesystem that those systems rsync to; after the rsync is done, I create a snapshot. All scripted. The snapshots can be stored on another server for an additional layer of backup, or sent incrementally to S3 if you want.
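The script boils down to something like this (pool/dataset names and the iMac path are placeholders):

    #!/bin/sh
    # pull the non-zfs machine's data into its own dataset, then snapshot it
    rsync -a --delete imac:/Users/wife/ /tank/backups/imac/ \
        && zfs snapshot tank/backups/imac@$(date +%Y-%m-%d)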
I built something similar that runs on a Raspberry Pi and creates backups for the machines in my home network. The Pi pings each machine every hour; if a machine is online and a backup is due, it starts a backup process. My Pi uses a USB battery as a UPS (uninterruptible power supply). I put all the hardware in a little medicine cabinet on the wall. It's been running stable for months now, without a single reboot. It needs about 15 minutes to back up my dev machine over WiFi. It's a little independent backup module, and I'm really happy with it. :)
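Stripped down, the logic on the Pi is not much more than an hourly cron job along these lines (the hostnames and the backup_is_due check are placeholders, not literal code):

    #!/bin/sh
    # runs hourly from cron on the Pi
    for host in devbox laptop; do
        ping -c1 -W2 "$host" >/dev/null 2>&1 || continue   # offline: try again next hour
        backup_is_due "$host" || continue                  # placeholder scheduling check
        rsync -a "$host:/home/" "/mnt/backup/$host/"
    done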
I use rsnapshot to an encrypted external USB disk.
The design goals were to make something that didn't store data in a proprietary format (you can analyse the backups using straight SQL commands, and access the data using lzop), that could back up systems without installing a client agent on them, that supported compression, and that avoided the filesystem issues you run into with a large number of hard links (such as https://news.ycombinator.com/item?id=8305283). So far I've been using it to back up a few dozen RHEL 4 - RHEL 7 systems over the last couple of years without issues.
My data set is about 8 TB (my wife is a professional photographer), and it would be too expensive to keep in S3, so I have an "offsite backup system" that is hosted at my in-laws. It's just a RaspberryPi + an 8 TB drive encrypted with LUKS (in case it gets stolen or tampered with).
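Preparing a drive like that takes only a few commands, roughly (the device name is an example, and luksFormat wipes it):

    cryptsetup luksFormat /dev/sda1        # one-time; destroys whatever was on the partition
    cryptsetup open /dev/sda1 backup       # asks for the passphrase, creates /dev/mapper/backup
    mkfs.ext4 /dev/mapper/backup           # one-time
    mount /dev/mapper/backup /mnt/backup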
Every night, the RPi syncs the data from my house with rsnapshot (a Time Machine-like tool that uses hard links with rsync over ssh).
Because of how rsnapshot works, I can always go there and look for a file in one of the directories: it looks just like the original hierarchy, and I can just browse the filesystem as I would on the original system.
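The relevant part of rsnapshot.conf is tiny; roughly (paths and retention counts are examples, fields must be tab-separated, and older rsnapshot versions spell "retain" as "interval"):

    snapshot_root   /mnt/backup/snapshots/
    retain          daily   7
    retain          weekly  4
    backup          root@house-server:/home/        house/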
I also don't have to "trust a 3rd party" that the data is really there. I remember some posts on HN about people who used some backup services successfully... until restore time. I'm always cautious about the magical "set it and forget it" service that is a black box.
The first sync has to be done "on the premises" because of the sheer amount of data to transfer, but then the few daily Gigs of data can easily go over the net while everyone is sleeping.
I have a couple of drives in RAID 1 (mirror), yet I still don't rely on it exclusively for really important data.
My single offsite drive is indeed a ticking time bomb, but it's easily replaceable with no loss of data when it dies.
The problematic case is when all the drives hosting the 3 copies happen to die at the same time. Perhaps I don't have good protection against that case, but I think I've reached diminishing returns in terms of backups.
Not the ideal 3-2-1 rule of thumb, but also not a ticking time bomb.
Even without getting into the difficulty of dealing with Glacier files in an incremental backup scheme, doing the initial sync, and checking the backup data regularly, we're talking about something that will probably cost $250 for a couple of years of backup (assuming the drive only lasts 2 years) vs $720 (Glacier) or $2,400 (S3) if I use your numbers.
It seems like a significant difference to me, especially because the assumptions on how often my drive will fail are quite pessimistic.
Anyway, I would recommend against using it - errors like that should be impossible when you're not doing anything exotic.
> Although you should never have to look at a duplicity archive manually, if the need should arise they can be produced and processed using GnuPG, rdiff, and tar.
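In other words, if duplicity itself is ever unavailable, a volume can be pried open by hand, e.g. (the filename is whatever your backup actually produced):

    gpg --decrypt duplicity-full.20240101T000000Z.vol1.difftar.gpg > vol1.difftar
    tar -tvf vol1.difftar    # list the file data / deltas stored in that volume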
I haven't set it up yet, but it's going to be my holiday project this Christmas.
Also, don't forget the zfs/btrfs features that might be relevant here.
The ability to pick up where an interrupted backup left off seems pretty amazing:
> You can back up directly to a remote bup server, without needing tons of temporary disk space on the computer being backed up. And if your backup is interrupted halfway through, the next run will pick up where you left off.
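A minimal remote bup run looks roughly like this (server name, repo path, and the directory being saved are placeholders):

    bup init -r backupserver:repo.bup        # one-time: set up the local and remote repositories
    bup index /home/me/projects              # scan for changed files
    bup save -r backupserver:repo.bup -n projects /home/me/projects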
Arq supports painless backups to multiple cloud storage providers, with budget enforcement, etc.
While I love ZFS, let's not forget that btrfs has its strengths (a lot more flexibility, mainline kernel support), and, provided your use case is single disk, RAID 1, or RAID 10, it has been working super reliably for some years now.
Zstandard at a high compression level gives a good tradeoff of decompression speed vs underlying physical throughput, be it network or spindle.
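For example (compression level and filenames are only an illustration):

    tar -cf - /data | zstd -19 -T0 -o data.tar.zst   # slow to compress, still fast to decompress
    zstd -dc data.tar.zst | tar -xf -                # restore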
Even so, yes, the sequential nature of tar is not great for this reason.
Squashfs files, which can be used on Linux and Mac (using FUSE for OS X), are a very good alternative for archives, though without some of the features described in the linked article.
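Creating one and browsing it on Linux is straightforward (zstd compression needs a fairly recent squashfs-tools; paths are examples):

    mksquashfs /data archive.squashfs -comp zstd
    mount -t squashfs -o loop,ro archive.squashfs /mnt/archive   # browse/seek it like a normal fs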
If it already resides locally on seekable media, there's little drawback.
Think of it like a linked list.
*: As opposed to a few large files.