Desktop Backup: Traditional, and Torrent-Like (shmichael.com)
8 points by shmichael 2827 days ago | 12 comments

No mention of tarsnap? http://www.tarsnap.com/

Is there any special feature of tarsnap that I'm overlooking, or is it just another kid down the block?

Tarsnap is unique in a few ways, but I don't know if you consider these important enough to be worth mentioning. For example:

* Tarsnap was designed to be secure against even the most skilled attackers -- and was written by someone (myself) with non-trivial expertise in cryptography and computer security.

* Because Tarsnap is built around tar(1), it is heavily scriptable; for experienced users this makes it far more flexible than any other tools.

* Tarsnap is AFAIK the only backup system which works as a metered service -- pricing per byte of bandwidth and per byte-month of storage used, starting at a (very small) fraction of a cent per month. Where other services have fixed monthly fees, Tarsnap just looks at your usage and charges you accordingly.

* I'm not sure if Tarsnap's snapshotting model is unique, but it's certainly unusual; and once Tarsnap users follow my advice of "forget everything you know about incremental backups", they all tell me that it's far more intuitive than other approaches.
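To make the metered-pricing point concrete, here is a minimal sketch of per-byte billing. The function name and the two rates are hypothetical placeholders chosen for illustration; they are not Tarsnap's actual prices.

```python
GB = 10**9  # decimal gigabyte, assumed for this example


def monthly_cost(stored_bytes, transferred_bytes,
                 storage_rate_per_gb_month, bandwidth_rate_per_gb):
    """Linear (metered) bill: pay only for what is stored and transferred."""
    storage = stored_bytes / GB * storage_rate_per_gb_month
    bandwidth = transferred_bytes / GB * bandwidth_rate_per_gb
    return storage + bandwidth


# Hypothetical rates of 0.25 per GB-month and 0.25 per GB transferred.
bill = monthly_cost(6 * GB, GB // 2, 0.25, 0.25)
print(bill)
```

With no fixed plan, a user storing half as much pays half the storage charge.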

Thanks for the clarification. I've added tarsnap to the post.

Thanks! Just one small correction: Tarsnap isn't Linux-only. It runs quite happily on BSD, Linux, OS X, Solaris, Cygwin... basically anything UNIXy.


From the Design page (http://www.tarsnap.com/design.html):

The design of tarsnap was guided by the following four principles:

Security: Backups should be secure against attackers ranging from "script kiddies" up to major world governments, even if they can compromise the systems on which the backups are being stored. Backups are supposed to be a tool for mitigating damage — not a potential vulnerability to worry about!

Flexibility: Backups should be flexible and convenient. When you decide you want to create an archive, you should be able to store in it whatever files you want; if you decide that you want to delete an archive, you should be able to do it whenever you want, without impacting other archives; and there should be no arbitrary limits on how many archives you have stored, how often you can create new archives, or how long you can keep them for.

Efficiency: Backups should be efficient, using a minimal amount of storage and bandwidth. If you archive the same file twice, it should still only be uploaded and stored once; likewise, if you move, rename, copy, or make small changes to a file (e.g., adding a small amount of new data to the end of a log file or mail spool) you should never need to re-upload the entire file.

Utility: Backups should be provided as a utility, with linear (i.e., per-GB) pricing. Forcing people to figure out ahead of time how much data they want to back up so that they can sign up for the right "plan" is dumb, and having some customers subsidize other customers is inherently unfair.
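The Efficiency principle can be illustrated with a content-addressed chunk store. This is only a sketch: the `ChunkStore` class, the tiny fixed `CHUNK_SIZE`, and the use of SHA-256 are assumptions made for the example, not Tarsnap's actual algorithm. Fixed-size chunks handle duplicates, renames, copies, and appends; they do not handle insertions in the middle of a file.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small so the example is easy to follow


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}         # digest -> chunk bytes
        self.uploaded_bytes = 0  # bytes that actually had to be sent

    def put(self, data):
        """Store data and return the list of digests needed to rebuild it."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only new chunks are uploaded
                self.chunks[digest] = chunk
                self.uploaded_bytes += len(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reassemble the original data from its chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)


store = ChunkStore()
log = b"aaaabbbbcccc"
first = store.put(log)             # uploads all 12 bytes
second = store.put(log)            # same file again: nothing new to upload
third = store.put(log + b"dddd")   # appended log: only the new 4 bytes
```

Each archive is just a list of digests, so deleting one archive does not affect chunks that other archives still reference.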

This seems like more-or-less the standard feature set of other products. See, for example, products mentioned in the post: http://www.wuala.com/en/learn/features https://spideroak.com/engineering_matters#storage_savings http://jungledisk.com/personal/desktop/features/

tarsnap offers a unique mix of these features, but so does any other product.

So, what's wrong with a tape backup drive and a box (for the tapes)? My dataset's 6 GB. I currently use ZFS incremental and full snapshots to generate a single file per backup to save.

There are two issues I haven't seen addressed:

(1) No guarantee of privacy: all my data's on someone else's box. I haven't seen any of them go to court to defend a person's data yet. And this isn't a phone log or URL list, it's everything they have. Backup privacy is not an area where you screw around.

(2) Upload bandwidth. Most network links available anywhere I've lived are asymmetric, with a massive bias toward the downlink side. Uplink speeds are still measured in hundreds of kilobits per second. $160/month bought me 1.5 Mbps down and 768 kbps up, with a static IP.

I'd rather run my dataset through gpg and write it out to tape.
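For a sense of scale on the upload-bandwidth concern, here is a rough back-of-the-envelope calculation. It assumes the dataset is 6 decimal gigabytes, the uplink is 768 kilobits per second, the link is fully saturated, and there is no protocol overhead; the function name is made up for the example.

```python
def upload_hours(dataset_bytes, uplink_bits_per_second):
    """Hours needed to push a dataset through an uplink at full speed."""
    seconds = dataset_bytes * 8 / uplink_bits_per_second
    return seconds / 3600


# 6 GB dataset over a 768 kbps uplink (both interpreted as decimal units).
hours = upload_hours(6 * 10**9, 768_000)
print(round(hours, 1))
```

Under those assumptions an initial full upload takes a bit over 17 hours; later backups that send only changed data would be much smaller.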

A tape has several downsides:

a. It is physically close to your original data's location.

b. It's a hassle to back up (remember to perform backup, insert tape, zfs, label tape).

c. Redundancy and versioning come at extra cost.

d. Cannot be utilized for remote access to files.


As for the two issues you raised:

(1) The data is encrypted before uploading. In addition, it is sliced. So, no single machine should have access to the whole file - nor would it even know what the other parts are. Moreover, requests for slices would be digitally signed, so a different machine could not even request slices it does not "own".

(2) Upload bandwidth is indeed an issue, but not a serious one. The general argument goes that what works for torrents would work for backups in this matter.
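Point (1) above, slicing plus authenticated slice requests, can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the slice IDs, and the use of HMAC as a stand-in for the digital signatures mentioned. With HMAC the storing peer must hold the same secret as the owner, so a real system would more likely use public-key signatures. Encryption is omitted; the blob is assumed to be encrypted already.

```python
import hashlib
import hmac


def slice_blob(blob, slice_size):
    """Cut an (already encrypted) blob into fixed-size slices."""
    return [blob[i:i + slice_size] for i in range(0, len(blob), slice_size)]


def sign_request(owner_key, slice_id):
    """Produce an authentication tag for a request for one slice."""
    return hmac.new(owner_key, slice_id.encode(), hashlib.sha256).hexdigest()


def verify_request(owner_key, slice_id, tag):
    """Check, in constant time, that the tag matches the owner's key."""
    expected = sign_request(owner_key, slice_id)
    return hmac.compare_digest(expected, tag)


slices = slice_blob(b"0123456789", 4)      # each slice would go to a different peer
tag = sign_request(b"alice-key", "slice-0")
ok = verify_request(b"alice-key", "slice-0", tag)         # owner's request
forged = verify_request(b"mallory-key", "slice-0", tag)   # someone else's key
```

A peer holding one slice sees only ciphertext fragments and rejects requests whose tag does not verify.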

I don't think you are adequately addressing issue 1. He is talking about the rule of law, i.e. a court ordering the data to be disclosed, whereas you are talking about the technical issue of one person gaining access to another person's files. Since this is a backup solution for total or partial failures, you need to be able to bootstrap a machine from the backup. If all the information required to recreate a machine lived only on the machine that failed, the backup would be an expensive waste of time. By that logic there must be a centralized repository of information on the server side, so the provider could in theory be forced to divulge it under a court order. I am not a lawyer, though.

I can't comment on the legal issues of this, since I too am not a lawyer, and don't live in the US.

What I can say is that your key - whether automatically generated or your favorite pass phrase - is known only to you, and without it all of your data is lost.

Thus, if you want, you can either store the key on the server (so you never lose the data) or keep it to yourself (so the data can never be exposed).
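As a sketch of the two key options mentioned (automatically generated, or derived from a pass phrase), the following uses only the Python standard library. The function names, salt handling, and iteration count are assumptions for the example, not a description of how any of the products discussed actually derive keys.

```python
import hashlib
import secrets


def generate_key():
    """Automatically generated key: 32 random bytes known only to the user."""
    return secrets.token_bytes(32)


def key_from_passphrase(passphrase, salt, iterations=200_000):
    """Derive a 32-byte key from a pass phrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = b"example-salt-16b"  # in practice a random salt stored with the backup
key = key_from_passphrase("my favorite pass phrase", salt)
```

The same pass phrase and salt always yield the same key, which is what allows a restore onto a fresh machine; lose both the key and the pass phrase and the data is unrecoverable.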
