

Online backup to S3 for the Mac - sreitshamer
http://www.haystacksoftware.com/arq/

======
iamcalledrob
This page could really benefit from a screenshot or, if you're feeling
adventurous, a screencast.

I don't want to commit to downloading this app without seeing how it
functions. I understand it lets me back up to S3, but how?

I dare say if you add these images, your conversion rates will skyrocket.

Assuming this is your site, anyway.

~~~
sreitshamer
You're probably (hopefully?) right. It was on my list, but maybe I'll move it
to the top of my list.

~~~
bugs
Most definitely. I, for one, don't try many new programs without some sort of
screenshot. (If you do a screencast, also put up screenshots, as not everyone
can or wants to watch a video.)

------
mikebo
I had Mozy and I used their built-in encryption to back up ~250GB. I went to
restore some wedding photos while at my parents' house, so I downloaded their
decryption utility, and started the decryption of a 10GB subset of my backup.

Only about a third of the files decrypted correctly. The rest of the files were
totally corrupted -- not cool. I contacted support and we investigated the
problem for ~3 months. Luckily I didn't have data loss, but what exactly was I
paying for here?

We never got the problem resolved and I ended up getting a full refund for 1.5
years of service. Make sure to test your backups no matter where you store
them.
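
One way to act on that advice is to restore into a scratch directory and compare checksums against the originals. The following is only a rough sketch: `compare_trees` and `file_digest` are hypothetical helper names, and it assumes the original files are still available locally to compare against.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(original: Path, restored: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored tree."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(dst):
            problems.append(f"corrupt: {rel.as_posix()}")
    return problems
```

An empty list means every original file came back byte-for-byte identical. This only checks file contents, not Mac metadata.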

~~~
cperciva
_what exactly was I paying for here?_

You were paying for not worrying about whether you would lose your data. And
that's exactly what Mozy gave you... at least until you discovered the
corruption. :-)

~~~
mstevens
This has just reminded me I've got a lot of data backed up with you and no
idea if the restore works :)

~~~
cperciva
I'll be happy to accept $0.30/GB while you test restores to your heart's
content. :-)

------
cperciva
As the author of Tarsnap, I'm a bit biased, so I'm going to refrain from
comparing the two; but I have one question for the author: Aren't S3
per-request fees a problem? One of the reasons I have all Tarsnap data go through
the Tarsnap server is to allow me to "bundle" blocks together and amortize
per-PUT costs.

~~~
sreitshamer
The per-request fees ($.01 per 1000 PUTs and $.01 per 10,000 GETs) aren't a
problem for 2 reasons:

1) Arq packs small files (under 64k) together into "packs", a bit like git
does. So instead of thousands of PUTs of tiny files (which would also be very
slow) it does fewer larger PUTs.

2) Arq caches indexes of those pack files locally, and caches the list of
larger S3 objects as well. So it doesn't have to do a GET to figure out
whether an object with a given SHA1 is already in the S3 account. (Arq stores
every object by its SHA1 hash, like git does).

The file format is open (it's your data after all):
<http://www.haystacksoftware.com/arq/s3_data_format.txt>
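
The two points above can be illustrated with a rough sketch. This is not Arq's actual code: `plan_uploads`, `put_count`, and the single unbounded pack are assumptions made for illustration, and encryption is left out entirely.

```python
import hashlib

# "under 64k" per the description above; the exact cutoff is an assumption.
PACK_THRESHOLD = 64 * 1024


def sha1_name(data: bytes) -> str:
    """Name an object by the SHA1 hex digest of its contents."""
    return hashlib.sha1(data).hexdigest()


def plan_uploads(blobs, known_names):
    """Split new blobs into individually uploaded objects and one pack.

    known_names is the locally cached set of names already in S3, so
    deciding whether to upload needs no GET request.
    """
    large, pack = {}, {}
    for data in blobs:
        name = sha1_name(data)
        if name in known_names or name in large or name in pack:
            continue  # already stored or already queued: nothing to send
        if len(data) < PACK_THRESHOLD:
            pack[name] = data   # small blobs share a single PUT
        else:
            large[name] = data  # large blobs get one PUT each
    return large, pack


def put_count(large, pack) -> int:
    """Number of PUT requests this plan would issue."""
    return len(large) + (1 if pack else 0)
```

With a thousand new 1 kB files and one new 1 MB file, this plan issues 2 PUTs rather than 1001.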

~~~
cperciva
_Arq packs small files (under 64k) together into "packs"..._

Ok, so if you have lots of 64 kB files they're all PUT separately, using 15625
PUTs/GB (i.e., $0.15/GB)? I guess that's not too bad.
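
For anyone checking the arithmetic, here is a quick sketch. It assumes decimal units (64 kB = 64,000 bytes, 1 GB = 10^9 bytes) and the $.01 per 1000 PUTs price quoted above; `put_cost_per_gb` is just an illustrative name.

```python
PUT_PRICE_PER_1000 = 0.01  # USD, as quoted earlier in the thread


def put_cost_per_gb(object_size_bytes: int, gb_bytes: int = 10**9):
    """Return (PUT requests, USD cost) to upload one GB in equal-size objects."""
    puts = -(-gb_bytes // object_size_bytes)  # ceiling division
    return puts, puts * PUT_PRICE_PER_1000 / 1000


puts, cost = put_cost_per_gb(64_000)
print(puts, round(cost, 4))
```

That gives 15625 PUTs and roughly $0.156 per GB, in line with the rounded figure above.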

 _The file format is open_

Nice! Good to see someone who believes in openness. (How about the source
code, so that people can confirm that you're doing what the spec says?)

One thing caught my eye glancing through that document, though: "The name of
each blob is its SHA1 hash." That's the SHA1 hash of the encrypted data,
right? (If not, it definitely should be!)

~~~
sreitshamer
_Ok, so if you have lots of 64 kB files they're all PUT separately, using
15625 PUTs/GB (i.e., $0.15/GB)? I guess that's not too bad._

I guess I could ask you to choose the threshold instead of picking 64k, but
generally I've tried to choose reasonable defaults so that the user doesn't
have to make those kinds of decisions.

 _Good to see someone who believes in openness. (How about the source code, so
that people can confirm that you're doing what the spec says?)_

Well, you could actually write your own code to read the files, since the
format is open.

 _That's the SHA1 hash of the encrypted data, right?_

It's the SHA1 of the contents of the object, and the object is encrypted data,
so yes.

~~~
statictype
_Well, you could actually write your own code to read the files, since the
format is open._

I think the parent's point was not about vendor lock-in, but rather that
publishing the source code to _your_ particular implementation would lend more
credibility to it, since anyone can inspect it and verify that it does what it
says it does.

~~~
sreitshamer
Right, but with Arq, unlike most online backup offerings, you have direct
access to the files being stored online (they're in your S3 account). So you
can verify that Arq does what it says because you can look at the results.

~~~
DavidSJ
With source code, you can reason about edge cases.

No matter how many tests you do, you can never catch all of these.

------
jwr
Good thing somebody brought up the problem of encryption with other backup
providers (like Backblaze).

Still, the problem is that if you want to back up a lot of data, S3 quickly
becomes rather expensive. In my case, storage costs alone for around 550GB
that I currently store on Backblaze would run me $82.50 a month. Not exactly
small change.

------
pclark
I can't judge how much it'll cost to back my iMac up. I have about 750GB of
data.

~~~
Willie_Dynamite
You can't figure out what 750 * 0.15 comes to?

~~~
Frazzydee
Perhaps he is saying that the per-gig cost is not clear. It says buy now for
$29, but the extra $0.15/GB/month to use S3 is hidden under "Why use S3?" (you
might not expect costs to be there).

We both know what S3 is, and know that they charge per gig. But the market of
people who want online backups for their Macs doesn't necessarily know this.

~~~
pclark
True, but this service is also incremental, so it'll (in theory) back up more
than 750GB.

------
PStamatiou
I have been using Arq for the last 3 weeks. It's very basic software at the
moment, but it does the job. It lets me see file revisions and restore files
as necessary. Unfortunately, these files can't be seen directly in S3 because
of the way they're packed for storage, so if your Mac crashes and you want
a file, you'll need to find another Mac and install Arq to retrieve your
files.

~~~
sreitshamer
Right. The files are named using the SHA1 hash of their contents (to enable
de-duplication) and small files are packed together (to dramatically improve
performance). I thought that was the best solution given the limits of the S3
API. JungleDisk does something similar I think.

What sort of alternative for restoring would you be interested in? Maybe an
open-source command-line utility?

~~~
zacharypinter
A command-line utility is always a big plus for those who like scripting.
Maybe a FUSE script/app/plugin?

~~~
sreitshamer
I posted an open-source command-line utility on github for restoring
(downloading) Arq backups from your S3 account:

<http://sreitshamer.github.com/arq_restore/>

------
anotherperson
From the Carbonite Backup Bouncer test:

One of backup-bouncer's tests is called "combo-tests" where it tests several
different file metadata types on the same file. Carbonite didn't restore this
file at all.

Instead, Carbonite left a file called "Carbonite_Restore_F161_G1.tmp". It's a
1-line text file that reads "gotta boogie".

------
mtigas
The price scaling issue worried me at first, and then I noticed the monthly
budgeting — great feature. In the long run, you’ll always sacrifice price (vs.
an external hard drive) for off-site data persistence anyway.

In any case, this looks really awesome for documents/projects backup to
supplement my Time Machine drive.

~~~
sreitshamer
Right, I use it as a supplement to my Time Machine drive as well, for my most
important stuff (documents + photos).

Maybe I should call out that use case on the landing page.

------
Willie_Dynamite
Sounds nice. Just too bad the contents of my NAS would cost $700 a month to
keep on S3.

~~~
sreitshamer
I was thinking of adding Pogoplug support. That way storage would be much
cheaper, although backing up to a Pogoplug at a friend's house may be much
less reliable than backing up to S3.

Would anyone be interested in this?

~~~
Willie_Dynamite
I'd be far more interested in a generic SSH target thingy than in support for
specific peripherals.

~~~
mikepurvis
I second the motion. I have a ton of storage on Dreamhost that I could totally
be using as backup. (And have tried a bunch of times, but it's a pain getting
cron and rsync to do the right thing, and then remembering to set it up again
when my computer gets changed around...)

~~~
m_eiman
Keep in mind that they don't allow unlimited private backups (they allow 50GB
of backups stored in a special backup user account). The data you store there
is supposed to be web accessible.

<http://www.dreamhoststatus.com/2007/10/17/policy-clarification-personal-storage-back-ups/>

------
KirinDave
I'm confused about how this product can charge so much more than JungleDisk
without really doing anything different. Is there a difference I'm missing?

~~~
sreitshamer
Well, JungleDisk is a monthly service costing $2-5 a month (plus S3 charges)
depending on the plan you choose. Arq is a 1-time $29 cost (plus S3 charges).
The other big difference is that JungleDisk is a Java app and doesn't have a
native Mac look and feel.

Also Arq is a lot simpler to use. I really need to put up a screencast to show
that.

------
matthewer
Has anyone tried both Backblaze and this?

~~~
sreitshamer
I have. I chose my own encryption password, but in order to restore files, I
had to enter the password into the Backblaze web site. Backblaze used it to
decrypt my files on their server and assemble them into a zip file on their
server. They then sent me email saying I could download the zip file.

Also the Mac metadata weren't restored properly:
<http://www.haystacksoftware.com/arq/backblaze-backup-bouncer-test.txt>

------
rshields
Is there some way to make the app not suck up all the bandwidth on my teeny
weeny cable modem?

~~~
sreitshamer
Yes, you can set a maximum transfer rate. You can also choose "automatic
transfer rate" which slows backups when you're using the Internet
interactively.

------
quizbiz
If I do not have S3, what would be the total effective cost?

~~~
sreitshamer
It doesn't work without an S3 account. The total cost is $29 (one time) plus
monthly S3 charges. For example, 33GB backed up at S3 would cost $4.95/month
in storage charges, plus some (relatively small) transmission and transaction
charges. Upload transmission to S3 is free through June 2010.
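
As a rough estimator for other data sizes, here is a sketch using the $29 price and the $0.15/GB/month storage rate mentioned in this thread. It assumes the stored size stays constant and ignores transmission and transaction charges; the function names are made up for illustration.

```python
APP_PRICE = 29.00            # USD, one-time purchase
STORAGE_PER_GB_MONTH = 0.15  # USD, S3 storage rate quoted in this thread


def monthly_storage(gb: float) -> float:
    """Monthly S3 storage charge in USD for a given number of GB."""
    return gb * STORAGE_PER_GB_MONTH


def first_year_total(gb: float) -> float:
    """One-time app price plus twelve months of storage, in USD."""
    return APP_PRICE + 12 * monthly_storage(gb)


for gb in (33, 550, 750):
    print(gb, round(monthly_storage(gb), 2), round(first_year_total(gb), 2))
```

For 33GB that comes to $4.95 a month, matching the example above; 550GB comes to $82.50 a month and 750GB to $112.50 a month.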

------
sunchild
Looks good. What about workgroup admin?

