
Tell HN: AeroFS - File Syncing Without Servers - yurisagalov
http://aerofs.posterous.com/
======
chime
So many questions...

1) Does it work well with huge files? 1GB+ etc? Will a 1-byte change mean
complete download to all devices?

2) Does it work well with 100k small files in deeply nested folders?

3) Will you charge for software and/or support?

4) What happens when one of the devices doesn't have enough storage? 4GB SSD
laptop vs. 100GB HDD.

5) Will any of my computers have to be up 24/7?

~~~
yurisagalov
1\. It should work great for huge files, especially over LANs. Right now a
1-byte change will mean a re-download, but one of the first things we're doing
is introducing diff-based downloads :)

2\. It should

3\. We're not 100% sure how we're going to charge for it yet, but anyone who
signs up for the beta will be grandfathered into whatever system we end up
using (i.e. you won't have to pay)

4\. I'm actually going to address this in a separate post, but we've designed
it in such a way where you'll actually stream files from one device to the
other (based on a least-recently-used policy), so in effect you should have
access to ~104GB of data.

5\. This relates to #4, but is up to you, largely. If you have computers that
are up 24/7, great. If you don't, you can leverage our cloud servers for
better availability.
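
To illustrate the least-recently-used policy mentioned in point 4, here is a minimal sketch of a size-bounded local cache that streams a file from another device on a miss and evicts the file touched longest ago. The `LruFileCache` class and its `fetch_remote` callback are hypothetical names chosen for illustration; nothing here is taken from AeroFS itself.

```python
from collections import OrderedDict
from typing import Callable, List


class LruFileCache:
    """Keeps recently used files locally, up to capacity_bytes (illustrative only)."""

    def __init__(self, capacity_bytes: int, fetch_remote: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        # Assumed callback: streams a file's contents from another device.
        self.fetch_remote = fetch_remote
        self._files: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0

    def read(self, path: str) -> bytes:
        if path in self._files:
            # Cache hit: mark the file as most recently used.
            self._files.move_to_end(path)
            return self._files[path]
        data = self.fetch_remote(path)
        self._store(path, data)
        return data

    def _store(self, path: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # too large to keep locally; serve it without caching
        # Evict least recently used files until the new one fits.
        while self._used + len(data) > self.capacity_bytes:
            _, evicted = self._files.popitem(last=False)
            self._used -= len(evicted)
        self._files[path] = data
        self._used += len(data)

    def cached_paths(self) -> List[str]:
        """Paths currently held locally, least recently used first."""
        return list(self._files)
```

With an 8-byte capacity and three 4-byte files, reading `a`, `b`, `a`, `c` leaves `a` and `c` cached: `b` is the least recently used file when `c` arrives, so it is the one evicted.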

~~~
derefr
#4 is interesting—but how do you define "devices?"

I have, say, 500GB of music. It doesn't fit on my laptop, so it "lives" on an
external HD. Could it sync the laptop with the HD so that I always still have
access to a set chunk—say, 10GB—of my most-recently-used music files? This is
a capability I've been waiting for _something_ to support ever since I bought
the drive.

~~~
yurisagalov
A device is anything that can run AeroFS. We can (and plan to) do exactly what
you described for caching data between _devices_, but we haven't really
considered the use case where you may want access to your most recently used
external HDD data (in this case, the external HDD is really more like a part
of your laptop device).

~~~
anthonyb
Could you run multiple instances of AeroFS on the one device and assign
different stores to each? That might be a reasonable workaround, assuming that
it's not more work than doing it "properly".

~~~
weihan
Actually, in theory AeroFS can run multiple instances on one device, and we are
actually thinking of it as the workaround :)

------
Goosey
This is just me throwing out a toy idea here, but.. Consider that most
computers have large chunks of unused space. Now draw a parallel to projects
like Folding@Home which make use of unused CPU cycles for computation.
Wouldn't it be nice to make use of the world's unused storage in exchange for
providing your unused storage? This made me think of that as it sounds like
this 'p2p supercloud' could be just another peer type. You opt in to share
100GB == you get 80GB of redundant off-site backup for free type system.

Now, I am not asking you to implement this peer type (although that would rock
my world if you did), but would it be possible for someone to implement it
themselves? In other words will you be providing a 'peer API'?

~~~
sliverstorm
80GB of redundant storage = 160GB at least. More than that, because you can't
count on any one node almost at all, so you need more than two copies.

This means you should get more like 10-20GB per 100GB you commit, otherwise
the cloud simply will not have enough space.

Then if you consider that even with many nodes containing your data, there is
a decent chance all of them go offline at a certain time. You have to have
many, many nodes for the odds to be small enough. Which means the best
solution is to use the storage you committed as one of the nodes, so it is
always available to you. Then, it really transforms into a cloud backup
system, rather than a cloud file system.

~~~
zhyder
You wouldn't store full copies, you'd stripe it across multiple machines using
some type of error correction coding, like Reed-Solomon, which has less than
2x overhead.
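
The trade-off being debated can be put into numbers with a short sketch. It assumes, purely for illustration, that each node is online independently with probability `p_online`, and compares full replication with a k-of-n erasure code (any `k` of the `n` stored fragments are enough to rebuild the data, as with Reed-Solomon). The function names are hypothetical.

```python
from math import comb


def replication_availability(copies: int, p_online: float) -> float:
    """Probability that at least one of `copies` full replicas is online."""
    return 1 - (1 - p_online) ** copies


def erasure_availability(k: int, n: int, p_online: float) -> float:
    """Probability that at least k of n fragments are online (binomial tail)."""
    return sum(
        comb(n, i) * p_online**i * (1 - p_online) ** (n - i)
        for i in range(k, n + 1)
    )


def storage_overhead(k: int, n: int) -> float:
    """Bytes stored per byte of user data; replication is the k = 1 case."""
    return n / k
```

For example, `storage_overhead(10, 15)` is 1.5, against 2.0 for two full copies. Which scheme is more available at a given overhead depends heavily on `p_online`, which is the crux of the disagreement in this subthread.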

~~~
sliverstorm
I'm thinking of it in terms of RAID.

You are talking about RAID5. However, RAID5 is useless if more than a few
disks go offline at the same time.

RAID1/10 is most useful when there's a higher chance of multiple disks failing
at a time, or when the odds of multiple disks failing in your RAID5, while
low, are unacceptable.

Of course there are other things at work when you talk RAID0/5/10, but this is
a large part of it.

------
blocke
"Each AeroFS device has its own 1024bit RSA key pair, which is certified by us
to be authentic."

That suspiciously reads like the AeroFS people get a copy of your key. If
that's the case then it's only marginally more secure than DropBox. Hope I'm
reading that wrong...

~~~
yurisagalov
We don't get a copy of your private key (neither should anyone else, ever). We
do get a copy of your public key, to certify it (we use OpenSSL's CA)

~~~
blocke
So how do you "invite" someone? Swap public keys?

~~~
weihan
We generate a temporary password for the user being invited and encode it in
the invitation code sent to the user's email address. We use this temp pass to
verify the user when he/she signs up and destroy the pass immediately after.
During initial setup, the user's device generates its own public/private key
pair and sends a CSR (certificate signing request) to us for certification.
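
The temporary-password part of this flow might look roughly like the sketch below. It is an assumption-laden illustration, not AeroFS's code: the `InviteRegistry` class, the choice of SHA-256, and the base64 encoding of the invitation code are all invented here. The CSR step is left out because it needs a cryptography library beyond the standard library.

```python
import base64
import hashlib
import hmac
import secrets


class InviteRegistry:
    """Hypothetical server-side store of pending invitations."""

    def __init__(self) -> None:
        # email -> SHA-256 digest of the temporary password (assumed scheme)
        self._pending = {}

    def invite(self, email: str) -> str:
        temp_password = secrets.token_bytes(16)
        self._pending[email] = hashlib.sha256(temp_password).digest()
        # The invitation code emailed to the user carries the temporary password.
        return base64.urlsafe_b64encode(temp_password).decode("ascii")

    def redeem(self, email: str, invitation_code: str) -> bool:
        expected = self._pending.get(email)
        if expected is None:
            return False
        try:
            temp_password = base64.urlsafe_b64decode(invitation_code.encode("ascii"))
        except ValueError:
            return False  # malformed code
        ok = hmac.compare_digest(hashlib.sha256(temp_password).digest(), expected)
        if ok:
            # Destroy the temporary password once the sign-up is verified.
            del self._pending[email]
        return ok
```

A code can be redeemed once by the invited address; a second attempt, or an attempt from a different address, returns `False`.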

------
j_baker
I'm curious to know more about the technical details about how this works.
Like what protocols and technologies you're using. If that information isn't
too sensitive that is. :-)

~~~
weihan
We developed a lot of the protocols and technologies ourselves, and could talk
about them for hours :) Let me know if you have a specific area you want me to
discuss.

~~~
j_baker
Not necessarily anything in particular. Just a big picture overview of how the
technology works.

~~~
weihan
In short, AeroFS is a decentralized data management system running on top of
p2p overlay networking.

The overlay network layer presents to the data management layer a transport-
agnostic view of the Internet, and addresses peers using network-independent
identifiers. In this way, data management can talk to any peer regardless of
network topologies and firewall restrictions, as if the world is flat :)

The data management layer controls data versioning and update propagation in a
fully decentralized way. As I described in another comment, we use version-
vector-like data structures to track versions and manage conflicts. We use
modified epidemic algorithms (<http://portal.acm.org/citation.cfm?id=41841>)
for fast update propagation. AeroFS distinguishes between peers and super peers.
Super peers can help update propagation and peer communication in many ways.
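
For readers unfamiliar with epidemic algorithms, the sketch below simulates the plain textbook "push" variant: in every round, each peer that already has an update forwards it to one peer chosen at random. This is not the modified algorithm AeroFS uses, and it ignores super peers entirely. The function name and parameters are hypothetical.

```python
import random


def push_gossip_rounds(num_peers: int, seed: int = 0) -> int:
    """Rounds until every peer has the update under simple push gossip."""
    rng = random.Random(seed)
    informed = {0}  # peer 0 originates the update
    rounds = 0
    while len(informed) < num_peers:
        # Every informed peer pushes the update to one peer chosen at random
        # (possibly itself or a peer that already has it).
        targets = {rng.randrange(num_peers) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds
```

Because the informed set can at most double per round, reaching 100 peers takes at least 7 rounds. In practice it takes somewhat more, since some pushes land on peers that already have the update.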

------
endlessvoid94
I think the consumer market for this doesn't care how their files are backed
up. They just want it to work.

~~~
yurisagalov
RE: "it just works" - you're right of course, and we're definitely trying to
make it as simple as possible.

There are some features we're going to implement down the road that can be
done better with p2p solutions though (aggregated storage across devices, for
example), so I hope you give us a chance! :)

~~~
StavrosK
Yeah, if your thing means I can sync my data across computers without you
seeing any of it, I'm never using Dropbox again.

------
slantyyz
This offering sounds like a hybrid of Crashplan and Dropbox.

Could be quite good for the use cases they specify.

~~~
SnowLprd
Absolutely. This project has the potential to remedy a problem that, to date,
I have not been able to solve.

I have computers in multiple geographically-diverse locations and need a large
amount of data (terabytes) to always appear in each location. Other requirements:

1\. Direct sync between my devices, with no third-party cloud involved

2\. Fast local sync when two devices on the same subnet are detected

3\. Since individual files can be 20 GB in size, interrupted synchronizations
should automatically resume when the connection is re-established (without
having to start over from the beginning)

4\. Encrypted transmission of data, but not encrypted on disk

5\. When renaming or moving a file from one folder to another, the system
should be smart enough to detect that there's no need to re-transmit that file
(i.e., it just needs to rename it or move it to the new location on all other
devices)

6\. Ability to throttle upstream/downstream bandwidth on a per-device basis

Neither Dropbox nor CrashPlan -- nor any other tool -- has been able to meet
all of these requirements.

In short, this is a very exciting and welcome development. I sincerely hope
that this problem will soon be solved!

~~~
yurisagalov
Just as a quick note: 1,2,4,5 are already done on our end. #3 and #6 are in
our top priority list and should be done soon

If you want to chat more about your particular needs/use case, give me a shout
at yuri@aerofs.com, I'd love to talk!

------
antgiant
Two questions

1) How does this system handle two devices behind separate NATs? (aka a work
device and a home device.)

2) What is the conflict resolution protocol if a file is modified in two or
more locations? (Newest wins, automatic duplication for manual resolution,
etc.)

~~~
weihan
1) We use ICE/STUN as well as relays for firewall penetration. 2) We use a modified
version of version vectors (<http://en.wikipedia.org/wiki/Version_vector>) and
accompanying algorithms to detect and resolve conflicts. In a decentralized
system, conflict management boils down to managing causal relationships between
distributed updates, and version vectors were invented just for that :)
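
As background, here is a minimal sketch of standard version vectors, not the modified form AeroFS uses. Each device keeps a counter per device; comparing two vectors tells you whether one update happened before the other or whether they are concurrent, which is to say in conflict. The function names are hypothetical.

```python
from typing import Dict

VersionVector = Dict[str, int]  # device id -> number of updates seen from it


def bump(vector: VersionVector, device: str) -> VersionVector:
    """Return a copy of `vector` with a new local update recorded for `device`."""
    updated = dict(vector)
    updated[device] = updated.get(device, 0) + 1
    return updated


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'before', 'after', or 'concurrent' for a relative to b."""
    devices = set(a) | set(b)
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        return "concurrent"  # neither update saw the other: a conflict
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum, recorded once a conflict has been resolved."""
    return {d: max(a.get(d, 0), b.get(d, 0)) for d in set(a) | set(b)}
```

If a file at `{"laptop": 1}` is edited independently on two machines, the results `{"laptop": 2}` and `{"laptop": 1, "desktop": 1}` compare as `"concurrent"`. Note that this only detects the conflict; deciding what the merged file should contain is a separate step.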

~~~
antgiant
Thank you.

However, having looked at the wikipedia page on Version Vectors it appears
that is a protocol for detecting conflicts. I was interested in how you
resolve them.

A simple example is a zip file that I add file A to on one computer and later
file B to on another computer. When I sync up do I end up with a zip
containing no new files, file A, file B, both files or a corrupt zip file.
(Does the answer change if the zip file is encrypted?)

~~~
weihan
I see. There are two categories of conflicts to resolve: meta conflicts (like
when you rename a file to "foo" on device A and meanwhile rename it to "bar"
on B) and data conflicts (i.e. the example you gave).

We will formally describe meta conflict resolution in a separate post. Because
resolution for data conflicts is very application specific, we will publish an
API to allow application developers to write their own conflict resolvers.
Meanwhile, we will try to provide resolvers for popular file types by default.

From the end user's view, in most cases conflicts are automatically resolved
without being noticed. User intervention is required if automatic resolution
fails or the user wants to manually merge.
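
Since the resolver API has not been published, the following is only a guess at its general shape: application code registers a resolver per file type, and a resolver that cannot merge returns `None` so the conflict falls through to the user. Every name here (`register_resolver`, `resolve`, `merge_appended_lines`) is hypothetical.

```python
import os
from typing import Callable, Dict, Optional

# A resolver gets the common ancestor plus both conflicting versions and
# returns the merged contents, or None if it cannot merge automatically.
Resolver = Callable[[bytes, bytes, bytes], Optional[bytes]]

_resolvers: Dict[str, Resolver] = {}


def register_resolver(extension: str, resolver: Resolver) -> None:
    """Associate a resolver with a file extension such as '.log'."""
    _resolvers[extension.lower()] = resolver


def resolve(path: str, base: bytes, ours: bytes, theirs: bytes) -> Optional[bytes]:
    """Try automatic resolution; None means user intervention is required."""
    extension = os.path.splitext(path)[1].lower()
    resolver = _resolvers.get(extension)
    if resolver is None:
        return None
    return resolver(base, ours, theirs)


def merge_appended_lines(base: bytes, ours: bytes, theirs: bytes) -> Optional[bytes]:
    """Example resolver for append-only files: keep both sets of appended data."""
    if not (ours.startswith(base) and theirs.startswith(base)):
        return None  # something other than an append happened; give up
    return base + ours[len(base):] + theirs[len(base):]
```

After `register_resolver(".log", merge_appended_lines)`, two devices that each appended a line to the same log get both lines back. A `.zip` conflict has no registered resolver in this sketch, so it returns `None` and would be left for manual merging.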

~~~
antgiant
Thank you. That's what I wondered.

------
blocke
On another note, without yet seeing the software, I'd assume the Mac and Linux
ports are both using FUSE. Is there a FUSE alternative for Windows yet?

~~~
weihan
Yes. We use CBFS (<http://www.eldos.com/cbfs/>) for Windows.

~~~
blocke
Ah, sadly nothing for a weekend hacker but nice to know it exists.

I'd guess anyone who spent time figuring out how to do Windows filesystems
would want to be paid for the trauma.

Thanks for the pointer and good luck with the project, I've signed up for an
invite. :)

~~~
weihan
There are a few (Google "fuse windows"). In particular, Dokan
(<http://dokan-dev.net/en/>) seems to be a good one. The project has been quite
active recently.

~~~
fragmede
Just wanted to point out that Dokan has its own version of SSHFS -
<http://dokan-dev.net/en/2010/01/18/open-source-dokan-sshfs/>

------
rlpb
So it's Dropbox but without the requirement for a central cloud component? Is
there anything I've missed?

~~~
weihan
Exactly. In addition, because cloud servers to us are merely a "super peer",
AeroFS offers a superset of what a cloud-based solution can provide.

------
fname
I like the "Chat with the Founders" on the website... Is that something you
built?

EDIT: Thanks guys.

~~~
yurisagalov
We actually use olark.com for that! It's been super helpful today in
responding to comments directly on the website

------
seancron
I see your company doc on "How To Earn HN Karma" has worked out well. Now if
only you could turn karma into money...

Edit: This is a reference to the picture on their signup page.

------
yason
Can I share stuff with my friends? Suppose they already host my backups and I
host theirs, can I easily give them access to some of my files?

~~~
yurisagalov
You can definitely share stuff with friends. Strictly speaking, right now your
friends don't "host your backup" _unless_ you share with them.

~~~
mcritz
Can you go into a little more detail about sharing? Are my files kept on my
devices when shared or does it implicitly mean shared files are read/write
accessible by others? If so, is there a way of managing permissions?

~~~
weihan
We have implemented full-fledged access control, including file ownership,
read/write permissions on data/metadata, list/add/remove permissions on
directories, etc. But we disabled it in the interface to keep the user
experience as simple as possible.

Later on we may enable it based on use cases and user feedback. Our API will
include ACL management as well. Currently files are read/write accessible once
shared.

------
santimt
Does it use P2P transfer, or does the owner of a file have to upload a
separate copy to every peer that is syncing?

~~~
weihan
It uses p2p syncing: Any device that was invited and has sufficient privileges
on a file can serve the file to peers.

------
tsmith
Way cool guys, and way to represent the Toronto startup scene!

BTW, loved the reference to HN karma in the screenshot!

------
bobf
File syncing that includes mobile devices seems to be becoming increasingly
important as their storage space grows. I'm excited about AeroFS and am
looking forward to seeing more posts about the technical aspects.

------
martyhu
What about performance? Since many hosts these days sit behind asymmetric
DSL, I assume download/upload between hosts may be poor if only a few hosts
are involved.

------
hassenben
Could someone clarify the difference between aerofs and rsync? What can aerofs
do that rsync doesn't?

------
sabat
Anyone have invites to give out?

~~~
santimt
You can sign up on the web page to get (I hope) an invite.

~~~
sabat
I'd already done that, but I'm impatient. :-)

------
c00p3r
a wrapper on top of git, I guess? ^_^

~~~
yurisagalov
Actually, no ;) We wrote most of the underlying magic ourselves

