Hacker News
Tell HN: AeroFS - File Syncing Without Servers (aerofs.posterous.com)
242 points by yurisagalov 41 days ago | comments




31 points by chime 41 days ago | link

So many questions...

1) Does it work well with huge files? 1GB+ etc? Will a 1-byte change mean complete download to all devices?

2) Does it work well with 100k small files in deeply nested folders?

3) Will you charge for software and/or support?

4) What happens when one of the devices doesn't have enough storage? 4GB SSD laptop vs. 100GB HDD.

5) Will any of my computers have to be up 24/7?

reply

34 points by yurisagalov 41 days ago | link

1. It should work great for huge files, especially over LANs. Right now a 1-byte change will mean a re-download, but one of the first things we're doing is introducing diff-based downloads :)

2. It should

3. We're not 100% sure how we're going to charge for it yet, but anyone who signs up for the beta will be grandfathered into whatever system we end up using (i.e. you won't have to pay)

4. I'm actually going to address this in a separate post, but we've designed it in such a way where you'll actually stream files from one device to the other (based on a least-recently-used policy), so in effect you should have access to ~104GB of data.

5. This relates to #4, but is up to you, largely. If you have computers that are up 24/7, great. If you don't, you can leverage our cloud servers for better availability.
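
The least-recently-used policy from point 4 can be illustrated with a minimal sketch. This is not AeroFS code: the class name `LruFileCache`, its methods, and the `fetch_from_peer` callback are all hypothetical, and real eviction would also have to deal with unsynced local changes.

```python
from collections import OrderedDict


class LruFileCache:
    """Keeps recently used files on a small device; everything else stays on peers."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._files = OrderedDict()  # file name -> size, least recently used first

    def access(self, name, size, fetch_from_peer):
        """Return "hit" if the file is already local, otherwise stream it in and return "miss"."""
        if name in self._files:
            self._files.move_to_end(name)  # mark as most recently used
            return "hit"
        fetch_from_peer(name)  # hypothetical callback: stream the file from another device
        # Evict least recently used files until the new file fits.
        while self._files and self.used_bytes + size > self.capacity_bytes:
            _, evicted_size = self._files.popitem(last=False)
            self.used_bytes -= evicted_size
        if size <= self.capacity_bytes:  # files larger than the cache are streamed, never kept
            self._files[name] = size
            self.used_bytes += size
        return "miss"

    def local_files(self):
        """Names of the files currently held locally, least recently used first."""
        return list(self._files)
```

Under this scheme a 4GB laptop keeps only its most recently used files locally and streams the rest from the 100GB machine on demand.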

reply

27 points by sabat 41 days ago | link

Your headline is ingenious: "We may be late to the party, but we brought a keg."

reply

5 points by adamgravitis 41 days ago | link

That's pretty much Yuri's modus operandi.

reply

5 points by unavoidable 41 days ago | link

Ain't that true. I think Yuri owes us a keg next time we see him.

reply

2 points by samvj 41 days ago | link

second that =)

reply

3 points by derefr 41 days ago | link

#4 is interesting—but how do you define "devices?"

I have, say, 500GB of music. It doesn't fit on my laptop, so it "lives" on an external HD. Could it sync the laptop with the HD so that I always still have access to a set chunk—say, 10GB—of my most-recently-used music files? This is a capability I've been waiting for something to support ever since I bought the drive.

reply

1 point by yurisagalov 41 days ago | link

A device is anything that can run AeroFS. We can (and plan to) do exactly what you described for caching data between _devices_, but we haven't really considered the use case where you want access to your most recently used external HDD data (in this case, the external HDD is really more like a part of your laptop device).

reply

1 point by anthonyb 41 days ago | link

Could you run multiple instances of AeroFS on the one device and assign different stores to each? That might be a reasonable workaround, assuming that it's not more work than doing it "properly".

reply

1 point by weihan 41 days ago | link

Actually, in theory AeroFS can run multiple instances on one device, and we are actually thinking of it as the workaround :)

reply

2 points by LiveTheDream 41 days ago | link

The streaming part is neat, providing access to data well beyond the storage capacity of your current machine. ZumoDrive (former YC company) also focuses on the streaming aspect.

reply

2 points by hxr 41 days ago | link

I am working on something very similar in my free time as a side project. Nothing to show yet, but I am currently playing with rsync to support incremental diff-based syncs.

Will your program be open source?

reply

2 points by weihan 41 days ago | link

Yeah, rsync is the de facto standard for delta sync, I think :) An open API and/or open-sourcing part of AeroFS are in our plans.
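
To make "delta sync" concrete, here is a minimal sketch that transfers only the blocks that changed. It is a deliberate simplification: it compares blocks at fixed offsets, whereas rsync itself uses a rolling checksum so it can also match data that has shifted position. The function names and the tiny `BLOCK_SIZE` are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_delta(old, new, block_size=BLOCK_SIZE):
    """Return (new_length, [(block_index, block_bytes), ...]) for blocks that differ from old."""
    old_hashes = block_hashes(old, block_size)
    delta = []
    for index, start in enumerate(range(0, len(new), block_size)):
        block = new[start:start + block_size]
        unchanged = (
            index < len(old_hashes)
            and hashlib.sha256(block).hexdigest() == old_hashes[index]
        )
        if not unchanged:
            delta.append((index, block))
    return len(new), delta


def apply_delta(old, new_length, delta, block_size=BLOCK_SIZE):
    """Rebuild the new file from the old copy plus the changed blocks."""
    result = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta:
        start = index * block_size
        result[start:start + len(block)] = block
    return bytes(result)
```

With this approach a 1-byte change in place costs one block of transfer rather than the whole file. An insertion that shifts every later offset would still force a near-complete transfer, which is the case rolling checksums are designed to handle.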

reply

9 points by Goosey 41 days ago | link

This is just me throwing out a toy idea here, but.. Consider that most computers have large chunks of unused space. Now draw a parallel to projects like Folding@Home which make use of unused CPU cycles for computation. Wouldn't it be nice to make use of the world's unused storage in exchange for providing your unused storage? This made me think of that as it sounds like this 'p2p supercloud' could be just another peer type. You opt in to share 100GB == you get 80GB of redundant off-site backup for free type system.

Now, I am not asking you to implement this peer type (although that would rock my world if you did), but would it be possible for someone to implement it themselves? In other words will you be providing a 'peer API'?

reply

4 points by sliverstorm 41 days ago | link

80GB of redundant storage = 160GB at least. More than that, in fact, because you can hardly count on any single node, so you need more than two copies.

This means you should get more like 10-20GB per 100GB you commit, otherwise the cloud simply will not have enough space.

Then consider that even with many nodes containing your data, there is a decent chance all of them go offline at a certain time. You have to have many, many nodes for the odds to be small enough. Which means the best solution is to use the storage you committed as one of the nodes, so it is always available to you. Then, it really transforms into a cloud backup system, rather than a cloud file system.

reply

5 points by zhyder 41 days ago | link

You wouldn't store full copies, you'd stripe it across multiple machines using some type of error correction coding, like Reed-Solomon, which has less than 2x overhead.
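
The trade-off between full copies and striping can be put in numbers with a short sketch. It assumes a k-of-n erasure code (any `data_shards` of `total_shards` pieces rebuild the file, as with Reed-Solomon) and nodes that are online independently with the same probability; both the function names and the example parameters are illustrative assumptions, not measurements.

```python
from math import comb


def storage_overhead(data_shards, total_shards):
    """Raw bytes stored per byte of user data."""
    return total_shards / data_shards


def availability(data_shards, total_shards, p_online):
    """Probability that at least data_shards of total_shards independent nodes are online."""
    return sum(
        comb(total_shards, i) * p_online**i * (1 - p_online) ** (total_shards - i)
        for i in range(data_shards, total_shards + 1)
    )


# Two full copies are the special case of 1-of-2.
two_copies = (1, 2)
# A hypothetical 10-of-15 code: 1.5x overhead instead of 2x.
striped = (10, 15)

for p_online in (0.9, 0.5):
    print(
        p_online,
        round(availability(*two_copies, p_online), 4),
        round(availability(*striped, p_online), 4),
    )
```

Under these assumptions the 10-of-15 code is more available than two full copies when nodes are online 90% of the time, despite using less space, but it is far less available when nodes are online only half the time. So the answer depends heavily on how reliable the individual peers are.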

reply

1 point by sliverstorm 40 days ago | link

I'm thinking of it in terms of RAID.

You are talking about RAID5. However, RAID5 is useless if more than a few disks go offline at the same time.

RAID1/10 is most useful when there's a higher chance of multiple disks failing at a time, or when the odds of multiple disks failing in your RAID5, while low, are unacceptable.

Of course there are other things at work when you talk RAID0/5/10, but this is a large part of it.

reply

11 points by mkuhn 41 days ago | link

You should take a look at http://www.wuala.com

reply

2 points by dwiel 41 days ago | link

I've used wuala for linux and have found it quite rough around the edges.

  * not completely decentralized or open source.  If wuala goes out of business, your data may not be recoverable.
  * web interface is lacking (poor folder navigation/listing)
  * doesn't work without X on linux
  * even with X, not all features are available through the command line or API, though some are
  * the interface that is provided is clunky
  * the status messages leave me wondering where in the process an update is.  If a piece of software can't reliably tell me where it is in a process, I can't trust that the process is happening the way I expect.

reply

1 point by mkuhn 41 days ago | link

Yep, Wuala is not completely open source and it is a business. Your worry that they might go out of business soon is mitigated quite a bit by the fact that they have been acquired by LaCie [1] over a year ago.

I suppose Linux definitely isn't their main market and while the Web Interface is lacking at the moment they are working on an overhaul.

[1] http://eu.techcrunch.com//2009/03/19/wuala-merges-with-lacie...

reply

1 point by aik 41 days ago | link

I agree, though I believe they're working on it. The technology is very impressive if you look into it.

Concerning it not being completely decentralized - I consider it a plus since that fact ensures greater reliability.

reply

2 points by ajb 41 days ago | link

Sounds like tahoe: http://tahoe-lafs.org/trac/tahoe-lafs

reply

1 point by jodrellblank 41 days ago | link

I wonder about this too, but in offices where bandwidth is LAN speed and uptime predictable, and disks unlikely to be full of personal media and larger than necessary.

Although I have no good ideas where it would be beneficial apart from being fun.

reply

2 points by yason 41 days ago | link

Of all things that modern computers have plenty of, it's hard disks I've rarely seen not being almost full...

reply

7 points by j_baker 41 days ago | link

I'm curious to know more about the technical details about how this works. Like what protocols and technologies you're using. If that information isn't too sensitive that is. :-)

reply

12 points by weihan 41 days ago | link

We developed a lot of the protocols and technologies ourselves, and could talk about them for hours :) Let me know if you have a specific area you want me to discuss.

reply

2 points by j_baker 41 days ago | link

Not necessarily anything in particular. Just a big picture overview of how the technology works.

reply

19 points by weihan 41 days ago | link

In short, AeroFS is a decentralized data management system running on top of p2p overlay networking.

The overlay network layer presents to the data management layer a transport-agnostic view of the Internet, and addresses peers using network-independent identifiers. In this way, data management can talk to any peer regardless of network topologies and firewall restrictions, as if the world is flat :)

The data management layer controls data versioning and update propagation in a fully decentralized way. As I described in another comment, we use version-vector-like data structures to track versions and manage conflicts. We use modified epidemic algorithms (http://portal.acm.org/citation.cfm?id=41841) for fast update propagation. AeroFS distinguishes between peers and super peers. Super peers can help update propagation and peer communication in many ways.
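
For readers unfamiliar with epidemic algorithms, here is a minimal simulation of the textbook "push" variant, in which every peer that has an update forwards it to one randomly chosen peer per round. It is only a sketch of the general idea: AeroFS's modified algorithms are not described in this thread, and the function name and parameters are assumptions.

```python
import random


def push_gossip_rounds(peer_count, rng, max_rounds=1000):
    """Simulate push gossip; return the number of rounds until every peer has the update."""
    informed = {0}  # peer 0 originates the update
    rounds = 0
    while len(informed) < peer_count and rounds < max_rounds:
        # Every informed peer pushes the update to one peer chosen uniformly at random.
        targets = {rng.randrange(peer_count) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(push_gossip_rounds(64, random.Random(1)))
```

Because the informed set can at most double each round, reaching 64 peers takes at least 6 rounds; in practice the count stays logarithmic in the number of peers, which is why this family of algorithms spreads updates quickly without any central coordinator.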

reply

3 points by hxr 41 days ago | link

A lot of research went into building group communication tool kits like Spread and Ensemble. Today they are being used in server/cluster environments. I was playing with re-purposing one of these (JGroups) for my pet project. The toolkit implements the abstractions I need (A peer communication Channel, peer discovery etc.). It also provides various protocol stacks so I can use a different one for streaming movies vs. syncing files.

reply

7 points by endlessvoid94 41 days ago | link

I think the consumer market for this doesn't care how their files are backed up. They just want it to work.

reply

6 points by yurisagalov 41 days ago | link

RE: "it just works" - you're right of course, and we're definitely trying to make it as simple as possible.

There are some features we're going to implement down the road that can be done better with p2p solutions though (aggregated storage across devices, for example), so I hope you give us a chance! :)

reply

7 points by StavrosK 41 days ago | link

Yeah, if your thing means I can sync my data across computers without you seeing any of it, I'm never using Dropbox again.

reply

5 points by Goosey 41 days ago | link

On the other hand, I do. You don't have to be as big to be the biggest fish in a smaller pond. (Or as 'The Dip' puts it, to be the best in the world start by shrinking the world)

reply

2 points by aik 41 days ago | link

The business or enterprise markets are massive though.

reply

4 points by fname 41 days ago | link

I like the "Chat with the Founders" on the website... Is that something you built?

EDIT: Thanks guys.

reply

6 points by yurisagalov 41 days ago | link

We actually use olark.com for that! It's been super helpful today in responding to comments directly on the website

reply

4 points by prabodh 41 days ago | link

It is Olark http://www.olark.com/

reply

5 points by rlpb 41 days ago | link

So it's Dropbox but without the requirement for a central cloud component? Is there anything I've missed?

reply

10 points by weihan 41 days ago | link

Exactly. In addition, because cloud servers to us are merely a "super peer", AeroFS offers a superset of what a cloud-based solution can provide.

reply

2 points by Osiris 41 days ago | link

No storage limitations like Dropbox.

reply

7 points by slantyyz 41 days ago | link

This offering sounds like a hybrid of Crashplan and Dropbox.

Could be quite good for the use cases they specify.

reply

12 points by SnowLprd 41 days ago | link

Absolutely. This project has the potential to remedy a problem that, to date, I have not been able to solve.

I have computers in multiple geographically diverse locations and need a large amount of data (terabytes) to always appear in each location. Other requirements:

1. Direct sync between my devices, with no third-party cloud involved

2. Fast local sync when two devices are detected on the same subnet

3. Since individual files can be 20 GB in size, interrupted synchronizations should automatically resume when the connection is re-established (without having to start over from the beginning)

4. Encrypted transmission of data, but not encrypted on disk

5. When renaming or moving a file from one folder to another, the system should be smart enough to detect that there's no need to re-transmit that file (i.e., it just needs to rename it or move it to the new location on all other devices)

6. Ability to throttle upstream/downstream bandwidth on a per-device basis

Neither Dropbox nor CrashPlan -- nor any other tool -- has been able to meet all of these requirements.

In short, this is a very exciting and welcome development. I sincerely hope that this problem will soon be solved!

reply

8 points by yurisagalov 41 days ago | link

Just as a quick note: 1,2,4,5 are already done on our end. #3 and #6 are in our top priority list and should be done soon

If you want to chat more about your particular needs/use case, give me a shout at yuri@aerofs.com, I'd love to talk!

reply

1 point by codemechanic 41 days ago | link

Tonido already meets all these requirements. How is this new? http://www.tonido.com

reply

3 points by seancron 41 days ago | link

I see your company doc on "How To Earn HN Karma" has worked out well. Now if only you could turn karma into money...

Edit: This is a reference to the picture on their signup page.

reply

5 points by blocke 41 days ago | link

"Each AeroFS device has its own 1024bit RSA key pair, which is certified by us to be authentic."

That suspiciously reads like the AeroFS people get a copy of your key. If that's the case then it's only marginally more secure than DropBox. Hope I'm reading that wrong...

reply

10 points by yurisagalov 41 days ago | link

We don't get a copy of your private key (neither should anyone else, ever). We do get a copy of your public key, to certify it (we use OpenSSL's CA)

reply

5 points by blocke 41 days ago | link

So how do you "invite" someone? Swap public keys?

reply

13 points by weihan 41 days ago | link

We generate a temporary password for the user being invited and encode it in the invitation code sent to the user's email address. We use this temp pass to verify the user when he/she signs up and destroy the pass immediately after. During initial setup, the user's device generates its own public/private key pair and sends a CSR (certificate signing request) to us for certification.

reply

4 points by antgiant 41 days ago | link

Two questions

1) How does this system handle two devices behind separate NATs? (aka a work device and a home device.)

2) What is the conflict resolution protocol if a file is modified in two or more locations? (Newest wins, automatic duplication for manual resolution, etc.)

reply

6 points by weihan 41 days ago | link

1) We use ICE/STUN as well as relays for firewall penetration. 2) We use a modified version of version vectors (http://en.wikipedia.org/wiki/Version_vector) and accompanying algorithms to detect and resolve conflicts. In a decentralized system, conflict management boils down to managing causal relationships between distributed updates, and version vectors were invented just for that :)
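
As a rough illustration of how version vectors expose causality, here is a sketch of the textbook comparison and merge. AeroFS uses a modified variant that is not described here, so the function names and the dict-based representation (device name to update counter) are assumptions.

```python
def compare(a, b):
    """Classify version vector a relative to b: "equal", "before", "after", or "concurrent"."""
    devices = set(a) | set(b)
    a_le_b = all(a.get(d, 0) <= b.get(d, 0) for d in devices)
    b_le_a = all(b.get(d, 0) <= a.get(d, 0) for d in devices)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"  # b has seen everything a has, so b supersedes a
    if b_le_a:
        return "after"
    return "concurrent"  # neither has seen the other's update: a conflict


def merge(a, b):
    """Version vector recorded after the two histories have been reconciled."""
    return {d: max(a.get(d, 0), b.get(d, 0)) for d in set(a) | set(b)}


laptop = {"laptop": 2, "desktop": 0}   # edited twice on the laptop
desktop = {"laptop": 1, "desktop": 1}  # saw the first laptop edit, then edited locally
print(compare(laptop, desktop))  # concurrent
print(merge(laptop, desktop) == {"laptop": 2, "desktop": 1})  # True
```

A result of "before" or "after" means one copy can simply replace the other. Only "concurrent" signals a genuine conflict, and the vectors by themselves say nothing about how to resolve it.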

reply

4 points by antgiant 41 days ago | link

Thank you.

However, having looked at the Wikipedia page on version vectors, it appears that is a protocol for detecting conflicts. I was interested in how you resolve them.

A simple example is a zip file that I add file A to on one computer and later file B to on another computer. When I sync up, do I end up with a zip containing no new files, file A, file B, both files, or a corrupt zip file? (Does the answer change if the zip file is encrypted?)

reply

3 points by weihan 41 days ago | link

I see. There are two categories of conflicts to resolve: meta conflicts (like when you rename a file to "foo" on device A and meanwhile rename it to "bar" on B) and data conflicts (i.e. the example you gave).

We will formally describe meta conflict resolution in a separate post. Because resolution for data conflicts is very application specific, we will publish an API to allow application developers to write their own conflict resolvers. Meanwhile, we will try to provide resolvers for popular file types by default.

From the end user's view, in most cases conflicts are automatically resolved without being noticed. User intervention is required if automatic resolution fails or the user wants to manually merge.

reply

1 point by antgiant 41 days ago | link

Thank you. That's what I wondered.

reply

4 points by blocke 41 days ago | link

On another note, without yet seeing the software, I'd assume the Mac and Linux ports are both using FUSE. Is there a FUSE alternative for Windows yet?

reply

6 points by weihan 41 days ago | link

Yes. We use CBFS (http://www.eldos.com/cbfs/) for Windows.

reply

4 points by blocke 41 days ago | link

Ah, sadly nothing for a weekend hacker but nice to know it exists.

I'd guess anyone who spent time figuring out how to do Windows filesystems would want to be paid for the trauma.

Thanks for the pointer and good luck with the project, I've signed up for an invite. :)

reply

6 points by weihan 41 days ago | link

There are a few (Google "fuse windows"). In particular, Dokan (http://dokan-dev.net/en/) seems to be a good one. The project is quite active recently.

reply

3 points by fragmede 41 days ago | link

Just wanted to point out that Dokan has its own version of SSHFS - http://dokan-dev.net/en/2010/01/18/open-source-dokan-sshfs/

reply

1 point by jodrellblank 40 days ago | link

There's one at http://www.eterlogic.com which at a glance looks free for non commercial use - Virtual Drive SDK.

reply

2 points by yason 41 days ago | link

Can I share stuff with my friends? Suppose they already host my backups and I host theirs, can I easily give them access to some of my files?

reply

1 point by yurisagalov 41 days ago | link

You can definitely share stuff with friends. Strictly speaking, right now your friends don't "host your backup" _unless_ you share with them.

reply

2 points by mcritz 41 days ago | link

Can you go into a little more detail about sharing? Are my files kept on my devices when shared or does it implicitly mean shared files are read/write accessible by others? If so, is there a way of managing permissions?

reply

2 points by weihan 41 days ago | link

We have implemented full-fledged access control including file ownership, read/write permissions on data/metadata, list/add/remove permissions on directories, etc. But we disabled it in the interface to keep the user experience as simple as possible.

Later on we may enable them based on use cases and user feedback. Our API will include ACL management as well. Currently files are read/write accessible once shared.

reply

3 points by tsmith 41 days ago | link

Way cool guys, and way to represent the Toronto startup scene!

BTW, loved the reference to HN karma in the screenshot!

reply

2 points by sabat 41 days ago | link

Anyone have invites to give out?

reply

3 points by santimt 41 days ago | link

You can sign up on the web page to get (I hope) an invite

reply

4 points by sabat 41 days ago | link

I'd already done that, but I'm impatient. :-)

reply

3 points by santimt 41 days ago | link

Does it use P2P transfer, or does the owner of a file have to upload as many copies of the file as there are peers syncing?

reply

4 points by weihan 41 days ago | link

It uses p2p syncing: Any device that was invited and has sufficient privileges on a file can serve the file to peers.

reply

2 points by bobf 41 days ago | link

File syncing that includes mobile devices seems to be becoming increasingly important as their storage space grows. I'm excited about AeroFS and am looking forward to seeing more posts about the technical aspects.

reply

1 point by martyhu 41 days ago | link

What about performance? I assume that since many home connections these days are asymmetric DSL, downloads from hosts may be slow if only a few hosts are involved and they upload over ADSL.

reply

0 points by hassenben 40 days ago | link

Could someone clarify the difference between aerofs and rsync? What can aerofs do that rsync doesn't?

reply

1 point by c00p3r 41 days ago | link

a wrapper on top of git, I guess? ^_^

reply

2 points by yurisagalov 41 days ago | link

Actually, no ;) We wrote most of the underlying magic ourselves

reply



