

ClearSkies – open-source file syncing without cloud - urza
https://github.com/jewel/clearskies

======
nl
Works well for me. For those who want to try it, this is what I did.

Create a Dockerfile:

    
    
      # Ubuntu 12.04 + Python + Git
      FROM nlothian/python-git
    
      # Ruby
      RUN apt-get -y install libgnutls26 ruby1.9.1
      RUN apt-get -y install ruby1.9.1-dev
    
      RUN gem install rb-inotify ffi
    
      RUN git clone https://github.com/jewel/clearskies
    

Build:

    
    
      sudo docker build -t clearskies .
    

Run:

    
    
      sudo docker run -i -t clearskies /bin/bash
    

Then:

    
    
      mkdir /testdir
      echo 'testing' > /testdir/afile
      cd /clearskies
      ./clearskies start
      ./clearskies share /testdir
    

That will print something out in the form:

    
    
      clearskies:SYNCXXXXXXXXXXXXXXXXXXXXXXXX
    

Note this, then start another clearskies docker container.

In that one:

    
    
      mkdir /sharedir
      cd /clearskies
      ./clearskies start
      ./clearskies attach clearskies:SYNCXXXXXXXXXXXXXXXXXXXXXXXX /sharedir
    

Wait a few seconds, and your file should appear.
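
If you'd rather not watch the directory by hand, here is a minimal polling sketch. It is not part of ClearSkies; `wait_for_file` is a hypothetical helper, and the path `/sharedir/afile` is just the one from the steps above.

```python
import os
import time


def wait_for_file(path, timeout=30.0, interval=0.5):
    """Poll until `path` exists or `timeout` seconds pass; return True if it appeared."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example call inside the second container (path taken from the steps above):
# wait_for_file("/sharedir/afile")
```

It only checks that the file exists, not that its contents have finished transferring.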

~~~
nl
Hmm.

There do seem to be problems with some firewall scenarios (or at least I
presume that is what it is..)

When I set up a node at home and one on an Azure box I see them discover each
other, but no files seem to get copied.

~~~
jewel
If you'll open an issue on GitHub and attach the logs from both peers, I can
look at it.

------
maho
I like that "untrusted peers" are defined in the protocol, but unfortunately
it's only an optional addendum [1]. To me, untrusted peering is the most
important feature and I hope that not being defined in the main protocol does
not mean it will be neglected in the implementation.

BitTorrent Sync only supports untrusted peers via API [2], and the only other
open-source BitTorrent Sync alternative that I am aware of [3] left it out
completely.

[1]
[https://github.com/jewel/clearskies/blob/master/protocol/unt...](https://github.com/jewel/clearskies/blob/master/protocol/untrusted.md)

[2]
[http://www.bittorrent.com/intl/de/sync/developers/api](http://www.bittorrent.com/intl/de/sync/developers/api)

[3]
[https://github.com/calmh/syncthing/wiki](https://github.com/calmh/syncthing/wiki)

~~~
atmosx
What exactly is the function of untrusted peers? Speed up the sync?

~~~
jffry
To use a friend's machine as an off-site secondary backup. They can store
encrypted data, but cannot view or push changes to the data. I'm guessing.

~~~
toomuchtodo
If it's untrusted, how can it be a canonical reference?

~~~
scott_karana
Cryptographic hashes and signatures, I presume?
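
To make that guess concrete, here is a rough sketch of how a trusted peer could check data that came back from an untrusted one. It is not taken from the ClearSkies spec: it uses HMAC-SHA256 from the Python standard library as a stand-in for a real signature scheme, and `make_tag` and `verify_blob` are hypothetical names.

```python
import hashlib
import hmac


def make_tag(key: bytes, blob: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the blob with a key only trusted peers hold."""
    return hmac.new(key, blob, hashlib.sha256).digest()


def verify_blob(key: bytes, blob: bytes, tag: bytes) -> bool:
    """Return True only if the blob still matches the tag made by a trusted peer."""
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(make_tag(key, blob), tag)
```

The untrusted peer never sees the key, so it can store and return the (encrypted) blob and its tag, but it cannot alter the blob and produce a tag that still verifies.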

------
rahimnathwani
How do the peers find each other? The installation instructions suggest that I
just need to run 'clearskies share ...' on one computer, and 'clearskies
attach ...' on the second computer. How will they find each other (on a LAN
or, especially, if they're each behind different NAT routers)?

I see that there is some 'tracker' code in the repo:
[https://github.com/jewel/clearskies/tree/master/tracker](https://github.com/jewel/clearskies/tree/master/tracker)

Must I run that somewhere that both computers can access, and tell them its
address?

~~~
e12e
Looks like there are various modes of discovery, see under "Peer discovery":

[https://github.com/jewel/clearskies/blob/master/protocol/cor...](https://github.com/jewel/clearskies/blob/master/protocol/core.md)

I can't find any references to DHT in the code (but the protocol lists that as
an extension).

For LAN UDP broadcast:

[https://github.com/jewel/clearskies/blob/master/lib/broadcas...](https://github.com/jewel/clearskies/blob/master/lib/broadcaster.rb)

For tracker client (apparently gets a list of tracker URIs from the config):

[https://github.com/jewel/clearskies/blob/master/lib/tracker_...](https://github.com/jewel/clearskies/blob/master/lib/tracker_client.rb)

[https://github.com/jewel/clearskies/blob/master/lib/conf.rb](https://github.com/jewel/clearskies/blob/master/lib/conf.rb)
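
For anyone unfamiliar with how LAN discovery by UDP broadcast works in general, a minimal sketch follows. It does not reproduce the ClearSkies wire format: the port number, the JSON field names, and the function names are all assumptions made for illustration.

```python
import json
import socket

DISCOVERY_PORT = 50000  # hypothetical port, not taken from the ClearSkies spec


def build_announcement(share_id: str, listen_port: int) -> bytes:
    """Encode a discovery message as JSON (field names are illustrative)."""
    return json.dumps({"share": share_id, "port": listen_port}).encode("utf-8")


def parse_announcement(packet: bytes) -> dict:
    """Decode a received discovery message back into a dict."""
    return json.loads(packet.decode("utf-8"))


def broadcast(packet: bytes, port: int = DISCOVERY_PORT) -> None:
    """Send one UDP datagram to the local broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, ("255.255.255.255", port))
```

Broadcast packets do not cross routers, which is why something like a tracker is still needed for peers behind different NATs.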

------
jedc
Another way to do file syncing without a particular cloud provider is
Camlistore: [http://camlistore.org/](http://camlistore.org/)

Brad Fitzpatrick is one of the creators (LiveJournal, memcache, etc.) and it's
rapidly getting better and better. That said, it can still be a bit tricky to
get everything set up.

~~~
MoosePlissken
Camlistore looks really cool, thanks for posting it. I've been thinking
something like this should exist for some time now.

------
skimmas
Finally... :) I've been waiting for a project like this to appear for ages...
thank you, OSS developer angels.

------
PeterSmit
This is excellent. I have been waiting for an open source btsync clone.

~~~
thejosh
git-annex? :)

~~~
rquirk
Have you been able to use it? It is less user friendly than the proprietary
solutions. There's tons of documentation, but nothing that says "here's how to
use it like BitTorrent Sync". In fact IIRC the docs specifically say somewhere
"you cannot use this like Dropbox".

Just sharing stuff between 2 PCs was very difficult (or I couldn't figure it
out) and the annex program sat at 100% CPU most of the time doing nothing.
Being written in Haskell is a turn off too. If I have to fix something, I want
C, python, etc, not this crazy write-only language :-)

~~~
icebraining
Have you tried the Assistant and its web UI? It comes bundled with git-annex:
[http://git-annex.branchable.com/assistant/](http://git-annex.branchable.com/assistant/)

~~~
rquirk
I did. Maybe I should give it another shot.

I recall my problem was trying to understand how to sync files I already had
in other directories. Things certainly were not sync'd automatically. In the
walkthrough it says you need to git-add files and then git-commit them:
[http://git-annex.branchable.com/walkthrough/#index3h2](http://git-annex.branchable.com/walkthrough/#index3h2)

The ~/annex/ directory ended up with symlinks to git objects and the files
themselves are nowhere to be found. I didn't know where things were/weren't
sync'd already. Nothing sync'd across and the assistant just said "all done"
or something similar. At one point I remember it just containing a bunch of
broken symlinks. Good job I was just testing it out, imagine if it replaced my
actual files with broken symlinks.

What all this boils down to is that git-annex is not as fool-proof as the
proprietary solutions claim to be and something equally Free as g-a, but less
complicated, would be great.

------
rahimnathwani
Dev discussion group here:
[https://groups.google.com/forum/#!forum/clearskies-dev](https://groups.google.com/forum/#!forum/clearskies-dev)

------
beagle3
The JSON packets are limited to 16MB - if this is supposed to contain a
manifest of a deep-directory-with-lots-of-files, that might not be enough. I
regularly rsync (on a lan) trees of a million files and 8 levels deep. The
manifest for such a configuration will not fit within 16MB.
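
A quick back-of-the-envelope check of that claim, as a sketch. The entry layout below is made up purely for sizing (the field names are not from the ClearSkies spec); the point is only that a SHA-256 hex digest alone is 64 bytes, while 16MB spread over a million files leaves under 17 bytes each.

```python
import json

LIMIT_BYTES = 16 * 1024 * 1024  # the 16MB packet limit mentioned above
FILE_COUNT = 1_000_000

# Hypothetical manifest entry: field names and values are made up for sizing only.
entry = {
    "path": "a/b/c/d/e/f/g/h/file-0000001.dat",
    "sha256": "0" * 64,
    "size": 123456,
    "mtime": 1390000000,
}

entry_bytes = len(json.dumps(entry).encode("utf-8"))
budget_per_entry = LIMIT_BYTES / FILE_COUNT  # about 16.8 bytes per file
total_bytes = entry_bytes * FILE_COUNT

print(f"budget per entry: {budget_per_entry:.1f} bytes")
print(f"one entry: {entry_bytes} bytes, whole manifest: {total_bytes / 2**20:.0f} MiB")
```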

I see there's an "rdiff manifest" extension, which is cool for syncing later
changes - but the initial manifest will have to be transferred some other way.

~~~
jewel
This is handled by the protocol; see
[https://github.com/jewel/clearskies/blob/master/protocol/cor...](https://github.com/jewel/clearskies/blob/master/protocol/core.md#large-manifests).

As an aside, I am currently adding a more sophisticated manifest exchange in
the "protocol_cleanup" branch that will remove the need to keep sending the
entire manifest (other than on the first connection).
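
For readers wondering what splitting a manifest to fit under a packet limit can look like in general, here is a minimal sketch. It is not the mechanism the spec describes (see the link above for that); `chunk_entries` is a hypothetical helper that simply groups entries so each JSON-encoded group stays under a byte limit.

```python
import json


def chunk_entries(entries, limit_bytes):
    """Group entries into lists whose JSON encoding stays within limit_bytes."""
    chunks, current = [], []
    for entry in entries:
        # An entry that cannot fit even on its own can never be sent.
        if len(json.dumps([entry]).encode("utf-8")) > limit_bytes:
            raise ValueError("single entry exceeds the packet limit")
        candidate = current + [entry]
        # Re-encoding the candidate each time is simple but not efficient.
        if len(json.dumps(candidate).encode("utf-8")) > limit_bytes:
            chunks.append(current)
            current = [entry]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The receiver would concatenate the groups in order to rebuild the full list.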

------
polskibus
I use nas4free + samba/CIFS daily, works like a charm. On each client (mobile,
PC) I have an application that performs regular backups. In terms of
alternatives, there's also OwnCloud that can be deployed in-house.

How does ClearSkies compare to existing private cloud solutions?

~~~
jewel
ClearSkies doesn't require a central server like OwnCloud, it's peer-to-peer.

------
slavko
Incredible! The OP is using SQLite, particularly, Fossil,
[http://fossil-scm.org/](http://fossil-scm.org/). It upsets me that this open source
venture does not give credit where it is due. If I am wrong, please correct me.

~~~
jewel
I'm confused. The protocol spec doesn't mention SQLite. The ruby proof-of-
concept doesn't use SQLite. I've not seen fossil before that I can recollect
(I am the author of the clearskies protocol spec).

------
jmspring
Granted, I'm on mobile, but how does this differ from BTSync for two
computers? There needs to be some agent proxying discovery if machines aren't
on the same local network.

~~~
p4bl0
I think the goal is to have a free (as in freedom) BTSync.

------
tjaerv
Potentially interesting, but my interest waned considerably when I saw it's
GPLv3.

~~~
welly
Really? Because of the license, you're not interested in this project?

I find that very, very bizarre. Can you explain your thinking?

~~~
euank
Not him, but I can see a few good reasons.

GPL is rather limiting in how you can use the code. A large company (e.g.
Apple) won't let something GPL be used heavily internally if they can help it
since they won't be able to apply patches or modify it without releasing these
changes ... this obligation adds significant legal burden and, furthermore,
releasing the changes could reveal private details about the company's
internals.

I don't think that the commenter's reason is a good one, but I can understand
the viewpoint; GPLv3 is quite limiting for some uses.

~~~
chrj
As long as it's only used and distributed internally, I don't believe the GPL
has a problem with them modifying it without disclosing their changes. That's
my understanding.

------
atmosx
Why does the author want to port the daemon from ruby to C++? Is it speed?

~~~
jewel
Speed isn't actually too much of an issue with the ruby client, since all the
CPU time is spent in the GnuTLS and Digest::SHA256 code, which is already
written in C.
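
The same pattern shows up in other scripting languages. As an illustration only (this is Python, not the Ruby client's code, and `sha256_file` is a hypothetical helper), hashing a file in chunks keeps nearly all of the work inside the C-backed digest implementation:

```python
import hashlib


def sha256_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```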

The problem is portability to android and iOS. We additionally want to make
the core easy to embed in other applications.

------
sarreph
I love the name for this. :)

