

Show HN: Pullbox – A dead-simple dropbox alternative using Git - prashanthellina
https://github.com/prashanthellina/pullbox

======
bantunes
Git is no good for binaries, and SparkleShare is dead. The proper alternative
is Syncthing ([https://syncthing.net/](https://syncthing.net/)). I have it on
my home and work laptops, a NAS, and a VPS, and it just works brilliantly.

~~~
dantillberg
Awww, dammit. Why did I never hear of this before? I just spent like 400 hours
building something equivalent but far less finished.

Syncthing has almost 10,000 stars, so it's not like others hadn't discovered
it yet. Is there a way for me to browse all the popular repos on GitHub so I
can see what other amazing things I've been missing? There's [0], but that
just shows me an utterly useless view of all the repos _I've_ starred.

[0]: [https://github.com/stars?direction=desc&sort=stars](https://github.com/stars?direction=desc&sort=stars)

~~~
Silvus
Try the trending page:
[https://github.com/trending?since=monthly](https://github.com/trending?since=monthly)

~~~
dantillberg
Thanks. Yeah, I guess that's not too bad. I'd overlooked it before because
"Trending" is a section I avoid (and/or install extensions to hide) on other
social sites.
[https://github.com/trending?l=go&since=monthly](https://github.com/trending?l=go&since=monthly)
shows syncthing. I'll have to go explore a bunch of the other listings. :)

------
dantillberg
Your story is very familiar to me, including trying and failing to use Dropbox
due to terrible symlink-handling!

I built something similar recently [0] (also a bit like Sparkleshare, which I
_just_ learned about from another comment in this thread...), but for me I
needed to sync all of my source code between multiple machines. You can't do
that with stock git, but I was able to s/git/gut/gi to make "gut" (which uses
.gut folders to store its own state and will happily gut-add files inside .git
subfolders), and now I have a "~/work" folder on my laptop, on my desktop, and
on a dev box on AWS, with all of my source code synced in real-time.

I also wrote much the same thing in Node, and then in Python, before rewriting
(again) in Go. You can peruse my Python code at [1] if you're curious. I used
plumbum (awesome library) for executing local/remote commands (it makes things
a bit simpler than using subprocess and calling ssh with shell commands). Also,
you can use inotifywait with the "--monitor" option, which keeps it running
and reporting changes as they happen; you then have to parse out what changed
and make sure you handle state correctly, but that way you don't miss changes
made while inotifywait is starting back up again.
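The "--monitor" parsing loop described above might look roughly like this
(an untested sketch: the watched directory, event set, and the simulated
change are all placeholders, and it bails out early where inotify-tools
isn't installed):

```shell
# One long-lived inotifywait whose output we parse, so no changes are
# missed between restarts. --format '%e %w%f' prints "EVENTS /path/file".
mkdir -p /tmp/watchdir
command -v inotifywait >/dev/null || exit 0  # skip if inotify-tools is absent
( sleep 0.5; touch /tmp/watchdir/hello ) &   # simulate a change arriving
timeout 3 inotifywait --monitor --quiet --format '%e %w%f' \
    --event create --event close_write /tmp/watchdir |
while read -r events path; do
    # A real syncer would enqueue a commit/push for $path here.
    echo "changed: $path ($events)"
done
```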

[0]: [https://github.com/tillberg/gut](https://github.com/tillberg/gut)

[1]: [https://github.com/tillberg/gut/tree/0ce233d44f7f55448c1541147c01136f0b28fa5f/gut](https://github.com/tillberg/gut/tree/0ce233d44f7f55448c1541147c01136f0b28fa5f/gut)

~~~
rakoo
> What would happen if you took git, sed, rsync, and inotify, and you mushed
> and kneaded them together until smooth?

You'd get unison
([https://www.cis.upenn.edu/~bcpierce/unison/](https://www.cis.upenn.edu/~bcpierce/unison/))
:)

------
jamiesonbecker
Alternatives:

Git Annex[1]

Very robust and well-written (Haskell), and extremely fast. An extension
(daemon, client, and web UI) to git. GitLab uses it to provide large-file
support (similar to GitHub's LFS)[2]. The author is a Debian maintainer (and
obviously a Haskell dev).

Syncthing[3]

Fast and professional Go synchronizer with its own protocol[4]. Usually
compared to BTSync.

1\. [https://git-annex.branchable.com/videos/](https://git-annex.branchable.com/videos/)

2\. [https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/](https://about.gitlab.com/2015/02/17/gitlab-annex-solves-the-problem-of-versioning-large-binaries-with-git/)

3\. [https://syncthing.net/](https://syncthing.net/)

4\.
[https://github.com/syncthing/specs/blob/master/BEPv1.md](https://github.com/syncthing/specs/blob/master/BEPv1.md)

~~~
kevin
Linking to alternatives is not exactly constructive criticism for someone
looking for feedback on a Show HN. It feels dismissive and out of the spirit
of what's trying to be accomplished here. Without any commentary or questions,
it's almost like you're saying, "Why did you bother?"

Now, I very much doubt that was your intention, but please have some empathy
for the person taking a chance when sharing something on Show HN. It's not
easy and I'd hate for someone to give up on something just because someone
linked to other things that already existed in the world. We wouldn't have
Dropbox or Google under those conditions.

At the very least, ask the creator how his project differs from these other
projects, or whether he's seen these alternatives, or why he decided to build
his own. A little bit of effort on your end shows that you recognize his
effort and aren't tone-deaf to what's going on here.

~~~
jamiesonbecker
Point taken, and you're right. I'm extremely sensitive about my work too. My
deepest apologies to the OP!

~~~
prashanthellina
No apologies required! Thanks for taking the time to post the links. I
discovered "syncthing" via this thread and while it sucks that I did not come
across it before, I am excited to try it out.

~~~
jamiesonbecker
You're a champ mate. Try out git annex too. The presentation might put you off
a bit but it's really powerful.

------
kevin
Despite all the competitors everyone is linking to here, I bet you learned a
ton while building this. What was most surprising to you? What was the
trickiest part?

~~~
prashanthellina
It was a ton of fun writing this code on a Saturday afternoon. I was trying to
avoid rolling my own syncing solution, but I guess my Google-fu failed me :)

My goal was to write a very thin wrapper around the workflow I would follow if
I had to sync the changes myself manually. The tricky part was in figuring out
how to inform multiple client machines when the backup server noticed a change
in the file system. I wanted to avoid writing a server-side component that I
had to install on the server and maintain.

When I found that I could use a combination of "ssh" and "inotifywait" (run
inotifywait on the server via ssh from the client and listen for changes), I
was pleasantly surprised that this even worked! I see this aspect of my
implementation as the equivalent of the AJAX long-polling that some
implementations used for chat-like communication in the browser: when a
modification happens on the server filesystem, the "inotifywait" command
quits, thereby unblocking the "ssh" command, upon which I do a "git pull".

Because of the above, I was able to keep my implementation really simple: the
whole functionality was achieved in under 300 lines of code.
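The long-poll shape described above can be sketched in miniature. In this
untested sketch, a read on a local FIFO stands in for the blocking
"ssh ... inotifywait" call, since the mechanics are the same: block on a
command, react when it unblocks (all names here are illustrative):

```shell
# Stand-in for the pullbox long-poll: the "client" blocks on a read,
# the "server" eventually signals a change, and the client reacts
# (where pullbox would run `git pull`).
fifo=$(mktemp -u); mkfifo "$fifo"
( sleep 0.2; echo CHANGED > "$fifo" ) &  # server-side change notification
read -r event < "$fifo"                  # equivalent of `ssh server inotifywait ...`
echo "got $event; a real client would now run: git pull"
rm -f "$fifo"
```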

------
jamiesonbecker
Really great code - clean and extensible! I also really like inotify for this
sort of code and actually use it myself to kick off my build scripts
automatically when changes are detected. BTW, have you run into issues with
inotifywait's limit on the number of watched files/dirs?

~~~
prashanthellina
Thank you! I tried hard to keep the code very simple, sacrificing some
20%-scenario requirements (such as automatic conflict resolution).

I am using this to synchronize my Markdown-based notes files across machines.
There are under a hundred files now, so I haven't hit any issues in that
department yet.

However, there is a gradual memory leak that persists even when I kill and
restart the process. I have observed this only on a KVM-based Linux guest; I'm
not sure if it's because of the inotify-based listening, but I'm going to have
to dig deeper to find out.

~~~
jamiesonbecker
Forking might be easier to debug... though I may have missed some of the
functionality in your script. Something like (totally untested):

    
    (while true
        do
            inotifywait \
                --exclude '.*\.sw.' \
                --recursive \
                --event close_write \
                --event delete \
                --event moved_to \
                --event moved_from \
                dir1/
            git push
            sleep 0.1
        done
    ) >/var/log/push_log.log 2>&1 &
    
    (while true
        do
            # Block until the remote end sees a change, then pull it down.
            ssh remotebox inotifywait \
                --exclude '.*\.sw.' \
                --recursive \
                --event close_write \
                --event delete \
                --event moved_to \
                --event moved_from \
                dir1/
            git pull
            sleep 0.1
        done
    ) >/var/log/pull_log.log 2>&1 &
    
    echo "Kicked off pull and push processes."
    wait

~~~
prashanthellina
That is gold! Implementing the functionality in shell itself was what I
considered first, but I needed two things:

1\. Make it work on OS X

2\. Use a lock file to prevent multiple instances from running when I put this
in crontab

I found watchman[0]. Got to see if I can use it in place of "inotifywait".

[0]:
[https://facebook.github.io/watchman/docs/install.html](https://facebook.github.io/watchman/docs/install.html)
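For the lock-file point, the usual trick on Linux is an flock-based guard; a
minimal untested sketch (the lock path is arbitrary, and flock comes from
util-linux, so the OS X side would need a different mechanism):

```shell
# Take an exclusive, non-blocking lock on fd 9 so a crontab entry
# can't start a second copy while one is already running.
exec 9>/tmp/pullbox.lock
if ! flock --nonblock 9; then
    echo "another instance is already running; exiting" >&2
    exit 0
fi
echo "lock acquired; safe to start syncing"
```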

~~~
jamiesonbecker
Yeah that should work!

------
tckr
Did you have a look at SparkleShare?
[https://news.ycombinator.com/item?id=1424299](https://news.ycombinator.com/item?id=1424299)

~~~
prashanthellina
I tried using Sparkleshare. It was buggy and kept crashing after being
unresponsive. When I checked the repo, I found that there wasn't much
activity.

------
orahlu
One drawback is that deleted files take up disk space permanently, since they
remain in the repository's history.

------
LukeB_UK
How does it deal with conflicts?

~~~
prashanthellina
It is based on Git, so it uses the same mechanism. When a merge cannot be done
because of a conflict, you can resolve it using the usual Git workflow.

~~~
dantillberg
It's also possible to specify other conflict resolution options, such as `git
merge --strategy=recursive --strategy-option=theirs` [0], which avoids the
need for manual conflict resolution (and it's "worked for me"). Discarded
hunks are still available in the repo history, in case you need to restore a
file/hunk that was discarded in this manner.

[0]: [http://git-scm.com/docs/git-merge](http://git-scm.com/docs/git-merge)
(be sure to look for the "ours" option for the "recursive" strategy, not the
"ours" _strategy_ )

~~~
prashanthellina
Thanks for the tip. I wanted to keep my implementation extremely bare-bones
and simple, so I chose to leave out conflict resolution (which is itself a
rare event in my use case, as I usually only make modifications on one synced
machine at a time).

