Git-annex assistant: Like DropBox, but with your own cloud (kickstarter.com)
174 points by urza on Aug 15, 2012 | 51 comments

Joey blogs his progress on git-annex assistant in detail at http://git-annex.branchable.com/design/assistant/blog/

Here's a post from his personal blog that hints at how he can afford to work on this project for a year on only $20,000- http://joeyh.name/blog/entry/notes_for_a_caretaker/

I love it. Back in 1999-2001 I used to live in a shack on top of a mountain in Boone, NC which is less than an hour from Bristol.

Our network was driven by dual US Robotics courier modems bridged until the local phone company gave us DSL (we were located close to the switching facility).

There were 3-4 of us devs living there. We cut our own wood, got our water from a natural spring, drank lots of coffee and local beer, and coded 24/7.

It's there that I started to really grow as a dev because there was little distraction and the enviro was very inspiring.

My goal is to get back to the mountain (move my family) and focus on writing good code.

That sounds wild, you should blog about this if you haven't, I'm sure many here would love reading more :)

Thanks guys - I think I will blog about it - that'd be cool.

You sir, were living a dream!

I'm coding more than ever - hence why I want to move back.

My family just visited the 'cabin' last month while vacationing in NC. Since then my wife and I are dead set on returning to the mountain.

For me - this life is crazy busy. Constant 'suits' wanting new things built and never fast enough - and never able to understand the amount of work it takes to make 'that magic button'. Seems appetites are insatiable. That's fine as it's job security - work is work after all.

However, it seems that living in a tranquil environment helps run this marathon.

That's awesome. I was doing similar 1.5 hours away at the time, and it can be hard to find local tech folks; wish I'd known about you. :)

So apparently he lives in the woods in a solar powered shack. I guess that would keep costs down.

http://joeyh.name/blog/entry/solar_day_2/ http://joeyh.name/blog/entry/solar_year/

A solar powered 3-bedroom split-level house, but yeah. ;)

My favorite part of this is the slow Internet connection, which tells me that git annex will be optimized for poor connections. :)

Similar to athletes training in mountainous regions

Check once a day (especially on windy or stormy days) that the fridge is lit, by touching the black pipe sticking above it on the back. That's its chimney.


link for people using real computers. :) http://en.wikipedia.org/wiki/Absorption_refrigerator

Very cool, but I suspect that making it user-friendly and slick will end up being much more difficult than implementing the actual functionality was.

Edit: I have another question/concern, does the implementation in Haskell mean that the end product will have a runtime dependency on GHC or the Haskell Platform?

No, ghc produces statically linked binaries by default (-dynamic if you don't like that).

A quick check of the git-annex package in Ubuntu shows that you're right, no runtime Haskell dependency. I can only assume that the same will be true for the user-friendly "assistant". Thanks!

This is true, but then those static binaries tend to be very big (50+ mb).

50M!? 5 would be more likely, assuming stripping. As opposed to 5k with dyn libs.

I'm up to 15 mb :/

Yeah, I am concerned by the plan to create a web front-end for managing files in one month, it seems... optimistic at best.

The web frontend is functional.. took 2 weeks. Yesod rules.

(Still many weeks more polish and additional functionality, but kickstarter funded me for a whole year!)

I would say never under-estimate the amount of time polish takes - that last 10% of tweaking tiny details always takes a long time (or as the oft-repeated saying goes, the first 90% of work takes the first 90% of time, the last 10% of work takes the other 90% of time).

Good to hear progress is promising though :)

This seems quite similar to SparkleShare: http://sparkleshare.org/

Edit: Oops, SparkleShare is mentioned on the page. Looks like this will work better with larger files. He does mention that SparkleShare is a GUI and not 'just a folder', although in my experience it is 'just a folder', like Dropbox.

This sounds similar in practice (not implementation) to AeroFS [1] which is a peer to peer syncing service. Just make one of your peers something cloud-based and reliable.

Very interesting space!

[1] https://www.aerofs.com/

I'd love to hear Joey's answer on this question: why Haskell?

He had said previously:

Joeyh: "One (git-annex) is a large-ish, serious work, and I have been very pleased with how haskell has made it better, even though there was a learning curve (took me two weeks to write the first prototype, which I could have probably dashed off in perl in two days), and even though I have occasionally been blocked by the type system or something and had to do more work.

One concrete thing I've noticed is that this is the only program where I have listed every single bug I fixed in the changelog -- because there have been so few, it's really a notable change to fix one!"

Joey about the reusability of code: http://markmail.org/thread/nurcqm4yotgkmhbr

rsync.net will fully support Git-annex (it was brought to our attention a month or so ago by our friend Jason Scott).

We're excited about this project.

At first glance, OwnCloud seems to be similar:


(That said, git-annex assistant looks very cool.)

Yes, they both look good. git-annex has all the advantages of git but lacks Windows support, whereas ownCloud has better multi-OS support: thanks to Qt it runs on Win/Mac/Linux, but it uses csync as its sync algorithm.

git-annex doesn't currently support Windows, but changing that is on the roadmap provided on Kickstarter.

This is similar to the features Opera launched in 2009


I didn't look closely at the kickstarter page but I use gitdocs[1], which basically works like Dropbox but syncs to a git repository. It works ok for my (light) usage.

[1] https://github.com/bazaarlabs/gitdocs

[edit] It looks like git-annex solves the large binary blob problem with git (which I don't think gitdocs does) so maybe they could be integrated?

Nothing is ever a new idea, it's all about implementation and marketing. It will be interesting to see how git-annex assistant compares to the other products linked here when it's finished.

This seems to do what Sparkleshare does, except Sparkleshare uses Github - http://sparkleshare.org/

i'm not sure if git is the right thing to choose to version big binary files.

The project is based on git-annex, which is an extension that treats big binary files differently. Namely, it doesn't check in the file contents, so you don't get full-file versioning. You can find out more at http://git-annex.branchable.com/.

Technically you can use the SHA backend to git-annex, so the actual file contents can be tracked by git, giving you "full file versioning". It's just not checked into git.
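To make the idea above concrete, here is a minimal sketch of content-addressed tracking: version control records only a small key derived from a hash of the file, while the bulky content lives in a separate store. The class name `AnnexSketch`, the method names, and the key format are assumptions made for illustration; this is not git-annex's actual code or on-disk layout.

```python
import hashlib


def make_key(content: bytes) -> str:
    """Derive a key from the content (hypothetical format: SHA256-s<size>--<hex digest>)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"SHA256-s{len(content)}--{digest}"


class AnnexSketch:
    """Toy model: small pointers are 'tracked', large contents are stored elsewhere."""

    def __init__(self):
        self.tracked = {}  # path -> key: the small pointer version control would record
        self.store = {}    # key -> bytes: the content itself, kept outside version control

    def add(self, path: str, content: bytes) -> str:
        key = make_key(content)
        self.store[key] = content
        self.tracked[path] = key
        return key

    def read(self, path: str) -> bytes:
        # Follow the pointer to the content; raises KeyError if the content is not present.
        return self.store[self.tracked[path]]
```

Because the key changes whenever the content changes, a history of keys is enough to identify every version of a file, even though the versions themselves are never checked in.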

Additionally there is a bup backend target for git-annex. Bup is targeted as an rsync-like backup service that can do incremental backups and ought to work OK with large binary files. https://github.com/apenwarr/bup/

It was through Bup that I originally discovered git-annex.

hm ok, looks interesting, i'll have to read more about its internals, thanks.

git-annex allows managing files with git, without checking the file contents into git. While that may seem paradoxical, it is useful when dealing with files larger than git can currently easily handle, whether due to limitations in memory, time, or disk space.

-- http://git-annex.branchable.com/

ok read, what's the advantage of using this over rsync? it's git + haskell + rsync, loads of dependencies, to make git do something that it's not designed to do. or am i missing something crucial?

If you're using git for revision control, it lets you "version" files without having all the historical versions locally.

Example: You check in bigencryptedfile.big, which is 100Mb. Then you modify it, and check it in again. Repeat 8 more times.

In git with and without git-annex, you can check out the repository at any time and end up with the local files from that checkout.

In normal git, your local repository is now a gigabyte (the encryption in this hypothetical file prevents git from being able to delta-compress; in reality git would likely be able to compress it somewhat, but it still may be hundreds of megabytes).

With git-annex, all the previous copies are stored on the SERVER, but not in git itself. Even if you don't care about local hard drive space since hard drives are cheap, consider that if I then clone your repository, I would only need to download 100Mb instead of 1Gb. The downside, of course, is that you need to be connected to the server to get historical versions of a particular file.
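The arithmetic behind this example can be written out as a quick sketch. It assumes the worst case described above, where nothing delta-compresses and every version is stored whole; the function names are hypothetical.

```python
def full_history_mb(file_mb: int, versions: int) -> int:
    """Size of a clone that carries every version in full (no delta compression)."""
    return file_mb * versions


def latest_only_mb(file_mb: int) -> int:
    """Size of a clone that fetches only the current version of the file."""
    return file_mb


# Initial check-in, one modified check-in, then 8 more repeats.
versions = 1 + 1 + 8

print(full_history_mb(100, versions))  # 1000 (about a gigabyte)
print(latest_only_mb(100))             # 100
```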

When you're dealing with, say, a game project with 20Gb of binary data that's been versioned 20x on average, you end up with 400Gb to clone your art repository, which is a non-trivial download size. And if, for some reason, you want your repository cloned to multiple folders on your drive, then again you're using 20Gb each instead of 400Gb each. Even on cheap hard drives, multiple folders of 400Gb each adds up quickly.

EDIT: OH, and one other advantage of doing it this way: If you just use rsync, and you accidentally overwrite a file and don't notice for a day or two, rsync will happily destroy your backup file as well, while git-annex will just store a new revision. Should have thought of that first. ;)
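The difference described here can be sketched with two toy classes: a plain mirror that keeps only the latest copy of each path, and a store that appends a new revision on every sync. Both classes are hypothetical illustrations of the behaviour the comment describes, not models of how rsync or git-annex are implemented.

```python
class Mirror:
    """Keeps only the most recent content for each path, like a plain mirror backup."""

    def __init__(self):
        self.files = {}

    def sync(self, path: str, content: bytes) -> None:
        self.files[path] = content  # the previous content is overwritten


class VersionedStore:
    """Appends every synced content as a new revision instead of overwriting."""

    def __init__(self):
        self.history = {}

    def sync(self, path: str, content: bytes) -> None:
        self.history.setdefault(path, []).append(content)

    def version(self, path: str, index: int) -> bytes:
        return self.history[path][index]


mirror, versioned = Mirror(), VersionedStore()
for backup in (mirror, versioned):
    backup.sync("notes.txt", b"good data")
    backup.sync("notes.txt", b"accidental overwrite")

# The mirror now holds only the bad copy; the versioned store can still return revision 0.
```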

git-annex keeps track of what file is where, including any duplicate copies you wish to keep on other storage mediums. If I want some file I archived, git-annex will tell me which external disk it's on (and it can do S3 and some other online storage mediums too). rsync keeps track of nothing between runs.
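A rough sketch of that kind of bookkeeping: a log that maps each file key to the set of storage locations known to hold a copy. The class, its methods, and the location names are made up for illustration and say nothing about how git-annex actually stores this information.

```python
from collections import defaultdict


class LocationLog:
    """Remembers, between runs, which storage locations hold a copy of each key."""

    def __init__(self):
        self._where = defaultdict(set)

    def record(self, key: str, location: str) -> None:
        self._where[key].add(location)

    def drop(self, key: str, location: str) -> None:
        self._where[key].discard(location)

    def whereis(self, key: str) -> list:
        # .get avoids creating an empty entry for keys that were never recorded.
        return sorted(self._where.get(key, set()))


log = LocationLog()
log.record("holiday-video", "external-disk-2")
log.record("holiday-video", "s3")
print(log.whereis("holiday-video"))  # ['external-disk-2', 's3']
```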

I find it really useful for archiving large files - an entirely different use case than git.

It also keeps the hash of the files, so you can verify their integrity even without comparing to a different copy.
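As a minimal sketch of that check, assuming a SHA-256 digest was recorded when the file was added: recompute the digest of the bytes you have now and compare it with the recorded one. The function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest recorded at the time the file is added."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """True if the data still matches the recorded digest; no second copy is needed."""
    return sha256_hex(data) == recorded_digest


recorded = sha256_hex(b"archived file contents")
print(verify(b"archived file contents", recorded))   # True
print(verify(b"archived file c0ntents", recorded))   # False
```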

Why? What is?

because your local history is bound to get really big in a short period of time?

what is? well, I don't really think anything is. it works with svn and dropbox, but that doesn't mean that is a good choice either.

obviously git etc. is mainly designed for text files. i've long thought about what the right way to approach this issue is.

edit: will read more about annex

Some of the commercial version-control systems handle big binaries reasonably well. That's one reason many game companies, for example, use Perforce, since it doesn't choke on piles of art assets.

Maybe this is really juvenile, but does the logo look like a bladder to anyone else?
