GitTorrent: A Decentralized GitHub (printf.net)
1038 points by luu 663 days ago | 169 comments

Lovely work!

We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.

The disadvantage here is that any given file in the repo could be stored in an ever increasing number of packfiles. Each existing version of the repo will generate a new packfile to get it to the newest version, and it's up to the authenticated masters to generate and seed each of those packfiles, while peers either do or do not cache this replicated data. In short, this means of syndicating updates ignores the Merkle-DAG-ness (DAG-osity?) of Git.

The un-updateability of torrents is something that seems to seriously limit its use. There are a lot of interesting attempts to hack around this; LiveStreaming and Nightweb are two that spring to mind. https://www.tribler.org/StreamingExperiment/ https://sekao.net/nightweb/protocol.html

Check out IPFS (http://ipfs.io/)

You can use it for git repos essentially out of the box by uploading your repo.

It is made of content addressed chunks which will get re-used on each re-upload.
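To make the chunk re-use point concrete, here is a minimal sketch of a content-addressed store. It is not IPFS's actual chunker or API: the `ChunkStore` class, the 4-byte chunk size, and the use of SHA-256 are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow (assumption)


class ChunkStore:
    """Toy content-addressed store: chunks are keyed by their SHA-256 digest."""

    def __init__(self):
        self.chunks = {}

    def add(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store each once, return the key list."""
        keys = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # already-known chunks are reused
            keys.append(key)
        return keys

    def get(self, keys: list) -> bytes:
        """Reassemble the original bytes from a list of chunk keys."""
        return b"".join(self.chunks[k] for k in keys)


store = ChunkStore()
v1 = store.add(b"aaaabbbbcccc")
count_after_v1 = len(store.chunks)
v2 = store.add(b"aaaabbbbdddd")  # only the last chunk differs from v1
count_after_v2 = len(store.chunks)
```

Re-uploading the second version adds a single new chunk to the store; the first two chunks are shared between both versions.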

I just checked this out thanks to your link. Awesome. ipfs solves a whole class of problems - distributed git repos being just one of them. Great stuff.

I believe that these technologies are bound to wither not because of technical problems but because of "political" problems. These technologies are a huge problem with regard to how intellectual property is managed today and how it is monetized.

Every IP owner will try to slow down the progress of these technologies, mainly by not adopting them. I think these technologies won't be adopted until the IP monetization problem is solved.

To me, this means that if the media are to be easily decentralized and shared, the monetization (how you get money for your work and how you give money when you benefit from it) must become equally easily decentralized and shared (between the publishers, authors, etc.).

Creative Commons cense is important, that is why. Research and development can scale expression if we look for new ideas before old fear. #CreativeCommonsCense? (#CCC?)

I agree it is really stellar. The billion dollar concept for me though is encrypted repo torrents. Imagine a group of servers that are hosting encrypted chunks which form the basis of a homomorphic encryption protocol for distribution using forward error correction to allow recovery of the deltas if n of m components of that delta can be recovered. Basically if you have the key you can pull out of this amorphous cloud your source code, and if you don't have the key you won't even know it is out there.

I started building a toy version of this about 5 years ago but got distracted by work. Essentially the repo key encrypted the packfile, and the storage reliability layer used its own key to encrypt the chunks. The latter key would find the chunks, with enough reliability to re-create the encrypted packfile, which the repo key could then decrypt and apply to your repo.

A very fun problem in distributed systems and data structures.

If I'm not mistaken, other than the Git specific aspects, this is more or less what Tahoe-LAFS strives to accomplish.


Unfortunately Tahoe-LAFS is something I've been looking at recently and it suffers from a few missing features which have been known for many years.

(The one that I want most is the ability to rebalance on the fly, as storage-hosts become full.)

Tahoe-LAFS also doesn't support deletion.

I've always thought this would be amazing for file storage in general. A Dropbox that just syncs with the cloud encrypting your content with a private key and you can withdraw it at any time. You simply give back say 5 x as much storage as you use for the data mirroring.

AFAIK this was the concept behind Wuala. Founded in Switzerland, later bought by French company LaCie. According to their website the client-side encryption is still there; however, they discontinued the collaborative disk space business model.

We're trying to recreate that concept with Peergos (https://github.com/ianopolous/Peergos). We're considering using IPFS (http://ipfs.io) under the hood, which would be nice. Wuala was great, but they never open sourced it and stopped the storage earning in 2008.

Like http://storj.io/ ? I think they have a lot of code to write though.

Don't forget http://maidsafe.net/

Wow they are moving to Rust. It's going to be quite interesting

How would that work? There are no homomorphic encryption schemes anywhere near general enough for that, as far as I know.

That's why it's a fun problem.

> The un-updateability of torrents is something that seems to seriously limit its use.

I am not involved with the development of torrents at all but (please bear with me until the end) my initial reaction is that we should think of the lack of ability of torrents to update as a feature and not as a bug.

Perhaps if ability of torrents to update is a concern then it warrants a new peer to peer protocol? (Please note that this is not the case of http://xkcd.com/927/?cmpid=pscau as I am not advocating a new protocol for every use case)

It seems like we can sign torrent files with gpg keys. Perhaps I am wrong. Perhaps, we can allow updating in torrents if we require that the updates be signed with the same private key as the original torrent? Am I barking up the right tree here?

Edit: Oops. I edited this post before I saw the reply about BEP-0039. Apologies.

> Perhaps if ability of torrents to update is a concern then it warrants a new peer to peer protocol?

There is a new peer to peer protocol, it's even an IETF draft, it's called PPSP and is full of nice stuff:



Do you imply that PPSP's live stream protocol could be used to create updateable keys?

Yes, if append-only is enough then a streaming protocol like PPSP is good.

Otherwise we need something else, which I hope to achieve in rakoshare (https://github.com/rakoo/rakoshare).

It's not append-only, since there's churn. There is no "delete" per se, except maybe with the PPSP private network.

Updatability does not need to imply mutability. It could be possible to have a torrent with 50 files, and then the torrent is updated with a new file, and a client that has already downloaded the old torrent will only need to download one new file. And another client that still wants the old torrent would still get the same 50 files, but would be able to download from clients that have the new torrent. (Similarly, clients downloading the new torrent could download the first 50 files from clients seeding the old one.) That's updatability without mutability.
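The 50-file scenario can be sketched as follows. This is an illustration of the idea only, not how any existing BitTorrent client works: the `manifest` and `missing` helpers are hypothetical, and files are identified by the SHA-256 of their content.

```python
import hashlib


def manifest(files: dict) -> dict:
    """Map each file name to the SHA-256 of its content."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def missing(have: dict, want: dict) -> list:
    """Names in `want` whose content hash the client does not already hold."""
    held = set(have.values())
    return sorted(name for name, digest in want.items() if digest not in held)


old_files = {f"file{i:02d}": f"content {i}".encode() for i in range(50)}
new_files = dict(old_files, file50=b"content 50")  # the update adds one file

old_torrent = manifest(old_files)
new_torrent = manifest(new_files)

# A client holding the old torrent only needs the one new file.
to_fetch = missing(old_torrent, new_torrent)
# A client holding the new torrent can serve everything in the old one.
old_from_new = missing(new_torrent, old_torrent)
```

Because files are addressed by content rather than by the torrent they belong to, both versions stay immutable while sharing their 50 common files.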

Follow-up question: does BitTorrent work this way currently? If I take your 50-file torrent, add one file and re-seed, will you be seeding the original 50 files in the same swarm as my 51 file torrent?

Every torrent has a hash ID that identifies it; even if two torrents contain exactly the same files, you are only seeding one of these hashes.
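For reference, in BitTorrent v1 that hash ID (the infohash) is the SHA-1 of the bencoded "info" dictionary, so adding a file yields a different hash and therefore a different swarm. A minimal sketch, assuming a toy info dictionary (the `toy_info` helper and its placeholder `pieces` value are made up for illustration):

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder covering ints, byte strings, text, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def infohash(info: dict) -> str:
    """SHA-1 of the bencoded info dictionary, as a hex string."""
    return hashlib.sha1(bencode(info)).hexdigest()


def toy_info(file_count: int) -> dict:
    # "pieces" is a placeholder; a real torrent stores the SHA-1 of every piece here.
    return {
        "name": "romset",
        "piece length": 16384,
        "pieces": b"\x00" * 20,
        "files": [{"length": 1, "path": [f"rom{i}.bin"]} for i in range(file_count)],
    }


hash_50 = infohash(toy_info(50))
hash_51 = infohash(toy_info(51))  # one extra file, so a different infohash
```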

I regularly download mame updates by pointing the updated torrent at my ROMs directory. All changes are redownloaded, and new ROMs are grabbed.

Edit: just realized I misread the question. No. In my case, the previous mame romset swarm is usually abandoned, and the new torrent takes all the traffic. The swarms are unique.

What I described would be the ideal situation. I don't know of any existing P2P system that works like that, unless it's a system for sharing individual files instead of "torrents" containing multiple files.

So basically, git over IP? Each "update" is just a commit containing the modifications since the last one. If you want the most recent, ask for the HEAD.

The article goes into why that's not really an option. Making a separate request for each blob/file/commit/whatever would be way too slow for large repos.

I was more commenting that the very updatability you were discussing is pretty much directly analogous to what git is in the first place. So now we're building a system that has the same functions as git, but isn't git.

Support for mutable/updateable torrents was proposed in BEP-0039 (http://www.bittorrent.org/beps/bep_0039.html) and is the basis of how BitTorrent Sync functions.

Great to have. Alas, it relies on a central system of record; it is just a feed that can be re-fetched. The 'originator' key is probably worth standardizing around for any future approaches, but using PEX or the DHT to announce the most recent magnet URI or whatnot.

Simply signaling new magnet URIs would have the disadvantage GitTorrent sought to avoid: resyndicating the entire contents with every single change, a killer for things like the Linux kernel or projects like Debtorrent. Git's Merkle DAG avoids this problem and allows multiple concurrent versions to share the bulk of the content-indexing; a best-of-all-worlds solution would preserve this capability.

bittorrent sync is actually very much like this. People can offer read only subscriptions to their repositories, and then everyone distributes that repository to everyone else bittorrent style. So you have multiple read-writers who can publish and update the repository and multiple readers who can just subscribe to it and help to distribute.

So an example where this technology could be put to a unique use: Minecraft streamers and Let's Players sometimes like to distribute the world they are using. So they could make a repository of their world and distribute the read only keys for it to other users. This would allow them to play it, even temporarily make changes because sync kicks in and refreshes it, and the repository would be kept current as the world progresses. That should be viable right now with Sync.

Problem is, I think the bittorrent foundation is doing their damnedest to keep themselves firmly planted in the distribution and ownership of the technology. So we won't see an explosion of third party clients. I don't think it will see much adoption for this reason, and that's a real shame, because it would be a wonderful bit of kit for the internet.

Peers don't have to seed packfiles the way we're used to for "traditional" seeding of movies or music; these packfiles actually represent transition from one commit to another (instead of "all content from beginning to now"), so they are inherently ephemeral. They don't even need to be kept on the disk, because they will be generated on the fly every time a client is interested.

The DAG-osity of git really helps here (because you only have to transfer what's really needed), and the "immutability" of git helps because if your project is popular and you update your branch, everyone will want to go from the old commit to the new commit, so everyone can share the diff between them directly.

I found this discussion interesting regarding conceptual approaches about how to handle partial torrent updates:

"Thinking about 'meta' torrent file format." https://gist.github.com/mait/8001883

Truly Meta - Meta: https://news.ycombinator.com/item?id=6920244

I had no idea torrents had that limitation. Could torrents be tagged with versions and then old ones could be removed from seeders?

> It surprised me that nothing like this seems to exist already in the decentralization community.

There was a GSoC project in 2013 which did exactly this, using Freenet as decentralized storage backend with a Web of Trust for Spam resistant and updatable identities (note that in the gittorrent scheme once someone claimed a username, that username will stick there forever). It works and compared to GitTorrent it adds anonymity and upload-and-run.

A current article describing it is here: http://draketo.de/english/freenet/real-life-infocalypse (it got referenced here, too: https://news.ycombinator.com/item?id=9562749 )

The GSoC project was done by Steve Dougherty: http://www.google-melange.com/gsoc/project/details/google/gs...

> I’d be happy to work on a project like this and make GitTorrent sit on top of it, so please let me know if you’re interested in helping with that.

Have a look at Gitocalypse: https://github.com/SeekingFor/gitocalypse

Major fan of this idea. But how does one address the GUI challenges presented by leaving GitHub behind? It can't be overstated that GitHub provides an amazing (communal/social) user experience.

(Author here.)

You're right, of course. This is just a first step.

One interesting followup idea might be that the BitTorrent library I'm using, webtorrent, also works in browsers over WebRTC. But I'm not using that because I wouldn't know what to do with a git cloned repo inside of a browser tab. Maybe someone else will though. :)

GitHub provides:

- Repo hosting

- Search

- Community

In comparison to decentralized Search and Community, decentralized file storage is easy. Conveniently, centralized repo hosting is the biggest problem. Not being able to Search / Comment / Report a bug during a DDOS decreases productivity, but not being able to push commits / run CI tools is a productivity halt.

The best next move might be to focus on decentralized repository hosting, solve that well, and allow users to conveniently mirror the GitTorrent repos on GitHub, giving the best of both worlds until Search and Community can also be solved well.

This may mean GitTorrent would need some form of post push hooks (i.e. to update mirrors or run CI). Which I'm sure is doable.

Hmm, decentralized search here should probably just use a Kademlia-like implementation (http://en.wikipedia.org/wiki/Kademlia) where a 'XOR metric' is used to measure distance between nodes. (That way the max number of lookups will be log2(n) where n is the number of nodes (with further optimizations possible))
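The XOR metric itself is tiny. Here is a minimal sketch using small integer node IDs (real Kademlia IDs are 160 bits, and a real lookup queries peers iteratively rather than sorting a local list; the `closest` helper is only an illustration):

```python
def distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def closest(nodes: list, target: int, k: int = 2) -> list:
    """The k known node IDs nearest to `target` under the XOR metric."""
    return sorted(nodes, key=lambda n: distance(n, target))[:k]


known_nodes = [0b0001, 0b0100, 0b0111, 0b1000]
target = 0b0101

# Distances to the target: 0b0001 -> 4, 0b0100 -> 1, 0b0111 -> 2, 0b1000 -> 13
nearest = closest(known_nodes, target)
```

Each lookup step moves to nodes that share a longer ID prefix with the target, which is where the logarithmic bound on lookups comes from.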

Obviously someone would need to build a user-friendly interface for all that, etc.

Decentralized search can be done; Tribler has implemented it for the torrent network, in fact: https://www.tribler.org/ContentSearch/

What about using git notes (`man git-notes`) for tracking issues, comments, etc? They are stored as git objects (right?) and could be used for this task?

I thought about using git notes for this and didn't see how it is better than adding issues, comments and wiki inside the repo itself. We are used to putting documentation and tests alongside the code in our repo, so why not add wiki and issues?

You can make an excellent case for that: this would require documentation and tests to be up-to-date before a commit would be accepted by whoever maintains the repo.

Perhaps approaching it the way (I assume) TPB does it, and have the comments section, etc, handled through BitTorrent?

Comments, issues, etc could also be objects that are updated through the protocol, and signed with a public key.

That's roughly what I was thinking, yeah.

TPB is just a website and doesn't use bittorrent at all.

It's the same old problem of trying to build a peer-to-peer social network. How do you ensure that large files are distributed correctly and quickly with minimal security implications in an environment where nodes are constantly joining and leaving the network? Perhaps it's possible, but if there were an easy way of doing it, there would be more of that sort of thing around.

> How do you ensure that large files are distributed correctly and quickly with minimal security implications in an environment where nodes are constantly joining and leaving the network?

The owner of a project will keep a node always online to fight its own churn. There are no big files in the case of an issue tracker. This is not a video social network. Also, video social networks do work, e.g. private trackers; the only thing is that they use the web to publish magnet links, which can be done over a DHT (cf. PPSP and Tribler).

Wouldn't it be the equivalent of creating a BitTorrent client? There are lots of attractive and functional clients for that.

I think they are talking about the issue tracking etc

I guess the projects that can't be on github won't mind the GUI challenges as long as they have some way to have a central repository without having to maintain a server on their own.

Do the same thing you did for the Gittorrent. A GUI client that runs a torrent of a php file that connects to a torrent of a database file. You just need to always be connecting to the latest and greatest.

Love this.

The post mentions using the blockchain for unique username registration and mapping to public key hashes, and as it turns out there's a project I and others have been working on that does exactly this called Blockstore.

Here's the link if anyone wants to check it out: https://github.com/namesystem/blockstore

The way it works is there's a mapping between a unique name and a hash in the blockchain, and then there's a mapping in a DHT from the hash to the data to be associated (which can be a plain old public key and can also be a JSON file that references a public key and other identity information).
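The two-level lookup described above can be sketched with two dictionaries standing in for the blockchain and the DHT. This is not Blockstore's API: `register`, `resolve`, the SHA-256 hash, and the JSON profile layout are assumptions made for the example.

```python
import hashlib
import json

blockchain_names = {}  # name -> hash of the profile (stand-in for the blockchain)
dht = {}               # hash -> profile bytes (stand-in for the DHT)


def register(name: str, profile: dict) -> str:
    """Claim a name and publish its profile; first registration wins."""
    if name in blockchain_names:
        raise ValueError(f"name already taken: {name}")
    blob = json.dumps(profile, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()
    blockchain_names[name] = digest
    dht[digest] = blob
    return digest


def resolve(name: str) -> dict:
    """Look up the hash on-chain, fetch the data from the DHT, and verify it."""
    digest = blockchain_names[name]
    blob = dht[digest]
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError("DHT data does not match the registered hash")
    return json.loads(blob)


alice_hash = register("alice", {"pubkey": "hypothetical-public-key"})
alice_profile = resolve("alice")
```

Only the small hash needs to live in the blockchain; the DHT can be untrusted because anything it returns is checked against that hash.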

(GitTorrent author here.)

That's great, thanks! I should just use this (preferably with the DHT I'm already using to look up Git commits) instead of reimplementing myself.

What do you think about the idea of making pluggable modules to connect Blockstore with web frameworks (Django, Rails), without the framework/website authors having to get involved in understanding Bitcoin themselves?

Oh I love that idea.

As far as Django, Blockstore is on Pypi so you can just install and import the library.

I think it'd be great to have modules for Rails, Node, etc.

I saw your tweet and I'll shoot you an email. Also feel free to open an issue on github.com/namesystem/blockstore and we can discuss the idea openly there.

The remaining hard part is the adapter layer to enable the extra applications such as the issue tracker to use the Git repository for storage. Joey Hess has a good article about Git "databranches" here: https://joeyh.name/blog/entry/databranches/

A very interesting idea, GitTorrent, but I have one question which comes to me whenever I read about a delta-based distribution scheme: who is going to generate and share all those deltas?

Some Linux distributions have experimented with delta-based package repositories, examples are deltup for Arch Linux and rpm-delta for RPM-based distros. Some of the known issues are:

- choosing the number and spacing between deltas. Fine-grained deltas require more storage space, coarse-grained deltas require more download bandwidth.

- retiring old deltas: periodically deleting all deltas older than a certain version, replacing them with the full package of that version. Again a trade-off between storage space and download bandwidth.

For Git repositories, this would roughly translate to:

- choosing the number, history spacing, and size of the Git packs per repository.

- retiring old Git packs: periodically deleting Git packs older than a particular revision, replacing them with a bare repository at that revision.

Git is built out of deltas. You're already storing all of them.

This isn't quite how I would describe it. Git does do delta compression in packfiles, but the fundamental primitives lack deltas. It's just:

* The contents of this directory is this list of files whose contents have these SHAs.

This is called a "tree".

A tree is itself an object, and its SHA can appear in another tree.

To see this for yourself, in any git repository run `git cat-file -p HEAD`. You'll see the (more or less) raw commit object for HEAD, which will point at a tree SHA. To see the contents of that tree-sha, run `git cat-file -p <the tree SHA>`. That tree object has a one-to-one correspondence with what you'll see on-disk in the objects directory, (if the object has not been put in a pack file).

Above I have more or less fully described the contents of the files found in `.git/objects`.

The delta'ing doesn't happen until later, if and when packfiles are constructed. But they're just a storage/bandwidth optimisation. AFAICT, these deltas have nothing to do with what you might think of as "git diff", which is just some fancy porcelain which looks at objects.

The nice property of the construction is that given a large tree, even if nested, if you change a single file in that tree, you will only change as many trees as the file is deep in the tree, so computing changes between two nearby trees can usually be done quickly.
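The object model described above can be reproduced in a few lines. Git names an object by the SHA-1 of a `<type> <size>` header, a NUL byte, and the body. The sketch below is limited by assumption to a flat tree of regular files (mode 100644); git's special ordering rule for subdirectories is not handled.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over the header '<type> <size>', a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict) -> bytes:
    """Serialise a flat tree: entries maps file name -> blob id (hex)."""
    out = b""
    for name in sorted(entries, key=str.encode):
        # Each entry is '<mode> <name>', a NUL byte, then the raw 20-byte id.
        out += b"100644 " + name.encode() + b"\x00" + bytes.fromhex(entries[name])
    return out


readme_v1 = git_object_id("blob", b"hello\n")
readme_v2 = git_object_id("blob", b"hello, world\n")
license_id = git_object_id("blob", b"MIT\n")

tree_v1 = git_object_id("tree", tree_body({"README": readme_v1, "LICENSE": license_id}))
tree_v2 = git_object_id("tree", tree_body({"README": readme_v2, "LICENSE": license_id}))
```

Editing README produces a new blob and a new tree, while the LICENSE blob keeps its ID and is shared by both trees. No deltas are involved at this level.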

The problem is not with deltas between revisions (i.e. the commits themselves) but with the Git packs spanning multiple revisions. At the time of a `git pull`, a user's repository can be at any revision between initial and latest. Who is going to seed (and keep seeding) packs for all of those possible revision intervals?

Conceptually, git is built out of snapshots.

My impression was that peers are generating the deltas on-the-fly based on which commits the requesting peer states it needs. The problem's therefore shoved onto git itself, with the seeder just cherry-picking a specific range of commits from its own copy of the repo and bundling them together.

Ah, okay, but that would imply that GitTorrent doesn't make any use of the swarming capability which makes BitTorrent special.

Not using swarming brings back all of the old problems of NAT traversal, asymmetric upload/download bandwidth, throttling, censorship etc.

Not quite: the peer generates the pack and tells you its hash, and then you query the network for anyone who has that hash (them, for starters), and perform a swarming download of it. So git clones of popular repositories would usually swarm.

The probability of swarming would be influenced by multiple factors, eg

* Higher popularity => More peers => Higher probability that multiple peers want the same packfiles.

* Higher popularity => More commits => More permutations of packfiles => Lower probability that multiple peers want the same packfiles (and stronger trends toward small/inefficient packfiles).

* More frequent synchronizations (peers always online) => More immediacy => Smaller packfiles => Higher probability that multiple peers want the same packfiles.

* Less frequent synchronizations (peers go offline regularly) => Less immediacy => Bigger packfiles => Lower probability that multiple peers want the same packfiles.

It would be really interesting to see how these competing pressures play-out (either by doing some math or randomized experiments).

If the main goal here is strictly decentralization (without concern for performance or availability[F1]), then one might look at swarming as a nice-to-have behavior which only happens in some favorable circumstances. However, by latching onto the "torrent" brand, I think you set up some expectations for swarming/performance/availability.

([F1] Availability: If Seed-1 recommends a certain packfile, then the only peer which is guaranteed to have that packfile is Seed-1 -- even if there are many seeds with a full git history. If Seed-1 goes offline while transmitting that packfile, how could a leech continue the download from Seed-2? The #seeds wouldn't intuitively describe the reliability of the swarm... unless one adds some special-case logic to recover from unresolvable packfiles.)


Could this be mitigated with some constraints on how peers delineate packfiles?

> Could this be mitigated with some constraints on how peers delineate packfiles?


Like so many here, you have a single view of how bittorrent should be used, based on current filesharing practices, so you believe we need to map gittorrent to filesharing and have those packfiles be as static as possible in order to be shared at large.

You need to go back to the root of the problem, which is simple: there is a resource you're interested in, and instead of getting this resource from a single machine and clogging their DSL line, you want to get this resource from as many machines as possible to make better use of the network.

How does gittorrent work ?

- The project owner commits and updates a special key in the DHT that says "for this repo, HEAD is currently at 5fbfea8de70ddc686dafdd24b690893f98eb9475"

- You're interested in said repo, so you query the DHT and you know that HEAD is at 5fbfea8de70ddc686dafdd24b690893f98eb9475

- Now you ask each peer who has 5fbfea8de70ddc686dafdd24b690893f98eb9475 for their content

- Each peer builds the diff packfile and sends it through bittorrent. Technically it's another swarm with another infohash, but you don't care; it's only ephemeral anyway. The real swarm is 5fbfea8de70ddc686dafdd24b690893f98eb9475.

Because of this, higher popularity will mean more peers in the swarm, whatever the actual packfile to be exchanged is. Bittorrent the way you know it is not used as-is, because there is information specific to gittorrent that helps make a better use of it.
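The steps above can be sketched roughly as follows. This is a toy model, not GitTorrent's code: a dictionary stands in for the DHT, commit IDs are short made-up strings, history is assumed to be linear, and the `Peer` and `fetch` names are hypothetical.

```python
dht = {}  # repo name -> current HEAD commit id (stand-in for the mutable DHT key)


class Peer:
    def __init__(self, name, history):
        self.name = name
        self.history = list(history)  # commit ids, oldest first (linear history assumed)

    def has(self, commit):
        return commit in self.history

    def make_pack(self, have, want):
        """Commits the requester is missing: everything after `have` up to `want`."""
        start = self.history.index(have) + 1 if have in self.history else 0
        end = self.history.index(want) + 1
        return self.history[start:end]


def fetch(repo, have, swarm):
    """Look up HEAD in the DHT, then ask every peer that has it for the missing range."""
    head = dht[repo]
    seeders = [p for p in swarm if p.has(head)]
    packs = {p.name: p.make_pack(have, head) for p in seeders}
    return head, packs


dht["example/repo"] = "c4"  # the owner publishes the new HEAD
swarm = [
    Peer("owner", ["c1", "c2", "c3", "c4"]),
    Peer("mirror", ["c1", "c2", "c3", "c4"]),
    Peer("stale", ["c1", "c2"]),
]
head, packs = fetch("example/repo", have="c2", swarm=swarm)
```

The swarm is defined by who has the target commit, not by a particular packfile: every up-to-date peer can produce the missing range on demand.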

(Author here.)

Great comment, thank you. But I think the infohash should actually be shared: packfiles are pretty deterministic in practice. So you'd be getting the diff packfile from the person who just made it, and anyone else who already did.

(If I find packfile generation to not be deterministic enough, I think I'll switch to using a custom packfile generation that is always deterministic.)
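One way to picture "always deterministic" generation is a canonical ordering of the objects before hashing, so that any two peers holding the same objects derive the same identifier. The sketch below is only an illustration of that idea; it is not git's pack format, and `pack_id` is a made-up helper.

```python
import hashlib


def pack_id(objects: dict) -> str:
    """Hash a set of objects (id -> bytes) in sorted id order.

    Sorting removes any dependence on the order in which a peer happened
    to collect the objects, so equal object sets give equal ids.
    """
    h = hashlib.sha1()
    for oid in sorted(objects):
        body = objects[oid]
        h.update(oid.encode() + b" " + str(len(body)).encode() + b"\x00" + body)
    return h.hexdigest()


# Two peers hold the same objects but gathered them in a different order.
peer_a = {"obj1": b"alpha", "obj2": b"beta", "obj3": b"gamma"}
peer_b = {"obj3": b"gamma", "obj1": b"alpha", "obj2": b"beta"}

id_a = pack_id(peer_a)
id_b = pack_id(peer_b)
```

With matching identifiers, a pack built by one peer can be seeded by any other peer that builds the same pack, which is what lets the download swarm.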

Sweet as hell. My only quibble is the idea of using the blockchain for validated naming. I think it'd end up in a landgrab which is nasty.

As much as we all hate DNS, having the ability to kick a squatter off someone's name is probably a good thing. Personally I don't think having a crazy hash for identification is a bad thing. Rather, what you need to do is just have some sort of reasonable personal contact book so you only have to deal with the crazy hash once (when you decide you want to remember that someone is who they said they were).

These are just usernames, not trade marked domain names. What did you do when someone got to gmail/twitter/whatever before you did and picked the name you liked? You just picked a different name, or added some digits to the end of the one you liked. You could do the same thing here.

My experience has shown that having names be treated as unique identifiers is a hairy situation. And frankly, unique identifiers in a system for distributing source code are as important as domain names if not way more so.

Domain names only ever run code in an ostensibly sandboxed vm.

Unlike domain names they're immutable. So if it's good now, it's good as long as the private key is kept safe.

If you want to cross reference the identity with an email address you could use a keyserver. If you want to cross reference the identity with a domain name, you could use a TXT record.

Hmm. How about mixing and matching, like by having a DNS record with a special TXT record syntax, e.g. 'IN TXT "gt=yoursha256hash"'. That way you could find your destination via DNS and verify the hash in the TXT record against the blockchain.
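A rough sketch of that check, assuming the TXT value has the form `gt=<64 hex characters>` as in the example syntax, and that the hash registered in the blockchain has already been fetched by some other means. No DNS query is made here, and both function names are hypothetical.

```python
def parse_gt_record(txt: str):
    """Return the hash from a 'gt=<sha256 hex>' TXT value, or None if malformed."""
    key, sep, value = txt.strip().strip('"').partition("=")
    if key != "gt" or not sep:
        return None
    value = value.lower()
    if len(value) != 64 or any(c not in "0123456789abcdef" for c in value):
        return None
    return value


def txt_matches_chain(txt: str, chain_hash: str) -> bool:
    """True when the DNS TXT record and the blockchain entry name the same hash."""
    claimed = parse_gt_record(txt)
    return claimed is not None and claimed == chain_hash.lower()


example_hash = "ab" * 32  # placeholder value, not a real key hash
ok = txt_matches_chain(f'"gt={example_hash}"', example_hash)
mismatch = txt_matches_chain(f'"gt={example_hash}"', "cd" * 32)
```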

Sounds like what ipfs guys are doing for naming:

> We plan to transition to hosting it on IPFS as soon as DNS -> IPNS -> IPFS naming proves robust, so we can use nice looking URLs.


I don't really understand the landgrab complaint: this costs $0.08 per username registered ($0.16 if you want to avoid races with miners), in comparison to centralized sites, which usually cost $0 per username registered. Why does this lose?

Centralization is about equivalent to ICANN being able to kick off a squatter. A truly decentralized system with first to register doesn't have a way to kick off squatters.

Well, the current ones don't, but they could. Decentralized networks just need decentralized decision-making, i.e., you could have a process by which the majority of the nodes decide to kick off a squatter.

At which point someone uses a botnet to game the system.

It's a really good question and hopefully a good solution can be implemented. I always thought namecoin was onto something because whoever was willing to spend the most (run the most miners) got to control transactions, so market forces prevented excessive squatting.

Of course that has the downside that large entities grab all the good names and the little people are left with the leftovers.

Look at Namecoin, it is DNS via blockchain technology.


> There are philosophical reasons, too: GitHub is closed source, so we can’t make it better ourselves.

There is GitLab.

GitLab CEO here, thanks for mentioning us as the open source alternative. We think in the short term multiple organizations hosting their own GitLab is the way to go. It is hard to do issues and pull/merge requests in a decentralized way (the OP is impressive but it shows distributed git instead of distributed GitHub). I would like to see federated merge requests http://feedback.gitlab.com/forums/176466-general/suggestions...

> It is hard to do issues and pull/merge requests in a decentralized way

...Over the web...

I mean, Git and Linux are developed via email lists, which is a decentralized way of sending pull/merge requests, isn't it? I guess you could argue the mailing list is hosted on a server, fine. So then fine, usenet.

Yeah, email and nntp are old crufty technologies and there are obviously advantages to having a web based interface and a central place to go for a project. But git itself certainly supports a decentralized mechanism for merging.

So then the question really is, how do you decentralize a web interface, isn't it? The dvcs itself isn't the problem.

Why would you want to have a web interface? The web stack is crufty as hell, and the web has the one major problem that it forces everyone to use the same UI for accessing a service. With email, for example, clueless users can get by with some simple webmail interface, but that does not prevent me from managing my mails efficiently with mutt. The standardization of UIs instead of protocols and data formats is one of the worst regressions in IT in recent years, IMO.

Wouldn't a decentralized web interface be webmail (i.e. Gmail)? Then along the Gmail theme, a plug in could make things look much like a static website. Where "conversation view" becomes a "repository view".

There's nothing decentralised about Gmail.

Yeah, decentralizing the web interface is hard. Maybe it is the wrong question. Maybe it is, how can you use usenet to get an overview of issues and do code reviews.

Not just federated merge requests, but federated one-click cloning. Given a "home" GitLab instance, you should be able to click a button on any GitLab repository (or on repositories elsewhere, based on some spec), and end up with a clone of that repository in your "home" instance.

That should be possible via browser support for site-based URL handlers; for browsers without such support, you could also have a URL on the main GitLab site that people can register with to redirect to their "home" instance.

Interesting idea. I think it is harder to do this than the federated merge requests because of the permissions, but I love the idea.

You'd be logged into your "home" instance, which gives you the permissions you need. You click a link on the remote site, which takes you (either directly via URL handler or indirectly via central site redirect) to your home instance with appropriate values filled in, and prompts you to confirm (to avoid CSRF).

That sounds like federated two-click cloning, not federated one-click cloning. Maybe you could batch up the confirmations and get federated 1.1-click cloning?

OK, that seems pretty straightforward.

By "site-based URL handler", do you mean https://developer.mozilla.org/en-US/docs/Web-based_protocol_...?

Yes, exactly. Use URLs like gitfork:https://... and register a handler for the "gitfork" scheme. (The site could also register a handler for git:// , but that won't help for http or https, and git:// isn't secure, so I'd suggest avoiding that.) As a backup, also provide links to some central domain that allows people to configure where they'd like to redirect it to.
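On the receiving side, the home instance would have to unpack such a URL and turn it into a confirmation page. A minimal sketch, where the `/import` path and `repo` query parameter are invented for the example and are not an existing GitLab endpoint:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def parse_gitfork(url: str) -> str:
    """Extract and validate the https repository URL from a gitfork: link."""
    scheme, sep, target = url.partition(":")
    if scheme != "gitfork" or not sep:
        raise ValueError("not a gitfork: URL")
    parts = urlsplit(target)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("gitfork target must be an https URL")
    return target


def home_import_url(home: str, gitfork_url: str) -> str:
    """Build the (hypothetical) confirmation page URL on the user's home instance."""
    target = parse_gitfork(gitfork_url)
    return f"{home.rstrip('/')}/import?{urlencode({'repo': target})}"


link = home_import_url("https://home.example", "gitfork:https://gitlab.example/a/b.git")
recovered = parse_qs(urlsplit(link).query)["repo"][0]
```

The home instance would still show a confirmation prompt before cloning, as suggested earlier in the thread, so that a malicious page cannot trigger a fork silently.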

There's Phabricator, too: http://phabricator.org/ (not a Github clone, though)

Agreed that the argument is a bit weak, but we would still end up with a major centralized repository for a decentralized protocol. And the changes to make Gitlab more like the proposal would probably be more work than just making the proposal a new project.

I think we would end up with multiple major repositories and many smaller ones. One of the first steps towards more decentralization would be an issue tracker that doesn't use numbers but hashes and that stores its content in a git repo (like the wiki already does). EDIT: I noticed in the comments that Fossil already has this, interesting http://fossil-scm.org/index.html/tktview/3711fc7cfd21d7f8684...

Which is another centralized platform with some different philosophical issues.

I'd love something like this for fossil[1], which can also handle wiki and issues inside its repos by default.

[1]: http://fossil-scm.org/index.html/doc/trunk/www/index.wiki

I need to go more in-depth on the proposal. But the first thing that strikes me is: if you're going to use the Blockchain (with a capital "B") as storage for usernames and such, why not use Namecoin? It has the process for name consensus down. Also it won't pollute the main Bitcoin blockchain.

Hi, author here.

I have a mild bias against altcoins, and have heard bad things about Namecoin in particular: that the anti-spam incentives aren't good, leading to illegal files stored in the blockchain itself, and that there's no compact representation (like Bitcoin's Simplified Payment Verification) for determining whether a claimed name is valid without consulting a full history.

As I understand it, these two design flaws combine to mean that you have to store some very illegal files to use a namecoin resolver, which doesn't sound good to me. (I may be mistaken, since the bad things I heard about Namecoin came from Bitcoin people..)

There aren't any inherent protections against storing illegal data in the Bitcoin blockchain, either (short of evading the issue by using a thin client, I suppose). The nature of what is illegal being so encompassing, in that you can ultimately encode the information into some integer of sorts either way, means it's impractical or unnecessary to provide full protection for such things.

No inherent protections, just different incentives. Storing a 4MB image at 80 bytes per $0.08 OP_RETURN transaction would cost you $4000. Why would you do that when you can use namecoin for practically free?
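As a quick check on that figure, here is the arithmetic as a small sketch. It assumes "4MB" means 4,000,000 bytes and takes the 80-byte payload and $0.08 fee per transaction from the comment above; the function name and parameters are made up for illustration.

```python
import math

def op_return_cost(size_bytes, payload_bytes=80, fee_cents=8):
    """Return (transactions needed, total fee in USD) to store size_bytes.

    payload_bytes and fee_cents are the per-transaction figures quoted in
    the comment above, not values read from any real network.
    """
    transactions = math.ceil(size_bytes / payload_bytes)
    # Work in whole cents to avoid floating-point noise in the total.
    total_usd = transactions * fee_cents / 100
    return transactions, total_usd

txs, cost = op_return_cost(4_000_000)
print(f"{txs} transactions, ${cost:,.2f}")
```

Reading "4MB" as 4 MiB (4,194,304 bytes) instead gives a slightly higher total, but the order of magnitude is the same.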

Which makes the statement you made that, "this currency stuff is frankly kind of uninteresting to me," kind of funny (I assume that was on purpose) because the currency stuff is exactly the thing that makes bitcoin work so well.

Fair point!

I made that statement because a lot of people don't realize that Bitcoin, ignoring the fact that you can trade it, has solved a fundamental consensus problem in distributed systems that we should care about and use :)

> I have a mild bias against altcoins, and have heard bad things about Namecoin in particular: that the anti-spam incentives aren't good, leading to illegal files stored in the blockchain itself, and that there's no compact representation (like Bitcoin's Simplified Payment Verification) for determining whether a claimed name is valid without consulting a full history.

Security-wise Namecoin is a weaker Blockchain, but I think in this case it's not that important. -- Are not the anti-spam measures hurting users as much as spammers in this case? Since you end up with higher-costs for what is otherwise practically free with a centralized service.

From the looks of it there is no reason that Bitcoin's Simplified Payment Verification wouldn't be usable with Namecoin either.

edit: Also this may be interesting to take a look at https://people.csail.mit.edu/nickolai/papers/vandenhooff-ver...

The notion that a file can be illegal makes as much sense to the internet as a plant being illegal makes to the earth.

Especially when it has the side effect of inhibiting what otherwise might be a compelling solution.

Right, and the notion that murder is illegal makes no sense to the universe as humans are "merely" an aggregate of matter no different than any other. Nothing makes any sense to entities not capable of reasoning, that is a pointless tautology with no relevance to the discussion.

Your analogy doesn't hold up, as (for one) we have society and plants don't. I'm glad that child porn is illegal (which is usually distributed in files), as it makes the world safer for children.

Also: some plants are serious jerks.

If pi goes on forever, it contains any and all representations of child porn that could exist. The first person to calculate pi that far will be breaking the law. No, that doesn't make sense, nor does any sort of information being illegal. What does make sense is to legislate the actions that can generate that data, or the misappropriation of that data.

You took a poor example since pi is not known to be a normal number (see eg http://mathworld.wolfram.com/NormalNumber.html )

This number however is:

    0.0 1 10 11 100 101 110 111 1000 ...
This number contains every movie that has existed or ever will. However, to specify where a 700 MB Harry Potter movie is in this number, you'll need at least 700 MB to represent the index. So in some sense Harry Potter 'exists' in this number, but in another sense it's just a silly encoding method.
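To make the "index is as big as the data" point concrete, here is a rough sketch that builds the start of that digit sequence (the binary numerals 0, 1, 10, 11, ... written one after another) and looks up where a given bit pattern first appears. The function names and the default search length are made up for illustration, and the search only covers a finite prefix.

```python
def champernowne_bits(n_digits):
    """Return the first n_digits of 0 1 10 11 100 101 ... written back to back."""
    parts = []
    total = 0
    n = 0
    while total < n_digits:
        bits = format(n, "b")  # binary numeral without the "0b" prefix
        parts.append(bits)
        total += len(bits)
        n += 1
    return "".join(parts)[:n_digits]

def first_index(pattern, n_digits=200_000):
    """Offset of the first occurrence of pattern in the prefix, or -1 if absent."""
    return champernowne_bits(n_digits).find(pattern)

for pattern in ["111", "00", "1011001110001111"]:
    idx = first_index(pattern)
    # Compare the size of the pattern with the size of its index, in bits.
    print(pattern, idx, len(pattern), idx.bit_length())
```

Short patterns turn up almost immediately, but for long patterns the index typically takes about as many bits to write down as the pattern itself, which is the point made above.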

Hear hear. This is an easy-to-understand and concise thought experiment; use it often!

Great point !

no, it's a really contrived point.

Most law enforcement and judges don't ask what makes sense to the internet, even if you ignore users' personal opinions about the matter.

Forgive me if my comment too readily reveals my ignorance on these topics, but would the IPFS project [0] be useful for this?

[0] http://ipfs.io

Definitely, check out their example on how to host git on ipfs[0]

[0] http://gateway.ipfs.io/ipfs/QmTkzDwWqPbnAh5YiV5VwcTLnGdwSNsN...

While integration between git and ipfs is good -- I'm not sure I see how this is much different from just tar-ing up a git-archive and creating a torrent/magnet (setting aside many of the other aspects of how ipfs and torrents work).

Decentralized git pull for only a given hash isn't all that interesting?

If one could pull in updates and/or push changes -- that would be "decentralized git". This example is more "ipfs.io as a transport for git-releases", rather than "ipfs.io as a transport for git"?

There is this script I worked on to help doing that:


Similar to Gitchain http://gitchain.org/ ?

(Author here.)

Yeah! I was a Gitchain backer. The difference is that Gitchain stored the actual git commits in the blockchain, and I leave the actual commits on the hard disks of each BitTorrent seeder.

Stored github in btsync for a startup before. It actually worked ok. We stopped doing it because btsync was trivial to crash with basic fuzzing and was closed source

This is an aside but I like the author's writing style. Not only is he clear but also describes why he's doing what he's doing and how it's important to him and others. Really helps me give the ideas more thought!

> Google Code announced its shutdown a few months ago, and their rationale was explicitly along the lines of “everyone’s using GitHub anyway, so we don’t need to exist anymore”.

I mean, sorta. It was also because running a service is expensive, and containing abuse is a constant thankless treadmill.

> We’re quickly heading towards a single central service for all of the world’s source code.

Far from it? Not that a fully-decentralized system seems bad, but there are many things that aren't github. I don't even have anything of interest on github.

Can I clone GitTorrent with GitTorrent?

Yes, it works.

npm install gittorrent

git clone gittorrent://github.com/cjb/gittorrent

Or for true decentralization (doesn't get the wanted sha1 from github.com):


It works! Very cool.

I got fatal: Unable to find remote helper for 'gittorrent'

How about "sudo npm install -g gittorrent"? It looks like the git-remote-gittorrent binary isn't ending up in your $PATH.

Or without polluting global namespace:

export PATH="$PATH:$HOME/node_modules/.bin"

I have to admit that I started reading this post feeling a little snarky about the concept. However, I think Chris makes an excellent case for the concept.

How is the new value for a mutable key propagated through the DHT? (as in how sure am I to get the newest commit hash?)

Haha, that reminds me of my comment earlier: https://news.ycombinator.com/item?id=9578307

And plenty of people schooling me that git is already distributed (git yes, github no). I am happy that someone is working on this.

This is awesome, but it centralizes on JavaScript.

It is an implementation of a standard without the standard being defined so other implementations can spring up.

Git, one could argue, is language centralized also, which is technically true. That I don't have an answer for. But I don't believe handing off so much dependence to a JavaScript application fits for me.

A C/C++ application like Git I can overlook, at least for a decade or so, but JavaScript feels like a perpetual beta/prototype only. Granted, that's my subjective feeling.

Raised issue: https://github.com/cjb/GitTorrent/issues/12

The word "centralize" in the way you're using it is very misleading. Every piece of programming in existence uses some sort of language.

What you're really stating, I think, is that this is written in Javascript and you don't like Javascript. That's totally fine, and it's your prerogative. I'm sure that, like any other piece of programming, GitTorrent can be rewritten in other languages. If Javascript bothers you that much, then do this: rewrite it in Ruby if that's what you prefer.

But please stop attacking the claim that this is "decentralised github" by claiming that this "centralises" on Javascript. It doesn't "centralise" on any language. It's just a first implementation written in Javascript.

That this implementation is uncomfortable for me is my subjective feeling, as I've mentioned several times. That I don't want to depend on JavaScript is clear. Using the word centralize the way I have, I can't see it otherwise. But I'm not defaming anyone. I'm not attacking anything. I'm very motivated to help the standard advance, but I would never for a moment install this implementation. Stupid of me? Sure, maybe. But it's not an attack.

What I wanted I already got: confirmation by the OP that this is in fact a reference implementation and not the be-all and end-all.

Were that clear to start with, I wouldn't have commented other than to say, awesome! This is extremely exciting to me.

I could see this integrating really nicely with mailing lists. The commit could be sent out to the mailing list, and anyone who is interested in reviewing the code or doing a merge would already have the information they need, no blockchain required.

If a version of this with friendly name support is released, I will mirror all my active GitHub repositories there.

If someone builds on this, as discussed elsewhere in the thread, to make a decentralized service that mimics 'social' functionality such as issues and pull requests, I will strongly consider using it instead of GitHub (depending on the UI, stability, etc.).

I don't even have any particularly popular repos, so there is no real reason for anyone to care about the above, but, y'know, HN comments approving of the idea don't necessarily translate into actual interest in the product, so now you know there's at least one person in the latter category. :)

I always thought Github was open source and that you could fork it, like in the spirit of Github. I find that slightly ironic =)

What's stopping a decentralized GitHub from meeting the same fate as the newsgroups, where the data set gets too large to handle?

I think that Github is more than just a repository, it's a community. I kinda quit Facebook and signed up to Github instead :P And if it weren't for Github I would have never touched git.

Startup idea: Create something like Github and assembly.com, but with a complete tool-set (git+vps)

Well done.

I think this is a great piece to build a "pay for branch merging" way of promoting open source.. use decentralized currency and voila, automated programming.

What I miss in the article and discussion is consistency of refs. Once your remote is a distributed one, it may have two different values for HEAD at a single point in time in different parts of the network. If two clients push during that time, the HEADs diverge even further.

So as in distributed databases, you either:

* need to acquire an exclusive lock on the repository metadata, or

* accept that your push will eventually be discarded because you did not have up-to-date metadata
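The two options can be sketched with a toy, single-process ref store. This is only an illustration of the trade-off, not how GitTorrent or Git stores refs: the class, its method names, and the commit ids are all hypothetical, and a real distributed system has no shared in-memory lock to lean on.

```python
import threading

class RefStore:
    """Toy stand-in for a repository's HEAD ref (hypothetical, in-memory only)."""

    def __init__(self, head):
        self._head = head
        self._lock = threading.Lock()  # option 1: exclusive lock on the metadata

    @property
    def head(self):
        return self._head

    def push(self, expected_old, new):
        """Compare-and-swap: accept the push only if the pusher saw the current HEAD."""
        with self._lock:
            if self._head != expected_old:
                # Option 2: the pusher had stale metadata, so the push is discarded.
                return False
            self._head = new
            return True

store = RefStore("a1")
alice_ok = store.push("a1", "b2")  # Alice saw a1, so her push is accepted
bob_ok = store.push("a1", "c3")    # Bob also saw a1, but HEAD is now b2
print(alice_ok, bob_ok, store.head)
```

The hard part in a peer-to-peer setting is that there is no single place to hold the lock or run the comparison, which is why the question about ref consistency matters.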

There is also Gogs: https://github.com/gogits/gogs

Decentralized web interface? Isn't it better to develop a native, cross-platform app for that?

We have both electron and NW.js now, that should be easy.

I like the idea of decentralized, but I'm not sure we need p2p as well.

It'd be interesting to see something like gitlab to have some sort of federation support; where multiple instances can talk to one another (a la xmpp/smtp), so as to clone/send pull-requests, etc across different instances.

We would love to see features like that in GitLab http://feedback.gitlab.com/forums/176466-general/suggestions...

I don't think this is new. http://code.google.com/p/gittorrent/

I don't understand why so many people are in awe of this project when it seems to be based on a number of falsehoods:

> "imagine someone arguing that we can do without BitTorrent because we have FTP. We would not advocate replacing BitTorrent with FTP, and the suggestion doesn’t even make sense! First — there’s no index of which hosts have which files in FTP, so we wouldn’t know where to look for anything."

Actually there is. It's called a mirror list. Most FTP-based repositories support this.

> "And second — even if we knew who owned copies of the file we wanted, those computers aren’t going to be running an anonymous FTP server."

Except bittorrent does turn your client into a server. Many clients silently punch holes in your firewall via UPnP, so you don't always realise you're running a server, but it does still happen.

And as for anonymous FTP servers, it depends on what you mean there. If you mean anonymous access, then that's not only supported, but actually the norm. If you mean the server itself is anonymous, then it should be noted that neither github nor torrent seeding peers are anonymous either.

> "Just like Git, FTP doesn’t turn clients into servers in the way that a peer-to-peer protocol does. So that’s why Git isn’t already the decentralized GitHub — you don’t know where anything’s stored, and even if you did, those machines aren’t running Git servers that you’re allowed to talk to. I think we can fix that."

Hang on, a moment ago you _didn't_ want to run servers. Now you're complaining that git clones aren't servers?


Then there's the matter of the github competitors, of which there are many. gitlab, gitbucket, etc. Some open source, some closed but free, but all of them largely offer the same features as github.

These days it seems trendy to use bittorrent as a bootstrap for all kinds of wacky and wonderful problems, but using bittorrent for a protocol that's already distributed and already pretty saturated with github competitors; well it just seems redundant.

I think your complaints are disingenuous. [edit: what I mean is that you've pointed out problems with small components of the overall article and used them as an argument against the article as a whole, which I think is unfair. I don't think the components' validity affects the overall idea.]

His argument may have some holes, and people are probably mostly ignorant of those holes so they can't critique them, but I don't think they're in awe of the idea because their awe hasn't been dispelled by identifying those holes.

This is a very neat idea, and none of the issues with the setup argument dispel that.

"using bittorrent for a protocol that's already distributed and already pretty saturated with github competitors; well it just seems redundant."

no, no, it doesn't seem redundant. DHT based distributed indexing is so incredibly fundamentally different than a mirror list for files in FTP or from a series of GitHub clones. It's owner-agnostic. It just exists, by virtue of having participants, with no overhead. I don't have to select a target host or find a server or even identify where my particular file (or git repository, or username, or whatever) lives. It's .. unification. It's elegant and reduces complexity and makes the whole ecosystem more simple.

Maybe I'm blinded by my own awe, but, I love this idea.

Also, yes, some people have thought of components of this before, but I haven't really seen the full stack laid out vertically like this, combined with a narrative that makes me so excited about it.

Thank you for taking the time to respond to my comments :)

There's a few more concerns I have:

1) The whole point of git is versioning; having a model like this makes versioning several orders of magnitude harder.

2) If I'm pulling from repositories to install on servers, I'd rather grab them from known trusted sources rather than "the anonymous ether"

I'm normally really receptive to new distribution models, so I don't mean to be negative for negative's sake. But I'm struggling to see the practical upsides of this.

The "anonymous ether" isn't dangerous if you have some way of verifying what they're sending you, and with bittorrent, you do, since you request the content using its hash.
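A minimal sketch of that idea, using Python's standard hashlib: the receiver already knows the hash it asked for, so it can reject anything that doesn't match, no matter who sent it. This is simplified on purpose; real BitTorrent clients check SHA-1 hashes per piece rather than over one whole blob, and the function name and sample data here are made up.

```python
import hashlib

def verify(data: bytes, expected_sha1_hex: str) -> bool:
    """Return True only if data hashes to the SHA-1 digest we requested."""
    return hashlib.sha1(data).hexdigest() == expected_sha1_hex

# The hash we asked for, obtained from a source we trust.
wanted = hashlib.sha1(b"contents of the packfile").hexdigest()

genuine_ok = verify(b"contents of the packfile", wanted)   # honest peer
tampered_ok = verify(b"contents of the packfilE", wanted)  # modified data
print(genuine_ok, tampered_ok)
```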

bittorrent hashes are susceptible to collision attacks.

Collision attacks are not really a problem, since they only happen when the attacker gets to specify the hash, which wouldn't be the case here.

Generating a file that hashes to an existing hash is called a Preimage attack, and SHA-1 (the algorithm used by bittorrent) isn't, for now and as far as we know, vulnerable to any.

SHA1 is vulnerable to it, but you're right that I've drastically overestimated the practicality of a preimage attack. Thanks for the correction :)

Why do you say that SHA1 is vulnerable to second preimage attacks? Zero have been found.

Because anything that's vulnerable to collision attacks is theoretically vulnerable to preimage attacks. Where I went wrong was assuming that preimage attacks were practical, but as you've rightly said, there have been no known exploits because of their extreme difficulty.

So it's one of those situations where everyone was right: it's so impractical to exploit that it's as good as not vulnerable even though it's mathematically possible.

Git and bittorrent use exactly the same hash format (sha1); bittorrent is no more susceptible than Git.

They use hashes for different features though, i.e. you don't pull chunks anonymously with git.

Well, that's a winning name, to be certain. Pretty great work too.

Interesting, I'm also very interested in the other project mentioned "bugs everywhere." Anyone have more details on day to day usage of it?

Great. Centralism and monopoly in any form makes me uneasy. This is another point for freedom...

I'm very impressed by this, and I'm a bitcoin skeptic and git usability critic.

Just awesome idea. The idea of having a torrent for git repos blew my mind.

Git itself is already distributed. Do we really need more distribution or do we simply need to learn to work more distributed?

It's clear, though, that there are aspects of 'distributed' that Git does not accomplish. This gives you a global namespace and a global identity, just like a website, without a single point of failure, a corporate interest, or a closed source system, and with a trivial option for forking the whole structure into your own [friend group|company|private cluster|etc].

An argument could be made that git ought to be augmented with something like this: why run distributed protocol A on distributed protocol B; maybe we should just run distributed protocol AB from the get-go?


