GitTorrent: A Decentralized GitHub (2015) (printf.net)
402 points by samatman 8 months ago | 89 comments

Hi! I started this project in 2015, using Bitcoin and BitTorrent -- since then, I think Ethereum and IPFS have become better substrates for this sort of work.

And, I think it remains the case that the difficult part of decentralizing GitHub is not working out how to share repositories using p2p bandwidth, but instead working out how to use p2p for the social features of the site, like comments and pull requests.

Secure Scuttlebutt has some potential there, but as a gossip protocol it doesn't have a way for people you don't know/follow to e.g. send you a PR.

Gitea, Gogs, and Gitlab all have social-federation-related feature requests open right now. Not sure where they will land, but some are mentioning ActivityPub.

Cool! That is really interesting. ActivityPub would be great for this. I've added in links to the issues below:


- https://github.com/go-gitea/gitea/issues/1612
- https://github.com/go-gitea/gitea/issues/184
- https://github.com/gogs/gogs/issues/4437
- https://gitlab.com/gitlab-org/gitlab-ce/issues/44486
- https://gitlab.com/gitlab-org/gitlab-ee/issues/4517

edit: added gitlab enterprise link (plus formatting)

ActivityPub is indeed both simple and extensible. Perfect for issues and comments, also PRs.

For the actual code behind a PR I would just set up a git daemon and reference my own repo. With appropriate CORS headers the code can even be cloned in a browser.

And (maybe as a result of this discussion.. who knows) this has just been created:


Crazy, I was learning about ActivityPub yesterday. The vocab could theoretically support this over torrents, too (although I haven't yet thought through how it would work as a protocol):

    {
        "@context": {
            "@vocab": "https://www.w3.org/ns/activitystreams",
            "git": "https://gittorrent.org/ns/"
        },
        "type": "git:PullRequest",
        "to": ["gittorrent://cjb/gittorrent#master"],
        "git:from": "gittorrent://zamalek/gittorrent#feature/activitystreams"
    }

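As a rough sketch of how such an object might be assembled programmatically: the `git:` namespace and the `gittorrent://` URLs are the hypothetical ones from the snippet above, and `make_pull_request` is a made-up helper, not part of any ActivityPub library.

```python
import json


def make_pull_request(target: str, source: str) -> dict:
    """Build a hypothetical git:PullRequest activity as a plain dict."""
    return {
        "@context": {
            "@vocab": "https://www.w3.org/ns/activitystreams",
            # Hypothetical namespace, taken from the snippet above.
            "git": "https://gittorrent.org/ns/",
        },
        "type": "git:PullRequest",
        "to": [target],
        "git:from": source,
    }


activity = make_pull_request(
    "gittorrent://cjb/gittorrent#master",
    "gittorrent://zamalek/gittorrent#feature/activitystreams",
)
# Serialize to JSON, ready to be delivered by whatever transport is chosen.
print(json.dumps(activity, indent=4))
```

How the object would actually be delivered (HTTP inbox, torrent, something else) is the open protocol question and is not addressed here.
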
Regarding Git over Scuttlebutt, there's this apparently:


though I just stumbled upon it, no idea how it works and what are the features vs. limitations.

@cjbprime excellent work! We've been using a timegraph CRDT with signed inserts to handle non-web of trusted individuals to send or notify a user of something here at gun. I'd love to be able to chat with you more about your work and possible future solutions - shoot me an email (check my profile)?

What are your thoughts on this project: https://github.com/axic/mango

Looks like it's abandoned, but it's a start at using ETH and IPFS.

@cjbprime do you have any thoughts on this Git/Ethereum/IPFS combo? https://news.ycombinator.com/item?id=17238414

I would just like to mention that, in my opinion, "decentralized GitHub" is an oxymoron: GitHub's innate feature is being centralized, gathering developers into one place as a kind of social network.

Decentralization is a scale. Complete decentralization is undesirable, you just want to give small groups the ability to be autonomous (however big they want to become).

But there are decentralised social networks (e.g. Mastodon). What are your thoughts on those?

And who uses mastodon? I have never heard of a single friend even mention mastodon. ever. Decentralised might be utopia for us, the hackers, the techies... centralised however is for the masses. It's simply easier for them to be where their friends are.

Mastodon is doing decently in Japan, with well over 500,000 users, thanks to Pixiv hosting pawoo.net. But if you do not have Japanese friends then it is pretty unlikely you have friends using Mastodon, as it isn't as popular in the U.S. (~55k users). These figures disregard all servers with <1,000 users.

If you have a tight-knit group of friends, getting everyone to swap over to Mastodon (and slowly bring their friends) isn't that difficult. That's how the social effect works whenever there's a mass exodus of users from Platform A --> Platform B (like what happened with Myspace-->Facebook, Digg-->Reddit, etc.). It isn't impossible - just unlikely.

You can see some statistics here: https://mnm.social/

I don't remember where I first read it, but I liked this opinion: in the future, decentralization, even if only adopted by a techie minority, will serve as a threat of what could be, thus keeping the centralized powers in check.

Full technical and governance decentralization is a utopia based on computer science fundamentals. We need to start thinking about the constraints instead of just jumping on the decentralization bandwagon as if it were a matter of choice.

Version control is a tool made for "us, the hackers, the techies", the ideal target population. Openness and decentralisation made git the tool it is.

GitHub is a center for activities: PRs, issues, notifications. It's only a matter of convenience. To be successful you have to create an equally (or more) convenient decentralised alternative.

If GitLab or any of the other self-hostable services implements ActivityPub, then the "utopia" will be available to all users and instances.

Decentralized social networks could easily be for the masses. The main thing keeping people on centralized social networks instead of decentralized ones is that almost everyone else is currently on the centralized ones. That could change.

The phone network is federated (even if managed only by big corps), yet it has plenty of users.

Yes, but I cannot start my own network and join others ad-hock and move around as I please. I'm in multi-year contracts; heck, I couldn't even transfer my number until a few years ago, and it's still a pain in the ass to do so.


Do ad hock networks lead to fetlock-in?

A hock is a joint in the lower end of a quadruped's hind leg, between the knee and fetlock. It's in an analogous position to the ankle in humans.

Of course 'hock' can also mean 'to sell, especially to a pawn broker'.

The word you want is 'hoc' which is Latin for 'this'. You may be familiar with the 'post hoc' fallacy: which is that if event A is followed by event B, event B must therefore be caused by event A. The full name of the fallacy is 'post hoc, ergo propter hoc', which means 'after this, therefore because this'.

The Latin word 'ad' simply means 'to' or 'for'. So 'ad astra' means 'to the stars', and an 'ad hominem' argument literally directs your argument 'to the person' and their faults.

So finally we arrive at the combination 'ad hoc', which means 'to the purpose'. Ad hoc things are done in a purely pragmatic manner, for a particular purpose. They are often short-lived arrangements which end once their purpose has been met or is made moot.

Oh, and you didn't make this mistake, but I'm going to blather about it anyway. 'etc.' is short for 'et cetera', it is not spelled 'ect.', nor 'excetera', nor is the usage 'and etc.' valid. 'et cetera' means literally 'and the rest'--meaning anything, not necessarily just the Professor and Maryanne.

ooh, TIL! Thanks!

The ad-hoc I was referring to was: https://en.wikipedia.org/wiki/Wireless_ad_hoc_network

And I am aware of "etc." ;)

Me neither... it could be partially the name though. It sounds like a condom brand :)

The masses use email.

And they're 99% on Gmail, Hotmail, Yahoo Mail, probably a handful of other big email providers.

But why does it matter if a large portion of the users are leveraging a few big providers? The important aspect is that I have the ability of picking a small provider (or even starting my own) and still be able to interact with all of them.

Here's one use-case: where I work, IT is extremely bureaucratic, and it's unbelievably difficult to get something like a Gitlab server up. It would be a huge boon to us if there were some lightweight decentralized program the dev team could use to imitate Github/Gitlab without having to negotiate with IT for a central server (and the thousands of pages/decades of man-hours that entails).

The social aspect of GitHub is centralized, but that doesn't mean the data itself can't be decentralized.

Do you believe that internally GitHub is a single big server? And that this big server only has a single HDD, a single chip of RAM and a single core? Even then I'm pretty sure we could argue there's some separation there. It doesn't mean the interface isn't centralized.

Same goes for Bitcoin. We can all access the same data, yet it's decentralized.

You could just create a centralised interface to GitTorrent, just being a simple view; the only difference is that anyone could host that interface.

Decentralised Centralised Decentralised Version Control System. What fun.

I think he's trying to decentralize the centralized part of git. Makes sense?

There is no centralised part of git. He's trying to decentralise Git-Hub, which is a centralised website that has many features (issues, projects, etc.) and also acts as a hosted Git remote.

Git != Github

There is no Hub in the article.

The problem with Git decentralization is that you need a publicly accessible endpoint (meaning there is an eventual central host). Even when you are using email for patch files, your email becomes the central point of failure (even if you host that yourself). So far as fetch goes, this is why git.kernel.org exists.

Git behaves like an mp3-sharing website with multiple mirrors, whereas gittorrent behaves like magnet URLs. This protocol effectively obsoletes the requirement for remotes.

> There is no Hub in the article.

Not sure what you mean by this. The article mentions and compares itself to Github throughout.

> The problem with Git decentralization is that you need a publicly accessible endpoint (meaning there is an eventual central host).

I disagree slightly here. It's completely possible to build a network of remotes that don't all reference a single central origin (e.g. teammates referencing each other's local repos over authenticated connections, possibly on a LAN, etc.). This gets messy and is hard to administer securely, but Git is more than capable. Also, the challenges of doing this securely with Gittorrent and private repos seem similar to those of using SSH remotes between individual dev machines.

In terms of the article talking about "hubs" (Github specifically), Gittorrent purports to replace the need for Github, but only fills one of the uses of a hub. A hub like Github serves two purposes:

1. a central origin, i.e. the same use git.kernel.org serves

2. a searchable/discoverable network, e.g. the same use the central thepiratebay.org search engine serves for Torrents. Decentralising this seems like a hard problem.

I was thinking the exact same thing. Ironic.

Putting the "Hub" in "git".

So, I think there are a couple of relevant issues when talking about decentralizing a GitHub-like service. There are a couple of things GitHub provides, and my naive approach to making GitHub less essential would be to address them one by one:

* Issue tracking. It's a bit hard to export these. I imagine it'd be a bit nicer to export if issues were actually stored inside git as trees. You could easily make the issues frontend use git as the storage backend. If users don't have an account, maybe they could issue a pull-request to the issue tracker?

* Notifications. This one's a bit harder. Web-hooks?

* Merge requests - github uses refs internally (I think). That wouldn't be too hard to standardize. If somebody adds commits to their pull/merge request, then they just have to push the updated ref to the repo they're submitting to.

* Auth - this is the hardest part. GitHub provides authentication & permissions for pushing. The issue with allowing merge request submissions from anybody (which you could do with a server side hook) is that now people can DDoS you by submitting HUGE pull requests constantly. If you could make that safe, in theory you could get away without needing user account federation.

* There is also the big security issue that you can't actually use server side hooks for this stuff since libgit2 can be used to bypass it (when pushing using libgit2, it doesn't trigger server side hooks).

* Oh, and this entire time I've been thinking about the email & username as being unspoofable. But you can easily spoof them. I guess you need federated user accounts after all.
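To make the "issues stored inside git" idea a little more concrete, here is a minimal sketch of how an issue serialized as a file would get a git object ID. The issue fields and the `issue_to_blob` helper are invented for illustration. The hashing scheme (SHA-1 over a `blob <size>\0` header followed by the content) is the one git uses for blobs in SHA-1 repositories.

```python
import hashlib
import json


def git_blob_id(content: bytes) -> str:
    """Return the object ID that `git hash-object` gives this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def issue_to_blob(title: str, body: str, author: str) -> bytes:
    """Serialize a (hypothetical) issue record deterministically."""
    record = {"author": author, "body": body, "title": title}
    # sort_keys keeps the bytes stable, so the same issue always hashes the same.
    return json.dumps(record, sort_keys=True, indent=2).encode("utf-8") + b"\n"


blob = issue_to_blob("Crash on clone", "Cloning hangs at 90%.", "alice@example.com")
print(git_blob_id(blob))
```

Note that the `author` field here is exactly the kind of spoofable string the last bullet worries about; the sketch does nothing to authenticate it.
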

Edit: It's untrusted interactions that are hard to decentralize

All in all, I think that git is easily decentralized if you know you can trust all the actors. It's untrusted git that's harder to decentralize.

Regarding issues, the article mentions storing them in the repo using BugsEverywhere. I use Artemis for this since it's simpler than BugsEverywhere.

I don't think "pull requests" are necessary for such a system. Even if you like the pull request workflow (I personally find it tedious and cumbersome) it can be achieved via existing mechanisms like personal email, mailing lists, IRC, pastebin, etc. Trying to define one workflow and baking it into a protocol sounds like a recipe for bloat. Note that git already has built-in support for email.

Regarding authentication and permissions: I don't think git is any different from other mutable collection of files in this regard. For example, I push my git repos to IPFS just like any other files, and use IPNS to point at the latest version. IPNS uses keypairs to limit access, which is a pretty standard technology with known practices for things like revocation, indirection, etc.

Notifications are just another 'mutable collection of files', e.g. an RSS feed, so solvable in the same way.
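A minimal sketch of what "notifications as an RSS feed" could look like, using only the Python standard library. The `build_feed` helper, the example.org URLs and the event titles are all placeholders; publishing the resulting file (to IPFS, a web server, etc.) is left out.

```python
import xml.etree.ElementTree as ET


def build_feed(repo_name: str, events: list[tuple[str, str]]) -> str:
    """Render (title, link) pairs as a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = f"{repo_name} notifications"
    ET.SubElement(channel, "link").text = "https://example.org/" + repo_name
    ET.SubElement(channel, "description").text = "Repository activity"
    for title, link in events:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")


feed = build_feed(
    "demo",
    [("New issue: crash on clone", "https://example.org/demo/issues/1")],
)
print(feed)
```
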

When I wanted to read the SQLite source code, I was surprised to learn that the authoritative copy isn't stored in a Git repo, but rather in something called Fossil SCM (https://fossil-scm.org/). It turns out that it's an SCM built on SQLite and created specifically to store the SQLite source code. Interestingly, it has built-in issue tracking and a wiki.

I wouldn't necessarily expect people to switch from Git to Fossil. I certainly haven't. But I found it fascinating and figured others might as well.

I mean, the author is decentralizing the basic download and cloning of git. The protocol lets us do that.

Git by itself doesn't have issue tracking, wikis, etc. Those are part of Github/lab/bucket.

Should the git project seek to add this as part of the repo and protocol itself? Or should we be making new protocols and repositories for this meta-project/code data?

I haven't tried Fossil yet, but I understand it has a lot of this stuff built into its protocol/repository format.

git-ssb is better than gittorrent, as bittorrent itself relies on a global DHT, which can be and has been censored.

Thanks for sharing. I was discussing trackerless p2p social network with offline publishing and syncing when close to a friend a few months ago, I'm amazed how Scuttlebutt seems to be exactly that.

I was thinking of using high frequency sounds (inaudible) for peer discovery when in the same room, like chromecast is doing when phone and chromecast are not on the same local network, but simply scanning network seems to work just as well, apparently.

How can the DHT be censored? Is there any way to prevent it?

DHT uses UDP and is not encrypted. Go figure

What are the practical downfalls of this? Will my data be lost if the project is not popular enough? Is it possible for peers to modify the repository? Can I use access restrictions? I'm talking about the `gittorrent://81e24205d4bac8496d3e13282c90ead5045f09ea/recursers` URLs, not the GitHub based ones.

> Will my data be lost if the project is not popular enough?

P2P data access isn't about free storage (we're not all Linus ;) http://www.webcitation.org/6P8EBZqQX ): it's about decoupling data access from device access. Web addresses like `github.com/foo` make requests to a specific machine (github.com), in the hope that (a) that machine still exists and (b) it will respond with the data we were after.

Content addressing (bittorrent, ipfs, dat, etc.) lets us request the specific data we're after. If the original host is still serving that data, it can fulfil the request for us; i.e. the existing Web setup will still work. The advantages are those situations where the Web model would break down. For example if the host changes its address without a redirect, if their layout/navigation changes, or if they disappear but someone else just-so-happens to have it (even us!).

Many people focus too much on that last scenario. I think it's nice that with something like IPFS I can have a few flaky boxes serving my files, including a laptop that's often suspended/offline, and I don't need to care about failover, load balancing, etc. Plus I don't need to do anything specific to support offline usage from a local cache: those with the files can just serve themselves.
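A toy illustration of the point about requesting data rather than machines. This is a sketch only: `ContentStore` and `fetch` are made-up names, and real systems like BitTorrent or IPFS use their own hashing, chunking and peer discovery.

```python
import hashlib
from typing import Optional


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 of the data."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> Optional[bytes]:
        return self._blobs.get(address)


def fetch(address: str, peers: list[ContentStore]) -> bytes:
    """Ask each peer in turn; accept the first answer whose hash matches."""
    for peer in peers:
        data = peer.get(address)
        # The address doubles as a checksum, so untrusted peers can be verified.
        if data is not None and hashlib.sha256(data).hexdigest() == address:
            return data
    raise KeyError(address)


original_host = ContentStore()   # has since lost the file
flaky_laptop = ContentStore()    # happens to still have a copy
addr = flaky_laptop.put(b"README contents")
print(fetch(addr, [original_host, flaky_laptop]))
```

The caller never names a machine, only the hash of what it wants, which is why it doesn't matter which peer ends up answering.
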

> Will my data be lost if the project is not popular enough?

I'd like to know how that works out too. I suppose in the end it's up to you to make sure you keep a base copy of your project so it is always available, even if near zero popularity.

I can think of a few interesting approaches:

1) We set up something like the Internet Archive, spidering and caching the network, run using donated bandwidth. The hosting could be centralized like the Archive or p2p itself (donated by peers).

2) A system like FileCoin's where you pay a tiny amount of money to ensure other peers will always host your repos on the network.

3) A straight up centralized, for-profit service where you pay a company a monthly fee to host your data on the network, as people already do with GitHub itself.

Please note that FileCoin doesn't provide any guarantees. It only provides incentives for other people to host your data.

It's like paying a centralized service, except you don't have any recourse (legal or financial) if the people hosting it decide to stop.

Thanks for the correction. I guess you get a probabilistic "guarantee" in that you could see how many peers are serving your content at any given time, and make assumptions about their independence from each other and likelihood that your content will stay available?
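To put rough numbers on that: if each of n peers is online independently with probability p (a strong assumption, since peers often disappear together), the chance that at least one can serve the content is 1 - (1 - p)^n. A quick sketch, with purely illustrative values for p and n:

```python
def availability(p_online: float, n_peers: int) -> float:
    """Probability that at least one of n independent peers is online."""
    return 1.0 - (1.0 - p_online) ** n_peers


# Illustrative only: peers that are each online half of the time.
for n in (1, 3, 5, 10):
    print(n, round(availability(0.5, n), 4))
```

The independence assumption is doing most of the work here; if all your peers are paid by the same incentive and it dries up, they may all leave at once.
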

What are the features that make GitHub? What if we used only git on the command line?

Which features would we miss?

Maybe we could add those features with something similar to git, some intelligent scripts?

PRs and issues. PRs can be distributed using git appraise: https://github.com/google/git-appraise

Also each user's "home page" of notifications. When I worked exclusively on open-source projects, it became my daily "task list". It was very efficient to have it all in that UI.

I agree, notifications page is great, I also use it daily. Another thing is profile page.

Maybe they could be solved in a federated way like Mastodon?

Personally I use http://mrzv.org/software/artemis extensively and have no complaints.

For git or hg? As git support is described as "alpha quality".

For git; I think that "alpha quality" warning is due to the git support being added by a contributor and the author not actually using git themselves.

Cool, thanks for the info, definitely worth checking out.

> Which features would we miss?

There are the incidental contributors who do not necessarily have any affinity with git. For example, I've written a plugin for JOSM (OpenStreetMap editor), and a plugin for Inkscape. Both are hosted on GitHub, and any user willing to report a bug can do so there (and they do) without having to learn developer tools.

The main thing is a place to search, to scan a README without downloading anything, and (even if a little facile) GitHub Stars are a great way to get a rough idea of the size of the user-base.

> What are the features that make GitHub?

for me it's the random people browsing github by tags and opening issues on my project.

A good way to implement this is with the Dat protocol:


They already have a competent P2P twitter clone, as an example app. If it can do that it can probably do a github.

You should also look at http://pijul.org

In case you're looking for alternatives, this thread might be of some help: https://news.ycombinator.com/item?id=17241487

I don't think this project should be gittorrent -- after all, one of the reasons bittorrent got so popular is non-distributed tracker and .torrent files, which provide efficiency and authenticity.

Maybe git-donkey or git-gnutella instead?

I like the idea. I think it would be an interesting development to counter the centralization in the big5. GitSpoke?

what 'knows' the HEAD?

A possible approach to that is discussed in the article in the section under the heading "Was that actually decentralized?"

I’m waiting for gitcoin.

Freenet/Zeronet have updatable sites already, should be trivial to do with git commits, grab the latest signed update based on this name and key hash...

Overall though I think a federated system that also handles issues is better, something more akin to Fossil.

Fossil seems like one of many potential upgrades, and it's the second time I hear about it today!

However I think the main problem is almost always existing users. Git has arguably been falling behind in terms of features for quite a while now, but its momentum makes it hard for a lot of devs (including me!) to switch to a different vcs.

Fossil is nice but when you have very large repos (gigabytes) it gets kinda slow. At this point, when git is some kind of lingua franca of SCM, maybe the "best" thing would be if someone cooked up a single executable Fossil clone but with the git command line options and the git protocol.

> Git has arguably been falling behind in terms of features for quite a while now

It’s all I’ve ever really used, and I’ve never seriously considered alternatives. What features in other VCSs are interesting?

Mercurial is very similar to git, but its CLI is considered to be easier than git's (if potentially less powerful). Another difference is that mercurial heavily discourages editing of history, which some git users seem to like doing. Git is also faster than mercurial.

GNU arch was an early DVCS but didn't seem to gain much traction. It was eventually superseded by Bazaar, which is much slower than git but was better at tracking history e.g. across renames. I think git has improved in that area, and bazaar has been getting less popular.

Fossil is built on an sqlite database and has a built-in Web UI and issue tracker, which is fine for those who want that but violates the "do one thing" principle (e.g. if you wanted to use some other issue tracker).

Darcs keeps track of patches and their dependencies, unlike git and mercurial (which track snapshots and their diffs). This approach supposedly makes life easier, and makes the darcs CLI quite usable. One problem is that some of the merge algorithms can take an exponential amount of time to complete, which has been getting addressed in recent years but many users seem to jump ship to git.

Pijul is similar to darcs, but breaks compatibility in order to solve the exponential merge problems. It looks very promising, but is still very new.

I'm not expert enough to answer that, but Fossil and its "Built-in Web Interface" seems really interesting. I remember Mercurial seemed interesting back in the day, but as far as I can tell it mostly didn't take off because it's too similar to Git.

All in all, I'd just like a distributed scm that doesn't feel like it was designed to make Linux with 10,000 other people. It is... frustratingly sophisticated at times.

It could be a simple smart contract rather than an entire ICO/coin

The project's web site or RSS feed? If there's an existing mutable, publicly readable, privately writable location then it can just be put there.

My projects aren't big enough to have their own sites, but I do push my repos to IPFS and give links at http://chriswarbo.net/projects/repos (actually using IPNS, which should reduce regeneration but is still experimental and flaky ;) )

when ico?

This faux outrage at github being acquired by Microsoft is already boring.

Why are people upset that others are seeking alternatives?

If other people prefer to seek an alternative to GitHub, why is this upsetting to you?

Isn’t it a good thing that people can choose to move to other services?

ya.. microsoft, the committed mortal enemy of Open Source Software, is now holding the most important hub for Open Source Software.

What year are you living in?

To paraphrase what someone else said yesterday:

In a world where companies are expected to have growth every quarter, a project like GitHub is one bad quarter away from microsoft making major changes, abandoning it, or shutting it down completely.

As we’ve seen with product after product, when companies are expected to have growth every quarter, they will sometimes (often?) make radical changes to the software. In the case of GitHub, if microsoft decided it needed far more profit coming in from GitHub, it is entirely reasonable to expect they may decide to shift it into a service which is only usable for their large corporate partners, or they may decide the social features need to go, or they may decide to stop allowing public repos, etc...

Microsoft has shown many many times in the past that they’re more than willing to throw out open standards and attempt to force adoption of their own proprietary standards.

It is entirely reasonable to be at least a tad skeptical given their history and their business model.

Time heals all wounds, but calling Linux and GPL a "cancer", that's pretty harsh.
