We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.
The disadvantage here is that any given file in the repo could be stored in an ever-increasing number of packfiles. Each existing version of the repo will generate a new packfile to get it to the newest version, and it's up to the authenticated masters to generate and seed each of those packfiles, while peers either do or do not cache this replicated data. In short, this means of syndicating updates ignores the Merkle-DAG-ness (DAG-osity?) of Git.
I just checked this out thanks to your link. Awesome. ipfs solves a whole class of problems - distributed git repos being just one of them. Great stuff.
I believe that these technologies are bound to wither not because of technical problems but because of "political" problems. These technologies are a huge problem with regard to how intellectual property is managed today and how it is monetized.
Every IP owner will try to slow down the progress of these technologies, mainly by not adopting them. I think these technologies won't be adopted until the IP monetization problem is solved.
To me, this means that if the media are to be easily decentralized and shared, the monetization (how you get money for your work and how you give money when you benefit from it) must become equally easily decentralized and shared (between the publishers, authors, etc.).
Creative Commons cense is important, that is why. Research and development can scale expression if we look for new ideas before old fear. #CreativeCommonsCense? (#CCC?)
I agree it is really stellar. The billion dollar concept for me though is encrypted repo torrents. Imagine a group of servers that are hosting encrypted chunks which form the basis of a homomorphic encryption protocol for distribution using forward error correction to allow recovery of the deltas if n of m components of that delta can be recovered. Basically if you have the key you can pull out of this amorphous cloud your source code, and if you don't have the key you won't even know it is out there.
I started building a toy version of this about 5 years ago but got distracted by work. Essentially the repo key encrypted the packfile, and the storage reliability layer used another key of its own to encrypt the chunks. The latter key would find the chunks, with enough reliability to re-create the encrypted packfile, which the former (repo) key could then decrypt and apply to your repo.
A very fun problem in distributed systems and data structures.
I've always thought this would be amazing for file storage in general. A Dropbox that just syncs with the cloud encrypting your content with a private key and you can withdraw it at any time. You simply give back say 5 x as much storage as you use for the data mirroring.
AFAIK this was the concept behind Wuala. Founded in Switzerland, later bought by French company LaCie. According to their website the client-side encryption is still there; however, they discontinued the collaborative disk space business model.
We're trying to recreate that concept with Peergos (https://github.com/ianopolous/Peergos), and we're considering using ipfs (http://ipfs.io) under the hood, which would be nice. Wuala was great, but they never open sourced it and stopped the storage earning in 2008.
> The un-updateability of torrents is something that seems to seriously limit its use.
I am not involved with the development of torrents at all but (please bear with me until the end) my initial reaction is that we should think of the lack of ability of torrents to update as a feature and not as a bug.
Perhaps if the ability of torrents to update is a concern, then it warrants a new peer-to-peer protocol? (Please note that this is not a case of http://xkcd.com/927/?cmpid=pscau as I am not advocating a new protocol for every use case.)
It seems like we can sign torrent files with gpg keys. Perhaps I am wrong. Perhaps, we can allow updating in torrents if we require that the updates be signed with the same private key as the original torrent? Am I barking up the right tree here?
Edit: Oops. I edited this post before I saw the reply about BEP-0039. Apologies.
Updatability does not need to imply mutability. It could be possible to have a torrent with 50 files, and then the torrent is updated with a new file, and a client that has already downloaded the old torrent will only need to download one new file. And another client that still wants the old torrent would still get the same 50 files, but would be able to download from clients that have the new torrent. (Similarly, clients downloading the new torrent could download the first 50 files from clients seeding the old one.) That's updatability without mutability.
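A rough way to picture this, as a sketch only: if each file is identified by a hash of its content, then the "new" torrent is just a larger set of the same identifiers, and a client can work out what it still lacks. The manifest layout, the choice of SHA-256, and the names `manifest` and `missing` are assumptions made for illustration; this is not how BitTorrent identifies content today.

```python
import hashlib


def file_id(content: bytes) -> str:
    """Identify a file purely by its content (assumption: SHA-256)."""
    return hashlib.sha256(content).hexdigest()


def manifest(files: dict) -> dict:
    """Map each file name to its content identifier."""
    return {name: file_id(data) for name, data in files.items()}


def missing(have: dict, want: dict) -> list:
    """Names in `want` whose content is not already present in `have`."""
    have_ids = set(have.values())
    return sorted(name for name, fid in want.items() if fid not in have_ids)


# The original 50-file torrent, and an update that adds one file.
old_files = {f"file{i:02d}.bin": f"payload {i}".encode() for i in range(50)}
new_files = dict(old_files)
new_files["file50.bin"] = b"payload 50"

old_manifest = manifest(old_files)
new_manifest = manifest(new_files)

# A client holding the old torrent only needs the one new file.
to_fetch = missing(old_manifest, new_manifest)
print(to_fetch)
```

Going the other way, `missing(new_manifest, old_manifest)` is empty, which is the sense in which a peer with the new torrent could serve someone who still wants the old one.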
Follow-up question: does BitTorrent work this way currently? If I take your 50-file torrent, add one file and re-seed, will you be seeding the original 50 files in the same swarm as my 51 file torrent?
I regularly download mame updates by pointing the updated torrent at my ROMs directory. All changes are redownloaded, and new ROMs are grabbed.
Edit: just realized I misread the question. No. In my case, the previous mame romset swarm is usually abandoned, and the new torrent takes all the traffic. The swarms are unique.
What I described would be the ideal situation. I don't know of any existing P2P system that works like that, unless it's a system for sharing individual files instead of "torrents" containing multiple files.
So basically, git over IP? Each "update" is just a commit containing the modifications since the last one. If you want the most recent, ask for the HEAD.
The article goes into why that's not really an option. Making a separate request for each blob/file/commit/whatever would be way too slow for large repos.
I was more commenting that the very updatability you were discussing is pretty much directly analogous to what git is in the first place. So now we're building a system that has the same functions as git, but isn't git.
Great to have. Alas, it relies on a central system of record; it is just a feed that can be re-fetched. The 'originator' key is probably worth standardizing around for any future approaches, but using PEX or the DHT to announce the most recent magnet URI or whatnot.
Simply signaling new Magnet URIs would have the disadvantage Gittorrent sought to avoid: resyndicating the entire contents with every single change, a killer for things like the Linux kernel or projects like Debtorrent. Git's merkle-dag avoids this problem and allows multiple concurrent versions to share the bulk of the content-indexing, and best-of-all-worlds solutions would preserve this capability.
bittorrent sync is actually very much like this. People can offer read only subscriptions to their repositories, and then everyone distributes that repository to everyone else bittorrent style. So you have multiple read-writers who can publish and update the repository and multiple readers who can just subscribe to it and help to distribute.
So an example where this technology could be put to a unique use: Minecraft streamers and Let's Players sometimes like to distribute the world they are using. So they could make a repository of their world and distribute the read-only keys for it to other users. This would allow them to play it, even temporarily make changes because sync kicks in and refreshes it, and the repository would be kept current as the world progresses. That should be viable right now with Sync.
Problem is, I think the bittorrent foundation is doing their damnedest to keep themselves firmly planted in the distribution and ownership of the technology. So we won't see an explosion of third party clients. I don't think it will see much adoption for this reason, and that's a real shame, because it would be a wonderful bit of kit for the internet.
Peers don't have to seed packfiles the way we're used to for "traditional" seeding of movies or music; these packfiles actually represent the transition from one commit to another (instead of "all content from beginning to now"), so they are inherently ephemeral. They don't even need to be kept on disk, because they will be generated on the fly every time a client is interested.
The DAG-osity of git really helps here (because you only have to transfer what's really needed), and the "immutability" of git helps because if your project is popular and you update your branch, everyone will want to go from the old commit to the new commit, so everyone can share the diff between them directly.
> It surprised me that nothing like this seems to exist already in the decentralization community.
There was a GSoC project in 2013 which did exactly this, using Freenet as decentralized storage backend with a Web of Trust for Spam resistant and updatable identities (note that in the gittorrent scheme once someone claimed a username, that username will stick there forever). It works and compared to GitTorrent it adds anonymity and upload-and-run.
Major fan of this idea. But how does one address the GUI challenges presented by leaving GitHub behind? It can't be overstated that GitHub provides an amazing (communal/social) user experience.
You're right, of course. This is just a first step.
One interesting followup idea might be that the BitTorrent library I'm using, webtorrent, also works in browsers over WebRTC. But I'm not using that because I wouldn't know what to do with a git cloned repo inside of a browser tab. Maybe someone else will though. :)
GitHub provides: - Repo hosting, - Search, - Community
In comparison to decentralized Search and Community, decentralized file storage is easy. Conveniently, centralized repo hosting is the biggest problem. Not being able to Search / Comment / Report a bug during a DDOS decreases productivity, but not being able to push commits / run CI tools is a productivity halt.
The best next move might be to focus on decentralized repository hosting, solve that well, and allow users to conveniently mirror the GitTorrent repos on GitHub, giving the best of both worlds until Search and Community can also be solved well.
This may mean GitTorrent would need some form of post push hooks (i.e. to update mirrors or run CI). Which I'm sure is doable.
Hmm, decentralized search here should probably just use a Kademlia-like implementation (http://en.wikipedia.org/wiki/Kademlia) where a 'XOR metric' is used to measure distance between nodes. (That way the max number of lookups will be log2(n) where n is the number of nodes (with further optimizations possible))
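For readers who haven't met the XOR metric, here is a small sketch of the idea. It only shows the distance function and a brute-force "closest nodes" query over a list held in memory; a real Kademlia node keeps k-buckets and finds these nodes through iterative network lookups. The node names, the use of SHA-1 to derive 160-bit IDs, and the helper names are assumptions for illustration.

```python
import hashlib
import math


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (assumption: SHA-1 of the name)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two IDs is their bitwise XOR."""
    return a ^ b


def bucket_index(a: int, b: int) -> int:
    """Position of the highest differing bit; -1 when the IDs are equal."""
    return (a ^ b).bit_length() - 1


def closest(nodes, target, k=3):
    """Brute-force stand-in for a lookup: the k nodes nearest to target."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]


nodes = [node_id(f"node-{i}") for i in range(1000)]
target = node_id("search term: gittorrent")
nearest = closest(nodes, target)

# With 1000 nodes, the log2(n) bound mentioned above is about 10 hops.
hops_bound = math.ceil(math.log2(len(nodes)))
print(hops_bound)
```

The useful property is that each hop of a lookup can at least halve the remaining XOR distance to the target, which is where the logarithmic bound comes from.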
Obviously someone would need to build a user-friendly interface for all that, etc.
What about using git notes (`man git-notes`) for tracking issues, comments, etc? They are stored as git objects (right?) and could be used for this task?
I thought about using git notes for this and didn't see how it is better than adding issues, comments and wiki inside the repo itself. We are used to putting documentation and tests alongside the code in our repo, so why not add wiki and issues?
You can make an excellent case for that: this would require documentation and tests to be up-to-date before a commit would be accepted by whoever maintains the repo.
It's the same old problem of trying to build a peer-to-peer social network. How do you ensure that large files are distributed correctly and quickly with minimal security implications in an environment where nodes are constantly joining and leaving the network? Perhaps it's possible, but if there were an easy way of doing it, there would be more of that sort of thing around.
> How do you ensure that large files are distributed correctly and quickly with minimal security implications in an environment where nodes are constantly joining and leaving the network?
The owner of a project will keep a node always online to fight its own churn. There are no big files in the case of an issue tracker. This is not a video social network. Also, video social networks do work, e.g. private trackers; the only thing is that they use the web to publish magnet links, which can be done over a DHT, cf. PPSP and Tribler.
I guess the projects that can't be on github won't mind the GUI challenges as long as they have some way to have a central repository without having to maintain a server on their own.
Do the same thing you did for the Gittorrent. A GUI client that runs a torrent of a php file that connects to a torrent of a database file. You just need to always be connecting to the latest and greatest.
The post mentions using the blockchain for unique username registration and mapping to public key hashes, and as it turns out there's a project I and others have been working on that does exactly this called Blockstore.
The way it works is there's a mapping between a unique name and a hash in the blockchain, and then there's a mapping in a DHT from the hash to the data to be associated (which can be a plain old public key and can also be a JSON file that references a public key and other identity information).
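The two-step lookup described above can be sketched with two dictionaries standing in for the blockchain and the DHT. This is not Blockstore's API: the function names, the JSON profile layout, and the use of SHA-256 are assumptions chosen only to show why the indirection is safe, namely that the data fetched from the DHT can be checked against the hash recorded under the name.

```python
import hashlib
import json

name_to_hash = {}  # stand-in for the blockchain: unique name -> hash
hash_to_data = {}  # stand-in for the DHT: hash -> serialized profile


def register(name: str, profile: dict) -> str:
    """Claim a name and publish the profile it points to."""
    if name in name_to_hash:
        raise ValueError(f"name already registered: {name}")
    blob = json.dumps(profile, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()
    name_to_hash[name] = digest
    hash_to_data[digest] = blob
    return digest


def resolve(name: str) -> dict:
    """Look up the hash by name, fetch the data, and verify it."""
    digest = name_to_hash[name]
    blob = hash_to_data[digest]
    if hashlib.sha256(blob).hexdigest() != digest:
        raise ValueError("data from the DHT does not match the recorded hash")
    return json.loads(blob)


digest = register("alice", {"pubkey": "hypothetical-public-key"})
profile = resolve("alice")
print(profile["pubkey"])
```

Because only the short hash has to live in the expensive, append-only store, the profile itself can be as large as the DHT allows.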
That's great, thanks! I should just use this (preferably with the DHT I'm already using to look up Git commits) instead of reimplementing myself.
What do you think about the idea of making pluggable modules to connect Blockstore with web frameworks (Django, Rails), without the framework/website authors having to get involved in understanding Bitcoin themselves?
As far as Django goes, Blockstore is on PyPI so you can just install and import the library.
I think it'd be great to have modules for Rails, Node, etc.
I saw your tweet and I'll shoot you an email. Also feel free to open an issue on github.com/namesystem/blockstore and we can discuss the idea openly there.
The remaining hard part is the adapter layer to enable the extra applications such as the issue tracker to use the Git repository for storage. Joey Hess has a good article about Git "databranches" here: https://joeyh.name/blog/entry/databranches/
A very interesting idea, GitTorrent, but I have one question which comes to me whenever I read about a delta-based distribution scheme: who is going to generate and share all those deltas?
Some Linux distributions have experimented with delta-based package repositories, examples are deltup for Arch Linux and rpm-delta for RPM-based distros. Some of the known issues are:
- choosing the number and spacing between deltas. Fine-grained deltas require more storage space, coarse-grained deltas require more download bandwidth.
- retiring old deltas: periodically deleting all deltas older than a certain version, replacing them with the full package of that version. Again a trade-off between storage space and download bandwidth.
For Git repositories, this would roughly translate to:
- choosing the number, history spacing, and size of the Git packs per repository.
- retiring old Git packs: periodically deleting Git packs older than a particular revision, replacing them with a bare repository at that revision.
This isn't quite how I would describe it. Git does do delta compression in packfiles, but the fundamental primitives lack deltas. It's just:
* The contents of this directory is this list of files whose contents have these SHAs.
This is called a "tree".
A tree is itself an object with a SHA, and can appear in another tree.
To see this for yourself, in any git repository run `git cat-file -p HEAD`. You'll see the (more or less) raw commit object for HEAD, which will point at a tree SHA. To see the contents of that tree-sha, run `git cat-file -p <the tree SHA>`. That tree object has a one-to-one correspondence with what you'll see on-disk in the objects directory, (if the object has not been put in a pack file).
Above I have more or less fully described the contents of the files found in `.git/objects`.
The delta'ing doesn't happen until later, if and when packfiles are constructed. But they're just a storage/bandwidth optimisation. AFAICT, these deltas have nothing to do with what you might think of as "git diff", which is just some fancy porcelain which looks at objects.
The nice property of the construction is that given a large tree, even if nested, if you change a single file in that tree, you will only change as many trees as the file is deep in the tree, so computing changes between two nearby trees can usually be done quickly.
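To make the "only as many trees as the file is deep" point concrete, here is a sketch that computes blob and tree IDs the way Git does for loose objects: SHA-1 over a `<type> <size>\0` header followed by the body. The directory layout and the helper names are made up for the example, and only plain files (mode `100644`) and subdirectories (mode `40000`) are handled.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 of '<kind> <size>\\0' + body, as Git computes object IDs."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    return git_object_id("blob", content)


def tree_id(entries) -> str:
    """entries: (mode, name, hex_sha) tuples; directories use mode '40000'."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders directory names as if they ended with '/'.
        return (name + "/" if mode == "40000" else name).encode()

    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=sort_key)
    )
    return git_object_id("tree", body)


def build(a_content: bytes) -> dict:
    """Hypothetical repo: src/lib/a.txt and docs/README."""
    lib = tree_id([("100644", "a.txt", blob_id(a_content))])
    src = tree_id([("40000", "lib", lib)])
    docs = tree_id([("100644", "README", blob_id(b"docs\n"))])
    root = tree_id([("40000", "src", src), ("40000", "docs", docs)])
    return {"lib": lib, "src": src, "docs": docs, "root": root}


before = build(b"v1\n")
after = build(b"v2\n")
changed = sorted(name for name in before if before[name] != after[name])
print(changed)
```

Editing `src/lib/a.txt` changes the `lib`, `src` and root trees, while the `docs` tree keeps its ID and never needs to be re-sent.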
The problem is not with deltas between revisions (i.e. the commits themselves) but with the Git packs spanning multiple revisions. At the time of a `git pull`, a user's repository can be at any revision between initial and latest. Who is going to seed (and keep seeding) packs for all of those possible revision intervals?
My impression was that peers are generating the deltas on-the-fly based on which commits the requesting peer states it needs. The problem's therefore shoved onto git itself, with the seeder just cherry-picking a specific range of commits from its own copy of the repo and bundling them together.
Not quite: the peer generates the pack and tells you its hash, and then you query the network for anyone who has that hash (them, for starters), and perform a swarming download of it. So git clones of popular repositories would usually swarm.
The probability of swarming would be influenced by multiple factors, eg
* Higher popularity => More peers => Higher probability that multiple peers want the same packfiles.
* Higher popularity => More commits => More permutations of packfiles => Lower probability that multiple peers want the same packfiles (and stronger trends toward small/inefficient packfiles).
* More frequent synchronizations (peers always online) => More immediacy => Smaller packfiles => Higher probability that multiple peers want the same packfiles.
* Less frequent synchronizations (peers go offline regularly) => Less immediacy => Bigger packfiles => Lower probability that multiple peers want the same packfiles.
It would be really interesting to see how these competing pressures play out (either by doing some math or randomized experiments).
If the main goal here is strictly decentralization (without concern for performance or availability[F1]), then one might look at swarming as a nice-to-have behavior which only happens in some favorable circumstances. However, by latching onto the "torrent" brand, I think you set up some expectations for swarming/performance/availability.
([F1] Availability: If Seed-1 recommends a certain packfile, then the only peer which is guaranteed to have that packfile is Seed-1 -- even if there are many seeds with a full git history. If Seed-1 goes offline while transmitting that packfile, how could a leech continue the download from Seed-2? The #seeds wouldn't intuitively describe the reliability of the swarm... unless one adds some special-case logic to recover from unresolvable packfiles.)
---
Could this be mitigated with some constraints on how peers delineate packfiles?
> Could this be mitigated with some constraints on how peers delineate packfiles?
YAGNI.
Like so many here, you have a single view of how bittorrent should be used, based on current filesharing practices, so you believe we need to map gittorrent to filesharing and have those packfiles be as static as possible in order to be shared at large.
You need to go back to the root of the problem, which is simple: there is a resource you're interested in, and instead of getting this resource from a single machine and clogging their DSL line, you want to get this resource from as many machines as possible to make better use of the network.
How does gittorrent work?
- The project owner commits and updates a special key in the DHT that says "for this repo, HEAD is currently at 5fbfea8de70ddc686dafdd24b690893f98eb9475"
- You're interested in said repo, so you query the DHT and you know that HEAD is at 5fbfea8de70ddc686dafdd24b690893f98eb9475
- Now you ask each peer who has 5fbfea8de70ddc686dafdd24b690893f98eb9475 for their content
- Each peer builds the diff packfile and sends it through bittorrent. Technically it's another swarm with another infohash, but you don't care; it's only ephemeral anyway. The real swarm is 5fbfea8de70ddc686dafdd24b690893f98eb9475.
Because of this, higher popularity will mean more peers in the swarm, whatever the actual packfile to be exchanged is. Bittorrent the way you know it is not used as-is, because there is information specific to gittorrent that helps make a better use of it.
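The steps above can be modeled in a few lines. This is a toy, in-memory sketch, not GitTorrent's code: the dictionaries stand in for the DHT, commit names like `c1` stand in for SHAs, history is assumed to be linear, and every name in it is hypothetical. The point it illustrates is that peers are found by the commit they hold, regardless of which pack each client ends up needing.

```python
dht_heads = {}  # repo key -> commit SHA that the owner last published
swarm = {}      # commit SHA -> names of peers that announced they have it


class Peer:
    def __init__(self, name, history):
        self.name = name
        self.history = list(history)  # commit SHAs, oldest first

    def announce(self):
        """Tell the swarm index which commits this peer can serve."""
        for sha in self.history:
            swarm.setdefault(sha, set()).add(self.name)

    def build_pack(self, have, want):
        """Return the commits after `have`, up to and including `want`."""
        start = self.history.index(have) + 1 if have in self.history else 0
        end = self.history.index(want) + 1
        return self.history[start:end]


def fetch(repo, have, peers):
    """Look up HEAD, find the peers holding it, and ask each for a pack."""
    head = dht_heads[repo]
    holders = sorted(swarm.get(head, set()))
    packs = {name: peers[name].build_pack(have, head) for name in holders}
    return head, packs


history = ["c1", "c2", "c3"]
peers = {n: Peer(n, history) for n in ("owner", "mirror-a", "mirror-b")}
for peer in peers.values():
    peer.announce()
dht_heads["example/repo"] = "c3"  # the owner publishes the new HEAD

# One client is at c1, another is cloning from scratch.
head, packs_from_c1 = fetch("example/repo", "c1", peers)
_, packs_from_scratch = fetch("example/repo", None, peers)
print(sorted(packs_from_c1))
```

Both clients are served by the same three peers even though the packs they receive differ, which is the sense in which the "real swarm" is the HEAD commit.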
Great comment, thank you. But I think the infohash should actually be shared, packfiles are pretty deterministic in practice. So you'd be getting the diff packfile from the person who just made it, and anyone else who already did.
(If I find packfile generation to not be deterministic enough, I think I'll switch to using a custom packfile generation that is always deterministic.)
GitLab CEO here, thanks for mentioning us as the open source alternative. We think in the short term multiple organizations hosting their own GitLab is the way to go. It is hard to do issues and pull/merge requests in a decentralized way (the OP is impressive but it shows distributed git instead of distributed GitHub). I would like to see federated merge requests http://feedback.gitlab.com/forums/176466-general/suggestions...
> It is hard to do issues and pull/merge requests in a decentralized way
...Over the web...
I mean, Git and Linux are developed via email lists, which is a decentralized way of sending pull/merge requests, isn't it? I guess you could argue the mailing list is hosted on a server, fine. So then fine, usenet.
Yeah, email and nntp are old crufty technologies and there are obviously advantages to having a web based interface and a central place to go for a project. But git itself certainly supports a decentralized mechanism for merging.
So then the question really is, how do you decentralize a web interface, isn't it? The dvcs itself isn't the problem.
Why would you want to have a web interface? The web stack is crufty as hell, and the web has the one major problem that it forces everyone to use the same UI for accessing a service. With email, for example, clueless users can get by with some simple webmail interface, but that does not prevent me from managing my mails efficiently with mutt. The standardization of UIs instead of protocols and data formats is one of the worst regressions in IT in recent years, IMO.
Wouldn't a decentralized web interface be webmail (i.e. Gmail)? Then along the Gmail theme, a plug in could make things look much like a static website. Where "conversation view" becomes a "repository view".
Yeah, decentralizing the web interface is hard. Maybe it is the wrong question. Maybe it is, how can you use usenet to get an overview of issues and do code reviews.
Not just federated merge requests, but federated one-click cloning. Given a "home" GitLab instance, you should be able to click a button on any GitLab repository (or on repositories elsewhere, based on some spec), and end up with a clone of that repository in your "home" instance.
That should be possible via browser support for site-based URL handlers; for browsers without such support, you could also have a URL on the main GitLab site that people can register with to redirect to their "home" instance.
You'd be logged into your "home" instance, which gives you the permissions you need. You click a link on the remote site, which takes you (either directly via URL handler or indirectly via central site redirect) to your home instance with appropriate values filled in, and prompts you to confirm (to avoid CSRF).
That sounds like federated two-click cloning, not federated one-click cloning. Maybe you could batch up the confirmations and get federated 1.1-click cloning?
Yes, exactly. Use URLs like gitfork:https://... and register a handler for the "gitfork" scheme. (The site could also register a handler for git:// , but that won't help for http or https, and git:// isn't secure, so I'd suggest avoiding that.) As a backup, also provide links to some central domain that allows people to configure where they'd like to redirect it to.
Agreed that the argument is a bit weak, but we would still end up with a major centralized repository for a decentralized protocol. And the changes to make Gitlab more like the proposal would probably be more work than just making the proposal a new project.
I think we would end up with multiple major repositories and many smaller ones. One of the first steps towards more decentralization would be an issue tracker that doesn't use numbers but hashes and that stores its content in a git repo (like the wiki already does). EDIT: I noticed in the comments that Fossil already has this, interesting http://fossil-scm.org/index.html/tktview/3711fc7cfd21d7f8684...
I need to go more in-depth on the proposal. But the first thing that strikes me is this: if you're going to use the Blockchain (with a capital "B") as storage of usernames and such, why not use Namecoin? It has the process for name consensus down. Also, it won't pollute the main Bitcoin blockchain.
I have a mild bias against altcoins, and have heard bad things about Namecoin in particular: that the anti-spam incentives aren't good, leading to illegal files stored in the blockchain itself, and that there's no compact representation (like Bitcoin's Simplified Payment Verification) for determining whether a claimed name is valid without consulting a full history.
As I understand it, these two design flaws combine to mean that you have to store some very illegal files to use a namecoin resolver, which doesn't sound good to me. (I may be mistaken, since the bad things I heard about Namecoin came from Bitcoin people..)
There aren't any inherent protections against storing illegal data in the Bitcoin blockchain, either (short of evading the issue by using a thin client, I suppose). The nature of what is illegal being so encompassing, in that you can ultimately encode the information into some integer of sorts either way, means it's impractical or unnecessary to provide full protection for such things.
No inherent protections, just different incentives. Storing a 4MB image at 80 bytes per $0.08 OP_RETURN transaction would cost you $4000. Why would you do that when you can use namecoin for practically free?
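The $4000 figure follows directly from the numbers given, reading "4MB" as 4,000,000 bytes. A short sketch of the arithmetic, using integer cents to avoid floating-point rounding (the function name and defaults are just for this example):

```python
def op_return_cost(size_bytes, payload_bytes=80, fee_cents=8):
    """Transactions needed, and total fee in cents, to store size_bytes."""
    transactions = -(-size_bytes // payload_bytes)  # ceiling division
    return transactions, transactions * fee_cents


txs, cents = op_return_cost(4_000_000)
print(txs, cents / 100)
```

That is 50,000 transactions at $0.08 each. Reading "4MB" as 4 MiB (4,194,304 bytes) instead gives 52,429 transactions and a slightly higher total, so the order of magnitude holds either way.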
Which makes the statement you made that, "this currency stuff is frankly kind of uninteresting to me," kind of funny (I assume that was on purpose) because the currency stuff is exactly the thing that makes bitcoin work so well.
I made that statement because a lot of people don't realize that Bitcoin, ignoring the fact that you can trade it, has solved a fundamental consensus problem in distributed systems that we should care about and use :)
> I have a mild bias against altcoins, and have heard bad things about Namecoin in particular: that the anti-spam incentives aren't good, leading to illegal files stored in the blockchain itself, and that there's no compact representation (like Bitcoin's Simplified Payment Verification) for determining whether a claimed name is valid without consulting a full history.
Security-wise Namecoin is a weaker Blockchain, but I think in this case it's not that important. -- Are not the anti-spam measures hurting users as much as spammers in this case? Since you end up with higher-costs for what is otherwise practically free with a centralized service.
From the looks of it there is no reason that Bitcoin's Simplified Payment Verification wouldn't be usable with Namecoin either.
Right, and the notion that murder is illegal makes no sense to the universe as humans are "merely" an aggregate of matter no different than any other. Nothing makes any sense to entities not capable of reasoning, that is a pointless tautology with no relevance to the discussion.
Your analogy doesn't hold up, as (for one) we have society and plants don't. I'm glad that child porn is illegal (which is usually distributed in files), as it makes the world safer for children.
If pi goes on forever, it contains any and all representations of child porn that could exist. The first person to calculate pi that far will be breaking the law. No, that doesn't make sense, nor does any sort of information being illegal. What does make sense is to legislate the actions that can generate that data, or the misappropriation of that data.
This number contains all movies that have ever existed and ever will. However, to specify where a 700MB Harry Potter movie is in this number, you'll need at least 700MB to represent the index. So in some sense Harry Potter 'exists' in this number, but in another sense it's just a silly encoding method.
While integration between git and ipfs is good -- I'm not sure I see how this is much different from just tar-ing up a git-archive and creating a torrent/magnet (setting aside many of the other aspects of how ipfs and torrents work).
Decentralized git pull for only a given hash isn't all that interesting?
If one could pull in updates and/or push changes, that would be "decentralized git". This example is more "ipfs.io as a transport for git-releases" than "ipfs.io as a transport for git"?
Yeah! I was a Gitchain backer. The difference is that Gitchain stored the actual git commits in the blockchain, and I leave the actual commits on the hard disks of each BitTorrent seeder.
Stored github in btsync for a startup before. It actually worked ok. We stopped doing it because btsync was trivial to crash with basic fuzzing and was closed source
This is an aside but I like the author's writing style. Not only is he clear but also describes why he's doing what he's doing and how it's important to him and others. Really helps me give the ideas more thought!
> Google Code announced its shutdown a few months ago, and their rationale was explicitly along the lines of “everyone’s using GitHub anyway, so we don’t need to exist anymore”.
I mean, sorta. It was also because running a service is expensive, and containing abuse is a constant thankless treadmill.
> We’re quickly heading towards a single central service for all of the world’s source code.
Far from it? Not that a fully-decentralized system seems bad, but there are many things that aren't github. I don't even have anything of interest on github.
I have to admit that I started reading this post feeling a little snarky about the concept. However, I think Chris makes an excellent case for the concept.
This is awesome, but it centralizes on JavaScript.
It is an implementation of a standard without the standard being defined so other implementations can spring up.
Git, one could argue, is language centralized also, which is technically true. That I don't have an answer for. But I don't believe handing off so much dependence to a JavaScript application fits for me.
A C/C++ application like Git I can overlook, at least for a decade or so, but JavaScript feels like a perpetual beta/prototype only. Granted, that's my subjective feeling.
The word "centralize" in the way you're using it is very misleading. Every piece of programming in existence uses some sort of language.
What you're really stating, I think, is that this is written in Javascript and you don't like Javascript. That's totally fine, and it's your prerogative. I'm sure that, like any other piece of programming, GitTorrent can be rewritten in other languages. If Javascript bothers you that much, then do this: rewrite it in Ruby if that's what you prefer.
But please stop attacking the claim that this is "decentralised github" by claiming that this "centralises" on Javascript. It doesn't "centralise" on any language. It's just a first implementation written in Javascript.
That this implementation is uncomfortable for me is my subjective feeling, as I've mentioned several times. That I don't want to depend on JavaScript is clear. Using the word centralize the way I have, I can't see otherwise. But I'm not defaming anyone. I'm not attacking anything. I'm very motivated to help the standard advance, but I would never for a moment install this implementation. Stupid of me? Sure, maybe. But it's not an attack.
What I wanted I already got: confirmation by the OP that this is in fact a reference implementation and not the be-all and end-all.
Were that clear to start with, I wouldn't have commented other than to say, awesome! This is extremely exciting to me.
I could see this integrating really nicely with mailing lists. The commit could be sent out to the mailing list, and anyone who is interested in reviewing the code or doing a merge would already have the information they need, no blockchain required.
If a version of this with friendly name support is released, I will mirror all my active GitHub repositories there.
If someone builds on this, as discussed elsewhere in the thread, to make a decentralized service that mimics 'social' functionality such as issues and pull requests, I will strongly consider using it instead of GitHub (depending on the UI, stability, etc.).
I don't even have any particularly popular repos, so there is no real reason for anyone to care about the above, but, y'know, HN comments approving of the idea don't necessarily translate into actual interest in the product, so now you know there's at least one person in the latter category. :)
I always thought Github was open source and that you could fork it, like in the spirit of Github. I find that slightly ironic =)
What's stopping a decentralized github from suffering the same fate as the newsgroups, where the data set gets too large to handle?
I think that Github is more than just a repository, it's a community. I kinda quit Facebook and signed up to Github instead :P And if it weren't for Github I would have never touched git.
Startup idea: Create something like Github and assembly.com, but with a complete tool-set (git+vps)
I think this is a great piece to build a "pay for branch merging" way of promoting open source.. use decentralized currency and voila, automated programming.
What I miss in the article or discussion is consistency of refs. The moment your remote is a distributed one, it may have two different values for HEAD at a single point in time in different parts of the network.
If two clients push in this time, HEADs diverge even further.
So as in distributed databases, you either:
* need to acquire an exclusive lock on the repository metadata, or
* accept that your push will eventually be discarded because you did not have up-to-date metadata
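To make the second option concrete, here is a rough sketch of an optimistic ref update using compare-and-swap. Everything in it is hypothetical (the `RefStore` class, the commit names, the single in-memory dict standing in for the network); a real distributed remote would need some agreement protocol behind `compare_and_swap`, which is exactly the hard part.

```python
import threading


class RefStore:
    """Toy in-memory ref store (hypothetical); stands in for shared repo metadata."""

    def __init__(self):
        self._refs = {}
        self._lock = threading.Lock()

    def get(self, name):
        with self._lock:
            return self._refs.get(name)

    def compare_and_swap(self, name, expected, new):
        """Set the ref to `new` only if it still equals `expected`."""
        with self._lock:
            if self._refs.get(name) != expected:
                return False  # stale metadata: caller must fetch and retry
            self._refs[name] = new
            return True


store = RefStore()
store.compare_and_swap("HEAD", None, "c1")  # initial push

# Two clients read HEAD at the same time, then both try to push.
seen_by_alice = store.get("HEAD")
seen_by_bob = store.get("HEAD")
alice_ok = store.compare_and_swap("HEAD", seen_by_alice, "c2-alice")
bob_ok = store.compare_and_swap("HEAD", seen_by_bob, "c2-bob")
```

The second push is rejected rather than silently overwriting the first, which is the same behaviour as a non-fast-forward push being refused today.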
I like the idea of decentralized, but I'm not sure we need p2p as well.
It'd be interesting to see something like gitlab get some sort of federation support, where multiple instances can talk to one another (a la xmpp/smtp), so as to clone, send pull-requests, etc. across different instances.
I don't understand why so many people are in awe of this project when it seems to be based on a number of falsehoods:
> "imagine someone arguing that we can do without BitTorrent because we have FTP. We would not advocate replacing BitTorrent with FTP, and the suggestion doesn’t even make sense! First — there’s no index of which hosts have which files in FTP, so we wouldn’t know where to look for anything."
Actually there is. It's called a mirror list. Most FTP-based repositories support this.
> "And second — even if we knew who owned copies of the file we wanted, those computers aren’t going to be running an anonymous FTP server."
Except bittorrent does turn your client into a server. Many clients silently punch holes in your firewall via UPnP, so you don't always realise you're running a server, but it does still happen.
And as for anonymous FTP servers, it depends on what you mean there. If you mean anonymous access, then that's not only supported, but actually the norm. If you mean the server itself is anonymous, then it should be noted that neither github nor torrent seeding peers are anonymous either.
> "Just like Git, FTP doesn’t turn clients into servers in the way that a peer-to-peer protocol does. So that’s why Git isn’t already the decentralized GitHub — you don’t know where anything’s stored, and even if you did, those machines aren’t running Git servers that you’re allowed to talk to. I think we can fix that."
Hang on, a moment ago you _didn't_ want to run servers. Now you're complaining that git clones aren't servers?
-----
Then there's the matter of the github competitors, of which there are many. gitlab, gitbucket, etc. Some open source, some closed but free, but all of them largely offer the same features as github.
These days it seems trendy to use bittorrent as a bootstrap for all kinds of wacky and wonderful problems, but using bittorrent for a protocol that's already distributed and already pretty saturated with github competitors; well it just seems redundant.
I think your complaints are disingenuous. [edit: what I mean is that you've pointed out problems with small components of the overall article and used them as an argument against the article as a whole, which I think is unfair. I don't think the components' validity affects the overall idea.]
His argument may have some holes, and people are probably mostly ignorant of those holes so they can't critique them, but I don't think they're in awe of the idea because their awe hasn't been dispelled by identifying those holes.
This is a very neat idea, and none of the issues with the setup argument dispel that.
"using bittorrent for a protocol that's already distributed and already pretty saturated with github competitors; well it just seems redundant."
No, no, it doesn't seem redundant. DHT-based distributed indexing is fundamentally different from a mirror list for files in FTP or a series of GitHub clones. It's owner-agnostic. It just exists, by virtue of having participants, with no overhead. I don't have to select a target host or find a server or even identify where my particular file (or git repository, or username, or whatever) lives. It's... unification. It's elegant, it reduces complexity, and it makes the whole ecosystem simpler.
Maybe I'm blinded by my own awe, but, I love this idea.
Also, yes, some people have thought of components of this before, but I haven't really seen the full stack laid out vertically like this, combined with a narrative that makes me so excited about it.
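For anyone wondering what "owner-agnostic" looks like mechanically, here is a toy sketch of the Kademlia-style XOR distance that BitTorrent's DHT is built around. The node names and the `closest_nodes` helper are made up for illustration, and a real DHT does iterative network lookups rather than sorting a local list.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (real nodes pick or derive IDs differently)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: int, nodes: list, k: int = 2) -> list:
    """Return the k nodes whose IDs are nearest to `key` by XOR distance."""
    return sorted(nodes, key=lambda n: node_id(n) ^ key)[:k]


# Hypothetical participants; nobody "owns" the index.
nodes = ["alice", "bob", "carol", "dave"]
key = node_id("some-repo")
responsible = closest_nodes(key, nodes)
```

Every participant that runs the same calculation arrives at the same answer, so there is no mirror list to maintain and no host to pick by hand.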
Thank you for taking the time to respond to my comments :)
There's a few more concerns I have:
1) The whole point of git is versioning, and having a model like this makes versioning several orders of magnitude harder.
2) If I'm pulling from repositories to install on servers, I'd rather grab them from known trusted sources rather than "the anonymous ether"
I'm normally really receptive to new distribution models, so I don't mean to be negative for negativity's sake. But I'm struggling to see the practical upsides of this.
The "anonymous ether" isn't dangerous if you have some way of verifying what they're sending you, and with bittorrent, you do, since you request the content using its hash.
Collision attacks are not really a problem, since they only happen when the attacker gets to specify the hash, which wouldn't be the case here.
Generating a file that hashes to an existing hash is called a Preimage attack, and SHA-1 (the algorithm used by bittorrent) isn't, for now and as far as we know, vulnerable to any.
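As a minimal sketch of that verification step (the `verify_piece` name and the sample bytes are just for illustration, and a real client checks each piece against the hashes in the torrent metadata rather than one whole blob):

```python
import hashlib


def verify_piece(data: bytes, expected_sha1_hex: str) -> bool:
    """Accept data from an untrusted peer only if it hashes to what we asked for."""
    return hashlib.sha1(data).hexdigest() == expected_sha1_hex.lower()


# The hash comes from a source we trust; the bytes can come from anyone.
good = b"packfile bytes from the publisher"
expected = hashlib.sha1(good).hexdigest()

accepted = verify_piece(good, expected)
rejected = verify_piece(good + b" plus a backdoor", expected)
```

The peer never gets to choose the hash, which is why a collision attack doesn't help them here; they would need a second preimage.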
Because anything that's vulnerable to collision attacks is theoretically vulnerable to preimage attacks. Where I went wrong was assuming that preimage attacks were practical, but as you've rightly said, there have been no known exploits because of their extreme difficulty.
So it's one of those situations where everyone was right: it's so impractical to exploit that it's as good as not vulnerable even though it's mathematically possible.
It's clear, though, that there are aspects of 'distributed' that Git does not accomplish. This gives you a global namespace and a global identity, just like a website, without a single point of failure, a corporate interest, or a closed source system, and with a trivial option for forking the whole structure into your own [friend group|company|private cluster|etc].
An argument could be made that git ought to be augmented with something like this: why run distributed protocol A on distributed protocol B; maybe we should just run distributed protocol AB from the get-go?
Sweet as hell. My only quibble is the idea of using the blockchain for validated naming. I think it'd end up in a landgrab which is nasty.
As much as we all hate DNS, having the ability to kick a squatter off someone's name is probably a good thing. Personally I don't think having a crazy hash for identification is a bad thing. Rather, what you need to do is just have some sort of reasonable personal contact book so you only have to deal with the crazy hash once (when you decide you want to remember that someone is who they said they were).
These are just usernames, not trade marked domain names. What did you do when someone got to gmail/twitter/whatever before you did and picked the name you liked? You just picked a different name, or added some digits to the end of the one you liked. You could do the same thing here.
My experience has shown that having names be treated as unique identifiers is a hairy situation. And frankly, unique identifiers in a system for distributing source code are as important as domain names if not way more so.
Domain names only ever run code in an ostensibly sandboxed vm.
Unlike domain names they're immutable. So if it's good now, it's good as long as the private key is kept safe.
If you want to cross reference the identity with an email address you could use a keyserver. If you want to cross reference the identity with a domain name, you could use a TXT record.
Hmm. How about mixing and matching, like by having a DNS record with a special TXT record syntax, e.g. 'IN TXT "gt=yoursha256hash"'. That way you could find your destination via DNS and verify the hash in the TXT record against the blockchain.
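A rough sketch of the checking side of that idea, assuming the `gt=` prefix from the example above and a 64-character hex SHA-256 value. The function names are made up, and the actual DNS lookup and blockchain query are left out; this only shows parsing the TXT value and comparing it to the registered hash.

```python
import re

# Assumed record syntax: gt=<64 hex chars>, following the example above.
GT_PATTERN = re.compile(r"^gt=([0-9a-f]{64})$")


def parse_gt_record(txt_value: str):
    """Extract the hash from a TXT record value, or return None if malformed."""
    cleaned = txt_value.strip().strip('"').lower()
    match = GT_PATTERN.match(cleaned)
    return match.group(1) if match else None


def matches_registry(txt_value: str, registered_hash: str) -> bool:
    """True only if the TXT record parses and agrees with the registered hash."""
    parsed = parse_gt_record(txt_value)
    return parsed is not None and parsed == registered_hash.lower()


example_hash = "ab" * 32  # placeholder value, not a real key hash
ok = matches_registry('"gt=' + example_hash + '"', example_hash)
```

If the two disagree you'd treat the DNS record as stale or hijacked and refuse to proceed.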
I don't really understand the landgrab complaint: this costs $0.08 per username registered ($0.16 if you want to avoid races with miners), in comparison to centralized sites, which usually cost $0 per username registered. Why does this lose?
Centralization is about equivalent to ICANN being able to kick off a squatter. A truly decentralized system with first to register doesn't have a way to kick off squatters.
Well, the current ones don't, but they could. Decentralized networks just need decentralized decision-making, i.e., you could have a process by which the majority of the nodes decide to kick off a squatter.
It's a really good question and hopefully a good solution can be implemented. I always thought namecoin was onto something because whoever was willing to spend the most (run the most miners) got to control transactions, so market forces prevented excessive squatting.
Of course that has the downside that large entities grab all the good names and the little people are left with the leftovers.
We ask for the commit we want and connect to a node with BitTorrent, but once connected we conduct this Smart Protocol negotiation in an overlay connection on top of the BitTorrent wire protocol, in what’s called a BitTorrent Extension. Then the remote node makes us a packfile and tells us the hash of that packfile, and then we start downloading that packfile from it and any other nodes who are seeding it using Standard BitTorrent.
The disadvantage here is that any given file in the repo could be stored in an ever-increasing number of packfiles. Each existing version of the repo will generate a new packfile to get it to the newest version, and it's up to the authenticated masters to generate and seed each of those packfiles, while peers either do or do not cache this replicated data. In short, this means of syndicating updates ignores the Merkle-DAG-ness (DAG-osity?) of Git.
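To illustrate the Merkle-DAG point: every Git object already has its own content address, independent of whichever packfile it ends up in. A small sketch of how a blob ID is computed (the helper name is mine; the `blob <size>\0` header is how Git hashes blobs with SHA-1):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same file contents give the same ID no matter which packfile carries them.
empty_id = git_blob_id(b"")  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
hello_id = git_blob_id(b"hello world\n")
```

A scheme that addressed and shared individual objects by these IDs would let peers reuse data across versions, instead of caching a separate packfile per starting version.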
The un-updateability of torrents is something that seems to seriously limit their use. There are a lot of interesting attempts to hack around this: LiveStreaming and Nightweb are two that spring to mind. https://www.tribler.org/StreamingExperiment/ https://sekao.net/nightweb/protocol.html