For a similar project, see git-ssb https://git.scuttlebot.io/%25n92DiQh7ietE%2BR%2BX%2FI403LQoy...
Conceptually, GitHub is no different from the local ".git" folder sitting on my machine. I can do `git clone ../some/local/dir/.git` just as easily as I can with any SSH or HTTPS link: the underlying protocol used for the transaction changes, but the concept does not. So I think defining a "remote" as something inherently centralised by definition just because GitHub et al. are popular isn't helpful: it simply perpetuates the misconception.
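The point above is easy to demonstrate; all paths here are throwaway temp dirs, nothing is specific to any host:

```shell
# git treats a filesystem path exactly like an SSH or HTTPS remote URL;
# only the transport changes.
upstream=$(mktemp -d)
git -C "$upstream" init -q
git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

clone=$(mktemp -d -u)        # -u: name only; git clone creates the dir
git clone -q "$upstream" "$clone"
git -C "$clone" remote -v    # origin is just the local path
```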
Basically: I'm still struggling to fully understand what dgit actually is.
git-ssb is a protocol for accessing git repos stored in an ssb-db (which is a distributed db). git-ssb-web is a web UI for exposing git repositories stored in an ssb-db.
Can you explain dgit in those terms?
My immediate thought was that Dgit offered a centralized remote, packaged around decentralized technology. Which is to say, many people have a main "master": a single, centralized repo/branch. Dgit might be offering the same thing, but hosted in a decentralized fashion. I could see the value in this for backup, I suppose.
Sure, Git is decentralized but many of us still prefer having some centralization. Bundling that up into a decentralized system (aka no third party host) has some value.
Though really to me if I was avoiding Github/Gitlab/etc, the primary value add I'd want to see is all of the Github/Gitlab UI features. Notably pull requests, code reviews, comments, basic issue tracking, etc.
Every [remote ...] entry in your .git/config points to some concrete URL. Even if you have a plurality of these (thanks to git being decentralized), each one of them is a single location that can become a single point of failure (or control) for some activity.
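Concretely, "a plurality of these" just means several remotes registered on one repo; the URLs below are placeholders:

```shell
# Each [remote "..."] entry in .git/config is one concrete location;
# having several of them is routine.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" remote add origin git@github.com:example/project.git
git -C "$repo" remote add backup /mnt/backup/project.git
git -C "$repo" remote -v    # two entries, each still a single URL
```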
Suppose I mount, under /path, some kind of distributed filesystem whose storage is replicated among many servers (for fault tolerance, availability, and performance) and then store a git repo into /path/to/repo.git.
If I clone from this, so that my origin is "file:///path/to/repo.git", is that then a decentralized git remote?
The `dgit://` protocol that gets registered with the `git-remote-dgit` helper is not in itself a remote resource, but rather a deterministic identity that has been registered with the Tupelo DLT (zonotope provided good details on this below). Therefore the ownership as well as the current state (branches, tags, maybe PRs in the future :), etc) of the remote repo is decentralized away from a single entity (aka GitHub/GitLab) - as the owner, you fully control it, nobody else can modify it - not even the dgit team.
The storage part of your dgit remote is much more on the "distributed" side. We chose Sia's Skynet because we think it's a great fit for this. However, the actual objects of the repo could be stored anywhere: S3, IPFS, exchanged over BitTorrent, or even your local Raspberry Pi. Regardless of where the git objects live, there still remains a single, trusted, distributed index of your repository on the Tupelo DLT.
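Following the usual git remote-helper convention (a `dgit://` URL dispatches to the `git-remote-dgit` binary on push/fetch), adding such a remote looks like adding any other; the repo name below is just the example from this thread:

```shell
# `git remote add` only records the URL; push/fetch are what actually
# invoke the git-remote-dgit helper for the dgit:// scheme.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" remote add dgit dgit://quorumcontrol/tupelo
git -C "$repo" remote get-url dgit    # dgit://quorumcontrol/tupelo
```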
For Sia/Skynet, the Sia network is doing the hosting, meaning the file has high uptime even if the uploader does not, and even if nobody else chooses to pin/seed the file.
Pagure supports submitting pull requests with Git repos on any server (regardless of whether it's running Pagure or not) with its remote pull requests feature. Issues, docs, and pull request metadata are all stored in Git repos using JSON files as the data format, making them portable to other Pagure instances and easy to convert for any other system. This makes it an excellent base for building decentralized development workflows with the oft-expected Git forge interface model. Since all the stuff is in Git repos, they could all be backed by dgit, for example. :)
How is the git remote you provide "decentralized" as compared with the remote provided by GitHub?
There are many _many_ web-based Git apps (many of which provide GitHub-like features) that anyone can, in theory, set up.
why would I want to set up a sia node instead?
But if you want to use normal remotes as a distributed network, and then a new node joins, you have to add a new remote with a new URL before you can use it.
Not sure though.
One could add a Git remote (and name it "origin") that's backed by a decentralised system currently: e.g. one could do:
git remote add origin ./some/ipfs/mount
You're overthinking my example. As I said, by "master" repo I meant the centralized repo that many _(most?)_ git users have, somewhere. Aka GitHub, GitLab, etc.
You're conflating Git's design with users' patterns of centralization and distributed teams. Notably, there is often a single centralized branch (or a few) on a single repo that everyone merges to.
This centralization could be distributed. It sounds trippy, but as you put it if your origin is on IPFS, you effectively have a single, mutable repo instance hosted on a decentralized platform (IPFS).
My guess is that is what Dgit is. A single mutable instance, hosted on a decentralized platform (Sia).
So back to my example, many people do indeed have a "master"/centralized repo. It's not a part of git, it's a user convention on top of git. But I didn't say it was in git.
What's the difference between this kind of "decentralized platform" and GitHub?
Presumably if you were to host on IPFS it would be more difficult for people to cut you off like that.
What's the practical difference between you pinning your own IPFS node and just self hosting? Pass.
* Years? Time is weird at the moment.
Which is to say, if Github goes down, gets blocked, blocks your team, etc; your team can still push to the "centralized" repo, etc etc.
But as others have pointed out, I can set up such a backup anywhere. What is the advantage of this particular way of doing it?
I feel like I'm repeating myself, but the advantage is being distributed. Distributed systems usually aim to be resistant to these types of issues. I'm a bit confused at your question. Are you asking why being distributed is good? Or are you asking why being distributed is good in this case?
If you're asking the former, I don't think this convo is in scope of explaining distributed vs centralized.
If you're asking the latter, why would this case be any different than any other case? You have a centralized entity and a whole bunch of community and tooling built around it. Why wouldn't you prefer that to be out of the control of a single entity?
The very argument against centralization is built into the foundation of Git itself. Yet you're asking why someone might not want centralization for the rest of the non-Git components?
Re: "non-Git components": while the source code is tracked via Git, I just mean the centralized entity your CI/CD/etc. is hooked up to is its own thing.
I can put a cloned backup of a git repository anywhere I want. (I do this with all my git repos; I certainly don't depend on Github--or Gitlab, or anyone else--to keep the only copies that aren't on my local machine.) I can't back up the other data that Github stores (issues, wiki, etc.), but that's because Github gives no way to download it, not because Github is "centralized". Github could provide the capability to download this data if it wanted to, without changing its underlying storage model at all. It just doesn't want to.
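That kind of backup is a one-liner; the destination here is a temp dir, but it could be any path or host you can reach:

```shell
# A mirror clone copies every ref; no forge involvement required.
src=$(mktemp -d)
git -C "$src" init -q
git -C "$src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"

backup=$(mktemp -d -u)
git clone -q --mirror "$src" "$backup"
git -C "$src" remote add backup "$backup"
git -C "$src" push -q backup HEAD    # refresh the backup any time
```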
> when there is an issue with Github your team and deployment methods (CI/etc) would automatically switch to that backup
If what you are providing is "better failover performance than Github", that's fine (although I would want to see your actual uptime numbers before I'd believe any such claim), but that's not what "decentralized" means.
> I feel like I'm repeating myself, but the advantage is being distributed. Distributed systems usually aim to be resistant to these types of issues.
Sure, but unless you think Github literally stores all of their data on one single server machine, then Github itself is "distributed" in the same sense.
> I'm a bit confused at your question. Are you asking why being distributed is good?
No, I'm asking why you keep on saying "decentralized" when it turns out that what you mean is "distributed" (and when even "distributed" isn't the real issue--see below). They're not the same thing. You keep saying "decentralized is better than Github", but benefits like better failover performance come from being distributed, and Github is distributed, as I just pointed out above.
> you're asking why someone might not want centralization for the rest of the non-Git components?
If your argument is "the rest of the non-Git components shouldn't be locked up in Github's data centers where nobody else can download them", that's fine, but it has nothing whatever to do with "decentralized" or even "distributed". You're just saying that all those other things should be in a freely accessible data store that can be cloned and replicated and accessed just like the git repo for the code itself. Sure, that's fine, I agree. But all this talk about "decentralized" has nothing to do with that point; it just obfuscates it.
So... I think we have a fundamental failure in communication here. It's difficult to discuss decentralization or distributed designs when, well, we can't even refer to the platforms that advertise themselves as such. I.e., I did not make this terminology up. The platform we're discussing literally advertises itself as such in its title.
I'm not sure what's even being discussed anymore.
(A) It's decentralized! Woohoo! (I'm paraphrasing.) Which tells me nothing about why "decentralized" is better than GitHub.
(B) It's distributed! Woohoo! (Again, I'm paraphrasing.) Which, again, tells me nothing about why "distributed" is better than GitHub. Not to mention that "distributed" is not the same thing as "decentralized", so if I'm supposed to prefer something like dgit because it's "distributed", the goal posts have been moved.
(C) If GitHub goes down, you can automatically switch to pushing to <other place>. Which amounts to "this solution has better failover than GitHub", which, as I noted, (a) might or might not be true (I'd want to see your uptime numbers), and (b) has nothing whatever to do with <other place> being "decentralized" or "distributed" since "decentralized" has nothing to do with failover reliability, and GitHub is just as "distributed" as <other place>.
(D) You can keep all the "non-git components" in <other place> along with the git repo. Which might be an advantage if it's easier to get them from <other place> than from GitHub, but, again, has nothing to do with <other place> being "decentralized" or "distributed", it just has to do with how accessible <other place> makes the "non-git components" as compared with GitHub.
Is there an (E)?
Yes, I could host my file on GitHub. However, I think it's well beyond my patience to argue, in this conversation, for why IPFS (/Sia/etc.) should exist.
If you see no reason why someone would want to host anything on IPFS/sia/etc over Dropbox or Github, I'm not the person to change your mind.
This discussion merely spawned because I thought I had an idea of what the hell Dgit even is. I didn't write it, have never used it, and know next to nothing about it. I had merely envisioned a use case for my team using Git on IPFS, and mentioned that in these comments.
Not sure that would get people to use it, but it's at least more feasible than standing up your own GitHub instance and trying to grow the network effect.
We see a few distinct advantages:
- Using dgit doesn't require running a "node" of any kind (aka an SSB peer). This is possible because of the unique architecture of the Tupelo DLT: https://github.com/quorumcontrol/tupelo.
- Because of ^, installing and adding a dgit remote to your existing workflow is super easy.
- The storage in dgit is separated from the ownership of the repo. This means you can distribute the actual git objects across any storage system you would like, see my comment to cfstras below.
ssb is a few years ahead here - having a web ui and more robust suite of collaboration tools is great, but we're hoping to add those features down the road as well! Another unique advantage is the Tupelo js client (https://github.com/QuorumControl/tupelo-wasm-sdk) can run fully in the browser, meaning a fully decentralized UI is possible!
I'm not sure that I understand this, could you add details? The only interpretation that makes sense is "you don't have to run any software on your computer", but that would suggest that the storage is centralized on some internet service.
Happy to chat about SSB any time. Good luck on this project!
However, even with distributed storage, you still need to know what makes up the git index at a given git URL (e.g. dgit://quorumcontrol/tupelo) - this is where the Tupelo DLT comes in. When you do a push, the Tupelo network verifies the request and updates the refs of that repo.
Both of these are just basic tcp/https requests, so there truly is no long running daemon on your machine required in order to push or fetch dgit repos.
Can you expand on this? An SSB "peer" is entirely local, so conceptually similar to a `.git` directory.
I'm struggling to understand how a decentralised system can operate without peers.
(I clicked through on the Tupelo link, but from what I can glean from the README, a Chain Tree sounds very similar to an SSB peer. Just trying to understand the distinction)
It's a very interesting idea though, would love to hear more. Feel free to drop by our gitter: https://gitter.im/quorumcontrol-dgit/community
Edit: oooooh, this takes the centralization away from GitHub. I feel like that's non-obvious from the name, but I could be wrong.
It is pretty clever: when you sync, it pushes the current branch to every eligible remote as an un-checked-out branch (master -> remote/synced/master), which gets merged when that peer syncs (or a daemon on the remote can do it for you).
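A minimal sketch of that synced-branch pattern with two local clones (the layout and branch naming are illustrative, not the tool's actual implementation):

```shell
# Peer B has a normal (non-bare) clone; peer A pushes into a branch
# B does not have checked out, and B merges it whenever it "syncs".
b=$(mktemp -d)
git -C "$b" init -q
git -C "$b" -c user.name=b -c user.email=b@example.com \
    commit -q --allow-empty -m "initial"

a=$(mktemp -d -u)
git clone -q "$b" "$a"
git -C "$a" -c user.name=a -c user.email=a@example.com \
    commit -q --allow-empty -m "work on A"

branch=$(git -C "$a" rev-parse --abbrev-ref HEAD)
git -C "$a" push -q origin "$branch:synced/$branch"  # not checked out on B
git -C "$b" merge -q "synced/$branch"                # B syncs (fast-forward)
```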
I use git for filesets and wikis, so true p2p is really helpful. I love seeing new stuff that can be done with git.
But to answer your question, anyone can host Sia nodes. Here are their docs on that: https://support.sia.tech/category/0OpBuOHIVD-hosting
Is there a simple way right now to use a storage backend other than Sia? E.g. if I have a group of people who want to participate in hosting, but only want to host data from the people in the group?
Essentially, can you use dgit to store a repo decentralised without having to set up git synchronisation tools, but while still having control over the hosting infrastructure?
We've had a lot of experience with running IPFS nodes internally, so that might be a great option for you in this case. We are in the early stages and will add more adapters as feedback informs us.
Also, it's open source and we would love PRs :) - the storage interface is pretty straightforward:
Here is an example within dgit:
All the issue/PR discussions and CI/CD are the real problems, and this doesn’t seem to help at all, so good luck collaborating without changing your workflow when GitHub is down.
But yes, if GitHub is down, then your workflow is going to change. We're hoping to close that gap down the road, but having a way to continue pushing and pulling with collaborators with a very quick setup seemed compelling to us. Not to mention the benefits that decentralization itself brings.
The vast majority of git users tend to agree on one "origin" remote and 99-100% of their pushes and pulls are to/from that remote. So git, in practice, tends to be centralized when it comes time to collaborate with others. We're trying to re-decentralize that aspect while accommodating the convenient workflows we're all used to.
I think a counter to that is only one person needs to set up the additional git remote, compared to everyone having to install additional software to use dgit.
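That one-person setup can even be invisible to everyone's `git push`, using a second push URL on "origin" (bare temp repos stand in for real hosts here):

```shell
# With two push URLs on "origin", every `git push origin` updates
# both locations; collaborators need no extra software.
primary=$(mktemp -d); mirror=$(mktemp -d)
git -C "$primary" init -q --bare
git -C "$mirror"  init -q --bare

work=$(mktemp -d -u)
git clone -q "$primary" "$work"
# Once any pushurl is set, the fetch URL stops being used for pushes,
# so add BOTH destinations explicitly:
git -C "$work" remote set-url --add --push origin "$primary"
git -C "$work" remote set-url --add --push origin "$mirror"

git -C "$work" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial"
git -C "$work" push -q origin HEAD    # lands in both repos
```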
a) the people problem of coworkers not knowing the difference between a repository (e.g. on github), a remote (the name "origin") or even a branch (the label "master")
b) github.com turning evil or going away forever
c) myhost.example.com failing for hours or days
Tupelo validates transactions against individual ChainTrees, and you can think of a ChainTree as an independent ledger (or blockchain) that represents the state of one independent real world (or digital) object. In this case, that object is a git repository.
Tupelo only allows the "owner" of a ChainTree to make modifications to that ChainTree (such as updating the current HEAD), and ownership is determined by control of a private key.
Each ChainTree has a unique DID (decentralized identifier) that is uniquely determined by the key that first created it, and the controller of that key is the initial owner. Tupelo also has a transaction type that allows the current owner to transfer a ChainTree to a different key maintained by the new owner, but its DID stays the same after that transfer.
Tupelo uses a strategy similar to WarpWallet to manage identities. We can deterministically create a private key from a string like a repo name, and use that private key to create a ChainTree with a DID derived from that key (and hence, the DID is derived from the string). This gives us a mapping from repos to Tupelo ChainTrees. Since the initial "private" key is deterministically derived from the repo name, that initial private key is insecure. The second step of the repo registration process is to submit a ChainTree transaction to transfer ownership of the repo ChainTree to a secure private key. That way, only the controller of the secure private key can make changes to the repo ChainTree, even if anyone can reproduce the original key from the repo name.
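The deterministic part of that scheme can be illustrated with a plain hash. This is a conceptual sketch only, not Tupelo's actual derivation (a WarpWallet-style scheme runs the input through a hardened KDF, not a bare SHA-256):

```shell
# Anyone who knows the repo name can re-derive the same bytes, which
# is exactly why the initial owner key is insecure and ownership must
# be transferred to a real private key right after registration.
repo="quorumcontrol/tupelo"
seed_a=$(printf '%s' "$repo" | sha256sum | cut -d' ' -f1)
seed_b=$(printf '%s' "$repo" | sha256sum | cut -d' ' -f1)
[ "$seed_a" = "$seed_b" ] && echo "same name -> same seed -> same DID"
```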