Dgit: Git with decentralized remotes (github.com)
147 points by QCSmello 3 days ago | 84 comments

As others have commented, this appears to be a decentralised alternative to GitHub rather than Git itself (which is already decentralised).

For a similar project, see git-ssb https://git.scuttlebot.io/%25n92DiQh7ietE%2BR%2BX%2FI403LQoy...

Another Dgit dev here. You're right. Git is decentralized, but most people use centralized git remotes like GitHub. Dgit makes using a decentralized git remote easier. Eventually we'll build more decentralized alternatives to other GitHub features, but the most important value proposition that GitHub provides now is as a git remote, so that's what we started with. We've provided a GitHub action that allows you to use GitHub's other features until we build more decentralized alternatives.

While I know there are many devs out there who confuse & conflate "Git" and "Github" and don't really know the difference between the two, I don't think bringing the conflation into "informed" discussion is particularly helpful.

Conceptually, Github is not different from the local ".git" folder sitting on my machine. I can do `git clone '../some/local/dir/.git'` just as easily as I can with any SSH or HTTPS link: the underlying protocol used for the transaction changes but the concept does not. So I think defining a "remote" as something inherently centralised by definition just because Github et al are popular isn't helpful: it simply persists the misconception.

Basically: I'm still actually struggling to really fully understand what dgit is.

git-ssb is a protocol for accessing git repos stored in an ssb-db (which is a distributed db). git-ssb-web is a web UI for exposing git repositories stored in an ssb-db.

Can you explain dgit in those terms?

I too am confused.

My immediate thought was that Dgit offered a centralized remote, packaged around decentralized technology. Which is to say, many people have a main "master"; a single, centralized repo/branch. Dgit might be offering the same thing, but hosted in a decentralized fashion. I could see the value in this for backup, I suppose.

Sure, Git is decentralized but many of us still prefer having some centralization. Bundling that up into a decentralized system (aka no third party host) has some value.

Though really to me if I was avoiding Github/Gitlab/etc, the primary value add I'd want to see is all of the Github/Gitlab UI features. Notably pull requests, code reviews, comments, basic issue tracking, etc.

Only problem is, according to a conventional understanding of git, there is no such thing as a "decentralized git remote"; maybe can you do a better job of explaining what that is?

Every [remote ...] entry in your .git/config points to some concrete URL. Even if you have a plurality of these (thanks to git being decentralized), each one of them is a single location that could be a single point for some activity.
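To make the point about `.git/config` concrete, here is a minimal Python sketch that lists the URL behind each `[remote ...]` section. The sample config, the remote names, and the `remote_urls` helper are all made up for illustration, and real config files allow more syntax than this handles.

```python
import re

# Hypothetical config contents; the URLs are placeholders.
SAMPLE_CONFIG = '''\
[core]
    bare = false
[remote "origin"]
    url = https://github.com/example/project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[remote "backup"]
    url = ../some/local/dir/.git
    fetch = +refs/heads/*:refs/remotes/backup/*
'''


def remote_urls(config_text):
    """Return {remote name: url} for each [remote "..."] section."""
    remotes = {}
    current = None  # name of the remote section we are inside, if any
    for raw in config_text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r'\[remote "(.+)"\]', line)
        if header:
            current = header.group(1)
        elif line.startswith("["):
            current = None  # some other section, e.g. [core]
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "url":
                remotes[current] = value
    return remotes


for name, url in remote_urls(SAMPLE_CONFIG).items():
    print(name, "->", url)
```

Each remote resolves to exactly one URL string, whatever protocol or backend sits behind it.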

I don't know anything about dgit yet but regarding this general idea: what you say is true from the perspective of your git client but that URI could be a gateway to a decentralized backend. If you're familiar with IPFS, imagine IPFS gateways for git repos. The gateways themselves are centralized but they're just feeders into a decentralized network.

So "decentralized" is a synonym for "distributed"?

Suppose I mount, under /path, some kind of distributed filesystem whose storage is replicated among many servers (for fault tolerance, availability, and performance) and then store a git repo into /path/to/repo.git.

If I clone from this, so that my origin is "file:///path/to/repo.git", is that then a decentralized git remote?

love the discussion here - I'd like to provide a bit of clarity on the architecture for context.

The `dgit://` protocol that gets registered with the `git-remote-dgit` helper is not in itself a remote resource, but rather a deterministic identity that has been registered with the Tupelo DLT (zonotope provided good details on this below). Therefore the ownership as well as the current state (branches, tags, maybe PRs in the future :), etc) of the remote repo is decentralized away from a single entity (aka GitHub/GitLab) - as the owner, you fully control it, nobody else can modify it - not even the dgit team.
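For readers unfamiliar with remote helpers: when git sees a URL whose scheme it does not handle natively, it looks for an executable named `git-remote-<scheme>` and hands the URL to it. A rough Python sketch of that naming rule follows. The `helper_for` function and the `BUILTIN_TRANSPORTS` set are illustrative assumptions, not git's actual code, and the rule is simplified.

```python
# Simplified assumption: transports git handles without a separate helper.
BUILTIN_TRANSPORTS = {"ssh", "git", "file"}


def helper_for(url):
    """Name of the helper program git would look for, or None.

    Returns None for plain paths, scp-style addresses, and the
    transports listed in BUILTIN_TRANSPORTS.
    """
    scheme, sep, _rest = url.partition("://")
    if not sep or scheme in BUILTIN_TRANSPORTS:
        return None
    return f"git-remote-{scheme}"


print(helper_for("dgit://quorumcontrol/tupelo"))  # git-remote-dgit
print(helper_for("../some/local/dir/.git"))       # None
```

So a `dgit://` URL only works once a `git-remote-dgit` executable is on the PATH, which is why an install step is needed.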

The storage part of your dgit remote is much more on the "distributed" side. We chose sia's skynet because we think it's a great fit for this. However, the actual objects of the repo could be stored anywhere, S3, IPFS, exchanged over bittorrent, or even your local raspberry pi. Regardless of where the git objects live, there still remains a single, trusted, distributed index of your repository on the Tupelo DLT.
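The idea that objects can live anywhere while a separate index stays authoritative can be sketched as below. This is a toy model, not dgit's storage interface: `ObjectStore`, `MemoryStore`, `store_object`, and `load_object` are hypothetical names, and SHA-256 over the raw bytes is an assumption chosen for simplicity (git hashes objects differently).

```python
import hashlib
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface any storage backend would satisfy."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class MemoryStore:
    """In-memory stand-in for a backend such as Sia, S3, IPFS, or a local disk."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def store_object(store: ObjectStore, data: bytes) -> str:
    """Store data under the hex digest of its content and return that key."""
    key = hashlib.sha256(data).hexdigest()
    store.put(key, data)
    return key


def load_object(store: ObjectStore, key: str) -> bytes:
    """Fetch data and check it still matches the key it was stored under."""
    data = store.get(key)
    if hashlib.sha256(data).hexdigest() != key:
        raise ValueError("object does not match its key")
    return data


backend_a, backend_b = MemoryStore(), MemoryStore()
key_a = store_object(backend_a, b"blob contents")
key_b = store_object(backend_b, b"blob contents")
print(key_a == key_b)  # same content gives the same key on any backend
```

Because keys depend only on content, a reader does not have to trust the backend: a tampered object fails the check in `load_object`.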

It costs "money" (Siacoin tokens) to store data on Sia IIRC, but I don't see any mention on the GitHub repo or elsewhere in this thread about having to pay anything to use this. Is Skynet effectively free?

It is free for the time being. But you point out a good reason why we built dgit to support other storage backends.

In most contexts, "IPFS" really just means "self-hosted", because it only works if someone who actually cares about the file (in practice, just the author typically) is seeding it.

For Sia/Skynet, the Sia network is doing the hosting, meaning the file has high uptime even if the uploader does not, and even if nobody else chooses to pin/seed the file.

You might want to consider collaborating on building out Pagure[0], which is a git forge that enables decentralized development.

Pagure supports submitting pull requests with Git repos on any server (regardless of whether it's running Pagure or not) with its remote pull requests feature. Issues, docs, and pull request metadata are all stored as git repos using JSON files as data, making it easy and portable to other Pagure instances and easy to convert for any other system. This makes it an excellent base for building decentralized development workflows with the oft-expected Git forge interface model. Since all the stuff is in Git repos, they could all be backed by dgit, for example. :)

[0]: https://pagure.io/pagure

> a decentralized git remote

How is the git remote you provide "decentralized" as compared with the remote provided by GitHub?

Anyone can run a Sia node, and anyone will (eventually) be able to run a Tupelo node, but only GitHub manages GitHub. It's decentralized because these networks were designed not to be managed by a single entity, unlike GitHub.

anyone who is running git is _already_ running a distributed git node which is not managed by a single central entity.

there are many _many_ web based git apps (many of which provide github like features) that anyone can, in theory, set up.

why would I want to set up a sia node instead?

I guess that the dgit remote URL specifies the sia network as a whole, without explicitly pointing to a specific node. When new nodes join, you don't have to change the URL of the remote.

But if you want to use normal remotes as a distributed network, and then a new node joins, you have to add a new remote with a new URL before you can use it.

Perhaps the Sia node gives you access to a "master" repo. Aka a single entity that represents a centralized repo, but hosted on a decentralized system.

Not sure though.

There's no such thing as a "master" repo in Git. "origin" is the default name for a cloned remote, and the convention used for adding a default remote, but nothing distinguishes the origin remote from any other remote. It isn't a master remote other than being default.

One could add a Git remote (and name it "origin") that's backed by a decentralised system currently: e.g. one could do:

  git remote add origin ./some/ipfs/mount

> There's no such thing as a "master" repo in Git. "origin" is the default name for a cloned remote, and the convention used for adding a default remote, but nothing distinguishes the origin remote from any other remote. It isn't a master remote other than being default.

You're overthinking my example. As I said, by "master" repo I meant the centralized repo that many _(most?)_ git users have, somewhere. Aka Github, Gitlab, etc.

You're conflating Git's design with users' patterns of centralization and distributed teams. Notably that often there is a single, centralized branch(es) on a single repo that everyone merges to.

This centralization could be distributed. It sounds trippy, but as you put it if your origin is on IPFS, you effectively have a single, mutable repo instance hosted on a decentralized platform (IPFS).

My guess is that is what Dgit is. A single mutable instance, hosted on a decentralized platform (Sia).

So back to my example, many people do indeed have a "master"/centralized repo. It's not a part of git, it's a user convention on top of git. But I didn't say it was in git.

> hosted on a decentralized platform (IPFS)

What's the difference between this kind of "decentralized platform" and GitHub?

There was that thing a few months* back about Iranian software devs losing GitHub access due to the newer sanctions.

Presumably if you were to host on IPFS it would be more difficult for people to cut you off like that.

What's the practical difference between you pinning your own IPFS node and just self hosting? Pass.

* Years? Time is weird at the moment.

Theoretically the difference is that it is.. well, distributed. Aka any benefits you'd attribute to having a file on IPFS over Github could apply here.

Which is to say, if Github goes down, gets blocked, blocks your team, etc; your team can still push to the "centralized" repo, etc etc.

> if Github goes down, gets blocked, blocks your team, etc; your team can still push to the "centralized" repo, etc etc.

But as others have pointed out, I can set up such a backup anywhere. What is the advantage of this particular way of doing it?

You can set up a backup Github? So you mean, you'd run like, an in-sync Gitea or Gitlab, and when there is an issue with Github your team and deployment methods (CI/etc) would automatically switch to that backup (temporarily or permanently)? That seems.. convoluted.

I feel like I'm repeating myself, but the advantage is being distributed. Distributed systems usually aim to be resistant to these types of issues. I'm a bit confused at your question. Are you asking why being distributed is good? Or are you asking why being distributed is good in this case?

If you're asking the former, I don't think this convo is in scope of explaining distributed vs centralized.

If you're asking the latter, why would this case be any different than any other case? You have a centralized entity and a whole bunch of community and tooling built around it. Why wouldn't you prefer that out of control of a single entity?

The very argument against centralization is built into the foundation of Git itself. Yet you're asking why someone might not want centralization for the rest of the non-Git components[1]?

[1]: "non-Git components", While the source code is tracked via Git, I just mean the centralized entity your CI/CD/etc is hooked up to is its own thing.

> You can set up a backup Github?

I can put a cloned backup of a git repository anywhere I want. (I do this with all my git repos; I certainly don't depend on Github--or Gitlab, or anyone else--to keep the only copies that aren't on my local machine.) I can't back up the other data that Github stores (issues, wiki, etc.), but that's because Github gives no way to download it, not because Github is "centralized". Github could provide the capability to download this data if it wanted to, without changing its underlying storage model at all. It just doesn't want to.

> when there is an issue with Github your team and deployment methods (CI/etc) would automatically switch to that backup

If what you are providing is "better failover performance than Github", that's fine (although I would want to see your actual uptime numbers before I'd believe any such claim), but that's not what "decentralized" means.

> I feel like I'm repeating myself, but the advantage is being distributed. Distributed systems usually aim to be resistant to these types of issues.

Sure, but unless you think Github literally stores all of their data on one single server machine, then Github itself is "distributed" in the same sense.

> I'm a bit confused at your question. Are you asking why being distributed is good?

No, I'm asking why you keep on saying "decentralized" when it turns out that what you mean is "distributed" (and when even "distributed" isn't the real issue--see below). They're not the same thing. You keep saying "decentralized is better than Github", but benefits like better failover performance come from being distributed, and Github is distributed, as I just pointed out above.

> you're asking why someone might not want centralization for the rest of the non-Git components[1]?

If your argument is "the rest of the non-Git components shouldn't be locked up in Github's data centers where nobody else can download them", that's fine, but it has nothing whatever to do with "decentralized" or even "distributed". You're just saying that all those other things should be in a freely accessible data store that can be cloned and replicated and accessed just like the git repo for the code itself. Sure, that's fine, I agree. But all this talk about "decentralized" has nothing to do with that point; it just obfuscates it.

I'm confused; you object to my use of decentralized, but then argue (it seems) that distributed also does not fit the bill.

So.. I think we have a fundamental failure in communication here. It's difficult to discuss decentralization or distributed designs when, well, we can't even refer to the platforms that advertise themselves as such. Ie, I did not make this terminology. The platform we're discussing literally advertises itself as such in this title.

I'm not sure what's even being discussed anymore.

I'm not trying to "discuss decentralized or distributed designs". I'm trying to get somebody to explain to me why I should prefer something like dgit to GitHub on the basis of it being "decentralized". So far the answers I've gotten are:

(A) It's decentralized! Woohoo! (I'm paraphrasing.) Which tells me nothing about why "decentralized" is better than GitHub.

(B) It's distributed! Woohoo! (Again, I'm paraphrasing.) Which, again, tells me nothing about why "distributed" is better than GitHub. Not to mention that "distributed" is not the same thing as "decentralized", so if I'm supposed to prefer something like dgit because it's "distributed", the goal posts have been moved.

(C) If GitHub goes down, you can automatically switch to pushing to <other place>. Which amounts to "this solution has better failover than GitHub", which, as I noted, (a) might or might not be true (I'd want to see your uptime numbers), and (b) has nothing whatever to do with <other place> being "decentralized" or "distributed" since "decentralized" has nothing to do with failover reliability, and GitHub is just as "distributed" as <other place>.

(D) You can keep all the "non-git components" in <other place> along with the git repo. Which might be an advantage if it's easier to get them from <other place> than from GitHub, but, again, has nothing to do with <other place> being "decentralized" or "distributed", it just has to do with how accessible <other place> makes the "non-git components" as compared with GitHub.

Is there an (E)?

Github has an API, so downloading issues of your repos isn't that difficult either, although they could for sure make it easier.

GitHub's source code is open, so anyone can run a GitHub node too. The node that GitHub itself runs became a centralized place for people to share development because of network effects, not because anything in the code itself prevents more than one node from existing.

GitHub is not open-source, though alternatives like GitLab and Gitea are.

If you make your own GitHub "node", it won't have any of the information GitHub.com has. If I'm understanding correctly, the idea with this is every "node" has a portion of a canonical truth.

Not sure that would still get people to use it, but at least more feasible than making your own GitHub instance in relation to growing network effect

Did you mean that Git's source code is open?

do you provide advantages over ssb?

dgit dev here: first, let me say git-ssb is a great project as well!

We see a few distinct advantages:

- Using dgit doesn't require running a "node" of any kind (aka an SSB peer). This is possible because of the unique architecture of the Tupelo DLT: https://github.com/quorumcontrol/tupelo.

- Because of ^, installing and adding a dgit remote to your existing workflow is super easy.

- The storage in dgit is separated from the ownership of the repo. This means you can distribute the actual git objects across any storage system you would like, see my comment to cfstras below.

ssb is a few years ahead here - having a web ui and more robust suite of collaboration tools is great, but we're hoping to add those features down the road as well! Another unique advantage is the Tupelo js client (https://github.com/QuorumControl/tupelo-wasm-sdk) can run fully in the browser, meaning a fully decentralized UI is possible!

> Using dgit doesn't require running a "node" of any kind (aka an SSB peer)

I'm not sure that I understand this, could you add details? The only interpretation that makes sense is "you don't have to run any software on your computer", but that would suggest that the storage is centralized on some internet services.

Happy to chat about SSB any time. Good luck on this project!

In the default setup, storage is provided by sia skynet, so on push all git objects are distributed across the sia network and are simply retrieved when you do a git fetch / clone from a dgit url. But as mentioned, that's configurable and could be any other storage system.

However, even with distributed storage, you still need to know what makes up the git index on a given git url (aka dgit://quorumcontrol/tupelo) - this is where the Tupelo DLT comes in. When you do a push, the Tupelo network verifies the request and updates the refs of that repo.

Both of these are just basic tcp/https requests, so there truly is no long running daemon on your machine required in order to push or fetch dgit repos.
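One way to picture the ref-update half of a push is a compare-and-swap on a small name-to-commit table: an update is accepted only if the ref still points where the pusher last saw it. The sketch below is a local toy under that assumption. `RefIndex` and its methods are hypothetical and say nothing about how Tupelo actually verifies requests.

```python
class RefIndex:
    """Toy stand-in for the per-repo ref state; not Tupelo's API."""

    def __init__(self):
        self._refs = {}  # ref name -> commit id

    def get(self, name):
        """Return the commit id a ref points at, or None if it is unset."""
        return self._refs.get(name)

    def update(self, name, expected_old, new):
        """Move a ref to `new` only if it still points at `expected_old`.

        Returns True when the update is applied, False when someone else
        moved the ref first and the caller needs to fetch and retry.
        """
        if self._refs.get(name) != expected_old:
            return False
        self._refs[name] = new
        return True


index = RefIndex()
print(index.update("refs/heads/master", None, "commit-a"))        # first push
print(index.update("refs/heads/master", None, "commit-b"))        # stale view, rejected
print(index.update("refs/heads/master", "commit-a", "commit-c"))  # up-to-date push
```

The second call is rejected because its view of the ref is stale, which is the kind of conflicting update a shared index has to prevent.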

> Using dgit doesn't require running a "node" of any kind (aka an SSB peer)

Can you expand on this? An SSB "peer" is entirely local, so conceptually similar to a `.git` directory.

I'm struggling to understand how a decentralised system can operate without peers.

(I clicked through on the Tupelo link, but from what I can glean from the README, a Chain Tree sounds very similar to an SSB peer. Just trying to understand the distinction)

Also: Does this work offline, or on just a LAN? To be completely candid, I feel very strongly that we should be moving away from decentralized protocols that require anything more than two friends and a wifi network.

Well the basics underneath Tupelo are cryptographic signatures and an immutable chain we call a ChainTree (which is per repo), so in theory this could work between two people so long as you trusted each other's signatures. Using the Tupelo network provides trust for the specific name, as well as prevents conflicting updates to the same repo.

It's a very interesting idea though, would love to hear more. Feel free to drop by our gitter: https://gitter.im/quorumcontrol-dgit/community

I'm a little confused. Isn't git already decentralized? Like, most of the time there's one true repo everyone else grabs from, but it doesn't have to be. What am I missing?

Edit: oooooh, this takes the centralization away from GitHub. I feel like that's non obvious from the name, but I could be wrong.

git is decentralized, but most git remotes are not. This is a decentralized git remote that gets the best of both worlds. It's a _decentralized_ "one true repo".

With three simple steps you can create a decentralized mirror of your existing github project. All changes will be automatically propagated to the mirror version and the git services you depend on will be there when you need them.

Isn’t this more about replication / mirroring, then, than about decentralization?

Git is decentralized like the web is decentralized. The architecture supports it but the infrastructure for using it in a truly decentralized way is severely lacking compared to the options for using it in a more centralized way.

It is very hard to understand what dgit really does when it's using terminology (decentralized git) clearly incorrectly.

I actually do p2p syncing of git repos almost every day, with git-annex's sync feature[1].

It is pretty clever, when you sync, it will push the current branch to every eligible remote into an unchecked out branch (master -> remote/synced/master), which gets merged when that peer syncs (or there can be a daemon that does it for you on the remote).

I use git for filesets and wikis, so true p2p is really helpful. I love seeing new stuff that can be done with git.

[1]: https://git-annex.branchable.com/walkthrough/syncing/

The description says that my repo is uploaded to the "decentralized service Skynet" -- But I can't seem to figure out how one would participate in hosting this service. Does only Sia, the company, host nodes for Skynet? Or is this a plot to increase the visibility of Siacoin?

dgit dev here. Good question. For starters, dgit is not 100% tied to Sia, it's just a great decentralized file hosting service so we started with it as the first default.

But to answer your question, anyone can host Sia nodes. Here are their docs on that: https://support.sia.tech/category/0OpBuOHIVD-hosting

Thanks for the info.

Is there a simple way right now to use another storage backend than Sia? E.g. If I have a group of people that want to participate in hosting, but only want to host data from the people in the group?

Essentially, can you use dgit to store a repo decentralised without having to set up git synchronisation tools, but while still having control over the hosting infrastructure?

hey cfstras, another dgit dev here. Each repo has a storage adapter specified, so any storage infrastructure is possible, but at this moment we've only written two: sia and on tupelo network.

We've had a lot of experience with running IPFS nodes internally, that might be a great option for you in this case? We are in early stages and will add more adapters as feedback informs.

Also, it's open source and we would love PRs :) - the storage interface is pretty straightforward:


Here is an example within dgit: https://github.com/quorumcontrol/dgit/blob/master/storage/ch...

Curious why you guys didn’t go with IPFS by default? Is it related to how you wanted to incentivize backups by default, or was there a more technical reason?

Thanks for the example, looks easy enough :)

What makes Skynet more decentralized than GitHub? Guess I'm confused.

Anyone can host a Sia node, but only GitHub can manage GitHub: https://support.sia.tech/category/0OpBuOHIVD-hosting

It would be nice for Sia to be useful for something.

Not being able to pull from or push to the git remote is pretty low on the list of problems when GitHub goes down. If that’s the main problem I’ll just set up gitolite on a server (and I do) and call it a day.

All the issue/PR discussions and CI/CD are the real problems, and this doesn’t seem to help at all, so good luck collaborating without changing your workflow when GitHub is down.

Our setup process is a much lighter lift than standing up a server and installing gitolite on it.

But yes, if GitHub is down, then your workflow is going to change. We're hoping to close that gap down the road, but having a way to continue pushing and pulling with collaborators with a very quick setup seemed compelling to us. Not to mention the benefits that decentralization itself brings.

The vast majority of git users tend to agree on one "origin" remote and 99-100% of their pushes and pulls are to/from that remote. So git, in practice, tends to be centralized when it comes time to collaborate with others. We're trying to re-decentralize that aspect while accommodating the convenient workflows we're all used to.

> Our setup process is a much lighter lift than standing up a server and installing gitolite on it.

I think a counter to that is only one person needs to set up the additional git remote, compared to everyone having to install additional software to use dgit.

Totally fair. We definitely want to make that install process very simple and fast for as many people as we can. And it only needs to be done once per machine. :)

Shameless plug regarding the issues: https://github.com/MichaelMure/git-bug

Cool! We'll look into that.

I think that's a different dgit

Tacking on to this subthread: another "dgit" that i'm familiar with is https://wiki.debian.org/DgitFAQ

Two different dgits from github? wow!

the previous one is from github, the new one is hosted on github but not from github.

I obviously needed to read this more closely.

Eventually everything will be of, by and for the github so you're probably fine.

So Sia's Skynet is used for hosting the blobs and Tupelo is used for pointers to the blobs & metadata?


Looks like many commenters are clearly struggling with "decentralized like bitcoin" vs. "decentralized like git" distinction

What problem/threat is dgit meant to solve/guard against? Is it:

a) the people problem of coworkers not knowing the difference between a repository (e.g. on github), a remote (the name "origin") or even a branch (the label "master")

b) github.com turning evil or going away forever

c) myhost.example.com failing for hours or days

I don’t see any mention of issue/wiki support? Without that my GitHub workflow would be dramatically changed.

Absolutely. Our strategy is to give you a decentralized on-ramp by starting with GitHub's central value proposition of the "one true git remote" that everyone can share. We made a GitHub action (https://github.com/quorumcontrol/dgit-github-action) to automatically mirror GitHub pushes on dgit so you can continue using GitHub's other tools (like issues and wikis) until dgit offers a compelling alternative for those.

This is cool, but is anyone aware of a system that is like GitHub/GitLab, but PR/MRs and commits are actually voted on by the community rather than a handful of maintainers? Something like the way DAOs are supposed to work, but applied to code?

Yeah, it's still early days, but that's part of the eventual plan for dgit. We're starting with decentralized git remotes, but we will eventually enable DAO like functionality like paying in to repos to support them, voting on things like feature/pull requests, and automatic payouts when an independent dev gets one of their pull requests merged.

How do you maintain identity in dgit? i.e., how do I claim my user/org name in this decentralized manner? It isn't clear how it works, can you elaborate on this part a little bit?

Sure. First, a little background. The Tupelo distributed ledger manages repo identities and permissions, while Sia persists the actual git objects.

Tupelo validates transactions against individual ChainTrees, and you can think of a ChainTree as an independent ledger (or blockchain) that represents the state of one independent real world (or digital) object. In this case, that object is a git repository.

Tupelo only allows the "owner" of a ChainTree to make modifications to that ChainTree (such as updating the current HEAD), and ownership is determined by control of a private key.

Each ChainTree has a unique DID (decentralized identifier) that is uniquely determined by the key that first created it, and the controller of that key is the initial owner. Tupelo also has a transaction type that allows the current owner to transfer a ChainTree to a different key maintained by the new owner, but its DID stays the same after that transfer.

Tupelo uses a strategy similar to the WarpWallet[1] to manage identities. We can deterministically create a private key from a string like a repo name, and use that private key to create a ChainTree with a DID derived from that key (and hence, the DID is derived from the string). This gives us a mapping from repos to Tupelo ChainTrees. Since the initial "private" key is deterministically derived from the repo name, that initial private key is insecure. The second step of the repo registration process is to submit a ChainTree transaction to transfer ownership of the repo ChainTree to a secure private key. That way, only the controller of the secure private key can make changes to the repo ChainTree, even if anyone can reproduce the original key from the repo name.

[1]: https://keybase.io/warp/warp_1.0.9_SHA256_a2067491ab582bde77...
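The two-step registration described above (derive a guessable key from the repo name, then hand ownership to a secure key while the DID stays fixed) can be sketched roughly as follows. Everything here is a simplified assumption: PBKDF2 with a made-up salt stands in for the WarpWallet-style derivation, comparing key bytes stands in for signature checks, and `ChainTree`, `did_for_key`, and the `did:sketch:` prefix are hypothetical, not Tupelo's API or DID format.

```python
import hashlib
import secrets


def insecure_key_from_name(repo_name: str) -> bytes:
    """Derive a 32-byte key from a public repo name; anyone can repeat this."""
    return hashlib.pbkdf2_hmac("sha256", repo_name.encode(), b"dgit-sketch", 10_000)


def did_for_key(key: bytes) -> str:
    """Hypothetical DID: a truncated digest of the key that created the tree."""
    return "did:sketch:" + hashlib.sha256(key).hexdigest()[:32]


class ChainTree:
    """Toy per-repo ledger state: a fixed DID, a current owner, and a HEAD."""

    def __init__(self, creating_key: bytes):
        self.did = did_for_key(creating_key)  # fixed at creation, never changes
        self.owner = creating_key
        self.head = None

    def transfer(self, current_key: bytes, new_key: bytes) -> None:
        """Hand ownership to new_key; only the current owner may do this."""
        if current_key != self.owner:
            raise PermissionError("only the current owner can transfer")
        self.owner = new_key

    def set_head(self, key: bytes, commit: str) -> None:
        """Update HEAD; only the current owner may do this."""
        if key != self.owner:
            raise PermissionError("only the current owner can update HEAD")
        self.head = commit


name = "quorumcontrol/tupelo"
weak_key = insecure_key_from_name(name)   # reproducible by anyone
tree = ChainTree(weak_key)
secure_key = secrets.token_bytes(32)      # known only to the new owner
tree.transfer(weak_key, secure_key)
tree.set_head(secure_key, "commit-a")
print(tree.did == did_for_key(insecure_key_from_name(name)))  # DID still derivable from the name
```

After the transfer, anyone can still compute the DID from the repo name, but the reproducible key no longer authorizes changes.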

So, two things I see here, 1) only one person is allowed to push changes to the dgit repo then? How are you supposed to collaborate on a remote dgit repo, in that manner it is very different than a github remote, or am I missing something? and 2) Someone can hoard all popular repos and make it seem that they are original versions, pointing people to clone them from dgit, when in fact they could have been modified with malicious code? How do you address these issues?

You could just use pgp to sign your commits as a starting point

So who pays for this? It says it uses Sia as backend, but I know sia costs money. Is this currently being paid for by some org?

From a quick look, I believe "Dgit is a Git remote helper for decentralized [...]" would be a more accurate title.

The author has a very unique understanding of online security given the readme starts out with "just try it by running this unknown binary".

A little disappointing they took the name already in use by another project. Dgit is a system exposing the Debian archive as Git, allowing to upload packages with a git push.
