Pijul: a distributed version control system, written in Rust (2019) (pijul.com)
238 points by tosh 12 days ago | 122 comments





FWIW, this has gotten a certain amount of attention on HN over the years since it was begun, but it's instantly noticeable that patches tailed off and stopped last June. Naturally that raises the question "is this even still alive?", and last month the project's creator, Pierre-Étienne Meunier, posted a response to a query about that, indicating a major rewrite is in progress:

----

https://discourse.pijul.org/t/is-this-project-still-active-y...

>Is Pijul still under active development?

>That’s right, the last patches are pretty old. A few things have happened since then, but they’re not public yet. This has included (1) thinking about new algorithms to solve our remaining problems and (2) thinking about the future and sustainability of the project, including its funding.

>Another thing going on is a major rewrite of libpijul. This project was our first Rust project, was started in a pretty early version of Rust, on an incomplete theory that was only finalised in 2019. Until then it wasn’t clear what functions and types we wanted to expose, and how to document them.

>I’m currently reorganising everything, in particular using a lot more proc-macros, and making it more general so that it can run on various backends (well, actually different variants of Sanakirja).

>We will hopefully have news pretty soon. I’d love to fix all the bugs in the Nest discussions, my hope is that many of them will simply disappear with the rewrite.

----

Of course, everyone here probably knows about the risks of major rewrites, though granted that's lessened for something that began in the research phase and never got close to a 1.0 or major uptake. It is a pretty interesting project and I'd love to see it go well, so hopefully it can make the series of transitions to something more mature and self-sustaining! For now though it looks like things will be in a bit of stasis, and everyone will have to wait and see what emerges on the other side.


Rust, or any procedural language, seems not to be the most apt choice for a first implementation / rewrite of a patch theory. Patches themselves have already been generalized to paths using HoTT, ref: https://www.cs.cmu.edu/~rwh/papers/htpt/paper.pdf. The takeaway from https://arxiv.org/abs/1311.3903 is the use of commutation to allow for pushouts, while the HoTT ref discusses pseudocommutation regarding merges. See the footnote about "commutation" being an inaccurate term for the Darcs model. With the addition of ojo's "ghost" deletes and pseudo-edges, a formal implementation should provide a non-exponential model/algorithm amenable to optimization.

I tried to read that HOTT paper. I should preface this by saying that my homotopy type theory knowledge is basically 0 and my topology/homotopy/algebra knowledge is limited and rusty. Here are some notes I took as I read it:

1. I guess I’m starting off not understanding the relevance of this paper to distributed version control. If you have a type R for states of your repository and patches are then paths of type a =_R b, with certain properties of patches being certain path homotopies, surely many functions giving you a path are somewhat boring (because a path should exist between any two repositories). But maybe the point will be proofs about the equality and construction of those paths. Or maybe it is to get functoriality for free.

2. In this formalisation patches are paths, so they have to form a groupoid, but if you want a distributed version control system, I think you don’t want inverses of patches. You don’t want p . p^-1 = 1, because you want properties like repo state = (in some sense) the composition of all patches in a set, and if patches form a groupoid then the sets {p, p^-1} and {p, p^-1, p} are equal (ie there isn’t a way to distinguish “apply p” from “unrevert p”, and you need some way to distinguish them if you want to be distributed). But maybe there is some way to work around this.

3. How to think about apd? If I think of B like a fibration of A then I guess it makes sense but it seems weaker than that. I don’t really understand why PathOver should depend on p (and not f too)

4. Perhaps the paper aims to construct a patch theory where any two patches are equal (which I guess means homotopic) under their laws. My intuition of a patch would be that the answer to this is yes but I’m not confident. Maybe that isn’t what the paper is about.

5. Well I’ve heard a bunch about the univalence axiom so I’m glad to finally get a definition of it.

6. In 2.4.1 the definition of reversing seems wrong to me. It seems like it only gets the right types because loop has the same type as refl. It feels like it’s actually splitting up a path into its individual loops, reversing them, then putting them together in the same order. On the other hand I think I can define a type like a : I ; b : I ; i : a = b, and then write rev a = b; rev b = a; ap rev i = ! i. And I suppose one could define a circle type with two points and two paths and it would look ok. So maybe the loop example doesn’t matter because it’s all equivalent no matter how it feels.

7. Describing patch equivalence as equivalent effect on the repo isn’t clear to me: is the repository just the (visible) contents of the files, or does it also include hidden state related to deletions/merges/applied patches? I’m guessing this paper needs it to be the former.

8. It would have been nice if the paper stuck to one composition order but whatever

9. My concerns about reverting are slightly allayed by the paragraph about being careful with contexts, but only slightly, because of the following paragraph about coincidence. I guess invertibility is fine so long as no one can actually commit a patch and its inverse.

10. Perhaps HoTT gives a better framework for defining and proving things about pseudocommutation. Maybe that is the point of this paper. I would guess (without having read the paper) that that isn’t particularly well formalised in category theory (but formalising a merge as a pushout does seem good to me).

11. Ah so inverses + pseudocommutation laws gives you merges. That seems pretty nice. Does this let one define pseudocommutation in the categorical approach? I guess if you have inverses and want the pseudocommutation of A -f-> B -g-> C, you compute the push out of the span A <-!f- B -g-> C as A -h-> D <-k- C and then say the pseudocommutation is (h,!k). But my (vague) understanding of that theory is that it wouldn’t have nontrivial inverses

12. At the end of sec 3, I feel like I’m left wanting pseudocommutation better defined. They say how their definition falls short but they don’t say they will improve it. Currently it feels like they could go on to prove a bunch of nice properties about version control systems which are allowed to resolve merges by reverting the conflicting patches. Fingers crossed everything will make sense soon.

13. The “topological meaning” paragraph seems weirdly written. If the reader knows what a universal cover is then it should say “this is the universal cover of the circle” and then provide some actual topological colour to the description. And if the reader doesn’t know what a universal cover is then this information won’t add anything. Might as well have said “this is called the fubar of the circle” and had the same effect.

14. I would have rather had a paragraph about universality than a weird one-liner about fundamental groups

15. In sec 4.2 their patch types are all the same. Are we skipping some difficulties because of this?

16. In sec 5, why do the patches not first require that the document contains the strings they care about? I guess that isn’t necessary.

17. Why no homotopy that ! s<>t@i = s<>t@i or that <> is symmetric? Maybe it’s not used. But it feels like the authors are admitting a more complex universal cover than they intend from the semantics they give by doing that.

18. I’m left somewhat wondering what the point is. I would say that the motto of the category theory paper (which I haven’t read but which pijul is somewhat related to) would be “patches form a category and merges are push outs. You can use category theory to extend the category of patches into one where all push outs exist.” For this paper I guess it’s something like “homotopy type theory gives you some tools for working with groupoids. Here it is applied to patches.” But otherwise I’m not really sure what the relevance to the implementation or correctness of a DVCS is. They don’t really provide a tool to show that your pseudocommutation function does what it’s supposed to do, and so you don’t know that your merges work either. I guess if you could prove that your merges are push outs, would that imply the pseudocommutation is correct? With the category theory paper you at least get “correct” merges and a good idea for how to think about patches.


Regarding 2, why do you need the ability to distinguish these cases? After all, those two repositories are in the same state, which is that p is applied. The only reason you'd want to distinguish is if you want to preserve history. What about the distributed scenario makes it special?

Because a DVCS has to be distributed. If Alice writes patch p, Bob reverts it, and Charlie unreverts it, you might (without contexts) have three people with different states of the repo:

  Alice: {p}
  Bob: {p, !p} (or do you write this as {}?)
  Charlie: {p, !p, p} = {p, !p}? So I guess this has to be {p}
Suppose these people are pushing and merging their patches to the master. Then master has no way to distinguish these scenarios:

  A:
  1. Alice pushes; master applies p
  2. Bob pulls, reverts p, pushes. Master reverts p
  3. Charlie pulls, reapplies, pushes. Master does ??

  B
  1. Alice pushes; master applies p
  2. Charlie pulls
  3. Bob pulls and reverts; master reverts p
  4. Charlie pushes
In both cases, the state of Charlie’s repo is the same: {p}, but in case A, p should be applied and in case B, it should not. Therefore there needs to be some additional context (or a rule against applying both a patch and its inverse).

In other words, this problem is basically having a distributed set with addition, deletion, and readdition of elements, which you can’t have in a nice way. A distributed set that you can only add to, on the other hand, is a lot easier.
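The set analogy can be made concrete with a minimal sketch (illustrative only, not any CRDT library's API): a grow-only set converges because merging is just union, but modelling "revert" as adding an inverse patch immediately hits the ambiguity described above.

```python
# A grow-only set CRDT converges because union is commutative,
# associative, and idempotent, so delivery order doesn't matter:
def merge(a, b):
    return a | b

assert merge({"p"}, {"q"}) == merge({"q"}, {"p"}) == {"p", "q"}
assert merge(merge({"p"}, {"q"}), {"r"}) == merge({"p"}, merge({"q"}, {"r"}))

# But model "revert p" as adding an inverse patch !p, and the set
# cannot represent "unrevert": re-adding "p" is a no-op, so Bob's
# state (revert) and Charlie's state (revert, then unrevert) collide.
bob     = {"p", "!p"}
charlie = {"p", "!p", "p"}   # the third element is swallowed
assert bob == charlie        # the unrevert is lost
```

This is exactly why set CRDTs that support removal (2P-sets and friends) need tombstones or per-addition tags, i.e. extra context beyond the bare elements.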


CRDTs have to tackle this sort of problem as well.

Right. And a CRDT for a set you can add and remove from is either hard or weird for this reason.

Also I think you don’t actually want your repo to be a CRDT because a CRDT resolves all conflicts and that would mean merge conflicts get resolved in an arbitrary way (leading to unexpected results and bad code). Maybe you could say that merge conflicts don’t count as conflicts in the CRDT sense of the word but that just feels like you’re abusing the notion of what a CRDT is.


I don't know how good of an idea CRDT would be here, but from the pijul perspective the CRDT layer should not possibly create a conflict, in the same way a `git fetch` should never cause a conflict (assuming well behaved hashes).

The CRDT layer would produce a set of patches in an unmergeable state and then pijul would have to fix it; just like git.


1. "not understanding the relevance of this paper to distributed version control"; 18. "what the point is.": On page 11: Our contribution [..] is to present patch theory in a categorical setting that is also a programming formalism, so it directly leads to an implementation.

2.: Patches are (homotopically) equivalent to paths and form *inductive* groupoids. (pg 11) "In homotopy type theory the path space of every type is symmetric, and to fit patch theories into this symmetric setting, we either considered a language where all patches were naturally total bijections on any repository (Section 4 and 5), or used types to restrict patches to repositories where they are bijections" Inverses of patches arise due to these considerations. The use of the term distributed seems to be used in the sense of dependence that arises with pseudocommutation (pcom).

3. It might be inferred that apd = Action on Paths for Dependent function (f) <-> Action on inductive groupoids. (https://ncatlab.org/nlab/show/fiber+sequence#ActionGroupoid) So, a fiber sequence? Not sure where it is stated that PathOver does not depend on f (along with p and B)?

7. section 6 uses histories that are a quotient higher inductive type to equate sequences of patches, which result in the same changes to a file.

11. "Pseudocommutation gives us a merge operation that is well-defined, symmetric [..], and reunites the two branches of a span, but this is not enough to guarantee that we get the merges that we might expect" ..

12. "the correctness of pseudocommutation follows from the induction principles for paths that are proved for each type from the basic induction principles for the higher inductive types—roughly analogously to how, for the natural numbers, course-of-values induction is derived from mathematical induction. Moreover, proving these induction principles is sometimes a significant mathematical theorem. In homotopy theory, it is called calculating the homotopy groups of a space, and even for spaces as simple as the spheres some homotopy groups are unknown."

15.; 16. See section 6 for a patch language with add and remove, not just swap. Unfortunately, the authors were unable to provide a proof of the correctness of the proposed pcom (p,q) definition for it. If it is correct, it would be a more general patch language than the merges-as-pushouts proposal with ojo pseudo-edges, along with its assumptions / restrictions on deletions, which computes patch application by having [multiple] Tarjan (2-SAT) run(s). Even so, not sure how efficiently the "length-2 suffix" pcom would work out. Reminder that dependency resolution is NP-complete.

17. "In homotopy type theory the path space of every type is symmetric"

18. If merges could be assumed to be push outs or proved to be that would remove the ambiguity arising from pcom -> imply its correctness. pcom is defined for each patch language. Section 6.4 gives an outline on how to use patch laws to prove the correctness of the assumed pcom definition. The references also provide further details [18], [19], etc. The "category" paper ref treats merge as a pushout not as pcom based. pcom may be seen as a generalization of merges as pushouts. For another perspective on what is the same mathematical structure regarding Darcs see the Jacobson [15] reference that interprets patches as inverse semigroups, which are essentially partial bijections. Additionally, the Ehresmann-Schein-Nambooripad theorem states that inverse semigroups are equivalent to inductive groupoids.

NB I am not an author of any of the papers.


What does HOT stand for? It’s surely not higher order types but I can’t think of anything beginning with homotopy.

Abbreviation for Homotopy Type: https://homotopytypetheory.org/book/

> risks of major rewrites

The risks are mostly in relation to your existing users. As this was more of a prototype, with no real user base, in my mind it fits more with the "throw the first away" line of thinking.


"v2 major rewrite, to make the code clean and stuff" is where dumb projects go to die.

(There's even a scientific name for this phenomenon.)


The combination of

> (2) thinking about the future and sustainability of the project, including its funding.

and the rewrite apparently not happening in the open casts some doubt as to whether the project will remain fully open source.


> the rewrite apparently not happening in the open

I'm the author of the rewrite. It will almost certainly be open source when it's done, but it is not open right now, because:

- the license of the current version has not been properly observed by other projects in the past.

- this project keeps getting bashed for "corrupting repositories" and "changing formats constantly", even though it is announced as experimental everywhere (also, some corruption claims have been wrong in the past, and we have released converters every time the formats have changed).

Therefore, allowing people to test the new version as it is being debugged would probably not improve this situation.


Thank you, that's good to hear! Your arguments are valid.

> […] the rewrite apparently not happening in the open […]

That's a major red flag for a FOSS DVCS that aims to exist as a valid alternative for existing tools like git. Fine for a hobby project, portfolio project, or experiment of course.



From https://pijul.org/manual/why_pijul.html#patch-commutation:

> In Git, importing just a few commits from another branch is called "cherry-picking", and involves changing the identity (the hash) of those commits. This usually works the first time. However, when done again, as the maintainer of a stable branch probably wants to do, this often causes counter-intuitive conflicts.

If you're doing this in git to apply a fix to a master branch and multiple release branches, you're doing it wrong. You should be basing the patches off the merge-base of master and the oldest release branch you wish to land the change into.

If you work this way, you can merge the patch cleanly into all target branches, and this new patch becomes the new merge-base. So you can actually continue to apply new patches without any conflicts.
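The clean-merge claim can be sketched with a toy three-way merge (a simplification added for illustration: it assumes line-aligned files of equal length, whereas real diff3 first computes an alignment). When the fix is based on the merge-base and the target branch drifted elsewhere, the changes are disjoint relative to the base and the merge is clean; a conflict only appears when both sides touch the same line, which is the case discussed further down the thread.

```python
# Toy three-way merge over line-aligned files. For each line, take
# whichever side changed it relative to the base; if both changed
# it differently, that's a conflict.
def merge3(base, ours, theirs):
    out = []
    for b, o, t in zip(base, ours, theirs):
        if o == t:            # both sides agree
            out.append(o)
        elif o == b:          # only "theirs" changed this line
            out.append(t)
        elif t == b:          # only "ours" changed this line
            out.append(o)
        else:                 # both changed it differently
            out.append("<<CONFLICT>>")
    return out

base    = ["a", "bug", "c"]
fix     = ["a", "fixed", "c"]   # patch made on top of the merge-base
release = ["a", "bug", "c'"]    # release branch drifted elsewhere
assert merge3(base, release, fix) == ["a", "fixed", "c'"]
```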


> If you're doing this [..], you're doing it wrong. [..]

This is not 100% true (see my answer to the second part below), but the point of Pijul is that you don't need to "do things right" to avoid the potential problems of your version control system. In the end, a version control system is just a tool, not a way of life or a work methodology.

Sure, there are example projects where the Git way is the best, but they are quite rare (one example is Software Heritage [1]).

> If you work this way, you can merge the patch cleanly into all target branches, and this new patch becomes the new merge-base. So you can actually continue to apply new patches without any conflicts.

This is not actually true, and is the whole point of Pijul. In Git, your suggestion works only as long as (1) there is never any conflict, else you need "git rerere", and (2) 3-way merge never runs into a non-associative merge [2].

[1] https://www.softwareheritage.org/ [2] https://tahoe-lafs.org/~zooko/badmerge/simple.html


Regarding that second link about the bad merge, I think there is a deeper problem at hand. Namely that we still work with the textual representation of a program as the “source of truth”, but each time we save and commit this way, a lot of information is lost.

What I would like is instead of representing source code as text to have a binary representation of the AST of the software that I am writing where, crucially, all edit operations are recorded into history.

Perhaps such a binary format could even be slightly faster to work with in terms of time taken for editor and compiler to parse it?

And not only record edit operations, but differentiate between instancing and copying a node, so that we can express whether code that we duplicate should be edited everywhere automatically in the codebase when we change it, or if they should be allowed to diverge. (Both of which make sense in different contexts and would be great to actually express in the source representation and have the IDE read and understand.)

Furthermore, such a representation could allow for more freedom when positioning elements of the code because we aren’t constrained to just lines and lines of text with characters next to each other and lines above and below each other.

Now, a lot of non-textual editors exist already but I haven’t seen any that I like and besides I don’t want another language, I want to write Rust code in such a way as outlined here.


This seems to be exactly what the Unison language [1] tries to do.

I actually have been thinking about how the concept could be implemented for Rust.

[1] https://news.ycombinator.com/item?id=22009912


> but the point of Pijul is that you don't need to "do things right" to avoid the potential problems of your version control system.

The criticism directed by the Pijul guys at git cherry-pick is actually not a problem at all. At best, the argument made by Pijul's backers is that Pijul implements this feature differently and in a way that arguably may be creating a problem where in Git there is none.


> The criticism directed by the Pijul guys at git cherry-pick is actually not a problem at all.

Maybe, maybe not, but it comes with actual arguments, whereas that comment doesn't.


There is a clear argument that you chose to ignore: cherry-picks are patently not a problem in Git, and the only problematic aspect is how some git users might create their own personal problems by developing misinformed workflows while ignoring any information on best practices. If you try to reinvent the wheel on your own, ignore any advice or info, and end up with a badly shaped polygonal thing, then you simply can't pin the blame on your chisel.

A lot of the criticism git receives is about the stance that there is the git way and the wrong way.

We all are super sure that git has extremely efficient workflows available; it is just that maybe many people do not like them.

Git does not try to be a tool for everyone; it is very opinionated on how best to use it. This obviously leaves space for less opinionated tools, or even differently opinionated tools, to shine under specific use cases/scenarios.


Maybe I’m “doing it wrong,” but I have certainly experienced pain cherry-picking fixes and features between dev <-> release branches! Pijul seems to me like it could be a fundamentally better model for VCS, a jump similar to svn -> git.

I guess it's a matter of personal opinion but as a git and svn user I simply fail to see any relevance or value in Pijul's selling point regarding cherry picking. It feels that the argument for Pijul starts from a problem that never existed, and even so the selling points fail to present a case for the alternative.

And to be honest, if the worse pain point of git that the Pijul team addressed was git's cherry-picking, it looks like they failed to do any form of cursory homework on Git.

But hey, just my 0.02€. Everyone should use the tools that they prefer, and if someone has a problem with Git cherry-pick that they feel is better solved by switching tooling instead of reviewing the problems then have fun.


Let me try to present one problem relating to cherry-picking. I don't know Pijul so I'm not sure it illustrates its strength clearly, but at the very least it should address the idea that there is no problem with cherry-picking.

Using git terminology here.

Say you have a release branch and a master branch, and you normally fix bugs on the release branch and then merge the release branch into master.

Now it happens that a bug was fixed in master, and later it turns out that it's actually a critical thing that should be fixed on the release branch, too. So you cherry-pick that fix into the release branch.

Now the next time you merge from release into master, you attempt to merge this commit that was originally cherry-picked from master, so it's already present in master. But git doesn't know.

With some luck, git sees that the textual change is already present, but I believe that can also fail sometimes.

With Pijul, you would have applied the patch from the release branch onto the master branch, and Pijul would have recorded that, and on the next merge from release to master, Pijul knows to skip this patch.
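The skip behaviour can be sketched with a hypothetical model (stable patch identities; not Pijul's actual data structures). The key contrast with git is that cherry-picking in git assigns the copied commit a new hash, so this kind of identity-based deduplication is exactly what git lacks.

```python
import hashlib

def patch_id(change):
    """A patch keeps the same identity wherever it is applied."""
    return hashlib.sha256(change.encode()).hexdigest()[:8]

def merge_branch(target, source):
    """Apply only the patches the target has not already seen."""
    return target + [p for p in source if p not in target]

fix = patch_id("fix: null check")
master  = [patch_id("feat A"), fix]   # fix originally landed on master
release = [fix]                       # "cherry-picked": same identity
merged = merge_branch(master, release)
assert merged == master               # the fix is not applied twice
```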

Another place where Pijul helps is when cherry-picking the bugfix in the first place: Suppose the bugfix includes calling some function (procedure, method), but another (earlier) change on master has renamed that function. Then Pijul can alert you to this fact and tell you which other commit (patch) from master is needed to cleanly apply the bugfix.

Does this help?


I think you've failed to read the other answers I and others have provided.

But hey, everyone should read the answers they want, if someone wants to pick only the ones that are the easiest to dismiss then have fun.


> I think you've failed to read the other answers I and others have provided.

I read them and it was based on those comments, and also the half a dozen previous discussions on Pijul here at HN, that I formed my opinion.


> This is not actually true, and is the whole point of Pijul. In Git, your suggestion works only as long as (1) there is never any conflict, else you need "git rerere", and (2) 3-way merge never runs into a non-associative merge [2].

Sure, if the two branches have diverged around the location of the patch, there will need to be a merge resolution.

Wouldn’t Pijul require the same too?


First, even though Pijul has branches, patch commutation means that most Git branches are just regular patches in Pijul.

> Wouldn’t Pijul require the same too?

If you're talking about my point (1) above, the answer to your question is no, because in Pijul, conflicts are resolved by patches, so if a branch has a patch solving a conflict, the conflict cannot come back. So Pijul doesn't need "rerere".

If you're talking about my point (2), the answer is also no: non-associative merges happen for bad reasons in Git. By design, Pijul doesn't have them at all.


> If you're talking about my point (1) above, the answer to your question is no, because in Pijul, conflicts are resolved by patches, so if a branch has a patch solving a conflict, the conflict cannot come back. So Pijul doesn't need "rerere".

No, that’s not what I was asking. Conflicts don’t come back in git either, if you base the change off the merge-base.

As I said, if the area around the patch has diverged between the release and development branches, you will need to manually resolve the conflict as you merge from the patch branch (based off the merge-base). That’s nothing to do with git, these tools do not have semantic understanding of your business problem so obviously require human input.

My question was, in that case wouldn’t Pijul need assistance too?

Your reply: “conflicts are resolved by patches” - indicates there was a conflict that needed to be resolved by a human or not?


> My question was, in that case wouldn’t Pijul need assistance too?

> Your reply: “conflicts are resolved by patches” - indicates there was a conflict that needed to be resolved by a human or not?

Yes, conflicts need to be resolved by humans indeed (sorry for misunderstanding your question).


But you don't want your VCS to dictate your project management.

Maybe you want to apply the patch to the main development branch, let people try it there for a while, and then decide which maintained release branches should get the patch.


You can actually achieve this by making a new branch at the merge-base, cherry picking the fix from the development branch, then landing this branch into the release and development branch.

Git will record the parent commit correctly on both branches but apply no changeset to the development branch, while applying it to the release branch.

I admit the above is a bit much for most people.


It's still not ideal.

With git's DAG-based "patch model", I think what you'd really want to do is, after making the new branch and cherry-picking the fix, make that new commit a second parent of the original commit (so that it becomes a merge, but its tree doesn't change).

I think that would leave the graph with the best representation of what's going on (ie, the same thing as if you'd made the patch from the merge-base in the first place).

But git's hash-based system for identifying commits won't let you add new parents.
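The constraint can be sketched in a few lines (simplified: real git hashes a full commit object containing tree, parents, author, and message, using SHA-1 rather than SHA-256). Because the parent ids are part of the hashed content, "adding a parent" necessarily mints a new commit with a new id, invalidating every existing reference to the old one.

```python
import hashlib

def commit_id(tree, parents, message):
    """Content-addressed id: the parents are part of the hashed data."""
    data = f"tree {tree}\n" + "".join(f"parent {p}\n" for p in parents) + message
    return hashlib.sha256(data.encode()).hexdigest()[:8]

c1 = commit_id("T1", [], "fix bug")
c2 = commit_id("T1", [c1], "fix bug")  # same tree and message, extra parent
assert c1 != c2   # a commit with new parents is a different commit
```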


> I admit the above is a bit much for most people.

True, but it's not just the UI: this also requires discipline and constant synchronisation in a team. Pijul achieves the same without much discipline.


Some people avoid merges like the plague because they hate their commit history looking like a train station. Then you cherry-pick and/or rebase.

I never really understood this philosophy. Separating the graph structure from how you view it is the way to go I think. e.g. `git log --no-merges`.

Why would I want my history to be a graph in the first place?

A graph can be flattened, but a sequence/line can't be expanded. More information seems better than less.

I find that with `--first-parent`, my history looks reasonably linear, assuming the typical workflow of having each merge commit merging a topic branch into master.

Yeah, that works, if people are disciplined (no merging origin/master into master ...). You don't see individual commits from feature branches though.

Yes. And code review should enforce that kind of discipline.

I am not generally interested in individual commits from feature branches though, but when I do I can easily reveal them when needed.


I'm glad this made the front page. Some of my friends and I have tried it a few times now, with varying degrees of success.

For those of us pining for the days of darcs or wanting to escape the git monoculture, Pijul is really great.


What's the value proposition of Pijul? I mean, not being Git is not a good reason to convince anyone to adopt it as a VCS. A tool needs to actually be better at something and add tangible value where others may not have. What's the absolute best reason to convince anyone to adopt Pijul?

It is not at all about not being Git. The absolute best reason is that Pijul actually has the properties that people think Git has:

- in Pijul, patches are associative: pulling B and C together after A does the same as pulling just C after pulling A and B. Git doesn't have that property: sometimes diff3 randomly (and silently) decides to shuffle lines around.

- in Pijul, patches commute. Most Git users try to simulate that by rebasing branches, but (1) that can mean a lot of extra work for no fundamental reason and (2) Git runs the same clunky merge algorithms to decide how to do it.

- Pijul knows what a conflict is, whereas Git pretends to know, but then there's "git rerere".

- in Pijul, you can clone one subdirectory of a monorepo by just pulling the patches related to that directory. Git can do partial clones as well, but only with LFS and/or submodules, which are incredibly clunky and unnatural.


> patches are associative: pulling B and C together after A does the same as pulling just C after pulling A and B

That makes no sense to me. Darcs also claims this, but if you have patches all changing the first line to a different value, obviously the last one will dictate the final value of the first line. Which is the same as git. In what world do you want to change the order of patches and not have the final state change?


Associativity is not about changing the order but about a different grouping:

    ( A B ) C = A ( B C )

If A, B and C are patches, you are applying C last on the left side and A last on the right.

You are wrong. Yes, on the left side B is applied to A, then C is applied to the result (AB). However, on the right side, C is applied to B, then the result (BC) is applied to A. A is still the left most patch in both cases. BC could depend on (the output context of) A so it wouldn't make sense to apply A to BC! BC is applied to A. A(BC)

So you are saying that applying patches is not commutative? I was assuming A(BC) == (BC)A. What makes you think that the result should be different depending on the order?
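A minimal sketch (my own illustration, not from any of the papers) separates the two properties being debated: composing patches is associative no matter how you group them, but applying them in a different order is a different, stronger property that generally fails.

```python
# Patches modelled as functions on a document; composition applies
# the left patch first.
def compose(p, q):
    return lambda doc: q(p(doc))

A = lambda d: d + ["a"]                # append a line
B = lambda d: d + ["b"]                # append another line
C = lambda d: [x.upper() for x in d]   # rewrite every line

# Associativity: the grouping of composition never matters.
left  = compose(compose(A, B), C)
right = compose(A, compose(B, C))
assert left([]) == right([]) == ["A", "B"]

# Commutativity is NOT implied: swapping the order changes the result.
assert compose(A, C)([]) == ["A"]
assert compose(C, A)([]) == ["a"]
```

Pijul's claim about commutation is about *when* two patches can be reordered without changing the result (independent patches), not that every pair of patches commutes.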

No one else actually answered you: its patch theory is more advanced than the patch theory employed in Darcs, operating on "pushouts" of files rather than files, which can contain ambiguous lines (the site doesn't seem to link to the paper anymore and I don't have it at hand). The upshot of this is that rather than stopping the world for "conflicts" (a real terrible misnomer that made people think they were some kind of error state rather than an ordinary part of merging) until they're resolved, "conflicted" states (i.e., ambiguous states, which are not an error!) are just another type of valid state a repository can be in after applying a set of patches, and the resolution is just a patch that applies to the "conflicted" state. Once you know about it, it's obviously the correct theory, and "conflicts" being treated as an exceptional state is obviously wrong. However, I think this will take a few decades to trickle down into practice.
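The "conflicts are just another valid state" idea can be sketched with a toy model (my own illustration, not Pijul's actual structure): let each line hold a *set* of alternatives. A conflict is simply a set with more than one element, and a resolution is an ordinary patch applied to that state.

```python
def apply_edit(doc, line, old, new):
    """A patch on one line: drop the old alternative, add the new one."""
    alts = set(doc[line])
    alts.discard(old)
    alts.add(new)
    return doc[:line] + [frozenset(alts)] + doc[line + 1:]

doc = [frozenset({"x"})]
a = apply_edit(doc, 0, "x", "y")   # one patch rewrites line 0
b = apply_edit(doc, 0, "x", "z")   # a concurrent patch does too

# Merging keeps both alternatives: a valid, queryable repository state,
# not an error the tool must stop the world for.
merged = [frozenset(p | q) for p, q in zip(a, b)]
assert merged == [frozenset({"y", "z"})]

# The resolution is just another patch applied to the conflicted state.
resolved = apply_edit(merged, 0, "z", "y")
assert resolved == [frozenset({"y"})]
```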

About half a year ago, I tried to merge two forks of a single program which had diverged by several years. I tried a Git merge, got halfway through conflict resolution, stashed my current changes and reset my repository, and could not reapply my partial resolution. (The proper approach would've been to manually run `git rerere` without arguments to save the current partial resolution, but I didn't know that).

I redid the merge in Pijul. Pijul had a bug causing it to misread the filesystem's execute bit, and no amount of `pijul reset` would fix it. Pijul's merge conflict textual syntax was baffling as well. I think it was a stack which was pushed and popped by >>> and <<< markers, and anything under === was something I should probably remove. In the end, I did succeed in the merge, but reconciling the two projects didn't work out.

(FamiTracker was based off the MFC GUI library. It was forked to 0CC-FamiTracker, and another fork ported it to a MFC compatibility layer with a cross-platform Qt backend. The MFC compatibility layer didn't support all the functionality used by 0CC-FamiTracker.)


I suggest you check out git-imerge, it's a quite handy tool for making complex/big merges simple.

https://github.com/mhagger/git-imerge

> Reduce the pain of resolving merge conflicts to its unavoidable minimum, by finding and presenting the smallest possible conflicts: those between the changes introduced by one commit from each branch.

I'm not sure this would've helped in my scenario. famitracker had no public repo, and the Qt fork and 0cc-famitracker came from different Git repositories and were rooted in different subdirectories. I created a synthetic Git and Pijul history for the purpose of this merge.

But it might be helpful in other situations. I'll look into it.

> Allow a merge to be saved, tested, interrupted, published, and collaborated on while it is in progress.

This does seem useful.


Compared to git, it relies on patches rather than a branch approach.

I think it's more intuitive. Fewer opportunities to do silly things, and some things that are punishing in git are not with patch-based version control.


> Compared to git, it relies on patches rather than a branch approach.

Is this even relevant at all? I mean, from the user's POV the role of a VCS is to CRUD changesets and branches. As long as the user can commit their changesets and audit the DAG of past commits, who cares how the system is implemented under the hood?


> in the user's POV the role of a VCS is to CRUD changesets and branches

That would be the role of a filesystem, and Git is a perfect distributed filesystem. A VCS also allows people to merge their changes and to detect and handle conflicts gracefully.

> who cares how the system is implemented under the hood?

Pijul is not at all about the underlying implementation of commits. Patch commutation is a radically different way of thinking about cooperation, and it is (1) much simpler and (2) has the potential to scale to much, much larger repositories.


> That would be the role of a filesystem

No, the role of the file system is not to track how changesets are organized into branches. That's the responsibility of a VCS like Git or Mercurial or Pijul. The VCS is the interface, and whatever it does with the file system is an implementation detail that's abstracted away by the VCS's interface. Similarly, patches are only relevant as an external interface of the VCS, and one that users have no good reason to use.


> No, the role of the file system is not to track how changesets are organized into branches.

Oops, we just got a few more feature requests for a VCS than "CRUD changesets and branches".

But this isn't an answer to my feature request, which was the ability to merge changes.

> Similarly, patches are only relevant as an external interface of the VCS

Why? If they're relevant as the external interface, maybe this means they model the problem better than snapshots, and should therefore be used as the implementation.


Because a thing that programmers tend to call "design" directly impacts how a user is able to interact with the tooling, obviously? Are you seriously asking "How could a thing like 'core design of the system' ever impact how someone uses it?" or are you just trolling for the sake of it?

As a real example, I used to be the maintainer of a major project that had long-running stable branches. This meant I often needed to cherry-pick bugfixes from one place to another, but cherry-picks are often difficult because they can rely on prior changes you haven't yet picked. In a system like Git, you have to cherry-pick things in chronological order to replicate each "snapshot" of the history that the original changes came from, so the merge algorithm can figure it out. What change do you start from? Who knows! You have to find the first change that can be merged properly, or do it by hand, and work your way from there. You effectively have to hold the hand of the tool in cases like this.

In systems like Darcs or Pijul, that does not happen. They always keep track of dependencies between patches, so a "cherry pick" naturally implies picking change A, plus all dependencies of A that are not yet available. There is no notion of having to run multiple commands or whatever; it's completely transparent. From there you can either A) accept the dependent changes or B) do surgery, for example if you need to backport things more carefully. A) is the common case in the vast majority of uses, in my experience. Whereas in Git, after a long enough time, almost any use of 'cherry pick' will immediately fail and require you to start digging and picking out historical changes; in Darcs, it will keep working just fine, even with dozens or a hundred dependent changes you need. The default features of the tool and their UX matter a lot here.
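The behaviour described above amounts to a transitive closure over recorded patch dependencies: asking for one patch transparently brings its whole dependency chain, and nothing else. A minimal sketch, with patch names and the dependency map invented for illustration:

```rust
use std::collections::{HashMap, HashSet};

// Given a recorded dependency map, "picking" one patch means taking
// it plus the transitive closure of its dependencies.
fn pick_with_deps(
    deps: &HashMap<&str, Vec<&'static str>>,
    want: &'static str,
) -> HashSet<&'static str> {
    let mut picked = HashSet::new();
    let mut stack = vec![want];
    while let Some(p) = stack.pop() {
        if picked.insert(p) {
            // Queue every recorded dependency of p we haven't seen yet.
            if let Some(ds) = deps.get(p) {
                stack.extend(ds.iter().copied());
            }
        }
    }
    picked
}

fn main() {
    let deps: HashMap<&str, Vec<&'static str>> = HashMap::from([
        ("fix-crash", vec!["refactor-io"]),
        ("refactor-io", vec!["initial"]),
        ("initial", vec![]),
        ("unrelated", vec!["initial"]),
    ]);
    // Picking "fix-crash" drags in its chain; "unrelated" stays out.
    let picked = pick_with_deps(&deps, "fix-crash");
    assert!(picked.contains("refactor-io") && picked.contains("initial"));
    assert!(!picked.contains("unrelated"));
}
```

This is only the shape of the idea; Pijul computes these dependencies from the patches themselves rather than from a hand-written map.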

If you not only can't understand how different core design choices, like the ones in Pijul or Darcs, can impact how a user uses a tool, but even refuse to admit that a tool could ever be used any other way: that alone is a great indictment of Git's complete and total monoculture, and precisely why these tools need to keep existing. Then again, computer programmers love having Stockholm syndrome and hate reading things, so maybe it doesn't matter.

Here is a good, short video explaining the theoretical underpinnings of Darcs and how they impact the user (and an old side project, "Camp", that was planned to eventually become Darcs 3). It's over a decade old and just as relevant as ever: https://www.youtube.com/watch?v=iOGmwA5yBn0


On cherry-pick dependencies: there is no way that any VCS can solve that problem for you, because dependencies can be (and often are, for the projects I've worked on) non-local.

That is, you can have a dependency even though the changes appear to be unrelated from a textual point of view.

I'm all for developing better tooling for digging into blame-history, which is more generally useful and would cover the textual dependencies between changes, but you shouldn't believe that that solves the dependency problem.


> but you shouldn't believe that that solves the dependency problem.

I don't think anyone believes that. The dependencies in Pijul are the minimal dependencies that make the patch application possible, they are by no means semantic (you can totally add dependencies manually, btw).

However, for the particular use case explained above (stable branches, backporting bugs), the default behaviour of Pijul already gives you something really useful.


Right. I guess what I'm trying to say is that this isn't anything truly fundamental about Pijul. You can build the same feature on top of pretty much any reasonable DVCS, certainly on top of git.

The fact that this particular feature already exists in Pijul is not a great argument for changing the underlying data structure, which is what switching to another DVCS is.


> You can build the same feature on top of pretty much any reasonable DVCS, certainly on top of git.

Absolutely, the only difference is the time you'll waste managing your branches (creating, rebasing and merging them).


> the role of a VCS is to CRUD changesets and branches

Then git is wrong and pijul is right for you. :)

Git's internal model is to CRUD snapshots and their relations. It would be fine if the UI hid that, but it does not. For example, Git's inability to track file moves is a symptom.


> audit the DAG of past commits

That's a branch-based way to look at it.


> What's the value proposition of Pijul ?

A faster patch-based DVCS. Folks have done a good job explaining it below.

> I mean, not being Git is not a good reason to convince anyone to adopt it as a VCS.

I dunno if I agree with this, but even if we dismiss it, folks should be willing to try new things. When I was teaching folks about DVCSs back around 2009-2010, folks were also unnecessarily skeptical of any sort of change.

> A tool needs to actually be better at something and add tangible value where others may not have

Not really though? It just needs to be different.


How does it compare to Darcs?

https://hackage.haskell.org/package/darcs

Edit: Better link is http://darcs.net/, darcs has been around for more than 10 years and is still actively maintained.


From a complexity theory perspective, the Darcs patch application algorithm runs in time O(2^n) in the worst case, where n is the number of patches since the beginning of the repository. That exponential merge happens in practice.

In Pijul, it is always O(log n + c), where c is the size of the largest conflict between the current repository and the patch you're trying to apply.

So, it is a double-exponential improvement in practical uses, even though you could totally make c = n for the sake of the argument.
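To put those bounds in numbers (purely illustrative: n is the patch count, and the conflict term c is ignored here):

```rust
// Back-of-the-envelope comparison of the two bounds quoted above.
fn main() {
    for n in [10u32, 20, 30] {
        let darcs_worst = 2u64.pow(n);         // O(2^n) worst case
        let pijul = (n as f64).log2().ceil();  // O(log n + c), c dropped
        println!("n = {:>2}: 2^n = {:>10}  vs  log2 n ~ {}", n, darcs_worst, pijul);
    }
}
```

At n = 30 that is roughly a billion steps in the Darcs worst case versus a handful for the logarithmic bound, which is why the worst case matters even if it's rare.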

The tradeoff is that Pijul doesn't handle the very cool (but rarely usable in practice) "darcs replace".

From an implementation perspective, Pijul is in Rust, relies on less external stuff than Darcs (for example no need for Putty on Windows), is available as a library.

edit: also, Pijul has branches too, and Darcs doesn't. But branches are far from being as useful as in Git, and in fact most Git workflows use short-lived branches, which are essentially just patches in both Pijul and Darcs.


I can't find the details off-hand, but doesn't Darcs actually have the facility to define different types of patches as well as normal ones and "replace"? I don't know whether that's ever been done, but it could be useful, whether or not it's a worthwhile trade-off.

Are we talking about the Darcs version < 2 merge algorithm or the merge algorithm in current darcs?

The current darcs. Conflict fights are still exponential, and happen sometimes. I'm a really happy darcs user, but I was talking about complexity theory here, which considers worst cases.

For the sake of completeness, I don't know the full story for the complexity of Darcs "in most cases", but I believe it to be linear in the size of history in the normal case (as it needs to try and commute all the patches with your new patch).


"Current" might mean the latest release or the current darcs.net repo with v3 patches, which I can't claim to have followed.

I don't know too much about the specifics, but Pijul is based on a new "theory of patches" that is different from Darcs' and which they claim is "formally correct". Darcs always had a problem with exponential merge solutions, and Pijul apparently doesn't, partly thanks to using what they call a "conflict-tolerant" format for storing the tree. See the "Pijul for Darcs uses" at https://pijul.org/manual/why_pijul.html.


Darcs has some pathological algorithms.

IME, darcs has never lost a repo of mine and Pijul has. So I'm not clamoring to use pijul instead - I don't use darcs in a way that runs into the pathological cases anyway.


In full fairness, was it Darcs version 0.x? Pijul has never made any claim to be stable yet (but we're getting there).

Nope! But I need reliable version control, so I will continue using darcs until I hear Pijul is more reliable.

The problem with Darcs, which I've seen in use, is that the tooling around it is atrocious. Compared to something like Bitbucket or GHE, it's a huge pain for a bigger team, and requires a very oldschool workflow. There's a Jenkins plugin that barely works, you have to build your own new version because distro packages are old, etc.

Not sure whether there's any actual advantage at all.


Well, we (the Pijul team) are trying to learn from Darcs and not disregard that aspect. Darcs maintainers have been super busy fixing the performance issues, but since Pijul doesn't have those, we can invest more time in the tooling.

Look at nest.pijul.com for example, or the satellite projects such as Thrussh (https://cargo.io/crates/thrussh), which aim at making the UX as smooth as possible.


Thanks, this is fascinating. I'd been hoping to move one of my teams away from Darcs to Git but they're very set in their ways; gotta assume the work to bring them to Pijul would be just as much work, but then the other teams that use Git would also need to move. Way too much CD tooling to re-implement.

Would consider for a new project, somewhere else, though. I really miss my perfect understanding of SVN when I get lost in some Git madness - it's obviously way more powerful and faster but sometimes I get into merge scenarios where I have to go edit files just to restore whatever I just did on another branch or what-have-you. Editors papering over this with extensions to "resolve using theirs" etc. help but seem like an ugly fix.

I am interested in https://www.unisonweb.org/ which I saw here recently as storing its code in a fundamentally different way that helps make ugly merge conflicts a thing of the past by making the on-disk format (and therefore the revision control system) aware of some aspects of the AST. I'd love to see this go further and have a "no conflict" merge take place as long as two edits to a single file touched different functions and the merged version passes some tests or whatever.


> gotta assume the work to bring them to Pijul would be just as much work

Not really. If they're not heavily relying on the "darcs replace" feature, it's just the same interface. The dependencies between patches are computed differently.

Be careful though, Pijul is still 0.12, the patch format will change.

> I'd love to see this go further and have a "no conflict" merge take place

This seems possible in a future Pijul. Right now we're just starting to get text files right! Let's see in a few years.



I'd hope my revision control didn't need a lot of tooling, personally. I have Emacs darcsum and vc-darcs, and Trac integration, with hooks to use on commits etc. I guess darcsden (hub.darcs.net) is similar to pijul nest. There doesn't seem to be any real problem integrating it with things (that don't insist on a CVS-ish model) if people want to do it, it's just that few do, unfortunately. I haven't had time to see how easy it would be to provide darcs support for sourcehut, which might be useful.

I don't know what GHE is, but making me interact with github or bitbucket isn't a good way to encourage me to contribute to projects, especially fixing incidental bugs in things I use. Each to their own, after some decades' experience of effective distributed development.


> I don't know what GHE is, but making me interact with github or bitbucket isn't a good way to encourage me to contribute to projects

GHE is Github Enterprise. Most software development teams use that or Bitbucket or Gitlab, because there's a ton of value in the visual interface, the approach to code reviews, approvals, etc. that is hard to recreate in any other form.

You still push Git patches to it the same way as you would to any other Git repo.


> the tooling around it is atrocious. Compared to something like Bitbucket

I've yet to find anything comparable for repository hosting. I use it for projects where some of github's features are frivolous, but it would be a bit goofy adopting it at a big company.



I think it has a more tractable algorithm than darcs.

Last I tried it (a couple years ago), repositories got corrupted.

I'd like something to replace darcs, but so far I haven't found it.


Well, for a project explicitly announced as experimental (there's even a blinking paragraph on nest.pijul.com), two years is quite a while.

The repository of Pijul itself got corrupted several times, but the last time was certainly more than a year ago.


While that's a reasonable position in some ways, I certainly don't get the impression that I should assume Pijul is not up to the most fundamental guarantee of an SCM: not losing my code. A page telling me why I should use Pijul probably should explain, in the most trivial of ways, that I shouldn't use it for anything that matters, and yours doesn't.

https://pijul.org/manual/why_pijul.html


> the most fundamental guarantee of an SCM: not losing my code.

Yeah. It's not terribly useful, even as a research VCS, if I end up with corrupted repositories.


Last I tried it (in November), repositories got corrupted.[0]

I wish Carnix would just use Git instead, but I get why that's not happening.

[0]: https://nest.pijul.com/pmeunier/carnix/discussions/39


This is a network error, not a corruption error. I agree the CLI is not yet stable.

I guess it was caused by a network error, but it still ended up with the repository in a corrupt state (with the patches "existing" but invisible).

Is there a reason it's named after the Smooth-billed Ani? (Not saying a reason is needed!)

https://en.wikipedia.org/wiki/Smooth-billed_ani


Yes: these birds build their nests cooperatively.

Latest commit from 7 months ago. Also, anecdotally I've heard it's not very user friendly in practice; more like a research prototype than usable software.


would you care to justify a little bit? The latest version is 0.x, are you talking about bugs, or something else?

If anything, a user friendly git is what we need, not yet another tool with bad UX.

There are lots of tools out there to try and make Git simpler/more user-friendly. There's also Mercurial, which is more user-friendly while still using the same idea.

Pijul is meant to model asynchronous work as just that, whereas Git forces it into a unique history. For example, in Pijul, "pull --rebase" and "pull" produce different history, but are guaranteed by design to produce the same result, and strictly equivalent repositories (there's actually just one command for both).

This difference is fundamental.


Fossil?

I use and like git, but I feel like I'm working against it sometimes. I much prefer a rebase workflow with a linear history, and a chance to reorganize and reword commits so they make more sense. Git allows this workflow, and does quite a good job of it, but I feel nudged towards "just merge it".

If a VCS could do anything to improve the rebase workflow, that would be pretty interesting to me.


A more useful landing page:

https://pijul.org/manual/why_pijul.html


The implications of Pijul are really interesting. Some of the guarantees afforded by Pijul's model are clearly superior to those afforded by Git, but I can't help but get the impression that Pijul is:

1. considerably more complex under the sheets, and

2. considerably more complex in situations you're going to have to deal with as a user.

It would be nice to read an article from someone familiar with Pijul, that would take a look at the "ugly" (from the user's perspective) parts, and either explain how they aren't issues in practice, or at least show how you would handle them and argue why the benefits outweigh the costs.

For example, accurately representing repository states with files appears difficult: the page on Pijul's theory[0] explains how Pijul does not limit the repository to containing valid states, and mentions that "...else, we say the file has a conflict, and presenting the state of the file to the user in an intelligible way is not obvious.". There's an independent article about Pijul[1] that gives some examples of how you could end up with two different conflict states in Pijul that render into identical files.

I also had a look in the patch list for the master branch[2]. I managed to find a few patches I couldn't understand by looking at them. I realize that Pijul is still some ways from being ready for mass consumption, but the selling point of Pijul is its theory of patches. The documentation for "pijul patch"[3] notes that outputting patches as text "may lose information", and that "the text representation is merely a pretty-printed representation, and is not sufficient to apply the patch". This suggests to me that the difficulty in viewing those patches is at least partially due to viewing Pijul patches being a hard problem.

Here's an example, "Fixing a conflict with #387": https://nest.pijul.com/pijul_org/pijul:master/patches/6E7Kee...

Some lines at the top appear to be for informative purposes. They reappear later on, but slightly different (e.g. imports reordered, or with changed whitespace), and without any color (green or red) that would indicate that they were added or removed. They're shown in a context (e.g. line 219, 222) where they fit in as additions together with other lines that were added, though. The output of "pijul patch" for this patch[4] looks somewhat similar, and likewise not very easy to digest.

I really want to like Pijul, but I'm worried that it's too powerful and requires too much from the user. I would like to be wrong, though, so I would really welcome an article explaining these "darker" sides.

0: https://pijul.org/manual/theory.html

1: https://jneem.github.io/pijul/

2: https://nest.pijul.com/pijul_org/pijul:master/patches

3: https://pijul.org/manual/reference/patch.html

4: https://pastebin.com/b8XT5WPE


Hi! I'm one of the authors.

> 1. considerably more complex under the sheets, and

Yes it is, which sometimes comforts me when I think of how long it's taken to get a version that works reasonably well.

That said, it's also not the kind of complexity that goes against performance, but rather the kind that makes it hard to process Pijul repositories or patches with independent tools.

> 2. considerably more complex in situations you're going to have to deal with as a user.

I've used it quite a bit, but the biggest repository I've ever interacted with is Pijul itself, which is a rather small codebase, so I don't know how the UX scales.

I find it as easy to use as Darcs, with the caveat that patches are not always as intelligible as in Darcs. That said, patches solving conflicts are weird and/or wrong in Darcs, and they are quite simple in Pijul. And I believe conflicts are the situations where you need the most simplicity and clarity.

> It would be nice to read an article from someone familiar with Pijul, [..]

That is a really nice suggestion for a blog post, it's now on my todo list. Thanks!

> For example, accurately representing repository states with files appears difficult

This is unfortunately a difficulty intrinsic to the problem. Git has the same issue, it just doesn't really show it because the conflicts that cause this are shown in a very basic way (there's only so much that diff3 can guess).
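For reference, the most structure Git can expose about such an ambiguity is the diff3 conflict style (`git config merge.conflictStyle diff3`), which renders a conflicted region flat, as three stacked versions (the line contents below are invented):

```
<<<<<<< ours
the line as our side changed it
||||||| base
the line as it was in the common ancestor
=======
the line as their side changed it
>>>>>>> theirs
```

A patch-based system that allows nested or overlapping ambiguities has strictly more structure to display than this, which is part of why rendering conflicts intelligibly is hard.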

> I managed to find a few patches I couldn't understand by looking at them.

There are two issues: one is that the Nest doesn't yet have the best algorithms to render patches (but is likely to improve a lot with the new version of libpijul I'm writing), and the other one is that some patches might have been converted from a previous patch format, which makes them weird.

> I really want to like Pijul, but I'm worried that it's too powerful and requires too much from the user.

The goal is quite the opposite! For starters you can use it just like Git (with branches, just with better merges). But then when you realise you don't need branches, everything becomes much simpler, and the tool stops getting in your way. For large Git repositories (such as NixOS/nixpkgs on GitHub), I wish it was using Pijul about 100% of the time I contribute or even use my local clone.


> For large Git repositories (such as NixOS/nixpkgs on GitHub), I wish it was using Pijul about 100% of the time I contribute or even use my local clone.

That's a reasonable reaction for someone who is very familiar with Pijul, but as a newcomer my reaction is precisely the opposite.

When trying to figure out some issue I have with some software, I quickly run into two questions:

1. How did the source code look for the release X that I have installed?

2. Did patch Y make it into release X?

Git makes it trivial to answer both of those questions. Both GitHub and GitLab show all versions as tags (by convention) that are reachable in 1-2 clicks from the project home page, and a given commit page shows all tags that include it.

With Pijul/Nest I have no idea where to even start. I see no tag list anywhere[0], and patch pages look like gibberish[1].

[0]: https://nest.pijul.com/pmeunier/carnix:master/patches

[1]: https://nest.pijul.com/pmeunier/carnix:master/patches/6gv8hs...


You're basically saying the Nest is still primitive. Yes it is! It's even blinking on its front page https://nest.pijul.com

And that's fine! I just wish I didn't have to use it (before it is actually considered ready, at least).

You don't have to use the Nest at all, Pijul can work on your own server via SSH, the protocol just runs commands through SSH.

Sorry, guess I worded my old post poorly. I have next-to-no interest in Pijul itself. Git is working fine for me, and the snapshot model is much clearer and easier for me than patches anyway.

My problems only become relevant when trying to contribute to projects that already use Pijul/Nest (Carnix).


> My problems only become relevant when trying to contribute to projects that already use Pijul/Nest (Carnix).

Well, but then you can't blame others for using their preferred tools rather than your preferred tools. Also, Git doesn't work for Carnix, because its main author (me) is completely overwhelmed with too many projects and wouldn't want to add the additional project of managing Git branches.


> This website uses cookies to ensure you get the best experience on our website.

That's the quickest way to get me to close the tab.


European laws require something like that to be said. The Nest uses cookies to authenticate you, it doesn't even give you one if you don't log in.

Not for functional ones. Only for tracking / advertising cookies.

Finally, a moral DVCS.

Is this a graph theory joke?


