Pijul for Git users (pijul.com)
105 points by Volundr 18 days ago | 95 comments



I don't understand what justifies Pijul's existence. While I agree that Git is a complicated monster that takes ages to learn how to use for most people, Pijul does not seem to solve any problems with Git. Instead, it has many problems that Git solves. From a user's perspective, it doesn't even seem like there's much of a difference, despite the (annoying) choice of alternate command verbs (e.g. "record" rather than "add/commit").

It doesn't really seem like the authors fully understand Git, and what problems it solves.

> [...] in Git each commit is related to a parent [...] But in Pijul there's no timeline and branches are just sets of patches.

To me, that makes Pijul much, much less capable as a version control system. In Git, knowing the commit ID means that you know the full source tree state with the certainty being equal to the risk of a commit ID hash collision.

With just individual, unrelated patches applied in sequence, you have no such guarantee, and no knowledge of the current state of the repository.
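To make the commit-ID property concrete: Git names every object by the SHA-1 of a short header plus its contents, and a commit's contents include its tree ID and its parent IDs. The sketch below recomputes such IDs in Python. The author line and messages are made-up placeholders, and repositories using Git's newer SHA-256 object format would hash differently.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over Git's object header ("<kind> <size>\\0") followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_id(tree: str, parents: list[str], message: str) -> str:
    # Hypothetical fixed identity and timestamp so the example is reproducible.
    ident = "A U Thor <author@example.com> 1500000000 +0000"
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())

# The empty file always hashes to the same, well-known blob ID.
empty_blob = git_object_id("blob", b"")

tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty-tree ID
root = commit_id(tree, [], "initial")
# Same tree and same message, but different parents: the IDs differ,
# because a commit ID pins down the whole history, not just the tree.
child_a = commit_id(tree, [root], "same tree, same message")
child_b = commit_id(tree, [child_a], "same tree, same message")
```

Because the tree ID is part of the hashed body, a commit ID fixes the full source tree; because parent IDs are too, it also fixes everything before it.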

Of course, you could add a separate state tracking system allowing IDs to point to an immutable and unique set of patches in a certain sequence, but doing so just means that you have now implemented features identical to Git, with just a different internal representation (i.e. irrelevant differences for the user).

> [...] And push those changes in whatever order we want

You can do the same in Git (see interactive rebase, cherry-picking, similar). It rewrites the commit IDs in order to maintain the previously mentioned property of representing the state of the repository, but that's an implementation detail.


> I don't understand what justifies Pijul's existence.

The argument basically is: Darcs is better at the fundamentals than git is, and the reason for its lack of popularity is that it implemented those fundamentals inefficiently leading to exponential time use in some cases.

So the justification is that some people want Darcs fundamentals but with the speed of git. Now: if you don't buy that Darcs does anything fundamentally better than git then obviously you also won't allow it as an argument for why Pijul is needed. But it is the argument.

The pijul manual starts with a "Why pijul" that describes why the more complex patch theory is an advantage.

https://pijul.org/manual/why_pijul.html

It outlines an example "bad merge" scenario where a patch-theoretic system gets it right and Git does not.

https://tahoe-lafs.org/%7Ezooko/badmerge/simple.html


First and foremost, it's about a mental model. Pijul's model matches how one thinks of code changes. But in terms of actual implementation, Pijul has better space management (often the sole factor behind choosing Mercurial).

Also, Pijul has per-path checkout.

But again, all this is implementation; what works better is the mental model.

Anybody who has cherry picked patch hunks or done a 3way merge knows what I'm talking about.


1. I see no difference. Git effectively works as differences. A commit can for all intents and purposes also be considered a "patch" (that just happens to also point to the parent it is meant to be applied to).

That git works as a blob store internally (through tree objects) is, to the user, an irrelevant and hidden implementation detail.

2. So does Git: `git checkout <tree-ish> -- <pathspec>`

Pijul does not solve a three-way merge. In fact, it does not seem to mention how merges are dealt with at all. Its model does nothing to even remotely affect the situation from what I can tell.


> That git works as a blob store internally (through tree objects) is, to the user, an irrelevant and hidden implementation detail.

1. Sorry, I get that you might be angry at Pijul for some reason, but this is simply not true. The lack of associativity of 3-way merge is absolutely not hidden, and gets merges wrong sometimes.

But more fundamentally, the fact that commits must be ordered is not an implementation detail, it becomes fairly clear if you try to cherry-pick more than twice from the same branch, and find conflicts on the way. Or when you want to "undo" a merge, and push that undoing to a remote repository.

Conflicts in Pijul are detected correctly, and a conflicting state is another valid state, whereas in Git, a conflict is a failure to merge, and the user must solve the conflict before doing anything else. In particular, this implies that a conflict resolution in Git can be propagated to other repositories only if the two conflicting commits were merged at the exact same time in all repositories.
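One way to see how a conflict can be "another valid state": in the model described in jneem's posts linked further down in this thread, a file is a directed graph of lines rather than a list, and a conflict is simply a graph whose lines are not in one fixed order. The toy sketch below (plain Python, not Pijul's actual data structures) flattens such a graph with Kahn's algorithm and records a conflict whenever more than one line could come next, instead of refusing to produce a state.

```python
from collections import defaultdict

def flatten(lines: dict[str, str], edges: set[tuple[str, str]]):
    """Return (text, conflicts) for a file stored as a graph of lines.

    `lines` maps a line ID to its content; an edge (a, b) means
    "line a comes before line b". Assumes the graph is acyclic.
    """
    indegree = {v: 0 for v in lines}
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
        indegree[b] += 1
    ready = sorted(v for v, d in indegree.items() if d == 0)
    order, conflicts = [], []
    while ready:
        if len(ready) > 1:
            # Several lines are unordered relative to each other: a conflict.
            # It is recorded, and the state is still usable.
            conflicts.append(tuple(ready))
        v = ready.pop(0)
        order.append(v)
        for w in succ[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
        ready.sort()
    return [lines[v] for v in order], conflicts
```

In this toy, two people inserting a line at the same spot leaves those two lines unordered, and "resolving" the conflict just means adding one more ordering edge, which could itself be shipped as a patch.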

2. Working on parts of the repository in Git only works for checkout; you still have to clone the entire repository, and Git still has the entire repo, even if it is not shown to the user in its entirety. This completely defeats the point of partial clones, and prevents, in many cases, the use of Git on monorepos. That said, I'm not sure why this is the case: Git could very well produce a commit using a tree whose blobs are not all downloaded, just like when you commit from a shallow clone.

But even if this worked, Git requires you to send the entire history of the full repository if you want to clone it to a remote machine, whereas Pijul lets you select the patches relative to just a part of the repository. This is because in Pijul, patches commute! If you place your patches in a sequence, you can reorder the patches you want to send (without changing their identifier) to be the first ones in the sequence, and then send just a prefix of the sequence. All this without any extra operation, rebase or anything else.
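The "reorder, then send a prefix" step can be sketched with a toy model in which each patch lists the patches it depends on and everything else commutes. The patch names and the `touches` field below are hypothetical, and the dependency sets are written by hand, whereas Pijul computes them from the patches themselves. The point is only that moving the wanted patches (plus their dependencies) to the front never renames anything.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Patch:
    name: str                      # stands in for the patch hash
    touches: str                   # hypothetical: the directory the patch edits
    deps: frozenset = field(default_factory=frozenset)

def closure(wanted, by_name):
    """The wanted patches plus everything they (transitively) depend on."""
    todo, seen = list(wanted), set()
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(by_name[n].deps)
    return seen

def reorder_for_subtree(seq, subtree):
    """Move the patches needed for `subtree` to the front; return (seq, prefix_len)."""
    by_name = {p.name: p for p in seq}
    keep = closure([p.name for p in seq if p.touches == subtree], by_name)
    # Stable partition: nothing in `front` depends on anything in `back`
    # (the closure guarantees it), so this order is still valid, and no
    # patch changes its name.
    front = [p for p in seq if p.name in keep]
    back = [p for p in seq if p.name not in keep]
    return front + back, len(front)
```

Sending `new_seq[:n]` then transfers only the history relevant to that part of the repository.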


What I dislike about Pijul is that it entirely fails to sell itself.

Pijul is a product that seeks adoption. I'm a customer. Pitch.

Pijul fails to present me any reason or example for its superiority to the status quo, Git. It only provides vague points like simply claiming to be more "correct". I need to know how my workflow will improve as a result of replacing Git with Pijul.

Sure, you might say that I'm not the target, but as a software developer running my own DevOps product and consultancy company, I don't find that to be the case.

For your specific counter-arguments:

1. Pointing out a corner-case pain point with Git, and simply stating that Pijul will be "correct" is not at all informative. Git has countless pain points, and so will Pijul have. What matters is the 90% experience, not the corner cases.

"A conflicting state is another valid state" does not mean anything to a user. The source is not in a valid state during conflict, and sharing invalid source is not interesting, so this falls under implementation details again.

Merges have pros and cons, which is why Git also has other mechanisms which the users can select, such as rebase.

Again, what one needs to know is how the workflow of a developer is affected, not how beautiful the internal structures may end up being.

2. Ah, a partial checkout and a partial clone differ, at least in Git terminology, which is the status quo at the current time.

However, what people care about is not the ability to perform a partial clone, but the speed of certain operations. If a full clone is fast, the user won't give a damn about partial clones.

Instead, what one should then put as a prominent feature is monorepo capability, and direct performance comparisons between Git and Pijul.

However, do note that Microsoft appears to handle monorepos in Git just fine with GVFS, through partial clones.

Also, the ability to reorder patches without changing their identifier will to many be an anti-feature which might render Pijul a no-go. From a user's perspective, Pijul and Git are both equally capable of reordering patches; Pijul just lacks the ability to easily and uniquely identify an entire tree state.


You seem to be faster at writing than at reading, I already answered your point 1:

> it becomes fairly clear if you try to cherry-pick more than twice from the same branch, and find conflicts on the way. Or when you want to "undo" a merge, and push that undoing to a remote repository.

Any team with enough people, or working at a fast enough pace, knows this doesn't work in Git. Pijul completely solves it by thinking in an entirely different way (changes instead of snapshots). This is not an implementation detail at all, the two mental models are extremely different.

About your point 2, what can I say? You picked the example, and now that you're not liking it anymore, you are trying to blame it on me? I don't find that fair.


Sorry, but as another commenter said, you may not like Pijul for whatever reason or think it's "bad" someone is working on it, but your argument that git's model is not exposed to the user and that there is no difference is simply not true. I recently had issues with Git's model of merging come up and it broke a very important build in our Linux distribution (OpenJDK), due to an erroneous merge that Git applied without warning or conflict: https://github.com/NixOS/nixpkgs/commit/5d0ef3fd900769e82175...

To repeat the issue: I basically broke a build on our Linux distribution when I cherry picked a change after what git thought was a file rename -- it was not actually a file rename. An entire new file that was mostly similar was added, effectively resulting in Git thinking it was a move.

    - `10.nix` for JDK 10 was removed, and JDK 11 (via `11.nix`) was added in 1bd7b98c7975136ddd6183a155e90688fd5b3e43, all in the same commit
    - JDK 11 was only added in `master`, not in `release-18.09`
    - Because these files were substantially the same, `git` tracked this 10 -> 11 upgrade (from `10.nix` to `11.nix`) heuristically as a file rename.
    - I committed 162914742327002d49bd1dde424d399842ff7b5f on `master` as a change to OpenJDK 11
    - I cherry-picked that to `release-18.09`, in this commit
    - Git saw that the change was applied to `11.nix`, but the commit adding `11.nix` was not on this branch, so that file did not exist.
    - But git *thinks* that `11.nix` came directly from `10.nix`, through a file rename, i.e. the files are "the same" in some sense (even though they substantially differ in functionality, now)
    - Therefore, `git` concludes it's safe and appropriate to "rebase" this cherry-pick on top of `10.nix`
    - And then the build fails, because ZGC is not available in OpenJDK 10
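The rename detection this story tripped over is a similarity heuristic: when one path disappears and another appears, Git pairs them up as a rename if their contents are similar enough (50% by default; `git diff -M<n>` and the merge option `find-renames=<n>` adjust it). The sketch below imitates that idea with Python's `difflib`. Git's real similarity score is computed differently, and the file contents are invented stand-ins for `10.nix` and `11.nix`.

```python
import difflib

def similarity(old: str, new: str) -> float:
    """Rough line-based similarity in [0, 1] (not Git's exact scoring)."""
    return difflib.SequenceMatcher(None, old.splitlines(), new.splitlines()).ratio()

def detect_renames(deleted: dict, added: dict, threshold: float = 0.5):
    """Pair each added file with its most similar deleted file when the
    score reaches `threshold`, loosely mirroring Git's default of 50%."""
    renames = {}
    for new_path, new_text in added.items():
        best = max(deleted, key=lambda old: similarity(deleted[old], new_text), default=None)
        if best is not None and similarity(deleted[best], new_text) >= threshold:
            renames[new_path] = best
    return renames

# Invented contents: the "new" file differs by a version bump and one extra line.
jdk10 = "\n".join(["{ stdenv, fetchurl }:", "stdenv.mkDerivation rec {",
                   '  version = "10";', "  buildInputs = [ zip unzip ];",
                   "  doCheck = false;", "}"])
jdk11 = "\n".join(["{ stdenv, fetchurl }:", "stdenv.mkDerivation rec {",
                   '  version = "11";', "  buildInputs = [ zip unzip ];",
                   '  configureFlags = [ "--with-jvm-features=zgc" ];',
                   "  doCheck = false;", "}"])
```

With these inputs, `detect_renames({"10.nix": jdk10}, {"11.nix": jdk11})` pairs the two files: they share most lines, so nothing in the heuristic distinguishes "upgraded copy" from "renamed file".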
-----

I used Darcs for years before moving to Git, and this would have never happened with either Darcs or Pijul -- and not just because they actually track renames more accurately, more complicated merge issues that can arise from Git's model are also handled elegantly and correctly. The tool would have instead warned me that any cherry pick would have required a dependent patch (the one that added OpenJDK 11) that wasn't already there, and I would have been immediately informed of it -- a case that Git completely failed at. That's the wrong behavior, no matter what way you cut it. (Of course, I use Git every day despite this, and it works about 90% of the time.)
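The warning described here amounts to a dependency check before applying a patch. In the toy sketch below, the patch names echo the OpenJDK story but are hypothetical, and the dependency table is written by hand (Darcs and Pijul infer dependencies from the patches). A cherry-pick whose dependencies are missing on the target branch is refused, with a list of what would also have to be pulled.

```python
def missing_dependencies(patch, deps, branch):
    """Return the patches `patch` needs (transitively) that `branch` lacks."""
    missing, todo = [], list(deps.get(patch, ()))
    while todo:
        d = todo.pop()
        if d not in branch and d not in missing:
            missing.append(d)
            todo.extend(deps.get(d, ()))
    return sorted(missing)

def cherry_pick(patch, deps, branch):
    """Apply `patch` to a branch (a set of patch names) only if its deps are present."""
    needed = missing_dependencies(patch, deps, branch)
    if needed:
        raise ValueError(f"{patch} depends on patches not on this branch: {needed}")
    return branch | {patch}

# Hypothetical history: the ZGC change depends on the patch that added JDK 11.
deps = {"add-jdk11": {"base"}, "jdk11-enable-zgc": {"add-jdk11"}}
master = {"base", "add-jdk11"}
release = {"base", "jdk10"}
```

Here `cherry_pick("jdk11-enable-zgc", deps, release)` raises and names `add-jdk11`, instead of silently applying the change to a similar-looking file.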

Many other very useful workflows are made easier by the fact that in the model of Darcs and Pijul, patches are treated as commutative, and that fact is tracked correctly -- doing things like cherry-picks to stable branches (something I did for years as part of a large project -- one that used Darcs first and moved to Git!) becomes far easier because it's always clear when dependent changes are needed and how they interact, and when you want to pull them vs author bespoke patches for different branches, etc.


Pijul is based on a different (and much more advanced) algorithm than Git. That doesn't mean it's the perfect tool - just as Git is better than SVN (even though they're based on the same underlying algorithms), Pijul might be the "SVN-like" tool waiting for its own Git...

Edit: having said that, Git isn't perfect either, I'm keeping my fingers crossed for an even better tool!


> Pijul might be the "SVN-like" tool waiting for its own Git...

Or Pijul is git, and Darcs is SVN in this case.


> just as Git is better than SVN (even though they're based on the same underlying algorithms)

Please explain this. Git is a content-addressable object-store. SVN is not. SVN’s architecture uses a diff as the basis for versioning which ties you to a particular diffing system, git uses snapshots and diffs can be generated using any algorithm because they aren’t part of the object-store; so what common “underlying algorithms” are you referring to?


AFAIK the basic merge algorithm for both Git and SVN is the 3-way merge. Pijul instead uses commutative patches and a more complex merge algorithm.


Is it a better merge algorithm? Complexity on its own is not a selling point.

And if it is better, then I believe that it should also be possible to implement for Git, as the patch properties of Pijul and Darcs can easily be emulated.


1. It is associative, which makes it predictable, and yes, better.

2. It is commutative, which makes it non-trivial to emulate with Git, a system where nothing commutes (on purpose).

Of course, you can obviously do your merges manually, rebase the world at each new commit, cherry-pick and rerere to emulate what Pijul gives you for free. But then is your job writing code or showing your proficiency at Git?


Well if it's better at preventing bad merges, it's better, yes. Pijul might not be the best/optimal implementation of this algorithm, but it's still a step in the right direction (mainly because they optimized the algorithm significantly, reducing its complexity). Like, not all benefits of number theory (e.g. encryption) were discovered on the first day either.


> Please explain this. Git is a content-addressable object-store. SVN is not.

They are both version control systems. People use them to track versions, as well as merging changes and diverging development branches. How data is stored is an implementation detail that doesn't reflect on the mental model.


Git is extremely different from SVN at all levels. Pijul does not appear particularly different from Git from a high-level perspective, however. Internal implementation details are very different, but they are largely invisible to the user.

It appears that Pijul is meant as an enhancement to Darcs (at least, the FAQ makes that out to be the case). Darcs lists its key features here: http://darcs.net/Features, all of which overlap entirely with Git.


Indeed, and it differs from Git in the same way as Darcs differs from Git: the basic block of history/code is not a commit, but a (commutative) patch.


Yes, which misses the point: Darcs does not mention any features that Git lacks, and lets its differences remain as largely invisible and irrelevant from the perspective of the user.


May I suggest maybe reading the manual before writing this kind of comment?

https://pijul.org/manual/why_pijul.html


> I don't understand what justifies Pijul's existence.

Someone wanted to create it, then created it, and probably had some fun while doing that.


> Git stores the repository changes as a series of snapshots. That means that every time we commit changes it keeps a new version of every file that changed so the size of the repo could grow quickly.

The very first statement on the GitHub page seems to ignore Git packfiles completely for the sake of argument.

https://git-scm.com/book/en/v2/Git-Internals-Packfiles


Their wording is indeed bad, and does not convey at all what they wanted to convey.

Git stores patches to reduce disk size, and does so very effectively. However the same exact patch, applied to a different head has a different commit id. That commit id is a snapshot, even if it is stored as a patch.

In Pijul, that same patch applied to different heads will always have the same hash. This has some advantages that are described in more depth in their documentation.

Basically, you can cherry-pick changes (patches) from someone else, and then later merge their entire branch, but not conflict on the patches you already imported. Having dealt with this with git, I can see the appeal.
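A toy contrast of the two kinds of identity: below, a Git-style ID hashes the change together with its parent, while a Pijul-style ID hashes only the change. Both hash functions are simplifications assumed for illustration (real Git and Pijul hash more than this); what matters is that only the second is independent of where the change was applied. That is why a cherry-picked change gets a new Git ID, but is recognised as already present when the whole branch is later pulled into a set of patches.

```python
import hashlib

def h(*parts: str) -> str:
    return hashlib.sha1("\x00".join(parts).encode()).hexdigest()[:8]

def git_commit_id(change: str, parent: str) -> str:
    # Snapshot model (simplified): the ID depends on where the change is applied.
    return h("commit", parent, change)

def patch_id(change: str) -> str:
    # Patch model (simplified): the ID depends only on the change itself.
    return h("patch", change)

fix = "fix typo in README"

# Git-style: cherry-picking onto a different head yields a different commit ID.
theirs = git_commit_id(fix, parent="their-head")
mine = git_commit_id(fix, parent="my-head")

# Pijul-style: a branch is a set of patch IDs, so pulling their whole branch
# after cherry-picking is a set union, and the fix is not duplicated.
their_branch = {patch_id("add feature"), patch_id(fix)}
my_branch = {patch_id("my work"), patch_id(fix)}   # after the cherry-pick
merged = my_branch | their_branch
```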


> That commit id is a snapshot, even if it is stored as a patch.

I agree with your argument, but Git does store "blobs" only (plus "trees"), and never patches. When you ask it to show you a diff between two snapshots, Git computes the diff on the fly.

More almost-accurate info here: https://git-man-page-generator.lokaltog.net/


You are right that this is the model, but if your argument is that this model means that unnecessary disk space is used, that's wrong.

https://git-scm.com/book/en/v1/Git-Internals-Packfiles states, using an example of a file which is 4K and then a new commit which adds a line to that file (thus making the new altered file 4K):

"What is cool is that although the objects on disk before you ran the gc were collectively about 8K in size, the new packfile is only 4K. How does Git do this? When Git packs objects, it looks for files that are named and sized similarly, and stores just the deltas from one version of the file to the next."

You are right that this isn't the thing that's presented to the user as "git diff". The model that Git uses is that each commit has each file at that version, but it uses an optimization (the packfile) to make sure that doesn't take up more disk space than necessary.
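The packfile idea quoted above, storing one version as a delta against another, can be sketched with copy/insert instructions. This is only an illustration built on `difflib`: Git's real delta encoding is a compact binary format and its choice of delta bases is heuristic. The principle of rebuilding version 2 from version 1 plus a short list of instructions is the same.

```python
import difflib

def make_delta(base: bytes, target: bytes):
    """Encode `target` as ("copy", offset, length) / ("insert", data) ops against `base`."""
    ops = []
    sm = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:  # "replace" or "insert": the new bytes must be stored
            ops.append(("insert", target[j1:j2]))
        # "delete": nothing is emitted; those base bytes are simply not copied
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

For a file that only gains one line, the delta is a single copy of the old contents plus a small insert, which is why the packed size stays close to one copy of the file.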


Git does store deltas between files to reduce space. Every so often it stores a full copy of the file to improve performance.

Perhaps patch was the wrong word, but a delta and a patch are the same in my mind.

I am sure everything, including pijul, does diff generation on the fly of the full files at different points. It would be faster than figuring out the difference of patches alone.


A packfile delta in Git is arguably an implementation detail — a compression technique — not part of the data model.

Git's data model consists of commits, trees, refs and so on. Not patches. You can implement Git without deltas and it will work the same. But in Pijul, the patch is the data model.

The whole idea of Pijul (and the project it was inspired by, Darcs) is that if you think of patches as units of data that fit together using a kind of formalism (usually called the "theory of patches"), you end up with a very powerful system that makes certain things (solving conflicts, figuring out what commits a single commit depends on) super easy. The patch isn't an implementation detail, because you cannot reimplement Pijul without it.


By the way, that guide is not from the authors of Pijul, even though as an author myself, I'm happy to see people understand it, use it and explain it.

I believe a more gentle and constructive way of discussing this is by starting a discussion on their page [1].

[1] https://nest.pijul.com/tae/pijul-for-git-users/discussions


Pijul keeps touting "patches" instead of commits, and they show how to record and unrecord these patches. But how do you actually work with such a repository? How do you handle merge conflicts? The documentation does not even describe how to resolve conflicts, just that you can procrastinate and save the conflicts "for later"[1]. There is probably technical merit in Pijul, but the documentation and marketing sucks.

[1]: https://pijul.org/manual/conflicts.html


I'm not convinced either. What's the performance of getting the state 10 versions before? What exactly is the size difference between a Git repo and a Pijul repo of the same thing?

As for patch vs snapshot, that's arguing at the implementation level. It doesn't affect my use of a tool except as it relates to performance and merging/branching... etc.

Preferring a patch approach rather than a snapshot is like saying map(+)[1,1,1,1,1] != last([1,2,3,4,5]). I don't care.


You’re right about not caring about the implementation level.

But the difference isn’t just at the implementation level, it also relates to merging etc.

See here for an example of how merging works better, by taking into account intermediate commits: https://tahoe-lafs.org/~zooko/badmerge/concrete-good-semanti...

That is to say, start with version A, on one branch commit versions B1 and then B2, on another branch commit version C1. To merge the two branches (B2 and C1), Git/Subversion etc. will do a three-way merge of “A to B2” and “A to C1”. It won’t take into account B1 which provides vital information to do a better merge.
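The point about B1 can be made concrete by computing what a three-way merge actually reads. The sketch below finds the merge base of two commits in a small, hypothetical history (the same notion `git merge-base` reports). The merge then looks only at the base and the two tips, so B1 never enters the computation.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, todo = set(), [commit]
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo.extend(parents.get(c, ()))
    return seen

def merge_bases(x, y, parents):
    """Best common ancestors: common ancestors not behind another common one."""
    common = ancestors(x, parents) & ancestors(y, parents)
    return {c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)}

# History from the comment: A -> B1 -> B2 on one branch, A -> C1 on the other.
parents = {"B1": ["A"], "B2": ["B1"], "C1": ["A"]}
base = merge_bases("B2", "C1", parents)
# A three-way merge of B2 and C1 reads exactly these three versions.
three_way_inputs = (base, "B2", "C1")
```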


OK, but if it's needed you could derive patches from snapshots and perform the same steps with a snapshot based VC.

I may be missing something here!

PS: lest I seem too harsh here, I love looking into version control systems. I have worked with sccs, rcs, arch, larch, bazaar, bzr, bitkeeper, svn and of course git. Looked at darcs very briefly. Never used cvs seriously. Managed to avoid clearcase, thank god.


Sure, but if you had read the manual[1], you would have learned that this is only the associativity part, which you could indeed use to provide a better merge algorithm in a snapshot-based VCS.

So:

1. You're 100% correct in saying that this can be done in a snapshot-based VCS like Git, Mercurial or SVN.

2. But you're also missing the main feature of Pijul, patch commutation, also explained in the manual. The idea is that you don't need to rebase anymore: two patches produced independently always commute, and hence patches don't need to change their identity when they are "rebased", or merged at any point in time. This has major usability benefits, such as being able to undo a "merge" (I prefer to call it a patch application) even after ten other "merges". Or choosing which part of a conflict to remove in order to solve it (if you don't choose to actually solve the conflict). With patch commutation, you don't need `git rerere` anymore, and you don't even need to branch very often, since branches are essentially an emulation of commutation (then of course Pijul still has branches, but they're quite different from Git's branches, in particular they are not as essential as in Git).

But then Pijul is still a young project, there are still a few bugs here and there. It is self-hosted, and we're quite happy using it for its own development (and the development of all companion libraries, Thrussh, Sanakirja…). But is it really ready for HackerNews-level flamewars?

[1] https://pijul.org/manual/why_pijul.html


> In Pijul, for any two patches A and B, either A and B commute, (in other words, A and B can be applied in any order), or A depends on B, or B depends on A.

This isn't exhaustive. What about the case where both patches A and B are worked on independently, and both depend on an original version of the codebase, let's call it P for parent.

A and B both diverge from P and therefore require a merge, or in git, a rebase. Over my years as a developer this scenario has caused approximately 90% of the issues with git. How does Pijul tackle it?


You can simply incorporate both patches A and B, you don't need a separate "merge" step.

If A and B have conflicts, then the resulting repo will have conflicts (but that's allowed in pijul) and you can resolve those conflicts with a third patch, C.

The nice thing is that this patch C can be used to resolve those same set of conflicts, even if someone else has a completely different history. In git, when you perform a rebase, the identity of commits is lost, and so you have to constantly re-resolve the same merge conflicts when merging between branches where one of the branches has been rebased.


That sounds great.

What about the situation where A and B don't have conflicts (as defined by the VCS), but still don't work. For example, based on a state of a piece of source code, e.g. A renames a method and all usages (works fine), and B introduces new code (but uses the original method name, also works fine). Merging them will not produce conflicts, but will not work.

In Git I always use "--exec 'mvn clean test'" when using "git rebase", to compile the source code (which would find such rename issues) and run unit tests (which would find other issues, e.g. A introduces a new mandatory column in a database table and B introduces an INSERT statement without that column).

What would be really cool is if one could say to Pijul "for any operations on this repository, execute this command to determine if the software is OK". And then forget about it. That way all operations it does it could check, and the user could never forget. I wish Git had something like that.

Or have I misunderstood something? I mean, simply relying on "no conflicts implies everything's fine" is not sufficient as far as I can see?


I would say that is somewhat outside the jurisdiction of the VCS. Sure you can have post-commit or CI hooks that ensure your code always compiles, but it doesn't need deep support from the VCS to do that.

However, the fact that Pijul doesn't really treat the "no conflict" state as "special" is in some ways closer to what you want - the repo is just a collection of patches, and whether the result is "good" or "bad" is not determined by the VCS, but by whatever means you choose to employ.


I've not used Pijul, but I used Darcs — which Pijul is essentially an improved clone of — for half a decade, and I assume it's roughly the same.

The patch model is incredible. Think of "git cherry-pick". Imagine you could use that instead of "git merge" or "git rebase" for all your work. Imagine that every time you cherry-picked, it would tell you which additional commits you'd need, and then pick them for you. And that when you merged your heavily cherry-picked branch back into the mainline, it just worked. That's Pijul/Darcs.

One doesn't have to understand the "theory of patches" to use Pijul/Darcs. As a user, you just work with changes, just like Git. But the UX is much simpler than Git — in a good way.

I remember switching from Darcs to Git back in 2008 or so. It was like switching out a sleek spaceship [*] for an old rusty, clanking pickup truck. Git has gotten better over the years, but ultimately, I think GitHub was the killer app, not Git. Going by technical merits alone, Darcs and Mercurial "should" have won that battle.

[*] Albeit one that occasionally choked on its dark matter fuel for mysterious reasons. That alone contributed to a large part of the decline of Darcs. Apparently this is a solved problem in Pijul.


> Preferring a patch approach rather than a snapshot is like saying map(+)[1,1,1,1,1] != last([1,2,3,4,5]).

I assume you meant `reduce` instead of `map`.
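Written with `reduce`, the analogy looks like this in Python: folding the per-step deltas gives the same value as the last snapshot, and accumulating them recovers every intermediate snapshot.

```python
from functools import reduce
from itertools import accumulate
import operator

deltas = [1, 1, 1, 1, 1]        # "patches": the change made at each step
snapshots = [1, 2, 3, 4, 5]     # "snapshots": the full value after each step

folded = reduce(operator.add, deltas)   # replay every patch
latest = snapshots[-1]                  # i.e. last([1,2,3,4,5])
history = list(accumulate(deltas))      # all snapshots, recovered from the patches
```

Both sides agree on the final value; the rest of the thread argues that the difference shows up once merging enters the picture, not in this single-branch case.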


Thanks, I knew I'd get that wrong!


> As for patch vs snapshot, that's arguing at the implementation level

??

That's the core of any version control system, namely "what exactly am I versioning?".

For anything beyond the trivial use case (and "trivial uses" probably encompass a large majority of all usage of version control systems, so that's not to be sniffed at), this fundamental difference of "what am I versioning" impacts, well, everything.

For one thing, dealing explicitly with patches rather than snapshots means you can almost have ad-hoc branches, on the fly, AFTER the fact (ie: you only realised you wanted to separate out some commits long after you actually did them).

You can't do that with git, where every "patch" is really part of a linear series of commits where the parent of each is hardcoded into the commit blob.

Not without rewriting the history. Or revisiting the past, in the future (by writing undo commits, and branching after). Or however you want to do it.


> That's the core of any version control system, namely "what exactly am I versioning?".

I would say that's less fundamental than answering "give me the exact state of the repo at time t"

> You can't do that with git,

Maybe I misunderstand but you can branch from some arbitrary point and cherry-pick to your heart's content. Of course the hashes won't be the same but does that matter?


> I would say that's less fundamental than answering "give me the exact state of the repo at time t"

This is indeed a very important question when you're working alone, but the notion of "at time t" changes when many people work in parallel. This is why Pijul has two notions of time:

- Patches, which always commute when produced in parallel.
- The sequences of patches applied locally to a branch.

Alice and Bob working together can have the same patches and a different application order, what matters for the contents of their file is the set of patches only.
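The Alice-and-Bob point can be illustrated with a toy model, an assumption made for illustration that loosely follows jneem's posts linked at the end of this thread rather than Pijul's real format. If a patch only adds lines and ordering edges to a graph, applying a set of patches is a set union, so every application order yields the same graph and therefore the same file.

```python
from itertools import permutations

def apply(state, patch):
    """Toy model: a state and a patch are both (lines, edges); applying is a union."""
    lines, edges = state
    new_lines, new_edges = patch
    return (lines | new_lines, edges | new_edges)

def apply_all(base, patches):
    state = base
    for p in patches:
        state = apply(state, p)
    return state

def render(state):
    """Emit lines in edge order, assuming the order is total (no conflict)."""
    lines, edges = state
    out, placed = [], set()
    while len(out) < len(lines):
        # The next line is one whose predecessors have all been emitted.
        nxt = next(line for line in sorted(lines) if line not in placed
                   and all(a in placed for a, b in edges if b == line))
        out.append(nxt)
        placed.add(nxt)
    return out

base = (frozenset({"header", "middle", "footer"}),
        frozenset({("header", "middle"), ("middle", "footer")}))
# Alice and Bob each write one patch against `base`, in parallel.
alice = (frozenset({"alice"}), frozenset({("header", "alice"), ("alice", "middle")}))
bob = (frozenset({"bob"}), frozenset({("middle", "bob"), ("bob", "footer")}))

states = [apply_all(base, order) for order in permutations([alice, bob])]
```

Alice applying her own patch first and Bob applying his first end up with identical states, so the file contents depend only on the set of patches, as the comment says.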

> Of course the hashes won't be the same but does that matter?

It does matter a lot sometimes, for instance if you cherry-pick, solve a conflict, and cherry pick again from the same branch, the fact that Git changes the hashes will make the conflict reappear the second time you cherry-pick.


With "patch theory" the VCS can tell you which patches depend on which patches, even if they look like a linear "tree", it helps you get the same result. Of course you can do it with git, but it helps you get there faster.

Git uses some heuristics to make things smooth, but when it fails, it fails bad. Patch theory handles those, and it cannot "fail", it simply states where the conflict is. (On patch level, not just that these files happen to conflict because these diffs conflict after a lot of rebase-rewrite.) Which will be exact, not heuristic driven and implementation dependent, like in Git.


> Preferring a patch approach rather than a snapshot is like saying map(+)[1,1,1,1,1] != last([1,2,3,4,5]). I don't care.

You don't care as long as the difference is unobservable, but sometimes it is.


Why pijul?

Cherry-picking.

In git, cherry picking is broken. In pijul, it isn’t.

That’s the shortest explanation possible of why a patch-based model is better than a snapshot one. Hope it helps.


Not the most important matter but from the FAQs

> Where does the name come from?

> Pijul is the Mexican name of Crotophaga sulcirostris, a bird known to do collaborative nest building.

About technical stuff: the mathematical theory of patches, commutability, etc are nice properties but any versioning system has to fight a very uphill battle nowadays. Incumbents (mostly git) are very entrenched. I'd like to hear which reasonable expectations the authors of pijul have.

By the way, do you ever mistype it? How about a two or three letters official short name?


I'm one of the authors. We are actually quite optimistic about Pijul's possibilities. In particular, there are a few niches where people either:

(1) do not understand Git, can't learn it, still need a VCS. I have a few industrial applications in mind, not necessarily related to code.

(2) understand Git very well, and have spent too many hours debugging their failed merges, non-associativity issues, complex rebases. These users would know enough about "git rerere", for instance, to find the model of conflicts slightly absurd. For them, Git has become a source of cost in the project, and costs need to be kept under control. Again, I have at least one open source project in mind, and a handful of industrial contacts related to these use cases.

(3) many people in academia, even in computer science, collaborate on their LaTeX papers using various combinations of Dropbox and emails. This is because they are "principles people" and like to understand the principles behind what they're using. If a tool seems too complicated, it looks suspicious. But anyway, I have written papers with Git before, the "fun" of using Git often fades away as the deadline approaches.

Given the very young age of Pijul, I see this picture, and the interest many people have shown in the project, as quite promising. However, unlike what your post may suggest, I don't see Pijul as fighting against Git. Ultimately, when Pijul is ready, the two tools could become quite complementary:

- use Git (with branches, but without merges) when you need a linear history and/or a Merkle tree of your content. Git is extremely efficient in disk usage and in speed of retrieving contents. Projects like "Software Heritage" are a good example of the perfect Git-but-not-Pijul user; storage of experimental data is another one ("I've done experiment X on day Y, the results were Z").

- use Pijul when you want to collaborate efficiently on text documents, manipulate your repository in an extremely flexible way, and handle conflicts sanely, while minimising your planning and maintenance costs. Pijul will probably never be as efficient as Git in terms of disk space, but could become quite competitive in terms of merge speed. Agile teams and academic papers are good examples.


> [agile] teams

yes please. git makes merges hard, and rebases are just too hard when you want to maintain a few different releases without losing too many QALYs. (with git we [would] need to maintain a lot of branches, just to be able to have clean merge requests at the end for each big feature coming up for a release)


Regarding your niche (2): what open source project do you have in mind?


One of the largest open source repositories ever:

https://github.com/NixOS/nixpkgs


How about pi


For those who wonder what the heck Pijul is about, I'd like to emphasize these two articles linked at the bottom of the page:

https://jneem.github.io/merging/ https://jneem.github.io/pijul/


I find the stuff on jneem’s github wayyy more intelligible than the actual current pijul code. Granted I think it should be largely equivalent!

Full disclosure: totally planning to mess around with a Haskell port of some of these ideas sometime soon.


I read that merging blog post a while back but didn't completely grok it, I've yet to read the paper though. Do you know of any other write-ups on patch commutation?


Unfortunately, no. It's pretty much my only exposure to the idea.


A shame, thanks for the reply though. I'll set aside some time this weekend and read through it again.


Since pijul's key advertised strength is the use of patches for resolving merges, couldn't this just be a new merge strategy option for `git merge -s <strategy>`? In this case, the algorithm would just produce temporary patches out of git snapshots before running its own algo. Even if it comes out being slower than natively storing patches it would be worth the cost if it made complicated merges easier.



This just needs to ditch all of the command line stuff and explain the patch concept up-front. This was the most useful line:

> if we unrecord a patch with dependencies all its dependencies are unrecorded as well.

Up until that point I had been thinking "but what happens with patches that depend on each other?", and I don't think the Pijul website ever really explains that.

But it doesn't go far enough: how is the dependency automatically tracked, for example?


Automatic dependency tracking is deceptively simple: If you think of a patch as something as simple as "Modify line 10 of file A from `dog\n` to `cat\n`", the dependency is obvious: whichever previous patch left `dog\n` in line 10 of file A.

In most cases that's good enough to get stuff done.

Of course, it's not automatic semantic dependency tracking. It can't always entirely tell that Patch B depends on a "feature"/"flag"/"magic bean" in Patch A or it won't CI build, without additional information (such as a CI build process in a bisect, or a manual dependency flag from the author of Patch B or a downstream consumer).
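To make the "whichever previous patch left `dog\n` in line 10" rule concrete, here is a minimal Python sketch. It assumes a toy patch format of in-place line replacements `(line_number, old_text, new_text)` with no insertions or deletions, and the names `Patch`, `record` and `unrecord_closure` are hypothetical, not Pijul's API. It infers each patch's dependencies from whichever patch last wrote the lines it touches, and computes everything that has to go when a patch is unrecorded (the patch plus everything that transitively depends on it).

```python
from dataclasses import dataclass, field


@dataclass
class Patch:
    name: str
    # Toy format (assumption): in-place edits (line_number, old_text, new_text).
    edits: list
    deps: set = field(default_factory=set)


def record(file_lines, owners, patch):
    """Apply patch in place and infer its dependencies.

    owners maps line number -> name of the patch that last wrote that line;
    a patch depends on every patch that wrote a line it now modifies.
    """
    for line_no, old, new in patch.edits:
        assert file_lines[line_no] == old, "patch does not apply"
        previous = owners.get(line_no)
        if previous is not None:
            patch.deps.add(previous)
        file_lines[line_no] = new
        owners[line_no] = patch.name
    return patch


def unrecord_closure(patches, target):
    """Return target plus every patch that transitively depends on it."""
    removed = {target}
    changed = True
    while changed:
        changed = False
        for p in patches.values():
            if p.name not in removed and p.deps & removed:
                removed.add(p.name)
                changed = True
    return removed


lines = ["dog\n", "bird\n"]
owners = {}
a = record(lines, owners, Patch("A", [(0, "dog\n", "cat\n")]))
b = record(lines, owners, Patch("B", [(0, "cat\n", "cow\n")]))
c = record(lines, owners, Patch("C", [(1, "bird\n", "fish\n")]))
patches = {p.name: p for p in (a, b, c)}

print(b.deps)                                  # {'A'}
print(sorted(unrecord_closure(patches, "A")))  # ['A', 'B']
```

Unrecording A drags B along because B rewrote a line A wrote, while C touches an unrelated line and stays. As the comment above says, this only captures textual dependencies, not semantic ones.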


I read the article. Then I read the home page. I still don't know why I should give it a try. The practical added value, especially compared to the cost of switching, is not clear to me. "faster, easier, mathematically proved" is a bit fuzzy as a sales pitch.


I'm very interested in Pijul! Patch-based version control always seemed like a much more intuitive paradigm to me. That said:

From reading these comments, I would say that perhaps the number one priority for the project is an education/PR push. The best way to draw in more users (and thus more developers!) would be to have your very own Steve Klabnik. Someone to produce not just reference documentation or a manual, but a good, long(!) and detailed introduction with the tone of a blog post. The "why Pijul" section of the manual is a very good start, and the blog posts I've seen (particularly Joe Neeman's) are fantastic, but it all needs to be tied together in a single section on the pijul.com site and expanded upon.

Those docs should include a good description of what Pijul solves that git doesn't. That is to say, basically anything along the lines of "git has a hard time with _______. Will pijul be a silver bullet for that?". (Examples for _______: monorepos, rewriting history, completely purging some file or commit from history). It's unfortunate, but a lot of developers seem to look for what's "best". You see this with new programmers a lot. Any programming forum will have fresh faces asking "I want to learn how to program please tell me what's the best programming language". I've found the JavaScript community in particular is quite keen to chase the new shiny that everyone likes. It's part of why Rust obtained huge momentum so quickly. If there's a really good description of where Pijul beats the competition, it helps with these sorts of people.

And finally, those docs (even the bits about Pijul vs. git) need to be written with the assumption that the reader has little or no understanding of version control or git. The docs may have to link to some 'git in 30 minutes' blog post elsewhere on the internet, if necessary. Writing in that style will not only draw in the new users who will help show where Pijul and its documentation are confusing, but will also ensure that even those with lots of experience with git but little understanding of it will be able to appreciate Pijul's design.

This is by no means meant to be a put-down on the project or the way it's been handled so far. There's a lot of negativity in the comments here, but I'm certainly not one to be negative about the project. :)


I would have two pieces of advice:

(1) Starting by explaining that Pijul works with patches probably isn't the best place to start. It might be true, but most people are going to think "but if I do 'git diff' I can get a patch, so it's not really any different." I would start with explaining the consequences of this e.g. better merging, and so on.

(2) Basically take all the questions asked on this discussion, and put them into an FAQ together with their answers. These are the same questions everyone else is going to be asking too. Even if these questions are misguided, that still doesn't change the fact that's what everyone's going to be asking.


Regarding (1): as someone new to pijul, I was thinking the same. Not just that, but I deal with patches in git all the time. I receive them, and I send them, via e-mail.


> Pijul is the only version control system based on a complete mathematical theory of patches. [1]

Can someone please provide a link to that theory?

[1] https://nest.pijul.com/

EDIT: This? https://arxiv.org/abs/1311.3903


I found this [0] to be a very readable presentation of the theory behind Pijul. The key point is that by representing state as a "DAG of lines" instead of as an "ordered list of lines" (i.e. a file), you are always able to do perfect merges. You then need a "flattening" step to get back to a normal file, but that is handled the same as any other patch.

All in all, I am tempted to give this a try -- even though the mental model will require some mental rewiring.

[0]: https://jneem.github.io/merging/
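A toy version of the "DAG of lines" idea from that post might look like this in Python. Assumptions: a file is a pair (vertices, edges) where vertices map made-up line ids to text and an edge (u, v) means line u comes before line v; deletions are left out entirely, and `merge`/`flatten` are hypothetical names. Merging is just a union of the two graphs, so it always succeeds; flattening is a topological sort (Kahn's algorithm) that reports a conflict whenever more than one line is free to come next.

```python
def merge(g1, g2):
    """Merge two line graphs by taking the union of vertices and edges."""
    return {**g1[0], **g2[0]}, g1[1] | g2[1]


def flatten(graph):
    """Topologically sort the lines (Kahn's algorithm).

    Returns (lines, conflict). conflict is True when the order is not unique,
    i.e. at some step more than one line was free to come next.
    """
    verts, edges = graph
    indegree = {v: 0 for v in verts}
    for _, v in edges:
        indegree[v] += 1
    ready = sorted(v for v, d in indegree.items() if d == 0)
    lines, conflict = [], False
    while ready:
        if len(ready) > 1:
            conflict = True
        v = ready.pop(0)
        lines.append(verts[v])
        for a, b in sorted(edges):
            if a == v:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
        ready.sort()
    return lines, conflict


# Common ancestor: line "a" comes before line "c".
base = ({"1": "a", "3": "c"}, {("1", "3")})
# Alice and Bob each insert a line between "a" and "c".
alice = merge(base, ({"x": "b1"}, {("1", "x"), ("x", "3")}))
bob = merge(base, ({"y": "b2"}, {("1", "y"), ("y", "3")}))
merged = merge(alice, bob)

print(flatten(alice))   # (['a', 'b1', 'c'], False)
print(flatten(merged))  # (['a', 'b1', 'b2', 'c'], True): b1 and b2 are unordered
```

The merged graph itself is perfectly well-defined; the conflict only shows up when you try to flatten it into a single ordered file.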


> the mental model will require some mental rewiring

One must be pretty brave to admit that. Even though I'm a big user of Git (and a co-author of Pijul), I agree that you'll probably need to unlearn a number of things. But you won't need to relearn so much, the patch model is really more intuitive in many cases.


I believe Pijul branched from the Darcs codebase.

https://en.m.wikipedia.org/wiki/Darcs


> Contrarily to what we hear sometimes, Pijul is not a rewrite of darcs in Rust. The theory is different, and the algorithms have a different complexity. Then of course, Rust was chosen to gain an extra performance factor, but rewriting darcs in Rust would be a lot of work for really minor improvements (darcs is already really good, and Haskell is a great language).

From [0]. It's possible that they did branch from darcs, but it seems unlikely given that quote.

[0]: https://pijul.org/model/


Fair enough, after looking around the big quote seems to be this one:

"Pijul started as an attempt to fix the performance of darcs, and ended up among the fastest distributed version control systems."

https://pijul.org/

It sounds more like they wanted a better Darcs and started from there, rather than actually using any of the Darcs code.


The theory is actually quite different, even though Pijul and Darcs behave in a similar way on many examples.


Just a quick sanity check:

1. machine going bad causes data to get corrupted. oh no!

2. can still check the fiddly number with `git log`. whew!

3. grab a clone from $untrustable_cloud and put it on another machine

4. check the fiddly number of the clone with `git log` or whatever. It's the same as the other fiddly number. Hurrah!

5. run `git fsck`. no problems.

6. Keep working as if nothing had happened

Same/similar process for Pijul?


Rule number 1 for project naming: choose a name which anyone can easily figure out how to pronounce.


How is this any different from mercurial?


By having a mathematical theory of patches instead of a bunch of (user-friendly and often working) hacks. It's actually explained in the manual:

https://pijul.org/manual/why_pijul.html


How is this better than git format-patch?


Git and Subversion had a baby?


See https://pijul.org/ (and its inspiration, http://darcs.net/ ). It's actually something quite different from both.


This is basically how CVS worked, as inherited iirc from RCS. Presumably the authors know this, so they must be doing something different, though I can’t tell what from the description or pijul’s FAQ.

Perhaps they figured RCS, SCCS et al are so old that mentioning them would add confusion for most readers. Might be reasonable, but for me I am unclear as to what’s up with Pijul.

(It’s sweet that their host is named ‘nest’ given the name of the project)


I'm one of the authors, and we did a lot of bibliographic research before engaging in such a big project. I'll talk only about the core principles behind version control, not about how things are actually implemented, or which options are the defaults in which tool.

Pijul is absolutely not how RCS or CVS worked. Instead, RCS and CVS worked more or less like SVN (remember the "CVS done right"?), and SVN itself is not too different from "Git without branches", or "Centralised Git", in the sense that these tools are snapshot-based (even though the actual implementations may store patches sometimes). This means that they see history as sequential, and use algorithms to merge when necessary. One could actually very well simulate a large part of Git branches and merges using multiple SVN servers, even though that would obviously be quite cumbersome to implement and use in any concrete situation.

The only version control system one can clearly relate Pijul to is Darcs, using the idea that patch commutation is the main thing people care about. Unfortunately, Darcs has important complexity problems when dealing with conflicts, and Pijul solves these problems.

That said, short-lived branches ("feature branches") in Git mostly try to simulate commutation, i.e. independent development mergeable later, hopefully in any order (since they were written independently).


Kudos on your work. I'm looking forward to trying pijul - I used darcs in the past and I miss some of its capabilities in the current git monoculture. Please ignore the negativity if you can - it's not like anyone was taking git away from people by suggesting that they might try a new thing.


Thanks for your response. I do think the archeological part of my brain was very confused/damaged; I think it was SCCS that kept the original document and applied patches, as opposed to RCS, which applied the "innovation" of maintaining the latest version with deltas to go backwards. Luckily I haven't used any of those old systems in decades. IIRC BitKeeper followed the SCCS model.

In any case good luck with Pijul; I do feel git has become a bit of a monoculture.

In your discussion of the model you mention binary blocks but don't otherwise mention binary files. Two major "shortcomings" of git (in quotes because they are explicit design decisions) are the lack of binary file support and the lack of very large file support. It's possible, but ugly. This means game developers (in particular) are typically unable to use git and are primarily stuck on Perforce, which is sort of like perpetually scratching yourself with a fork made of salt. The patch model might be a win here, especially if you could plug in format-specific "differs".


People who have issues with Git are just doing Git wrong. They are the same that build a feature on one single big commit impacting lots of files. They forget to merge their hotfix back into the dev branch, etc.

And Git punishes those people, but your tool doesn't. How can I trust the code base?


Git doesn’t scale.


> They are the same that build a feature on one single big commit

You can easily tell with pijul as well if they are doing that or not.


This is unreasonable. First of all, often you don't need patches, so it's premature optimization. Second, git specifically chose to store actual states in a key-object store, which means that by dropping it you throw away not just the disadvantages but the advantages as well. Last but not least, git ALSO STORES DIFFS when it packs stuff up, which is what happens when large numbers of objects need to be transferred or large amounts of old data need to be stored.
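The point about packs can be illustrated with a simplified copy/insert delta: the newer object is stored as instructions to copy byte ranges out of a base object, plus literal bytes for whatever is new. This Python sketch shows only the general idea; git's real pack format encodes such instructions in its own binary layout, `make_delta`/`apply_delta` are hypothetical names, and `difflib.SequenceMatcher` stands in for git's own delta search.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Encode target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # reuse bytes already in base
        elif j2 > j1:                               # 'replace' or 'insert'
            ops.append(("insert", target[j1:j2]))   # ship new bytes literally
        # 'delete' needs no instruction: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops):
    """Rebuild the target object from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


base = b"line one\nline two\nline three\n"
target = b"line one\nline 2\nline three\nline four\n"
ops = make_delta(base, target)
print(apply_delta(base, ops) == target)  # True
```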

That doesn't mean pijul is a bad tool. I didn't check it out. And I really think for everyday coders who don't want to become git-gurus it would be nice to have a simpler git-like VCS, for instance.

But the marketing must be updated according to the facts. Try to find features that people really care about, e.g. ease of use, integration with build tooling and Docker, or better federation through more automation, which would make centralized servers like GitHub go away.


The primary novelty of Pijul is its sound patch theory, not any kind of technical achievement like smaller repository size. I do think it's premature calling it unreasonable without understanding this aspect.


Why not summarize it a little if you feel that info got lost.

Technically if you need a patch you can generate it on the fly by comparing both objects (you can even do that in a bash script with `diff`, if you are willing to type in the logic to look up commit->tree->file->object-name first). So there shouldn't be anything lost. The only trade-off I can see between storing diffs and storing immutable objects is disk space vs. processing time, which git itself demonstrates by not keeping old objects whole and storing diffs for them instead (reducing space by increasing processing time).
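Deriving a patch from two stored snapshots, as described above, takes a few lines of Python with the standard `difflib` module. The snapshots are plain strings here; in a real repository you would first fetch the two versions of the file (for example with `git show <rev>:<path>`), and `patch_between` is a hypothetical helper name.

```python
import difflib


def patch_between(old_text: str, new_text: str, path: str) -> str:
    """Compute a unified diff between two snapshots of the same file."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    ))


old = "one\ndog\nthree\n"
new = "one\ncat\nthree\n"
print(patch_between(old, new, "pets.txt"), end="")
# --- a/pets.txt
# +++ b/pets.txt
# @@ -1,3 +1,3 @@
#  one
# -dog
# +cat
#  three
```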


No, it's actually related to the ease of use and correctness. By design, Git lacks two fundamental properties that would make working with repositories much easier:

1. Git merge is not associative. This results in cases where Alice and Bob work together, Bob pulls Alice's work, and her lines get merged into places she's never seen (blocks of text newly introduced by Bob).

2. More importantly, Git is not commutative. This means that cherry-picking and rebasing are complex operations that change commits' ids. This forces Git users to branch (an extra step that needs to be done before starting to write), but more painfully, it sometimes forces them to do multiple steps of rebasing before being able to merge what they want, or to solve the same conflict again and again after an unlucky cherry-picking.

Of course, all this could maybe be prevented by stricter planning and a more vertical organisation. But this is not how many people write code in 2019. Continuous delivery, for instance, means that teams no longer know long in advance what they are going to do over the course of the project. Also, the best developers don't work in vertical teams where they are continuously told what to work on.


Sorry, I was in a hurry and I think I mixed up articles since I think another one recently appeared on HN which pointed to a page with a much better explanation.

The main idea behind Pijul (as I understand it) is that it makes merging divergent branches correct and predictable by making patch application an associative operation.[1] This means that, given patches A, B and C, it should be irrelevant whether you are:

1. starting with A, applying B on top, followed by applying C
2. starting with A, applying the result of C applied on top of B all at once

In other words,

    (AB)C = A(BC)

This isn't always what happens in git because it doesn't work with patches on an abstract level. Instead, it always works with states of the branch heads (and sometimes the state of their branching point, i.e. BASE, in case of a 3-way merge).
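Why a patch-based model can get associativity for free is easiest to see in a toy version. The Python sketch below assumes, as a simplification, that a patch only ever adds line vertices and ordering edges to a graph; applying one patch on top of another is then set union, and union is associative. `compose` and the patch contents are made up for illustration; Pijul's real patches also handle deletions and require their dependencies to be present before they apply.

```python
# Toy model (assumption): a patch, like a repository state, is a pair of
# frozensets (line vertices, ordering edges), and patches only add to the graph.
def compose(p, q):
    """Apply q on top of p: in this add-only model, a plain set union."""
    return (p[0] | q[0], p[1] | q[1])


A = (frozenset({"1", "3"}), frozenset({("1", "3")}))
B = (frozenset({"x"}), frozenset({("1", "x"), ("x", "3")}))
C = (frozenset({"y"}), frozenset({("x", "y"), ("y", "3")}))

left = compose(compose(A, B), C)   # (AB)C
right = compose(A, compose(B, C))  # A(BC)
print(left == right)  # True
```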

As to why this matters, [2] and [3] have practical test cases where the difference between these two approaches is observable. [3] is an example of plausible C code where the non-patch approach may produce an incorrect result. It also demonstrates the way the patch algebraic approach is able to take individual changes into account and apply edits from the second branch in the correct place in the first branch, even though the affected code has moved in the first branch in the meantime.

The focus on patches has other important implications, such as the fact that branches then simply become sets of patches and cherry-picking retains the identity of patches instead of creating new, unique commits.

The other important aspect of Pijul is use of efficient data structures that are naturally suited to the problem in order to avoid suboptimal algorithmic complexity. This is explained in more detail here[4].

[1]: https://pijul.org/manual/why_pijul.html

[2]: https://tahoe-lafs.org/%7Ezooko/badmerge/simple.html

[3]: https://tahoe-lafs.org/%7Ezooko/badmerge/concrete-good-seman...

[4]: https://pijul.org/model/


I disagree with some of that. But first thanks for explaining more in-depth what is behind pijul. The main subthread I started because I didn't understand what it provides. If it can combine easier UX with more efficient diffing, then I think it is a very valuable contribution, actually.

Now some more in-depth bla bla if interesting:

> (AB)C = A(BC)

great feature request, I agree.

> This isn't always what happens in git

correct.

> because it doesn't work with patches on an abstract level. Instead, it always works with states

Incorrect though. States, patches, these are just trade-offs. You can represent either in the other completely. Like you can build a list using a tree structure if you just allow one branch. Or you can also build a tree on top of a list structure, if your traversal algorithm knows which item-index to pick for a certain subtree's children. All trade-offs.

That doesn't mean "Pijul does better diffs" would be wrong, though. It can still be the case. But it doesn't mean that git would need huge refactoring to also implement that better-diff-algorithm. In the end implementing this better algorithm in git might be trivial for a git core developer if you can explain to him how it works.

If you think about it, a diff between two states can be represented in an unlimited number of ways. And if you consider the minimal steps that generate a diff by adding lines to it and removing lines from it, the whole thing is an abstract tree. Basically, to achieve associative patches one needs to make sure to always traverse this tree in the same order. Git traverses greedily though, using the very first diff that is good enough as the final result. Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.
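The claim that one pair of states admits several diffs shows up even in the tiniest case. The Python sketch below (the helper name `single_insertions` is made up) lists every single-line insertion that turns `old` into `new`. With repeated lines such as closing braces there is more than one, and a diff tool has to pick one of them by some rule.

```python
def single_insertions(old, new):
    """Every (position, line) insertion that turns old into new."""
    if len(new) != len(old) + 1:
        return []
    return [
        (i, new[i])
        for i in range(len(new))
        if new[:i] + new[i + 1:] == old
    ]


old = ["{", "}"]
new = ["{", "}", "}"]
print(single_insertions(old, new))  # [(1, '}'), (2, '}')]
```

Both answers produce the same resulting file, but as patches they say different things about which `}` is new, which can matter once someone else edits around those lines.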


> Incorrect though. States, patches, these are just trade-offs.

I still maintain it is correct to say that git at present does not work with patches on an abstract level. I do not mean to imply by this that patches cannot be recovered from states (they obviously can) nor that it would take a great refactoring of git in order to implement it in git.

On the contrary, now that Pijul has done the hard part of thinking about it and developing it into a theory, it would probably be a very useful addition to git, as you have noticed, if it can get mind share among git developers.

> Probably the idea behind this was also smart. Do it quickly for now, and optimize it if needed later.

The issue is of course whether the property of associativity is useful or not in a correctness sort of sense. IMO, the answer is yes, and I would gladly take a small performance hit in order to have this result.

Also, I've seen one of Pijul's authors claim that it could be made competitive with git with regards to merging performance (in terms of time). It is already quite quick. We'll see.

Another point made by one of the Pijul authors elsewhere in the thread is that Pijul uses novel data structures which enable it to also have commutativity of patches. I'm less clear on how this works, but it essentially means that branches in Pijul become sets of changes, not sequences of changes. In other words,

ABC = ACB = CAB = <any other permutation>

This is what I hinted at when I said commits retain their identity across rebases and cherry-picking. At that point, history stops being important and you only deal with changes as first-class entities. I feel this is enough to justify the claim that Pijul is fundamentally different from git by being patch-centric. I'm also not sure that this could be retrofitted to git as easily.
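The difference in identity can be sketched in Python. The hashing is deliberately simplified and hypothetical: `commit_id` mimics the snapshot style, where the id also covers the parent, so the same change replayed onto a different parent (as in a cherry-pick or rebase) gets a new id. `patch_id` covers only the change itself, so a branch can be a plain set of such ids. Neither function reproduces git's or Pijul's actual hash inputs; a real patch id would presumably also need to cover things like the patch's dependencies.

```python
import hashlib


def commit_id(parent: str, change: str) -> str:
    """Snapshot-style id: hashes the parent too, so moving a change renames it."""
    return hashlib.sha1(f"parent {parent}\n{change}".encode()).hexdigest()


def patch_id(change: str) -> str:
    """Patch-style id: depends only on the change itself."""
    return hashlib.sha1(change.encode()).hexdigest()


fix = "line 10: dog -> cat"
# The same change on two different parents, e.g. before and after a cherry-pick.
print(commit_id("aaaa", fix) == commit_id("bbbb", fix))  # False

# A branch as a set of patch ids: the order in which patches arrived is irrelevant.
branch1 = frozenset(patch_id(c) for c in ["A", "B", "C"])
branch2 = frozenset(patch_id(c) for c in ["C", "A", "B"])
print(branch1 == branch2)  # True
```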


> Technically if you need a patch you can generate it on the fly by comparing both objects

Automatically generated patches might not work in the long run in certain situations (e.g. when standard 3-line context is repetitive, making the patch applicable at multiple places in the same file). Those patches also suck at conveying actual file changes (how many actual changes start with "- }" or contain lots of context braces?); that eventually prompted me to do lots of split commits in git whose sole purpose is to make automatic diffs more readable (e.g. separate commit for indentation fix).
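The repeated-context problem is easy to reproduce. The Python sketch below (the helper `context_matches` and the sample file are made up for illustration) finds every place where a hunk's context lines occur verbatim; with generic context such as a closing brace followed by a blank line there are several candidates. Tools like GNU `patch` use the hunk's line numbers to prefer the nearest match, which can pick the wrong spot once the file has drifted far enough.

```python
def context_matches(file_lines, context):
    """Return every index at which the context lines appear verbatim."""
    n = len(context)
    return [
        i for i in range(len(file_lines) - n + 1)
        if file_lines[i:i + n] == context
    ]


source = [
    "void f() {",
    "    do_a();",
    "}",
    "",
    "void g() {",
    "    do_b();",
    "}",
    "",
]
print(context_matches(source, ["}", ""]))  # [2, 6]
```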

That brings the question: can Pijul store user-formatted patches?


I'd say maybe check it out first, especially the parts about patch commutation and the problems with associativity in git merges and rebases.



