Pijul – The Mathematically Sound Version Control System Written in Rust (initialcommit.com)
92 points by initialcommit on Nov 29, 2020 | 64 comments


I really want to use Pijul, but programming is a team sport. And especially open source.

The Pijul equivalent of GitHub is Nest. It's just not there yet. Firstly, there are basically no potential collaborators for the areas I work on. Secondly, it is much less mature as a platform. In 5 minutes I couldn't find any way to "browse repositories" in any sense, to discover what is out there. Discussions are not fleshed out as an issue tracker (no tags for a start), though that could be dealt with by using a 3rd party issue tracker.

Pijul's whole setup seems like it would solve a bunch of problems I have with git. 1. A commit maintains its own identity when cherry-picked onto another branch, and it's intrinsically linked, vs. in Git where it is an unrelated (but identical) set of changes. 2. The whole merge/rebase dichotomy. In git most people try to avoid merging the main branch into the feature branch, because it leads to messy history and makes it hard to understand where changes come from, and prefer to rebase instead. But for a merge you only need to resolve conflicts once. For a rebase you need to resolve them potentially again and again, even if those conflicts would never matter to the final version (e.g. if the file that is conflicting has been deleted before HEAD). But Pijul doesn't have separate merge and rebase, because changes in Pijul commute. Which is a mind twist of an idea. But they say they solved it, so nice.


> Firstly, there are basically no potential collaborators for the areas I work on

That's correct. On the other hand, this is a network effect, and the Nest, in its current incarnation, is only 3 weeks old. It's getting better every day, though.

The two main features I'm working on at the moment are:

- CI/CD. This is almost ready to go, it is "sort of" working. One issue that I fixed recently is that Pijul versions are tricky to compute: change A, followed by B, is the same state (guaranteed by design) as B, followed by A. Our version identifiers handle that now (try `pijul log --state`).

- Social features, like discoverability, recommendation of collaborations. We're really just getting started at this, and figuring out what works best.

I should add that until two weeks ago, the Nest would stop accepting connections at random times, or crash. It's much more stable now, and we can finally think of adding features.

> Secondly, it is much less mature as a platform

There are tags for issues. You can configure them in the "Admin" page of your repositories. They're not used very much for Pijul itself, because most issues so far have been closed within minutes or hours after their creation.

> Which is a mind twist of an idea. But they say they solved it, so nice.

As one of the authors, I can tell you it wasn't easy to get everything to work together, especially for conflicts. But the theory is clear and sound now (and I have good confidence that the implementation is, too): any two patches that could have been created independently can be pushed to a remote repository independently from each other.


Sounds super exciting; thanks for pushing the boundaries on something new and potentially so much better!

Please don’t get discouraged by comments pointing out the current limitations of the platform — it just means that the core ideas are interesting/good enough for people to bother critiquing the ancillary aspects. Good tools & platforms take time to build and polish, so kudos and keep at it.


> Please don’t get discouraged by comments pointing out the current limitations of the platform

Thanks for the kind message. I totally agree, and didn't take the comment as an offense at all (the reason I pointed out the relative youth of Pijul is that I know from experience that some people expect Pijul to have many more users than it currently has).


Exciting. I definitely looked at Nest 6 months or so ago as well. I think I might remember it crashing then. Cool to hear it's had an overhaul.

Definitely excited that things are continuing to progress.

I really wish I had capacity to try it out for a new project. But that would require two units of capacity, both having time for a new project, and having time to do it on a new platform.

Still, I can dream.


Programming can be a team sport, but doesn't have to be. It doesn't even have to be a sport.


You can use git rerere to avoid the repeated rebase conflicts.


As an obsessive rebaser, in practice I've found that rerere very rarely kicks in during a rebase unless the conflict looks exactly the same across commit applications, which is rarely the case.


Pijul would presumably fit well with the development style encouraged by Sourcehut, if you need such a thing. I've never had time to consider integrating Darcs with it.


It would be nice if the authors could provide practical examples of their features, instead of a theoretical talk about mathematical soundness.

I.e., how does this make my life, as a programmer, easier?


As far as I understand, the main change with respect to Git is that commit hashes don't change depending on previous history, so if you rebase the same change IDs are retained.

Personally I've never had an issue with conflicts after cherry-picking, if Git sees the same change in two branches it manages it correctly most of the time. Furthermore, I like that commit ID changes when previous history is changed: a commit ID refers to a certain state of the repository. I don't like that two repositories in different states could have the same commit ID.


> As far as I understand, the main change with respect to Git is that commit hashes don't change depending on previous history, so if you rebase the same change IDs are retained.

That is not the main change, merely a by-product. There are many changes:

- Perhaps one of the most crucial differences is that Pijul uses a rigorous merge algorithm, and doesn't run into the bad merge examples described in [1]. Note that the situation described there happens without any conflict, and doesn't tell anything to the user.

- Changes that could be written independently commute, meaning that they can be pushed independently. You don't have to worry about isolating them on feature branches, which can save time if you (like me) work on many features at the same time.

- Rebase and Merge are the same operation in Pijul, and happen automatically. Conflicts happen between changes, and conflict resolutions are recorded as changes, so you don't need any `git rerere` (which essentially guesses what to do), since conflict resolutions work even in a different context.

- Everything is invertible! You can even remove an old change from a channel if you like, as long as other changes don't depend on it.

> I don't like that two repositories in different states could have the same commit ID.

That never happens in Pijul. Changes are not commits, they're diffs between commits. And states have identifiers too, they're just more subtle: if you have two changes A and B, Pijul can apply them in any order without changing the result (so why change the commit id?). In Pijul, if you run `pijul log --state`, on "A, then B" and "B, then A", you will get the same state id.

[1] https://pijul.org/manual/why_pijul.html
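
The commutation and removal rules described in this reply can be illustrated with a toy model. This is only a sketch under loud assumptions: the `Channel` class, its method names, and the idea of storing changes as plain names in a set are all hypothetical and are not Pijul's actual data structures or API. It only shows why independent changes give the same state in either order, and why a change can be removed only when nothing depends on it.

```python
class Channel:
    """Toy channel: an unordered set of change names (hypothetical model)."""

    def __init__(self):
        self.changes = set()
        self.deps = {}  # change name -> set of change names it depends on

    def apply(self, name, depends_on=()):
        # A change can only be applied once its dependencies are present.
        missing = set(depends_on) - self.changes
        if missing:
            raise ValueError(f"{name} needs {sorted(missing)} first")
        self.changes.add(name)
        self.deps[name] = set(depends_on)

    def unrecord(self, name):
        # Removal is refused while another change still depends on this one.
        dependents = [c for c in self.changes if name in self.deps[c]]
        if dependents:
            raise ValueError(f"{sorted(dependents)} depend on {name}")
        self.changes.remove(name)
        del self.deps[name]

    def state(self):
        return frozenset(self.changes)


# Independent changes commute: both orders reach the same state.
x, y = Channel(), Channel()
x.apply("A")
x.apply("B")
y.apply("B")
y.apply("A")
same_state = x.state() == y.state()

# C depends on A, so B can be removed but A cannot.
x.apply("C", depends_on=["A"])
x.unrecord("B")
try:
    x.unrecord("A")
    removal_blocked = False
except ValueError:
    removal_blocked = True
```

In this model the state is just the set of changes, which is why order cannot matter; the real system has to do much more work to make that true for file contents and conflicts.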


If the order of changes does not matter and identifiers do not depend on previous changes, is there still a way to uniquely identify a specific set of changes? With git, I like to include the commit ref in release binaries which is great for debugging since I can just git checkout and see the exact form of the code running on the server. Is there something similar in pijul?


From reading the blog posts/source code, I can explain how it works.

Here’s a simple way to give a set of changes (a state) an id:

1. Take the hashes of each change

2. XOR them together

3. This is your state ID.

The problem is that because of the way you’ve made it (basically adding a bunch of vectors in a vector space over F2), one could use Gaussian elimination to construct any given ID without it being of the intended set of changes. Therefore pijul uses a process which works similarly but in a more cryptographically secure way which makes forging state IDs harder.
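
A minimal sketch of the naive XOR scheme from the three steps above, to make the order-independence concrete. The function names and the choice of SHA-256 are assumptions for illustration; this is the simple scheme the comment describes as forgeable, not what Pijul actually does.

```python
import hashlib
from functools import reduce


def change_hash(change: bytes) -> int:
    """Hash one change's contents to a 256-bit integer (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(change).digest(), "big")


def naive_state_id(changes) -> int:
    """XOR all change hashes together; the empty state has id 0."""
    return reduce(lambda acc, c: acc ^ change_hash(c), changes, 0)


a, b, c = b"change A", b"change B", b"change C"

# XOR is commutative and associative, so the order of changes is irrelevant.
order_independent = naive_state_id([a, b, c]) == naive_state_id([c, a, b])

# XOR is also its own inverse: counting a change twice cancels it out.
# This linear structure is what makes the naive id easy to manipulate.
cancels_out = naive_state_id([a, b, b]) == naive_state_id([a])
```

The second check hints at the weakness mentioned above: because ids combine linearly, an attacker with enough candidate hashes can search for a subset that XORs to a chosen target.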


I assume it's the same as with Darcs, where you would record a tag patch. The darcs --exact-version option prints that for a released version of darcs.


The one potential effect of this that I'd love over git is being able to check in local changes that I never intend to upstream. E.g. I could have a local change adding some editor config files, or modifying some compiler flags in the Makefile to suit my local config.

With git my typical solution for this is to basically keep rebasing them as uncommitted changes with `git stash`, until I inevitably lose them. Or keep a separate branch, and keep explicitly rebasing (and stay on my toes not to accidentally push the commits).


Ooh, yes, I have this problem with Git too. I maintain it mostly by keeping stuff constantly unstaged, but there’s a lot of git tooling that doesn’t respect my working area and dumps things in the wrong place. (And there’s a lot of people who don’t understand the point of staging at all…)


> and stay on my toes not to accidentally push the commits

You could just set the upstream of the branch to a remote that will always reject pushes, which depending on how you usually push should make it very hard to accidentally push something. Always worked well for me.


Why don’t you add those files to .gitignore?


That only works the way you'd naively expect if the files aren't already tracked. To be explicit: gitignore has no effect at all on already-tracked files. So you end up depending on the upstream correctly foreseeing and working around this problem, or you're sunk.

If they give you, say, a Makefile that's not correctly parameterized for you (doesn't let you adapt to your system via env variables or whatever), there aren't a lot of good options unless you fix that yourself and get upstream to accept it.


In the case of a makefile already tracked with parameters incompatible with your system, fixing it to depend on some sort of input seems like something you’d want committed upstream. The fact that that’s the best option practically seems good to me, as I think it’s the best option architecturally. I don’t think it’s great to have a file committed to an upstream repo that is too particular about the machine being used when shared among multiple people. It’s better to make it so you can share it with people on different machines, and put those changes in the repo so others aren’t repeating that same work on their local machines. If you have an unresponsive upstream repo maintainer, that’s a whole problem in and of itself. Seems better to fork it in that kind of a situation rather than have a local bandaid.

In the case of editor config files already tracked in the repo, I think it makes sense to remove those from being tracked, add to the gitignore, and push that upstream. Unless dealing with a team that agrees to use the same editor and wants to share settings, having editor settings committed to the repo seems like a mistake that should be corrected upstream. Same thing applies as in previous situation if the upstream maintainer is unresponsive.


With Darcs you might put some tag in the patch name which would be caught by a hook if it was accidentally pushed.


This is incredibly useful for a project like Nixpkgs.


Not having to worry about the order of patches (except where they explicitly depend on each other) is a really nice property to have. One consequence of this property, alluded to in the article, is that if I understand correctly, rebasing and merging effectively become the same operation. In Git, the difference between rebasing and merging is the resulting dependency graph of the commits. However, in Pijul, there is no dependency graph, at least not based on the order in which the commits are applied, so rebase and merge are equivalent.

For example, have you ever been doing code-review on a Github pull request and made an inline comment on a line of code, only to have that comment later invalidated by a rebase? That happens because the commit hashes were changed by the rebase, so the commit hash referenced by that comment is no longer contained in the pull request. There's probably a new commit with a new commit hash representing the same change, but there doesn't have to be. So Github can no longer be certain of where to place that comment, and it can't tell you whether that comment is still relevant to the pull request in its current state. (IIRC, Github continues to show such comments but marks them as associated with commits that are no longer in the branch, which makes it hard to know if they're still applicable.)

In Pijul, I'm pretty sure this problem simply doesn't exist. A hypothetical "Pijulhub" that implemented the same "comment on changes" feature could certainly have comments on old changes that subsequently get removed from the pull request, but if that happens, then "Pijulhub" can confidently declare that those comments are no longer relevant to the current state. On the other hand, if some version of the same change is still present in the current state of the pull request, then "Pijulhub" can confidently continue to include that comment, still associated with that same change, because even after modifying the pull request, that change still has the same hash, because it still represents the exact same change.
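
The difference between the two kinds of identifier can be sketched in a few lines. This is a deliberately simplified model: `commit_id`, `change_id`, and the comment dictionaries are hypothetical, a real Git commit hash covers the tree, parents, author, and message, and what a real Pijul change hash covers is not modeled here. The only point is that an id derived from history changes on rebase, while an id derived from the change alone does not.

```python
import hashlib


def sha(*parts: str) -> str:
    """Short hex digest of the given parts (toy stand-in for a real hash)."""
    return hashlib.sha256("\n".join(parts).encode()).hexdigest()[:12]


def commit_id(parent_id: str, diff: str) -> str:
    """Snapshot-style id: depends on the history behind the commit."""
    return sha(parent_id, diff)


def change_id(diff: str) -> str:
    """Patch-style id: depends only on the change itself."""
    return sha(diff)


diff = "fix off-by-one in parser"
old_base = commit_id("", "initial import")
new_base = commit_id(old_base, "unrelated upstream work")

# The same diff, before and after being rebased onto a newer base.
before_rebase = commit_id(old_base, diff)
after_rebase = commit_id(new_base, diff)

# Review comments keyed by each kind of id.
comments_by_commit = {before_rebase: "Should this be <= ?"}
comments_by_change = {change_id(diff): "Should this be <= ?"}

commit_comment_survives = after_rebase in comments_by_commit      # False
change_comment_survives = change_id(diff) in comments_by_change   # True
```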


About your guess on the Pijul hub: nest.pijul.com does something nicer than PRs, which is to attach changes independent from the context, to discussions.

See the following discussion, for example: https://nest.pijul.com/pijul/pijul/discussions/104

These two contributors (@cole-h and @lowenheim) are trying to add colours to the output of `pijul change`, and discuss changes, rewrite them, amend them, push them again. Certainly, some of these changes conflict with others. But that's totally ok, since you can pull them independently.

For example, when preparing release 1.0.0-alpha.9, I didn't pull the latest patch from that discussion into my main local channel, and I preferred to pull some of the older ones instead. I did this because the latest one doesn't detect whether the standard output is a TTY. If I apply them to the main channel on the Nest, my current local channel will be in sync with it, even though I've worked on ten other features since then.


Yeah, this is why I talked about a hypothetical "Pijulhub", instead of talking about any real website. The hypothetical website I'm comparing to would be one that replicates Github's features as closely as possible but uses Pijul as the VCS. But as you say, the real answer is that a hub for Pijul should be built around the data model and features of Pijul, not those of Git.


> A hypothetical "Pijulhub" that implemented the same "comment on changes"

A "pijulhub" already exists and it's called "Nest". There is still a lot of work to do but at least it has a name :-)


I'd especially be interested in examples regarding multiple channels / branches (one development and several production branches), and then backporting a fix into older versions.

In the conflict-free use case I expect to just notify Pijul that this fix now belongs not only to dev-master, but also to rel.22, rel.21 etc. - in contrast to switching to every release branch and cherry-picking it with git.

How is the workflow if this applies cleanly to rel.22, but not rel.21 anymore? What if the same conflict happens in rel.20, rel.19 and rel.18?

If I (in git speak) "--amend" the original fix, could I automatically include this in the release branches?

What if the conflict resolution changes every single line, but I still want to record that, "Yes, this fix is also in rel.17, even though it does not look like it".


> I'd especially be interested in examples regarding multiple channels / branches (one development and several production branches), and then backporting a fix into older versions.

I've been maintaining two versions of my SSH library for Rust, called Thrussh (https://nest.pijul.com/pijul/thrussh).

For some context, the Rust asynchronous ecosystem is undergoing a change, with a fundamental library called Tokio going from version 0.2 to 0.3. Some crates have made the change, others haven't. So if you're like me, and some of your code implements both an SSH server and an HTTP server on the same event loop, you might want to stay with Tokio 0.2 (since Hyper, the main HTTP library, hasn't published the move yet), while allowing others to benefit from all the bugfixes you find, and start transitioning now.

The cool thing about doing this in Pijul is, you can work on your Tokio-0.2 version, and push your changes directly to your Tokio-0.3 version without having to care about cherry-picks, merges and rebases explicitly. Do this locally, test, publish.

> How is the workflow if this applies cleanly to rel.22, but not rel.21 anymore? What if the same conflict happens in rel.20, rel.19 and rel.18?

Good question. In Pijul, conflicts occur between changes (not between commits or states), and are solved by changes. This is in contrast with e.g. `git rerere`, since the solution to a conflict depends only on the changes that introduced it, not on anything else. This means that once you've solved the conflict in rel.21, you can push it to rel.20, rel.19 (it's the exact same change, with the same hash).

> What if the conflict resolution changes every single line, but I still want to record that, "Yes, this fix is also in rel.17, even though it does not look like it".

Excellent question. Two answers:

- If you're really doing this, then your intention cannot really be guessed from your change, which sort of defeats the point of using changes, and you will get another conflict. If you fix that conflict again though, that fix will apply to any "branch" (Pijul users call these "channels" because they are not exactly the same as Git branches) that has the two changes.

- There is ongoing research to extend the theory to allow large restructuring of code to commute with edits to that code. Note that no other system handles that well: Git and Mercurial don't even consider commutation (but their merge heuristics handle refactoring correctly sometimes), Darcs doesn't do it at all. Since our goal is to get something that works 100% of the time, we have to be a bit more careful.


Thank you for the explanations!

> In Pijul, conflicts occur between changes (not between commits or states), and are solved by changes.

So some channels contains the changeset (Fix, ..), and if adding that change to a channel creates a conflict, the resolution results in the set (pre_Fix, Fix, ..) on that channel. Then adding "Fix" to another channel will automatically pull in the "dependent change" "pre_Foo" as well, if applicable? Or is a separate "new_Fix" created?

Can a link between two changes be created manually, even though the delta does not require it? E.g. to attach an empty commit, if these are even possible in Pijul (compare git's "--allow-empty"). Or, more reasonably, to record logical dependencies which do not show up in the code, such as separate changes for testing vs. the code itself.


> Then adding "Fix" to another channel will automatically pull in the "dependent change" "pre_Foo" as well, if applicable? Or is a separate "new_Fix" created?

I'm not 100% certain of which is which in your sentence, but if Fix and Fox are in conflict (maybe Fix is Alice's fix, and Fox is Bob's), and they are both pushed to the same channel, we need to solve the conflict. We do that by creating a change, called "Resol".

Now, if you push Fix to another channel (that doesn't have Fox), that doesn't necessarily pull in Fox and Resol. If you push Fix and Fox, you'll get the exact same conflict. But instead of solving it again, you can simply push Resol, and the conflict will be solved.
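
The Fix / Fox / Resol scenario can be written down as a toy model. Everything here is hypothetical: channels are plain Python sets of change names and `conflicts` hard-codes the one conflict from the example. It is a sketch of the behaviour described above, not of how Pijul detects conflicts.

```python
def conflicts(channel: set) -> bool:
    """Toy rule: Fix and Fox conflict unless the resolution change is present."""
    return {"Fix", "Fox"} <= channel and "Resol" not in channel


# First channel: both fixes arrive, the conflict is solved once by recording Resol.
main = {"Fix", "Fox"}
main_conflicted = conflicts(main)
main.add("Resol")
main_resolved = not conflicts(main)

# Second channel: Fix alone does not pull in Fox or Resol.
other = {"Fix"}
other_clean = not conflicts(other)

# Pushing Fox recreates the exact same conflict...
other.add("Fox")
other_conflicted = conflicts(other)

# ...and pushing the already-recorded Resol solves it without redoing the work.
other.add("Resol")
other_resolved = not conflicts(other)
```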

> Can a link between two changes be created manually, even though the delta does not require it? E.g. to attach an empty commit, if these are even possible in Pijul (compare git's "--allow-empty"). Or, more reasonably, to record logical dependencies which do not show up in the code, such as separate changes for testing vs. the code itself.

Absolutely, the syntax for that is, when presented with a change draft when running `pijul record`, add a line of the following form:

[] BQDE4VH6OZHULHAOP37GSBJMCI5QIFSOU6COBYKPAY6FPH3IMISAC

To the dependencies section (that is, empty square brackets, a space, and the hash of the change you want to add to the dependencies).

We can even imagine adding dependencies automatically with external tools via record-time hooks, taking the specific programming language into account.


(right, I switched between "Foo" and "Fix" in the comment, sorry)

> instead of solving it again, you can simply push Resol, and the conflict will be solved.

Oh noes, so no obscure "pijul rerere" to learn!? I mean, very nice!

> pijul record

I see, that's what that field is for - I tried it briefly when pijul had the interim name and wondered why it was listed and editable in the record editor.


Just reading the "abstract", this sounds way more complicated than anyone needs a VCS to be. I also question the "mathematically sound" bit - the "commutative changes/diff" thing sounds really far fetched.

Also, the fact it is written in rust means nothing to me. The whole "but rust doesn't segfault!" thing is, and has always been, a ridiculous reason why rust is somehow inherently better suited for such projects. The article keeps leaning on that fact as though it makes the project superior somehow, but it just feels like grandstanding.

The architecture section is just buzzwords. It doesn't explain the technical aspects, it doesn't allow me to understand the system, and then I'm immediately dropped into "how to use the CLI". But wait, you haven't sold me yet.

This was painful to read.


I don't read it like that at all. Rust is mentioned only once. Nevertheless I do care if a tool I am considering using is written in JavaScript, Rust or some esoteric language. It means a lot, not only for "safety", which is not that important to you, but also for maintainability, community, and the future of the project. Of course, what probably triggers you is the mention of Rust in the title of the post. Still, it's hard to imagine being so annoyed by it: if the authors think it's important or that it will grab attention, why not use it...


We might not have read the same article.

> Also, the fact it is written in rust means nothing to me.

I'm one of the authors of Pijul, and the author of that post asked me a few questions when preparing that post, and that is addressed there: https://initialcommit.com/blog/pijul-creator (see "Why did you choose to write Pijul in Rust?").

> It doesn't explain the technical aspects, it doesn't allow me to understand the system

You can have plenty of that there: https://pijul.org/manual/theory.html.

> This was painful to read.

I strongly disagree. Explaining the goals of this project isn't always easy, because both large complex projects and complete beginners can benefit from using a rigorous mathematical modeling. I guess this blog post leans more towards the "beginners" side, whereas the "Theory" page linked above would be more appealing to power users. Both are useful and important, they just appeal to different people.


The Rust part I think is meaningful. It means we can (with reasonable probability) expect the software to be fast and lightweight in terms of memory usage, and it means it’s probably a lot easier for new contributors to jump in than an equivalent C or C++ codebase. That last bit is about a lot more than memory safety. Rust is basically designed from the ground up to make it relatively easy to onboard a new engineer safely onto a project in comparison to legacy systems languages.


And abandoned quite quickly, like most Rust projects.


> And abandoned quite quickly, like most projects.

Amended your statement slightly. If anything, there's a fairly high incidence of successful projects coming out of the Rust world.


Honestly as long as it's not written in JavaScript (I like not having npm and node on my machine) or C (not abstract enough) I'm happy to use it. For small tools at least.


> Just reading the "abstract", this sounds way more complicated than anyone needs a VCS to be. I also question the "mathematically sound" bit - the "commutative changes/diff" thing sounds really far fetched.

Care to say anything more quantitative and less feel-y? Because you're refuting pretty much the entire premise of pijul (which clearly exists, works, and has been somewhat rigorously designed) with nothing but adjectives. So having a concrete counterpoint would be helpful because pijul is just sitting there with very concrete counterpoints to your entire comment :)


Yes, but it is written in Rust.


Is there any hope in Pijul of _any_ kind of interoperability with git at all?

Back when there used to be some projects still stuck on svn that I wanted to work on, I was able to use git locally and just kind of "publish" via svn when I was done.

Would anything like that be possible? That'd be the killer thing for me to be able to give Pijul a real try.


Yes! The `pijul git` command allows you to import Git repositories by replaying their history. It also works incrementally, meaning that if you already imported a repository, you'll be able to continue with the new commits.

This uses a somewhat naive way of doing things, and can take a very long time on large repositories. One other way of doing it would be to add an "initial change" explicitly saying "I'm coming from Git commit number #SHA1HASH".


Thanks! I'll definitely give that a try, just looking at the docs real quick it sounds pretty doable.

I can think of at least twice in the last month that I _think_ this would have saved me some effort, be interesting to give it a go. Cheers!


For what it's worth, you can do that with Darcs import/export. I've kept parallel git and hg repos synced that way with a cron job; it could be done with a hook, but that slowed down pushes too much in my case. (Doing that basically assumes one branch, i.e. linear development in Darcs.)


This.


With this latest release does Pijul now have an extensive test harness to prevent the corruption & data loss in repositories that people have been experiencing for years? I know pre-1.0 versions had low visibility warnings that you shouldn't actually use Pijul for any code you cared about, but I'd assume that is no longer true for a 1.0 version. I've been very interested as an on-again off-again Darcs user, but when people were reporting that it destroyed their data it was a kind of scary step to take.


> low visibility warnings

The versions were called 0.x, each blog post said "this is experimental", and there was even a blinking line on the front page of nest.pijul.com. Maybe we should have printed a notice on the command line tool itself ;-)

More seriously, one major issue with the 0.x versions was that bad performance meant we couldn't really test it massively. But now we can, and actually one of my first tests was to try and import the history of Nixpkgs, and run massive checks for data loss after every operation. It works now.


As scalability was one of the design goals, are there any comparisons of performance with huge repos?


I asked before, but didn't get an answer. Darcs is criticized for lack of soundness and potential exponential merge complexity -- which I'm not sure I've actually run into -- but that's Darcs2. I haven't actually followed along, but I thought the development of Darcs3 was an answer. This may not be the place to ask, but can anyone compare the new Pijul and Darcs3?


Shall we talk about the name? Or does nobody care about the impact of subconscious bias in adoption of dev tools and I should shut up? :-)


What about it? Searching for "pijul -vcs" shows images of birds, which is reasonable?


1- I’m not sure how to pronounce it (maybe it’s just me); ‘j’ has a lot of depth consonant-wise (https://en.m.wikipedia.org/wiki/Voiced_palatal_approximant) 2- what’s the story behind it (I’m a strong believer in the importance of symbols and what they attach to) 3- what is it supposed to make me think of? A pigeon? So it’s everywhere and defecates randomly on people?

I must also postface: I understand it’s a Spanish word meant to evoke a bird (what for?) and maybe I’m the only one that believes that a simple evocative name changes anything, but hey, here you are in that thread :-p


For 1, the name of the bird is Spanish so you could pronounce the j as https://en.m.wikipedia.org/wiki/Voiceless_velar_fricative

Alternatively you could just read it without that knowledge and pronounce it however you like.

As far as I’m aware it follows a theme of other bird names for other parts of the project.

It seems fine compared to eg mercurial (if you spend a lot of time with it you go mad?) or git (the word you mutter under your breath as you use it?)


Even if we disregard the riddle of its meaning, I have no idea how to pronounce the name.

Pee? Pye? Then, a "jay" sound, or a Spanish "junta", or a French "Jules"? And then how is the "u" pronounced?

I'm confused. Pidgel?


Yea I was searching to see if anyone else would mention this, especially seeing the uproar about GIMP, etc. I don’t care about the naming dispute stuff, but this is comically close to pee-hole.


what's wrong with the name?


"One of Pijul's goals is to minimize the number of commands" Shut up and take my money.


This write-up is very kind to Git (vis-a-vis one of its pros):

> Intuitive method and interface for version tracking



Does this allow for semantic, non-text diffing? This would allow for code formats without syntax errors (i.e. storing the valid AST in a structured binary form, separate from the editor representation).

Diffs would then be truly semantic, representing semantic refactorings (e.g. "extract method to function", "rename symbol") at the patch level. You could then easily query "when did this variable show up" even post-refactor where the variable moves modules. Sure this has its limits but it'd push the tooling to levels virtually impossible to automate with text-based code now.

A man can dream!


It does allow that, to some extent: by changing the diff algorithm. One thing you can do in a diff algorithm for Pijul (not written yet, but totally possible) is to treat whitespace as its own binary blocks, so that reformatting commutes with other changes.


By that logic, so does Git (disregarding fundamental Git vs Pijul differences).


Darcs has "replace" for that, the only other patch type that's been implemented for it as far as I know. I think Toolpack provided that for Fortran in the 1980s, but I don't remember for sure whether its version control was AST-based latterly (and, of course, it wasn't networked, let alone distributed).



