
Pijul version control system - luu
https://pijul.org/
======
zaphar
Looks like a faster darcs. For anyone who has never used darcs a brief
description is in order. Darcs is a revision control system based on something
called "Patch Theory". That gives it some interesting properties. But for the
user this translates into possibly the clearest, easiest-to-use interface
of _any_ DVCS out there, bar none. Its merges are the least painful of them
all in my experience.

Its glaring flaw, though, is that it has historically been really slow as the
repository grows in size. They've improved this since but it still can't
compete with git and hg on speed. If it were faster though I'd use it in a
heartbeat. Pijul looks to be a refinement of the darcs algorithms and thus
much faster as a result. I'll definitely be watching them as they progress.

~~~
maaku
You didn't really say what "patch theory" is. Can you describe or provide a
link?

~~~
zaphar
[https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theo...](https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theory)

------
skywhopper
Sounds like it aims to be a (much) faster darcs. But between the name and the
fact that they are trying to maintain two separate codebases (OCaml and Scala)
that both do the same thing, I'm not super optimistic about its prospects.

~~~
gecko
That should make you _more_ optimistic! They want to make sure that the data
format is well-specified from the very beginning, ensuring that there can be a
third-party ecosystem that doesn't have the issue Git has right now where the
only real definitive definition of Git's data formats is Git, yet people
expect libgit2, JGit, etc. to work.

------
anton_gogolev
> ...snapshot-based systems can be extremely fast, but require an in-depth
> knowledge of their internals

This is false, as demonstrated by Mercurial.

~~~
Grue3
I dunno, Mercurial Queues are pure black magic that only the most advanced
dark wizards can possibly understand.

~~~
nsm
Please don't use mercurial queues. histedit, rebase and the recent changeset
evolution extension allow you to do everything while having first class
history and merging support. Also far easier to understand than git. Useful
links (first 2 slightly Mozilla specific):

[http://ahal.ca/blog/2014/new-mercurial-workflow/](http://ahal.ca/blog/2014/new-mercurial-workflow/)
[http://ahal.ca/blog/2015/new-mercurial-workflow-part-2/](http://ahal.ca/blog/2015/new-mercurial-workflow-part-2/)
[http://evolution.experimentalworks.net/doc/](http://evolution.experimentalworks.net/doc/)

------
jpgvm
Couldn't easily work out how it compares to git/svn/mercurial/darcs/bazaar.

A comparison to any one of those would do wonders for the landing page.

~~~
gecko
It presents the same conceptual UI as Darcs (it's patch-based, not snapshot-
based), but provides the performance of Git and Mercurial. It does that by
changing the internal storage of how patches work compared to Darcs. If you
know Darcs, that's your entire sales pitch: Darcs, but now performant.

For everyone else: In Git and Mercurial (and Bazaar, Monotone, Subversion,
tla, Fossil, TFS2, and others I'm forgetting), the state is always represented
conceptually as a snapshot of the current revision and a pointer to the
snapshot of the previous revision or revisions, so if you identify that you're at
"snapshot 590", and I'm also at "snapshot 590", we're both at the same thing,
and both have exactly the same history of how we got there.

In a patch-based system, like Pijul or Darcs, that's not true. The state is
instead represented by _the set of applied patches_. These usually have an
implicit ordering, but you and I can theoretically have the same patches
applied and have a different ordering. (We also can _briefly_ have different
resolutions to merge conflicts that result from that, and Darcs' historical
inability to deal efficiently with you and me resolving conflicts differently
is a major reason it doesn't scale well.)
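A toy sketch of that distinction (the names and hashing here are mine for illustration, not Pijul's or Darcs' actual data model): a patch-based state is just the set of applied patches, so identity is order-independent, whereas a snapshot-based identity chains to its parent and therefore bakes in history.

```python
import hashlib

def snapshot_id(parent_id: str, content: str) -> str:
    # Snapshot model: the parent's ID is hashed into the child's ID,
    # so the same changes applied in a different order give different IDs.
    return hashlib.sha1((parent_id + content).encode()).hexdigest()

# Patch model: state is the *set* of applied patches; ordering is implicit,
# so two repositories holding the same patches are in "the same" state.
state_a = frozenset({"fix-bug", "add-feature"})
state_b = frozenset({"add-feature", "fix-bug"})
assert state_a == state_b

# Snapshot model: the same two changes applied in opposite orders
# produce distinct, incompatible identities.
id_a = snapshot_id(snapshot_id("root", "fix-bug"), "add-feature")
id_b = snapshot_id(snapshot_id("root", "add-feature"), "fix-bug")
assert id_a != id_b
```

This is why, in the patch model, you and I can hold the same patches in different orders and still agree that we're at the same state.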

That sounds weird, but can have major benefits. For example, I can now sanely
cherry-pick a bug fix from you regardless of the history of that bugfix--and,
unlike in Git and Mercurial, Darcs/Pijul will know that the patch I cherry-
picked from you is _the exact same one_ that you have on your branch. No
duplication, Cherry-Pick: messages, or merge collisions will result if I later
pull in your whole branch (or vice-versa). You can also do things like have a
patch that has all of your security settings, but always simply not push that
when you push out to a public server. (N.B., giving this as an example, not
best-practice.)

If Pijul works, and can provide Darcs' model in an efficient way, it'd be
really awesome. I'm not sure whether it'd be better than Mercurial or Git's
model at scale, but it's been really hard to answer that question historically
when Darcs itself didn't scale well. Pijul could change that.

~~~
jerf
"That sounds weird, but can have major benefits."

I give git training at work with some frequency. I now briefly discuss this
sort of patch-based workflow, explicitly so I can tell people it's not how git
works, because in my experience, it's what most people naively expect out of a
version control system. Same problem when people try to work out workflows for
git. I hypothesize that the reason for the profusion of git workflows is
precisely that we _really_ want a patch-based system. Or, in other words, it's
not really weird; snapshot-based systems may be weird.

It never comes up in SVN or similarly klunky systems, because they're too
incidentally complex to notice the underlying essential mismatch. Git and
friends, to their credit, made the incidental complexity go away, but I think
the essential complexity of their approach is still higher than it ought to
be. I wish pijul all the best, because there _is_ room for improvement here.

And may I suggest to the pijul team that they really, really want a Pijul-hub
as soon as they think they're even remotely ready.

~~~
gecko

        I give git training at work with some frequency. I now briefly discuss this 
        sort of patch-based workflow, explicitly so I can tell people it's not how
        git works, because in my experience, it's what most people naively expect
        out of a version control system. Same problem when people try to work out 
        workflows for git. I hypothesize that the reason for the profusion of git 
        workflows is precisely that we really want a patch-based system. Or, in
        other words, it's not really weird; snapshot-based systems may be weird.
    

I actually agree. I advocated for DVCS-based workflows for a long time, but I
think it speaks volumes that I "got" Darcs within about 20 minutes of first
seeing it and playing with it, but I know for a fact that, somewhere in the
Freenode archives, you can find me saying "The frak is Mercurial? The frak is
Git? The frak is this?" as I tried to grok what on Earth they were doing--and
this after having taught myself tla!

That said, just because _something is intuitive_ doesn't necessarily mean that
_something is best engineering discipline_. I want a Darcs-like workflow to be
the dominant one specifically because it's intuitive, but I'm very open to the
fact that it might be a really, _really_ crappy way to build a sane (forget
performant) large code base.

Or it may be exactly what we've always wanted.

The simple fact is that I have no idea. This industry, for all the "science"
in CS, has got to be the least results-based discipline I'm personally aware
of. Pijul will at least permit anecdotal tests of how well patch-based systems
scale _as a workflow_ if it takes off, but I'm not holding my breath on
someone doing a genuine A/B productivity/trade-off study.

------
dchest
SHA-1 and 32-bit time for a new project in 2015?

~~~
nsm
I understand the 32-bit time not being a good idea, but why is SHA-1 bad for a
non-cryptographic application?

~~~
dchest
Because a hash function in version control system is a cryptographic
application — it's important to be able to verify the integrity of repository.

~~~
crpatino
I think the GP question is not trivial.

The usual rationale for SHA-1 being obsolete assumes it is used for password
protection. The back end stores Hash(password+salt) and uses it to grant
access. Finding _any_ password_prime that collides with password on
Hash(p+salt) will break the security of that scheme.

In this case, instead, the problem from the point of view of the attacker is
to find repository_prime such that...

1\. It is a syntactically correct program.

2\. It is syntactically and semantically close enough to repository_orig that
it will not be discovered by simple inspection.

3\. The delta between repository_prime and repository_orig causes a useful side
effect in the program's behavior (from the point of view of the attacker).

4\. The delta between repository_prime and repository_orig does not introduce
other significant and unintended side effects that would trigger investigation
beyond simple inspection by a legitimate maintainer.

I would say that this is a much higher bar to cross than merely "find a
collision". And if you are concerned that SHA-1 is not enough to protect
against such an attack, you probably have to consider that SHA-2 may not be
enough either... You would probably need to use half a dozen cryptographically
strong hash functions, preferably based on different principles, to ensure
that no possible delta can simultaneously fulfill all four points above for
every hash function.

And if you have reached that level of paranoia, you need to consider that the
adversary will simply not bother and will choose a different attack vector,
like exploiting the bureaucracy of your commit process...

~~~
dchest
First of all, this has nothing to do with passwords.

Indeed, this may be a higher bar to cross, but higher bars are usually crossed
eventually. For years we knew that RC4 had biases, but it was considered okay
until we discovered that it was not okay after all. MD5 collisions led to
rogue certificates.

It's also not correct that SHA-2 would not be enough to fix the issue: while
they share some design structure, SHA-2 is not broken, while SHA-1 is. And the
proposal for multiple hash functions isn't particularly good: the collision
resistance of many cascaded hash functions is not much better than the maximal
resistance of any one hash function used in the cascade.

It doesn't make sense to use a broken cryptographic hash function in a new
project that does need cryptography. There are faster modern cryptographic
hashes than SHA-1. And it doesn't make sense to use a broken cryptographic
hash function in a new project that doesn't need cryptography. There are
faster modern non-cryptographic hashes than SHA-1.
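To make that concrete, here is a quick comparison using hashes that ship in Python's standard `hashlib` (the choice of BLAKE2 as the "faster modern" example is mine, not the commenter's):

```python
import hashlib

data = b"some repository object"

# SHA-1: fixed 160-bit digest; collision attacks well below the generic
# bound are known, so it is considered broken for adversarial use.
print(hashlib.sha1(data).hexdigest())

# BLAKE2b: a modern cryptographic hash, generally faster than SHA-1 in
# software, with a configurable digest size of up to 64 bytes.
print(hashlib.blake2b(data, digest_size=32).hexdigest())
```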

Frankly, I don't know why we're even having this discussion: cryptographers
say that SHA-1 shouldn't be used, so for exactly what reason are you defending
this choice?

~~~
philsnow
> SHA-2 is not broken, while SHA-1 is

this kind of bare statement benefits greatly from a citation

> collision resistance of many cascaded hash functions is not much better than
> the maximal resistance of one hash function used in the cascade

I don't think GP is suggesting including results of hashes in the contents to
be hashed by subsequent hash functions, but rather calculating MD5(contents),
SHA1(contents), SHA2(contents), FOO4(contents) etc, and then concatenating all
the hashes together. Or am I misunderstanding and this kind of scheme is
exactly the "cascading" that you're talking about ?
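For concreteness, the concatenation scheme being described looks like the sketch below (illustrative only). Joux-style multicollision attacks are what make the combined digest only about as collision-resistant as its strongest single component:

```python
import hashlib

def concat_hash(data: bytes) -> str:
    # The "hash with several functions and concatenate" construction:
    # C(x) = MD5(x) || SHA1(x) || SHA256(x), as hex strings.
    return (hashlib.md5(data).hexdigest()
            + hashlib.sha1(data).hexdigest()
            + hashlib.sha256(data).hexdigest())

# The combined digest is long (32 + 40 + 64 = 136 hex characters), but
# multicollision attacks show its collision resistance is roughly that
# of the strongest single component, not the sum of all of them.
print(len(concat_hash(b"example")))  # prints 136
```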

~~~
dchest
> this kind of bare statement benefits greatly from a citation

Seriously?

> I don't think GP is suggesting including results of hashes in the contents
> to be hashed by subsequent hash functions, but rather calculating
> MD5(contents), SHA1(contents), SHA2(contents), FOO4(contents) etc, and then
> concatenating all the hashes together. Or am I misunderstanding and this
> kind of scheme is exactly the "cascading" that you're talking about ?

Yes, this is cascading. See this paper:
[https://www.iacr.org/archive/crypto2004/31520306/multicollis...](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf)

------
andersonmvd
Is git that broken? To focus on replacing a VCS, you should have a big reason
for that, I think. I'm just curious.

~~~
krupan
git is an example of an amazing and powerful new VCS paradigm (distributed),
but I sure hope we are not done making better and better tools, like, ever.

~~~
andersonmvd
I also hope we are not done making better and better tools, but to make
something better, you need to find some gap in current technology and fix it.
That's what I'm asking: which gaps does this VCS fix? Or in simple terms, why
should I switch?

~~~
WorldMaker
Git is a representation of history as a directed acyclic graph (DAG) of
commits full of trees that are the state of the repository at the time of the
commit. That DAG is strictly ordered (commits point to their parents and this
is part of their identity so changing a commit parent makes a new incompatible
commit).
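A simplified stand-in for that identity rule (not Git's real object format, which hashes a structured commit object): because the parent IDs are part of what gets hashed, re-parenting a change necessarily mints a new commit.

```python
import hashlib

def commit_id(parents: tuple, tree: str, message: str) -> str:
    # Simplified: hash the parent IDs together with the commit's content.
    payload = "|".join(parents) + "|" + tree + "|" + message
    return hashlib.sha1(payload.encode()).hexdigest()

root = commit_id((), "tree-0", "initial commit")
a = commit_id((root,), "tree-1", "unrelated work")

# The "same" fix recorded on top of different parents gets different IDs,
# which is why git cherry-picks are new commits rather than shared patches.
fix_on_root = commit_id((root,), "tree-2", "fix bug")
fix_on_a = commit_id((a,), "tree-2", "fix bug")
assert fix_on_root != fix_on_a
```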

Darcs (which predates git, by the way) represents a repository as a loosely
ordered set of patches (just the changes, not the full state at any given
time). Darcs works to know how to rearrange patches in time to create new
states. The difference/benefit to this approach is best illustrated with
cherry-picking: in Darcs it is very easy to state that I want the patches my
fellow developer created for modules X and Y, but I don't think he's done with
module Z yet so I'm going to skip those. The patches can be interleaved and
recorded in any order and Darcs will do all the hard work of figuring out how
to grab just the ones I want, and do so without changing the physical identity
of those patches (they are still the same patches my coworker developed). When
this works, and it works 98% of the time without an issue, it is a wonderful
magic. (Contrast this to git cherry-pick, which builds new commits and is
likely to have future merge issues with the commits it was built from, even
though those commits are somewhat "the same".)
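That cherry-picking scenario can be sketched as patch commutation (a toy model with hypothetical file names; real patch theory also tracks dependencies, offsets, and conflicts):

```python
def apply_patches(state: dict, patches) -> dict:
    # Apply each patch (here just "replace this file's contents") in order.
    out = dict(state)
    for path, new_content in patches:
        out[path] = new_content
    return out

base = {"module_x.py": "old", "module_y.py": "old", "module_z.py": "old"}
patch_x = ("module_x.py", "coworker's new X code")
patch_y = ("module_y.py", "coworker's new Y code")
patch_z = ("module_z.py", "half-finished Z code")

# Pull X and Y but skip the unfinished Z. Because X and Y touch
# independent files, they commute: either order gives the same state.
assert (apply_patches(base, [patch_x, patch_y])
        == apply_patches(base, [patch_y, patch_x]))
assert apply_patches(base, [patch_x, patch_y])["module_z.py"] == "old"
```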

Pijul is an attempt at getting some of the same magic of a patch-oriented
approach (and smarter patch merging) in a post-git world.

Also, it's definitely early to talk about switching to Pijul.

------
gosukiwi
"Pijul" sounds like "dick" in Spanish lunfardo :p

------
ebbv
This seems like a classic case of novice developers thinking it's a great idea
to re-invent the wheel. The page gives no compelling reason why I'd want to
adopt this over the much more mature solutions out there.

I say they are novice because they are maintaining two parallel codebases,
which is a mistake no experienced developer would make.

~~~
gecko
I disagree. Having maintained a massive system that had to go deep into Git's
internal data structures, I will tell you that, regardless of what Git's docs
might tell you, the actual documentation for most of Git's file formats is,
"Whatever the Git commands actually output." This is part of why a perfectly
legal Git repository (as far as Git itself is concerned) can sometimes crash
libgit2 and JGit.

By the Pijul team forcing themselves to have two different implementations in
lockstep, they will probably do a very good job avoiding that problem.

~~~
ebbv
That is the most ass backwards reasoning I've ever heard.

In order to force yourself to stick to a reliable standard repository format,
maintain two codebases at once.

What?

How about just publishing a standard and abiding by it?

