[Edit] Interestingly, they cover that. I suppose if you're just running Pijul rather than integrating with its code, it might be safe to use in a corporate environment. Still, it's likely to be offputting.
I feel that AGPL for a product is misunderstood and pre-rejected without justification by far too many. Now, if you are developing a library, then AGPL sucks as a license for it, but for a cohesive product? Seems acceptable to me (not sure if it's my favorite choice)
Isn't that precisely the parent comment's point?
Perhaps it shouldn't be this way, but the argument is that the AGPL will make pijul "untouchable" by businesses, de facto.
Anyway I don't really care anymore. We act based on what we believe not on facts (this is not directed to any parent comments, to be clear). Whatever
You're saying AGPL shouldn't be offputting. He's saying that by some quirk of circumstance, it is.
In other words, his argument is that picking AGPL was a poor strategic decision, given its (admittedly undeserved) baggage.
This sounds like taking some of Pijul's source code and putting it inside another project. That would certainly have business implications, which justifies ninjas. It's possible to use AGPL as a business strategy (I worked at a company whose main product used CPAL[1] which has a similar network-use clause); but such decisions should not be made via VCS commit.
Of course, if the choice of AGPL prevents a business from reusing Pijul's code then presumably that's why they chose AGPL. That's kind of the point of copyleft.
I imagine very few businesses would care about Pijul's source code though. If Pijul matures into a compelling tool, then the relevant phrase would be "if you used an AGPL command"; no need for ninjas there, unless (as others note) you're building a PijulHub or something.
If you say "we don't plan to patch the code, just use the binaries as-is" then you're asserting that the software as it stands today will meet your needs forever into the future. That's a terribly foolish bet.
A business founded on a principle of radical openness might be compatible with AGPL. But any business that wants to have some internal software (HR, etc) is well advised to stay the hell away from AGPL.
Yes, but this has nothing in particular to do with the AGPL and everything to do with copyright laws. There is no licence except public domain/CC-0 that allows BigCorp to incorporate other people's code without any obligations whatsoever (attribution, at least). Copyright laws forbid such incorporation by default.
I agree; one of the many reasons I'm against proprietary software is that it forces users into this helpless situation.
I don't quite get the "internal barriers" idea; is it common for companies to mix together code from multiple projects, including ones they don't own, such that it's difficult to disentangle them? Regarding your example, why would copy/pasting code from an internal code search be treated any differently from, say, searchcode.com? A modicum of diligence is always required regarding ownership, licensing, appropriateness, trust, etc.
Basically, ghostscript says you cannot distribute AGPL code with your commercial code, even though they are not linked together, not even on the same media.
Yes, you can, but if you add something around it, like issue tracking or user/access management, then you need to publish that under the AGPL as well.
It means it will never be part of something like github, bitbucket or AWS CodeCommit.
And then there are plugins to CIs (checkout from repository ...) - would those be affected?
Mission accomplished?
It definitely can be part of something like Gitlab (or github, bitbucket or AWS CodeCommit); you just have to model your business around the fact that the software is available to everybody. You know, like wordpress (sure, they are GPL so they can have some proprietary components but the practical implication is that you CAN create your own, for pay, wordpress.com service).
Plugins can be a different story. Does the software have a REST interface? Then use that, no license virality. If it doesn't, you release them under the AGPL.
If you are just calling the service through its API (either CLI or through the programming language interface), then you don't need to distribute anything.
This just protects against people taking GPL-licensed open source code, modifying it for themselves, using it in backend services, and then never distributing their modifications.
That's contestable afaik.
> through the programming language interface
i.e. dynamically linking which is usually understood to be prohibited unless linking code has a compatible license.
> either CLI
Here's the problem - you can take any GPL library, make small CLI or REST adapter for it and license it GPL as well, then use that adapter from your proprietary application - is that still allowed? Because if it is then GPL can't ever be enforced and if it isn't you can't call CLI APIs even if original libraries themselves provide it.
However, I think if you just installed Pijul on a server and then called it through your operating system interface, then you _might_ be fine. You might need to make it so the interface to Pijul is generic and could swap out with other VC systems.
I still might also be wrong about this. I'm not a lawyer and the comments in [0] are on both sides of the argument.
EDIT: edited to express less certainty over my interpretation of the license
Might, exactly. That depends on various courts accepting that there is a loophole in the GPL, and my layman's understanding is that there isn't. Skimming GPLv2, I don't see it differentiating, among derivative works, between those that use compile-time linking and those that use mechanisms like a CLI. I'm wary of bringing in after-the-fact constructs to justify something that the GPL doesn't talk about. And after all, how is a CLI that much different from dynamic linking? A CLI is merely an interface that is subjectively a bit friendlier in certain situations, but this imo shouldn't have a bearing in legal discussions.
afaik this means any kind of linking (static or dynamic), using a jar or using a npm package or similar..
>through the programming language interface
which implies some version of the above
It's easy to imagine wanting to add Pijul support to Tower, SourceTree, Gerrit, Phabricator, etc. - projects that all dwarf Pijul. But these projects will be unable or unwilling to risk doing so, because of the AGPL.
Now that we've started to use Pijul for the website (about two weeks ago), we want to change the license.
I'm slightly annoyed by all the political statements in GPL3, AGPL3 and several versions of LGPL, such as "you cannot use this on impure devices", or "you cannot use this on platforms with poor support for shared libraries such as windows or mac os" (by which I mean that these platforms don't have real package managers to handle dependencies, and the end user has to either (1) install DLLs manually or (2) install "unshared shared libraries", i.e. one full instance of the library per program using it, which OSX calls an "app").
I am not a lawyer, but GPL2 seems to be free of these. We're not likely to pick anything much more permissive for now. Also, the Pijul and Darcs teams agree that we don't particularly enjoy discussions about licenses, especially when they're not based on factual arguments. Here are answers we've already given:
- If you think that "a new anarchistic jurisdiction not recognizing any copyright law will soon emerge, hence Pijul should be in the public domain", we don't agree. The movie Dunyayi Kurtaran Adam, also known as Turkish Star Wars, is available full-length on youtube, to remind everyone that copyright laws may be broken, but not totally useless. Therefore, your dream country may not "emerge" that soon.
- Or maybe you think "I'd like to make a living from selling a small wiki based on Pijul, leveraging not only your research ideas, but also the database backend you spent 6 months full-time writing, as well as your SSH library. Why would you not allow me to do that?". The answer is: because if your wiki is useful, we want to use it too, without having to pay! What sense of fairness is this?
The thing they worry about is deciding what falls under AGPL. With GPL it is easy: if an executable leaves the building, source goes with it. With AGPL, when almost everything is web connected, it can be hard to draw the line on what you have to open source (or just tell people you are using.. do I have to add every AGPL program in an Ubuntu server install, just in case one is getting used by something else?)
Whether rightly or not, https://www.theregister.co.uk/2011/03/31/google_on_open_sour... and other writings on the topic had a big impact on enterprise adoption of AGPL software.
There's also the fact that most prominent AGPL licensing tends to be around commercially backed software where the main backer owns all the source and is therefore in position to not hold themselves to the same standard of sharing changes as their customers and third parties must do.
> But maybe we’ve missed something, and the AGPL actually prevents some use of Pijul that we’ve not thought of, and that does not aim at centralizing the internet. If this is the case, please discuss your idea with us on the mailing list.
Second, I wonder how many companies really had to change the source code of git, subversion, mercurial etc. What somebody could do is build a web service around it. As long as a service interfaces with pijul through system(), it won't have any licensing problems. Encapsulate it in some RPC wrapper for extra safety. You might have to distribute the wrapper but it won't make it much easier for the competitors. The bulk of the web service can still be closed source.
Third, GPL licenses are fair to the original developers and the end users (they get to see the code they use). I understand why other developers might like BSD style licenses and we could spend another 30 years arguing about the virtues and the flaws of the two approaches. I won't get into that but given their goal of decentralizing the Internet I think they picked the right license.
Even that overstates the matter. Companies could modify the code all they want as long as they don't distribute that code. That would impact Github competitors - maybe 10 or 20 companies in the entire US.
Also, "distribute that code" can be interpreted very broadly. A contractor may access your internal HR system, and now you have to share it with them. A factory line worker may be entitled to the source controlling the robot fixture. An airline passenger may use the in-flight entertainment unit, and be entitled to its source. Etc.
Spot on for PHP and MySQL. PHP has its own license (kind of BSD?) and MySQL is GPL2 or commercial. I bet Facebook could buy the rights of any AGPL product, unless the owner is really firm about principles.
The speculation I have heard is that the terms "deploy" and "link" are both ill-defined and have not gotten proper testing in the courts. So there is no case law saying that pushing your changed binaries to 1,000 internal sites (or even better: partially owned subsidiaries) does not invoke the clause. Or what does "linking" mean in the context of a database driver? What happens if a well-meaning employee loans out a modified binary to a customer to see if it fixes their problem? All of that makes lawyers nervous, and what makes one lawyer nervous has the potential of making other lawyers rich at their companies' expense.
Just because you read something and come to a conclusion does not mean that people who come to other conclusions are "uneducated".
I run a tiny business. AGPL is banned there too.
It isn't an "unreasonable fear". It is a reasonable decision based on mitigation of risk. AGPL is a risky license for end users; it goes too far beyond the Four Freedoms.
It's not about GPL, it's about GPLv2 vs GPLv3 and the requirements that come with it.
UPDATE: both git and mercurial get the concrete version wrong (https://tahoe-lafs.org/~zooko/badmerge/concrete-good-semanti...)
The last blog post from 2017-Jan-10 says "I’m pleased to announce that we are starting to test the first usable version of Pijul. We are not quite ready to release.."
I was hoping that getting to the front page meant pijul had done a release but I see nothing to that effect.
Darcs, and pijul, are patch-based. That means that they think of the world as an ordering of patches. Patches aren't the same as commits: commit orderings, for example, are fixed, whereas patch orderings are computed. They can change when you, e.g., merge a "branch". Branching is similarly "simpler": a branch is just a collection of patches, not just a single commit with an implicit DAG attached to it.
Bitkeeper does have a weave data structure that more closely resembles patches. It's an encoded set of instructions for transforming one file from one state into another:
This data structure has a big advantage when computing annotations (blames): it's much faster than Mercurial's revlog (which in turn is faster than git's blob-tree-ref structure).
Is the difference between patches and git commits in a DAG really only a difference in internal representations or is there a user-facing difference?
Darcs' and Pijul's patches aren't glued; they only either commute or do not, and the conflict resolution mechanisms for non-commutative patches are different.
https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theo...
However, it would not be possible to implement a patch-based system (Pijul/darcs) based on git.
- One example is cherry-picking: in git, when you are on some branch A, and cherry-pick from another branch B, after the cherry-picking is done, if you try to cherry-pick from B again, you'll get conflicts. In Pijul and darcs, that comes for free.
- Another example is merge: merge between commits is provably wrong (https://tahoe-lafs.org/~zooko/badmerge/simple.html). With patches, this cannot happen.
> The difference between what svn does and what darcs does here is, contrary to popular belief, not that darcs makes a better or luckier guess as to where the line from c1 should go, but that darcs uses information that svn does not use -- namely the information contained in b1 -- to learn that the location has moved and precisely to where it has moved.
Darcs/Pijul understand which lines a patch is interested in, and if those lines get moved around in another branch, the patch "follows" them to the right place, rather than just finding/applying the shortest possible diff.
If you've heard someone say "never git pull, always git fetch and then either merge or rebase as appropriate", then they've noticed the difference between the two.
The Darcs wiki has some cool graphics around cherry-picking merges that might answer your question better than a paragraph of prose can: http://darcs.net/Using/Model#merging-with-cherry-picking
darcs had incredible cherry picking about ten years ago.
Instead of saying "get this commit" and solve merge conflicts manually, like git does, darcs would get one patch and every other that was necessary for it.
It effectively made cherry picking work as in "I want this feature from that branch" instead of "I want some code from that branch".
It was glorious, other than the little detail of occasionally exponential merge times...
One confusing thing for git users is, git represents a number of commits (but not all types of commits) as patches.
Two differences:
- Cherry picking is possible, but when you cherry pick twice from the same branch, you get conflicts with commits, because cherry-picking changes their identity. With patches, this works just as expected.
- Merging can be made associative with patches, not with commits. Concretely, in git, if Alice and Bob add lines to a file, even when there are no conflicts, Alice's new lines can be merged in the middle of parts added by Bob, even though she's never seen these parts. Even worse, there is no way to tell when this happens to you (git doesn't say). "Associativity" is the mathematical property that this never happens.
The darcs/pijul world of repositories as loose ordered sets of patches makes cherry-picking the rule rather than the exception. A branch is mostly just the subset of patches you are interested in at a given moment. A "trunk" is just the superset of all possible patches. You can make interesting and easy usages of things like set intersections [1]: the intersection of the patches in two branches in darcs/pijul can be much more interesting than nearest common parent commit in the git DAG, and especially can be a lot more informative in the cases where things like bug fixes are cherry-picked across branches, which in git is a special bit of tree/commit surgery but in the darcs/pijul world that patch can be often the exact "same" in both branches.
[1] Aside, I love the concept of using intersection branches for consensus-oriented development (what releases to Production are the patches that every developer has pulled into their own working branch), which is a neat form of decentralized development that I think can only really be handled in the darcs/pijul model. (I have an ancient blog post on the idea of such a starfish development workflow.)
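To make the set view a bit more concrete, here is a rough Python sketch of the intersection idea. It is only an illustration: the patch names are made up, each repository is modelled as a plain set of patch identifiers, and patch dependencies (which darcs/pijul would also track) are ignored entirely.

```python
# Hypothetical repositories, each modelled as a set of patch identifiers.
alice = {"init", "parser", "fix-crash", "alice-wip"}
bob = {"init", "parser", "fix-crash", "bob-experiment"}
carol = {"init", "fix-crash", "carol-docs"}


def consensus(*repos):
    """Patches that every given repository has pulled."""
    return set.intersection(*repos)


def pending(repo, *others):
    """Patches in `repo` that not everyone has accepted yet."""
    return repo - consensus(repo, *others)


print(sorted(consensus(alice, bob, carol)))  # ['fix-crash', 'init']
print(sorted(pending(alice, bob, carol)))    # ['alice-wip', 'parser']
```

In the consensus workflow described above, the first set is what would go to Production, and the second is what Alice still has to convince the others to pull.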
One thing you'll notice very quickly with Darcs (and presumably Pijul) is that the system always manages dependencies between patches. If you try to cherry-pick a single patch from a branch, you will get that patch and all the patches it depends on; you don't get the full linear history, you only get a subset.
In other words, you get the intuitive feeling that you're operating not on a log, but on a graph. Pulling one thread necessarily pulls other threads, and the whole graph rearranges itself to accommodate your changes. This has its downsides compared to the strictly-linear snapshot model, but the upside for most users is incredible. You can just commit and merge, and the system handles ordering for you.
Git was a major step down, UX-wise, when we switched from Darcs back in 2008, and it's still less user-friendly today. (It was also a major step up in some ways: Darcs, at the time, had a huge performance edge case where conflicts were sometimes effectively unresolvable because they took too much time to compute.)
Here is a very short video on a project called "Camp" that stalled out, nearly a decade ago. I think it very nicely explains how the user interface differs:
The most important thing is that because there is no DAG, when you say "Darcs, pull this patch for me", like saying `git cherry-pick ABCDEF` -- the dependencies are automatically computed and pulled as well. You can sort of imagine it as if you had a git branch, and you ran 'cherry-pick' on one of the commits to your 'master' branch (because you wanted it). But, rather than pulling that one thing, 'cherry-pick' implicitly traversed the dependent patches and picked them all as well. But because there is no DAG, a dependency doesn't mean "parent commit". It means "the other patches that are mathematically required for this patch to work out". That means cherry-pick always works: you never have to calculate the dependencies yourself. To merge a patch is to implicitly merge all of its dependencies.
I've spent plenty of my time as an OSS maintainer dealing with merging multiple bug fixes from a development branch into stable branches. For example, a bug may already be fixed in HEAD when it's reported, but not STABLE, so you want to pull changes from HEAD into STABLE. Many times this requires multiple, carefully curated sequences of 'git cherry-pick' in order to correctly get the dependencies right. For example, the author may have made a small refactoring, then implemented the bugfix on top of that. Or it requires a complete reformulation or re-commit of a new change that matches the STABLE branch.
In a sense: this never happens with Darcs. If there's a bugfix, I say "Get me that bugfix patch". It always gets every dependent patch that is necessary, and never anything more. Every time. It always just works. Remember: no DAG. You aren't traversing parent commits. You are, in a sense, finding the transitive closure of "patches that cannot commute with this patch" (IIRC). That means: if a given patch does not commute with this patch, i.e. it is dependent, because we must apply them in a certain order, so there is a dependency -- then you also need that patch. And you need to apply that rule to that patch, and every patch it depends on, and so on and so forth (hence 'transitive closure')...
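The 'transitive closure' part can be sketched in a few lines of Python. To be clear about the assumptions: this is not how Darcs is implemented, the patch names are invented, and the `depends_on` map is written out by hand here, whereas Darcs derives dependencies from which patches fail to commute.

```python
def dependency_closure(patch, depends_on):
    """Return `patch` plus everything it transitively depends on."""
    needed = set()
    stack = [patch]
    while stack:
        current = stack.pop()
        if current in needed:
            continue
        needed.add(current)
        # Apply the same rule to each direct dependency.
        stack.extend(depends_on.get(current, ()))
    return needed


# Hypothetical HEAD: the bugfix sits on top of a small refactoring,
# while an unrelated feature only needs the initial import.
depends_on = {
    "bugfix": ["refactor"],
    "refactor": ["initial-import"],
    "new-feature": ["initial-import"],
    "initial-import": [],
}

stable = {"initial-import"}
to_pull = dependency_closure("bugfix", depends_on) - stable
print(sorted(to_pull))  # ['bugfix', 'refactor']
```

Note that `new-feature` never shows up: the closure contains what the bugfix needs and nothing more, which is the behaviour described above.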
This allows a very powerful form of development, where features and bugfixes can coexist. But they do not necessarily need separate 'branches', so to speak. To merge a feature into a repository implicitly pulls its dependencies, and the same with bugfixes. The net effect of this is that Darcs almost always gets merges correct, or it fails to do the merge at all. This kind of means that merges are sound ('kind of' because I don't know about an actual soundness proof, but the intuitive idea roughly is right): if Darcs pulls off the merge, then it's always correct, but it may not be able to always actually do that merge (perhaps not every merge is actually sensible, in the theoretical view of things, or perhaps the merge is sensible but the model doesn't allow it to handle that case).
The "fails to do so" is the tricky part, where Darcs 1 originally went exponential in some cases, though Darcs 2 mitigates this. It looks like Pijul will finally nail this problem dead, although admittedly I haven't looked over the theory.
Side note: Camp was originally envisioned to be the successor to Darcs, or at least the basis for "Darcs 3", using Coq to build formal proofs about the underlying patch theory to show it worked out correctly and avoided the hairy bits that plagued Darcs 2. Unfortunately, it never panned out that way (due to time and lack of funding). The project was actually started by Ian Lynagh who worked at my current company before me and was one of the founders.
> Darcs almost always gets merges correct, or it fails to do the merge at all.
One of these is not like the other, which IMO is the problem with "magical" merging systems. Great when they work, f*cking hell nightmare when they don't.
I'd rather have something like git that works in normal usage all the time, and when it fails, is easy to fix. YMMV.
In contrast, Darcs and Pijul's merge are associative, and Pijul's merge is commutative. Even if you don't like maths, this means that they will always behave deterministically. This also means you can use them in scripts, although darcs might sometimes have performance problems (pretty bad ones, actually).
In git, you can get the following: https://tahoe-lafs.org/~zooko/badmerge/simple.html
In all honesty, given years of experience with Git, and fondly using Darcs as my first version control system: I still think merges are absolutely the one thing it beats Git at, hands down. When it works and it does its job, it always is correct. When it doesn't, you can bail it out. Not much different, but the "always is correct" and dependencies-being-implicit is what makes it good. Darcs could have saved me at least dozens of hours of hair pulling when doing STABLE merges I estimate... Git's still good. I wish it could do that, though...
Your note about git is interesting. In fact, Git is, in at least some cases, more magical than other VCSs in the merge department. You might just not be aware of it due to being so familiar. When I say "Darcs always gets the merge correct", I don't just mean it literally finishes with exit code 0, but also that the semantic model is, in some sense, more 'correct' or 'intuitive':
http://r6.ca/blog/20110416T204742Z.html
Darcs (and others) always get this 'merge associativity' case correct, where 'Base+A+B' where (+) is merge is associative (so it doesn't matter how you 'bundle' the changes or whatever). That means you have less edges to worry about. And to be fair, I don't think there's anything inherent about Git where this particular case can't be fixed. It's just a good example of why people are trying projects like Pijul/Darcs at all, so these things can be formalized and understood. The theory of patches is actually rather rich and helps formalize a lot of these notions of what a "merge" really is in an algebraic sense, how patches relate to one another, etc.
When you type darcs pull, darcs lets you pick and choose which patches (subject to the dependency constraints) you want to pull in. Those patches then get applied however darcs wants to apply them (again, subject to dependency constraints, of course). Because you do not necessarily need to pull all of them, you are always "cherry picking" by default.
This is different from cherry-picking in git, because when you cherry-pick in git you still have 2+ commits that exist in the context of a DAG; you're just transplanting the contents elsewhere to create a new commit in a different position in the DAG.
This sounds less and less like a tools/implementation thing and more like the default recommended/enforced workflow thing.
My point is, if you focus on the implementation differences, my understanding won't increase because we don't have the same mental model of how git works under the hood.
It might help to compare the "identity" structures of git commits versus darcs/pijul patches. In a pseudo-C, you can see a git commit as something like:
struct commit {
string author;
string description;
tree_id tree_snapshot;
commit_id[] parent_commits;
}
This is a directed acyclic graph (DAG) because of that `parent_commits` link from one commit to its immediate parents (there can be multiple parents in the case of a merge commit). Git can just move refs on a pull only in the case of a "fast forward", when the remote branch is "simply" ahead of the current branch and all of its new commits "point to" the last commit in the current branch. In every other case it is a merge of the graph (via a merge commit with two or more parent commits).
(While git outputs a diff as the representation of the commit in places like `git show`, a commit doesn't store the diff but instead a link to a snapshot of the tree at the time of the commit.)
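As a rough illustration of the fast-forward rule (a sketch of the idea in Python, not git's actual implementation), a pull can just move the ref when the local tip is already an ancestor of the remote tip. The commit names and the `parents` map below are made up for the example.

```python
def ancestors(commit, parents):
    """Return `commit` and every commit reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def can_fast_forward(local_tip, remote_tip, parents):
    """True when the remote history already contains the local tip."""
    return local_tip in ancestors(remote_tip, parents)


# Hypothetical history: a <- b <- c on the remote,
# plus a local-only commit d made on top of a.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["a"]}

print(can_fast_forward("b", "c", parents))  # True: just move the ref from b to c
print(can_fast_forward("d", "c", parents))  # False: histories diverged, a merge commit is needed
```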
For something like darcs/pijul, the identifying information of a patch looks something more like:
struct patch {
string author;
string name;
change[] changes;
}
This may seem like semantic quibbling in that the patch here actually contains the diffs as a part of its identity rather than a snapshot of a source tree, but that's not actually the important difference.
The important difference is that the context of the patch is no longer a part of its identity: there is no "parent patch" information, and the change structures don't directly refer to previous changes.
The reason that difference matters is because in the darcs/pijul models the context of the patch is more "metadata" about the patch than a direct part of the patch. Patches aren't "nailed" to a graph like a commit is, they "float in a basket" together. Darcs and pijul do the work to figure out which patches need to be in which order in a branch/repository.
This can be a nightmare to someone expecting a strict graph. Darcs and pijul can and will reorder history during a pull. You can see "newer" patches float down under "older" patches in the patch log as the systems work to build a stable sort of patches.
That movement, however, is also where the systems draw the most strength. That movement of the patches can be seen as a continual, rustling "cherry arranging" as the systems work to figure out the minimal set of previous changes that patch needs in order to exist.
If you cherry-pick a commit in git you copy the changes from that commit to the new branch (a new spot in the DAG) into a new commit with its own new identity. Down the line when you go to reintegrate/remerge the branches between the original branch and the cherry picked branch, git doesn't see the same commit/change and its merge can (in my experience, will) see conflicts in the exact same change made in different contexts.
When you cherry pick a darcs/pijul patch, you bring over the same exact patch and the system lets you know any other minimal dependencies that you need and brings them over as well. When you reintegrate/remerge the cherry picked branch, those exact same cherry-picked patches are already "in" the original branch and so don't necessarily need to be remerged/rearranged again.
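The identity difference can be mimicked with a toy hash in Python. This is a sketch under loud assumptions: it follows the two pseudo-C structs above rather than the real on-disk formats, the field layout is invented, and neither git's real object hashing nor darcs/pijul's real patch identifiers work exactly like this.

```python
import hashlib


def commit_id(tree, parents, author, message):
    """Git-like identity: the parent ids are part of what gets hashed."""
    payload = "\n".join([tree, *parents, author, message])
    return hashlib.sha1(payload.encode()).hexdigest()


def patch_id(author, name, changes):
    """Patch-like identity: only the patch's own content, no context."""
    payload = "\n".join([author, name, *changes])
    return hashlib.sha1(payload.encode()).hexdigest()


# Git-like: the same fix committed on dev, then cherry-picked onto master.
dev_base = commit_id("tree-dev-0", [], "alice", "dev work")
master_base = commit_id("tree-master-0", [], "bob", "release")
fix_on_dev = commit_id("tree-dev-1", [dev_base], "alice", "fix crash")
fix_on_master = commit_id("tree-master-1", [master_base], "alice", "fix crash")
print(fix_on_dev == fix_on_master)  # False: two unrelated identities

# Patch-like: the fix keeps one identity wherever it is applied.
fix = patch_id("alice", "fix crash", ["line 10: -old +new"])
dev = {patch_id("alice", "dev work", ["line 3: +feature"]), fix}
master = {patch_id("bob", "release", ["line 1: +version"]), fix}
print(dev & master == {fix})  # True: both branches share the very same patch
```

In the toy model, a later merge of `dev` into `master` can see that the fix is already present, because it is literally the same element in both sets.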
You can duplicate git workflows on top of darcs/pijul, but it is very hard to duplicate some of the more interesting darcs/pijul workflows on top of git. Among other things, rebase/cherry-picking merge hell is a very real problem in the git ecosystem, whereas darcs/pijul almost seem like crazy smart magic in comparison when it comes to some of the scenarios where you might rebase or cherry-pick.
It might be something that won't entirely make sense until you try experimenting with it yourself: maybe, you might want to take darcs for a spin for a small project or two. I think you can feel a lot of the difference as you use it, especially as you start to push/pull between branches/repositories.
(Anecdotally, my workflows are quite different on darcs versus git, knowing that typically I could fix a bug discovered elsewhere in the code in the middle of a bigger project, without needing to branch, I would often just record that change into a tiny patch on its own right there on the spot, and generally know that if I needed to get just that one patch into another branch I could rely on darcs to cherry pick it for me later.)
> Down the line when you go to reintegrate/remerge the branches between the original branch and the cherry picked branch, git doesn't see the same commit/change and its merge
> can (in my experience, will) see conflicts in the exact same change made in different contexts.
Sadly, git also recognises this on the user's behalf, so likely your experience was due to some other delightful quirk of the git UI.
edit: I'd also recommend not calling trees 'tree snapshots', because that will confuse people familiar with trees. Same for 'cherry-picking a commit', since from your description darcs seems to use 'cherry-picking' to mean 'fetch a set of changes from someone else', which maps to 'fetch a branch' in git-land. 'git cherry-pick' means, 'copy a single change from one local branch to another', so has almost no overlap.
You seem to think it's "fetch a branch", but I'm trying to tell you that the `darcs pull` experience is a lot more like doing `git fetch && git cherry-pick origin/TIP --interactive` every time than `git pull`, but with a much, much better merge experience than that implies.
I'm not trying to confuse different concepts, I'm trying to show that the hard concept in the git case was the easy concept in the darcs case.
Can you explain a little more what you mean?
Even when using git am or send-email or whatever, yes, you're sending a patch -- but the way to apply that patch is to turn it into a commit and then cherry pick or rebase or merge or manually fix conflicts or whatever. In darcs and pijul, the model is _always_ that set of patches.
But, you are correct. Internally, git stores the full contents of files and computes the diffs on the fly.
For others' benefit, if you want to test for yourself, create a new repo, add a file and make a series of commits with changes. Git objects are compressed with DEFLATE (zlib) so gunzip and unzip won't work. I used https://github.com/jezell/zlibber because I was too lazy to write my own quick zlib wrapper. Then run:
for o in .git/objects/*/*; do cat "$o" | inflate ; echo ""; done
This was surprising to me, since I had a very different model mentally. I still think the DAG of diffs is the better model mentally, but it is worth understanding that this is not what git is actually doing under the hood. It explains issues that arise doing rebases, cherry-picks, etc.
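For anyone without an inflate tool handy, the same inspection can be approximated with Python's standard zlib module. This is a minimal sketch that only handles loose objects (it skips packfiles), and the function names are my own. It relies on the loose object layout of a zlib-compressed "<type> <size>", a NUL byte, then the content.

```python
import zlib
from pathlib import Path


def read_loose_object(path):
    """Decompress one loose git object and split its header from its body."""
    raw = zlib.decompress(Path(path).read_bytes())
    header, _, body = raw.partition(b"\x00")
    kind, size = header.decode().split(" ")
    return kind, int(size), body


def iter_loose_objects(git_dir=".git"):
    """Yield (path, kind, size, body) for every loose object under git_dir."""
    # Loose objects live in two-hex-digit directories; pack/ and info/ are skipped.
    for path in sorted(Path(git_dir, "objects").glob("[0-9a-f][0-9a-f]/*")):
        if path.is_file():
            yield (path, *read_loose_object(path))


if __name__ == "__main__":
    for path, kind, size, body in iter_loose_objects():
        print(path.parent.name + path.name, kind, size)
        if kind == "blob":
            # Blobs hold full file contents, not diffs.
            print(body.decode(errors="replace"))
```

Run it from the top of a small test repository. Each blob printed is a complete copy of a file at some commit, which is the point being made here.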
I now also understand the motivation behind Pijul. If I understand correctly, Pijul does use a collection of changesets as the underlying model. Like you say, that can be a critical difference.
But if you don't have A in the first place, then there's a real problem: how to take B-A and reconstruct B without knowing what A is. You have to find A, and there may be multiple acceptable A's. It seems like the fundamental difference here would be not having a strict "parent" for any given patch. I can see why that would make some workflows a little nicer, not being forced to rebase, but I don't see any massive advantages -- as a user what does this really buy me? Does it enable things that are impossible with git? Or does it mainly make some advanced git workflows easier?
One difference is that by storing the patches only you can understand more clearly what the intended change was. When you store the whole file it is easy to compute the difference between A and B, but may be impossible to compute the correct differences between A, B, and C. By storing the whole file you now have to consider all the possible differences between them, not just the ones introduced by the commits you are trying to merge.
I would have to play around with it, but I know there are scenarios involving rebase, revert, and cherry-picking commits that can cause trouble in git, and I now understand that the trouble stems from git storing contents, not diffs.
One that I've run into regularly is cherry-picking commits from a dev branch into a master branch to hot-fix bugs directly in a prod release instead of waiting until dev gets merged as part of our regular process. If I have commit A on dev and cherry-pick it to master, it creates a totally new commit A-1 that becomes part of the history of master. We lose the fact that A and A-1 represent the exact same changeset. Depending on what the changes are, and what further changes happen on dev afterwards, this can cause failed merges requiring manual resolution when dev does finally get merged into master.
I imagine that would not be a problem for Pijul.
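To make the A versus A-1 point concrete, here is a toy model in Python. It is only a sketch: `commit_id` and `change_id` are hypothetical helpers, and the hashing is neither git's real object format nor Pijul's. The assumption is simply that a snapshot-style identity covers the parent and the resulting content, while a patch-style identity covers only the change itself.

```python
import hashlib


def digest(*parts):
    """Hash a sequence of strings, NUL-separated, into a hex id."""
    h = hashlib.sha1()
    for part in parts:
        h.update(part.encode())
        h.update(b"\x00")
    return h.hexdigest()


def commit_id(parent, snapshot, message):
    # Snapshot-style identity: depends on the parent and the full content.
    return digest(parent, snapshot, message)


def change_id(diff, message):
    # Patch-style identity: depends only on the change being made.
    return digest(diff, message)


fix = "-timeout = 3\n+timeout = 30\n"
dev_parent = commit_id("", "timeout = 3\nfeature = on\n", "dev work")
master_parent = commit_id("", "timeout = 3\n", "release")

# The same fix recorded on dev (A) and cherry-picked to master (A-1).
a = commit_id(dev_parent, "timeout = 30\nfeature = on\n", "fix timeout")
a1 = commit_id(master_parent, "timeout = 30\n", "fix timeout")

dev_change = change_id(fix, "fix timeout")
master_change = change_id(fix, "fix timeout")

print(a == a1)                      # False: two unrelated-looking commits
print(dev_change == master_change)  # True: recognisably the same change
```

In this model a later merge of dev into master can see that the fix is already present only in the second scheme; with the first it has to rediscover that from file contents.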
Git certainly has some room to improve in the merge conflict department. I looked at the bad merge example you posted -- I suspect I've hit that before. It's rare, but yeah it's there.
I also frequently notice that git complains about merge conflicts, while the custom diff tool I use to resolve them says there's no conflict, and I don't actually have to do anything. Good reason to use a custom merge tool with git.
But, given all this, is this really all an outcome of patches vs snapshots, or is this just git's merge algorithm being suboptimal? Certainly git could selectively ignore the DAG when merging, couldn't it? Even after reading the other comments here, it still seems to me like git has more information when merging than the "patch-based" workflow of darcs & Pijul.
It seems to me like there's a language problem with trying to draw a distinction between patches and snapshots. Git is still storing and transferring patches at the tree level, even if it's not happening at the file level. Git does not store a commit as a zip snapshot of the entire tree, the commit is still only the changed files. It would be fair (but not standard or common) to call the overlay of changed files a "patch" or a "diff". People do still use git format-patch, and email git "patches" to each other. So it's inherently confusing & problematic to talk about git and say that it doesn't use patches.
What does make sense to me is the distinction of having a strict DAG vs not having one -- is that actually what people mean when they talk about snapshots vs patches? Am I tripping on it because I'm being too pedantic about what a "patch" is?
merging, rebasing, committing... all operate on refs. You might think you're transplanting changes (and you are), but the inputs are refs, and the outputs are refs. As you mentioned, refs are unambiguously snapshots.
If anyone is questioning this, just play with `git rebase -i [some old changeset id]` and you'll see that it's just an ordering of patches.
In fact, this is why rebase is an out-of-band tool that has odd effects on shared history - specifically because it's inverting git's model into something more like pijul's, and therefore isn't really native-to-git.
Assume you change file A, commit, then change file B, and commit. In git there is a dependency from the second to the first commit, because the state of the second is derived from the first one. However, the changes are unrelated because they are in different files. Pijul understands them as parallel unrelated changes (although with different time stamps).
(disclaimer: I infer that from using darcs many years ago. I never used Pijul)
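A small Python sketch of that idea, under the assumption that a patch is just a whole-file replacement `(name, old, new)`. That is far cruder than what darcs or Pijul actually do, and `apply_patch` and `independent` are names invented for this example.

```python
def apply_patch(files, patch):
    """Return a new file map with the patch applied, or fail if it does not fit."""
    name, old, new = patch
    if files.get(name, "") != old:
        raise ValueError(f"patch does not apply to {name}")
    updated = dict(files)
    updated[name] = new
    return updated


def independent(p, q):
    """In this toy model, patches touching different files never interact."""
    return p[0] != q[0]


base = {"A": "a0", "B": "b0"}
p = ("A", "a0", "a1")  # first commit: change file A
q = ("B", "b0", "b1")  # second commit: change file B
r = ("A", "a1", "a2")  # a later change to A that builds on p

# p and q can be applied in either order with the same result.
pq = apply_patch(apply_patch(base, p), q)
qp = apply_patch(apply_patch(base, q), p)
print(pq == qp)  # True
```

Trying `apply_patch(base, r)` raises, because `r` depends on `p`; that dependency is real, whereas the one between `p` and `q` exists only because of the order they were recorded in.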
In theory, the number of possible branches is much, much greater (arguably infinite) for git. Commits have their own metadata, so just amending creates a new commit. Patches are immutable, and there's no independent artifact like a commit that incorporates the current set of patches.
Furthermore, darcs (and I assume pijul) absolutely let you make multi-file patches.
Apart from the other answers that go into theories of patches vs. commits, I always found Darcs much more intuitive to actually use than Git. It has fewer commands that do more intuitive stuff. To record a patch, you do "record" instead of separate "add" and "commit" steps. To revert some changes you have not recorded and that you want to get rid of, you do "revert" instead of "checkout -- filename". The "diff" command works more intuitively than git's "diff", which you sometimes have to use as "diff --cached" due to its staging model.
Another thing is that every branch is a separate copy of the entire source tree in your file system. A drawback is that this can be viewed as wasteful, but the advantage is that it's much much easier to work in parallel branches at the same time since changing the branch is just doing "cd" instead of some boring dance of "stash" and "checkout". You also don't get problems due to files intended for one branch still lying around after a "checkout" to switch to another branch.
(I'm assuming that Pijul preserves all or at least most of these properties of Darcs. The docs aren't very exhaustive.)
That's not how math and/or intuition works.
Let's look at an example from math: integers and addition. Addition is pretty general -- there aren't, say, weird special cases when one operand is even, or the current date during calculation is Friday the 13th. Addition is associative and commutative, so I can evaluate a long summation in any order I want, or I could chunk up the calculation and have multiple computers evaluate parts thereof, all without any coordination/locking. It's easy to reason about, because the rules are so general, and generalization is what math is all about.
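As a quick illustration of the chunking claim, here is a Python sketch. `chunked_sum` is a made-up helper, and the "multiple computers" are only simulated by summing slices independently.

```python
import random


def chunked_sum(values, chunk_size):
    """Sum each chunk on its own, then combine the partial sums."""
    partials = [
        sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sum(partials)


values = list(range(1, 101))
shuffled = values[:]
random.Random(0).shuffle(shuffled)  # any order works, thanks to commutativity

print(chunked_sum(values, 7) == sum(values))    # True
print(chunked_sum(shuffled, 13) == sum(values))  # True
```

Neither the order of the operands nor the chunk boundaries affect the result, so the chunks need no coordination with each other.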
Now let's look at software: packaging. What are the semantics for package installation for your language/OS? If you install package A and then B, do you end up with the same result as you would by installing B and then A (i.e. is installation commutative)? Many (if not most) package managers can only support one installed version of a package at a time, and thus installation can not be commutative: installing a package will pull in dependencies that will influence package constraint resolution in subsequent installations, so order does matter. Now you have to be careful not to fuck that up when you set up your cluster's configuration management (or, hell, just get what you need installed on your laptop so you can work on a new assigned project). Now, if the package manager in question supported multiple installed versions of a given package (and had a way of "activating" only a subset of all packages for a given project/application), installation would be commutative, freeing you of the burden of installing things just the right way and in just the right order. This is how Nix (OS pkg manager) and the latest Cabal (Haskell pkg manager) work.
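Here is a toy Python model of the two designs. The packages `A`, `B` and `lib` are hypothetical, and this is an illustration of the ordering argument only, not how Nix, Cabal, or any real package manager is implemented. The single-slot manager keeps whichever version of a name arrived first; the multi-version store keeps every `(name, version)` pair.

```python
def install_single(state, requirements):
    """Single-slot manager: one version per name; an existing version wins."""
    new = dict(state)
    for name, version in requirements:
        new.setdefault(name, version)
    return new


def install_multi(state, requirements):
    """Multi-version store: every (name, version) pair can coexist."""
    return state | set(requirements)


A = [("A", "1.0"), ("lib", "1.0")]  # A pulls in lib 1.0
B = [("B", "1.0"), ("lib", "2.0")]  # B pulls in lib 2.0

ab = install_single(install_single({}, A), B)
ba = install_single(install_single({}, B), A)
print(ab == ba)  # False: the installed lib version depends on the order

multi_ab = install_multi(install_multi(set(), A), B)
multi_ba = install_multi(install_multi(set(), B), A)
print(multi_ab == multi_ba)  # True: set union is commutative
```

The multi-version case is commutative for the boring reason that set union is; the extra work moves to choosing which versions each project activates.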
So, yes, coming up with a simple, consistent mathematical model for the semantics of your target system will definitely make things more intuitive. It's precisely because most developers are terrible at mathematical reasoning that so much software is so difficult to reason about -- there are tons of unnecessary special cases, when hidden inside all the tangled logic there's secretly a simple set of axioms and theorems that lend themselves to intuitive composition.
For clarification, my point is this: if some operation can be proven to be commutative, but you don't acknowledge that it is, you've just made things more difficult for yourself (and maybe others). It would be absurd if someone worked really hard to evaluate a_1 + a_2 + ... + a_n strictly left to right, when there might be a more convenient (from a human grey-matter standpoint) order to evaluate those numbers. That's because we know that addition here is associative and commutative. Uncovering these properties is what math is all about -- discovering these truths and exploiting them to good effect.
If you can prove that something is not commutative given your axioms, you can attempt to revise your axioms -- maybe one axiom was redundant and only served to impose unnecessary constraints on your model. If you can't find a revised set of axioms that satisfy what you need, congrats -- you've found the simplest system you could come up with -- though someone might, down the road, come along and show you that there was, indeed, a simpler set of axioms that you couldn't envision. That's the process of mathematical development.
If you're good at math, you'll discover the axioms you need to get the generality you're looking for, and if commutativity is to be had, you'll find a way to get it. If you find no way to get you some commutativity, you at least avoid pretending that you have it (that is, you don't write buggy software).
Good math skills will either get you commutativity (which is intuitive) or prove non-commutativity (intuitive again, because you're avoiding bugs) -- it's win-win.
Susan: My brother plays such great music with his violin!
Bob: Actually, I think you're arguing the opposite point -- a violin has strings that can be mishandled such that they cause annoying screechy sounds, so surely he's capable of producing a cacophony of screechy sounds. Not so enjoyable.
Susan: Okay... but he's a good musician, so though he could fuck up a performance if he wanted to, he doesn't -- he plays to the best of his abilities.
If a math model claims that something isn't commutative, it's either as simple/general as it can be, or the creators of that model are bad at math (they left commutativity on the table due to bad axioms, or their theorem that the given operation was not commutative was wrong).
Your original argument was that mathematics can model things that we might arrive at intuitively, without looking at the model - such as assuming commutativity in package manager installation order - correct?
I'm simply saying that mathematics can also describe things which are surprising; counter-intuitive even, so I agree with 'higher' comments in that being:
> based on a mathematical model of collaborative edition
is not sufficient for having that:
> behavior matches intution, [sic] every time
It just doesn't follow. Not least because one man's intuition differs from another's.
> Your original argument was that mathematics can model things that we might arrive at intuitively, without looking at the model - such as assuming commutativity in package manager installation order - correct?
No, certainly not -- the right choice of model is key (a trivial example: one possible system for strings is one with no identity element (that is, no empty string), which makes strings non-monoidal; that would complicate matters like concatenating a list of nullable/optional strings). I'm not claiming that to be the case, nor do I think the Pijul authors are claiming that. I think we both would say that there exists a model of any given domain that is the most suitable for that domain (not that just any given model will suffice), and that it's math that helps you discover it. I read that blurb on the site as (the slightly tautological) "because we chose a good mathematical model, you can expect that the entities and operations on those entities can be composed as advertised, rather than unintuitively yielding unexpected results (bugs) or unnecessarily prohibiting some composition of operations that is clearly logically sound (which is also unintuitive -- why the special cases, when these things should compose?)".
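The string example can be sketched in a few lines of Python. `concat_all` is a hypothetical helper; the point is only that having an identity element (the empty string) gives missing values and empty lists an obvious meaning.

```python
from functools import reduce


def concat_all(parts):
    """Concatenate optional strings, treating None as the identity ''."""
    return reduce(
        lambda acc, s: acc + (s if s is not None else ""),
        parts,
        "",  # the identity element doubles as the result for an empty list
    )


print(concat_all(["pi", None, "jul"]))  # pijul
print(repr(concat_all([])))             # ''
```

Without an empty string there would be nothing sensible to return for the empty list and nothing to substitute for `None`, so both would need special-case handling at every call site.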
> I'm simply saying that mathematics can also describe things which are surprising; counter-intuitive even [...]
Sure. An example: non-commutativity in package managers that only allow one version of a given package to be installed. As someone who uses more flexible package managers, I'm often surprised when I'm using another package manager and discover its operations aren't commutative (which, as I inevitably discover shortly thereafter, is because it only supports installing one version of a given package at a time). That model requires extra brainpower to think through how I'm going to coax the package manager into installing what I need without conflicts (of course, after I've jumped through hoops to uninstall all the bad versions first). A mathematical model can be given for these systems, and they surely are convoluted and bad -- but just because we can come up with convoluted messes in math doesn't make math any more antithetical to intuition, that's just user error.
In this case, the package manager developers never set out to codify the formal semantics for these systems -- they just grew organically from initial needs. If they had started with an explicit mathematical model and iterated on that model, and assuming they had any math proficiency here, they would have done the convenient thing and allowed multiple package versions, and would consequently have commutative package installation (and I wouldn't have a notepad full of notes on how I need to carefully serialize my installs so that I don't get conflicts).
>> based on a mathematical model of collaborative edition
>is not sufficient for having that:
>> behavior matches intution, [sic] every time
Sure, not sufficient, but necessary (unless you count the possibility of just randomly stumbling into the best model). It's also necessary that you choose the best model for what you want (where "what you want" is surely a subjective matter, but efficacy can often be measured objectively in terms of, say, how much time is spent doing the same thing in two systems (assuming a similar level of mastery in both systems)).
> It just doesn't follow. Not least because one man's intuition differs from another's.
Ok, intuition is a subjective measurement, but I think there's still value in trying to find a pattern in what people generally find intuitive, rather than dismiss the topic entirely. I'm suggesting, anecdotally, that fewer arbitrary edge cases are easier for human brains to deal with, and I think few would argue with that. Math is, to a large extent, the process of shaking out those generalizations from a bunch of concrete observations, so I would surely trust a system where someone could point out their logic in the construction thereof, over some system where the authors shrug and say "I dunno, that's just the way I built it." (which is most software I've come across). Seeing that note on the Pijul site inspires confidence: even if their model isn't maximally generalized (yet), I know that it's something they have an appreciation for, so I can come in and propose improvements (the same cannot be said for projects where the leadership can't appreciate such proposals due to a lack of the mental/mathematical framework necessary to conceive of the positive consequences thereof).
Given my heuristic for intuition ("as few edge-cases as possible"), it would seem that math would be requisite here. Do you disagree with that, or do you perhaps interpret that blurb from the Pijul site as claiming that (any unqualified) application of math is sufficient for developing an intuitive system?
Have you ever developed software outside of academia? Special cases arise when your perfect mathematically structured snowflake is put in the hands of actual users who want to do actual work with it.
A buddy of mine who did work with JPL has told me that NASA uses a decades-old FORTRAN app to calculate orbits. When I asked him why they didn't port it to a more modern language he said it was because the system has so many special cases and fudges that it would take forever and the new system might not be able to replicate the features of the old one.
Of course having a logical basic architecture (for example, a structure that encodes special cases as configuration) can make it easier to maintain as a project matures.
For vim users, that's how vim keybindings work, for example (not everyone would agree vim is intuitive, hence I qualified the user-base). If you learn vim's keybindings, they're like axioms. You start to intuit the patterns like you would a language, eventually you don't think about how to do something, your fingers intuitively know.
If by intuitive we mean "matches the patterns that other people expect (e.g. git/svn)" then obviously Pijul is not intuitive. But if, once one learns the basics, it becomes fairly easy to infer what the tool will do in more advanced cases, then one might say it is at least "intuitable".
Depending on the outlook, AGPL is simply there to fix a loophole in the GPL, and rightly so.
pijul changes the theory, and does so in a way specifically to avoid slow algorithms.
One thing I see here (as a long time reader/follower of the space) is that pijul has been using other languages than Haskell and there's some thought that a more approachable language might bring in more developers outside of the subset of Haskell developers that have been darcs' main source of development experience.
Also the only vcs I've ever used where the repo managed to corrupt itself to the point I lost committed but unpushed changes
Optimizations have been added to darcs over the years including some changes to something increasingly akin to git's hash storage for binary objects, but so far as I know, binary files that might change a lot over time are still mostly discouraged in darcs.
[1] To keep the patches reversible, for commutation.
... much better now..
Edit: they have some minor text on how it's better, but a more detailed description would be appreciated.
It is written in Rust, instead of Haskell, also contributing to better/more predictable performance.
The Darcs folks are very aware and supportive of it, and last I heard, Darcs might add support for the Pijul patch format.
This is an extremely exciting project.
(speaking as a long time Darcs fan and passive onlooker who saw the Pijul presentation at rustconf)
Feels like good research and progress.
https://pijul.org/faq.html#did-you-solve-the-exponential-mer... answers it, at least partially. There may be a more comprehensive comparison, though.
Where Darcs has this:
Pijul has some text saying "we're better than Darcs in these areas".
As far as I know, the only unique aspect of Fossil is "Integrated Bug Tracking, Wiki, and Technotes".
I am very bullish about agpl. As far as I understand, the choice of agpl has no effect on your code which you put in a repository. Is that not the case?
I mean, I might not understand the finer legal details, but thinking "all your code you see after using anything GPL has to be GPL" is really weird. Did these people ever use a Linux kernel? (Yes, I know GPL is different from AGPL, and at a high level I also know what is different.)
Disclaimer: IANAL. AGPL enforcement is generally a little unclear. If you ship a binary, you have to make the source available. If your binary answers on the wire but you don't ship it, you still (under the AGPL) have to make the source available.
So, uh, yes -- as long as you never use it, which seems a little silly of a definition?
... Code you add with the equivalent of `git add; git commit`. I.e. suppose git was AGPL instead of GPL, that would have no effect on the license of say, Rust (MIT+Apache) regardless of the fact that git is used in the development.
Again, (A)GPL enforcement is unclear and subject of real debate, but my understanding is that nobody thinks virality impacts data being used by the program (e.g. source code in a VCS).
Does the AGPL prevent a company from creating GitHub-like services for Pijul?
So, assuming said company would not want to AGPL their code (and assuming they don't distribute binaries), that choice would imply some technical decisions they would presumably not be very happy about, but overall: no.
(And note that GitHub doesn't even use the original git written by Linus, they wrote their own implementation, libgit2.)
Anyways, for Pijul to be successful, there must be something like PijulHub. The bar is higher now. Thus, make it easy to build this hub!
I expect their idea of success is also very different from github's.
Not at all, if their service is AGPL licensed.
Of course, many companies may not want to use AGPL, and there's a whole lot of discussion on this page regarding what might/might-not be allowed to work around the AGPL. However, I didn't see anyone mention the possibility of, you know, using the AGPL, which is the entire reason the authors chose it ;)
