Is Git Irreplaceable? (2019) (fossil-scm.org)
240 points by cnst 12 days ago | 547 comments





Git's biggest flaw is that it doesn't scale. If a new system can fix that without sacrificing any of Git's benefits, I think it can topple Git.

It's ironic that Git was popularized in the same era as monorepos, yet Git is a poor fit for monorepos. There have been some attempts to work around this. Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction. Microsoft's GVFS is a promising attempt to truly scale Git to giant repos, but it's developed as an addon rather than a core part of Git, and so far it only works on Windows (with macOS support in development). GVFS arguably has the potential to become a ubiquitous part of the Git experience, someday... but it probably won't.
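For those who haven't seen it, repo is driven by an XML manifest plus a handful of commands layered over plain git; a rough sketch (the manifest URL is just a placeholder):

  repo init -u https://example.com/platform/manifest -b main   # fetch the manifest listing all the sub-repos
  repo sync -j8                                                 # clone/update every listed git repo in parallel
  repo start my-feature some/project                            # create a topic branch inside one sub-repo

Each project is still an ordinary git repo underneath.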

Git also has trouble with large files. The situation is better these days, as most people have seemingly standardized on git-lfs (over its older competitor git-annex), and it works pretty well. Nevertheless, it feels like a hack that "large" files have to be managed using a completely different system from normal files, one which (again) is not a core part of Git.
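For context, the day-to-day git-lfs workflow is only a few commands layered on top of normal git; a rough sketch (the file patterns are just examples):

  git lfs install                  # one-time per machine: sets up the smudge/clean filters
  git lfs track "*.psd" "*.wav"    # record the patterns in .gitattributes
  git add .gitattributes
  git add textures/hero.psd        # committed as a small pointer; the blob goes to the LFS server
  git commit -m "Add hero texture via LFS"

It works, but the pointer-file indirection and the separate server are exactly the "completely different system" being complained about.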

There exist version control systems that do scale well to large repos and large files, but all the ones I've heard of have other disadvantages compared to Git. For example, they're not decentralized, or they're not as lightning-fast as Git is in smaller repos, or they're harder to use. That's why I think there's room for a future competitor!

(Fossil is not that competitor. From what I've heard, it neither scales well nor matches Git in performance for small repos, unfortunately.)


I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

Git's flaws are primarily in usability/UX. But I think for its purpose, functionality is far more important than a perfect UX. I'm perfectly happy knowing I might have to Google how to do something in Git as long as I can feel confident that Git will have the power to do whatever it is I'm trying to do. A competitor would need to do what git does as well as git does it, with a UX that is not just marginally better but categorically better, to unseat git. (Marginally better isn't strong enough to overcome incumbent use cases)

And for the record: I think git-lfs's issues are primarily usability issues, plus some needed tech improvements. The tech improvements will come if there's enough desire, and as I mentioned the usability problems are more annoyances than actual problems.


I work for a 40-person game studio.

A major limitation of git is how it deals with many "big" (~10 MB) binary files (3D models, textures, sounds, etc.).

We ended up developing our own layer over git, and we're very happy; even git-lfs can't provide similar benefits. This approach seems to be commonplace for game studios (e.g. Naughty Dog, Bungie), so git certainly has room for improvement here.


This does not surprise me. Git's original purpose of managing versions of a tree of text files (i.e. the source code of the Linux kernel) pervasively influences it, and I wouldn't expect it to be any good for working with binary files or large files.

If somebody comes up with something that matches Git's strengths and also handles binaries and biggies much, much better then they could definitely topple Git with it. It'd take time for the word to spread, the tools to mature and the hosting to appear, but I can definitely see it happening.

I think most people know that Git isn't perfect, but it's also the case that coming up with anything better is an extremely difficult task. If it wasn't, someone would have already done it. It's not like people haven't been trying.


Do you have tools that let you work with diffs of your binary file changes? Or does a change simply replace all the bytes?

I'd argue that if it's the latter, git was never the right choice to begin with. You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

So I don't know if this is a "major limitation" of git per se. Not saying there's a better solution off-the-shelf (you're obviously happy with your home grown). But this was probably never a realistic use for git in the first place.


While I can't speak for the person you're replying to, the technology at least exists. Binary diffs are sometimes used to distribute game updates, where you're saving on bandwidth for thousands if not millions of players - which costs enough $$$ to actually be worth optimizing for. On the other hand, between simpler designs and content encryption sometimes being at odds with content compression, just sending the full 10MB is also common. For a VCS, I'd probably be happy enough to just have storage compression, using any of the standard tools on the combination.

> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

Actual changes to content in a gamedev studio are very unlikely to be as small as a single pixel. Changes to source code are unlikely to be as small as a single character either. And we definitely want a record of that 10MB.

We're willing to sacrifice some of our CI build history. Maybe only keeping ~weekly archives, or milestone/QAed builds after a while, of dozens or hundreds of GB - and eventually getting rid of some of the really old ones. Having an exact binary copy of a build a bug was reported against can be incredibly useful.


I usually see bsdiff or courgette cited as good tools for binary diffs:

http://www.daemonology.net/bsdiff/

https://www.chromium.org/developers/design-documents/softwar...
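Both are driven from the command line in one step each; a sketch with placeholder file names:

  bsdiff old_build.pak new_build.pak update.patch       # produce a compact binary delta
  bspatch old_build.pak rebuilt_new.pak update.patch    # apply the delta to reconstruct the new file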


"Having an exact binary copy of a build a bug was reported against can be incredibly useful."

Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?


> Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?

One person's immutable build artifact is another person's vendored build input.

It's common to vendor third party libraries by uploading their immutable build artifacts (.dll, .so, .a, .lib, etc.) into your VCS, handling distribution, and keeping track of which versions were used for any given build. It makes a lot of sense if those third party libraries are slow to build, rarely modified, and/or closed source - no sense wasting dev time forcing them to rebuild it all from scratch.

The next logical step is to have a build server auto-upload said immutable build artifacts into your VCS, for those third party libraries that you do have source code for, when your VCS copy of said source is modified. Much more secure and reproducible than having random devs do it.

And hey, if your build servers are already uploading build artifacts to VCS for third party libraries, why not do so for your own first party build artifacts too? Tools devs spending most of their time in C# probably don't need to spend hours rebuilding the accompanying C++ engine it interoperates with from scratch, for example, so why not "vendor" the engine to improve their iteration times?

This can lead to dozens of gigs of mostly identical immutable build artifacts reuploaded into your VCS several times per day, with QA testing and then integrating those build artifacts into other branches on top of that. The occasional 10MB png is no longer noticeable by comparison.


I can sympathize with the game assets argument, but this problem is just the result of trying to stuff a square peg into the round hole.

Build artifact caching is a different problem from source control, with very different requirements:

1. As you mentioned, the artifacts tend to get huge.

2. The cache needs to be easy to bypass. From your example, it needs to be easy for the C++ engine devs to do builds like "the game but with the new engine" to test out their changes.

3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.

4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).

Git either doesn't care about or fails spectacularly for each of those points. In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.

Nix[0] solves #2 and #3 by caching build artifacts (both locally and remotely[1][2][3]) based on code hashes and a dependency DAG (for each subproject or build artifact, so changing subproject X won't trigger a rebuild of unrelated subproject Y, but will rebuild Z that depends on X). It helps with #4 by performing all builds in an isolated sandbox.

#1 is solved by evicting old artifacts, which is safe as long as you trust #4. If the old artifact is needed again then it will be rebuilt for you transparently. Currently this is done by evicting the oldest artifacts first, but it could be an interesting project to add a cost/benefit bias here (how long did it take to build this artifact, vs the amount of space it consumes?).

[0]: https://builtwithnix.org/

[1]: https://nixos.wiki/wiki/Binary_Cache

[2]: https://nixos.org/nix/manual/#sec-sharing-packages

[3]: https://cachix.org/
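A rough sketch of what that looks like day to day, assuming the project has a default.nix and using a hypothetical Cachix cache name:

  nix-build                        # build into ./result, reusing any already-cached store paths
  cachix use mycache               # trust and configure a remote binary cache (name is made up)
  cachix push mycache ./result     # publish the artifact so teammates and CI skip the rebuild
  nix-store --gc                   # evict unreferenced artifacts from the local store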


Assets and code have mostly the same needs out of a version control system - diffs, history, control over versions, etc. - and there are version control systems which handle both adequately. That said, I'll grant git is quite focused on code version control specifically - and I would not dream of trying to scale assets into it directly.

> 1. As you mentioned, the artifacts tend to get huge.

This, admittedly, is more common with build artifacts. That said, I've hit quota limits with autogenerated binding code on crates.io: several hundred megs of code that cargo's packaging compresses down to double-digit megabytes, better than I could manage with 7-zip.

And that's a small single person hobby project, not a google monorepository.

> 2. The cache needs to be easy to bypass

I need to bypass locally vendored source code frequently as well, to test upstream patches etc.

> 3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.

Also entirely true of source code.

> 4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).

Enshrining built libs in VCS is an alternative tackling of the problem. You might not be able to reproduce that exact build bit-for-bit thanks to who knows what minor compiler updates have been forced upon you, but at least you'll have the immutable original to reproduce bugs against.

> In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.

It's already extremely common - in the name of build stability, including with git - to protect a branch from direct push, and have CI generate and delay committing a merge until it's verified the build goes green. By wonderful coincidence, this is also well after CI has finished building those artifacts - in fact, it's been running tests against those artifacts - so it can atomically commit the source merge + binaries of said source merge all at once. No delay between the two.

There are some caveats - gathering the binaries can be a pain for some CI systems, or perhaps your build farm is underfunded and can only reasonably build a subset of your build matrix before merging. Or perhaps the person setting it up didn't think it through and has set things up such that code reaches a branch that uses VCS libs before the built libs reach the same spot in VCS - I'll admit I've experienced that, and it's horrible.

Nix, Incredibuild, etc. are wonderful alternatives to tackle the problem from a different angle though.


Yeah, I get it. Still seems a stretch to fault git for "failing" to optimize for that inefficient-by-design use case though.

To be fair, I mostly don't fault git for failing to optimize that far, even if there are alternatives that do. That's far enough outside the core use case for myself and those I know that I'd be willing to sacrifice it for other, more important considerations.

But I'm totally willing to fault git for failing to optimize enough to handle the manual commit cadence of source game assets though. Because that's not just a tertiary use case - frequently for coworkers it's their primary use case. The end result is I mostly only use git for personal hobby stuff, where it's a secondary use case and my assets are sufficiently small as to not cause problems.


Right on. Thanks for clarifying -- and for confirming a legitimate complaint based on real-world, personal experience.

> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

Ideally, yes, why wouldn't I? I want to capture the exact state of the thing at each change.


I kind of phrased that poorly. I should have added the context of "in git". Saving a new 10MB file every time you change it, as per my original premise, is not something that git was really designed for. It's asking a screwdriver to do the work of a hammer.

I totally get the use case of saving each iteration of that 10MB file _somewhere_. But expecting git to do that job is not the right level of expectation, was my main point.

When I have worked with binaries like that described, I will place a URI reference to a file hash and have something that knows how to resolve it. A file store (think S3 or whatever) that has files named: texture1.dat-[sha1] and change the reference to the file in the source. e.g. a "poor man's" version control by way of file naming conventions. Does this approach work in your world?
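A minimal sketch of that convention, assuming a hypothetical S3 bucket named "assets":

  sha=$(sha1sum texture1.dat | cut -d' ' -f1)
  aws s3 cp texture1.dat "s3://assets/texture1.dat-$sha"
  echo "s3://assets/texture1.dat-$sha" > texture1.dat.ref   # commit only this tiny pointer file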


Aren’t game studios and other creative studios meant to use “asset management” systems instead for their large binaries?

Diffing a PSD as a binary is impossible - whereas proper asset management tools will deconstruct the PSD’s format to make for a human-readable diff (e.g. added/removed layers, properties, etc).


Separate version control for code vs assets leads to a world of pain. Also you can use whatever diff tool you want; doesn't have to be the built-in textual diff.

Yup, I experienced this too (not at a game studio though, and the team I worked with wasn't nearly experienced enough to write a layer over git). When we switched to a new version of the git GUI, it stopped working: clicking through the GUI triggered git operations that were supposed to run fast, but on our repo they didn't. I filed an issue that quickly got shot down with 'wontfix, your repo is too large and git is not for binary files'.

The usual technique of game studios is using Perforce. It is clunky and sometimes straight up infuriating, but it handles large files well.

Why did you not use Plastic SCM or Perforce?

I can't recommend Plastic enough - it's so fast and a great UI but super powerful. I've been using it at my company for years.

Has any studio open sourced what they've built? Or turned it into a product? It seems there could be an opportunity to do something before git solves the problems for that use case.

Yeah, git with binary files is not fun. Have you tried git annex?

What does your layer do?

> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I constantly run into git scalability issues as an individual. I don't use any of the UI clients because they all fail hard on mostly-code git repositories. I abandoned my VisualRust port in part because the mere 100MB of mingw binaries involved meant using GitHub LFS, which meant CI was hitting GitHub quota limits - and as I wasn't part of the organization, never mind an admin with billing rights, I couldn't even pay out of pocket to raise those limits even if I wanted to.

I'm not going to inflict git's command line experience - which confounds and confuses even seasoned programmers - on any of the less technical artists that might be employed at a typical gamedev shop, even if git might be able to scale acceptably if locally self-hosted at a single-digit employee shop.

A few dozen or hundred employees? Forget it. Use perforce, even though it costs $$$, is far from perfect, and also has plenty of scaling issues eventually.


The fact that you had a problem with github quotas isn’t really a problem with git though, is it?

The whole reason git lfs exists is to workaround git scalability problems. Its raison d'etre is problems with git.

That one of - if not the - most popular tool to solve said git scalability problems, also has scalability problems in practice, is both ironic - and absolutely a problem with the git ecosystem. To be pithy - "Even the workarounds don't work."

"Technically", you might say, "that specific symptom with git lfs, and that service provider, isn't the fault of git the command line tool, nor the git protocol". And you would be technically correct - which is the best kind of correct.

But I don't think we're referring to either of those particularly specific things with "Git" when we ask the article's question of "Is Git Irreplacable?". I'm already the weirdo for using git the command line tool - most of my peers use alternative git UI clients, and I don't mean gitk. The git protocol is routinely eschewed in favor of zips or tarballs over HTTPS, Dropbox, Sneakernet, you name it - and is invisible enough to not be worth complaining about to pretty much every developer who isn't actively working on the backend of a git client or server. Not to mention it's been extended/replaced with incremental improvements over the years already.

So I'm using a slightly broader definition of "git", inclusive of the wider ecosystem, that allows me to credit it for the alternative UI clients that do exist, rather than laughing off the question at face value - as something that has already been replaced.


Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.

Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data. You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.

None of this is a problem with git, be it GUI git clients or command line ones.

This isn’t just "technically correct". It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.


> Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.

All the commercial service providers recommend keeping total repository sizes <1GB or so, and I hear nothing but performance complaints and how much they miss perforce from those who foolishly exceed those limits, even when self hosting on solid hardware - which is 100% the fault, or at least limitation, of git - I believe you'll agree.

LFS is a suggested alternative by several commercial service providers, not just one, and seems to be one of the least horrible options with git. You're certainly not suggesting any better alternatives, and I really wish you would, because I would love for them to exist. This results in a second auth system on top of my regular git credentials, recentralization that defeats most of the point of using a DVCS in the first place, and requires a second set of parallel commands to learn, use, and remember. I got tired enough of explaining to others why you have a broken checkout when you clone an LFS repository before installing the LFS extension, that I wrote a FAQ entry somewhere that I could link people. If you don't think these are problems with "git", we must simply agree to disagree, for there will be no reconciling of viewpoints.

When I first hit the quota limits, I tried to setup caching. Failing that, I tried setting up a second LFS server and having CI pull blobs from that first when pulling simple incremental commits not touching said blobs. Details escape me this long after the fact - I might've tried to redirect LFS queries to gitlab? After a couple hours of failing to get anywhere with either despite combing through the docs and trying things that looked like they should've worked, then I tried to pay github more money - on top of my existing monthly subscription - as an ugly business-level kludge to solve a technical issue of using more bandwidth than should really have been necessary. When that too failed... now you want to pin the whole problem on github? I must disagree. We can't pin it on the CI provider either - I had trouble convincing git to use an alternative LFS server for globs when fetching upstream, even when testing locally.

I've tried gitlab. I've got a bitbucket account and plenty of tales of people trying to scale git on that. I've even got some Microsoft hosted git repositories somewhere. None of them magically scale well. In fact, so far in my experience, github has scaled the least poorly.

> Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data.

I pay github, and tried to pay github more, and still had trouble. Dispense with this "free storage" strawman.

> You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.

To be clear - I was also unable to pay to increase LFS storage on my fork, because they still counted against the original repository. Is this specific workaround for a workaround for a workaround failing, github's fault? Yes. When git and git lfs both failed to solve the problem, github also failed to solve the problem. Don't overgeneralize the one anecdote of a failed github-specific solution, out of a whole list of git problems, into it all being github's fault.

> None of this is a problem with git, be it GUI git clients or command line ones.

My git gui complaints are a separate issue, which I apparently shouldn't merely summarize for this discussion.

Clone https://github.com/rust-lang/rust and run your git GUI client of choice on it. git and gitk (ugly, buggy, and featureless though it may be) handle it OK. Source Tree hangs/pauses frequently enough that I uninstalled it, but not so frequently as to be completely unusable. I think I tried a half dozen other git UI clients, and they all repeatedly hung or showed progress bars for minutes at a time, without ever settling down, when doing basic local use involving local branches and local commits - not interacting with a remote. Presumably due to insufficient lazy evaluation or insufficient caching. And these problems were not unique to that repository either, and occurred on decent machines with an SSD for both the git UI install and the clone. These performance problems are 100% on those git gui clients. Right?

> This isn’t just "technically correct".

Then please share how to simply scale git in practice. Answers that include spending money are welcome. I haven't figured it out, and neither has anyone I know. You can awkwardly half-ass it by making a mess with git lfs. Or git annex. Or maybe the third party git lfs dropbox or git bittorrent stuff, if you're willing to install more unverified unreviewed never upstreamed random executables off the internet to maybe solve your problems. I remember using bittorrent over a decade ago for gigs/day of bandwidth, back when I had much less of it to spare.

> It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.

If it were one company not providing a specific commercial offering to solve a problem you'd have a point. No companies offering to solve my problem for git to my satisfaction, despite a few offering it for perforce, is what I'd call a git ecosystem problem.


No one is saying git doesn’t have problems. It's just weird that you keep on conflating issues with Github with issues with git.

I'm conflating at most one github specific issue (singular), not "issues". And I'm doing so because it's at best a subproblem of a subproblem of a subproblem.

If my computer caught fire and exploded due to poor electrical design, you wouldn't say "nothing about your problems had anything to do with your computer and everything to do with the specific company that provided your pencils" when, in my growing list of frustrations, I offhandedly mentioned breaking a pencil tip after resorting to that, what with the whole computer being unavailable and all. That would be weird.

Even if we did hyper focus on that pencil - pretty much every pencil manufacturer is giving me roughly the same product, and the fundamental problem of "pencils break if you grip them too hard" isn't company specific. It's more of a general problem with pencils.

Github gave me a hard quota error. Maybe Gitlab would just 500 on me, or soft throttle me to heck to the point where CI times out. Maybe Bitbucket's anti-abuse measures would have taken action and I'd have been required to contact customer support to explain and apologize to get unbanned. git lfs's fundamental problem of being difficult to configure to scale via caching or distribute via mirroring isn't company specific. It's more of a general problem with git lfs. Caching and mirroring are strategies nearly as old as the internet for distribution - git lfs should be better about using them.

It would've turned github's hard quota error into a non-event, non-issue, non-problem - just like they are with core git. Alternatively, core git should be better about scaling. Or, as a distant third alternative, I could suggest a business solution to a technical problem - GitHub should be better about letting me pay them to waste their bandwidth. Then I could workaround git's poor scaling for a little bit more, for a bit longer.


Not wanting to provide you with free storage is not a "scalability problem". I can't spend company money on Perforce either, is that a Perforce problem?

I pay for a github subscription. I set out to pay more for a github quota bump, but found I was limited by upstream's LFS quota rather than my fork's LFS quota.

> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I recollect that for Windows (which also uses git), MS have actually extended git with "Git Virtual File System" rather than replace it[1]. But I do agree that broadly, not everyone needs the scale.

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...


Scaling isn't even just about number of files or size of them. A problem I've hit is just in having cross-repo stuff work well. Monorepos are helpful partly because git submodules are not ideal for good workflows, and splitting stuff across multiple git repos can backfire (it doesn't help that almost all the tooling around CI and the like is repo-based instead of project based).

I would love a layer over Git to handle workflow issues related to multi-repo projects


Submodules have terrible UX and are completely counterintuitive. Subtrees work well for my small repos when I need to vendor something in.
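A minimal subtree vendoring sketch, with a hypothetical prefix and upstream URL:

  git subtree add --prefix vendor/lib https://example.com/lib.git main --squash    # vendor it in
  git subtree pull --prefix vendor/lib https://example.com/lib.git main --squash   # pull upstream updates later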

There is also git subrepo from the subtree author, worth checking out IMHO.

> I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I would say that the sole thing git was developed for, the Linux Kernel, is (starting to be) painful to work with when using git.


The Linux Kernel is big, but it's not likely as big (in terms of lines of code or pick your metric) as Google or Microsoft repositories. Maybe the kernel is just starting to feel that pain?

Honestly asking.. Do you speak from some level of authority that the Linux kernel is stretching the boundaries of git? Or are you just saying that more speculatively? What is the painful part?


Maybe I am a weirdo but I have always thought that git's UI is very intuitive (with some exceptions like submodules). SVN on the other hand was an unintuitive mess where I had to look up commands all the time.

Agreed - I had been looking around this thread like, "<slow blink> - surely I'm not the only one that finds git to be a rewarding exercise in teamwork?"

Fully agree.

The magic sweet spot might be the fact that most projects do not need to be distributed. That's where a lot of the complexity comes from.

So: drop all those extra concerns, add a more elegant UI (i.e. rational commands), and possibly something that scales a little better. That's enough mojo to unseat git for a lot of things.


I'd say the number of git repos on Earth that would encounter problems of that nature would be a vanishingly microscopic minority. Sure, it's a problem for those companies but it's not a problem for anyone else.

Vanishingly small in number, but quite significant in terms of the number of developers working in them.

All of the organizations that have outgrown git will have such incredibly specific requirements meaning nothing but a custom built tool will work for them.

The so called problem would also vanish if the monorepo was modularized and broken up into smaller repos.

That's probably a harder task for an existing monorepo that is too big for git than writing a replacement for git that works with a repo of that size.

The problem boils down to refactoring a large monolith. I feel like Git is a scapegoat for a much larger problem.

Let's say you started with a well factored set of code that is managed within your organization. What advantage is there to having multiple repos if you're not limited by your tools? Refactoring is easier within a single repo...

In my experience, code doesn't stay well factored unless there are technical hurdles that keep it so. That of course doesn't have to be a repo boundary, but it can be.

There probably will be a plethora of different hard issues to fix in such situations. It's also easier to institute change in a dictatorship as opposed to a democracy (being a dictator that is :).

Objectively not true. Many (most?) are just using Perforce in an off-the-shelf configuration.

This reads to me as a failure of imagination. Any mid size game development shop is going to feel this pain - not just giants like Microsoft and Google. I believe the Unity Game Engine has a user base in the millions? Even a subset of that may be small in comparison to the entire developer population but by no means microscopic.

Mercurial is probably that competitor. Only slightly slower than Git. Works on very large monorepos (as large as Facebook's or Google's monorepo). Very similar workflow as compared to Git, with some minor differences in terminology.

As an FB employee, I use hg regularly (because it is required). I would not use it as a git replacement for non-FB-sized repos. It has some weird design choices (e.g. branching is bad), and it very often requires manual intervention for merges that git performs correctly and automatically.

You can get around branching-is-bad by changing your workflows a bit, but you can't get around the bad merges: over time it's like death by a thousand papercuts.


What is so bad about mercurial branching? The underlying structure is the same as git: a directed acyclic graph, the only real difference is how branches are named.

Mercurial has 3 ways of doing branching:

- bookmarks: these are like git branches, a pointer to a revision

- branches: when you are in a branch, all commits are permanently affixed with that branch name. Less flexible than bookmarks (and therefore git branches) but good for traceability

- heads: unlike with git, a branch name can refer to several actual branches, it usually happens when you are pulling from a central repository, but you can create them yourself if you need some kind of anonymous branching. These can be pushed but it is not recommended.

Git only has the first option.
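In command form, a minimal sketch of the three (names are placeholders):

  hg bookmark feature-x      # a movable pointer; the closest analogue to a git branch
  hg branch stable           # commits made from here on are permanently stamped "stable"
  hg heads                   # list all heads, including anonymous ones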

The way central repositories are managed is also a bit different even if the fundamentals are the same. Git has the "origin" namespace to distinguish remote branches from local branches. Mercurial uses a "phase" which can be "public" (in remote), "draft" (local only, will become "public" after a push) and "secret" (like "draft", but will not be pushed and therefore will not become "public"). So if you are not synchronized with the remote, in git you will have two branches: origin/my_branch and my_branch; in mercurial, you will have two branches named my_branch, one public, one draft. That's essentially the same thing, presented differently.

In the end, they are fundamentally the same. The feel is different though. Git is flexible, and gives you plenty of tools to keep things nice and clean when working with large, distributed project. As expected for something designed for the Linux kernel. Mercurial focuses on preserving history, including the history of your mistakes, and I feel it is better suited for managed teams than a loosely connected community.


What kind of merges does git handle that hg doesn't? If it's just a matter of figuring out what goes where, someone that uses hg daily could port the implementation over from git. It could be a big organization that uses it daily, for instance.

Paper cuts can be addressed with more users reporting bugs and contributing fixes. The fundamental design issues with git that prevent scalability cannot.

As a Google employee I use hg every day, even though it's not required. (Some teams at Google do mandate its use, but these are few and far between.) I don't use branches, but I use bookmarks. I didn't notice any merges that really ought to be performed automatically but were not; in any case I use Meld to resolve merge conflicts and it's easy enough to do occasionally.


You can always strip a bad committed merge, abort a bad uncommitted one, and perform it again (maybe with different tooling).

Normally mercurial stops when there are conflicts it cannot resolve reliably. In those cases, have a try at kdiff3: it handles hairy merges quite well. In a lot of cases even automatically (and correctly).

There is always meld, but I'd say kdiff3 is superior wrt merge conflict resolution.


> ...weird design choices (e.g. branching is bad)

What bothers you in particular?


The branching in hg is actually way better than with git. The reason you are confused is probably because you learned the wrong (git) way of branches.

If you want the wacky and unreliable git branching you can use hg bookmarks.


For example: https://stackoverflow.com/questions/36358265/when-does-git-r...

This page has been viewed 230 thousand (!!) times. Because git is so easy and elegant that it lies to you about what branches exist on the remote.

It is not even funny any more how bad this is.


> Only slightly slower than Git

That's interesting. In your examples isn't it fast because monorepos are network-based, as in, you only fetch what you need when you need it?

Also reminded me of discussions around CPython's startup time and how one use case where milliseconds matter is in small cli utilities such as Mercurial.


What I mean is daily operations on the repo like viewing a diff, committing, amending, checking out a different commit, etc. Without doing precise measurements, I would tend to think that it's mostly caused by the slowness of CPython, as compared to a C executable (Git).

The entire repo is stored on a networked file system, so essentially every file operation is remote. That isn't where most of the slowness comes from, though: on the same setup, operations were noticeably faster when I wasn't using hg.


"Barney Oliver was a good man. He wrote a letter one time to the IEEE. At that time the official shelf space at Bell Labs was so much and the height of the IEEE Proceedings at that time was larger; and since you couldn't change the size of the official shelf space he wrote this letter to the IEEE Publication person saying, since so many IEEE members were at Bell Labs and since the official space was so high the journal size should be changed."

- http://www.paulgraham.com/hamming.html


What is the analogy here? The first guess that came to my mind was the monorepo versus multi-repo debate: since Git can only support repos that are so large (shelf space) without getting slow, you should split up your repos (journal size) even if semantically you would prefer a monorepo. But that would support the point I was making, whereas the obliqueness of your reply makes me think you probably meant to criticize it.

I think he's comparing the journal to the tool (harder to change, impacts everyone) and the shelf to the problem that only impacts a few organizations but actually a lot of people because those organizations are so large.

I guess that makes sense. But if that's the analogy, there's a significant difference between the situations. In that example, there was nothing inherently wrong with the journal's size, other than it not matching Bell Labs' arbitrary choice of shelf layout. Git, on the other hand, would be inherently a better tool if it had better performance on large repos (without sacrificing its suitability for small repos).

Git scales well enough for almost everyone (especially if you have a little discipline with what you put in the repo).

It’s only huge megacorps that need larger scale things like GVFS.

As for large files, that is not what Git is for. Git is for source code. Much like how you don’t put large files in your RDBMS, you should not be putting them in your SCM either.


What if you need to version them? Git imposes a very specific versioning model: version is a property of the entire repository. Thus, not including some file in the repo implies that it's not versioned in the same manner. It's not just a function of binary vs source.

Versioning big binary blobs is not what Git was designed for. It’ll do fine with smaller assets like icons and the like, but its data model is based on everyone using the repo having a local copy of the full repository history. You can’t easily purge old data. That scales poorly if you want to use it for audio/video files or other large data sets.

You can still do it if you want, but you might be better served using https://git-lfs.github.com/ or using another system designed for that purpose.


Honestly, you can use Git for large files with lfs. I wouldn't say I love this approach, but it isn't that bad now. You do have to make room for yet-another-tool, and you now have centralized version control commingling with your distributed tool (essentially making it central); but you can still use everything you love about git, and if your lfs content doesn't change, you don't need to be connected to a server. It certainly feels pretty absurd. This isn't even a problem in SVN, but now we're tacking on another tool that you have to learn and that introduces its own issues.

The only way I've ever successfully `git clone`d my work repo is from another locally connected device. Even with shallow and then gradually unshallowing it, it will not generally complete before the internet falls over.
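(For reference, the shallow dance I mean looks roughly like this, with a placeholder URL:)

  git clone --depth 1 https://example.com/big-repo.git    # fetch only the tip commit
  git fetch --deepen=500                                   # widen the history incrementally
  git fetch --unshallow                                    # eventually convert to a full clone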

Nowadays, a new computer means a git clone (or just plain copy-paste) of a USB stick from the old one. This seems like it's a single feature of git that could be written, but if you told me "there's something that works better for large, twenty year old repos", I'd probably take that.

I don't know how Linux survives, but maybe it's just that you only rarely git clone your large repos. (Or maybe it's just that intercontinental internet is less reliable than intracontinental, so that if you're in the US it's a non issue.)


I'm guessing your work repo has lots of large binary files in its history?

> Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction.

Could you please provide a link to it? I’m very interested in seeing this command, but ironically it’s not a name that’s easy to google for.

Edit: I was very wrong, searching for “google repo command” displayed https://gerrit.googlesource.com/git-repo as the very first result.


I believe there will be no scalable open-source VCS because the incentives are not there. While the technical problem is interesting, I decided not to work on it because of this. http://beza1e1.tuxen.de/monorepo_vcs.html

> I worry that Git might be the last mass-market DVCS within my lifetime.

The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings. Git is simple and elegant, though its interface might not be.


I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.

For example, a typical question on Stack Overflow is "How do I find out which branch this branch was created from?". It always has 10 smug answers saying "You can't, because git doesn't really track that; branches are just references to commits, and what about a) a detached head? b) what if you based it off an intermediate branch and that branch is deleted? c) what if...

5 more answers go on to say "just use this alias!" [answer continues with a 200 character zsh alias that anyone on windows, the most common desktop OS, has no idea what to do with].

I don't want to write aliases. I usually don't want to consider the edge cases. If I have 2 long lived branches version-1.0 and master. I want to know whether my feature branch is based on master or version-1.0 and it's an absolute shitshow. Yes it's possible, but is it simple? Is it elegant? No.
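(For what it's worth, the usual heuristic looks something like this - branch names as in the example, with my-feature standing in for the feature branch - which rather proves the point about it not being simple:)

  git merge-base my-feature master        # last common ancestor with master
  git merge-base my-feature version-1.0   # last common ancestor with version-1.0
  # whichever ancestor is the newer commit is usually the branch you forked from;
  # git merge-base --is-ancestor <a> <b> can help compare the two candidates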

The 80/20 (or 99/1) use case is

- centralized workflow.

- "blessed" branches like master and long lived feature branches that should ALWAYS show up as more important in hisory graphs.

- short lived branches like feature branches that should always show up as side tracks in history graphs.

Try to explain to an svn user why the git history for master looks like a zigzag spiderweb just because you merged a few times between master and a few feature branches. Not a single tool I know does a nice straight (svn style swimlane) history graph because it doesn't consider branch importance, when it should be pretty simple to implement simply by configuring what set of branches are "important".
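(The nearest thing git itself offers is following only the first parent of each merge, which straightens out master but still knows nothing about branch importance:)

  git log --graph --oneline --first-parent master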


As a very basic git user, about once a month my local git repository will get into a state I cannot fix. I cannot revert, cannot reset, cannot make it just fucking be the same as origin/master. Usually I accidentally committed to local master and then did a couple other things and it's just easier to blat and re-clone than work out how to resolve.

Git is hard for idiots imo, and there are a lot of us


> Usually I accidentally committed to local master and then did a couple other things

Create a new branch and check it out while you are on the last commit (git checkout -b my-branch), delete the local master branch (git branch -D master), and recreate it from the remote (git fetch origin && git checkout master). You'll end up with a local branch with a bunch of commits that you can merge, rebase or cherry-pick, depending on what you want.

If you want to learn more about git in a practical way, there's an awesome book called Git Recipes.


That is not easier than "blat and re-clone" so I think you're proving their point.

Recloning means redoing your work on top of a potentially different codebase. The approach I described is how you "blat and reclone" using git instead of the filesystem, and it has the clear advantage of keeping everything in the same repo. You can then mix all the code together in whatever way you prefer.

Git is a very flexible tool that allows for individual local workflows independent of how teams collaborate. Finding a personal workflow that works for you is a little investment that pays huge dividends for a long time. Git is a 15 year old tool that is expected to live for 10-30 years more at the very least. I encourage everyone to learn enough Git to not be afraid of it.


If you’re just going to blat and reclone, then you may as well use a folder system instead of git.

I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.


I did not read brigandish's comment as advocacy of 'blat and reclone', I took it to be a comment on git's awkwardness.

I have considerable sympathy for the RTFM reply, but I do not think it is the last word that shuts down any question of usability. What seems clear to me is that there are a lot of people using git who probably should not be. In many cases, they do not have a choice, but I also suspect that many of the organizations that have chosen git do not have the issues that it is optimized for.


> I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.

In my opinion, solving problems and making improvements involves reducing complexity, not defending it. Many people, including myself, have read the Git docs and learnt about the underlying data structures etc etc and still we can make the claim that it could be better, in numerous ways.

Calling everyone feckless won't invalidate that.


> we can make the claim that it could be better

I’m not disputing this. Of course git isn’t perfect.

What I’m against is changing git to cater to people who can’t read the manual and make basic mistakes.


> What I’m against is changing git to cater to people who can’t read the manual and make basic mistakes.

Why? Isn't software that doesn't require reading a manual and doesn't let the user make irreversible mistakes considered good design?


Not if it means reducing capabilities of the program in order to add bumper guards.

I can’t think of any software that handles a complex problem that doesn’t have a manual, documentation like a manual, or a learning curve. Git is a tool for developers, not casual users who want typical apps.

Again, you wouldn’t make an argument like this for a tool used by a plumber or a mechanic. If a tool succinctly handles a problem, good! But using tools is part of the profession; they have learning curves.

Most issues with git are PEBKAC issues because people refuse to spend 10 minutes of their life reading about a tool they may use for hundreds or thousands of hours. I wouldn’t want to cater to those kinds of people.


Software can cater to multiple types of uses at the same time. You can have a learn-as-you-go experience while keeping your powerful tools that enable more fine-tuned or complex tasks. Easy-to-use vs. powerful is a false dichotomy.

About the plumbing/mechanic analogy, I totally would make the same case! Hammers and wrenches don't require a manual and can be used for very complex tasks, and that's exactly what makes them so well designed and popular. Few people want their hammer to have more features, and if they do, they still want to keep the good old hammer ready, because it's so easy and simple to use.

Especially calling out PEBKAC (Problem Exists Between Keyboard And Computer) - while even most of the expert git users, including the author himself say the interface could at least be made much better - makes me really suspicious that you simply like feeling superior to other people because you know something they don't, and you don't want to lose your "edge" if suddenly everyone can use version control without resorting to manuals.


> Easy-to-use vs. powerful is a false dichotomy.

iMovie vs Premiere/Final Cut. Final Cut X vs 7. Garageband vs Pro Tools. Word vs LaTeX. and so on. It's very difficult to design interfaces that are easy enough for average users that don't impede pros/power users.

> hammer

A hammer isn't a good comparison. Something like a multimeter is what I was thinking of, etc. Git solves a significantly more complex problem than either of these, though.

> including the author himself say the interface could at least be made much better

I don't disagree! Git's interface -could- be better. That has nothing to do with my points above with regards to people refusing to read basic literature about the tools they use, expecting them to just magically do everything for them out of the box, "intuitively".

> feeling superior ... you don't want to lose your "edge"

This could not be further from the truth. I simply have no sympathy for people who refuse to read the manual or an intro to using a tool, and then complain about the tool being hard to use. Yeah.. it's hard because you didn't do any reading! Git is actually really easy if you read about the model that it uses. Most people don't need to venture out beyond ~5-6 subcommands, and even then it's easy to learn new subcommands like cherrypick, rebase, etc.

Adobe Photoshop, as another example, has a learning curve, but that tool is indispensable for professionally working on / editing images. (GIMP is also good, but that's not in the scope of this discussion). A lot of beginner issues are basically PEBKAC because they didn't read the manual. Same with Pro Tools, or probably any other software used by industry professionals. They're harder to use but what you can do with them (since they treat you like an adult, instead of holding your hand and limiting you) is incomparable to the output of apps designed for casual users.


What can Git do that Fossil or Mercurial can't?

Deleting master is a thing that would _actually_ never have occurred to me! A neat trick

That being an example that I remember from late last year, there are just sharp edges to git I end up catching myself on :)


The git "master" branch is just an example of 'convention over configuration': some commands use it as the default argument, just like "origin" is the default remote name. Nothing in git is special or sacred! :)

Is there a reason for -D and not -d? Wouldn't -D also delete the remote branch if you accidentally pushed your changes?

"git branch -d" doesn't remove a branch if that means losing track of some local commits. "-D" doesn't check that. The "git branch" commands only operate on your local repo, they don't push any changes to remote repos and they don't pull commits from anywhere.

`-D` doesn't involve the remote at all (unless you are using something like the intentional remote:branch syntax which this example isn't). It is a force delete in that if there were commits locally in that branch and only in that branch it should still delete that branch. It should be unlikely you need that force because the first step was to branch everything as is, so it is safer to just use `-d`, but if the intention is to "blat" it from orbit anyway, `-D` is that.

The workflow that I have found works best is to just not change anything myself, I let other people do all the work. This way I can be sure that I won't get merge conflicts that I can't fix.

I wish git had a “metahistory” feature to allow everyone to undo anything. A `git revert` isn’t of any help when you’ve already merged and pushed.

It's called "git reflog": https://www.edureka.co/blog/git-reflog/

Though you do have to have committed. One of the things I hammer on in my tutorials for work is that if you get confused in git, make sure you commit. If you commit, you can take your problem to the other engineers and we can almost certainly get you straightened away. Fail to commit, though, and you really may lose something.

Also, metapoint about git: While I won't deny its UI carries some dubious decisions over from the very first design, in 2020, basically, if you think "Git really ought to be able to do [this sensible thing]", it can. It has that characteristic that open source software that has been worked on by a ton of contributors has, which is that almost anything you could want to do was probably encountered by somebody else and solved five years ago. It just may take some searching around to figure out what that is. (And on the flip side, when you read the git man pages and are going "Why the hell is that in there?", the answer may well be "a problem that you're going to have in six months".)


> make sure you commit

This is not absolute gospel. If you screw up a rebase and commit, whatever you removed in the rebase is simply gone.


Are you saying that with knowledge of what "git reflog" is? I suspect not. I'd really need to see a sequence of commands that removes committed state from the repo to buy this. If you try to produce it, bear in mind the first thing I'm going to do is run "git reflog" on the result, so if you find your committed state is still there, then I'm going to say it's still saved.

(That's not a git thing. I don't really even want some sort of hypothetical source control system that literally tracks every change I make. It's technically conceivable and should be practical to what would at least be considered a "medium sized" project in the git world, but I'd just be buried in the literally thousands of "commits" I'd be producing an hour. Failing that sort of feature, a source control system can't help but "lose" things not actually put into it.)


As I am not familiar with the details of reflog (I don't recall ever using it) I took a look at the article. It wasn't long until I reached what looks like a caveat: "This command has to be executed in the repository that had the lost branch. If you consider the remote repository situation, then you have to execute the reflog command on the developer’s machine who had the branch." Joe, who works on another continent, quit last week and his computer was a VM...

OK, so we have backups of his VM and we can recreate a clone of it, but will that be satisfactory? Are there any issues with hardware MAC addresses or CPU ids? How far down the rabbit hole of git minutiae do you have to go before you are confident that you can do all basic source-control operations safely?


No source control system can solve the problem of not having things it was never given.

git log --reflog --all

is more important imo than

git reflog


The main thing that people fail to understand is that commits are immutable and the overall commit graph is immutable (with the caveat that pathways in the graph that don't end in a branch head are subject to garbage collection).

A rebase does not destroy information. It creates new commits and moves the branch head to a different spot on the graph.

The reason git is seen as painful is because you can't claim expertise until you develop the ability to form a mental map of the graph. But once you do this the lights turn on and everything starts to make sense.

This is why the mantra "commit early and often" still holds. The more experienced git user will tell the newer people this, so when they come with a mess it will always be recoverable.


That GC is a pretty big caveat! Combined with the fact that unreferenced objects are never pushed.

reflog is like undelete in filesystems, it's a probabilistic accident recovery mechanism for an individual computer (repo checkout in this case) that you can try to use if you don't have backups.


In any case, git garbage collection isn't a common phenomenon. It usually triggers every few weeks, even in repos with high activity. The chance of hitting a GC that deletes an unreferenced commit you need is extremely small.

How would it be gone? You can't rebase onto a dirty working directory so you can't blow away uncommitted changes accidentally and any state prior to and during the rebase is always recoverable via the reflog.

You have to actively try to do anything that's not easily recoverable, as long as you commit before you start messing around and don't rm -rf .git.


It's definitely not gone. Just `git checkout ORIG_HEAD` and you'll be back to whatever the tree was before you rebased.

NO. Nothing is gone after commit. You just don't see it.

Run git log --reflog --all and you will see all the commits you made (or rebase made) in the last 3 months.

You can then resuscitate an old branch by simply putting a branch name on it with git branch newname <old sha1>


It won't be gone until you actively prune dangling commits. The commit may not be reachable from the HEADs of existing branches, but go through the reflog and your commits will be there. You can then create a branch to make the commit reachable.

Does this not show up in the reflog?

You have a branch named <branchname> you do your operation that fouls up your branch. If you check with

git log --reflog --all

you will see with this magical command that git DOESN'T REMOVE any commit. Your old tree is still there, only normally hidden. The commit that was there before you fouled up your branch is still there.

You now only need to set your branch to the old commit. A branch is nothing else than a pointer in the tree.

You have 2 possibilities to change the commit a branch points to

1. git branch --force <branchname> SHA1 (works only if <branchname> is not the currently checked-out branch. Simply checking out the SHA1 also works, as it detaches the HEAD).

2. replace the SHA1 in the text file .git/refs/heads/<branchname> with the SHA1 you want the branch to point to.

With that, your repo is in the same state it was before your error.
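
To put the whole recovery together, a minimal sketch (with <branchname> and <sha1> as placeholders for your own values):

  git log --reflog --all --oneline         # find the commit the branch used to point at
  git branch --force <branchname> <sha1>   # move the branch back (refused if it's checked out)
  git reset --hard <sha1>                  # ...or, if it IS the current branch, move it this way

Nothing here is special; it's just the two options above expressed as commands.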


Harkening back to my early days of git, I have a rough guess as to what you can do to fix that. If all you want is remote master,

  git stash
  git checkout master
  git stash
  git fetch origin
  git reset --hard origin/master
and maybe a 'git rm -r --cached .' in case you have staged files you didn't intend to stage and which stash failed to drop.

With the amount of information available, there is no excuse:

* https://rogerdudler.github.io/git-guide/

* https://git-scm.com/book/en/v2

Also see stackoverflow.

Git is a complex tool because it’s tackling a complex problem. I don’t see a way of making it “easier” without massively reducing what it can do. It’s like saying we should reduce a formula one car so people can use it without reading up on it, etc.

If something happens once, it happens. If something happens multiple times then it means you’re not evaluating why it occurred in the first place and learning from it. No tool in the world can solve this problem because it’s not a problem with the tool, rather the user.

Git is really not so hard, but it requires a little reading.


Git isn’t something which you can generally be successful using in a shallow way. Most developers will need to devote significant time and energy to mastering it. There really needs to be a better layer on top of it in order to make it easier for developers to figure out how to do what they want to do. Some of the commands and switches don’t seem to be orthogonal and/or intuitive.

Git already has layers on top of it like git porcelain or the various GUI tools that attempt to handle things smoothly.

> significant time and energy

All someone needs is to read through https://rogerdudler.github.io/git-guide/, and learn a few commands.

Are we seriously going to refer to “reading the manual” as “significant time and energy”? In this case you don’t even have to read the manual, just a primer on how git works. You know, on how the tool that you’re using works. Why are people so allergic to spending even a modicum of time on learning a tool that massively simplifies their life and makes their work possible?

Do plumbers complain about having to read manuals for the equipment that they use? Electricians?

As programmers our tools are easier to learn and use, yet we complain about having to do any work at all.

Why even be a programmer? If reading about git is so hard, what about the rest of the field that doesn’t even have documentation?

How about we don’t make tools that cater to the lowest common denominator, in this case people who basically can’t be assed to do anything? RTFM.


Because a lot of us have used tools besides git that enable the workflow we need without that complexity, and without the fragility that often necessitates going to stackoverflow or asking on a slack channel.

I have a way of picking the losing side so I've been using mercurial for everything until now, and until now Bitbucket offered hg. They're decommissioning it so I'm moving over to git and I feel like my workflow has been hampered, not just in the immediate complexity of learning the new tool, but in the ongoing complexity of using a less good tool for my needs.

I'm dealing with it, but the situation you're describing isn't really the one that I and a lot of other whiners are dealing with.


> Because a lot of us have used tools besides git that enable the workflow we need without that complexity

I've spent ages unfucking local svn working copies and long-running branches on both Windows and Linux. git needs some serious flaws to keep up with that experience.


> Because a lot of us have used tools besides git that enable the workflow we need

Thankfully the standard DVCS is flexible enough to enable the workflows others need too.


A lot of people get by just staging and pushing/pulling commits, myself included. That’s 3 commands, 4 if you count git status. You do not need to dig deep to get a lot of use out of git as a basic remote sync.
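
For the curious, that everyday loop is roughly (the commit message is obviously a placeholder):

  git status                    # see what changed
  git add -A                    # stage it
  git commit -m "what I did"    # commit it
  git pull && git push          # sync with the remote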

> Git is a complex tool because it’s tackling a complex problem.

Fossil tackles much the same sort of problem, yet it's far simpler to use.

Most of Git's problems are due to purposeful choices, but they're design choices, not inherent aspects of how a DVCS must behave.

We've laid out our case for the differences here: https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...


> their thing: Sprawling, incoherent, and inefficient

> our thing: Self-contained and efficient

This is not biased in any way and makes me want to continue reading. /s

Also, you can’t claim something to be “efficient” when it’s doing many different things like scm, issues/tickets, a web forum/ui ....

Then you have non-issues like git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).

And then you take Gitlab and conflate Gitlab’s issues with problems with Git. I guess gogs/gitea don’t exist?

This page needs to be rewritten to simply list the differences in neutral language. There are good points but they’re lost in unnecessary epithets like “caused untold grief for git users”. I get it: git bad, our product good. Switch!

Personally, I don’t want something that tries to do many different things all at once.


> This is not biased in any way

Of course we're biased, but every row in that table corresponds to a section below where we lay out our argument for the few words up in the table at the top.

Here's the direct link for that particular point:

https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...

Now, if you want to debate section 2.2 on its merits, we can get into that.

> you can’t claim something to be “efficient” when it’s doing many different things

We can when all of that is in a single binary that's 4.4 MiB, as mine here is.

A Git installation is much larger, particularly if you count its external dependencies, yet it does less. That's what we mean when we say Git is "inefficient."

But I don't really want to re-hash the argument here. We laid it out for you already, past the point where you stopped reading.

> git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).

It is on Windows, where they had to package 44-ish megs of stuff in order to get Git to run there.

On POSIX platforms, the package manager isn't much help when you want to run your DVCS server in a chroot or jail. The more dependencies there are, the more you have to manually package up yourself.

If your answer to that is "just" install a Docker container or whatever, you're kind of missing the original point. `/home/repo/bin/fossil` chroots itself and is self-contained within that container. (Modulo a few minor platform details like /dev/null and /dev/urandom.)

> This page needs to be rewritten to simply list the differences in neutral language.

We accept patches, and we have an active discussion forum. Propose alternate language, and we'll consider it.

> unnecessary epithets like “caused untold grief for git users”

You don't have to go searching very hard to find those stories of woe. They're so common XKCD has satirized them. We think the characterizations are justified, but again, if you think they're an over-reach, propose alternate language.

> I don’t want something that tries to do many different things all at once.

Not a GitHub user, then?


Hey, are you a fossil dev?

I haven't used nor looked at fossil in maybe 5 years, but had a couple of questions.

Does fossil now have any kind of email support built in to the ticket manager? I remember when I tried to use fossil for actual production use, there was no way to trigger emails sent when, e.g. tickets were submitted, and one of the devs said to just write a script to monitor the fossil rss feed and send the appropriate email, which seemed like a baroque and fragile (and time-consuming) solution.

And is any more of the command-line behavior configurable (like the mv/rm behavior -- affecting the file on disk as well as the repository, or just marking the file as (re)moved in the repository)?


> Hey, are you a fossil dev?

I have commit access, yes, but mainly I work on the docs.

> Does fossil now have any kind of email support built in to the ticket manager?

Yes. It was added in support of the forum feature last year, but it also applies to several other event types: https://fossil-scm.org/fossil/doc/trunk/www/alerts.md

> one of the devs said to just write a script to monitor the fossil rss feed

Probably me. :)

> seemed like a baroque and fragile (and time-consuming) solution.

A dozen lines of Perl; easy-peasy. That and a pile of CPAN modules, but that's easily fetched with `cpanm`.

> the mv/rm behavior -- affecting the file on disk as well as the repository

The default you're referring to was changed a few years ago: the old `--hard` option is now the default.


By the way, the "one checkout per repository" is not strictly true. You can use "git worktree"; this is a lightweight way to reuse an existing git repository and have each worktree use a different branch. It's a nice feature, and I use it daily.

Also, a comment about the argumentation in "test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow. Certainly, one can push untested stuff to the remote server by mistake; but, even so, this should be OK, because if one can push directly to important branches like master or similar without going through any reviews and other sanity checks, one has a problem... and the problem isn't really Git :)


> By the way, the "one checkout per repository" is not strictly true.

You must be referring to just the table at the top, not to the detailed argument below, which mentions git-worktree and then points you to a web search that gives a bunch of blog articles, Q&A posts, project issue reports and such talking about the problems that come from using that feature of Git.

I suspect this is because git-worktree is a relatively recent feature of Git (2.5?) so most tutorials aren't written to assume use of it, so most tools don't focus on making it work well, so bugs and weaknesses with it don't get addressed.

Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.

> "test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow.

That just brings you back to the problems you buy when separating commit from push, which we cover elsewhere in that doc, primarily here: https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...


> Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.

That's unfortunate. Reading the comments here, switch-branch-in-place is seen as some kind of flaw, but I don't think I would voluntarily use a VCS that doesn't let me easily do that (it's the most sensible way for me to work, from way before Git was a thing).


> switch-branch-in-place is seen as some kind of flaw

You're conflating two separate concepts:

1. Git's default of commingled repo and working/checkout directory

2. Switch-in-place workflow encouraged by #1

Fossil doesn't do #1, but that doesn't prevent switch-in-place or even discourage it. The hard separation of repo and checkout in Fossil merely encourages multiple separate long-lived checkouts.

A common example is having one checkout directory for the active development branch (e.g. "trunk" or "master") and one for the latest stable release version of the software. A customer calls while you're working on new features, and their problem doesn't replicate with the development version, so you switch to the release checkout to reproduce the problem they're having against the latest stable code. When the call ends, you "cd -" to get back to work on the development branch, having confirmed that the fix is already done and will appear in the next release.

Another example is having one checkout for a feature development branch you're working on solo and one for the team's main development branch. You start work from the team's working branch, realize you need a feature branch to avoid disturbing the rest of the team, so you check your initial work in on that branch, open a checkout of that new branch in a separate directory and continue work there so you can switch back to the team's working branch with a quick cd if something comes up. Another team member might send you a message about a change needed on the main working branch that you're best suited to handle: you don't want to disturb your personal feature branch with the work by switching that checkout in place to the other branch, so you cd over to the team branch checkout, do the work there, cd back, and probably merge the fix up into your feature branch so you can work with the fix in place there, too.

These are just two common reasons why it can be useful to have multiple long-lived checkouts which you switch among with "cd" rather than invalidate build artifacts multiple times in a workday when switching versions.
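
A rough sketch of that layout (URL, paths, and branch names are made up):

  fossil clone https://example.org/project ~/museum/project.fossil
  mkdir -p ~/work/trunk ~/work/release
  cd ~/work/trunk   && fossil open ~/museum/project.fossil trunk
  cd ~/work/release && fossil open ~/museum/project.fossil release

After that, "switching branches" is just a cd between the two directories.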

Git can give you multiple long-lived working checkouts via git-worktree, but according to the Internets it has several well-known problems. Not being a daily Git user, I'm not able to tell you whether this is still true, just that it apparently has been true up to some point in the past.

Since no one is telling me those issues with git-worktree are all now fixed, it remains a valid point of comparison in the fossil-v-git article.


If what you're trying to tell me is that I can use switch-in-place workflow in Fossil as easily as in Git then that's cool. Thumbs up!

Edit: It could be worth it to emphasize this in the Fossil vs. Git comparison that you linked, as it wasn't very clear to me after reading it.


It's updated now, here: https://www.fossil-scm.org/fossil/doc/trunk/www/fossil-v-git...

Thanks for the feedback!


> You must be referring to just the table at the top, not to the detailed argument below

Well, not entirely, because in my opinion the detailed argument kind of hand-waves away the entire git worktree. Continuously switching branches inside a single large Git repo is certainly a suboptimal way to work with Git, but most of the time one should be able to avoid that with the worktree (though the worktree stuff is, of course, not a miracle cure for everything).


You’re not addressing the list of problems resulting from the use of this tacked-on feature.

Also, Git continues to be taught with the switch-in-place method by default.

I’m not saying it is impossible to get a Fossil-like workflow with Git, just that there are consequences from that not being the default.


> It’s like saying we should reduce a formula one car so people can use it without reading up on it, etc

A far better analogy is:

It's like saying we should reduce a programming language so people can use it without reading up on it, etc.


Sorry, but this is a terrible analogy.

The core problem with it is that very few people can get paid more by being better at using their [D]VCS, whereas those more skilled with their programming language(s) of choice often do get paid more to wield that knowledge.

Consequently, most people do not fully master their version control system to the same level that they do with their programming language, their text editor, etc.

To be specific, there are many more C++ wizards and Vim wizards than there are Git wizards.

In situations like this, I prefer a tool that lets me pick it up quickly, use it easily, and then put it back down again without having to think too much about it.

You see this pattern over and over in software. It is why all OSes now have some sort of Control Panel / Settings app, even if all it does is call down to some low-level tool that modifies a registry setting, XML file, or whatever, which you could edit by hand if you wanted to. These tools exist even for geeky OSes like Linux because driving the OS is usually not the end user's goal, it is to do something productive atop that OS.

[D]VCSes are at this same level of infrastructure: something to use and then get past ASAP, so you can go be productive.


You could always do:

  git reflog
to see when you last checked out master, then:

  git reset <commit hash> --soft
to reset to the commits at that point (but keep your files the same)

  git add . && git stash save
to stash your changes (not 100% sure you actually need the "git add ." part)

  git pull
in master

and then finally:

  git stash pop
and commit again as needed.

> Git is hard for idiots imo, and there are a lot of us

Yes, thus https://xkcd.com/1597/

I find myself saying "git reset --hard origin/" and such with disturbing frequency.

Both are examples of "I give up; it's faster to start over." This is not what I want in a DVCS.


Git push/pull -f

Then write a little merge message and you are good to go.


> Git is hard for idiots imo, and there are a lot of us

Then let the idiots end up screwing their own local repo, instead of doing some magic and making it easy to screw up upstream or someone else's repo.


> I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.

That's what UIs (whether CLIs or otherwise) for standardized workflows like git-flow are, IMO.


It doesn't nearly go all the way there though. Why do people need to use a command line and a gui tool (usually) for git? Because it's fundamentally not written to be used with a GUI. That I think is one of its biggest flaws. Using a GUI with git always feels like you are missing vital information and just trying to poke a cli underneath to do what you want.

Some design decisions also shine through like "no branch is more important than any other branch" which is completely mental considering how people actually use git.


Most of the guis are crap because everybody who builds them thinks that a GUI should just be more or less a visual representation of the command line.

"Why do people need to use a command line and a gui tool (usually) for git?"

You don't. The reason is that you're using a tool that didn't budget the time to directly work on git data files and it uses the command line under the hood, because that's a hard business case to make for most small tools. This is not fundamental to git; the very top-end git-based tools like Github or Bitbucket all do their own internal, direct implementation of git functionality for this reason. It's not a characteristic of git, it's a characteristic of the GUI tools you're using.

A perfectly sensible one based on perfectly sensible engineering tradeoffs, let me add; no criticism of such tools intended. Git's internals from what I've seen are not particularly difficult to manipulate directly as such things go, but you are simply by the nature of such a thing taking on a lot more responsibility than if you use the command line-based UI.


Another simple bit of tracking that git doesn't do, and that would easily make it much better: recording when a commit is a cherry-pick of another commit in the commit graph (not just as a metadata comment in the commit message, but as a kind of soft parent). Then, if you did a rebase, you could actually "reverse engineer" it when you absolutely needed to, such as to track what happened during a squash, or to automatically re-apply the rebase correctly for someone who was tracking one of the prior commits. It would largely settle the question of "merge or rebase" with "por que no los dos".

Another thing that Git does not track -

I saw a web designer check in a huge hierarchy of empty directories which would be the structure of the new project that their team should work on. They were quite surprised when it didn't show up on any of the other designers' computers after a "pull". They had to go to the "Git guru" for help.

Windows and Mac both have directories as a major fundamental concept. Everyone knows them and is familiar with them. Subversion tracks directories. Git does not.


I also can't figure out why it doesn't: An empty tree object should be sufficient to do the job. I actually had to write extra code in git9[1] to avoid accidentally allowing empty directories.

[1] https://github.com/oridb/git9


According to this[1], which I think might be an official FAQ:

> Currently the design of the Git index (staging area) only permits files to be listed, and nobody competent enough to make the change to allow empty directories has cared enough about this situation to remedy it.

[1] https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty...


Hm. I should see how git behaves when it gets a repository with empty directories. If it doesn't blow up, I may just add support -- it'd be useful for me.

It kind of does, iff you have a file there, in which case it tracks the path to the file and then creates the relevant directory structure to get to it.

Of course git is also incredibly painful and brittle if you want "exclude/except" behavior on the gitignore involving subdirectories.
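
For example, re-including files under an ignored directory only works if you un-ignore each parent directory on the way down, something like this (paths made up):

  # ignore everything under logs/ except logs/important/*.log
  logs/*
  !logs/important/
  logs/important/*
  !logs/important/*.log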


Another thing Git does not and cannot even attempt to do - file locking.

The assumption behind Git is, everyone develops on their machines and/or branches, and then things are merged. This only works for files which can be merged.

There are plenty of things pretty much any project wants to track which cannot be merged, for example Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.

With a centralized system, that's easy, just go over to using file locks ("svn lock") for those files. With a distributed system, that's impossible.


> Another thing Git does not and cannot even attempt to do - file locking.

That's a seriously hard problem for a DVCS if you're serious about the "D".

This topic turned into [the single longest thread in the history of the Fossil forum](https://www.fossil-scm.org/forum/forumpost/2afc32b1ab) because it drags in the CAP theorem and all of the problems people run into when they try to have all three of C, A, and P at the same time.

To the extent that Fossil based projects are usually more centralized than Git ones, Fossil has a better chance of solving this, but I'm still not holding my breath that Fossil will get what a person would naively understand as file locking any time soon.

> Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.

You want to avoid putting such things into a VCS anyway, because it [bloats the repo size](https://fossil-scm.org/fossil/doc/trunk/www/image-format-vs-...). I wrote that article in the context of Fossil, but its key result would replicate just as well under Git or anything else that doesn't do some serious magic to avoid the key problem here.

Instead of Word files, check in Markdown or [FODT](https://en.wikipedia.org/wiki/OpenDocument_technical_specifi...). (Flat XML OpenDocument Text.) Or with Fossil, put the doc in the wiki.

Instead of PNG, check in BMP, uncompressed TIFF, etc., then "build" the PNG as part of your app's regular build process.

This has the side benefit that when you later change your mind on the parameters for the final delivered PNGs, you can just adjust the build script, not check in a whole new set of PNGs. My current web app has several such versions: 8-bit paletted versions from back before IE could handle 24-bit PNG, then matted 24-bit PNGs from the days when IE couldn't handle transparency in PNG, and finally the current alpha-blended 24-bit PNGs. It'd have been better if I'd checked in TIFF originals and built deliverable PNGs at each step.
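
As a sketch, that "build the PNG" step can be as small as a loop like this (assumes ImageMagick is installed; paths are made up):

  for f in art/originals/*.tiff; do
      convert "$f" -strip "web/img/$(basename "${f%.tiff}").png"
  done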


> Instead of Word files, check in Markdown or [FODT]

Another fun option is to unzip the DOCX and check that in, since it is mostly a collection of XML files in a zip container. I built a tool to automate zipping/unzipping files like DOCX years ago as pre-commit/post-checkout/post-merge hooks. [1] It's an interesting way to source control some types of files if you can find a way to deconstruct them into smaller pieces that merge better. Admittedly, merging Office Open XML by hand is not a great experience (and dealing with subtly broken or corrupt internal contents is not fun, because programs like Word can be fussy when things are even slightly wrong), but you get better diffs sometimes than you would expect.

[1] https://github.com/WorldMaker/musdex


Yes, I cover that option for Fossil at the end of the pointed-to document. I did it in terms of Makefiles rather than commit hooks, but whichever...

> You want to avoid putting such things into a VCS anyway, because it [bloats the repo size]

How do you suggest projects like games handle this, where data files are naturally linked to source files? Imagine trying to sort out an animation bug when you only have source-level tracking and no idea which version of the animation data corresponds to the animation source files of the bug report. These data files are not produced by the 'build' step, as they are the product of artists.


I’d guess not one user in 100.000 uses git decentralized (as in, doesn’t have a blessed “central” repo). It’s the disabling of locking that should be the special case! The big problem with git is that you can’t mark a repo as a master repo/blessed repo (which would be the one where lockfiles are stored). A lot of functionality would be helped if the commands could know which end is the important/central one.

> I’d guess not one user in 100.000 uses git decentralized

I understand your sentiment, but the denominator in that fraction is probably much lower than your guess.

Consider even simple cases like the disconnected laptop case. You may work at a small office with only local employees, and so you have one central "blessed" repo, but if one person locks a file and then goes off to lunch, working on the file while at the restaurant, you still have a CAP problem:

CA: Because the one guy with a laptop went off-network, you have no full quorum, so no one can use the repo at all until he gets back and rejoins the network. (No practical DVCS does this, but it's one of the options, so I list it.)

CP: When the one guy went off to lunch, we lost the ability to interact with his lock, and that will continue to be the case until he gets back from lunch. Also vice versa: if someone still at the office takes out a lock, the guy off at lunch doesn't realize there is a lock, so he could do something bad with the "locked" file. (This is the mode DVCSes generally run in by default.)

AP: No locking at all, thus no consistency, thus your original problem that inspired the wish to have file locking.


Not sure I understand the problem. If I lock fileX and go to lunch, then I own the lock on that file while I’m out to lunch. It’s basically analogous me pushing the file fileX.lock to the repo next to fileX, with my user id as content. I can only do it if it isn’t there.

Everyone else will only see that lock if they fetch, and if they don't, they might edit their local copy of fileX too, but would be prevented from pushing their version to the blessed repository by the lock. They can push a copy under another name, or wait until I have removed the lock (but probably can't resolve the conflict anyway because it's likely a binary document). So the user will remember to never start editing without taking the lock in the future.

It’s not perfect by any stretch of the imagination but it’s all anyone asks for in terms of file locking. It’s what Subversion always did.


> Not sure I understand the problem. If I lock fileX and go to lunch, then I own the lock on that file while I’m out to lunch.

And if you go on vacation for two weeks instead?


Same thing obviously. But this is just a method of communication. It’s instead of emailing/shouting “don’t edit the background image today please” across the office.

An admin can remove the lock. Or you can allow force-pushing by anyone to replace it or whatever.

Not sure why this is seen as so complicated; version control systems have done it since forever. It's not trying to solve some distributed lock system in a clever way. It's a dumb centralized mutex per file. And yet again this is all that's needed (and it's also added to git in git-LFS!).
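
For reference, the Git LFS flavor of that dumb centralized mutex looks roughly like this (assumes the hosting server implements the LFS locking API; file names made up):

  git lfs track "*.psd" --lockable    # mark the file type as lockable
  git lfs lock art/background.psd     # take the per-file lock on the central server
  git lfs unlock art/background.psd   # release it when you're done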


You can set up custom merge drivers for different file types.
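
A sketch of the wiring, with a made-up driver script and file type (%O, %A and %B are git's placeholders for ancestor, ours and theirs):

  # .gitattributes
  *.po merge=pomerge

  # .git/config (or ~/.gitconfig)
  [merge "pomerge"]
      name = gettext catalog merge
      driver = ./scripts/merge-po.sh %O %A %B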

More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault. That everything looks like a nail does not speak against the value of a hammer.


> You can set up custom merge drivers for different file types

True, but there are inevitably some files which still cannot be merged, so the problem remains.

> More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault.

Indeed, if you want to store the history of your files - the whole software including the icons it uses, so that you can go back to any previous version and build it - and you chose a DVCS like Git, I would agree the fault was yours.

That's basically what I was arguing, that Git is the wrong choice if you have any binary assets like icons (even if those assets have small filesize) due to the lack of locking, sorry if I was unclear.


You should not be checking binary files into git. That defeats the entire purpose and bloats the repo size enormously.

Git LFS should be used instead. Or store a SHA-256 sum and put the file elsewhere.


That is a solution but then it's no longer distributed.

Being distributive isn’t a feature for the binary files contents. I don’t want 100 historical versions of a huge game texture, just the latest one. The history is distributed however, so I can see who changed the texture even when disconnected. Centralized binaries like git LFS works like any package/asset manager.

There is no one that would want distributed binaries in git. But people also don’t want to switch from git to something else just because they have a 100GB or 10TB repo. Tooling (build tools, issue management) everywhere has decided that git is all that’s needed.

Not putting binaries in git isn’t a solution at all. Binaries are part of the source in many applications (e.g game assets, web site images...). Distributing every version of every binary to everyone is also not a solution.


Maybe for your particular problem it's worthwhile to set up a CM policy for the (topic) branch naming. For example in your case something like: topic/alkonaut/version-1.0/foo or topic/alkonaut/master/bar.

Or use something like: git log --all --graph --oneline

One tip for "zigzag spiderweb" is to always rebase your topic branch to the target branch prior to a fast-forward merge to the target branch (e.g. master). To clarify: while in your branch topic/foobar: "git rebase master", "git checkout master", "git merge --ff-only topic/foobar".

(There's surely a clever shorthand for the above procedure but when it comes to the command line, I like to combine small things instead of complicated memorized things, it's some kind of Lego syndrome)
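
Spelled out with the branch names from above:

  git checkout topic/foobar
  git rebase master                    # replay the topic commits on top of master
  git checkout master
  git merge --ff-only topic/foobar     # fast-forward, so master stays a straight line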


Rebase + FF solves the spiderweb problem by removing the branches. But some insist on keeping the branches, and I don't get why "git log" doesn't have (and default to) a way of showing important branches as straight lines.

Also, with dozens of tiny commands but only a handful of actual desired outcomes, the high-level operations should be explicit commands. E.g. “rebase this branch on master and then squash it and commit on master”.

A lot of the local/remote could also be hidden. The number of times I want to rebase on my local master which is behind origin by 2 commits is... zero.


Said 80/20 was the idea behind Fossil, along with "easy to learn because it's similar to svn". It seemed a good idea, but it lost out to the tool with the famous user, which the Linux kernel obviously is.

If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since it's how all users will interact through the interface. And personally I prefer a VCS with less ways to shoot myself in the foot than git.

> If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since it's how all users will interact through the interface.

At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity. (Even if it doesn't work quite the same as traditional source code control systems. Failing to work with directories is a problem of the git approach. Having a nice offline story is a distinct advantage.)

> And personally I prefer a VCS with less ways to shoot myself in the foot than git.

Oddly, the thing I love about git is how easy it makes it to recover from mistakes. Even if there are more ways to shoot yourself in the foot, there are also more ways to put your foot back exactly the way it was before you shot it. (If only real life worked that way!) This is what the immutable content storage under the hood of a git repository gets you.

If you know the commit hash (and there are a bunch of ways to easily keep track of these), you can get back to the state that's represented by that hash. Commands like merge/rebase/cherry-pick make this particularly easy by providing an '--abort' option that means "I've screwed this operation up beyond repair and need to bail out." And the abort works. As long as you had your target state committed, you can get back to it. (And if that's just a transient state that you don't want to persist, it's easy enough to squash it into something coherent.)
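
A few concrete examples of those escape hatches (the hash is a placeholder):

  git merge --abort          # bail out of a half-finished merge
  git rebase --abort         # likewise for a rebase
  git cherry-pick --abort    # and a cherry-pick
  git reset --hard <hash>    # jump the current branch back to any committed state you know the hash of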


>the interface makes a lot more sense if you understand the underlying data structure

Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?

And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might have had to do it once or twice with SVN, though).


> Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?

I don't think it is special. Generally after a while using a given tool, library, etc. I find it useful to dig in a bit and see what's happening under the hood to help understand why it works the way it does. git just happens to be the tool under discussion at the moment.

> And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might had to do it once or twice with SVN, though).

I think we're talking about the same sort of mistakes. It's hard for me to imagine a case where you'd need to blow away a local git repository entirely. Worst case scenario, there should be good refs available in a remote that are just a 'git fetch' away. (If there's no remote, then blowing away the local repo is essentially just starting from scratch anyway.)


You also don't have to understand the underlying structure for a similarly powerful DVCS like bitkeeper. Yes, it isn't open source, but git was a major step back in usability for my group from bk to git.

Yes, this. I actually tried bk before git, and actually used bazaar and then mercurial before git as well. I was stunned at how arcane the UI in git was made (And how arrogant the community of users around it could be, too). Bk was clean and elegant frankly. I'm no idiot when it comes to the concepts -- but git's CLI interface is just awful.

Bitkeeper is in fact open source now, BTW. Too late, but it is.


You're right. The arrogance was hilarious. People with no experience with bk saying, "What's your problem?" Now, git is super fast, because its core is written by Linus, but I think he is just so much better technically and so far into the internal weeds of Linux for so long in so many areas that he had trouble creating an API for mere mortals.

> git was a major step back in usability for my group from bk to git.

What does that mean in concrete terms? What are the failures you're seeing with git that you weren't with bk? How long has your team used git? bk?


We used bk when I was with the group for 3 years and then switched to git. Been using git for 8 years. I know git, but the ergonomics and basic English semantic meaning of commands are much worse. I have to look up git commands and subflags _all_ the time still for checking out old versions of files to a new file. Looking at tags. Committing to a new branch, etc. Bk's version of gitk was superior and the usage was nicer. I've used mercurial, svn, cvs, git, and bk. Git is hard, but it is the standard now, so of course I'll continue to embrace it. Just not as ergonomic.

> At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity.

Yeah. Take an afternoon to read through gittutorial(7), gittutorial-2(7), and gitcore-tutorial(7). Git is a tool, and just like any other tool (car, tablesaw), you will be much better off if you take the time to learn to use it properly. Once you see "The Matrix" behind Git, it becomes an incredibly easy to use and flexible tool for managing source code and other plaintext files.


The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling. And why does Git require this whereas other VCS don't? Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.

> The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling.

They're just examples of tools.

> Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.

I talk about this elsewhere in this thread, but I disagree with this assertion. I find Mercurial baffling and Git very elegant, though it could be an artifact of the order in which I learned the tools.


> And why does Git require this whereas other VCS don't?

It doesn't. It just works better when you take the time to learn how it works. (Which is an experience I commonly have with the tools I use, for whatever that's worth.)


Agreed. If the applications we built using git were as awkward as git...our users and clients would scream, or worse.

I don't understand the relevance of git in all of that

You can almost always recover from your mistakes with git reflog.

Yes and no. I'm a git guy and a fan, but you can really, really mess things up. Usually, this is only when using features like force push; however, there are arguably legitimate use cases for that.

Buddy had a teammate who almost force pushed references from a slightly different repo. What a mess that could have been! I agree regarding the usefulness of reflog, and think the complaints about messing things up with rebase, reset, etc. are overblown. It really isn't an issue for intermediate users.


Hence almost always. It’s not a common situation to delete commits from history, etc.

I don’t see the capability to force push as a negative. There are situations in which it’s necessary, like forcibly removing history (something I had to do just today).

Git gives you the ability to shoot yourself in the foot, so it’s up to the operator to not make a mistake like that without backing up the repo to a different place first, etc. Something something only a poor carpenter blames their tools.


Git is neither easy nor is it really elegant. It is useful for projects like Linux™, but for the vast majority of projects, tools like mercurial or fossil would be a much better fit.

After svn, git was a breath of fresh air; far easier to use and reason about, not to mention much faster.

I don't think much of your all-in-one solution like fossil - that's a competitor for GitHub (without the bits that make GH good), not git.

I tried to use hg at one point in the early days, and found it much slower than git. Git's low latency for commands made a substantial difference, perceptually. In principle I think git encourages too much attention to things like rebases, which fraudulently rewrite history and lie about how code was written, just so the diff history can look neater. Working code should be the primary artifact, not neat history, and rebases and other rewrites make it too easy to cause chaos with missing or duplicated commits in a team environment. So ideologically, mercurial is a better fit, but that's not enough to make me use it.

Fit is a function of an environment; when we say survival of the fittest, we mean fitness as adapted to an environment. Feature set isn't the only aspect; at this point, the network effects of git are insurmountable without a leap forward in functionality of some kind.

(I think git & hg are just as elegant as one another; to me, the elegance is in the Merkle tree and the conceptual model one needs to operate the history graph.)


Can you explain what you mean by fossil being a competitor for github, rather than git? Fossil is a scm with additional features for usage, but (the last I used it, and to my memory) it was just the command line fossil very much like git, and that's how I used it.

What makes it the case that fossil cannot be a competitor to git (or hg), in that they are both a vcs?

edit I haven't had a lot of sleep. What I'm trying to ask, I suppose, is why can't you use fossil just like git and ignore any all-in-one features it provides? (This is not to comment on how good, scalable, fast, correct, or robust it is.)


You can, though I suspect the OP's focus on speed means you'd want to turn off Fossil's autosync feature, which makes it operate more like Git: checkins go only to the local repository initially, and then you must later explicitly push them to the repo you cloned from.

This is why Subversion was "slow": your local working speed was gated by the speed of the central repo, which could be slow if it was under-powered or overloaded, as was common with the free Subversion hosts of the day. At least with Git, you can batch your local changes and push them all at some more convenient time, such as when you were going off for a break anyway.
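
If you do want Git-style batching, the relevant knob is roughly:

  fossil settings autosync off    # checkins stay local until you sync
  fossil commit -m "local work"
  fossil push                     # push the batch later, when convenient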


> way better tools

I work with a group of people who all know enough git that we're productive, and a few of us know enough git to solve complicated problems.

I've not seriously considered fossil or mercurial -- what are the top three tangible benefits I'd get from them getting our team to switch?


I have never used Fossil, but I used to be a strong proponent of Mercurial. My advice is don't - Mercurial lost, git has won, and fighting against the current is just going to make your life harder.

The main advantage Mercurial has over git is a command line syntax that makes consistent sense. The operations you want to do are easy and as you try and do more complicated things, the new commands will be unsurprising and predictable. If you already know how to use git then this advantage is (mostly) irrelevant.

There are some other features that are interesting - Mercurial has a couple of different types of branches. Bookmarks are like git branches, whereas named branches are a completely different concept which can be useful. 'Phases' tracks whether commits have been shared, and prevents you rewriting (rebasing) them when appropriate.

If you do experiment, note that many 'power user' features are turned off by default. There is a robust extension system, and the default mercurial installation includes a load of standard ones. My config file includes the following to turn on some useful stuff ('record' is the most useful for a staging area like facility):

  [extensions]
  pager =
  color =
  convert =
  fetch =
  graphlog =
  progress =
  record =
  rebase =
  purge =


I know Git inside and out, but I had to use Mercurial for a client a couple years ago. I found it to be the most baffling and nonsensical source control experience of my life. It might be a case of cross-contamination. Like you said, each SCM uses similar terms for different concepts, so my Git knowledge may have unfairly colored how I expected similar terms to work in Mercurial.

But stuff like: "hg log" gives you _every commit in the repo_?? When is that ever useful? How do I get only the commits that lead to the current state of the repo? Mercurial doesn't have branches; instead you're supposed to _copy the whole directory_[1] at the filesystem level?? Of course this is ridiculous, so they invented "bookmarks" which are actually Git branches. The extensions thing you mention is also a ridiculous chore. Just have sane defaults. I also found hg's output very dense and hard to understand and read, poorly suited for human consumption.

I dunno. I'm sure Mercurial is fine, many people use it every day, and likely my strong Git bias was affecting my ability to learn Mercurial. But I found it far easier to just clone into Git, use Git to do source control, and then export back to Mercurial when I'm ready to share my work.

[1] https://www.mercurial-scm.org/wiki/TutorialMerge


Mercurial absolutely does not require copying at the fs level. You're not the first person to be caught out by that tutorial, which I think would serve us best by being deleted.

The 'original' branching method for Mercurial is called Named Branches. The big difference with Git is that every commit is labelled with what branch it is on. This has advantages - if you imagine looking at the train track of 'master' in git with its divergence for a few commits and then merge, you can see that the 3 commits were on a branch called 'performance', whereas with git that history is completely lost. See: https://www.mercurial-scm.org/wiki/NamedBranches

As usage of git grew, the git branching model gained popularity and so the Mercurial bookmarks extension was created (https://www.mercurial-scm.org/wiki/Bookmarks).

It can be seen as a downside that there are two branching options that you have to choose between.


It's not just that tutorial, see also [1,2]. I think this is/was really an "official" way to do branching, and it seems utter madness to me :)

[1] Sadly the popular hginit.com seems dead, this was my first introduction to Mercurial. https://web.archive.org/web/20180722012242/http://hginit.com...

[2] https://stevelosh.com/blog/2009/08/a-guide-to-branching-in-m...


Branching by cloning was copied from bitkeeper. It was also early git's only branching mechanism. If you listen to Linus's talk when he introduced git at Google, you'll hear him conflate "branch" with "clone" because that's what he was thinking of at the time.

https://www.youtube.com/watch?v=4XpnKHJAok8


I associate that particular madness with Bazaar rather than Mercurial. It stopped being standard practice a while ago, and those old tutorials should be updated or removed.

That is exactly the point. For git you need the ecosystem to cope with its shortcomings and in addition some experts to help you out of the pickles this software gets you into.

I mainly use fossil for personal projects.

Whats nice about it is that it not only is a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list and user management. The setup is ridiculously easy and everyone always has everything in the repository.

In addition fossil never loses data, unlike git which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing and so on.

And fossil has a sane command-line interface so that everyone in the team is expert enough to work with it. No need for heroes that save the day from git fricking everything up.


> Whats nice about it is that it not only is a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list and user management.

That is not nice. That is way more things that might not match me, more attack surface, more irrelevant cruft I'll probably have to look up how to disable. Project management, wiki and issue tracking preferences are very personal and often don't map particularly well to specific repositories. And _blog_ and _mailing list_? Why, you're spending time on stuff most of your users will hate, not because it's bad, but because they either don't need it or would like it different.

> In addition fossil never loses data, unlike git which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing and so on.

Which is why Git is successful. That's by design, not accident. We want to, and sometimes _have to_, delete stuff.


>Whats nice about it is that it not only is a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list and user management.

That seems like feature creep.


Not at all. There's a lot of nice stuff that falls out of Fossil's integration of these features, things which you don't get when you lash them up from separate parts.

For example, if I have a check-in comment "Fixes [abcd1234]" I get an automatic link from that check-in comment to ticket abcd1234 from the web UI's timeline view. If I then close that ticket, the comment in the timeline view is rendered in strikethrough text, so I don't have to visit the ticket to see that it's closed.

Similarly, a built-in forum means the project's developers can discuss things with easy internal reference to wiki articles, tickets, checkins...

A recent feature added to Fossil is the ability to have a wiki article bound to a particular check-in or branch, so that whenever someone views that artifact in the web UI, they get a link to the ongoing discussion about it. This is useful when you have more to say about the check-in or branch than can reasonably fit into a comment box. This solves a common problem with experimental features, where you want to discuss it and evolve the idea before it's merged back into the parent branch.

Fossil's user management features are also highly helpful.

    http://fossil-scm.org/fossil/doc/trunk/www/caps/
With raw Git (no Github, GitLab, etc.) it's pretty much all-or-nothing, but with Fossil, you can say "this user can do these things, but not these other things." Thus you can set up a public project, giving anonymous users the ability to file tickets and read the forum but not make check-ins.

These features are as seductive as what Github, GitLab, BitBucket, etc. add to Git, but whereas those are all proprietary services with some of the roach hotel nature to them, with Fossil, you clone the repo and now you've got all of that locally, too. If the central repo goes down, you can stand your local clone up as a near-complete replacement for it.

It's not 100% because Fossil purposely doesn't clone a few things like the user table, for security reasons. You can build a new user table from the author names on the check-ins, though.


>In addition fossil never loses data, unlike git which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing and so on.

I have created a lot of feature branches that contain useless commits which I then later corrected with a simple git merge --squash. Preserving those commits sounds like a drag.
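
i.e. something along these lines (branch name made up):

  git checkout master
  git merge --squash feature/noisy-wip      # stage the branch's net effect as one change
  git commit -m "Add the feature, minus the noise"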


You're making bisects, cherry-picks, and backouts harder by squashing history like that.

We cover this and more in the Fossil project document "Rebase Considered Harmful": https://fossil-scm.org/fossil/doc/trunk/www/rebaseharm.md


I hadn't heard of fossil yet, but I'll definitely look into it.

The simplest way to do that is to just use it for a local project. Say, your local ~/bin directory, containing your local scripts, or your editor's config files that you want sync'd everywhere, or a novel you're writing on the side.

If you're like me, you'll find yourself increasingly wondering, "Why would I put up with Git any time I'm not forced to by some outside concern?"


It's been immensely useful in every software project I have ever partaken in. None of them were like linux.

It is immensely useful. That doesn't mean that some other tool might not be better for most cases.

I love git, and don't know most other post-SVN version control systems, but I do recognise the complaints people have about git. There's clearly still room for improvement.


What?

To use git you need to know clone, pull, commit, push. For larger projects branch and merge. Those fall into a lot of boxes that say "easy" or "elegant," and I really wouldn't hesitate to recommend git to a lot of projects, big or small, discounting specific needs, but I guess you've got some specific concerns that really don't translate well into simple statements.

I've used mercurial only to get some external requirements or tools going, and never used fossil. Could you elaborate a bit on why git is worse than either of them and why I should consider switching ?


> To use git you need to know clone, pull, commit, push. For larger projects branch and merge

What if you committed to the wrong branch? What if you tried to merge commits from another user and made a mess of it all? What if you pushed something you want to roll back? What if you committed with the wrong commit message and want to fix it? What if you followed the policy of "commit often" but ended up with lots of irrelevant commits, and want to fix this so that it only has meaningful commits. How can you find who committed what? Or which branches contain a commit?

I know how to do all of this. But these are genuine questions a user of git will need to get answered, and git quickly becomes confusing/inconsistent once you're off the "happy path".
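
For what it's worth, the answers are mostly one-liners once you know where to look (angle brackets are placeholders):

  git commit --amend                     # fix the last commit's message
  git cherry-pick <commit>               # re-apply a commit made on the wrong branch
  git revert <commit>                    # undo something already pushed, without rewriting history
  git rebase -i <base>                   # squash "commit often" noise into meaningful commits
  git log --all --author="<name>"        # see what a given person committed
  git branch --all --contains <commit>   # see which branches contain a commit

The point stands, though: none of those are discoverable from the happy-path commands.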


You need to know much more to use git in anything involving more than one branch. You need to know git checkout, git stash, you need to know how to fix conflicts, you need to know rebase vs merge and how to understand git log when you use merge, you need git reset, you probably need git cherry pick occasionally.

One of the major day to day annoyances is the fact that (by default) I can't work on multiple branches without committing or stashing all the time, since switching branches, instead of being a simple 'cd' like in other VCSs, is a destructive operation on my entire repo (also causing re-indexing for all content-aware tools...). And if I want the normal behavior? I need to learn some other commands to set that up, and learn their filesystem restrictions...


Disagree. I could teach a child to use git. I don't agree it's elegant at face value. Yet if you've used other version control systems, git has features that you'd dream of (I "invented" some of the features of git on my own). So in a way it really is an elegant solution. I can't think of much that I really hate, or wish to change; and I can't think of any serious proposal to "fix" it.

People have probably been happy with their tools for centuries. Just because one cannot imagine something better doesn't mean there's no possibility for it to exist. If anything, this defeatist attitude may prove the author right.

> defeatist attitude

This implies that the parent's sentiment about git is negative.

At least I personally feel very positively about git and am not missing much from it.


> This implies that the parent's sentiment about git is negative.

No, I meant defeatist as in "I have given up finding something better as it will never exist".


If the tool you have works well for your use case, looking for and learning to use a "better tool" can be a waste of your time with little benefit.

And that's how you can get stuck in a local maximum.

With the exception of very, very singular people, isn't every maximum going to be local? Even then they'll be stuck in a bunch of other local maxima outside their own area of expertise.

Maybe, but that has nothing to do with my point.

I remember what the go-to tool was before git. It was Subversion. And CVS before that. I would not say people were happy with those tools.

I was happy with SVN because I used CVS before. I was even happy with CVS, because I had nothing before. It's hard to imagine now.

Software development has seen massive improvements in the past 20 years. I see no reason why that would stop now.


Asymptotes. Sometimes it turns out we can solve a particular problem so comprehensively that "solve this problem better" is never a reasonable step. You can try it anyway, of course, but you're unlikely to get acknowledgement much less praise.

The answer to "Why doesn't my music sound as good as I wanted?" isn't going to be "CD's 44.1kHz and 16-bit PCM isn't enough". It might be "This cable has been chewed by a dog" or "These speakers you got with a cheap MIDI system in your student dorm are garbage" or even "the earbuds you're wearing don't fit properly" but it won't be the 44.1kHz 16-bit PCM.

Likewise, it is plausible that Git is done technology-wise. That doesn't mean there won't be refinements to how it's used, recommended branching strategies, auto-complete, or even some low-level stuff like fixing the hash algorithm - but the core technology is done.


> The answer to "Why doesn't my music sound as good as I wanted?" isn't going to be "CD's 44.1kHz and 16-bit PCM isn't enough".

Yeah, try telling this to a fan of 1960s or 1970s rock. You'll get an earful about rich guitars and fat synths, which only a 100% analog, tube-amp process from studio to ear is capable of replicating.


> Yeah, try telling this to a fan of 1960s or 1970s rock. You'll get an earful about rich guitars and fat synths, which only a 100% analog, tube-amp process from studio to ear is capable of replicating.

And anyone with a basic understanding of electronics should laugh in the faces of these people. The idea that a signal can be carried on a wire or recorded on tape but can not be replicated digitally is absolute nonsense.

If someone wants to claim that their preferred format captures higher frequencies than 44.1kHz sampling allows for, that's at least plausible, but that can be solved by using higher sampling rates like 96 or 192 kHz. At that point you've exceeded the capabilities of all mainstream analog storage media.

If they are looking for specific effects created when pushing the limits of analog hardware, like the "crunch" of a tube amp, that's fine too, but they need to acknowledge that they're treating the amp as an instrument in that case and its output can still be recorded digitally just fine.


I was happy copying my files back in the days for versioning (final_draft, final_draft01, final_draft_absolute_final, final_draft_use_this_one), but that was because I didn't know of anything better. What I'm saying is that even though we don't see it now, there's probably something better out there waiting to be discovered.

I hope it's the last DVCS, if it can save me the hassle of learning a new one. There is some learning curve, but it works just fine once you know how to use it.

Over time you realize there are no transient states. You cannot neglect, for example, a complex install just because it "happens once". Nothing ever happens just once. You will always have to reinstall, probably multiple times. So when people say the same thing about installation complexity, they are being naive.

In the same way, learning is not a transient state either. You will always have to relearn. Those impossible barriers that you eventually got through will reduce to speedbumps - but they will always be there, slowing you down. And if you don't use it enough, you'll have to relearn.

Also, be aware that once you've climbed a learning curve, at least unconsciously you are no longer incentivized to simplify it for those who come after. Why reduce the barrier-to-entry for others, after all? You got through it, so why can't they? And this is why generations of kids learn bad music theory, and generations of physicists learn bad particle names. It's important to be aware of this effect so you can counter-act it.


> I hope it's the last DVCS,

The only reason I'd disagree is if the next source code control system were somehow as much an improvement over git as git was over its predecessors.

> if it can save me the hassle of learning a new one.

Not to mention the hassle of converting all those legacy repositories, converting CI/CD, etc.


After using darcs, I can never see Git as elegant, no matter how clearly it is a more pragmatic choice these days.

I agree. I’m always embarrassed to say that I still use darcs but it’s just entirely obvious how to use it. There is no mystery. The choice to prompt the user for things makes usability insanely high.

Yes it’s slow for large projects but honestly I just deal with that.


Have you looked at Pijul? I understand it has resolved the major shortcomings of Darcs.

>> The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings.

Agreed. Technology should converge on a best solution so we can stop chasing things and get work done. Stable open source standard solutions are what we need more of.

Some anti-pattern examples are C++ and Vulcan.


> Some anti-pattern examples are C++ and Vulcan.

Are you referring to the Vulkan API? If so, why do you see it as an anti-pattern example?

I like the API and I think it is a great and necessary improvement over OpenGL. I actually hope to see Vulkan become the ‘stable open source standard solution’ for graphics.


Maybe I should have said OpenGL since it keeps getting new versions. Does Vulkan? It just seems like Khronos keeps moving the target.

You want technology to converge without changing? That's impossible.

Honestly if we can reduce the tooling churn for development that sounds great

here's me sincerely hoping for pijul and/or darcs to deliver something much better. (obviously if it's only a little better then there isn't much point in switching.)

I was hopeful for pijul. But the more time passes without any noticeable progress or traction the less hope I have.

To me it's proof that we as an industry finally arrived at a consensus over something.

Yeah, but it's like Betamax losing to VHS.

The interface is arguably the most important part. I.e. it is a tool developed for humans to use, so would ideally have simple, consistent, and by extension intuitive ergonomics. Elegance of internal implementation is secondary.

The interface can be swapped out if the underlying storage is fine. There is nothing preventing you from writing a different front end where “checkout” doesn’t do all the things.

Which is why there are dozens of Git front-ends, fragmenting the market, reducing the benefit of cross-training within a team. "Oh, you can't do that in your TortoiseStudioCodeThingy? I just right-click and select Frobnicate File, and it fixes all that!"

Thus editor wars, language wars...

Alternative: a tool like Fossil where the CLI is sensible from jump so the whole team doesn't replace it with something better, uniquely per team member.


> There is nothing preventing you from writing a different front end

Or you could just use a tool where the interface is fine out of the box


Yet no one is doing it.

Yup, doesn't sound like a negative to me. People will still try to build simpler ecosystems like GitLab/GitHub/Atlassian, with similarly mixed results.

git is like a dozen Perl and shell scripts taped together with a tiny bit of Tcl on the side. It is nowhere near elegant. It's a monstrosity.

Every version control system that's become dominant in my lifetime became popular because it fixed a major obvious flaw in the previous dominant system (RCS, CVS, SVN).

From where I sit, Git has a couple obvious flaws, and I expect its successor will be the one that fixes one of them. The most obvious (and probably easiest) is the monorepo/polyrepo dichotomy.


If I were to pie in the sky dream up a replacement for git, I'd have it store the AST of the parsed code instead of a text file. It would solve a lot of problems with refactoring crapping all over the history. Like I said, pie in the sky. Probably never gonna happen.

Personally I don't see git's problems with large binary files and tens of millions of commits as being major issues. Those two alone are way less valuable than git's ecosystem and mindshare.


What you want has nothing to do with git (which is a storage model). You can use arbitrary diff and merge resolution algorithms with git's plumbing, which would give you the AST-aware functionality that you want.
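
A minimal sketch of how that wiring looks, assuming hypothetical ast-diff / ast-merge tools on your PATH:

    # .gitattributes: route source files through custom drivers
    *.py diff=astdiff merge=astmerge

    # tell git what those drivers actually run
    git config diff.astdiff.command  ast-diff
    git config merge.astmerge.driver "ast-merge %O %A %B"

(%O, %A, and %B are the ancestor, current, and other versions git hands to the merge driver.)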

I used VisualAge's Envy for 3 years while working in a Smalltalk project.

Envy does versioning of classes and methods, and you can programmatically access the model.

It allowed us to build tools around the VCS. For example, we had tools to merge multiple feature branches and resolve conflicts automatically. We also used the same tools to produce migration scripts for our database (GemStone). That was 18 years ago! And today it sounds unreal.

You can build tools on top of git, but the versioning “unit” gets in the way (e.g. imagine being able to encode refactorings in your change history and reapply or roll them back).

I’m not trying to criticize git. I think it is the best file-based DVCS. My point is that many dev tools that we use today are extremely rudimentary because of the lack of good abstractions. And I don’t think that git provides a good model to build those abstractions on top of.


That's like saying DVCS has nothing to do with VCS -- it's just the server model. Every major advance has been accomplished by increasing the scope of version control.

Arbitrary diff/merge in Git is a great example of the Turing Tar-Pit. It's possible, but prohibitively inefficient for many things I want to do. You can't add your own types, index, or query optimizations.

Today, if I want to store data for my application, I have a choice between good support for rich object types and connections (e.g., Postgres), or good support for history and merging (e.g., Git). There's no one system that provides both.


> Today, if I want to store data for my application, I have a choice between good support for rich object types and connections (e.g., Postgres), or good support for history and merging (e.g., Git). There's no one system that provides both.

I like the way you put this. In case anyone's interested in brainstorming I'm dabbling in this problem with a thing called TreeBase (https://jtree.treenotation.org/treeBase/). It's still a toy at this point, but it stores richly typed data as plain text files to leverage git for history and merging and then can use SQLite (or others) for querying and analysis. A very simple database in the wild looks like this: https://github.com/treenotation/jtree/tree/master/treeBase/p...


Have you looked at Qri? (https://github.com/qri-io) - free & open source dataset versioning. Also: https://qri.io

AFAIK I've never seen that one. Thank you very much for the link. Looks very interesting and related to the stuff I'm working on. Thanks!

Couldn't you store the exported database as SQL commands? I'm not familiar with every git hook, but if there aren't enough to automate that, I guess you could wrap it.

The slowness of destroying a whole database and then recreating it when checking out should be something you can handle by relying on the diff to generate a series of delete commands and a series of insert commands.

But yeah, I guess committing will be slow if you have a lot of data to export. For the time being, it's a trade off to be made.

[I might consider testing this with my current database project. But I'm using SQLite so I guess that implies a lot less data than Postgres.]
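
A minimal sketch of the hook I'd try, assuming a single SQLite file called app.db:

    #!/bin/sh
    # .git/hooks/pre-commit: keep a text dump alongside the binary database so git can diff/merge it
    sqlite3 app.db .dump > app.sql
    git add app.sql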


> That's like saying DVCS has nothing to do with VCS

I could see myself agreeing to that.


frutiger is correct that the diff algorithm has nothing to do with git itself, in that git can accept pretty arbitrary diff algorithms in the first place for all the commands that take one.

Check out git-diff(1) and --diff-algorithm. --anchored is the one I find the neatest.
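
For example (the anchor text here is made up):

    git diff --diff-algorithm=histogram       # alternatives: myers (the default), minimal, patience
    git diff --anchored="someMovedFunction"   # try to keep lines starting with this text from showing up as a delete/add pair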


DVCS indeed has nothing to do with VCS, it has a lot to do with the data model used by the VCS.

A modern but still centralized VCS like Subversion or Perforce is what you get if you first add networking (CVS) and then atomic commits. Without atomic commits you are pretty much forced to keep a centralized server, and Subversion didn't try to change the server model after adding atomic commits.

DVCS instead is what you get if you start with local revision tracking like RCS, and add atomic commits before networking. Now the network protocol can work at the commit level and is much more amenable to distributed development.


Except you have to run them every time.

Imagine if, instead, that were available as a sort of materialized view.


> Imagine if, instead, that were available as a sort of materialized view.

I don’t understand what you mean by this, can you provide some more detail?


Why do we have byte code? Why not run everything in interpreters? Because parsing pure text takes a lot of work. So we store it in an intermediate mode to economize.

Saying just parse it every time is denying that there are very real costs associated with that decision.


You can store the result of the parse in git if you want to.

It would have to be specific to certain languages, which would, in turn, hinder adoption of new languages to some degree if the git-next took off. So, I'd prefer not to have that be a feature. :)

I think you could write it as the ability to do a diff on a binary AST without fussing too much about what the AST represents. Then you merely need to write a parser/serialiser combo from your language to the AST as a repo plugin.

Otherwise it won't just be new languages which suffer, but users of supported languages will suffer when there's an upgrade.


Most language ASTs don't encode unfinished or work-in-progress code very well (the difference in AST shape between missing one `{` and the fixed code can be substantial). You may think it better to always only commit working code, but your source control system is also a backup system if you need to save a work in progress branch to come back to it, and also sometimes a communications system if you want to request a coworker examine your code to help you pinpoint bugs you can't find or review work in progress.

Most language ASTs also don't encode useful-to-the-programmer but useless-to-the-compiler information like comments and whitespace. There's been good progress in that (the Roslyn AST system has some neat features), but in practice an AST is always intended more for the compiler than the user/source writer. This also is reflected often in speed, a lot of languages have a relatively slow AST generation (which would add sometimes very noticeable wall clock time to commit time, depending of course on language and hardware).

Plus, of course, all the usual bits that ASTs are extremely varied among themselves (some are weirder DAG shapes than trees, for instance).

An experiment I ran was to use the "next step down" from a full AST, which is your basic tokenizer / syntax highlighter. Those are designed to deal well with malformed/unfinished/work-in-progress input, and to do it very quickly. Years back I built a simple example diff tool that can do token-based diffs for any language Python's commonly used syntax highlighter Pygments supports. [1] In my experiments it created some really nice character-based diffs that seemed "smart" like you might want from an AST-like approach, but just by doing the dumb thing of aligning diff changes to syntax highlighting token boundaries.

You could even use it/something like it/something based on it today as your diff tool in git if you wanted, with the hardest part configuring it for which language to use for what file. (I never did do that though, partly because the DVCS I experimented with this for didn't have a pluggable diff system like git does, nor did it support character based unidiff as a storage format which the experiment was partly to prove both ideas could be useful.)

[1] https://github.com/WorldMaker/tokdiff


Look into Unison, a language that stores the AST and immutable history of all functions to provide a combination of package manager, IDE, and DVCS. Once you store the AST and all history, some fascinating side effects happen!

https://github.com/unisonweb/unison


I want an editing environment which operates on the AST of my code (obviously it would have to support every language I wanted explicitly to do this), so that files become entirely irrelevant, I never have to worry about formatting differences or where things are or whether that function is in that file or that file. A bit like working in a Smalltalk image.

If that was then extended into the version control system that'd be even better. Oh yes.

But getting a new language into these things would probably be a nightmare.


I don't see any obvious flaws with Git.

The monorepo/polyrepo discussion exists apart from your choice of version control system and has little to do with Git, as far as I can tell


>I don't see any obvious flaws with Git.

Large files and long histories hinder its total dominance in the game and art industries. Because of git's shortcomings, polyrepo is a near necessity, not simply a stylistic choice. LFS is a bolt-on solution that could/should have better support.
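
Even basic LFS usage shows the seams; a sketch:

    git lfs install           # separate per-machine setup step, outside git proper
    git lfs track "*.psd"     # writes a filter rule into .gitattributes
    git add .gitattributes    # which you then have to remember to commit yourself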


> Because of git's shortcomings polyrepo is a near necessity

I'm intrigued by this claim. I've come to the opposite conclusion - that monorepo is near necessity with git because there's no tools for branching/rebasing multiple repos at once.


After using both I can say both have problems. Polyrepos lack tools for working with multiple repos simultaneously, and require more attention to versioning. Monorepos have longer histories, and the large number of objects can hurt performance.

Which one you should use depends on which downsides are less impactful for your use case.


I think he's talking about large files there, which is an issue in Git, btw.

I would use it for storing and syncing libraries of large images for my photography, if it were feasible.


Git stores the diffs in chronological order, doesn’t it? I recall reading about someone doing a commercial implementation where the commits are stored in reverse chronological order. I’d been thinking that was GitHub but I’ve never been able to find the article again.

Git's model (which it copied from monotone IIRC) is not diff-based, it's snapshot-based. That is, commits are not stored as a diff to the previous commit, but as the whole state of the tree plus a pointer to the previous commit(s).

As an optimization, when it packs several objects together in a pack file, it can store objects as a delta to other (possibly unrelated) objects; there's a whole set of heuristics used to choose which objects to delta against, like having the same file name. And yes, one of these heuristics does have an effect similar to "reverse chronological order"; see https://github.com/git/git/blob/master/Documentation/technic... for the details.
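
If you're curious, you can poke at both layers with git's own plumbing; a sketch:

    git cat-file -p 'HEAD^{tree}'                             # a commit points at a complete tree, not a diff
    git verify-pack -v .git/objects/pack/pack-*.idx | head    # inside a pack, some objects are stored as deltas against others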


Git does not store differences between files, but packs (versions of) files (to save disk space) in single .pack files.

> LFS is a bolt on solution that could/should have better support.

Like GitLab's own web IDE, which seems to ignore LFS rules.


Git is extremely user-unfriendly from the command line.

It would also be nice to have a repo that isn't language-agnostic. It's too easy to track non-semantic changes, like white space.


I don't think that's true. I've been using git's CLI since I started using git a few years ago, and exactly zero of my problems with git could've been solved by a different user interface (be it GUI or a "better" designed CLI). Pretty much all of my problems have been with my lacking understanding of the abstractions that git uses to make all of the powerful things it can do possible.

> I don't think that's true.

You are in the minority. It's such a ubiquitous experience, a running joke in the industry. Saying a tool is useful and powerful is fine and good. That sentiment has nothing to do with usability.


Isn't that the same thing? If you need to be taught the underlying abstraction to be able to understand the UI, you've got a textbook example of a leaky abstraction.

After I had to unfuck a repository for the n-th time, I trialed a switch to Mercurial, we switched shortly after. I can count on one hand how many times I've had to intervene in the last few years.


> It would also be nice to have a repo that isn't language-agnostic. It's too easy to track non-semantic changes, like white space.

In my opinion this is a problem with programming languages rather than version control. Namely we mix presentation and representation when using text as our source code. In the case of whitespace we have an infinite number of syntactic presentations which all correspond to the same semantic representation. Tooling has been created to try to deal with this such as code formatters which canonicalize the syntactic presentation for other tools. Git itself even has to deal with this because of platform differences, i.e. LF and CRLF.


> Git itself even has to deal with this because of platform differences, i.e. LF and CRLF.

I loathe this about git. It has caused my team members so many problems, like images marked as changed.

git should either be dumb about content or smart, not secretly in between.
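
As far as I can tell, the usual workaround is to spell it all out in .gitattributes rather than trust the autocrlf guesswork; something like:

    *        text=auto
    *.png    binary
    *.sh     text eol=lf
    *.bat    text eol=crlf

But you shouldn't need that kind of tribal knowledge just to keep images from showing up as modified.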


I tend to think that it's fine, because I remember what it used to be like... what we have now is the "easy" UI!

And they do occasionally put some new stuff in that helps. Like the recent version which adds new commands to split out the two completely different uses for `git checkout` (making/switching branches and reverting files).


Between CVS and git, Canonical released an excellent DVCS called Bazaar that I loved. I think we all ended up with git because of GitHub.

There is zero doubt in my mind that it's Github that "made" Git. Without it, it would be just another one of many DVCSes. Git's value isn't inherent, it's all down to network effects.

Git submodules have lots of problems. I wanted it to work like a symbolic link to another repository so that I could develop both at the same time on the same machine. Just like how pip allows me to install a Python package in editable mode. If I make a change in the submodule, the superproject should automatically see it.

Maybe it's time for a version control system to come up and fix a problem no one knew they had.

How about the fact that history tracking for file renames doesn't work well? It's hit and miss whether git blame --follow works.

Subversion did this better even before Git existed.

Git is mostly oblivious to moving code from one file to another. Git blame won't, by default, show you the original commit if you just relocated a method to another file. Because of this, refactoring often puts additional hurdles in the way of exploring the code history.
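
(There are opt-in flags that try; a sketch, with the path made up:

    git log --follow -p -- src/widget.c    # follow a single file across renames
    git blame -C -C -C src/widget.c        # also search other files for where moved lines came from, slowly

but they're off by default and the results are still hit and miss.)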


Yes -- that's what it means to be a flaw with the entire current generation of version control systems.

"Need a centralized server" wasn't specific to one (pre-DVCS) system, either.


That's true, but it's possible to imagine a system superior to git solving this in someway, in the same way that git solves distributed merges.

cherry picking is a shitshow!

i would love to be able to use a patch-based VC.


You can still generate and apply patches with Git.
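
For example (branch and patch file names assumed):

    git format-patch origin/master        # one .patch file per commit ahead of upstream
    git am 0001-some-change.patch         # apply it elsewhere, preserving author and message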

I believe GP is referring to Darcs/Pijul-style patches, rather than Git "patches," which are really just diffs. See https://en.wikibooks.org/wiki/Understanding_Darcs/Patch_theo...

yep, nemo got it right. git basically hacks cherry-picks in the same way previous VCS’s hacked branches.

unfortunately there are no patch-theory based VCS’s with a practical level of usability. what git was to monotone, X is to darcs/pijul, where X hasn’t been created yet.


> I don't see any obvious flaws with Git.

Merge conflicts.


Merge conflicts are not so scary, and are an elegant way to handle distributed changes with simultaneous edits to a single file.

If the conflict is huge, rebasing can help you by "playing" the commits from one branch one at a time so the conflicts are smaller / easier to fix.
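
A sketch, with branch names assumed:

    git checkout feature
    git rebase master      # replay feature's commits onto master one at a time, resolving smaller conflicts as you go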

At a previous job, a team was forced to use checkout-style VCS due to their manager's unfounded fear of merge conflicts; I couldn't go in that office without hearing one developer shout to another: "Hey, can you finish up and check in that file so I can get started on my changes?"


I’ve spent too much time helping others fix bad merges, and I still catch myself making mistakes. There’s a lot of work that could be done for clarity and error avoidance.

Sounds like something that would have been solved by using Darcs instead, honestly.

No VCS will be merge conflict-less without locking files.

I'd rather not lock files.


I'd rather lock files, but be able to unlock them, to be able to coordinate with others and proactively reduce merge conflicts.

The only way to make that simple is to centralize the version control system so that you can have a single arbiter of who has what locked. To add easy locking to Git, you'd have to turn it back into a non-distributed VCS.

I don't think I need to sell the value of DVCS over VCS, but what seems to get lost is that buys you a certain amount of essential complexity, expressed in the CAP theorem and its consequences.

We discussed this deeply on the Fossil forum: https://www.fossil-scm.org/forum/forumpost/2afc32b1ab

We came to no easy answers, because there aren't any. You only get a choice of which problems to accept.


I'm talking about purely advisory locks. Accessing a locked file would let you know who locked it, and if you unlock it, it would just notify them. So it's just a communications mechanism in addition to regular merges.

The alternate method is that locking a file marks you as an interested party to a merge, allowing you to review the correctness of a merge.

This would be purely to avoid changes being lost during merges.


Merge conflicts are a part of any VCS that allows two people to edit the same file at the same time.

However, in some cases (like non-mergeable binary files), it is actually better to have a system that allows one user to take a lock on a file and have exclusive editing abilities. The git protocol has no support for those workflows, and so people end up using a Google Doc or something to track who is modifying what file. Definitely a place for improvement.

How do you lock a file in a distributed system? Many people won't be online. Many people won't be on the exact same head.

> Many people won't be on the exact same head.

This consideration is actually irrelevant to locking non-mergeable binary files. It doesn't matter what branch we're on or where the file is located, only that you and I both want to edit the logo. Eventually, either your version must be based on mine, or mine based on yours, since they will be merged.

So it's probably better not to have that file in Git, since it doesn't support the workflow around which Git is based.

It's actually right to store your design documents in Google Docs or a wiki and your code in Git, rather than everything in Git.

It is easy to have one filestore to rule them all and in the darkness bind them, but if you want to do different things with them, you have to do different things with them. I'm not sure that it's possible to unify text file and binary doc based workflows, but it seems we don't have to worry because users automatically use the best tool for the job and it's only hackers who tie themselves in knots trying to make git do everything.


I've worked in teams where developers seemed terrified of merge conflicts, to the point of telling each other not to edit a particular file, which seems absurd. Maybe I don't know any better but they seem like part of life.

When at least one person is making changes that touch large parts of the file, it's very sensible to ask others to not touch it, if you don't want to spend hours merging it manually later.

> Git has a couple obvious flaws, and I expect its successor will be the one that fixes one of them

The successor might very well be git 3.0, though.


> The most obvious (and probably easiest) is the monorepo/polyrepo dichotomy.

Isn't that being addressed in Git with the partial clone functionality?


I’ve read a couple of descriptions of the internals of git that don’t disagree with the design of SVN, so I’m not sure why you can’t theoretically check out a single subtree. Either the documentation is too hand-wavy or some implementation details have blocked that possibility.

Are you aware of git-subtree?

Git subtree is the exact opposite of what I’m talking about.

Also: > The responsibility of not mixing super and sub-project code in commits lies with you.

Is a lie. The responsibility of not mixing code lies with every member of the team. Does the author work alone?


Can you explain more about that? I've worked with both and I feel like monorepo is kinda a pain, but I don't understand where Git fits into either directly. Seems like it just snapshots files.

It's only a problem at a really large scale. At the scale of Microsoft or Facebook, there are factors that lead to the use of a monorepo being more efficient. At that big of a scale, companies have enough resource to develop internal tooling to deal with the problem (e.g. the use of Mercurial at Facebook).

FWIW, in case of Microsoft at least, it's more a question of product size than company size. Microsoft doesn't use a single monorepo for everything, like Google (so far as I know) does - just look at http://github.com/microsoft/; and that's not even counting all the VSO repos! It uses product-specific monorepos for some large products.

It would be nice if you and everyone else stopped gatekeeping this problem.

We have 600 devs and face these problems. I can assure you we sure as hell don't have the resources to spare to reroll git. We're way too busy rerolling everything else.


In this context "monorepo" means a huge repository with many many revisions. Git has several well documented deficiencies in this scenario. See [1] for microsoft's experiences with git.

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...

