Is Git Irreplaceable? (2019)

comex · on Jan 7, 2020

Git's biggest flaw is that it doesn't scale. If a new system can fix that without sacrificing any of Git's benefits, I think it can topple Git.

It's ironic that Git was popularized in the same era as monorepos, yet Git is a poor fit for monorepos. There have been some attempts to work around this. Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction. Microsoft's GVFS is a promising attempt to truly scale Git to giant repos, but it's developed as an add-on rather than a core part of Git, and so far it only works on Windows (with macOS support in development). GVFS arguably has the potential to become a ubiquitous part of the Git experience, someday... but it probably won't.

Git also has trouble with large files. The situation is better these days, as most people have seemingly standardized on git-lfs (over its older competitor git-annex), and it works pretty well. Nevertheless, it feels like a hack that "large" files have to be managed using a completely different system from normal files, one which (again) is not a core part of Git.

There exist version control systems that do scale well to large repos and large files, but all the ones I've heard of have other disadvantages compared to Git. For example, they're not decentralized, or they're not as lightning-fast as Git is in smaller repos, or they're harder to use. That's why I think there's room for a future competitor!

(Fossil is not that competitor. From what I've heard, it neither scales well nor matches Git in performance for small repos, unfortunately.)

kemitche · on Jan 7, 2020

I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

Git's flaws are primarily in usability/UX. But I think for its purpose, functionality is far more important than a perfect UX. I'm perfectly happy knowing I might have to Google how to do something in Git as long as I can feel confident that Git will have the power to do whatever it is I'm trying to do. A competitor would need to do what git does as well as git does it, with a UX that is not just marginally better but categorically better, to unseat git. (Marginally better isn't strong enough to overcome incumbent use cases.)

And for the record: I think git-lfs's issues are primarily usability issues plus missing technical improvements. The technical gaps will be closed if there's enough desire, and as I mentioned, the usability problems are more annoyances than actual problems.

Ace17 · on Jan 7, 2020

I work for a 40-person game studio.

A major limitation of git is how it deals with many "big" (~10MB) binary files (3D models, textures, sounds, etc.).

We ended up developing our own layer over git, and we're very happy; even git-lfs can't provide similar benefits. This technique seems to be commonplace for game studios (e.g. Naughty Dog, Bungie), so certainly git has room for improvement here.

mathw · on Jan 7, 2020

This does not surprise me. Git's original purpose of managing versions of a tree of text files (i.e. the source code of the Linux kernel) pervasively influences it, and I wouldn't expect it to be any good for working with binary files or large files.

If somebody comes up with something that matches Git's strengths and also handles binaries and biggies much, much better then they could definitely topple Git with it. It'd take time for the word to spread, the tools to mature and the hosting to appear, but I can definitely see it happening.

I think most people know that Git isn't perfect, but it's also the case that coming up with anything better is an extremely difficult task. If it wasn't, someone would have already done it. It's not like people haven't been trying.

taftster · on Jan 7, 2020

Do you have tools that can produce diffs from your binary file changes? Or does a change simply replace all the bytes?

I'd argue that if it's the latter, git was never the right choice to begin with. You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

So I don't know if this is a "major limitation" of git per se. Not saying there's a better solution off-the-shelf (you're obviously happy with your home grown). But this was probably never a realistic use for git in the first place.

MaulingMonkey · on Jan 7, 2020

While I can't speak for the person you're replying to, the technology at least exists. Binary diffs are sometimes used to distribute game updates, where you're saving on bandwidth for thousands if not millions of players - which costs enough $$$ to actually be worth optimizing for. On the other hand, between simpler designs and content encryption sometimes being at odds with content compression, just sending the full 10MB is common too. For a VCS - I'd probably be happy enough to just have storage compression - using any of the standard tools on the combination.

> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

Actual changes to content in a gamedev studio are very unlikely to be as small as a single pixel. Changes to source code are unlikely to be as small as a single character either. And we definitely want a record of that 10MB.

We're willing to sacrifice some of our CI build history. Maybe only keeping ~weekly archives, or milestone/QAed builds after a while, of dozens or hundreds of GB - and maybe eventually getting rid of some of the really old ones. Having an exact binary copy of a build a bug was reported against can be incredibly useful.

tyingq · on Jan 7, 2020

I usually see bsdiff or courgette cited as good tools for binary diffs:

http://www.daemonology.net/bsdiff/

https://www.chromium.org/developers/design-documents/softwar...

chrisweekly · on Jan 7, 2020

"Having an exact binary copy of a build a bug was reported against can be incredibly useful."

Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?

MaulingMonkey · on Jan 8, 2020

> Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?

One person's immutable build artifact is another person's vendored build input.

It's common to vendor third party libraries by uploading their immutable build artifacts (.dll, .so, .a, .lib, etc.) into your VCS, handling distribution, and keeping track of which versions were used for any given build. It makes a lot of sense if those third party libraries are slow to build, rarely modified, and/or closed source - no sense wasting dev time forcing them to rebuild it all from scratch.

The next logical step is to have a build server auto-upload said immutable build artifacts into your VCS, for those third party libraries that you do have source code for, when your VCS copy of said source is modified. Much more secure and reproducible than having random devs do it.

And hey, if your build servers are already uploading build artifacts to VCS for third party libraries, why not do so for your own first party build artifacts too? Tools devs spending most of their time in C# probably don't need to spend hours rebuilding the accompanying C++ engine it interoperates with from scratch, for example, so why not "vendor" the engine to improve their iteration times?

This can lead to dozens of gigs of mostly identical immutable build artifacts reuploaded into your VCS several times per day, with QA testing and then integrating those build artifacts into other branches on top of that. The occasional 10MB png is no longer noticeable by comparison.

Nullabillity · on Jan 8, 2020

I can sympathize with the game assets argument, but this problem is just the result of trying to stuff a square peg into a round hole.

Build artifact caching is a different problem from source control, with very different requirements:

1. As you mentioned, the artifacts tend to get huge.

2. The cache needs to be easy to bypass. From your example, it needs to be easy for the C++ engine devs to do builds like "the game but with the new engine" to test out their changes.

3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.

4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).

Git either doesn't care about or fails spectacularly for each of those points. In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.

Nix[0] solves #2 and #3 by caching build artifacts (both locally and remotely[1][2][3]) based on code hashes and a dependency DAG (for each subproject or build artifact, so changing subproject X won't trigger a rebuild of unrelated subproject Y, but will rebuild Z that depends on X). It helps with #4 by performing all builds in an isolated sandbox.
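To make the "code hashes plus dependency DAG" idea concrete, here is a minimal sketch. It is not how Nix actually computes its hashes; the subproject names, the `PROJECTS` table, and the `cache_key` helper are all made up for illustration. Each cache key hashes a subproject's own source together with the keys of everything it depends on, so a change propagates only to dependents.

```python
import hashlib

# Hypothetical subprojects: name -> (source text, names of dependencies).
PROJECTS = {
    "X": ("source of X v1", []),
    "Y": ("source of Y v1", []),
    "Z": ("source of Z v1", ["X"]),
}

def cache_key(name, projects, memo):
    """Hash a subproject's own source plus the cache keys of its dependencies."""
    if name not in memo:
        source, deps = projects[name]
        h = hashlib.sha256(source.encode())
        for dep in sorted(deps):
            h.update(cache_key(dep, projects, memo).encode())
        memo[name] = h.hexdigest()
    return memo[name]

def all_keys(projects):
    """Compute the cache key of every subproject in the table."""
    memo = {}
    return {name: cache_key(name, projects, memo) for name in projects}

before = all_keys(PROJECTS)
# Simulate a code change in X only.
after = all_keys(dict(PROJECTS, X=("source of X v2", [])))

# Anything whose key changed is a cache miss and must be rebuilt.
rebuilt = sorted(name for name in PROJECTS if before[name] != after[name])
print(rebuilt)  # ['X', 'Z']
```

Y keeps its old key, so its cached artifact is still valid, while Z is rebuilt because one of its inputs changed.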

#1 is solved by evicting old artifacts, which is safe as long as you trust #4. If the old artifact is needed again then it will be rebuilt for you transparently. Currently this is done by evicting the oldest artifacts first, but it could be an interesting project to add a cost/benefit bias here (how long did it take to build this artifact, vs the amount of space it consumes?).
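As a rough sketch of what such a bias could look like (this is not an existing Nix feature; the `Artifact` fields and the `pick_evictions` helper are hypothetical), the eviction order can be driven by a pluggable priority function. Oldest-first uses the last-used time, while a cost/benefit policy evicts whatever is cheapest to rebuild per byte of space it frees.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    name: str
    size_bytes: int
    build_seconds: float
    last_used: float  # timestamp; smaller means older

def pick_evictions(artifacts, bytes_to_free, priority):
    """Return names to evict, lowest priority value first, until enough is freed."""
    evicted, freed = [], 0
    for art in sorted(artifacts, key=priority):
        if freed >= bytes_to_free:
            break
        evicted.append(art.name)
        freed += art.size_bytes
    return evicted

cache = [
    Artifact("engine", size_bytes=2_000, build_seconds=3600.0, last_used=1.0),
    Artifact("docs", size_bytes=5_000, build_seconds=30.0, last_used=2.0),
    Artifact("tools", size_bytes=1_000, build_seconds=600.0, last_used=3.0),
]

# Current behaviour as described above: evict the oldest artifacts first.
oldest_first = pick_evictions(cache, 2_000, priority=lambda a: a.last_used)

# Cost/benefit bias: evict what costs the least rebuild time per byte freed.
cheapest_per_byte = pick_evictions(
    cache, 2_000, priority=lambda a: a.build_seconds / a.size_bytes
)

print(oldest_first)       # ['engine']
print(cheapest_per_byte)  # ['docs']
```

With these made-up numbers, oldest-first throws away the hour-long engine build, while the biased policy frees more space by dropping an artifact that takes 30 seconds to rebuild.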

[0]: https://builtwithnix.org/

[1]: https://nixos.wiki/wiki/Binary_Cache

[2]: https://nixos.org/nix/manual/#sec-sharing-packages

[3]: https://cachix.org/

MaulingMonkey · on Jan 9, 2020

Assets and code have mostly the same needs out of a version control system - diffs, history, control over versions, etc. - and there are version control systems which handle both adequately. That said, I'll grant git is quite focused on code version control specifically - and I would not dream of trying to scale assets into it directly.

> 1. As you mentioned, the artifacts tend to get huge.

This, admittedly, is more common with build artifacts. That said, I've hit quota limits with autogenerated binding code on crates.io, where several hundred megs of code was still in the double digits after cargo compressed it better than I could figure out how to with 7-zip.

And that's a small single person hobby project, not a google monorepository.

> 2. The cache needs to be easy to bypass

I need to bypass locally vendored source code frequently as well, to test upstream patches etc.

> 3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.

Also entirely true of source code.

> 4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).

Enshrining built libs in VCS is an alternative tackling of the problem. You might not be able to reproduce that exact build bit-for-bit thanks to who knows what minor compiler updates have been forced upon you, but at least you'll have the immutable original to reproduce bugs against.

> In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.

It's already extremely common - in the name of build stability, including with git - to protect a branch from direct push, and have CI generate and delay committing a merge until it's verified the build goes green. By wonderful coincidence, this is also well after CI has finished building those artifacts - in fact, it's been running tests against those artifacts - so it can atomically commit the source merge + binaries of said source merge all at once. No delay between the two.

There are some caveats - gathering the binaries can be a pain for some CI systems, or perhaps your build farm is underfunded and can only reasonably build a subset of your build matrix before merging. Or perhaps the person setting it up didn't think it through and has set things up such that code reaches a branch that uses VCS libs before the built libs reach the same spot in VCS - I'll admit I've experienced that, and it's horrible.

Nix, Incredibuild, etc. are wonderful alternatives to tackle the problem from a different angle though.

chrisweekly · on Jan 8, 2020

Yeah, I get it. Still seems a stretch to fault git for "failing" to optimize for that inefficient-by-design use case though.

MaulingMonkey · on Jan 9, 2020

To be fair, I mostly don't fault git for failing to optimize that far, even if there are alternatives that do. That's far enough outside the core use case for myself and those I know that I'd be willing to sacrifice it for other, more important considerations.

But I'm totally willing to fault git for failing to optimize enough to handle the manual commit cadence of source game assets though. Because that's not just a tertiary use case - frequently for coworkers it's their primary use case. The end result is I mostly only use git for personal hobby stuff, where it's a secondary use case and my assets are sufficiently small as to not cause problems.

chrisweekly · on Jan 11, 2020

Right on. Thanks for clarifying -- and for confirming a legitimate complaint based on real-world, personal experience.

smcameron · on Jan 7, 2020

> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?

Ideally, yes, why wouldn't I? I want to capture the exact state of the thing at each change.

taftster · on Jan 8, 2020

I kind of phrased that poorly. I should have added the context of "in git". Saving a new 10MB file every time you change it, as per my original premise, is not something that git was really designed for. It's asking a screwdriver to do the work of a hammer.

I totally get the use case of saving each iteration of that 10MB file _somewhere_. But expecting git to do that job is not the right level of expectation, was my main point.

When I have worked with binaries like those described, I place a URI reference to a file hash in the source and have something that knows how to resolve it. Picture a file store (think S3 or whatever) that has files named texture1.dat-[sha1], where you change the reference to the file in the source; i.e. a "poor man's" version control by way of file naming conventions. Does this approach work in your world?
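A minimal sketch of that naming convention, assuming a local directory stands in for the file store (S3 or whatever) and with `store` and `resolve` as made-up helper names:

```python
import hashlib
import tempfile
from pathlib import Path

def store(blob_dir: Path, name: str, data: bytes) -> str:
    """Save data as '<name>-<sha1>' and return that reference string."""
    digest = hashlib.sha1(data).hexdigest()
    ref = f"{name}-{digest}"
    path = blob_dir / ref
    if not path.exists():  # identical content is only stored once
        path.write_bytes(data)
    return ref

def resolve(blob_dir: Path, ref: str) -> bytes:
    """Load the bytes behind a reference and check them against the hash in its name."""
    data = (blob_dir / ref).read_bytes()
    expected = ref.rsplit("-", 1)[1]
    if hashlib.sha1(data).hexdigest() != expected:
        raise ValueError(f"corrupt blob for {ref}")
    return data

# A temporary directory plays the role of the remote file store.
blob_dir = Path(tempfile.mkdtemp())

# Each edit of the texture yields a new reference; only this short string
# would be committed to source control, not the file contents.
ref_v1 = store(blob_dir, "texture1.dat", b"pixels v1")
ref_v2 = store(blob_dir, "texture1.dat", b"pixels v2")
print(ref_v1 != ref_v2)  # True
```

Checking out an old commit then means reading the old reference from the source and calling `resolve` on it, which is roughly the division of labour git-lfs pointer files provide.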

DaiPlusPlus · on Jan 7, 2020

Aren’t game studios and other creative studios meant to use “asset management” systems instead for their large binaries?

Diffing a PSD as a binary is impossible - whereas proper asset management tools will deconstruct the PSD’s format to make for a human-readable diff (e.g. added/removed layers, properties, etc).

lodi · on Jan 7, 2020

Separate version control for code vs assets leads to a world of pain. Also you can use whatever diff tool you want; doesn't have to be the built-in textual diff.

rustybolt · on Jan 7, 2020

Jup, I experienced this too (not at a game studio though, and the team I worked with wasn't nearly experienced enough to write a layer over git). When we switched to a new version of the git gui it would stop working because when you click through the GUI, it would perform some git operations that were supposed to run fast. I filed an issue that quickly got shot down with 'wontfix, your repo is too large and git is not for binary files'.

AndriyKunitsyn · on Jan 7, 2020

The usual technique of game studios is using Perforce. It is clunky and sometimes straight up infuriating, but it handles large files well.

erikbye · on Jan 7, 2020

Why did you not use Plastic SCM or Perforce?

1auralynn · on Jan 8, 2020

I can't recommend Plastic enough - it's so fast and a great UI but super powerful. I've been using it at my company for years.

onion2k · on Jan 7, 2020

Has any studio open sourced what they've built? Or turned it into a product? It seems there could be an opportunity to do something before git solves the problems for that use case.

zeofig · on Jan 7, 2020

Yeah, git with binary files is not fun. Have you tried git annex?

j88439h84 · on Jan 7, 2020

What does your layer do?

MaulingMonkey · on Jan 7, 2020

> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I constantly run into git scalability issues as an individual. I don't use any of the UI clients because they all fail hard on mostly-code git repositories. I abandoned my VisualRust port in part because the mere 100MB of mingw binaries involved meant it was using github LFS, which meant CI was hitting github quota limits, and as I wasn't part of the organization - never mind an admin with billing rights - I couldn't even pay out of pocket to raise said quota limits if I wanted to.

I'm not going to inflict git's command line experience - which confounds and confuses even seasoned programmers - on any of the less technical artists that might be employed at a typical gamedev shop, even if git might be able to scale acceptably if locally self-hosted at a single-digit employee shop.

A few dozen or hundred employees? Forget it. Use perforce, even though it costs $$$, is far from perfect, and also has plenty of scaling issues eventually.

pja · on Jan 7, 2020

The fact that you had a problem with github quotas isn’t really a problem with git though, is it?

MaulingMonkey · on Jan 7, 2020

The whole reason git lfs exists is to work around git scalability problems. Its raison d'etre is problems with git.

That one of - if not the - most popular tool to solve said git scalability problems, also has scalability problems in practice, is both ironic - and absolutely a problem with the git ecosystem. To be pithy - "Even the workarounds don't work."

"Technically", you might say, "that specific symptom with git lfs, and that service provider, isn't the fault of git the command line tool, nor the git protocol". And you would be technically correct - which is the best kind of correct.

But I don't think we're referring to either of those particularly specific things with "Git" when we ask the article's question of "Is Git Irreplaceable?". I'm already the weirdo for using git the command line tool - most of my peers use alternative git UI clients, and I don't mean gitk. The git protocol is routinely eschewed in favor of zips or tarballs over HTTPS, Dropbox, Sneakernet, you name it - and is invisible enough to not be worth complaining about to pretty much every developer who isn't actively working on the backend of a git client or server. Not to mention it's been extended/replaced with incremental improvements over the years already.

So I'm using a slightly broader definition of "git", inclusive of the wider ecosystem, that allows me to credit it for the alternative UI clients that do exist, rather than laughing off the question at face value - as something that has already been replaced.

pja · on Jan 7, 2020

Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.

Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data. You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.

None of this is a problem with git, be it GUI git clients or command line ones.

This isn’t just "technically correct". It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.

MaulingMonkey · on Jan 7, 2020

> Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.

All the commercial service providers recommend keeping total repository sizes <1GB or so, and I hear nothing but performance complaints and how much they miss perforce from those who foolishly exceed those limits, even when self hosting on solid hardware - which is 100% the fault, or at least limitation, of git - I believe you'll agree.

LFS is a suggested alternative by several commercial service providers, not just one, and seems to be one of the least horrible options with git. You're certainly not suggesting any better alternatives, and I really wish you would, because I would love for them to exist. This results in a second auth system on top of my regular git credentials, recentralization that defeats most of the point of using a DVCS in the first place, and requires a second set of parallel commands to learn, use, and remember. I got tired enough of explaining to others why you have a broken checkout when you clone an LFS repository before installing the LFS extension, that I wrote a FAQ entry somewhere that I could link people. If you don't think these are problems with "git", we must simply agree to disagree, for there will be no reconciling of viewpoints.

When I first hit the quota limits, I tried to set up caching. Failing that, I tried setting up a second LFS server and having CI pull blobs from that first when pulling simple incremental commits not touching said blobs. Details escape me this long after the fact - I might've tried to redirect LFS queries to gitlab? After a couple hours of failing to get anywhere with either despite combing through the docs and trying things that looked like they should've worked, then I tried to pay github more money - on top of my existing monthly subscription - as an ugly business-level kludge to solve a technical issue of using more bandwidth than should really have been necessary. When that too failed... now you want to pin the whole problem on github? I must disagree. We can't pin it on the CI provider either - I had trouble convincing git to use an alternative LFS server for blobs when fetching upstream, even when testing locally.

I've tried gitlab. I've got a bitbucket account and plenty of tales of people trying to scale git on that. I've even got some Microsoft hosted git repositories somewhere. None of them magically scale well. In fact, so far in my experience, github has scaled the least poorly.

> Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data.

I pay github, and tried to pay github more, and still had trouble. Dispense with this "free storage" strawman.

> You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.

To be clear - I was also unable to pay to increase LFS storage on my fork, because they still counted against the original repository. Is this specific workaround for a workaround for a workaround failing, github's fault? Yes. When git and git lfs both failed to solve the problem, github also failed to solve the problem. Don't overgeneralize the one anecdote of a failed github-specific solution, from a whole list of git problems, to being the whole problem and answer and it all being github's fault.

> None of this is a problem with git, be it GUI git clients or command line ones.

My git gui complaints are a separate issue, which I apparently shouldn't merely summarize for this discussion.

Clone https://github.com/rust-lang/rust and run your git GUI client of choice on it. git and gitk (ugly, buggy, and featureless though it may be) handle it OK. Source Tree hangs/pauses frequently enough I uninstalled, but not so frequently as to be completely unusable. I think I tried a half dozen other git UI clients, and they all repeatedly hung or showed progress bars for minutes at a time, without ever settling down, when doing basic local use involving local branches and local commits - not interacting with a remote. Presumably due to insufficient lazy evaluation or insufficient caching. And these problems were not unique to that repository either, and occurred on decent machines with an SSD for the git UI install and the clone. These performance problems are 100% on those git gui clients. Right?

> This isn’t just "technically correct".

Then please share how to simply scale git in practice. Answers that include spending money are welcome. I haven't figured it out, and neither has anyone I know. You can awkwardly half-ass it by making a mess with git lfs. Or git annex. Or maybe the third party git lfs dropbox or git bittorrent stuff, if you're willing to install more unverified unreviewed never upstreamed random executables off the internet to maybe solve your problems. I remember using bittorrent over a decade ago for gigs/day of bandwidth, back when I had much less of it to spare.

> It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.

If it were one company not providing a specific commercial offering to solve a problem you'd have a point. No companies offering to solve my problem for git to my satisfaction, despite a few offering it for perforce, is what I'd call a git ecosystem problem.

pja · on Jan 8, 2020

No one is saying git doesn’t have problems. It's just weird that you keep on conflating issues with Github with issues with git.

MaulingMonkey · on Jan 9, 2020

I'm conflating at most one github specific issue (singular), not "issues". And I'm doing so because it's at best a subproblem of a subproblem of a subproblem.

If my computer caught fire and exploded due to poor electrical design, you wouldn't say "nothing about your problems had anything to do with your computer and everything to do with the specific company that provided your pencils" when in my growing list of frustrations I offhandedly mentioned breaking a pencil tip after resorting to that, what with the whole computer being unavailable and all. That would be weird.

Even if we did hyper focus on that pencil - pretty much every pencil manufacturer is giving me roughly the same product, and the fundamental problem of "pencils break if you grip them too hard" isn't company specific. It's more of a general problem with pencils.

Github gave me a hard quota error. Maybe Gitlab would just 500 on me, or soft throttle me to heck to the point where CI times out. Maybe Bitbucket's anti-abuse measures would have taken action and I'd have been required to contact customer support to explain and apologize to get unbanned. git lfs's fundamental problem of being difficult to configure to scale via caching or distribute via mirroring isn't company specific. It's more of a general problem with git lfs. Caching and mirroring are strategies nearly as old as the internet for distribution - git lfs should be better about using them.

It would've turned github's hard quota error into a non-event, non-issue, non-problem - just like they are with core git. Alternatively, core git should be better about scaling. Or, as a distant third alternative, I could suggest a business solution to a technical problem - GitHub should be better about letting me pay them to waste their bandwidth. Then I could work around git's poor scaling for a little bit more, for a bit longer.

icebraining · on Jan 7, 2020

Not wanting to provide you with free storage is not a "scalability problem". I can't spend company money on Perforce either, is that a Perforce problem?

MaulingMonkey · on Jan 7, 2020

I pay for a github subscription. I set out to pay more for a github quota bump, but found I was limited by upstream's LFS quota rather than my fork's LFS quota.

signal11 · on Jan 7, 2020

> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I recollect that for Windows (which also uses git), MS have actually extended git with "Git Virtual File System" rather than replace it[1]. But I do agree that broadly, not everyone needs the scale.

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o...

rtpg · on Jan 7, 2020

Scaling isn't even just about number of files or size of them. A problem I've hit is just in having cross-repo stuff work well. Monorepos are helpful partly because git submodules are not ideal for good workflows, and splitting stuff across multiple git repos can backfire (it doesn't help that almost all the tooling around CI and the like is repo-based instead of project based).

I would love a layer over Git to handle workflow issues related to multi-repo projects.

nsomaru · on Jan 7, 2020

Submodules have terrible UX and are completely counterintuitive. Subtrees work well for my small repos when I need to vendor something in.

nopurpose · on Jan 7, 2020

There is also git subrepo, from the subtree author; worth checking out IMHO.

chme · on Jan 7, 2020

> I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).

I would say that the sole thing git was developed for, the Linux Kernel, is (starting to be) painful to work with when using git.

taftster · on Jan 7, 2020

The Linux Kernel is big, but it's not likely as big (in terms of lines of code or pick your metric) as Google or Microsoft repositories. Maybe the kernel is just starting to feel that pain?

Honestly asking.. Do you speak from some level of authority that the Linux kernel is stretching the boundaries of git? Or are you just saying that more speculatively? What is the painful part?

jeltz · on Jan 7, 2020

Maybe I am a weirdo but I have always thought that git's UI is very intuitive (with some exceptions like submodules). SVN on the other hand was an unintuitive mess where I had to look up commands all the time.

jMyles · on Jan 7, 2020

Agreed - I had been looking around this thread like, "<slow blink> - surely I'm not the only one that finds git to be a rewarding exercise in teamwork?"

jariel · on Jan 7, 2020

Fully agree.

The magic sweet spot might be the fact that most projects do not need to be distributed. This is where a lot of complexity is derived.

So without all those extra concerns - and - a more elegant UI framework (i.e. rational commands) - and possibly something that scales a little better. That's enough mojo to unseat git for a lot of things.

james_s_tayler · on Jan 7, 2020

I'd say the number of git repos on Earth that would encounter problems of that nature would be a vanishingly microscopic minority. Sure, it's a problem for those companies but it's not a problem for anyone else.

hamandcheese · on Jan 7, 2020

Vanishingly small in number, but quite significant in terms of the number of developers working in them.

Polylactic_acid · on Jan 7, 2020

All of the organizations that have outgrown git will have such incredibly specific requirements that nothing but a custom built tool will work for them.

rumanator · on Jan 7, 2020

The so called problem would also vanish if the monorepo was modularized and broken up into smaller repos.

adrianN · on Jan 7, 2020

That's probably a harder task for an existing monorepo that is too big for git than writing a replacement for git that works with a repo of that size.

rumanator · on Jan 7, 2020

The problem boils down to refactoring a large monolith. I feel like Git is a scapegoat for a much larger problem.

neerajsi · on Jan 7, 2020

Let's say you started with a well factored set of code that is managed within your organization. What advantage is there to having multiple repos if you're not limited by your tools? Refactoring is easier within a single repo...

adrianN · on Jan 7, 2020

In my experience, code doesn't stay well factored unless there are technical hurdles that keep it so. That of course doesn't have to be a repo boundary, but it can be.

neoeldex · on Jan 7, 2020

There probably will be a plethora of different hard issues to fix in such situations. It's also easier to institute change in a dictatorship as opposed to a democracy (being a dictator that is :).

alexhutcheson · on Jan 7, 2020

Objectively not true. Many (most?) are just using Perforce in an off-the-shelf configuration.

TheFuntastic · on Jan 7, 2020

This reads to me as a failure of imagination. Any mid size game development shop is going to feel this pain - not just giants like Microsoft and Google. I believe the Unity Game Engine has a user base in the millions? Even a subset of that may be small in comparison to the entire developer population but by no means microscopic.

yaih · on Jan 7, 2020

"Barney Oliver was a good man. He wrote a letter one time to the IEEE. At that time the official shelf space at Bell Labs was so much and the height of the IEEE Proceedings at that time was larger; and since you couldn't change the size of the official shelf space he wrote this letter to the IEEE Publication person saying, since so many IEEE members were at Bell Labs and since the official space was so high the journal size should be changed."

- http://www.paulgraham.com/hamming.html

comex · on Jan 7, 2020

What is the analogy here? The first guess that came to my mind was the monorepo versus multi-repo debate: since Git can only support repos that are so large (shelf space) without getting slow, you should split up your repos (journal size) even if semantically you would prefer a monorepo. But that would support the point I was making, whereas the obliqueness of your reply makes me think you probably meant to criticize it.

ric2b · on Jan 8, 2020

I think he's comparing the journal to the tool (harder to change, impacts everyone) and the shelf to the problem that only impacts a few organizations but actually a lot of people because those organizations are so large.

comex · on Jan 9, 2020

I guess that makes sense. But if that's the analogy, there's a significant difference between the situations. In that example, there was nothing inherently wrong with the journal's size, other than it not matching Bell Labs' arbitrary choice of shelf layout. Git, on the other hand, would be inherently a better tool if it had better performance on large repos (without sacrificing its suitability for small repos).

kccqzy · on Jan 7, 2020

Mercurial is probably that competitor. Only slightly slower than Git. Works on very large monorepos (as large as Facebook's or Google's monorepo). Very similar workflow as compared to Git, with some minor differences in terminology.

reissbaker · on Jan 7, 2020

As an FB employee, I use hg regularly (because it is required). I would not use it as a git replacement for non-FB-sized repos. It has some weird design choices (e.g. branching is bad), and it very often requires manual intervention for merges that git performs correctly and automatically.

You can get around branching-is-bad by changing your workflows a bit, but you can't get around the bad merges: over time it's like death by a thousand papercuts.

GuB-42 · on Jan 8, 2020

What is so bad about mercurial branching? The underlying structure is the same as git: a directed acyclic graph, the only real difference is how branches are named.

Mercurial has 3 ways of doing branching:

- bookmarks: these are like git branches, a pointer to a revision

- branches: when you are in a branch, all commits are permanently affixed with that branch name. Less flexible than bookmarks (and therefore git branches) but good for traceability

- heads: unlike with git, a branch name can refer to several actual branches, it usually happens when you are pulling from a central repository, but you can create them yourself if you need some kind of anonymous branching. These can be pushed but it is not recommended.

Git only has the first option.

The way central repositories are managed is also a bit different even if the fundamentals are the same. Git has the "origin" namespace to distinguish remote branches from local branches. Mercurial uses a "phase" which can be "public" (in remote), "draft" (local only, will become "public" after a push) and "secret" (like "draft", but will not be pushed and therefore will not become "public"). So if you are not synchronized with the remote, in git you will have two branches: origin/my_branch and my_branch, in mercurial, you will have two branches named my_branch, one public, one draft. That's essentially the same thing, presented differently.

In the end, they are fundamentally the same. The feel is different though. Git is flexible, and gives you plenty of tools to keep things nice and clean when working with large, distributed projects, as expected for something designed for the Linux kernel. Mercurial focuses on preserving history, including the history of your mistakes, and I feel it is better suited for managed teams than a loosely connected community.

reissbaker · on Jan 21, 2020

Specifically I meant that branching is less flexible. Bookmarks are better than Mercurial branches (and at FB it's what we use instead), but even with bookmarks there are gotchas compared to git. For example:

* Pushing them is (slightly) more annoying than pushing git branches: you need a separate command, whereas `git push` just does the right thing by default

* Deleting them doesn't delete the corresponding commits

* There is only one global namespace shared across all remotes

hvidgaard · on Jan 7, 2020

What kind of merges does git handle that hg doesn't? If it's just a matter of figuring what goes where, someone that uses hg daily could copy the implementation from hg. It could be a big organization that uses it daily, for instance.

kccqzy · on Jan 7, 2020

Paper cuts can be addressed with more users reporting bugs and contributing fixes. The fundamental design issues with git that prevent scalability cannot.

As a Google employee I use hg every day, even though it's not required. (Some teams at Google do mandate its use, but these are few and far between.) I don't use branches, but I use bookmarks. I didn't notice any merges that really ought to be performed automatically but were not; in any case I use Meld to resolve merge conflicts and it's easy enough to do occasionally.

reissbaker · on Jan 21, 2020

For most people, avoiding thousands of papercuts is better than scaling massively. Few people need massive scale, but everyone hates papercuts.

I'm also not certain that the "fundamental design issues" with git are truly fundamental to its design. For example, partial clones and sparse checkouts are seeing increasing support in recent versions of git — and those are really all you need.

muxator · on Jan 7, 2020

You can always strip a bad committed merge, abort a bad uncommitted one, and perform it again (maybe with different tooling).

Normally mercurial stops when there are conflicts it cannot resolve reliably. In those cases, give kdiff3 a try: it handles hairy merges quite well. In a lot of cases even automatically (and correctly).

There is always meld, but I'd say kdiff3 is superior wrt merge conflict resolution.

anton_gogolev · on Jan 7, 2020

> ...weird design choices (e.g. branching is bad)

What bothers you in particular?

tomesch1982 · on Jan 7, 2020

The branching in hg is actually way better than with git. The reason you are confused is probably because you learned the wrong (git) way of branches.

If you want the wacky and unreliable git branching you can use hg bookmarks.

tomesch1982 · on Jan 15, 2020

For example: https://stackoverflow.com/questions/36358265/when-does-git-r...

This page has been viewed 230 thousand (!!) times. Because git is so easy and elegant that it lies to you what branches exist on the remote.

It is not even funny any more how bad this is.

brachi · on Jan 7, 2020

> Only slightly slower than Git

That's interesting. In your examples isn't it fast because monorepos are network-based, as in, you only fetch what you need when you need it?

Also reminded me of discussions around CPython's startup time and how one use case where milliseconds matter is in small cli utilities such as Mercurial.

kccqzy · on Jan 8, 2020

What I mean is daily operations on the repo like viewing a diff, committing, amending, checking out a different commit, etc. Without doing precise measurements, I would tend to think that it's mostly caused by the slowness of CPython, as compared to a C executable (Git).

The entire repo is stored on a networked file system. So essentially every file operation is remote. That doesn't actually contribute to much slowness because when I didn't use hg, operations were noticeably faster.

mikl · on Jan 7, 2020

Git scales well enough for almost everyone (especially if you have a little discipline with what you put in the repo).

It’s only huge megacorps that need larger scale things like GVFS.

As for large files, that is not what Git is for. Git is for source code. Much like how you don’t put large files in your RDBMS, you should not be putting them in your SCM either.

int_19h · on Jan 7, 2020

What if you need to version them? Git imposes a very specific versioning model: version is a property of the entire repository. Thus, not including some file in the repo implies that it's not versioned in the same manner. It's not just a function of binary vs source.

mikl · on Jan 8, 2020

Versioning big binary blobs is not what Git was designed for. It’ll do fine with smaller assets like icons and the like, but its data model is based on everyone using the repo having a local copy of the full repository history. You can’t easily purge old data. That scales poorly if you want to use it for audio/video files or other large data sets.

You can still do it if you want, but you might be better served using https://git-lfs.github.com/ or using another system designed for that purpose.

P_I_Staker · on Jan 8, 2020

Honestly, you can use Git for large files with lfs. I wouldn't say I love this approach, but it isn't that bad now. You do have to make room for yet-another-tool, and you now have centralized version control commingling with your distributed tool (essentially making it central); but you can still use everything you love about git, and if your lfs doesn't change, you don't need to be connected to a server. It certainly feels pretty absurd. This isn't even a problem in SVN, but now we're tacking on another tool that you have to learn, and that introduces issues.

squiggleblaz · on Jan 7, 2020

The only way I've ever successfully `git clone`d my work repo is from another locally connected device. Even with shallow and then gradually unshallowing it, it will not generally complete before the internet falls over.

Nowadays, a new computer means a git clone (or just plain copy-paste) of a USB stick from the old one. This seems like it's a single feature of git that could be written, but if you told me "there's something that works better for large, twenty year old repos", I'd probably take that.

I don't know how Linux survives, but maybe it's just that you only rarely git clone your large repos. (Or maybe it's just that intercontinental internet is less reliable than intracontinental, so that if you're in the US it's a non issue.)

orf · on Jan 7, 2020

I'm guessing your work repo has lots of large binary files in its history?

dmitshur · on Jan 9, 2020

> Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction.

Could you please provide a link to it? I’m very interested in seeing this command, but ironically it’s not a name that’s easy to google for.

Edit: I was very wrong, searching for “google repo command” displayed https://gerrit.googlesource.com/git-repo as the very first result.

qznc · on Jan 7, 2020

I believe there will be no scalable open-source VCS because the incentives are not there. While the technical problem is interesting, I decided not to work on it because of this. http://beza1e1.tuxen.de/monorepo_vcs.html

tasuki · on Jan 7, 2020

> I worry that Git might be the last mass-market DVCS within my lifetime.

The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings. Git is simple and elegant, though its interface might not be.

alkonaut · on Jan 7, 2020

I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.

For example, a typical question on Stackoverflow is "How do I answer which branch this branch was created from", which always has 10 smug answers saying "You can't because git doesn't really track that, branches are references to commits, and what about a) a detached head? b) what if you based it off an intermediate branch and that branch is deleted? c) what if..."

5 more answers go on to say "just use this alias!" [answer continues with a 200 character zsh alias that anyone on windows, the most common desktop OS, has no idea what to do with].

I don't want to write aliases. I usually don't want to consider the edge cases. If I have 2 long lived branches version-1.0 and master. I want to know whether my feature branch is based on master or version-1.0 and it's an absolute shitshow. Yes it's possible, but is it simple? Is it elegant? No.

The 80/20 (or 99/1) use case is

- centralized workflow.

- "blessed" branches like master and long lived feature branches that should ALWAYS show up as more important in history graphs.

- short lived branches like feature branches that should always show up as side tracks in history graphs.

Try to explain to an svn user why the git history for master looks like a zigzag spiderweb just because you merged a few times between master and a few feature branches. Not a single tool I know does a nice straight (svn style swimlane) history graph because it doesn't consider branch importance, when it should be pretty simple to implement simply by configuring what set of branches are "important".

Ntrails · on Jan 7, 2020

As a very basic git user, about once a month my local git repository will get into a state I cannot fix. I cannot revert, cannot reset, cannot make it just fucking be the same as origin/master. Usually I accidentally committed to local master and then did a couple other things and it's just easier to blat and re-clone than work out how to resolve.

Git is hard for idiots imo, and there are a lot of us

rjsw · on Jan 7, 2020

The workflow that I have found works best is to just not change anything myself, I let other people do all the work. This way I can be sure that I won't get merge conflicts that I can't fix.

vvillena · on Jan 7, 2020

> Usually I accidentally committed to local master and then did a couple other things

Create a new branch and check it out while you are on the last commit (git checkout -b my-branch), delete the master branch (git branch -D master), and pull it again (git pull -u origin master). You'll end up with a local branch with a bunch of commits that you can merge, rebase or cherrypick, depending on what you want.
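A minimal sketch of that recipe in a throwaway repository, for anyone who wants to try it without risking real work. The names `origin.git`, `work`, and `my-branch` are made up for the demo, and a local bare repository stands in for the real remote. One deliberate swap: rather than `git pull`, which merges into whatever branch is checked out (here `my-branch`), the sketch recreates master with `git fetch` followed by `git branch master origin/master`.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# A bare repository plays the role of "origin" (demo assumption).
git init -q --bare origin.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/origin.git"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git push -q origin master

# The accident: a commit lands on local master instead of a feature branch.
echo oops >> file.txt
git commit -q -am "accidental commit on master"
accidental=$(git rev-parse HEAD)

# 1. Keep the work under a new name and switch to it.
git checkout -q -b my-branch
# 2. Delete local master (allowed because it is no longer checked out).
git branch -q -D master
# 3. Recreate master from the remote-tracking branch.
git fetch -q origin
git branch -q master origin/master

git log --oneline --all --decorate
```

Afterwards `my-branch` holds the accidental commit and `master` matches `origin/master`, so the work can be merged, rebased, or cherry-picked from there.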

If you want to learn more about git in a practical way, there's an awesome book called Git Recipes.

brigandish · on Jan 7, 2020

That is not easier than "blat and re-clone" so I think you're proving their point.

vvillena · on Jan 7, 2020

Recloning means redoing your work on top of a potentially different codebase. The approach I described is how you "blat and reclone" using git instead of the filesystem, and it has the clear advantage of keeping everything in the same repo. You can then mix all the code together in whatever way you prefer.

Git is a very flexible tool that allows for individual local workflows independent of how teams collaborate. Finding a personal workflow that works for you is a little investment that pays huge dividends for a long time. Git is a 15 year old tool that is expected to live for 10-30 years more at the very least. I encourage everyone to learn enough Git to not be afraid of it.

munmaek · on Jan 7, 2020

If you’re just going to blat and reclone, then you may as well use a folder system instead of git.

I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.

mannykannot · on Jan 7, 2020

I did not read brigandish's comment as advocacy of 'blat and reclone', I took it to be a comment on git's awkwardness.

I have considerable sympathy for the RTFM reply, but I do not think it is the last word that shuts down any question of usability. What seems clear to me is that there are a lot of people using git who probably should not be. In many cases, they do not have a choice, but I also suspect that many of the organizations that have chosen git do not have the issues that it is optimized for.

brigandish · on Jan 7, 2020

> I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.

In my opinion, solving problems and making improvements involves reducing complexity, not defending it. Many people, including myself, have read the Git docs and learnt about the underlying data structures etc etc and still we can make the claim that it could be better, in numerous ways.

Calling everyone feckless won't invalidate that.

munmaek · on Jan 7, 2020

> we can make the claim that it could be better

I’m not disputing this. Of course git isn’t perfect.

What I’m against is changing git to cater to people who can’t read the manual and make basic mistakes.

antris · on Jan 10, 2020

> What I’m against is changing git to cater to people who can’t read the manual and make basic mistakes.

Why? Isn't software that doesn't require reading a manual and doesn't let the user make irreversible mistakes considered good design?

munmaek · on Jan 10, 2020

Not if it means reducing capabilities of the program in order to add bumper guards.

I can’t think of any software that handles a complex problem that doesn’t have a manual, documentation like a manual, or a learning curve. Git is a tool for developers, not casual users who want typical apps.

Again, you wouldn’t make an argument like this for a tool used by a plumber or a mechanic. If a tool succinctly handles a problem, good! But using tools is part of the profession; they have learning curves.

Most issues with git are PEBKAC issues because people refuse to spend 10 minutes of their life reading about a tool they may use for hundreds or thousands of hours. I wouldn’t want to cater to those kinds of people.

antris · on Jan 10, 2020

Software can cater to multiple types of uses at the same time. You can have a learn-as-you-go experience while keeping your powerful tools that enable more fine-tuned or complex tasks. Easy-to-use vs. powerful is a false dichotomy.

About the plumbing/mechanic analogy, I totally would make the same case! Hammers and wrenches don't require a manual and can be used for very complex tasks, and that's exactly what makes them so well designed and popular. Few people want their hammer to have more features, and if they do, they still want to keep the good old hammer ready, because it's so easy and simple to use.

Especially calling out PEBKAC (Problem Exists Between Keyboard And Chair) - while even most of the expert git users, including the author himself, say the interface could at least be made much better - makes me really suspicious that you simply like feeling superior to other people because you know something they don't, and you don't want to lose your "edge" if suddenly everyone can use version control without resorting to manuals.

munmaek · on Jan 10, 2020

> Easy-to-use vs. powerful is a false dichotomy.

iMovie vs Premiere/Final Cut. Final Cut X vs 7. Garageband vs Pro Tools. Word vs LaTeX. and so on. It's very difficult to design interfaces that are easy enough for average users that don't impede pros/power users.

> hammer

A hammer isn't a good comparison. Something like a multimeter is what I was thinking of, etc. Git solves a significantly more complex problem than either of these, though.

> including the author himself say the interface could at least be made much better

I don't disagree! Git's interface -could- be better. That has nothing to do with my points above with regards to people refusing to read basic literature about the tools they use, expecting them to just magically do everything for them out of the box, "intuitively".

> feeling superior ... you don't want to lose your "edge"

This could not be further from the truth. I simply have no sympathy for people who refuse to read the manual or an intro to using a tool, and then complain about the tool being hard to use. Yeah.. it's hard because you didn't do any reading! Git is actually really easy if you read about the model that it uses. Most people don't need to venture out beyond ~5-6 subcommands, and even then it's easy to learn new subcommands like cherrypick, rebase, etc.

Adobe Photoshop, as another example, has a learning curve, but that tool is indispensable for professionally working on / editing images. (GIMP is also good, but that's not in the scope of this discussion). A lot of beginner issues are basically PEBKAC because they didn't read the manual. Same with Pro Tools, or probably any other software used by industry professionals. They're harder to use but what you can do with them (since they treat you like an adult, instead of holding your hand and limiting you) is incomparable to the output of apps designed for casual users.

brigandish · on Jan 10, 2020

What can Git do that Fossil or Mercurial can't?

Ntrails · on Jan 7, 2020

Deleting master is a thing that would _actually_ never have occurred to me! A neat trick

That being an example that I remember from late last year, there are just sharp edges to git I end up catching myself on :)

vvillena · on Jan 8, 2020

The git "master" branch is just an example of 'convention over configuration': some commands use it as the default argument, just like "origin" is the default remote name. Nothing in git is special or sacred! :)

raducu · on Jan 7, 2020

Is there a reason for -D and not -d? Wouldn't -D also delete the remote branch if you accidentally pushed your changes?

vvillena · on Jan 7, 2020

"git branch -d" doesn't remove a branch if that means losing track of some local commits. "-D" doesn't check that. The "git branch" commands only operate on your local repo, they don't push any changes to remote repos and they don't pull commits from anywhere.

WorldMaker · on Jan 7, 2020

`-D` doesn't involve the remote at all (unless you are using something like the intentional remote:branch syntax which this example isn't). It is a force delete in that if there were commits locally in that branch and only in that branch it should still delete that branch. It should be unlikely you need that force because the first step was to branch everything as is, so it is safer to just use `-d`, but if the intention is to "blat" it from orbit anyway, `-D` is that.
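The difference is easy to see in a scratch repository. This is only a sketch under demo assumptions: the repo, the `topic` branch, and the file names are invented, and no remote is configured at all, which also shows that neither flag touches one.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

# A branch with one commit that exists nowhere else.
git checkout -q -b topic
echo work >> file.txt
git commit -q -am "work only on topic"
git checkout -q master

# -d refuses, because deleting would leave the unmerged commit without a branch.
if git branch -d topic 2>/dev/null; then
    safe_delete=succeeded
else
    safe_delete=refused
fi
echo "git branch -d: $safe_delete"

# -D skips that check and deletes the local branch anyway.
git branch -q -D topic
git branch --list
```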

DaiPlusPlus · on Jan 7, 2020

I wish git had a “metahistory” feature to allow everyone to undo anything. A `git revert` isn’t of any help when you’ve already merged and pushed.

jerf · on Jan 7, 2020

It's called "git reflog": https://www.edureka.co/blog/git-reflog/

Though you do have to have committed. One of the things I hammer on in my tutorials for work is that if you get confused in git, make sure you commit. If you commit, you can take your problem to the other engineers and we can almost certainly get you straightened away. Fail to commit, though, and you really may lose something.

Also, metapoint about git: While I won't deny its UI carries along some dubious decisions carried over from the very first design, in 2020, basically, if you think "Git really ought to be able to do [this sensible thing]", it can. It has that characteristic that open source software that has been worked on by a ton of contributors has, which is that almost anything you could want to do was probably encountered by somebody else and solved five years ago. It just may take some searching around to figure out what that is. (And on the flip side, when you read the git man pages and are going "Why the hell is that in there?", the answer may well be "a problem that you're going to have in six months".)

jorvi · on Jan 7, 2020

> make sure you commit

This is not absolute gospel. If you screw up a rebase and commit, whatever you removed in the rebase is simply gone.

jerf · on Jan 7, 2020

Are you saying that with knowledge of what "git reflog" is? I suspect not. I'd really need to see a sequence of commands that removes committed state from the repo to buy this. If you try to produce it, bear in mind the first thing I'm going to do is run "git reflog" on the result, so if you find your committed state is still there, then I'm going to say it's still saved.

(That's not a git thing. I don't really even want some sort of hypothetical source control system that literally tracks every change I make. It's technically conceivable and should be practical to what would at least be considered a "medium sized" project in the git world, but I'd just be buried in the literally thousands of "commits" I'd be producing an hour. Failing that sort of feature, a source control system can't help but "lose" things not actually put into it.)

mannykannot · on Jan 7, 2020

As I am not familiar with the details of reflog (I don't recall ever using it) I took a look at the article. It wasn't long until I reached what looks like a caveat: "This command has to be executed in the repository that had the lost branch. If you consider the remote repository situation, then you have to execute the reflog command on the developer’s machine who had the branch." Joe, who works on another continent, quit last week and his computer was a VM...

OK, so we have backups of his VM and we can recreate a clone of it, but will that be satisfactory? Are there any issues with hardware MAC addresses or CPU ids? How far down the rabbit hole of git minutiae do you have to go before you are confident that you can do all basic source-control operations safely?

jerf · on Jan 7, 2020

No source control system can solve the problem of not having things it was never given.

gallier2 · on Jan 7, 2020

git log --reflog --all

is more important imo than

git reflog

RichardCA · on Jan 7, 2020

The main thing that people fail to understand is that commits are immutable and the overall commit graph is immutable (with the caveat that pathways in the graph that don't end in a branch head are subject to garbage collection).

A rebase does not destroy information. It creates new commits and moves the branch head to a different spot on the graph.

The reason git is seen as painful is because you can't claim expertise until you develop the ability to form a mental map of the graph. But once you do this the lights turn on and everything starts to make sense.

This is why the mantra "commit early and often" still holds. The more experienced git user will tell the newer people this, so when they come with a mess it will always be recoverable.

fulafel · on Jan 7, 2020

That GC is a pretty big caveat! Combined with the fact that unreferenced objects are never pushed.

reflog is like undelete in filesystems, it's a probabilistic accident recovery mechanism for an individual computer (repo checkout in this case) that you can try to use if you don't have backups.

vvillena · on Jan 7, 2020

In any case, git garbage collection isn't a common phenomenon. It usually triggers every few weeks, even in repos with high activity. The chance of hitting a GC that deletes an untracked commit you need is extremely small.

chousuke · on Jan 7, 2020

How would it be gone? You can't rebase onto a dirty working directory so you can't blow away uncommitted changes accidentally and any state prior to and during the rebase is always recoverable via the reflog.

It takes real effort to do anything that's not easily recoverable, as long as you commit before you start messing around and don't rm -rf .git.

rohansingh · on Jan 7, 2020

It's definitely not gone. Just `git checkout ORIG_HEAD` and you'll be back to whatever the tree was before you rebased.
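Here is a small sketch of that in a scratch repo, assuming a plain non-interactive rebase with nothing else touching `ORIG_HEAD` in between. The branch and file names are invented for the demo.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"
before=$(git rev-parse HEAD)

git checkout -q master
echo more > master.txt
git add master.txt
git commit -q -m "master moves on"

git checkout -q feature
git rebase -q master
after=$(git rev-parse HEAD)

# The rebase wrote a new commit; the old one is still in the object store.
echo "tip before rebase: $before"
echo "tip after rebase:  $after"
git cat-file -t "$before"

# ORIG_HEAD still names the pre-rebase tip, so checking it out
# (as a detached HEAD) brings back the old tree.
orig=$(git rev-parse ORIG_HEAD)
git checkout -q ORIG_HEAD
```

To move the branch itself back rather than just look at the old state, `git reset --hard ORIG_HEAD` while on `feature` would be the usual follow-up.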

gallier2 · on Jan 7, 2020

NO. Nothing is gone after commit. You just don't see it.

Make a git log --reflog --all and you will see all the commits you made (or rebase made) in the last 3 months.

You can then resuscitate an old branch by simply putting a branch name on it with git branch newname <old sha1>
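A rough sketch of that rescue in a scratch repo. The branch names `experiment` and `rescued` and the commit message are made up for the demo, and picking the commit out with `grep` is just a convenience here; normally you would read the SHA off the log output yourself.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b experiment
echo idea >> file.txt
git commit -q -am "experimental idea"
git checkout -q master

# Lose the branch: the commit is now unreachable from any branch.
git branch -q -D experiment
visible=$(git log --all --format=%s)

# The HEAD reflog still mentions it, so --reflog brings it back into view.
lost=$(git log --reflog --all --format='%H %s' \
    | grep 'experimental idea' | cut -d' ' -f1 | head -n 1)

# Put a branch name on the old commit.
git branch rescued "$lost"
git log --oneline rescued
```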

sergiosgc · on Jan 7, 2020

It won't be gone until you actively prune dangling commits. The commit may not be reachable from the HEADs of existing branches, but go through the reflog and your commits will be there. You can then create a branch to make the commit reachable.

vrsfvwae5tbh · on Jan 7, 2020

Does this not show up in the reflog?

gallier2 · on Jan 7, 2020

You have a branch named <branchname>, and you do an operation that fouls up your branch. If you check with

git log --reflog --all

you will see with this magical command that git DOESN'T REMOVE any commit. Your old tree is still there, only normally hidden. The commit that was there before you fouled up your branch is still there.

You now only need to set your branch to the old commit. A branch is nothing else than a pointer in the tree.

You have 2 possibilities to change the commit a branch points to

1. git branch --force <branchname> SHA1 (works only if <branchname> is not the currently checked-out branch. Simply checking out the SHA1 first also works, as it detaches the HEAD).

2. replace the SHA1 in the text file in .git/refs/heads/<branchname> by the SHA1 where you want the branch to point to.

With that, your repo is in the same state it was before your error.
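Option 1 can be sketched like this in a scratch repo. The `topic` branch and commit messages are invented for the demo, and the "foul-up" is simulated with a hard reset.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b topic
echo one >> file.txt
git commit -q -am "topic: one"
echo two >> file.txt
git commit -q -am "topic: two"
good=$(git rev-parse HEAD)

# The foul-up: the branch is moved back and both topic commits drop out of view.
git reset -q --hard HEAD~2

# Move the branch pointer back. This is refused for the branch that is
# currently checked out, so step off it first.
git checkout -q master
git branch -q --force topic "$good"

git log --oneline topic
```

For option 2, `git update-ref refs/heads/<branchname> <sha1>` is the plumbing command that makes the same change without editing the file by hand, and it also works when the ref has been packed and the file does not exist.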

eyegor · on Jan 7, 2020

Harkening back to my early days of git, I have a rough guess as to what you can do to fix that. If all you want is remote master,

  git stash
  git checkout master
  git stash
  git fetch origin
  git reset --hard origin/master

and maybe a 'git rm -r --cached .' in case you have staged files you didn't intend to which stash failed to drop.
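For anyone who wants to see the core of that (stash, fetch, hard reset) without touching a real checkout, here is a sketch in a throwaway repo. A local bare repository stands in for the remote, and all names are made up for the demo.

```shell
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# A bare repository plays the role of "origin" (demo assumption).
git init -q --bare origin.git
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/origin.git"

echo base > file.txt
git add file.txt
git commit -q -m "base"
git push -q origin master

# Local mess: one stray commit plus an uncommitted edit.
echo stray >> file.txt
git commit -q -am "stray local commit"
echo half-done >> file.txt

git stash push -q                  # park the uncommitted edit
git fetch -q origin                # refresh origin/master
git reset -q --hard origin/master  # make local master match the remote

git status --short
git stash list
```

The stray commit is no longer on master but is still reachable through the reflog, and the uncommitted edit sits in the stash until it is applied or dropped.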

munmaek · on Jan 7, 2020

With the amount of information available, there is no excuse:

* https://rogerdudler.github.io/git-guide/

* https://git-scm.com/book/en/v2

Also see stackoverflow.

Git is a complex tool because it’s tackling a complex problem. I don’t see a way of making it “easier” without massively reducing what it can do. It’s like saying we should reduce a formula one car so people can use it without reading up on it, etc.

If something happens once, it happens. If something happens multiple times then it means you’re not evaluating why it occurred in the first place and learning from it. No tool in the world can solve this problem because it’s not a problem with the tool, rather the user.

Git is really not so hard, but it requires a little reading.

metabagel · on Jan 7, 2020

Git isn’t something which you can generally be successful using in a shallow way. Most developers will need to devote significant time and energy to mastering it. There really needs to be a better layer on top of it in order to make it easier for developers to figure out how to do what they want to do. Some of the commands and switches don’t seem to be orthogonal and/or intuitive.

munmaek · on Jan 7, 2020

Git already has layers on top of it like git porcelain or the various GUI tools that attempt to handle things smoothly.

> significant time and energy

All someone needs is to read through https://rogerdudler.github.io/git-guide/, and learn a few commands.

Are we seriously going to refer to “reading the manual” as “significant time and energy”? In this case you don’t even have to read the manual, just a primer on how git works. You know, on how the tool that you’re using works. Why are people so allergic to spending even a modicum of time on learning a tool that massively simplifies their life and makes their work possible?

Do plumbers complain about having to read manuals for the equipment that they use? Electricians?

As programmers our tools are easier to learn and use, yet we complain about having to do any work at all.

Why even be a programmer? If reading about git is so hard, what about the rest of the field that doesn’t even have documentation?

How about we don’t make tools that cater to the lowest common denominator, in this case people who basically can’t be assed to do anything? RTFM.

Frondo · on Jan 7, 2020

Because a lot of us have used tools besides git that enable the workflow we need without that complexity, and without the fragility that often necessitates going to stackoverflow or asking on a slack channel.

I have a way of picking the losing side so I've been using mercurial for everything until now, and until now Bitbucket offered hg. They're decommissioning it so I'm moving over to git and I feel like my workflow has been hampered, not just in the immediate complexity of learning the new tool, but in the ongoing complexity of using a less good tool for my needs.

I'm dealing with it, but the situation you're describing isn't really the one that I and a lot of other whiners are dealing with.

josefx · on Jan 7, 2020

> Because a lot of us have used tools besides git that enable the workflow we need without that complexity

I spend ages unfucking local svn working copies and long running branches on both windows and linux. git needs some serious flaws to keep up with that experience.

anoncake · on Jan 7, 2020

> Because a lot of us have used tools besides git that enable the workflow we need

Thankfully the standard DVCS is flexible enough to enable the workflows others need too.

wasdfff · on Jan 7, 2020

A lot of people get by just staging and pushing/pulling commits, myself included. That’s 3 commands, 4 if you count git status. You do not need to dig deep to get a lot of use out of git as a basic remote sync.

wyoung2 · on Jan 7, 2020

> Git is a complex tool because it’s tackling a complex problem.

Fossil tackles much the same sort of problem, yet it's far simpler to use.

Most of Git's problems are due to purposeful choices, but they're design choices, not inherent aspects of how a DVCS must behave.

We've laid out our case for the differences here: https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...

munmaek · on Jan 7, 2020

> their thing: Sprawling, incoherent, and inefficient

> our thing: Self-contained and efficient

This is not biased in any way and makes me want to continue reading. /s

Also, you can’t claim something to be “efficient” when it’s doing many different things like scm, issues/tickets, a web forum/ui ....

Then you have non-issues like git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).

And then you take Gitlab and conflate Gitlab’s issues with problems with Git. I guess gogs/gitea don’t exist?

This page needs to be rewritten to simply list the differences in neutral language. There are good points but they’re lost in unnecessary epithets like “caused untold grief for git users”. I get it: git bad, our product good. Switch!

—

Personally, I don’t want something that tries to do many different things all at once.

wyoung2 · on Jan 7, 2020

> This is not biased in any way

Of course we're biased, but every row in that table corresponds to a section below where we lay out our argument for the few words up in the table at the top.

Here's the direct link for that particular point:

https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...

Now, if you want to debate section 2.2 on its merits, we can get into that.

> you can’t claim something to be “efficient” when it’s doing many different things

We can when all of that is in a single binary that's 4.4 MiB, as mine here is.

A Git installation is much larger, particularly if you count its external dependencies, yet it does less. That's what we mean when we say Git is "inefficient."

But I don't really want to re-hash the argument here. We laid it out for you already, past the point where you stopped reading.

> git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).

It is on Windows, where they had to package 44-ish megs of stuff in order to get Git to run there.

On POSIX platforms, the package manager isn't much help when you want to run your DVCS server in a chroot or jail. The more dependencies there are, the more you have to manually package up yourself.

If your answer to that is "just" install a Docker container or whatever, you're kind of missing the original point. `/home/repo/bin/fossil` chroots itself and is self-contained within that container. (Modulo a few minor platform details like /dev/null and /dev/urandom.)

> This page needs to be rewritten to simply list the differences in neutral language.

We accept patches, and we have an active discussion forum. Propose alternate language, and we'll consider it.

> unnecessary epithets like “caused untold grief for git users”

You don't have to go searching very hard to find those stories of woe. They're so common XKCD has satirized them. We think the characterizations are justified, but again, if you think they're an over-reach, propose alternate language.

> I don’t want something that tries to do many different things all at once.

Not a GitHub user, then?

Frondo · on Jan 7, 2020

Hey, are you a fossil dev?

I haven't used nor looked at fossil in maybe 5 years, but had a couple of questions.

Does fossil now have any kind of email support built in to the ticket manager? I remember when I tried to use fossil for actual production use, there was no way to trigger emails sent when, e.g. tickets were submitted, and one of the devs said to just write a script to monitor the fossil rss feed and send the appropriate email, which seemed like a baroque and fragile (and time-consuming) solution.

And is any more of the command-line behavior configurable (like the mv/rm behavior -- affecting the file on disk as well as the repository, or just marking the file as (re)moved in the repository)?

wyoung2 · on Jan 7, 2020

> Hey, are you a fossil dev?

I have commit access, yes, but mainly I work on the docs.

> Does fossil now have any kind of email support built in to the ticket manager?

Yes. It was added in support of the forum feature last year, but it also applies to several other event types: https://fossil-scm.org/fossil/doc/trunk/www/alerts.md

> one of the devs said to just write a script to monitor the fossil rss feed

Probably me. :)

> seemed like a baroque and fragile (and time-consuming) solution.

A dozen lines of Perl; easy-peasy. That and a pile of CPAN modules, but that's easily fetched with `cpanm`.

> the mv/rm behavior -- affecting the file on disk as well as the repository

The default you're referring to was changed a few years ago: the old `--hard` option is now the default.

jnurmine · on Jan 7, 2020

By the way, the "one checkout per repository" is not strictly true. You can use "git worktree"; this is a lightweight way to reuse an existing git repository and have each worktree use a different branch. It's a nice feature, and I use it daily.

Also, a comment about the argumentation in "test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow. Certainly, one can push untested stuff to the remote server by mistake; but, even so, this should be OK, because if one can push directly to important branches like master or similar without going through any reviews and other sanity checks, one has a problem... and the problem isn't really Git :)

wyoung2 · on Jan 7, 2020

> By the way, the "one checkout per repository" is not strictly true.

You must be referring to just the table at the top, not to the detailed argument below, which mentions git-worktree and then points you to a web search that gives a bunch of blog articles, Q&A posts, project issue reports and such talking about the problems that come from using that feature of Git.

I suspect this is because git-worktree is a relatively recent feature of Git (2.5?) so most tutorials aren't written to assume use of it, so most tools don't focus on making it work well, so bugs and weaknesses with it don't get addressed.

Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.

> test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow.

That just brings you back to the problems you buy when separating commit from push, which we cover elsewhere in that doc, primarily here: https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...

balfirevic · on Jan 8, 2020

> Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.

That's unfortunate. Reading the comments here, switch-branch-in-place is seen as some kind of flaw, but I don't think I would voluntarily use a VCS that doesn't let me easily do that (it's the most sensible way for me to work, from way before Git was a thing).

wyoung2 · on Jan 8, 2020

> switch-branch-in-place is seen as some kind of flaw

You're conflating two separate concepts:

1. Git's default of commingled repo and working/checkout directory

2. Switch-in-place workflow encouraged by #1

Fossil doesn't do #1, but that doesn't prevent switch-in-place or even discourage it. The hard separation of repo and checkout in Fossil merely encourages multiple separate long-lived checkouts.

A common example is having one checkout directory for the active development branch (e.g. "trunk" or "master") and one for the latest stable release version of the software. A customer calls while you're working on new features, and their problem doesn't replicate with the development version, so you switch to the release checkout to reproduce the problem they're having against the latest stable code. When the call ends, you "cd -" to get back to work on the development branch, having confirmed that the fix is already done and will appear in the next release.

Another example is having one checkout for a feature development branch you're working on solo and one for the team's main development branch. You start work from the team's working branch, realize you need a feature branch to avoid disturbing the rest of the team, so you check your initial work in on that branch, open a checkout of that new branch in a separate directory and continue work there so you can switch back to the team's working branch with a quick cd if something comes up. Another team member might send you a message about a change needed on the main working branch that you're best suited to handle: you don't want to disturb your personal feature branch with the work by switching that checkout in place to the other branch, so you cd over to the team branch checkout, do the work there, cd back, and probably merge the fix up into your feature branch so you can work with the fix in place there, too.

These are just two common reasons why it can be useful to have multiple long-lived checkouts which you switch among with "cd" rather than invalidate build artifacts multiple times in a workday when switching versions.

Git can give you multiple long-lived working checkouts via git-worktree, but according to the Internets it has several well-known problems. Not being a daily Git user, I'm not able to tell you whether this is still true, just that it apparently has been true up to some point in the past.

Since no one is telling me those issues with git-worktree are all now fixed, it remains a valid point of comparison in the fossil-v-git article.

balfirevic · on Jan 8, 2020

If what you're trying to tell me is that I can use switch-in-place workflow in Fossil as easily as in Git then that's cool. Thumbs up!

Edit: It could be worth it to emphasize this in the Fossil vs. Git comparison that you linked, as it wasn't very clear to me after reading it.

wyoung2 · on Jan 8, 2020

It's updated now, here: https://www.fossil-scm.org/fossil/doc/trunk/www/fossil-v-git...

Thanks for the feedback!

jnurmine · on Jan 7, 2020

> You must be referring to just the table at the top, not to the detailed argument below

Well, not entirely, because in my opinion the detailed argument kind of hand-waves away the entire git worktree. Continuously switching branches inside a single large Git repo is certainly a suboptimal way to work with Git, but most of the time one should be able to avoid that with the worktree (though the worktree stuff is, of course, not a miracle cure for everything).

wyoung2 · on Jan 8, 2020

You’re not addressing the list of problems resulting from the use of this tacked-on feature.

Also, Git continues to be taught with the switch-in-place method by default.

I’m not saying it is impossible to get a Fossil-like workflow with Git, just that there are consequences from that not being the default.

jnurmine · on Jan 20, 2020

Getting a "Fossil-like workflow with Git" is not really the point, is it? One could argue that Fossil does not support "Git like workflow". It is not really a good argument either way.

It is not like Fossil's workflow is a global optimum, it's just something Fossil does well.

u801e · on Jan 7, 2020

> It’s like saying we should reduce a formula one car so people can use it without reading up on it, etc

A far better analogy is:

It's like saying we should reduce a programming language so people can use it without reading up on it, etc.

wyoung2 · on Jan 8, 2020

Sorry, but this is a terrible analogy.

The core problem with it is that very few people can get paid more by being better at using their [D]VCS, whereas those more skilled with their programming language(s) of choice often do get paid more to wield that knowledge.

Consequently, most people do not fully master their version control system to the same level that they do with their programming language, their text editor, etc.

To be specific, there are many more C++ wizards and Vim wizards than there are Git wizards.

In situations like this, I prefer a tool that lets me pick it up quickly, use it easily, and then put it back down again without having to think too much about it.

You see this pattern over and over in software. It is why all OSes now have some sort of Control Panel / Settings app, even if all it does is call down to some low-level tool that modifies a registry setting, XML file, or whatever, which you could edit by hand if you wanted to. These tools exist even for geeky OSes like Linux because driving the OS is usually not the end user's goal, it is to do something productive atop that OS.

[D]VCSes are at this same level of infrastructure: something to use and then get past ASAP, so you can go be productive.

etripe · on Jan 7, 2020

You could always do:

  git reflog

to see when you last checked out master, then:

  git reset <commit hash> --soft

to reset to the commits at that point (but keep your files the same)

  git add . && git stash save

to stash your changes (not 100% sure you actually need the "git add ." part)

  git pull

in master

and then finally:

  git stash pop

and commit again as needed.

wyoung2 · on Jan 7, 2020

> Git is hard for idiots imo, and there are a lot of us

Yes, thus https://xkcd.com/1597/

I find myself saying "git reset --hard origin/" and such with disturbing frequency.

Both are examples of "I give up; it's faster to start over." This is not what I want in a DVCS.

wasdfff · on Jan 7, 2020

Git push/pull -f

Then write a little merge message and you are good to go.

levosmetalo · on Jan 7, 2020

> Git is hard for idiots imo, and there are a lot of us

Then let the idiots end up screwing their own local repo, instead of doing some magic and making it easy to screw up upstream or someone else's repo.

dragonwriter · on Jan 7, 2020

> I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.

That's what UIs (whether CLIs or otherwise) for standardized workflows like git-flow are, IMO.

alkonaut · on Jan 7, 2020

It doesn't nearly go all the way there though. Why do people need to use a command line and a gui tool (usually) for git? Because it's fundamentally not written to be used with a GUI. That I think is one of its biggest flaws. Using a GUI with git always feels like you are missing vital information and just trying to poke a cli underneath to do what you want.

Some design decisions also shine through like "no branch is more important than any other branch" which is completely mental considering how people actually use git.

crdoconnor · on Jan 7, 2020

Most of the guis are crap because everybody who builds them thinks that a GUI should just be more or less a visual representation of the command line.

jerf · on Jan 7, 2020

"Why do people need to use a command line and a gui tool (usually) for git?"

You don't. The reason is that you're using a tool that didn't budget the time to directly work on git data files and it uses the command line under the hood, because that's a hard business case to make for most small tools. This is not fundamental to git; the very top-end git-based tools like Github or Bitbucket all do their own internal, direct implementation of git functionality for this reason. It's not a characteristic of git, it's a characteristic of the GUI tools you're using.

A perfectly sensible one based on perfectly sensible engineering tradeoffs, let me add; no criticism of such tools intended. Git's internals from what I've seen are not particularly difficult to manipulate directly as such things go, but you are simply by the nature of such a thing taking on a lot more responsibility than if you use the command line-based UI.

saurik · on Jan 7, 2020

Another simple tracking thing that git doesn't do that would easily make git much much better is if it tracked when you did a cherrypick of another commit in the commit graph (not just as some kind of metadata comment in the commit message, but as a kind of soft parent); then if you did a rebase, you could actually "reverse engineer" the rebase (if and only if you absolutely needed to, such as to track what happened during a squash, or to automate re-applying the rebase correctly to someone who was tracking one of the prior commits) and it would largely solve the question of "merge or rebase" with "por que no los dos".

adrianmsmith · on Jan 7, 2020

Another thing that Git does not track -

I saw a web designer check in a huge hierarchy of empty directories which would be the structure of the new project that their team should work on. They were quite surprised when it didn't show up on any of the other designers' computers after a "pull". They had to go to the "Git guru" for help.

Windows and Mac both have directories as a major fundamental concept. Everyone knows them and is familiar with them. Subversion tracks directories. Git does not.

ori_b · on Jan 7, 2020

I also can't figure out why it doesn't: An empty tree object should be sufficient to do the job. I actually had to write extra code in git9[1] to avoid accidentally allowing empty directories.

[1] https://github.com/oridb/git9

debaserab2 · on Jan 7, 2020

According to this[1], which I think might be an official FAQ:

> Currently the design of the Git index (staging area) only permits files to be listed, and nobody competent enough to make the change to allow empty directories has cared enough about this situation to remedy it.

[1] https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty...

ori_b · on Jan 7, 2020

Hm. I should see how git behaves when it gets a repository with empty directories. If it doesn't blow up, I may just add support -- it'd be useful for me.

eyegor · on Jan 7, 2020

It kind of does, iff you have a file there. In which case it tracks the path to the file, and then creates the relevant directory structure to get to it.
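A toy model of that behaviour, as a rough sketch rather than anything resembling Git's real index format: the snapshot is just a flat list of file paths, and directories only appear on checkout because some file path needs them. The function names `snapshot`, `checkout` and `demo` are made up for this illustration.

```python
import os
import tempfile


def snapshot(root):
    """Return the sorted relative paths of all regular files under root.

    Only files are recorded; a directory exists in the snapshot solely
    as a prefix of some file path, so empty directories leave no trace.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(paths)


def checkout(paths, dest):
    """Recreate the snapshot in dest, creating parent directories on demand."""
    for rel in paths:
        target = os.path.join(dest, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()  # file contents do not matter for this sketch


def demo():
    """Snapshot a tree with one empty directory and check it out elsewhere."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        os.makedirs(os.path.join(src, "assets", "icons"))  # empty hierarchy
        os.makedirs(os.path.join(src, "src", "app"))
        open(os.path.join(src, "src", "app", "main.txt"), "w").close()
        paths = snapshot(src)
        checkout(paths, dst)
        has_src = os.path.isdir(os.path.join(dst, "src", "app"))
        has_assets = os.path.isdir(os.path.join(dst, "assets"))
        return paths, has_src, has_assets
```

Calling `demo()` shows the populated `src/app` directory surviving the round trip while the empty `assets/icons` hierarchy silently disappears, which is the surprise the web designer ran into.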

Of course git is also incredibly painful and brittle if you want "exclude/except" behavior on the gitignore involving subdirectories.

adrianmsmith · on Jan 7, 2020

Another thing Git does not and cannot even attempt to do - file locking.

The assumption behind Git is, everyone develops on their machines and/or branches, and then things are merged. This only works for files which can be merged.

There are plenty of things pretty much any project wants to track which cannot be merged, for example Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.

With a centralized system, that's easy, just go over to using file locks ("svn lock") for those files. With a distributed system, that's impossible.

wyoung2 · on Jan 7, 2020

> Another thing Git does not and cannot even attempt to do - file locking.

That's a seriously hard problem for a DVCS if you're serious about the "D".

This topic turned into [the single longest thread in the history of the Fossil forum](https://www.fossil-scm.org/forum/forumpost/2afc32b1ab) because it drags in the CAP theorem and all of the problems people run into when they try to have all three of C, A, and P at the same time.

To the extent that Fossil based projects are usually more centralized than Git ones, Fossil has a better chance of solving this, but I'm still not holding my breath that Fossil will get what a person would naively understand as file locking any time soon.

> Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.

You want to avoid putting such things into a VCS anyway, because it [bloats the repo size](https://fossil-scm.org/fossil/doc/trunk/www/image-format-vs-...). I wrote that article in the context of Fossil, but its key result would replicate just as well under Git or anything else that doesn't do some serious magic to avoid the key problem here.

Instead of Word files, check in Markdown or [FODT](https://en.wikipedia.org/wiki/OpenDocument_technical_specifi...). (Flat XML OpenDocument Text.) Or with Fossil, put the doc in the wiki.

Instead of PNG, check in BMP, uncompressed TIFF, etc., then "build" the PNG as part of your app's regular build process.

This has the side benefit that when you later change your mind on the parameters for the final delivered PNGs, you can just adjust the build script, not check in a whole new set of PNGs. My current web app has several such versions: 8-bit paletted versions from back before IE could handle 24-bit PNG, then matted 24-bit PNGs from the days when IE couldn't handle transparency in PNG, and finally the current alpha-blended 24-bit PNGs. It'd have been better if I'd checked in TIFF originals and built deliverable PNGs at each step.

WorldMaker · on Jan 7, 2020

> Instead of Word files, check in Markdown or [FODT]

Another fun option is to unzip the DOCX and check that in, since it is mostly a collection of XML files in a zip container. I built a tool to automate zipping/unzipping files like DOCX years ago as pre-commit/post-checkout/post-merge hooks. [1] It's an interesting way to source control some types of files if you can find a way to deconstruct them into smaller pieces that merge better. Admittedly, merging Office Open XML by hand is not a great experience (and dealing with subtly broken or corrupt internal contents is not fun, because programs like Word can be fussy when things are even slightly wrong), but you get better diffs sometimes than you would expect.

[1] https://github.com/WorldMaker/musdex
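For readers who want to try the idea, here is a minimal sketch of the explode/repack round trip using only Python's standard `zipfile` module. It is not how musdex works internally (I have not checked); the helper names `explode` and `repack` are invented for this example, and keeping the original member order is a precaution I am assuming is harmless, not a documented requirement.

```python
import os
import zipfile


def explode(archive_path, out_dir):
    """Unpack a zip container (such as a DOCX) into out_dir.

    Returns the member names in archive order so the tree can be
    repacked in the same order later.
    """
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        zf.extractall(out_dir)
    return names


def repack(src_dir, names, archive_path):
    """Rebuild a zip container from an exploded tree, member by member."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(os.path.join(src_dir, name), arcname=name)
```

A pre-commit hook would call something like `explode` and stage the resulting XML files; a post-checkout hook would call `repack` to give the office application a file it can open. The repacked archive holds the same member contents but is not guaranteed to be byte-identical to the original.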

wyoung2 · on Jan 7, 2020

Yes, I cover that option for Fossil at the end of the pointed-to document. I did it in terms of Makefiles rather than commit hooks, but whichever...

jessermeyer · on Jan 9, 2020

> You want to avoid putting such things into a VCS anyway, because it [bloats the repo size]

How do you suggest projects like games handle this, where data files are naturally linked to source files? Imagine trying to sort out an animation bug when you only have source level tracking and no idea which version of the animation data corresponds to the animation source files of the bug report. These data files are not 'built' from the 'build' step as they are the product of artists.

alkonaut · on Jan 7, 2020

I’d guess not one user in 100.000 uses git decentralized (as in, doesn’t have a blessed “central” repo). It’s the disabling of locking that should be the special case! The big problem with git is that you can’t mark a repo as a master repo/blessed repo (which would be the one where lockfiles are stored). A lot of functionality would be helped if the commands could know which end is the important/central one.

wyoung2 · on Jan 7, 2020

> I’d guess not one user in 100.000 uses git decentralized

I understand your sentiment, but the denominator in that fraction is probably much lower than your guess.

Consider even simple cases like the disconnected laptop case. You may work at a small office with only local employees, and so you have one central "blessed" repo, but if one person locks a file and then goes off to lunch, working on the file while at the restaurant, you still have a CAP problem:

CA: Because the one guy with a laptop went off-network, you have no full quorum, so no one can use the repo at all until he gets back and rejoins the network. (No practical DVCS does this, but it's one of the options, so I list it.)

CP: When the one guy went off to lunch, we lost the ability to interact with his lock, and that will continue to be the case until he gets back from lunch. Also vice versa: if someone still at the office takes out a lock, the guy off at lunch doesn't realize there is a lock, so he could do something bad with the "locked" file. (This is the mode DVCSes generally run in by default.)

AP: No locking at all, thus no consistency, thus your original problem that inspired the wish to have file locking.

alkonaut · on Jan 7, 2020

Not sure I understand the problem. If I lock fileX and go to lunch, then I own the lock on that file while I’m out to lunch. It’s basically analogous to me pushing the file fileX.lock to the repo next to fileX, with my user id as content. I can only do it if it isn’t there.

Everyone else will only see that lock if they fetch and if they don’t, they might edit their local copy of fileX too, but would be prevented from pushing their version to the blessed repository by the lock. They can push a copy under another name, or wait until I have removed the lock (but probably can’t resolve the conflict anyway because it’s likely a binary document). So the user will remember to never start editing without taking the lock in the future.

It’s not perfect by any stretch of the imagination but it’s all anyone asks for in terms of file locking. It’s what Subversion always did.

wyoung2 · on Jan 8, 2020

> Not sure I understand the problem. If I lock fileX and go to lunch, then I own the lock on that file while I’m out to lunch.

And if you go on vacation for two weeks instead?

alkonaut · on Jan 8, 2020

Same thing obviously. But this is just a method of communication. It’s instead of emailing/shouting “don’t edit the background image today please” across the office.

An admin can remove the lock. Or you can allow force-pushing by anyone to replace it or whatever.

Not sure why this is seen as so complicated, version control systems have done it since forever. It’s not trying to solve some distributed lock system in a clever way. It’s a dumb centralized mutex per file. And yet again this is all that’s needed (and it’s also added to git in git-LFS!).
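The "dumb centralized mutex per file" scheme from this sub-thread can be sketched in a few lines. This is only an in-memory illustration under assumed names (`BlessedRepo`, `lock`, `unlock`, `push` are all hypothetical); it is not how Subversion or Git LFS actually implement locking, and it deliberately ignores the offline cases raised in the CAP discussion.

```python
class BlessedRepo:
    """Toy stand-in for the central repository that holds per-file locks."""

    def __init__(self):
        self.locks = {}  # path -> user id of the lock holder
        self.files = {}  # path -> most recently pushed content

    def lock(self, path, user):
        """Take the lock only if nobody else holds it; return True on success."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return False
        self.locks[path] = user
        return True

    def unlock(self, path, user, force=False):
        """Release a lock; force=True models an admin clearing a stale lock."""
        if force or self.locks.get(path) == user:
            self.locks.pop(path, None)
            return True
        return False

    def push(self, path, content, user):
        """Accept a new version unless someone else holds the lock."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            return False
        self.files[path] = content
        return True
```

With this model, a colleague who edited a locked file locally finds out only at push time, when `push` returns `False`, which matches the "you will remember to take the lock next time" behaviour described above.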

ivanbakel · on Jan 7, 2020

You can set up custom merge drivers for different file types.

More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault. That everything looks like a nail does not speak against the value of a hammer.

adrianmsmith · on Jan 7, 2020

> You can set up custom merge drivers for different file types

True, but there are inevitably some files which still cannot be merged, so the problem remains.

> More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault.

Indeed, if you want to store the history of your files - the whole software including the icons it uses, so that you can go back to any previous version and build it - and you chose a DVCS like Git, I would agree the fault was yours.

That's basically what I was arguing, that Git is the wrong choice if you have any binary assets like icons (even if those assets have small filesize) due to the lack of locking, sorry if I was unclear.

munmaek · on Jan 7, 2020

You should not be checking binary files into git. That defeats the entire purpose and bloats the repo size enormously.

Git LFS should be used instead. Or storing a Sha256sum and putting the file elsewhere.
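The "store a SHA-256 sum and put the file elsewhere" option could look roughly like the sketch below: a small text pointer is committed in place of the binary, and the real bytes are checked against it after being fetched from wherever they live. The JSON layout and the names `make_pointer` and `verify` are assumptions made for this example; this is not the Git LFS pointer format.

```python
import hashlib
import json


def make_pointer(data: bytes) -> str:
    """Build a small text pointer to commit instead of the binary itself."""
    pointer = {"sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
    return json.dumps(pointer, sort_keys=True)


def verify(pointer_text: str, data: bytes) -> bool:
    """Check that externally stored bytes match the committed pointer."""
    pointer = json.loads(pointer_text)
    return (
        pointer["size"] == len(data)
        and pointer["sha256"] == hashlib.sha256(data).hexdigest()
    )
```

The pointer diffs and merges like any other text file, while the history of the binary itself lives wherever the external store keeps it.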

imtringued · on Jan 7, 2020

That is a solution but then it's no longer distributed.

alkonaut · on Jan 8, 2020

Being distributed isn’t a feature for the binary files’ contents. I don’t want 100 historical versions of a huge game texture, just the latest one. The history is distributed however, so I can see who changed the texture even when disconnected. Centralized binaries like git LFS work like any package/asset manager.

There is no one that would want distributed binaries in git. But people also don’t want to switch from git to something else just because they have a 100GB or 10TB repo. Tooling (build tools, issue management) everywhere has decided that git is all that’s needed.

Not putting binaries in git isn’t a solution at all. Binaries are part of the source in many applications (e.g game assets, web site images...). Distributing every version of every binary to everyone is also not a solution.

jnurmine · on Jan 7, 2020

Maybe for your particular problem it's worthwhile to set up a CM policy for the (topic) branch naming. For example in your case something like: topic/alkonaut/version-1.0/foo or topic/alkonaut/master/bar.

Or use something like: git log --all --graph --oneline

One tip for "zigzag spiderweb" is to always rebase your topic branch to the target branch prior to a fast-forward merge to the target branch (e.g. master). To clarify: while in your branch topic/foobar: "git rebase master", "git checkout master", "git merge --ff-only topic/foobar".

(There's surely a clever shorthand for the above procedure but when it comes to the command line, I like to combine small things instead of complicated memorized things, it's some kind of Lego syndrome)

alkonaut · on Jan 7, 2020

Rebase + FF solves the spiderweb problem by removing the branches. But some insist on keeping the branches and I don’t get why “git log” doesn’t have (and default to) showing important branches as straight lines.

Also with dozens of tiny commands but only a handful of actual desired outcomes, the high-level operations should be explicit commands. E.g. “rebase this branch on master and then squash it and commit on master”.

A lot of the local/remote could also be hidden. The number of times I want to rebase on my local master which is behind origin by 2 commits is... zero.

lrem · on Jan 7, 2020

Said 80/20 was the idea behind Fossil, along with "easy to learn because similar to svn". Seemed a good idea. But it lost to Git having a famous user, which the Linux kernel obviously is.

baud147258 · on Jan 7, 2020

If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since it's how all users will interact through the interface. And personally I prefer a VCS with less ways to shoot myself in the foot than git.

mschaef · on Jan 7, 2020

> If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since it's how all users will interact through the interface.

At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity. (Even if it doesn't work quite the same as traditional source code control systems. Failing to work with directories is a problem of the git approach. Having a nice offline story is a distinct advantage.)

> And personally I prefer a VCS with less ways to shoot myself in the foot than git.

Oddly, the thing I love about git is how easy it makes it to recover from mistakes. Even if there are more ways to shoot yourself in the foot, there are also more ways to put your foot back exactly the way it was before you shot it. (If only real life worked that way!) This is what the immutable content storage under the hood of a git repository gets you.

If you know the commit hash (and there are a bunch of ways to easily keep track of these), you can get back to the state that's represented by that hash. Commands like merge/rebase/cherry-pick make this particularly easy by providing an '--abort' option that means "I've screwed this operation up beyond repair and need to bail out." And the abort works. As long as you had your target state committed, you can get back to it. (And if that's just a transient state that you don't want to persist, it's easy enough to squash it into something coherent.)
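To make that concrete, here is a minimal sketch of both escape hatches: bailing out of a conflicted merge with `--abort`, and returning to a recorded commit hash after a bad commit. The repository, file contents and the `good` variable are hypothetical; the git commands themselves are standard.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"            # hypothetical identity
git config user.email "example@example.invalid"

echo "line" > file.txt
git add file.txt
git commit -q -m "base"

# Two branches edit the same line, which guarantees a conflict.
git checkout -q -b feature
echo "feature change" > file.txt
git commit -q -am "feature edit"

git checkout -q master
echo "master change" > file.txt
git commit -q -am "master edit"

# Record the known-good state before doing anything risky.
good=$(git rev-parse HEAD)

# The merge stops with a conflict; --abort restores the pre-merge state.
if ! git merge feature >/dev/null 2>&1; then
    git merge --abort
fi

# A deliberately bad commit, then back to the recorded hash.
echo "oops" > file.txt
git commit -q -am "mistake"
git reset -q --hard "$good"

cat file.txt
```

Note that `git reset --hard` discards uncommitted changes, so this only gets you back to states that were actually committed, as the comment says.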

baud147258 · on Jan 7, 2020

>the interface makes a lot more sense if you understand the underlying data structure

Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?

And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might had to do it once or twice with SVN, though).

mschaef · on Jan 7, 2020

> Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?

I don't think it is special. Generally after a while using a given tool, library, etc. I find it useful to dig in a bit and see what's happening under the hood to help understand why it works the way it does. git just happens to be the tool under discussion at the moment.

> And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might had to do it once or twice with SVN, though).

I think we're talking about the same sort of mistakes. It's hard for me to imagine a case where you'd need to blow away a local git repository entirely. Worst case scenario, there should be good refs available in a remote that are just a 'git fetch' away. (If there's no remote, then blowing away the local repo is essentially just starting from scratch anyway.)

ldiracdelta · on Jan 7, 2020

You also don't have to understand the underlying structure for a similarly powerful DVCS like bitkeeper. Yes, it isn't open source, but git was a major step back in usability for my group from bk to git.

cmrdporcupine · on Jan 7, 2020

Yes, this. I actually tried bk before git, and actually used bazaar and then mercurial before git as well. I was stunned at how arcane the UI in git was made (And how arrogant the community of users around it could be, too). Bk was clean and elegant frankly. I'm no idiot when it comes to the concepts -- but git's CLI interface is just awful.

Bitkeeper is in fact open source now, BTW. Too late, but it is.

ldiracdelta · on Jan 7, 2020

You're right. The arrogance was hilarious. People with no experience with bk saying, "What's your problem?" Now, git is super fast, because its core is written by Linus, but I think he is just so much better technically and so far into the internal weeds of Linux for so long in so many areas that he had trouble creating an API for mere mortals.

mschaef · on Jan 7, 2020

> git was a major step back in usability for my group from bk to git.

What does that mean in concrete terms? What are the failures you're seeing with git that you weren't with bk? How long has your team used git? bk?

ldiracdelta · on Jan 7, 2020

We used bk when I was with the group for 3 years and then switched to git. Been using git for 8 years. I know git, but the ergonomics and basic English semantic meaning of commands is much worse. I still have to look up git commands and subflags _all_ the time: for checking out old versions of files to a new file, looking at tags, committing to a new branch, etc. Bk's version of gitk was superior and the usage was nicer. I've used mercurial, svn, cvs, git, and bk. Git is hard, but it is the standard now, so of course I'll continue to embrace it. Just not as ergonomic.

coldpie · on Jan 7, 2020

> At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity.

Yeah. Take an afternoon to read through gittutorial(7), gittutorial-2(7), and gitcore-tutorial(7). Git is a tool, and just like any other tool (car, tablesaw), you will be much better off if you take the time to learn to use it properly. Once you see "The Matrix" behind Git, it becomes an incredibly easy to use and flexible tool for managing source code and other plaintext files.

baud147258 · on Jan 7, 2020

The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling. And why does Git require this whereas other VCS don't? Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.

coldpie · on Jan 7, 2020

> The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling.

They're just examples of tools.

> Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.

I talk about this elsewhere in this thread, but I disagree with this assertion. I find Mercurial baffling and Git very elegant, though it could be an artifact of the order in which I learned the tools.

mschaef · on Jan 7, 2020

> And why does Git require this whereas other VCS don't?

It doesn't. It just works better when you take the time to learn how it works. (Which is an experience I commonly have with the tools I use, for whatever that's worth.)

chiefalchemist · on Jan 7, 2020

Agreed. If the applications we built using git were as awkward as git...our users and clients would scream, or worse.

P_I_Staker · on Jan 8, 2020

I don't understand the relevance of git in all of that

munmaek · on Jan 7, 2020

You can almost always recover from your mistakes with git reflog.
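A small sketch of what that looks like, assuming the mistake is a local `git reset --hard` that dropped a commit. The repository and file contents are invented for the example, and the `lost` variable exists only so the result can be checked.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demonstration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"            # hypothetical identity
git config user.email "example@example.invalid"

echo one > file.txt
git add file.txt
git commit -q -m "first"

echo two > file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)

# The "mistake": throw away the newest commit.
git reset -q --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog -n 3

# HEAD@{1} is the previous position of HEAD, i.e. the "second" commit.
git reset -q --hard 'HEAD@{1}'

cat file.txt
```

The reflog is local to a clone and its entries expire eventually, so this helps with local mistakes rather than with something like a bad force push to a shared remote.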

P_I_Staker · on Jan 8, 2020

Yes and no. I'm a git guy and a fan, but you can really, really mess things up. Usually, this is only when using features like force push; however, there are arguably legitimate use cases for that.

Buddy had a teammate that almost force pushed references from a slightly different repo. What a mess that could have been! I agree regarding the usefulness of reflog, and think the complaints about messing things up with rebase, reset, etc. are overblown. It really isn't an issue for intermediate users.

munmaek · on Jan 9, 2020

Hence almost always. It’s not a common situation to delete commits from history, etc.

I don’t see the capability to force push as a negative. There are situations in which it’s necessary, like forcibly removing history (something I had to do just today).

Git gives you the ability to shoot yourself in the foot, so it’s up to the operator to not make a mistake like that without backing up the repo to a different place first, etc. Something something only a poor carpenter blames their tools.

tomesch1982 · on Jan 7, 2020

Git is neither easy nor really elegant. It is useful for projects like Linux™, but for the vast majority of projects better tools like mercurial or fossil would be a much better fit.

barrkel · on Jan 7, 2020

After svn, git was a breath of fresh air; far easier to use and reason about, not to mention much faster.

I don't think much of your all-in-one solution like fossil - that's a competitor for GitHub (without the bits that make GH good), not git.

I tried to use hg at one point in the early days, and found it much slower than git. Git's low latency for commands made a substantial difference, perceptually. In principle I think git encourages too much attention to things like rebases, which fraudulently rewrite history and lie about how code was written, just so the diff history can look neater. Working code should be the primary artifact, not neat history, and rebases and other rewrites make it too easy to cause chaos with missing or duplicated commits in a team environment. So ideologically, mercurial is a better fit, but that's not enough to make me use it.

Fit is a function of an environment; when we say survival of the fittest, we mean fitness as adapted to an environment. Feature set isn't the only aspect; at this point, the network effects of git are insurmountable without a leap forward in functionality of some kind.

(I think git & hg are just as elegant as one another; to me, the elegance is in the Merkle tree and the conceptual model one needs to operate the history graph.)

jat850 · on Jan 7, 2020

Can you explain what you mean by fossil being a competitor for github, rather than git? Fossil is a scm with additional features for usage, but (the last I used it, and to my memory) it was just the command line fossil very much like git, and that's how I used it.

What makes it the case that fossil cannot be a competitor to git (or hg), in that they are both a vcs?

edit I haven't had a lot of sleep. What I'm trying to ask, I suppose, is why can't you use fossil just like git and ignore any all-in-one features it provides? (This is not to comment on how good, scalable, fast, correct, or robust it is.)

wyoung2 · on Jan 7, 2020

You can, though I suspect the OP's focus on speed means you'd want to turn off Fossil's autosync feature, which makes it operate more like Git: checkins go only to the local repository initially, and then you must later explicitly push them to the repo you cloned from.

This is why Subversion was "slow": your local working speed was gated by the speed of the central repo, which could be slow if it was under-powered or overloaded, as was common with the free Subversion hosts of the day. At least with Git, you can batch your local changes and push them all at some more convenient time, such as when you were going off for a break anyway.

peteretep · on Jan 7, 2020

> way better tools

I work with a group of people who all know enough git that we're productive, and a few of us know enough git to solve complicated problems.

I've not seriously considered fossil or mercurial -- what are the top three tangible benefits we'd get from switching our team to them?

rkangel · on Jan 7, 2020

I have never used Fossil, but I used to be a strong proponent of Mercurial. My advice is don't - Mercurial lost, git has won, and fighting against the current is just going to make your life harder.

The main advantage Mercurial has over git is a command line syntax that makes consistent sense. The operations you want to do are easy and as you try and do more complicated things, the new commands will be unsurprising and predictable. If you already know how to use git then this advantage is (mostly) irrelevant.

There are some other features that are interesting - Mercurial has a couple of different types of branches. Bookmarks are like git branches, whereas named branches are a completely different concept which can be useful. 'Phases' tracks whether commits have been shared, and prevents you rewriting (rebasing) them when appropriate.

If you do experiment, note that many 'power user' features are turned off by default. There is a robust extension system, and the default mercurial installation includes a load of standard ones. My config file includes the following to turn on some useful stuff ('record' is the most useful for a staging-area-like facility):

  [extensions]
  pager =
  color =
  convert =
  fetch =
  graphlog =
  progress =
  record =
  rebase =
  purge =

coldpie · on Jan 7, 2020

I know Git inside and out, but I had to use Mercurial for a client a couple years ago. I found it to be the most baffling and nonsensical source control experience of my life. It might be a case of cross-contamination. Like you said, each SCM uses similar terms for different concepts, so my Git knowledge may have unfairly colored how I expected similar terms to work in Mercurial.

But stuff like: "hg log" gives you _every commit in the repo_?? When is that ever useful? How do I get only the commits that lead to the current state of the repo? Mercurial doesn't have branches; instead you're supposed to _copy the whole directory_[1] at the filesystem level?? Of course this is ridiculous, so they invented "bookmarks" which are actually Git branches. The extensions thing you mention is also a ridiculous chore. Just have sane defaults. I also found hg's output very dense and hard to understand and read, poorly suited for human consumption.

I dunno. I'm sure Mercurial is fine, many people use it every day, and likely my strong Git bias was affecting my ability to learn Mercurial. But I found it far easier to just clone into Git, use Git to do source control, and then export back to Mercurial when I'm ready to share my work.

[1] https://www.mercurial-scm.org/wiki/TutorialMerge

rkangel · on Jan 7, 2020

Mercurial absolutely does not require you to copy at the fs level. You're not the first person to be caught out by that tutorial, which I think would serve us best by being deleted.

The 'original' branching method for Mercurial is called Named Branches. The big difference from Git is that every commit is labelled with what branch it is on. This has advantages - if you imagine looking at the train track of 'master' in git, with its divergence for a few commits and then a merge, in Mercurial you can see that the 3 commits were on a branch called 'performance', whereas with git that history is completely lost. See: https://www.mercurial-scm.org/wiki/NamedBranches

As usage of git grew, the git branching model gained popularity and so the Mercurial bookmarks extension was created (https://www.mercurial-scm.org/wiki/Bookmarks).

It can be seen as a downside that there are two branching options that you have to choose between.

coldpie · on Jan 7, 2020

It's not just that tutorial, see also [1,2]. I think this is/was really an "official" way to do branching, and it seems utter madness to me :)

[1] Sadly the popular hginit.com seems dead, this was my first introduction to Mercurial. https://web.archive.org/web/20180722012242/http://hginit.com...

[2] https://stevelosh.com/blog/2009/08/a-guide-to-branching-in-m...

jordigh · on Jan 7, 2020

Branching by cloning was copied from bitkeeper. It was also early git's only branching mechanism. If you listen to Linus's talk when he introduced git at Google, you'll hear him conflate "branch" with "clone" because that's what he was thinking of at the time.

https://www.youtube.com/watch?v=4XpnKHJAok8