It's ironic that Git was popularized in the same era as monorepos, yet Git is a poor fit for monorepos. There have been some attempts to work around this. Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction. Microsoft's GVFS is a promising attempt to truly scale Git to giant repos, but it's developed as an addon rather than a core part of Git, and so far it only works on Windows (with macOS support in development). GVFS arguably has the potential to become a ubiquitous part of the Git experience, someday... but it probably won't.
Git also has trouble with large files. The situation is better these days, as most people have seemingly standardized on git-lfs (over its older competitor git-annex), and it works pretty well. Nevertheless, it feels like a hack that "large" files have to be managed using a completely different system from normal files, one which (again) is not a core part of Git.
There exist version control systems that do scale well to large repos and large files, but all the ones I've heard of have other disadvantages compared to Git. For example, they're not decentralized, or they're not as lightning-fast as Git is in smaller repos, or they're harder to use. That's why I think there's room for a future competitor!
(Fossil is not that competitor. From what I've heard, it neither scales well nor matches Git in performance for small repos, unfortunately.)
Git's flaws are primarily in usability/UX. But I think for its purpose, functionality is far more important than a perfect UX. I'm perfectly happy knowing I might have to Google how to do something in Git as long as I can feel confident that Git will have the power to do whatever it is I'm trying to do. A competitor would need to do what git does as well as git does it, with a UX that is not just marginally better but categorically better, to unseat git. (Marginally better isn't strong enough to overcome incumbency.)
And for the record: I think git-lfs's issues are a mix of usability problems and needed tech improvements. The tech side will get solved if there's enough demand, and as I mentioned, the usability problems are more annoyances than actual showstoppers.
A major limitation of git is how it deals with many "big" (~10 MB) binary files (3D models, textures, sounds, etc.).
We ended up developing our own layer over git, and we're very happy with it; even git-lfs can't provide similar benefits. This technique seems to be commonplace for game studios (e.g. Naughty Dog, Bungie), so certainly git has room for improvement here.
If somebody comes up with something that matches Git's strengths and also handles binaries and biggies much, much better then they could definitely topple Git with it. It'd take time for the word to spread, the tools to mature and the hosting to appear, but I can definitely see it happening.
I think most people know that Git isn't perfect, but it's also the case that coming up with anything better is an extremely difficult task. If it wasn't, someone would have already done it. It's not like people haven't been trying.
I'd argue if it's the latter, that git was never the right choice to begin with. You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?
So I don't know if this is a "major limitation" of git per se. Not saying there's a better solution off-the-shelf (you're obviously happy with your home grown). But this was probably never a realistic use for git in the first place.
> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?
Actual changes to content in a gamedev studio are very unlikely to be as small as a single pixel. Changes to source code are unlikely to be as small as a single character either. And we definitely want a record of that 10MB.
We're willing to sacrifice some of our CI build history. Maybe only keeping ~weekly archives, or milestone/QAed builds after a while - dozens or hundreds of GB - and eventually getting rid of the really old ones entirely. Having an exact binary copy of a build a bug was reported against can be incredibly useful.
Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?
One person's immutable build artifact is another person's vendored build input.
It's common to vendor third party libraries by uploading their immutable build artifacts (.dll, .so, .a, .lib, etc.) into your VCS, handling distribution, and keeping track of which versions were used for any given build. It makes a lot of sense if those third party libraries are slow to build, rarely modified, and/or closed source - no sense wasting dev time forcing them to rebuild it all from scratch.
The next logical step is to have a build server auto-upload said immutable build artifacts into your VCS, for those third party libraries that you do have source code for, whenever your VCS copy of said source is modified. Much more secure and reproducible than having random devs do it.
And hey, if your build servers are already uploading build artifacts to VCS for third party libraries, why not do so for your own first party build artifacts too? Tools devs spending most of their time in C# probably don't need to spend hours rebuilding the accompanying C++ engine it interoperates with from scratch, for example, so why not "vendor" the engine to improve their iteration times?
This can lead to dozens of gigs of mostly identical immutable build artifacts reuploaded into your VCS several times per day, with QA testing and then integrating those build artifacts into other branches on top of that. The occasional 10MB png is no longer noticeable by comparison.
Build artifact caching is a different problem from source control, with very different requirements:
1. As you mentioned, the artifacts tend to get huge.
2. The cache needs to be easy to bypass. From your example, it needs to be easy for the C++ engine devs to do builds like "the game but with the new engine" to test out their changes.
3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.
4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).
Git either doesn't care about or fails spectacularly for each of those points. In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.
Nix solves #2 and #3 by caching build artifacts (both locally and remotely) based on code hashes and a dependency DAG (for each subproject or build artifact, so changing subproject X won't trigger a rebuild of unrelated subproject Y, but will rebuild Z that depends on X). It helps with #4 by performing all builds in an isolated sandbox.
#1 is solved by evicting old artifacts, which is safe as long as you trust #4. If the old artifact is needed again then it will be rebuilt for you transparently. Currently this is done by evicting the oldest artifacts first, but it could be an interesting project to add a cost/benefit bias here (how long did it take to build this artifact, vs the amount of space it consumes?).
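As a rough sketch of that hash-plus-DAG idea (this is not Nix's actual implementation, and the subproject names are made up): a subproject's cache key hashes its own source together with its dependencies' keys, so touching X invalidates Z but leaves unrelated Y alone.

```shell
#!/bin/sh
# Hypothetical sketch of hash-keyed build caching, not Nix's real scheme.
# A key covers a subproject's source plus its dependencies' keys.
set -e
key() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

kx=$(key "x-v1"); ky=$(key "y-v1"); kz=$(key "z-v1$kx")      # Z depends on X

kx2=$(key "x-v2"); ky2=$(key "y-v1"); kz2=$(key "z-v1$kx2")  # touch only X

[ "$ky" = "$ky2" ]  && echo "Y: cache hit, no rebuild"
[ "$kx" != "$kx2" ] && echo "X: changed, rebuild"
[ "$kz" != "$kz2" ] && echo "Z: depends on X, rebuild"
```

The point is only that the key is a pure function of source plus dependency keys; eviction is then safe, because a missing key just means "rebuild".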
> 1. As you mentioned, the artifacts tend to get huge.
This, admittedly, is more common with build artifacts. That said, I've hit quota limits with autogenerated binding code on crates.io: several hundred megs of code that cargo compresses down to double-digit megabytes, better than anything I've figured out how to do with 7-zip.
And that's a small single-person hobby project, not a Google monorepo.
> 2. The cache needs to be easy to bypass
I need to bypass locally vendored source code frequently as well, to test upstream patches etc.
> 3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.
Also entirely true of source code.
> 4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).
Enshrining built libs in VCS is an alternative way of tackling the problem. You might not be able to reproduce that exact build bit-for-bit, thanks to who knows what minor compiler updates have been forced upon you, but at least you'll have the immutable original to reproduce bugs against.
> In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.
It's already extremely common - in the name of build stability, including with git - to protect a branch from direct push, and have CI generate and delay committing a merge until it's verified the build goes green. By wonderful coincidence, this is also well after CI has finished building those artifacts - in fact, it's been running tests against those artifacts - so it can atomically commit the source merge + binaries of said source merge all at once. No delay between the two.
There are some caveats - gathering the binaries can be a pain for some CI systems, or perhaps your build farm is underfunded and can only reasonably build a subset of your build matrix before merging. Or perhaps the person setting it up didn't think it through and has set things up such that code reaches a branch that uses VCS libs before the built libs reach the same spot in VCS - I'll admit I've experienced that, and it's horrible.
Nix, Incredibuild, etc. are wonderful alternatives to tackle the problem from a different angle though.
But I'm totally willing to fault git for failing to optimize enough to handle the manual commit cadence of source game assets. That's not just a tertiary use case - for my coworkers it's frequently the primary use case. The end result is I mostly only use git for personal hobby stuff, where it's a secondary use case and my assets are sufficiently small as to not cause problems.
Ideally, yes, why wouldn't I? I want to capture the exact state of the thing at each change.
I totally get the use case of saving each iteration of that 10MB file _somewhere_. But expecting git to do that job is not the right level of expectation, was my main point.
When I have worked with binaries like those described, I place a URI reference to a file hash in the source, and have something that knows how to resolve it: a file store (think S3 or whatever) with files named texture1.dat-[sha1]. A "poor man's" version control by way of file naming conventions, essentially. Does this approach work in your world?
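The convention described above can be sketched in a few lines of shell (a local temp directory stands in for S3, and the file names are made up):

```shell
#!/bin/sh
# Poor man's content addressing: blobs live in a dumb file store under
# <name>-<sha1>, and only that reference string gets committed to VCS.
set -e
store=$(mktemp -d)   # stand-in for S3 or any blob store

put() {  # put <name> <file>  ->  prints the ref to commit in-repo
    sha=$(sha1sum "$2" | cut -d' ' -f1)
    cp "$2" "$store/$1-$sha"
    echo "$1-$sha"
}

cd "$(mktemp -d)"
printf 'pixels-v1' > texture1.dat
ref=$(put texture1.dat texture1.dat)
printf 'pixels-v2' > texture1.dat        # edit the asset...
ref2=$(put texture1.dat texture1.dat)    # ...new ref, old blob kept

echo "$ref"
echo "$ref2"
test -f "$store/$ref" && test -f "$store/$ref2" && echo "both versions resolvable"
```

Because the hash is in the name, every historical reference keeps resolving to the exact bytes it was created against, and the store itself needs no versioning logic.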
Diffing a PSD as a binary is impossible - whereas proper asset management tools will deconstruct the PSD’s format to make for a human-readable diff (e.g. added/removed layers, properties, etc).
I constantly run into git scalability issues as an individual. I don't use any of the UI clients because they all fail hard on even mostly-code git repositories. I abandoned my VisualRust port in part because the mere 100MB of mingw binaries involved meant it was using github LFS, which meant CI was hitting github quota limits, and as I wasn't part of the organization - never mind an admin with billing rights - I couldn't even pay out of pocket to raise said quota limits, even if I wanted to.
I'm not going to inflict git's command line experience - which confounds and confuses even seasoned programmers - on any of the less technical artists that might be employed at a typical gamedev shop, even if git might be able to scale acceptably if locally self-hosted at a single-digit employee shop.
A few dozen or hundred employees? Forget it. Use perforce, even though it costs $$$, is far from perfect, and also has plenty of scaling issues eventually.
That one of the most popular tools - if not the most popular tool - for solving said git scalability problems also has scalability problems in practice is both ironic and absolutely a problem with the git ecosystem. To be pithy: "Even the workarounds don't work."
"Technically", you might say, "that specific symptom with git lfs, and that service provider, isn't the fault of git the command line tool, nor the git protocol". And you would be technically correct - which is the best kind of correct.
But I don't think we're referring to either of those particularly specific things with "Git" when we ask the article's question of "Is Git Irreplacable?". I'm already the weirdo for using git the command line tool - most of my peers use alternative git UI clients, and I don't mean gitk. The git protocol is routinely eschewed in favor of zips or tarballs over HTTPS, Dropbox, Sneakernet, you name it - and is invisible enough to not be worth complaining about to pretty much every developer who isn't actively working on the backend of a git client or server. Not to mention it's been extended/replaced with incremental improvements over the years already.
So I'm using a slightly broader definition of "git", inclusive of the wider ecosystem, that allows me to credit it for the alternative UI clients that do exist, rather than laughing off the question at face value - as something that has already been replaced.
Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data. You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.
None of this is a problem with git, be it GUI git clients or command line ones.
This isn’t just "technically correct". It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.
All the commercial service providers recommend keeping total repository sizes under 1GB or so. From those who foolishly exceed those limits, I hear nothing but performance complaints and how much they miss perforce - even when self hosting on solid hardware. That is 100% the fault, or at least a limitation, of git - I believe you'll agree.
LFS is a suggested alternative by several commercial service providers, not just one, and seems to be one of the least horrible options with git. You're certainly not suggesting any better alternatives, and I really wish you would, because I would love for them to exist. LFS results in a second auth system on top of my regular git credentials, recentralization that defeats most of the point of using a DVCS in the first place, and a second set of parallel commands to learn, use, and remember. I got tired enough of explaining to others why you get a broken checkout when you clone an LFS repository before installing the LFS extension that I wrote a FAQ entry somewhere that I could link people to. If you don't think these are problems with "git", we must simply agree to disagree, for there will be no reconciling of viewpoints.
When I first hit the quota limits, I tried to set up caching. Failing that, I tried setting up a second LFS server and having CI pull blobs from that first when pulling simple incremental commits not touching said blobs. Details escape me this long after the fact - I might've tried to redirect LFS queries to gitlab? After a couple hours of failing to get anywhere with either, despite combing through the docs and trying things that looked like they should've worked, I tried to pay github more money - on top of my existing monthly subscription - as an ugly business-level kludge for a technical issue of using more bandwidth than should really have been necessary. When that too failed... now you want to pin the whole problem on github? I must disagree. We can't pin it on the CI provider either - I had trouble convincing git to use an alternative LFS server for blobs when fetching upstream, even when testing locally.
I've tried gitlab. I've got a bitbucket account and plenty of tales of people trying to scale git on that. I've even got some Microsoft hosted git repositories somewhere. None of them magically scale well. In fact, so far in my experience, github has scaled the least poorly.
> Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data.
I pay github, and tried to pay github more, and still had trouble. Dispense with this "free storage" strawman.
> You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.
To be clear - I was also unable to pay to increase LFS storage on my fork, because its blobs still counted against the original repository. Is this specific workaround-for-a-workaround-for-a-workaround failing github's fault? Yes. When git and git lfs both failed to solve the problem, github also failed to solve the problem. But don't overgeneralize the one anecdote of a failed github-specific solution, out of a whole list of git problems, into it being the whole problem and answer, with everything being github's fault.
> None of this is a problem with git, be it GUI git clients or command line ones.
My git gui complaints are a separate issue, which I apparently shouldn't merely summarize for this discussion.
Clone https://github.com/rust-lang/rust and run your git GUI client of choice on it. git and gitk (ugly, buggy, and featureless though it may be) handle it OK. Source Tree hangs/pauses frequently enough that I uninstalled it, but not so frequently as to be completely unusable. I think I tried a half dozen other git UI clients, and they all repeatedly hung or showed progress bars for minutes at a time, without ever settling down, when doing basic local work involving local branches and local commits - not even interacting with a remote. Presumably due to insufficient lazy evaluation or insufficient caching. These problems were not unique to that repository either, and occurred on decent machines with an SSD holding both the git UI install and the clone. These performance problems are 100% on those git gui clients. Right?
> This isn’t just "technically correct".
Then please share how to simply scale git in practice. Answers that include spending money are welcome. I haven't figured it out, and neither has anyone I know. You can awkwardly half-ass it by making a mess with git lfs. Or git annex. Or maybe the third party git lfs dropbox or git bittorrent stuff, if you're willing to install more unverified unreviewed never upstreamed random executables off the internet to maybe solve your problems. I remember using bittorrent over a decade ago for gigs/day of bandwidth, back when I had much less of it to spare.
> It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.
If it were one company not providing a specific commercial offering to solve a problem you'd have a point. No companies offering to solve my problem for git to my satisfaction, despite a few offering it for perforce, is what I'd call a git ecosystem problem.
If my computer caught fire and exploded due to poor electrical design, you wouldn't say "nothing about your problems had anything to do with your computer and everything to do with the specific company that provided your pencils" when, in my growing list of frustrations, I offhandedly mentioned breaking a pencil tip after resorting to one, what with the whole computer being unavailable and all. That would be weird.
Even if we did hyper focus on that pencil - pretty much every pencil manufacturer is giving me roughly the same product, and the fundamental problem of "pencils break if you grip them too hard" isn't company specific. It's more of a general problem with pencils.
Github gave me a hard quota error. Maybe Gitlab would just 500 on me, or soft throttle me to heck to the point where CI times out. Maybe Bitbucket's anti-abuse measures would have taken action and I'd have been required to contact customer support to explain and apologize to get unbanned. git lfs's fundamental problem of being difficult to configure to scale via caching or distribute via mirroring isn't company specific. It's more of a general problem with git lfs. Caching and mirroring are strategies nearly as old as the internet for distribution - git lfs should be better about using them.
Better caching and mirroring would've turned github's hard quota error into a non-event, non-issue, non-problem - just like they are with core git. Alternatively, core git should be better about scaling. Or, as a distant third alternative, I could suggest a business solution to a technical problem: GitHub should be better about letting me pay them to waste their bandwidth. Then I could work around git's poor scaling a little bit more, for a bit longer.
I recollect that for Windows (which is also developed in git), MS has actually extended git with the "Git Virtual File System" rather than replacing it. But I do agree that, broadly, not everyone needs that scale.
I would love a layer over Git to handle workflow issues related to multi-repo projects
I would say that the sole thing git was developed for, the Linux Kernel, is (starting to be) painful to work with when using git.
Honestly asking.. Do you speak from some level of authority that the Linux kernel is stretching the boundaries of git? Or are you just saying that more speculatively? What is the painful part?
The magic sweet spot might be the fact that most projects do not need to be distributed. This is where a lot of the complexity comes from.
So: drop all those extra concerns, add a more elegant UI (i.e. rational commands), and possibly something that scales a little better. That's enough mojo to unseat git for a lot of things.
You can get around branching-is-bad by changing your workflows a bit, but you can't get around the bad merges: over time it's like death by a thousand papercuts.
Mercurial has 3 ways of doing branching:
- bookmarks: these are like git branches, a pointer to a revision
- branches: when you are in a branch, all commits are permanently affixed with that branch name. Less flexible than bookmarks (and therefore git branches) but good for traceability
- heads: unlike with git, a branch name can refer to several actual branches, it usually happens when you are pulling from a central repository, but you can create them yourself if you need some kind of anonymous branching. These can be pushed but it is not recommended.
Git only has the first option.
The way central repositories are managed is also a bit different, even if the fundamentals are the same. Git has the "origin" namespace to distinguish remote branches from local branches. Mercurial uses a "phase", which can be "public" (in the remote), "draft" (local only, will become "public" after a push) or "secret" (like "draft", but will not be pushed and therefore will never become "public"). So if you are not synchronized with the remote, in git you will have two branches, origin/my_branch and my_branch; in mercurial, you will have two heads of my_branch, one public, one draft. That's essentially the same thing, presented differently.
In the end, they are fundamentally the same. The feel is different, though. Git is flexible, and gives you plenty of tools to keep things nice and clean when working on large, distributed projects - as expected for something designed for the Linux kernel. Mercurial focuses on preserving history, including the history of your mistakes, and I feel it is better suited to managed teams than to a loosely connected community.
As a Google employee I use hg every day, even though it's not required. (Some teams at Google do mandate its use, but these are few and far between.) I don't use branches, but I use bookmarks. I didn't notice any merges that really ought to be performed automatically but were not; in any case I use Meld to resolve merge conflicts and it's easy enough to do occasionally.
Normally mercurial stops when there are conflicts it cannot resolve reliably. In those cases, have a try at kdiff3: it handles hairy merges quite well - in a lot of cases even automatically (and correctly).
There is always meld, but I'd say kdiff3 is superior wrt merge conflict resolution.
What bothers you in particular?
If you want the wacky and unreliable git branching you can use hg bookmarks.
This page has been viewed 230 thousand (!!) times. Because git is so easy and elegant that it lies to you about which branches exist on the remote.
It is not even funny any more how bad this is.
That's interesting. In your examples isn't it fast because monorepos are network-based, as in, you only fetch what you need when you need it?
Also reminded me of discussions around CPython's startup time and how one use case where milliseconds matter is in small cli utilities such as Mercurial.
The entire repo is stored on a networked file system, so essentially every file operation is remote. That isn't actually what accounts for most of the slowness, though: when I performed the same operations without hg, they were noticeably faster.
It’s only huge megacorps that need larger scale things like GVFS.
As for large files, that is not what Git is for. Git is for source code. Much like how you don't put large files in your RDBMS, you shouldn't be putting them in your SCM either.
You can still do it if you want, but you might be better served by https://git-lfs.github.com/ or another system designed for that purpose.
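For reference, routing a file type through LFS is one line per pattern in `.gitattributes` - this is what `git lfs track "*.psd"` writes there:

```
*.psd filter=lfs diff=lfs merge=lfs -text
```

The pattern is up to you; anything matching it is stored as a small pointer file in git, with the real bytes living on the LFS server.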
Nowadays, a new computer means a git clone (or just plain copy-paste) via a USB stick from the old one. This seems like the sort of single feature that could be written for git, but if you told me "there's something that works better for large, twenty-year-old repos", I'd probably take that.
I don't know how Linux survives, but maybe it's just that you only rarely git clone your large repos. (Or maybe intercontinental internet is less reliable than intracontinental, so that if you're in the US it's a non-issue.)
Could you please provide a link to it? I’m very interested in seeing this command, but ironically it’s not a name that’s easy to google for.
Edit: I was very wrong, searching for “google repo command” displayed https://gerrit.googlesource.com/git-repo as the very first result.
The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings. Git is simple and elegant, though its interface might not be.
For example, a typical question on Stackoverflow is "How do I find out which branch this branch was created from?", which always has 10 smug answers saying "You can't, because git doesn't really track that; branches are just references to commits, and what about a) a detached head? b) what if you based it off an intermediate branch and that branch was deleted? c) what if..."
5 more answers go on to say "just use this alias!" [answer continues with a 200-character zsh alias that anyone on Windows, the most common desktop OS, has no idea what to do with].
I don't want to write aliases. I usually don't want to consider the edge cases. If I have 2 long-lived branches, version-1.0 and master, I want to know whether my feature branch is based on master or version-1.0, and it's an absolute shitshow. Yes it's possible, but is it simple? Is it elegant? No.
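To be fair to the smug answers, git can approximate an answer in the simple two-candidate case by comparing merge-bases - though the fact that this takes ancestry reasoning at all rather proves the point. A self-contained sketch (throwaway repo, hypothetical branch names, and it breaks in exactly the edge cases those answers list):

```shell
#!/bin/sh
# Throwaway repo: master and version-1.0 diverge, then feature branches
# off master. Guess feature's base by comparing its two merge-bases.
set -e
cd "$(mktemp -d)" && git init -q
git checkout -q -b master
c() { git -c user.email=a@b -c user.name=t "$@"; }   # identity for demo commits
c commit -q --allow-empty -m root
git branch version-1.0
c commit -q --allow-empty -m on-master
git checkout -q -b feature
c commit -q --allow-empty -m work

base_m=$(git merge-base master feature)
base_v=$(git merge-base version-1.0 feature)
# If master's merge-base is strictly ahead of version-1.0's, the
# feature branch most likely forked from master.
if [ "$(git rev-list --count "$base_v..$base_m")" -gt 0 ]; then
    echo "feature is based on master"
else
    echo "feature is based on version-1.0"
fi
```

Which is exactly the kind of thing that ends up wrapped in a 200-character alias.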
The 80/20 (or 99/1) use case is
- centralized workflow.
- "blessed" branches like master and long lived feature branches that should ALWAYS show up as more important in history graphs.
- short lived branches like feature branches that should always show up as side tracks in history graphs.
Try to explain to an svn user why the git history for master looks like a zigzag spiderweb just because you merged a few times between master and a few feature branches. Not a single tool I know does a nice straight (svn style swimlane) history graph because it doesn't consider branch importance, when it should be pretty simple to implement simply by configuring what set of branches are "important".
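git does actually have a piece of this already: `--first-parent` follows only the first parent of each merge, which gives master a straight svn-style lane - but it's a per-command flag, not the branch-importance-aware default the history-graph tools would need. A throwaway-repo demonstration:

```shell
#!/bin/sh
# Throwaway repo: one feature branch merged back into master.
set -e
cd "$(mktemp -d)" && git init -q
git checkout -q -b master
c() { git -c user.email=a@b -c user.name=t "$@"; }   # identity for demo commits
c commit -q --allow-empty -m m1
git checkout -q -b feature
c commit -q --allow-empty -m f1
c commit -q --allow-empty -m f2
git checkout -q master
c merge -q --no-ff -m "merge feature" feature

git log --oneline master | wc -l                 # 4: every commit, the zigzag view
git log --oneline --first-parent master | wc -l  # 2: the straight master swimlane
```

A graph tool that applied `--first-parent` along a configured set of "important" branches would get most of the way to the svn-style view.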
Git is hard for idiots imo, and there are a lot of us
Create a new branch and check it out while you are on the last commit (git checkout -b my-branch), delete the master branch (git branch -D master), and pull it again (git pull origin master). You'll end up with a local branch with a bunch of commits that you can merge, rebase or cherry-pick, depending on what you want.
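That recipe, as a runnable sketch against a throwaway local "remote" (a plain directory stands in for the server; the final step uses `git fetch origin master:master` to recreate master without switching branches, which has the same effect as re-pulling it):

```shell
#!/bin/sh
# Throwaway repos: "origin" is just a local directory acting as the remote.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/origin" && cd "$tmp/origin"
git checkout -q -b master
git -c user.email=a@b -c user.name=t commit -q --allow-empty -m one
cd "$tmp" && git clone -q "$tmp/origin" work && cd work

git checkout -q -b my-branch        # park your local state here
git branch -D master                # throw away the local master...
git fetch -q origin master:master   # ...and recreate it fresh from origin
git log --oneline master
```

Afterwards master matches origin exactly, and everything you had locally is safe on my-branch.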
If you want to learn more about git in a practical way, there's an awesome book called Git Recipes.
Git is a very flexible tool that allows for individual local workflows independent of how teams collaborate. Finding a personal workflow that works for you is a little investment that pays huge dividends for a long time. Git is a 15 year old tool that is expected to live for 10-30 years more at the very least. I encourage everyone to learn enough Git to not be afraid of it.
I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.
I have considerable sympathy for the RTFM reply, but I do not think it is the last word that shuts down any question of usability. What seems clear to me is that there are a lot of people using git who probably should not be. In many cases, they do not have a choice, but I also suspect that many of the organizations that have chosen git do not have the issues that it is optimized for.
In my opinion, solving problems and making improvements involves reducing complexity, not defending it. Many people, including myself, have read the Git docs and learnt about the underlying data structures etc etc and still we can make the claim that it could be better, in numerous ways.
Calling everyone feckless won't invalidate that.
I’m not disputing this. Of course git isn’t perfect.
What I’m against is changing git to cater to people who can’t read the manual and make basic mistakes.
Why? Isn't software that doesn't require reading a manual and doesn't let the user make irreversible mistakes considered good design?
I can’t think of any software that handles a complex problem without having a manual, manual-like documentation, or a learning curve. Git is a tool for developers, not casual users who want typical apps.
Again, you wouldn’t make an argument like this for a tool used by a plumber or a mechanic. If a tool succinctly handles a problem, good! But using tools is part of the profession; they have learning curves.
Most issues with git are PEBKAC issues because people refuse to spend 10 minutes of their life reading about a tool they may use for hundreds or thousands of hours. I wouldn’t want to cater to those kinds of people.
About the plumbing/mechanic analogy, I totally would make the same case! Hammers and wrenches don't require a manual and can be used for very complex tasks, and that's exactly what makes them so well designed and popular. Few people want their hammer to have more features, and if they do, they still want to keep the good old hammer ready, because it's so easy and simple to use.
Especially calling out PEBKAC (Problem Exists Between Keyboard And Computer) - while even most of the expert git users, including the author himself say the interface could at least be made much better - makes me really suspicious that you simply like feeling superior to other people because you know something they don't, and you don't want to lose your "edge" if suddenly everyone can use version control without resorting to manuals.
iMovie vs Premiere/Final Cut. Final Cut X vs 7. Garageband vs Pro Tools. Word vs LaTeX. and so on. It's very difficult to design interfaces that are easy enough for average users that don't impede pros/power users.
A hammer isn't a good comparison. Something like a multimeter is what I was thinking of, etc. Git solves a significantly more complex problem than either of these, though.
> including the author himself say the interface could at least be made much better
I don't disagree! Git's interface *could* be better. That has nothing to do with my points above regarding people refusing to read basic literature about the tools they use, expecting them to just magically do everything out of the box, "intuitively".
> feeling superior ... you don't want to lose your "edge"
This could not be further from the truth. I simply have no sympathy for people who refuse to read the manual or an intro to using a tool, and then complain about the tool being hard to use. Yeah, it's hard because you didn't do any reading! Git is actually really easy if you read about the model that it uses. Most people don't need to venture beyond ~5-6 subcommands, and even then it's easy to learn new subcommands like cherry-pick, rebase, etc.
Adobe Photoshop, as another example, has a learning curve, but that tool is indispensable for professionally working on / editing images. (GIMP is also good, but that's not in the scope of this discussion). A lot of beginner issues are basically PEBKAC because they didn't read the manual. Same with Pro Tools, or probably any other software used by industry professionals. They're harder to use but what you can do with them (since they treat you like an adult, instead of holding your hand and limiting you) is incomparable to the output of apps designed for casual users.
That being an example that I remember from late last year, there are just sharp edges to git I end up catching myself on :)
Though you do have to have committed. One of the things I hammer on in my tutorials for work is that if you get confused in git, make sure you commit. If you commit, you can take your problem to the other engineers and we can almost certainly get you straightened away. Fail to commit, though, and you really may lose something.
Also, metapoint about git: While I won't deny its UI carries along some dubious decisions from the very first design, in 2020, basically, if you think "Git really ought to be able to do [this sensible thing]", it can. It has that characteristic that open source software that has been worked on by a ton of contributors has, which is that almost anything you could want to do was probably encountered by somebody else and solved five years ago. It just may take some searching around to figure out what that is. (And on the flip side, when you read the git man pages and are going "Why the hell is that in there?", the answer may well be "a problem that you're going to have in six months".)
This is not absolute gospel. If you screw up a rebase and commit, whatever you removed in the rebase is simply gone.
(That's not a git thing. I don't really even want some sort of hypothetical source control system that literally tracks every change I make. It's technically conceivable and should be practical to what would at least be considered a "medium sized" project in the git world, but I'd just be buried in the literally thousands of "commits" I'd be producing an hour. Failing that sort of feature, a source control system can't help but "lose" things not actually put into it.)
OK, so we have backups of his VM and we can recreate a clone of it, but will that be satisfactory? Are there any issues with hardware MAC addresses or CPU ids? How far down the rabbit hole of git minutiae do you have to go before you are confident that you can do all basic source-control operations safely?
is the more important imo than
A rebase does not destroy information. It creates new commits and moves the branch head to a different spot on the graph.
The reason git is seen as painful is because you can't claim expertise until you develop the ability to form a mental map of the graph. But once you do this the lights turn on and everything starts to make sense.
This is why the mantra "commit early and often" still holds. The more experienced git user will tell the newer people this, so when they come with a mess it will always be recoverable.
reflog is like undelete in filesystems, it's a probabilistic accident recovery mechanism for an individual computer (repo checkout in this case) that you can try to use if you don't have backups.
You have to actively try to do anything that's not easily recoverable, as long as you commit before you start messing around and don't rm -rf .git.
Run git log --reflog --all and you will see all the commits you made (or that a rebase made) in the last 3 months.
You can then resuscitate an old branch by simply putting a branch name on it with git branch newname <old sha1>
git log --reflog --all
you will see with this magical command that git DOESN'T REMOVE any commit. Your old tree is still there, only normally hidden.
The commit that was there before you fouled up your branch is still there.
You now only need to set your branch to the old commit. A branch is nothing else than a pointer in the tree.
You have 2 possibilities to change the commit a branch points to:
1. git branch --force <branchname> SHA1 (works only if <branchname> is not the currently checked-out branch. Simply checking out the SHA1 also works, as it detaches the HEAD).
2. replace the SHA1 in the text file in .git/refs/heads/<branchname> by the SHA1 where you want the branch to point to.
With that, your repo is in the same state it was before your error.
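The whole recovery described above can be replayed end to end in a throwaway repo. Everything here ("master", "rescue", the commit messages) is invented for the sketch:

```shell
# Scratch-repo walkthrough: a commit "lost" by a hard reset is still
# reachable via the reflog, and pointing a new branch at its SHA1
# brings it back.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com && git config user.name You
git checkout -q -b master
git commit -q --allow-empty -m "good work"
git commit -q --allow-empty -m "more good work"
LOST=$(git rev-parse HEAD)
git reset -q --hard HEAD~1                           # "more good work" vanishes from master...
git log --reflog --oneline | grep "more good work"   # ...but is still in the repo
git branch rescue "$LOST"                            # a branch name makes it reachable again
```

The same effect can be had with `git branch --force` or by editing `.git/refs/heads/<branchname>` as described above; a new branch name is just the least destructive option.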
git checkout master
git fetch origin
git reset --hard origin/master
Also see stackoverflow.
Git is a complex tool because it’s tackling a complex problem. I don’t see a way of making it “easier” without massively reducing what it can do. It’s like saying we should simplify a Formula One car so people can drive it without reading up on it first.
If something happens once, it happens. If something happens multiple times then it means you’re not evaluating why it occurred in the first place and learning from it. No tool in the world can solve this problem because it’s not a problem with the tool, rather the user.
Git is really not so hard, but it requires a little reading.
> significant time and energy
All someone needs is to read through https://rogerdudler.github.io/git-guide/, and learn a few commands.
Are we seriously going to refer to “reading the manual” as “significant time and energy”? In this case you don’t even have to read the manual, just a primer on how git works. You know, on how the tool that you’re using works. Why are people so allergic to spending even a modicum of time on learning a tool that massively simplifies their life and makes their work possible?
Do plumbers complain about having to read manuals for the equipment that they use? Electricians?
As programmers our tools are easier to learn and use, yet we complain about having to do any work at all.
Why even be a programmer? If reading about git is so hard, what about the rest of the field that doesn’t even have documentation?
How about we don’t make tools that cater to the lowest common denominator, in this case people who basically can’t be assed to do anything? RTFM.
I have a way of picking the losing side so I've been using mercurial for everything until now, and until now Bitbucket offered hg. They're decommissioning it so I'm moving over to git and I feel like my workflow has been hampered, not just in the immediate complexity of learning the new tool, but in the ongoing complexity of using a less good tool for my needs.
I'm dealing with it, but the situation you're describing isn't really the one that I and a lot of other whiners are dealing with.
I spent ages unfucking local svn working copies and long-running branches on both Windows and Linux. git needs some serious flaws to keep up with that experience.
Thankfully the standard DVCS is flexible enough to enable the workflows others need too.
Fossil tackles much the same sort of problem, yet it's far simpler to use.
Most of Git's problems are due to purposeful choices, but they're design choices, not inherent aspects of how a DVCS must behave.
We've laid out our case for the differences here: https://fossil-scm.org/fossil/doc/trunk/www/fossil-v-git.wik...
> our thing: Self-contained and efficient
This is not biased in any way and makes me want to continue reading. /s
Also, you can’t claim something to be “efficient” when it’s doing many different things like scm, issues/tickets, a web forum/ui ....
Then you have non-issues like git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).
And then you take Gitlab and conflate Gitlab’s issues with problems with Git. I guess gogs/gitea don’t exist?
This page needs to be rewritten to simply list the differences in neutral language. There are good points but they’re lost in unnecessary epithets like “caused untold grief for git users”. I get it: git bad, our product good. Switch!
Personally, I don’t want something that tries to do many different things all at once.
Of course we're biased, but every row in that table corresponds to a section below where we lay out our argument for the few words up in the table at the top.
Here's the direct link for that particular point:
Now, if you want to debate section 2.2 on its merits, we can get into that.
> you can’t claim something to be “efficient” when it’s doing many different things
We can when all of that is in a single binary that's 4.4 MiB, as mine here is.
A Git installation is much larger, particularly if you count its external dependencies, yet it does less. That's what we mean when we say Git is "inefficient."
But I don't really want to re-hash the argument here. We laid it out for you already, past the point where you stopped reading.
> git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).
It is on Windows, where they had to package 44-ish megs of stuff in order to get Git to run there.
On POSIX platforms, the package manager isn't much help when you want to run your DVCS server in a chroot or jail. The more dependencies there are, the more you have to manually package up yourself.
If your answer to that is "just" install a Docker container or whatever, you're kind of missing the original point. `/home/repo/bin/fossil` chroots itself and is self-contained within that container. (Modulo a few minor platform details like /dev/null and /dev/urandom.)
> This page needs to be rewritten to simply list the differences in neutral language.
We accept patches, and we have an active discussion forum. Propose alternate language, and we'll consider it.
> unnecessary epithets like “caused untold grief for git users”
You don't have to go searching very hard to find those stories of woe. They're so common XKCD has satirized them. We think the characterizations are justified, but again, if you think they're an over-reach, propose alternate language.
> I don’t want something that tries to do many different things all at once.
Not a GitHub user, then?
I haven't used nor looked at fossil in maybe 5 years, but had a couple of questions.
Does fossil now have any kind of email support built in to the ticket manager? I remember when I tried to use fossil for actual production use, there was no way to trigger emails sent when, e.g. tickets were submitted, and one of the devs said to just write a script to monitor the fossil rss feed and send the appropriate email, which seemed like a baroque and fragile (and time-consuming) solution.
And is any more of the command-line behavior configurable (like the mv/rm behavior -- affecting the file on disk as well as the repository, or just marking the file as (re)moved in the repository)?
I have commit access, yes, but mainly I work on the docs.
> Does fossil now have any kind of email support built in to the ticket manager?
Yes. It was added in support of the forum feature last year, but it also applies to several other event types: https://fossil-scm.org/fossil/doc/trunk/www/alerts.md
> one of the devs said to just write a script to monitor the fossil rss feed
Probably me. :)
> seemed like a baroque and fragile (and time-consuming) solution.
A dozen lines of Perl; easy-peasy. That and a pile of CPAN modules, but that's easily fetched with `cpanm`.
> the mv/rm behavior -- affecting the file on disk as well as the repository
The default you're referring to was changed a few years ago: the old `--hard` option is now the default.
Also, a comment about the argumentation in "test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow. Certainly, one can push untested stuff to the remote server by mistake; but, even so, this should be OK, because if one can push directly to important branches like master or similar without going through any reviews and other sanity checks, one has a problem... and the problem isn't really Git :)
You must be referring to just the table at the top, not to the detailed argument below, which mentions git-worktree and then points you to a web search that gives a bunch of blog articles, Q&A posts, project issue reports and such talking about the problems that come from using that feature of Git.
I suspect this is because git-worktree is a relatively recent feature of Git (2.5?) so most tutorials aren't written to assume use of it, so most tools don't focus on making it work well, so bugs and weaknesses with it don't get addressed.
Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.
> test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow.
That just brings you back to the problems you buy when separating commit from push, which we cover elsewhere in that doc, primarily here: https://www.fossil-scm.org/xfer/doc/trunk/www/fossil-v-git.w...
That's unfortunate. Reading the comments here, switch-branch-in-place is seen as some kind of flaw, but I don't think I would voluntarily use a VCS that doesn't let me easily do that (it's most sensible way for me to work, from way before Git was a thing).
You're conflating two separate concepts:
1. Git's default of commingled repo and working/checkout directory
2. Switch-in-place workflow encouraged by #1
Fossil doesn't do #1, but that doesn't prevent switch-in-place or even discourage it. The hard separation of repo and checkout in Fossil merely encourages multiple separate long-lived checkouts.
A common example is having one checkout directory for the active development branch (e.g. "trunk" or "master") and one for the latest stable release version of the software. A customer calls while you're working on new features, and their problem doesn't replicate with the development version, so you switch to the release checkout to reproduce the problem they're having against the latest stable code. When the call ends, you "cd -" to get back to work on the development branch, having confirmed that the fix is already done and will appear in the next release.
Another example is having one checkout for a feature development branch you're working on solo and one for the team's main development branch. You start work from the team's working branch, realize you need a feature branch to avoid disturbing the rest of the team, so you check your initial work in on that branch, open a checkout of that new branch in a separate directory and continue work there so you can switch back to the team's working branch with a quick cd if something comes up. Another team member might send you a message about a change needed on the main working branch that you're best suited to handle: you don't want to disturb your personal feature branch with the work by switching that checkout in place to the other branch, so you cd over to the team branch checkout, do the work there, cd back, and probably merge the fix up into your feature branch so you can work with the fix in place there, too.
These are just two common reasons why it can be useful to have multiple long-lived checkouts which you switch among with "cd" rather than invalidate build artifacts multiple times in a workday when switching versions.
Git can give you multiple long-lived working checkouts via git-worktree, but according to the Internets it has several well-known problems. Not being a daily Git user, I'm not able to tell you whether this is still true, just that it apparently has been true up to some point in the past.
Since no one is telling me those issues with git-worktree are all now fixed, it remains a valid point of comparison in the fossil-v-git article.
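For what it's worth, the Git side of the multiple-checkout setup is short. Here's a minimal git-worktree sketch; the directory and branch names ("trunk", "release", "release-ckout") are invented for the example:

```shell
# Two long-lived checkouts of the same repo, switched between with "cd"
# rather than switching branches in place.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com && git config user.name You
git checkout -q -b trunk
git commit -q --allow-empty -m init
git branch release
git worktree add ../release-ckout release      # second long-lived checkout
git -C ../release-ckout symbolic-ref --short HEAD   # prints: release
```

Whether this dodges the reported worktree bugs is a separate question; it only shows the mechanics are there.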
Edit: It could be worth it to emphasize this in the Fossil vs. Git comparison that you linked, as it wasn't very clear to me after reading it.
Thanks for the feedback!
Well, not entirely, because in my opinion the detailed argument kind of hand-waves away the entire git worktree. Continuously switching branches inside a single large Git repo is certainly a suboptimal way to work with Git, but most of the time one should be able to avoid that with the worktree (though the worktree stuff is, of course, not a miracle cure for everything).
Also, Git continues to be taught with the switch-in-place method by default.
I’m not saying it is impossible to get a Fossil-like workflow with Git, just that there are consequences from that not being the default.
A far better analogy is:
It's like saying we should reduce a programming language so people can use it without reading up on it, etc.
The core problem with it is that very few people can get paid more by being better at using their [D]VCS, whereas those more skilled with their programming language(s) of choice often do get paid more to wield that knowledge.
Consequently, most people do not fully master their version control system to the same level that they do with their programming language, their text editor, etc.
To be specific, there are many more C++ wizards and Vim wizards than there are Git wizards.
In situations like this, I prefer a tool that lets me pick it up quickly, use it easily, and then put it back down again without having to think too much about it.
You see this pattern over and over in software. It is why all OSes now have some sort of Control Panel / Settings app, even if all it does is call down to some low-level tool that modifies a registry setting, XML file, or whatever, which you could edit by hand if you wanted to. These tools exist even for geeky OSes like Linux because driving the OS is usually not the end user's goal, it is to do something productive atop that OS.
[D]VCSes are at this same level of infrastructure: something to use and then get past ASAP, so you can go be productive.
git reset <commit hash> --soft
git add . && git stash save
and then finally:
git stash pop
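Replayed in a scratch repo, the sequence looks like this (branch and file names are made up; `git stash save` is the older spelling of plain `git stash`):

```shell
# Soft-reset back to an earlier commit, stash the now-uncommitted
# changes, then pop them back as ordinary working-tree edits.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com && git config user.name You
git checkout -q -b master
echo one > f && git add f && git commit -q -m one
BASE=$(git rev-parse HEAD)
echo two >> f && git commit -q -am two
git reset --soft "$BASE"   # HEAD moves back; commit "two" becomes staged changes
git add . && git stash     # park those changes on the stash
git stash pop              # bring them back, uncommitted
grep -q two f              # the edit survives, ready to re-commit differently
```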
Yes, thus https://xkcd.com/1597/
I find myself saying "git reset --hard origin/" and such with disturbing frequency.
Both are examples of "I give up; it's faster to start over." This is not what I want in a DVCS.
Then write a little merge message and you are good to go.
Then let the idiots end up screwing their own local repo, instead of doing some magic and making it easy to screw up upstream or someone else repo.
That's what UIs (whether CLIs or otherwise) for standardized workflows like git-flow are, IMO.
Some design decisions also shine through like "no branch is more important than any other branch" which is completely mental considering how people actually use git.
You don't. The reason is that you're using a tool that didn't budget the time to directly work on git data files and it uses the command line under the hood, because that's a hard business case to make for most small tools. This is not fundamental to git; the very top-end git-based tools like Github or Bitbucket all do their own internal, direct implementation of git functionality for this reason. It's not a characteristic of git, it's a characteristic of the GUI tools you're using.
A perfectly sensible one based on perfectly sensible engineering tradeoffs, let me add; no criticism of such tools intended. Git's internals from what I've seen are not particularly difficult to manipulate directly as such things go, but you are simply by the nature of such a thing taking on a lot more responsibility than if you use the command line-based UI.
I saw a web designer check in a huge hierarchy of empty directories which would be the structure of the new project that their team should work on. They were quite surprised when it didn't show up on any of the other designer's computers after a "pull". They had to go to the "Git guru" for help.
Windows and Mac both have directories as a major fundamental concept. Everyone knows them and is familiar with them. Subversion tracks directories. Git does not.
> Currently the design of the Git index (staging area) only permits files to be listed, and nobody competent enough to make the change to allow empty directories has cared enough about this situation to remedy it.
Of course git is also incredibly painful and brittle if you want "exclude/except" behavior on the gitignore involving subdirectories.
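A small demonstration of that sharp edge, assuming nothing beyond stock git: a file under an excluded directory cannot be re-included until every parent directory level is un-ignored as well.

```shell
# gitignore pitfall: "!pattern" cannot re-include a file whose parent
# directory is excluded; each directory level must be un-ignored.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
mkdir -p build/keep
touch build/junk build/keep/important
printf 'build/\n!build/keep/important\n' > .gitignore
git check-ignore -q build/keep/important && echo "still ignored"
printf 'build/*\n!build/keep/\nbuild/keep/*\n!build/keep/important\n' > .gitignore
git check-ignore -q build/keep/important || echo "now includable"
```

The second .gitignore works only because it excludes contents (`build/*`) rather than the directory itself, then re-includes the chain one level at a time.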
The assumption behind Git is, everyone develops on their machines and/or branches, and then things are merged. This only works for files which can be merged.
There are plenty of things pretty much any project wants to track which cannot be merged, for example Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.
With a centralized system, that's easy, just go over to using file locks ("svn lock") for those files. With a distributed system, that's impossible.
That's a seriously hard problem for a DVCS if you're serious about the "D".
This topic turned into [the single longest thread in the history of the Fossil forum](https://www.fossil-scm.org/forum/forumpost/2afc32b1ab) because it drags in the CAP theorem and all of the problems people run into when they try to have all three of C, A, and P at the same time.
To the extent that Fossil based projects are usually more centralized than Git ones, Fossil has a better chance of solving this, but I'm still not holding my breath that Fossil will get what a person would naively understand as file locking any time soon.
> Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.
You want to avoid putting such things into a VCS anyway, because it [bloats the repo size](https://fossil-scm.org/fossil/doc/trunk/www/image-format-vs-...). I wrote that article in the context of Fossil, but its key result would replicate just as well under Git or anything else that doesn't do some serious magic to avoid the key problem here.
Instead of Word files, check in Markdown or [FODT](https://en.wikipedia.org/wiki/OpenDocument_technical_specifi...). (Flat XML OpenDocument Text.) Or with Fossil, put the doc in the wiki.
Instead of PNG, check in BMP, uncompressed TIFF, etc., then "build" the PNG as part of your app's regular build process.
This has the side benefit that when you later change your mind on the parameters for the final delivered PNGs, you can just adjust the build script, not check in a whole new set of PNGs. My current web app has several such versions: 8-bit paletted versions from back before IE could handle 24-bit PNG, then matted 24-bit PNGs from the days when IE couldn't handle transparency in PNG, and finally the current alpha-blended 24-bit PNGs. It'd have been better if I'd checked in TIFF originals and built deliverable PNGs at each step.
Another fun option is to unzip the DOCX and check that in, since it is mostly a collection of XML files in a zip container. I built a tool to automate zipping/unzipping files like DOCX years ago as pre-commit/post-checkout/post-merge hooks.  It's an interesting way to source control some types of files if you can find a way to deconstruct them into smaller pieces that merge better. Admittedly, merging Office Open XML by hand is not a great experience (and dealing with subtly broken or corrupt internal contents is not fun, because programs like Word can be fussy when things are even slightly wrong), but you get better diffs sometimes than you would expect.
How do you suggest projects like games handle this, where data files are naturally linked to source files? Imagine trying to sort out an animation bug when you only have source level tracking and no idea which version of the animation data corresponds to the animation source files of the bug report. These data files are not 'built' from the 'build' step as they are the product of artists.
I understand your sentiment, but the denominator in that fraction is probably much lower than your guess.
Consider even simple cases like the disconnected laptop case. You may work at a small office with only local employees, and so you have one central "blessed" repo, but if one person locks a file and then goes off to lunch, working on the file while at the restaurant, you still have a CAP problem:
CA: Because the one guy with a laptop went off-network, you have no full quorum, so no one can use the repo at all until he gets back and rejoins the network. (No practical DVCS does this, but it's one of the options, so I list it.)
CP: When the one guy went off to lunch, we lost the ability to interact with his lock, and that will continue to be the case until he gets back from lunch. Also vice versa: if someone still at the office takes out a lock, the guy off at lunch doesn't realize there is lock, so he could do something bad with the "locked" file. (This is the mode DVCSes generally run in by default.)
AP: No locking at all, thus no consistency, thus your original problem that inspired the wish to have file locking.
Everyone else will only see that lock if they fetch, and if they don’t, they might edit their local copy of fileX too, but would be prevented from pushing their version to the blessed repository by the lock. They can push a copy under another name, or wait until I have removed the lock (but probably can’t resolve the conflict anyway because it’s likely a binary document). So the user will remember to never start editing without taking the lock in the future.
It’s not perfect by any stretch of the imagination but it’s all anyone asks for in terms of file locking. It’s what Subversion always did.
And if you go on vacation for two weeks instead?
An admin can remove the lock. Or you can allow force-pushing by anyone to replace it or whatever.
Not sure why this is seen as so complicated, version control systems have done it since forever. It’s not trying to solve some distributed lock system in a clever way. It’s dumb centralized mutex per file. And yet again this is all that’s needed (and it’s also added to git in git-LFS!).
More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault. That everything looks like a nail does not speak against the value of a hammer.
True, but there are inevitably some files which still cannot be merged, so the problem remains.
> More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault.
Indeed, if you want to store the history of your files - the whole software including the icons it uses, so that you can go back to any previous version and build it - and you chose a DVCS like Git, I would agree the fault was yours.
That's basically what I was arguing, that Git is the wrong choice if you have any binary assets like icons (even if those assets have small filesize) due to the lack of locking, sorry if I was unclear.
Git LFS should be used instead. Or storing a Sha256sum and putting the file elsewhere.
There is no one that would want distributed binaries in git. But people also don’t want to switch from git to something else just because they have a 100GB or 10TB repo. Tooling (build tools, issue management) everywhere has decided that git is all that’s needed.
Not putting binaries in git isn’t a solution at all. Binaries are part of the source in many applications (e.g game assets, web site images...). Distributing every version of every binary to everyone is also not a solution.
Or use something like: git log --all --graph --oneline
One tip for "zigzag spiderweb" is to always rebase your topic branch to the target branch prior to a fast-forward merge to the target branch (e.g. master). To clarify: while in your branch topic/foobar: "git rebase master", "git checkout master", "git merge --ff-only topic/foobar".
(There's surely a clever shorthand for the above procedure but when it comes to the command line, I like to combine small things instead of complicated memorized things, it's some kind of Lego syndrome)
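Spelled out as a runnable sketch (repo layout and file names invented for the example, branch names as in the comment above):

```shell
# rebase topic/foobar onto master, then fast-forward master onto it:
# the result is a straight line with no merge bubble.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com && git config user.name You
git checkout -q -b master
echo base > file && git add file && git commit -q -m base
git checkout -q -b topic/foobar
echo topic > topic.txt && git add topic.txt && git commit -q -m topic
git checkout -q master
echo more > more.txt && git add more.txt && git commit -q -m more
git checkout -q topic/foobar
git rebase -q master                  # replay topic commits onto master's tip
git checkout -q master
git merge --ff-only topic/foobar      # guaranteed fast-forward, no merge commit
git log --oneline --graph             # linear history, no zigzag
```

`--ff-only` is the safety catch here: if you forgot to rebase first, the merge refuses to run rather than quietly creating a merge bubble.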
Also, with dozens of tiny commands but only a handful of actual desired outcomes, the high-level operations should be explicit commands, e.g. “rebase this branch on master and then squash it and commit on master”.
A lot of the local/remote could also be hidden. The number of times I want to rebase on my local master which is behind origin by 2 commits is... zero.
At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity. (Even if it doesn't work quite the same as traditional source code control systems. Failing to work with directories is a problem of the git approach. Having a nice offline story is a distinct advantage.)
> And personally I prefer a VCS with less ways to shoot myself in the foot than git.
Oddly, the thing I love about git is how easy it makes it to recover from mistakes. Even if there are more ways to shoot yourself in the foot, there are also more ways to put your foot back exactly the way it was before you shot it. (If only real life worked that way!) This is what the immutable content storage under the hood of a git repository gets you.
If you know the commit hash (and there are a bunch of ways to easily keep track of these), you can get back to the state that's represented by that hash. Commands like merge/rebase/cherry-pick make this particularly easy by providing an '--abort' option that means "I've screwed this operation up beyond repair and need to bail out." And the abort works. As long as you had your target state committed, you can get back to it. (And if that's just a transient state that you don't want to persist, it's easy enough to squash it into something coherent.)
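A minimal illustration of that point: a deliberately conflicting merge, aborted, lands you exactly back on the commit you started from (all names here are invented for the sketch):

```shell
# "--abort" bails out of a half-done merge and restores the pre-merge
# state, both HEAD and the working tree.
set -e
cd "$(mktemp -d)" && git init -q demo && cd demo
git config user.email you@example.com && git config user.name You
git checkout -q -b master
echo one > f && git add f && git commit -q -m one
git checkout -q -b side
echo side > f && git commit -q -am side
git checkout -q master
echo main > f && git commit -q -am main
BEFORE=$(git rev-parse HEAD)
git merge side || true                      # conflicts, merge stops midway
git merge --abort                           # bail out of the half-done merge
test "$(git rev-parse HEAD)" = "$BEFORE"    # HEAD is exactly where it was
test "$(cat f)" = main                      # working tree restored too
```

The same pattern works for `git rebase --abort` and `git cherry-pick --abort`.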
Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?
And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might had to do it once or twice with SVN, though).
I don't think it is special. Generally after a while using a given tool, library, etc. I find it useful to dig in a bit and see what's happening under the hood to help understand why it works the way it does. git just happens to be the tool under discussion at the moment.
> And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might had to do it once or twice with SVN, though).
I think we're talking about the same sort of mistakes. It's hard for me to imagine a case where you'd need to blow away a local git repository entirely. Worst case scenario, there should be good refs available in a remote that are just a 'git fetch' away. (If there's no remote, then blowing away the local repo is essentially just starting from scratch anyway.)
Bitkeeper is in fact open source now, BTW. Too late, but it is.
What does that mean in concrete terms? What are the failures you're seeing with git that you weren't with bk? How long has your team used git? bk?
Yeah. Take an afternoon to read through gittutorial(7), gittutorial-2(7), and gitcore-tutorial(7). Git is a tool, and just like any other tool (car, tablesaw), you will be much better off if you take the time to learn to use it properly. Once you see "The Matrix" behind Git, it becomes an incredibly easy to use and flexible tool for managing source code and other plaintext files.
They're just examples of tools.
> Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.
I talk about this elsewhere in this thread, but I disagree with this assertion. I find Mercurial baffling and Git very elegant, though it could be an artifact of the order in which I learned the tools.
It doesn't. It just works better when you take the time to learn how it works. (Which is an experience I commonly have with the tools I use, for whatever that's worth.)
Buddy had a teammate that almost force pushed references from a slightly different repo. What a mess that could have been! I agree regarding the usefulness of reflog, and think the complaints about messing things up with rebase, reset, etc. are overblown. It really isn't an issue for intermediate users.
I don’t see the capability to force push as a negative. There are situations in which it’s necessary, like forcibly removing history (something I had to do just today).
Git gives you the ability to shoot yourself in the foot, so it’s up to the operator to not make a mistake like that without backing up the repo to a different place first, etc. Something something only a poor carpenter blames their tools.
I don't think much of your all-in-one solution like fossil - that's a competitor for GitHub (without the bits that make GH good), not git.
I tried to use hg at one point in the early days, and found it much slower than git. Git's low latency for commands made a substantial difference, perceptually. In principle I think git encourages too much attention to things like rebases, which fraudulently rewrite history and lie about how code was written, just so the diff history can look neater. Working code should be the primary artifact, not neat history, and rebases and other rewrites make it too easy to cause chaos with missing or duplicated commits in a team environment. So ideologically, mercurial is a better fit, but that's not enough to make me use it.
Fit is a function of an environment; when we say survival of the fittest, we mean fitness as adapted to an environment. Feature set isn't the only aspect; at this point, the network effects of git are insurmountable without a leap forward in functionality of some kind.
(I think git & hg are just as elegant as one another; to me, the elegance is in the Merkle tree and the conceptual model one needs to operate the history graph.)
What makes it the case that fossil cannot be a competitor to git (or hg), in that they are both a vcs?
edit I haven't had a lot of sleep. What I'm trying to ask, I suppose, is why can't you use fossil just like git and ignore any all-in-one features it provides? (This is not to comment on how good, scalable, fast, correct, or robust it is.)
This is why Subversion was "slow": your local working speed was gated by the speed of the central repo, which could be slow if it was under-powered or overloaded, as was common with the free Subversion hosts of the day. At least with Git, you can batch your local changes and push them all at some more convenient time, such as when you were going off for a break anyway.
I work with a group of people who all know enough git that we're productive, and a few of us know enough git to solve complicated problems.
I've not seriously considered fossil or mercurial -- what are the top three tangible benefits I'd get from them getting our team to switch?
The main advantage Mercurial has over git is a command line syntax that makes consistent sense. The operations you want to do are easy and as you try and do more complicated things, the new commands will be unsurprising and predictable. If you already know how to use git then this advantage is (mostly) irrelevant.
There are some other features that are interesting - Mercurial has a couple of different types of branches. Bookmarks are like git branches, whereas named branches are a completely different concept which can be useful. 'Phases' tracks whether commits have been shared, and prevents you rewriting (rebasing) them when appropriate.
If you do experiment, note that many 'power user' features are turned off by default. There is a robust extension system, and the default Mercurial installation includes a load of standard ones. My config file includes the following to turn on some useful stuff ('record' is the most useful for a staging-area-like facility):
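The actual config contents didn't survive in this comment; purely as an illustration (this is a guess at a typical set, not the author's actual file), an `~/.hgrc` enabling some commonly suggested bundled extensions might look like:

```ini
# Illustrative only -- all of these ship with Mercurial but are off by default.
[extensions]
record =      ; interactive, hunk-by-hunk commits (staging-area-like)
shelve =      ; like git stash
histedit =    ; like git rebase -i
rebase =
purge =       ; like git clean
```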
But stuff like: "hg log" gives you _every commit in the repo_?? When is that ever useful? How do I get only the commits that lead to the current state of the repo? Mercurial doesn't have branches; instead you're supposed to _copy the whole directory_ at the filesystem level?? Of course this is ridiculous, so they invented "bookmarks" which are actually Git branches. The extensions thing you mention is also a ridiculous chore. Just have sane defaults. I also found hg's output very dense and hard to understand and read, poorly suited for human consumption.
I dunno. I'm sure Mercurial is fine, many people use it every day, and likely my strong Git bias was affecting my ability to learn Mercurial. But I found it far easier to just clone into Git, use Git to do source control, and then export back to Mercurial when I'm ready to share my work.
The 'original' branching method for Mercurial is called Named Branches. The big difference from Git is that every commit is labelled with the branch it is on. This has advantages - if you imagine looking at the train track of 'master' in git with its divergence for a few commits and then a merge, you can see that the 3 commits were on a branch called 'performance', whereas with git that history is completely lost. See: https://www.mercurial-scm.org/wiki/NamedBranches
As usage of git grew, the git branching model gained popularity and so the Mercurial bookmarks extension was created (https://www.mercurial-scm.org/wiki/Bookmarks).
It can be seen as a downside that there are two branching options that you have to choose between.
Sadly the popular hginit.com seems dead; it was my first introduction to Mercurial. https://web.archive.org/web/20180722012242/http://hginit.com...
I mainly use fossil for personal projects.
What's nice about it is that it is not only a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list and user management. The setup is ridiculously easy and everyone always has everything in the repository.
In addition fossil never loses data, unlike git which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing and so on.
And fossil has a sane command-line interface so that everyone in the team is expert enough to work with it. No need for heroes that save the day from git fricking everything up.
That is not nice. That is way more things that might not match me, more attack surface, more irrelevant cruft I'll probably have to look up how to disable. Project management, wiki and issue tracking preferences are very personal and often don't map particularly well to specific repositories. And _blog_ and _mailing list_? Why, you're spending time on stuff most of your users will hate, not because it's bad, but because they either don't need it or would like it different.
> In addition fossil never loses data, unlike git which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing and so on.
Which is why Git is successful. That's by design, not accident. We want to, and sometimes _have to_, delete stuff.
That seems like feature creep.
For example, if I have a check-in comment "Fixes [abcd1234]" I get an automatic link from that check-in comment to ticket abcd1234 from the web UI's timeline view. If I then close that ticket, the comment in the timeline view is rendered in strikethrough text, so I don't have to visit the ticket to see that it's closed.
Similarly, a built-in forum means the project's developers can discuss things with easy internal reference to wiki articles, tickets, checkins...
A recent feature added to Fossil is the ability to have a wiki article bound to a particular check-in or branch, so that whenever someone views that artifact in the web UI, they get a link to the ongoing discussion about it. This is useful when you have more to say about the check-in or branch than can reasonably fit into a comment box. This solves a common problem with experimental features, where you want to discuss it and evolve the idea before it's merged back into the parent branch.
Fossil's user management features are also highly helpful.
These features are as seductive as what Github, GitLab, BitBucket, etc. add to Git, but whereas those are all proprietary services with some of the roach hotel nature to them, with Fossil, you clone the repo and now you've got all of that locally, too. If the central repo goes down, you can stand your local clone up as a near-complete replacement for it.
It's not 100% because Fossil purposely doesn't clone a few things like the user table, for security reasons. You can build a new user table from the author names on the check-ins, though.
I have created a lot of feature branches that contain useless commits which I then later corrected with a simple git merge --squash. Preserving those commits sounds like a drag.
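For anyone who hasn't used it, the squash flow described here looks roughly like this (throwaway repo; branch and message names are invented):

```shell
# Collapse a messy feature branch into one clean commit with merge --squash.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
echo 1 > f && git add f && git commit -qm base
git checkout -qb feature
echo 2 >> f && git commit -aqm "wip"
echo 3 >> f && git commit -aqm "fix typo"
git checkout -q -                  # back to the original branch
git merge --squash feature         # stage the combined diff; no merge commit
git commit -qm "feature: one clean commit"
git log --oneline                  # two commits: base + the squashed feature
```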
We cover this and more in the Fossil project document "Rebase Considered Harmful": https://fossil-scm.org/fossil/doc/trunk/www/rebaseharm.md
If you're like me, you'll find yourself increasingly wondering, "Why would I put up with Git any time I'm not forced to by some outside concern?"
I love git, and don't know most other post-SVN version control systems, but I do recognise the complaints people have about git. There's clearly still room for improvement.
To use git you need to know clone, pull, commit, push; for larger projects, branch and merge. Those fall into a lot of boxes that say "easy" or "elegant," and I really wouldn't hesitate to recommend git to a lot of projects, big or small, discounting specific needs. But I guess you've got some specific concerns that really don't translate well into simple statements.
I've used mercurial only to get some external requirements or tools going, and never used fossil. Could you elaborate a bit on why git is worse than either of them and why I should consider switching?
What if you committed to the wrong branch? What if you tried to merge commits from another user and made a mess of it all? What if you pushed something you want to roll back? What if you committed with the wrong commit message and want to fix it? What if you followed the policy of "commit often" but ended up with lots of irrelevant commits, and want to fix this so that it only has meaningful commits. How can you find who committed what? Or which branches contain a commit?
I know how to do all of this. But these are genuine questions a user of git will need to get answered, and git quickly becomes confusing/inconsistent once you're off the "happy path".
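Some hedged, non-exhaustive answers to the questions above, in command form (throwaway repo; branch and message names are invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
git commit -q --allow-empty -m "oops, wrong message"

# Wrong commit message? Amend it (before pushing):
git commit -q --amend --allow-empty -m "better message"

# Committed to the wrong branch? Park it on a new branch, then rewind:
git branch feature                # the commit now also lives on 'feature'
# git reset --hard origin/main    # ...and the wrong branch could be rewound (needs a remote)

# Made a mess mid-merge? Bail out with: git merge --abort

# Who committed what?
git log --format='%h %an %s'

# Which branches contain a commit?
git branch --contains HEAD        # prints the current branch and 'feature'
```

None of this is hard once you know it; the point above stands that discovering these incantations is where the confusion lives.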
One of the major day to day annoyances is the fact that (by default) I can't work on multiple branches without committing or stashing all the time, since switching branches, instead of being a simple 'cd' as in other VCSs, is a destructive operation on my entire repo (also causing re-indexing for all content-aware tools...). And if I want the normal behavior? I need to learn some other commands to set that up, and learn their filesystem restrictions...
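The "other commands" alluded to are presumably git-worktree, which gives each branch its own directory so switching really is just a `cd`. A minimal sketch (throwaway repo; paths are made up):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
git commit -q --allow-empty -m base
git branch topic
git worktree add ../topic-wt topic   # a second, independent checkout of 'topic'
git worktree list                    # two working trees, one repository
cd ../topic-wt                       # "switching branches" is now just cd
```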
This implies that the parent's sentiment about git is negative.
At least I personally feel very positively about git and am not missing much from it.
No, I meant defeatist as in "I have given up finding something better as it will never exist".
Software development has seen massive improvements in the past 20 years. I see no reason why that would stop now.
The answer to "Why doesn't my music sound as good as I wanted?" isn't going to be "CD's 44.1kHz and 16-bit PCM isn't enough". It might be "This cable has been chewed by a dog" or "These speakers you got with a cheap MIDI system in your student dorm are garbage" or even "the earbuds you're wearing don't fit properly" but it won't be the 44.1kHz 16-bit PCM.
Likewise, it is plausible that Git is done technology-wise. That doesn't mean there won't be refinements to how it's used, recommended branching strategies, auto-complete, or even some low-level stuff like fixing the hash algorithm - but the core technology is done.
Yeah, try telling this to a fan of 1960s or 1970s rock. You'll get an earful about rich guitars and fat synths, which only a 100% analog, tube-amp process from studio to ear is capable of replicating.
And anyone with a basic understanding of electronics should laugh in the faces of these people. The idea that a signal can be carried on a wire or recorded on tape but can not be replicated digitally is absolute nonsense.
If someone wants to claim that their preferred format captures higher frequencies than 44.1kHz sampling allows for, that's at least plausible, but that can be solved by using higher sampling rates like 96 or 192 kHz. At that point you've exceeded the capabilities of all mainstream analog storage media.
If they are looking for specific effects created when pushing the limits of analog hardware, like the "crunch" of a tube amp, that's fine too, but they need to acknowledge that they're treating the amp as an instrument in that case and its output can still be recorded digitally just fine.
In the same way, learning is not a transient state either. You will always have to relearn. Those impossible barriers that you eventually got through will reduce to speedbumps - but they will always be there, slowing you down. And if you don't use it enough, you'll have to relearn.
Also, be aware that once you've climbed a learning curve, at least unconsciously you are no longer incentivized to simplify it for those who come after. Why reduce the barrier-to-entry for others, after all? You got through it, so why can't they? And this is why generations of kids learn bad music theory, and generations of physicists learn bad particle names. It's important to be aware of this effect so you can counter-act it.
The only reason I'd disagree is if the next source code control system were somehow as much an improvement over git as git was over its predecessors.
> if it can save me the hassle of learning a new one.
Not to mention the hassle of converting all those legacy repositories, converting CI/CD, etc.
Yes it’s slow for large projects but honestly I just deal with that.
Agreed. Technology should converge on a best solution so we can stop chasing things and get work done. Stable open source standard solutions are what we need more of.
Some anti-pattern examples are C++ and Vulcan.
Are you referring to the Vulkan API? If so, why do you see it as an anti-pattern example?
I like the API and I think it is a great and necessary improvement over OpenGL. I actually hope to see Vulkan become the ‘stable open source standard solution’ for graphics.
Thus editor wars, language wars...
Alternative: a tool like Fossil where the CLI is sensible from jump so the whole team doesn't replace it with something better, uniquely per team member.
Or you could just use a tool where the interface is fine out of the box
From where I sit, Git has a couple obvious flaws, and I expect its successor will be the one that fixes one of them. The most obvious (and probably easiest) is the monorepo/polyrepo dichotomy.
Personally I don't see git's problems with large binary files and tens of millions of commits as being major issues. Those two alone are way less valuable than git's ecosystem and mindshare.
Envy does versioning of classes and methods, and you can programmatically access the model.
It allowed us to build tools around the VCS. For example, we had tools to merge multiple feature branches and resolve conflicts automatically.
We also used the same tools to produce migration scripts for our database (GemStone).
That was 18 years ago, and today it sounds unreal.
You can build tools on top of git, but the versioning “unit” gets in the way. (e.g imagine the possibility to encode refactorings in your change history and reapply or rollback them).
I’m not trying to criticize git. I think it is the best file based DVCS. My point is that many dev tools that we use today are extremely rudimentary, because of the lack of good abstractions. And I don’t think that git provides a good model to build those abstractions on top of it.
Arbitrary diff/merge in Git is a great example of the Turing Tar-Pit. It's possible, but prohibitively inefficient for many things I want to do. You can't add your own types, index, or query optimizations.
Today, if I want to store data for my application, I have a choice between good support for rich object types and connections (e.g., Postgres), or good support for history and merging (e.g., Git). There's no one system that provides both.
I like the way you put this. In case anyone's interested in brainstorming I'm dabbling in this problem with a thing called TreeBase (https://jtree.treenotation.org/treeBase/). It's still a toy at this point, but it stores richly typed data as plain text files to leverage git for history and merging and then can use SQLite (or others) for querying and analysis. A very simple database in the wild looks like this: https://github.com/treenotation/jtree/tree/master/treeBase/p...
The slowness of destroying a whole database and then recreating it when checking out should be something you can handle by relying on the diff to generate a series of delete commands and a series of insert commands.
But yeah, I guess committing will be slow if you have a lot of data to export. For the time being, it's a trade off to be made.
[I might consider testing this with my current database project. But I'm using SQLite so I guess that implies a lot less data than Postgres.]
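A minimal sketch of the dump-and-diff idea with SQLite (file and table names are made up; requires the `sqlite3` CLI): version the database as its SQL text dump, and let git's diff show data changes line by line.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
sqlite3 app.db "CREATE TABLE t(id INTEGER, name TEXT); INSERT INTO t VALUES (1,'a');"
sqlite3 app.db .dump > app.sql && git add app.sql && git commit -qm "initial dump"
sqlite3 app.db "INSERT INTO t VALUES (2,'b');"
sqlite3 app.db .dump > app.sql
git diff -- app.sql        # the change shows up as a single INSERT line
```

Replaying such a diff as DELETE/INSERT statements on checkout is the part that would still need custom tooling.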
I could see myself agreeing to that.
Check out git-diff(1) and --diff-algorithm. --anchored is the one I find the neatest.
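For the curious, a quick way to see those flags in action (throwaway repo; the file content is contrived):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
printf 'a\nb\nc\nd\n' > f && git add f && git commit -qm one
printf 'a\nc\nd\nb\nc\nd\n' > f
git diff --diff-algorithm=histogram -- f
git diff --anchored=b -- f   # prefer a diff that keeps the anchored line in place
```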
A modern but still centralized VCS like Subversion or Perforce is what you get if you first add networking (CVS) and then atomic commits. Without atomic commits you are pretty much forced to keep a centralized server, and Subversion didn't try to change the server model after adding atomic commits.
DVCS instead is what you get if you start with local revision tracking like RCS, and add atomic commits before networking. Now the network protocol can work at the commit level and is much more amenable to distributed development.
Imagine instead if that were available as a sort of materialized view.
I don’t understand what you mean by this, can you provide some more detail?
Saying just parse it every time is denying that there are very real costs associated with that decision.
Otherwise it won't just be new languages which suffer, but users of supported languages will suffer when there's an upgrade.
Most language ASTs also don't encode useful-to-the-programmer but useless-to-the-compiler information like comments and whitespace. There's been good progress in that (the Roslyn AST system has some neat features), but in practice an AST is always intended more for the compiler than the user/source writer. This also is reflected often in speed, a lot of languages have a relatively slow AST generation (which would add sometimes very noticeable wall clock time to commit time, depending of course on language and hardware).
Plus, of course, all the usual bits that ASTs are extremely varied among themselves (some are weirder DAG shapes than trees, for instance).
An experiment I ran was to use the "next step down" from a full AST, which is your basic tokenizer / syntax highlighter. Those are designed to deal well with malformed/unfinished/work-in-progress input, and to do it very quickly. Years back I built a simple example diff tool that can do token-based diffs for any language Python's commonly used syntax highlighter Pygments supports. In my experiments it created some really nice character-based diffs that seemed "smart" like you might want from an AST-like approach, just by doing the dumb thing of aligning diff changes to syntax-highlighting token boundaries.
You could even use it/something like it/something based on it today as your diff tool in git if you wanted, with the hardest part configuring it for which language to use for what file. (I never did do that though, partly because the DVCS I experimented with this for didn't have a pluggable diff system like git does, nor did it support character based unidiff as a storage format which the experiment was partly to prove both ideas could be useful.)
If that was then extended into the version control system that'd be even better. Oh yes.
But getting a new language into these things would probably be a nightmare.
The monorepo/polyrepo discussion exists apart from your choice of version control system and has little to do with Git, as far as I can tell.
Large files and long histories hinder its total dominance in the game and art industries. Because of git's shortcomings polyrepo is a near necessity not simply a stylistic choice. LFS is a bolt on solution that could/should have better support.
I'm intrigued by this claim. I've come to the opposite conclusion - that monorepo is near necessity with git because there's no tools for branching/rebasing multiple repos at once.
Which one you should use depends on which downsides are less impactful for your use case.
I would use it for storing and syncing libraries of large images for my photography, if it were feasible.
As an optimization, when it packs several objects together in a pack file, it can store objects as a delta to other (possibly unrelated) objects; there's a whole set of heuristics used to choose which objects to delta against, like having the same file name. And yes, one of these heuristics does have an effect similar to "reverse chronological order"; see https://github.com/git/git/blob/master/Documentation/technic... for the details.
Like GitLab's own web IDE, which seems to ignore LFS rules.
It would also be nice to have a repo that isn't language-agnostic. It's too easy to track non-semantic changes, like white space.
You are in the minority. It's such a ubiquitous experience, a running joke in the industry. Saying a tool is useful and powerful is fine and good. That sentiment has nothing to do with usability.
After I had to unfuck a repository for the n-th time, I trialed a switch to Mercurial, we switched shortly after. I can count on one hand how many times I've had to intervene in the last few years.
In my opinion this is a problem with programming languages rather than version control. Namely we mix presentation and representation when using text as our source code. In the case of whitespace we have an infinite number of syntactic presentations which all correspond to the same semantic representation. Tooling has been created to try to deal with this such as code formatters which canonicalize the syntactic presentation for other tools. Git itself even has to deal with this because of platform differences, i.e. LF and CRLF.
I loathe this about git. It has caused my team members so many problems, like images marked as changed.
git should either be dumb about content or smart, not secretly in between.
And they do occasionally put some new stuff in that helps. Like the recent version which adds new commands to split out the two completely different uses for `git checkout` (making/switching branches and reverting files).
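The split in question is `git switch` (branches) and `git restore` (files), sketched here in a throwaway repo (requires git >= 2.23; names are invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
git commit -q --allow-empty -m base
git switch -q -c topic            # was: git checkout -b topic
echo change > file.txt && git add file.txt
git restore --staged file.txt     # unstage; was one of checkout/reset's jobs
git status --short                # file.txt is back to untracked
```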
Subversion did this better even before Git existed.
Git is completely oblivious to moving code from one file to another. Git blame will never show you the original commit if you just relocated a method to another file. Due to this, refactoring often puts additional hurdles in the way of exploring the code history.
"Need a centralized server" wasn't specific to one (pre-DVCS) system, either.
i would love to be able to use a patch-based VC.
unfortunately there are no patch-theory based VCS’s with a practical level of usability. what git was to monotone, X is to darcs/pijul, where X hasn’t been created yet.
If the conflict is huge, rebasing can help you by "playing" the commits from one branch one at a time so the conflicts are smaller / easier to fix.
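A sketch of that behavior: rebase stops at the first conflicting commit, so you resolve one small conflict at a time rather than the whole divergence at once (throwaway repo; names are invented):

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email a@b && git config user.name a
echo base > f && git add f && git commit -qm base
git checkout -qb topic
echo topic-1 > f && git commit -aqm t1
echo topic-2 > f && git commit -aqm t2
git checkout -q -
echo main-1 > f && git commit -aqm m1
git checkout -q topic
git rebase '@{-1}' || true   # stops at t1's conflict; t2 waits its turn
git status --short           # shows a single conflicted file (UU f)
# fix f, then: git add f && git rebase --continue
```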
At a previous job, a team was forced to use checkout-style VCS due to their manager's unfounded fear of merge conflicts; I couldn't go in that office without hearing one developer shout to another: "Hey, can you finish up and check in that file so I can get started on my changes?"
I'd rather not lock files.
I don't think I need to sell the value of DVCS over VCS, but what seems to get lost is that buys you a certain amount of essential complexity, expressed in the CAP theorem and its consequences.
We discussed this deeply on the Fossil forum: https://www.fossil-scm.org/forum/forumpost/2afc32b1ab
We came to no easy answers, because there aren't any. You only get a choice of which problems to accept.
The alternate method is that locking a file marks you as an interested party to a merge, allowing you to review the correctness of a merge.
This would be purely to avoid changes being lost during merges.
This consideration is actually irrelevant to locking non-mergeable binary files. It doesn't matter what branch we're on or where the file is located, only that you and I both want to edit the logo. Eventually, either your version must be based on mine, or mine based on yours, since they will be merged.
So it's probably better not to have that file in Git, since it doesn't support the workflow around which Git is based.
It's actually right to store your design documents in Google Docs or a wiki and your code in Git, rather than everything in Git.
It is easy to have one filestore to rule them all and in the darkness bind them, but if you want to do different things with them, you have to do different things with them. I'm not sure that it's possible to unify text file and binary doc based workflows, but it seems we don't have to worry because users automatically use the best tool for the job and it's only hackers who tie themselves in knots trying to make git do everything.
The successor might very well be git 3.0, though.
Isn't that being addressed in Git with the partial clone functionality?
> The responsibility of not mixing super and sub-project code in commits lies with you.
Is a lie. The responsibility of not mixing code lies with every member of the team. Does the author work alone?
We have 600 devs and face these problems. I can assure you we sure as hell don't have the resources spare to reroll git. We're way too busy rerolling everything else.