Git's biggest flaw is that it doesn't scale. If a new system can fix that without sacrificing any of Git's benefits, I think it can topple Git.
It's ironic that Git was popularized in the same era as monorepos, yet Git is a poor fit for monorepos. There have been some attempts to work around this. Google's `repo` command is a wrapper around Git that treats a set of smaller repos like one big one, but it's a (very) leaky abstraction. Microsoft's GVFS is a promising attempt to truly scale Git to giant repos, but it's developed as an add-on rather than a core part of Git, and so far it only works on Windows (with macOS support in development). GVFS arguably has the potential to become a ubiquitous part of the Git experience someday... but it probably won't.
Git also has trouble with large files. The situation is better these days, as most people have seemingly standardized on git-lfs (over its older competitor git-annex), and it works pretty well. Nevertheless, it feels like a hack that "large" files have to be managed using a completely different system from normal files, one which (again) is not a core part of Git.
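For context, the git-lfs workflow boils down to marking path patterns as LFS-managed and then committing as usual; a minimal sketch (the patterns and file paths here are just examples):

```
# one-time setup per machine
git lfs install

# tell LFS which paths it should manage (recorded in .gitattributes)
git lfs track "*.psd" "*.wav"
git add .gitattributes

# matching files are now stored as small pointer files in the repo,
# with the real content uploaded to the LFS server on push
git add textures/hero.psd
git commit -m "Add hero texture"
```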
There exist version control systems that do scale well to large repos and large files, but all the ones I've heard of have other disadvantages compared to Git. For example, they're not decentralized, or they're not as lightning-fast as Git is in smaller repos, or they're harder to use. That's why I think there's room for a future competitor!
(Fossil is not that competitor. From what I've heard, it neither scales well nor matches Git in performance for small repos, unfortunately.)
I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).
Git's flaws are primarily in usability/UX. But I think for its purpose, functionality is far more important than a perfect UX. I'm perfectly happy knowing I might have to Google how to do something in Git as long as I can feel confident that Git will have the power to do whatever it is I'm trying to do. A competitor would need to do what git does as well as git does it, with a UX that is not just marginally better but categorically better, to unseat git. (Marginally better isn't strong enough to overcome incumbent use cases)
And for the record: I think git-lfs issues are primarily usability issues, and tech improvements. The tech enhancements will be solved if there's enough desire, and as I mentioned the usability problems are more annoyances than actual problems.
A major limitation of git is how it deals with many "big" (~10 MB) binary files (3D models, textures, sounds, etc.).
We ended up developing our own layer over git, and we're very happy with it; even git-lfs can't provide similar benefits. This approach seems to be commonplace among game studios (e.g. Naughty Dog, Bungie), so git certainly has room for improvement here.
This does not surprise me. Git's original purpose of managing versions of a tree of text files (i.e. the source code of the Linux kernel) pervasively influences it, and I wouldn't expect it to be any good for working with binary files or large files.
If somebody comes up with something that matches Git's strengths and also handles binaries and biggies much, much better then they could definitely topple Git with it. It'd take time for the word to spread, the tools to mature and the hosting to appear, but I can definitely see it happening.
I think most people know that Git isn't perfect, but it's also the case that coming up with anything better is an extremely difficult task. If it wasn't, someone would have already done it. It's not like people haven't been trying.
Do you have tools that let you work with diffs of your binary file changes? Or does a change simply replace all the bytes?
I'd argue that if it's the latter, git was never the right choice to begin with. You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?
So I don't know if this is a "major limitation" of git per se. Not saying there's a better solution off the shelf (you're obviously happy with your homegrown one). But this was probably never a realistic use for git in the first place.
While I can't speak for the person you're replying to, the technology at least exists. Binary diffs are sometimes used to distribute game updates, where you're saving on bandwidth for thousands if not millions of players - which costs enough $$$ to actually be worth optimizing for. On the other hand, between simpler designs and content encryption sometimes being at odds with content compression, just sending the full 10MB is also common. For a VCS, I'd probably be happy enough with plain storage compression - running any of the standard tools over the combination.
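For illustration, the standalone tooling is ordinary enough; a rough sketch with xdelta3 (bsdiff/bspatch work similarly), with the file names made up:

```
# produce a compact delta between two revisions of a large binary
xdelta3 -e -s texture_v1.dat texture_v2.dat texture_v1_to_v2.vcdiff

# later, rebuild the new revision from the old one plus the delta
xdelta3 -d -s texture_v1.dat texture_v1_to_v2.vcdiff texture_v2_rebuilt.dat
```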
> You don't really want to record a full 10MB of data every time you change one pixel in your texture or one blip in your sound, right?
Actual changes to content in a gamedev studio are very unlikely to be as small as a single pixel. Changes to source code are unlikely to be as small as a single character either. And we definitely want a record of that 10MB.
We're willing to sacrifice some of our CI build history. Maybe only keeping ~weekly archives, or milestone/QAed builds after a while, of dozens or hundreds of GB - and maybe getting rid of some of the really old ones eventually. Having an exact binary copy of a build a bug was reported against can be incredibly useful.
> Sure, immutable build artifacts can be invaluable -- but aren't they also an orthogonal concern?
One person's immutable build artifact is another person's vendored build input.
It's common to vendor third party libraries by uploading their immutable build artifacts (.dll, .so, .a, .lib, etc.) into your VCS, handling distribution, and keeping track of which versions were used for any given build. It makes a lot of sense if those third party libraries are slow to build, rarely modified, and/or closed source - no sense wasting dev time forcing them to rebuild it all from scratch.
The next logical step is to have a build server auto-upload said immutable build artifacts into your VCS, for those third party libraries that you do have source code for, when your VCS copy of said source is modified. Much more secure and reproducible than having random devs do it.
And hey, if your build servers are already uploading build artifacts to VCS for third party libraries, why not do so for your own first party build artifacts too? Tools devs spending most of their time in C# probably don't need to spend hours rebuilding the accompanying C++ engine it interoperates with from scratch, for example, so why not "vendor" the engine to improve their iteration times?
This can lead to dozens of gigs of mostly identical immutable build artifacts reuploaded into your VCS several times per day, with QA testing and then integrating those build artifacts into other branches on top of that. The occasional 10MB png is no longer noticeable by comparison.
I can sympathize with the game assets argument, but this problem is just the result of trying to stuff a square peg into a round hole.
Build artifact caching is a different problem from source control, with very different requirements:
1. As you mentioned, the artifacts tend to get huge.
2. The cache needs to be easy to bypass. From your example, it needs to be easy for the C++ engine devs to do builds like "the game but with the new engine" to test out their changes.
3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.
4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).
Git either doesn't care about or fails spectacularly for each of those points. In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.
Nix[0] solves #2 and #3 by caching build artifacts (both locally and remotely[1][2][3]) based on code hashes and a dependency DAG (for each subproject or build artifact, so changing subproject X won't trigger a rebuild of unrelated subproject Y, but will rebuild Z that depends on X). It helps with #4 by performing all builds in an isolated sandbox.
#1 is solved by evicting old artifacts, which is safe as long as you trust #4. If the old artifact is needed again then it will be rebuilt for you transparently. Currently this is done by evicting the oldest artifacts first, but it could be an interesting project to add a cost/benefit bias here (how long did it take to build this artifact, vs the amount of space it consumes?).
Assets and code have mostly the same needs out of a version control system - diffs, history, control over versions, etc. - and there are version control systems which handle both adequately. That said, I'll grant git is quite focused on code version control specifically - and I would not dream of trying to scale assets into it directly.
> 1. As you mentioned, the artifacts tend to get huge.
This, admittedly, is more common with build artifacts. That said, I've hit quota limits with autogenerated binding code on crates.io: several hundred megs of code that still weighs in at double-digit megabytes after cargo compresses it, and cargo compresses it better than I can figure out how to with 7-zip.
And that's a small single person hobby project, not a google monorepository.
> 2. The cache needs to be easy to bypass
I need to bypass locally vendored source code frequently as well, to test upstream patches etc.
> 3. The cache needs to be precise, so you don't end up with mystery errors once it finally does trigger, or people wondering why their changes don't seem to apply.
Also entirely true of source code.
> 4. The builds need to be exactly reproducible, so you don't end up with some critical package that only Steve Who Left 5 Years Ago could build (or Jenkins Node 3 That Just Suffered A Critical HDD Failure).
Enshrining built libs in VCS is an alternative way of tackling the problem. You might not be able to reproduce that exact build bit-for-bit thanks to who knows what minor compiler updates have been forced upon you, but at least you'll have the immutable original to reproduce bugs against.
> In particular, #3 will be very confusing since there will be a delay between the code push and the related build push.
It's already extremely common - in the name of build stability, including with git - to protect a branch from direct push, and have CI generate and delay committing a merge until it's verified the build goes green. By wonderful coincidence, this is also well after CI has finished building those artifacts - in fact, it's been running tests against those artifacts - so it can atomically commit the source merge + binaries of said source merge all at once. No delay between the two.
There are some caveats - gathering the binaries can be a pain for some CI systems, or perhaps your build farm is underfunded and can only reasonably build a subset of your build matrix before merging. Or perhaps the person setting it up didn't think it through and has set things up such that code reaches a branch that uses VCS libs before the built libs reach the same spot in VCS - I'll admit I've experienced that, and it's horrible.
Nix, Incredibuild, etc. are wonderful alternatives to tackle the problem from a different angle though.
To be fair, I mostly don't fault git for failing to optimize that far, even if there are alternatives that do. That's far enough outside the core use case for myself and those I know that I'd be willing to sacrifice it for other, more important considerations.
But I'm totally willing to fault git for failing to optimize enough to handle the manual commit cadence of source game assets though. Because that's not just a tertiary use case - frequently for coworkers it's their primary use case. The end result is I mostly only use git for personal hobby stuff, where it's a secondary use case and my assets are sufficiently small as to not cause problems.
I kind of phrased that poorly. I should have added the context of "in git". Saving a new 10MB file every time you change it, as per my original premise, is not something that git was really designed for. It's asking a screwdriver to do the work of a hammer.
I totally get the use case of saving each iteration of that 10MB file _somewhere_. But expecting git to do that job is not the right level of expectation, was my main point.
When I have worked with binaries like those described, I place a URI reference to a file hash in the source and have something that knows how to resolve it: a file store (think S3 or whatever) with files named texture1.dat-[sha1]. In other words, a "poor man's" version control by way of file naming conventions. Does this approach work in your world?
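Concretely, what I mean is roughly the following (the bucket and file names are only illustrative):

```
# upload the asset under a content-addressed name...
sha=$(sha1sum texture1.dat | cut -d' ' -f1)
aws s3 cp texture1.dat "s3://game-assets/texture1.dat-$sha"

# ...and version only the small reference alongside the source
echo "s3://game-assets/texture1.dat-$sha" > assets/texture1.dat.ref
git add assets/texture1.dat.ref
```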
Aren’t game studios and other creative studios meant to use “asset management” systems instead for their large binaries?
Diffing a PSD as a binary is impossible - whereas proper asset management tools will deconstruct the PSD’s format to make for a human-readable diff (e.g. added/removed layers, properties, etc).
Separate version control for code vs assets leads to a world of pain. Also you can use whatever diff tool you want; doesn't have to be the built-in textual diff.
Yup, I experienced this too (not at a game studio, though, and the team I worked with wasn't nearly experienced enough to write a layer over git). When we switched to a new version of the git GUI it stopped working: clicking through the GUI triggered git operations that were supposed to run fast but didn't on our repo. I filed an issue that quickly got shot down with 'wontfix, your repo is too large and git is not for binary files'.
Has any studio open sourced what they've built? Or turned it in to a product? It seems there could be an opportunity to do something before git solves the problems for that use case.
> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).
I constantly run into git scalability issues as an individual. I don't use any of the UI clients because they all fail hard on mostly-code git repositories. I abandoned my VisualRust port in part because the mere 100MB of mingw binaries involved meant it was using github LFS, which meant CI was hitting github quota limits, and as I wasn't part of the organization - never mind an admin with billing rights - I couldn't even pay out of pocket to raise said quota limits, even if I wanted to.
I'm not going to inflict git's command line experience - which confounds and confuses even seasoned programmers - on any of the less technical artists that might be employed at a typical gamedev shop, even if git might be able to scale acceptably if locally self-hosted at a single-digit employee shop.
A few dozen or hundred employees? Forget it. Use perforce, even though it costs $$$, is far from perfect, and also has plenty of scaling issues eventually.
The whole reason git lfs exists is to work around git scalability problems. Its raison d'être is problems with git.
That one of the most popular tools (if not the most popular) for solving said git scalability problems itself has scalability problems in practice is both ironic and absolutely a problem with the git ecosystem. To be pithy - "Even the workarounds don't work."
"Technically", you might say, "that specific symptom with git lfs, and that service provider, isn't the fault of git the command line tool, nor the git protocol". And you would be technically correct - which is the best kind of correct.
But I don't think we're referring to either of those particularly specific things with "Git" when we ask the article's question of "Is Git Irreplacable?". I'm already the weirdo for using git the command line tool - most of my peers use alternative git UI clients, and I don't mean gitk. The git protocol is routinely eschewed in favor of zips or tarballs over HTTPS, Dropbox, Sneakernet, you name it - and is invisible enough to not be worth complaining about to pretty much every developer who isn't actively working on the backend of a git client or server. Not to mention it's been extended/replaced with incremental improvements over the years already.
So I'm using a slightly broader definition of "git", inclusive of the wider ecosystem, that allows me to credit it for the alternative UI clients that do exist, rather than laughing off the question at face value - as something that has already been replaced.
Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.
Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data. You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.
None of this is a problem with git, be it GUI git clients or command line ones.
This isn’t just "technically correct". It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.
> Nothing about your problems had anything to do with git & everything to do with the commercial service you were using for your source code hosting.
All the commercial service providers recommend keeping total repository sizes <1GB or so, and I hear nothing but performance complaints and how much they miss perforce from those who foolishly exceed those limits, even when self hosting on solid hardware - which is 100% the fault, or at least limitation, of git - I believe you'll agree.
LFS is the alternative suggested by several commercial service providers, not just one, and seems to be one of the least horrible options with git. You're certainly not suggesting any better alternatives, and I really wish you would, because I would love for them to exist. Using LFS means a second auth system on top of my regular git credentials, recentralization that defeats most of the point of using a DVCS in the first place, and a second set of parallel commands to learn, use, and remember. I got tired enough of explaining to others why you have a broken checkout when you clone an LFS repository before installing the LFS extension that I wrote a FAQ entry somewhere that I could link people to. If you don't think these are problems with "git", we must simply agree to disagree, for there will be no reconciling of viewpoints.
When I first hit the quota limits, I tried to set up caching. Failing that, I tried setting up a second LFS server and having CI pull blobs from that first when pulling simple incremental commits not touching said blobs. Details escape me this long after the fact - I might've tried to redirect LFS queries to gitlab? After a couple hours of failing to get anywhere with either, despite combing through the docs and trying things that looked like they should've worked, I tried to pay github more money - on top of my existing monthly subscription - as an ugly business-level kludge to solve a technical issue of using more bandwidth than should really have been necessary. When that too failed... now you want to pin the whole problem on github? I must disagree. We can't pin it on the CI provider either - I had trouble convincing git to use an alternative LFS server for blobs when fetching upstream, even when testing locally.
I've tried gitlab. I've got a bitbucket account and plenty of tales of people trying to scale git on that. I've even got some Microsoft hosted git repositories somewhere. None of them magically scale well. In fact, so far in my experience, github has scaled the least poorly.
> Github the company is not interested in providing you (or anyone else) with free storage for arbitrary data.
I pay github, and tried to pay github more, and still had trouble. Dispense with this "free storage" strawman.
> You were unable to pay for the storage options they do provide because you did not have admin rights to the github account you wanted to work with.
To be clear - I was also unable to pay to increase LFS storage on my fork, because fork usage still counted against the original repository's quota. Is this specific workaround for a workaround for a workaround failing github's fault? Yes. When git and git lfs both failed to solve the problem, github also failed to solve the problem. Don't overgeneralize the one anecdote of a failed github-specific solution, out of a whole list of git problems, into the whole problem being github's fault.
> None of this is a problem with git, be it GUI git clients or command line ones.
My git gui complaints are a separate issue, which I apparently shouldn't merely summarize for this discussion.
Clone https://github.com/rust-lang/rust and run your git GUI client of choice on it. git and gitk (ugly, buggy, and featureless though it may be) handle it OK. Source Tree hangs/pauses frequently enough that I uninstalled it, but not so frequently as to be completely unusable. I think I tried a half dozen other git UI clients, and they all repeatedly hung or showed progress bars for minutes at a time, without ever settling down, when doing basic local work involving local branches and local commits - not interacting with a remote. Presumably due to insufficient lazy evaluation or insufficient caching. These problems were not unique to that repository either, and occurred on decent machines with an SSD for both the git UI install and the clone. These performance problems are 100% on those git GUI clients. Right?
> This isn’t just "technically correct".
Then please share how to simply scale git in practice. Answers that include spending money are welcome. I haven't figured it out, and neither has anyone I know. You can awkwardly half-ass it by making a mess with git lfs. Or git annex. Or maybe the third party git lfs dropbox or git bittorrent stuff, if you're willing to install more unverified unreviewed never upstreamed random executables off the internet to maybe solve your problems. I remember using bittorrent over a decade ago for gigs/day of bandwidth, back when I had much less of it to spare.
> It’s the "a commercial company doesn’t have to provide you with a service if they don’t want to" kind of correct.
If it were one company not providing a specific commercial offering to solve a problem you'd have a point. No companies offering to solve my problem for git to my satisfaction, despite a few offering it for perforce, is what I'd call a git ecosystem problem.
I'm conflating at most one github specific issue (singular), not "issues". And I'm doing so because it's at best a subproblem of a subproblem of a subproblem.
If my computer caught fire and exploded due to poor electrical design, you wouldn't say "nothing about your problems had anything to do with your computer and everything to do with the specific company that provided your pencils" when, in my growing list of frustrations, I offhandedly mentioned breaking a pencil tip after resorting to one, what with the whole computer being unavailable and all. That would be weird.
Even if we did hyper focus on that pencil - pretty much every pencil manufacturer is giving me roughly the same product, and the fundamental problem of "pencils break if you grip them too hard" isn't company specific. It's more of a general problem with pencils.
Github gave me a hard quota error. Maybe Gitlab would just 500 on me, or soft throttle me to heck to the point where CI times out. Maybe Bitbucket's anti-abuse measures would have taken action and I'd have been required to contact customer support to explain and apologize to get unbanned. git lfs's fundamental problem of being difficult to configure to scale via caching or distribute via mirroring isn't company specific. It's more of a general problem with git lfs. Caching and mirroring are strategies nearly as old as the internet for distribution - git lfs should be better about using them.
It would've turned github's hard quota error into a non-event, non-issue, non-problem - just like they are with core git. Alternatively, core git should be better about scaling. Or, as a distant third alternative, I could suggest a business solution to a technical problem - GitHub should be better about letting me pay them to waste their bandwidth. Then I could work around git's poor scaling for a little bit longer.
Not wanting to provide you with free storage is not a "scalability problem". I can't spend company money on Perforce either; is that a Perforce problem?
I pay for a github subscription. I set out to pay more for a github quota bump, but found I was limited by upstream's LFS quota rather than my fork's LFS quota.
> Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).
I recollect that for Windows (which also uses git), MS have actually extended git with "Git Virtual File System" rather than replace it[1]. But I do agree that broadly, not everyone needs the scale.
Scaling isn't even just about number of files or size of them. A problem I've hit is just in having cross-repo stuff work well. Monorepos are helpful partly because git submodules are not ideal for good workflows, and splitting stuff across multiple git repos can backfire (it doesn't help that almost all the tooling around CI and the like is repo-based instead of project based).
I would love a layer over Git to handle workflow issues related to multi-repo projects
> I disagree that Git's biggest flaw is its lack of scalability. Cases where git needs to scale tend to be isolated to companies that have the manpower to build a finely-tuned replacement (see: MS, Google).
I would say that the sole thing git was developed for, the Linux Kernel, is (starting to be) painful to work with when using git.
The Linux Kernel is big, but it's not likely as big (in terms of lines of code or pick your metric) as Google or Microsoft repositories. Maybe the kernel is just starting to feel that pain?
Honestly asking.. Do you speak from some level of authority that the Linux kernel is stretching the boundaries of git? Or are you just saying that more speculatively? What is the painful part?
Maybe I am a weirdo, but I have always thought that git's UI is very intuitive (with some exceptions like submodules). SVN, on the other hand, was an unintuitive mess where I had to look up commands all the time.
Agreed - I had been looking around this thread like, "<slow blink> - surely I'm not the only one that finds git to be a rewarding exercise in teamwork?"
The magic sweet spot might be the fact that most projects do not need to be distributed. This is where a lot of the complexity comes from.
So take away all those extra concerns, add a more elegant UI (i.e. rational commands), and possibly something that scales a little better - that's enough mojo to unseat git for a lot of things.
I'd say the number of git repos on Earth that would encounter problems of that nature would be a vanishingly microscopic minority. Sure, it's a problem for those companies but it's not a problem for anyone else.
All of the organizations that have outgrown git will have such incredibly specific requirements meaning nothing but a custom built tool will work for them.
Let's say you started with a well factored set of code that is managed within your organization. What advantage is there to having multiple repos if you're not limited by your tools? Refactoring is easier within a single repo...
In my experience, code doesn't stay well factored unless there are technical hurdles that keep it so. That of course doesn't have to be a repo boundary, but it can be.
There probably will be a plethora of different hard issues to fix in such situations.
It's also easier to institute change in a dictatorship as opposed to a democracy (being a dictator that is :).
This reads to me as a failure of imagination. Any mid size game development shop is going to feel this pain - not just giants like Microsoft and Google. I believe the Unity Game Engine has a user base in the millions? Even a subset of that may be small in comparison to the entire developer population but by no means microscopic.
"Barney Oliver was a good man. He wrote a letter one time to the IEEE. At that time the official shelf space at Bell Labs was so much and the height of the IEEE Proceedings at that time was larger; and since you couldn't change the size of the official shelf space he wrote this letter to the IEEE Publication person saying, since so many IEEE members were at Bell Labs and since the official space was so high the journal size should be changed."
What is the analogy here? The first guess that came to my mind was the monorepo versus multi-repo debate: since Git can only support repos that are so large (shelf space) without getting slow, you should split up your repos (journal size) even if semantically you would prefer a monorepo. But that would support the point I was making, whereas the obliqueness of your reply makes me think you probably meant to criticize it.
I think he's comparing the journal to the tool (harder to change, impacts everyone) and the shelf to the problem that only impacts a few organizations but actually a lot of people because those organizations are so large.
I guess that makes sense. But if that's the analogy, there's a significant difference between the situations. In that example, there was nothing inherently wrong with the journal's size, other than it not matching Bell Labs' arbitrary choice of shelf layout. Git, on the other hand, would be inherently a better tool if it had better performance on large repos (without sacrificing its suitability for small repos).
Mercurial is probably that competitor. Only slightly slower than Git. Works on very large monorepos (as large as Facebook's or Google's monorepo). Very similar workflow as compared to Git, with some minor differences in terminology.
As an FB employee, I use hg regularly (because it is required). I would not use it as a git replacement for non-FB-sized repos. It has some weird design choices (e.g. branching is bad), and it very often requires manual intervention for merges that git performs correctly and automatically.
You can get around branching-is-bad by changing your workflows a bit, but you can't get around the bad merges: over time it's like death by a thousand papercuts.
What is so bad about mercurial branching? The underlying structure is the same as git: a directed acyclic graph, the only real difference is how branches are named.
Mercurial has 3 ways of doing branching:
- bookmarks: these are like git branches, a pointer to a revision
- branches: when you are in a branch, all commits are permanently affixed with that branch name. Less flexible than bookmarks (and therefore git branches) but good for traceability
- heads: unlike with git, a branch name can refer to several actual branches, it usually happens when you are pulling from a central repository, but you can create them yourself if you need some kind of anonymous branching. These can be pushed but it is not recommended.
Git only has the first option.
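Roughly, the three look like this on the command line (the names and revision numbers are just examples):

```
# bookmark: a movable pointer, closest to a git branch
hg bookmark my-feature
hg commit -m "work on the feature"
hg push -B my-feature          # bookmarks are pushed explicitly

# named branch: the branch name is recorded in every commit made on it
hg branch stable-1.0
hg commit -m "fix for 1.0"

# anonymous head: just commit on top of an older revision, no name needed
hg update -r 42
hg commit -m "alternative take"   # creates a second head
```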
The way central repositories are managed is also a bit different, even if the fundamentals are the same. Git has the "origin" namespace to distinguish remote branches from local branches. Mercurial uses a "phase" which can be "public" (already in a remote), "draft" (local only, becomes "public" after a push) or "secret" (like "draft", but will not be pushed and therefore never becomes "public"). So if you are not synchronized with the remote, in git you will have two branches, origin/my_branch and my_branch; in mercurial you will have one branch named my_branch whose unpushed commits are still in the draft phase. That's essentially the same thing, presented differently.
In the end, they are fundamentally the same. The feel is different though. Git is flexible, and gives you plenty of tools to keep things nice and clean when working with large, distributed project. As expected for something designed for the Linux kernel. Mercurial focuses on preserving history, including the history of your mistakes, and I feel it is better suited for managed teams than a loosely connected community.
Specifically I meant that branching is less flexible. Bookmarks are better than Mercurial branches (and at FB it's what we use instead), but even with bookmarks there are gotchas compared to git. For example:
* Pushing them is (slightly) more annoying than pushing git branches — you need a separate command, whereas `git push` just does the right thing by default
* Deleting them doesn't delete the corresponding commits
* There is only one global namespace shared across all remotes
What kind of merges does git handle that hg doesn't? If it's just a matter of figuring out what goes where, someone who uses hg daily could copy the implementation from git - a big organization that uses hg daily, for instance.
Paper cuts can be addressed with more users reporting bugs and contributing fixes. The fundamental design issues with git that prevent scalability cannot.
As a Google employee I use hg every day, even though it's not required. (Some teams at Google do mandate its use, but these are few and far between.) I don't use branches, but I use bookmarks. I didn't notice any merges that really ought to be performed automatically but were not; in any case I use Meld to resolve merge conflicts and it's easy enough to do occasionally.
For most people, avoiding thousands of papercuts is better than scaling massively. Few people need massive scale, but everyone hates papercuts.
I'm also not certain that the "fundamental design issues" with git are truly fundamental to its design. For example, partial clones and sparse checkouts are seeing increasing support in recent versions of git — and those are really all you need.
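For example (the URL and directory names here are placeholders), a blobless partial clone plus a sparse checkout looks like this on a recent git:

```
# fetch commits and trees, downloading file contents lazily as needed
git clone --filter=blob:none https://example.com/big/monorepo.git
cd monorepo

# only materialize the directories you actually work in
git sparse-checkout init --cone
git sparse-checkout set services/frontend libs/common
```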
You can always strip a bad committed merge, abort a bad uncommitted one, and perform it again (maybe with different tooling).
Normally mercurial stops when there are conflicts it cannot resolve reliably. In those cases, have a try at kdiff3: it handles hairy merges quite well - in a lot of cases even automatically (and correctly).
There is always meld, but I'd say kdiff3 is superior wrt merge conflict resolution.
That's interesting. In your examples isn't it fast because monorepos are network-based, as in, you only fetch what you need when you need it?
Also reminded me of discussions around CPython's startup time and how one use case where milliseconds matter is in small cli utilities such as Mercurial.
What I mean is daily operations on the repo like viewing a diff, committing, amending, checking out a different commit, etc. Without doing precise measurements, I would tend to think that it's mostly caused by the slowness of CPython, as compared to a C executable (Git).
The entire repo is stored on a networked file system, so essentially every file operation is remote. That doesn't actually contribute much of the slowness, though: when I wasn't using hg, operations were noticeably faster.
Git scales well enough for almost everyone (especially if you have a little discipline with what you put in the repo).
It’s only huge megacorps that need larger scale things like GVFS.
As for large files, that is not what Git is for. Git is for source code. Much like how you don’t put large files in your RDBMS, you should not be putting them in your SCM either.
What if you need to version them? Git imposes a very specific versioning model: version is a property of the entire repository. Thus, not including some file in the repo implies that it's not versioned in the same manner. It's not just a function of binary vs source.
Versioning big binary blobs is not what Git was designed for. It’ll do fine with smaller assets like icons and the like, but its data model is based on everyone using the repo having a local copy of the full repository history. You can’t easily purge old data. That scales poorly if you want to use it for audio/video files or other large data sets.
You can still do it if you want, but you might be better served using https://git-lfs.github.com/ or using another system designed for that purpose.
Honestly, you can use Git for large files with lfs. I wouldn't say I love this approach, but it isn't that bad now. You do have to make room for yet another tool, and you now have centralized version control commingling with your distributed tool (essentially making it central); but you can still use everything you love about git, and if your lfs-tracked files don't change, you don't need to be connected to a server. It certainly feels pretty absurd: this isn't even a problem in SVN, but now we're tacking on another tool that you have to learn, and that introduces issues.
The only way I've ever successfully `git clone`d my work repo is from another locally connected device. Even with shallow and then gradually unshallowing it, it will not generally complete before the internet falls over.
Nowadays, a new computer means a git clone (or just plain copy-paste) from a USB stick plugged into the old one. This seems like a single feature that could be added to git, but if you told me "there's something that works better for large, twenty-year-old repos", I'd probably take it.
I don't know how Linux survives, but maybe it's just that you only rarely git clone your large repos. (Or maybe it's just that intercontinental internet is less reliable than intracontinental, so that if you're in the US it's a non issue.)
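Concretely, the shallow-then-deepen routine I keep attempting, plus the USB-stick fallback, look like this (URL and paths are illustrative):

```
# shallow clone first, then deepen in chunks as the connection allows
git clone --depth 1 https://example.com/huge/repo.git
cd repo
git fetch --deepen=1000
git fetch --unshallow           # eventually fetch the rest of the history

# or carry the whole repo on a USB stick as a single bundle file
git bundle create /mnt/usb/repo.bundle --all
git clone /mnt/usb/repo.bundle repo-on-new-machine
```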
I believe there will be no scalable open-source VCS because the incentives are not there. While the technical problem is interesting, I decided not to work on it because of this. http://beza1e1.tuxen.de/monorepo_vcs.html
> I worry that Git might be the last mass-market DVCS within my lifetime.
The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings. Git is simple and elegant, though its interface might not be.
I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.
For example, a typical question on Stack Overflow is "How do I find out which branch this branch was created from?", and it always has 10 smug answers saying "You can't, because git doesn't really track that; branches are just references to commits, and what about a) a detached head? b) what if you based it off an intermediate branch and that branch is deleted? c) what if..."
5 more answers go on to say "just use this alias!" [answer continues with a 200-character zsh alias that anyone on Windows, the most common desktop OS, has no idea what to do with].
I don't want to write aliases. I usually don't want to consider the edge cases. If I have two long-lived branches, version-1.0 and master, I want to know whether my feature branch is based on master or version-1.0, and it's an absolute shitshow. Yes it's possible, but is it simple? Is it elegant? No.
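The least-bad recipe I know compares merge-bases - something like the sketch below, which of course falls apart in exactly the edge cases those answers enumerate:

```
# Which long-lived branch does 'feature' share the newer ancestry with?
mb_master=$(git merge-base master feature)
mb_v1=$(git merge-base version-1.0 feature)

if git merge-base --is-ancestor "$mb_v1" "$mb_master"; then
  echo "feature was most likely cut from master"
else
  echo "feature was most likely cut from version-1.0"
fi
```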
The 80/20 (or 99/1) use case is
- centralized workflow.
- "blessed" branches like master and long lived feature branches that should ALWAYS show up as more important in hisory graphs.
- short lived branches like feature branches that should always show up as side tracks in history graphs.
Try to explain to an svn user why the git history for master looks like a zigzag spiderweb just because you merged a few times between master and a few feature branches. Not a single tool I know of draws a nice straight (svn-style swimlane) history graph, because none of them consider branch importance, when it should be pretty simple to implement just by letting you configure which set of branches are "important".
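The closest built-in approximation I know of is first-parent traversal, which straightens the mainline as long as merges always flowed into it - but it's an option you pass per command, not the branch-importance configuration I'm asking for:

```
# show master as a straight line, with feature branches collapsed
# into their merge commits instead of woven through the graph
git log --first-parent --oneline master
git log --first-parent --graph --oneline master
```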
As a very basic git user, about once a month my local git repository will get into a state I cannot fix. I cannot revert, cannot reset, cannot make it just fucking be the same as origin/master. Usually I accidentally committed to local master and then did a couple other things and it's just easier to blat and re-clone than work out how to resolve.
Git is hard for idiots imo, and there are a lot of us
The workflow that I have found works best is to just not change anything myself, I let other people do all the work. This way I can be sure that I won't get merge conflicts that I can't fix.
> Usually I accidentally committed to local master and then did a couple other things
Create a new branch and check it out while you are on the last commit (git checkout -b my-branch), delete the master branch (git branch -D master), and recreate it from the remote (git fetch origin && git branch master origin/master). You'll end up with a local branch with a bunch of commits that you can merge, rebase or cherry-pick, depending on what you want.
If you want to learn more about git in a practical way, there's an awesome book called Git Recipes.
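And if the goal is literally "make my local master identical to origin/master again", a shorter recipe (after parking the accidental commits on a side branch so they stay reachable) is:

```
git branch rescue-work            # keep a name pointing at the accidental commits
git fetch origin
git checkout master
git reset --hard origin/master    # local master now matches the remote exactly
# the accidental commits remain reachable from rescue-work
```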
Recloning means redoing your work on top of a potentially different codebase. The approach I described is how you "blat and reclone" using git instead of the filesystem, and it has the clear advantage of keeping everything in the same repo. You can then mix all the code together in whatever way you prefer.
Git is a very flexible tool that allows for individual local workflows independent of how teams collaborate. Finding a personal workflow that works for you is a little investment that pays huge dividends for a long time. Git is a 15 year old tool that is expected to live for 10-30 years more at the very least. I encourage everyone to learn enough Git to not be afraid of it.
I did not read brigandish's comment as advocacy of 'blat and reclone', I took it to be a comment on git's awkwardness.
I have considerable sympathy for the RTFM reply, but I do not think it is the last word that shuts down any question of usability. What seems clear to me is that there are a lot of people using git who probably should not be. In many cases, they do not have a choice, but I also suspect that many of the organizations that have chosen git do not have the issues that it is optimized for.
> I see no reason git needs to be changed in order to cater to people who refuse to read basic documentation or learn from their mistakes.
In my opinion, solving problems and making improvements involves reducing complexity, not defending it. Many people, including myself, have read the Git docs and learnt about the underlying data structures etc etc and still we can make the claim that it could be better, in numerous ways.
Not if it means reducing capabilities of the program in order to add bumper guards.
I can’t think of any software that handles a complex program that doesn’t have a manual, documentation like a manual, or a learning curve. Git is a tool for developers, not casual users who want typical apps.
Again, you wouldn’t make an argument like this for a tool used by a plumber or a mechanic. If a tool succinctly handles a problem, good! But using tools is part of the profession; they have learning curves.
Most issues with git are PEBKAC issues because people refuse to spend 10 minutes of their life reading about a tool they may use for hundreds or thousands of hours. I wouldn’t want to cater to those kinds of people.
Software can cater to multiple types of uses at the same time. You can have a learn-as-you-go experience while keeping your powerful tools that enable more fine-tuned or complex tasks. Easy-to-use vs. powerful is a false dichotomy.
About the plumbing/mechanic analogy, I totally would make the same case! Hammers and wrenches don't require a manual and can be used for very complex tasks, and that's exactly what makes them so well designed and popular. Few people want their hammer to have more features, and if they do, they still want to keep the good old hammer ready, because it's so easy and simple to use.
Especially calling out PEBKAC (Problem Exists Between Keyboard And Computer) - while even most of the expert git users, including the author himself say the interface could at least be made much better - makes me really suspicious that you simply like feeling superior to other people because you know something they don't, and you don't want to lose your "edge" if suddenly everyone can use version control without resorting to manuals.
iMovie vs Premiere/Final Cut. Final Cut X vs 7. Garageband vs Pro Tools. Word vs LaTeX. and so on. It's very difficult to design interfaces that are easy enough for average users that don't impede pros/power users.
> hammer
A hammer isn't a good comparison. Something like a multimeter is what I was thinking of, etc. Git solves a significantly more complex problem than either of these, though.
> including the author himself say the interface could at least be made much better
I don't disagree! Git's interface -could- be better. That has nothing to do with my points above with regards to people refusing to read basic literature about the tools they use, expecting them to just magically do everything for them out of the box, "intuitively".
> feeling superior ... you don't want to lose your "edge"
This could not be further from the truth. I simply have no sympathy for people who refuse to read the manual or an intro to using a tool, and then complain about the tool being hard to use. Yeah.. it's hard because you didn't do any reading! Git is actually really easy if you read about the model that it uses. Most people don't need to venture out beyond ~5-6 subcommands, and even then it's easy to learn new subcommands like cherrypick, rebase, etc.
Adobe Photoshop, as another example, has a learning curve, but that tool is indispensable for professionally working on / editing images. (GIMP is also good, but that's not in the scope of this discussion). A lot of beginner issues are basically PEBKAC because they didn't read the manual. Same with Pro Tools, or probably any other software used by industry professionals. They're harder to use but what you can do with them (since they treat you like an adult, instead of holding your hand and limiting you) is incomparable to the output of apps designed for casual users.
The git "master" branch is just an example of 'convention over configuration': some commands use it as the default argument, just like "origin" is the default remote name. Nothing in git is special or sacred! :)
"git branch -d" doesn't remove a branch if that means losing track of some local commits. "-D" doesn't check that. The "git branch" commands only operate on your local repo, they don't push any changes to remote repos and they don't pull commits from anywhere.
`-D` doesn't involve the remote at all (unless you are using something like the intentional remote:branch syntax which this example isn't). It is a force delete in that if there were commits locally in that branch and only in that branch it should still delete that branch. It should be unlikely you need that force because the first step was to branch everything as is, so it is safer to just use `-d`, but if the intention is to "blat" it from orbit anyway, `-D` is that.
Though you do have to have committed. One of the things I hammer on in my tutorials for work is that if you get confused in git, make sure you commit. If you commit, you can take your problem to the other engineers and we can almost certainly get you straightened away. Fail to commit, though, and you really may lose something.
Also, a metapoint about git: while I won't deny its UI carries some dubious decisions over from the very first design, in 2020, basically, if you think "Git really ought to be able to do [this sensible thing]", it can. It has that characteristic of open source software that has been worked on by a ton of contributors: almost anything you could want to do was probably encountered by somebody else and solved five years ago. It just may take some searching around to figure out what that is. (And on the flip side, when you read the git man pages and wonder "Why the hell is that in there?", the answer may well be "a problem that you're going to have in six months".)
Are you saying that with knowledge of what "git reflog" is? I suspect not. I'd really need to see a sequence of commands that removes committed state from the repo to buy this. If you try to produce it, bear in mind the first thing I'm going to do is run "git reflog" on the result, so if you find your committed state is still there, then I'm going to say it's still saved.
(That's not a git thing. I don't really even want some sort of hypothetical source control system that literally tracks every change I make. It's technically conceivable and should be practical to what would at least be considered a "medium sized" project in the git world, but I'd just be buried in the literally thousands of "commits" I'd be producing an hour. Failing that sort of feature, a source control system can't help but "lose" things not actually put into it.)
As I am not familiar with the details of reflog (I don't recall ever using it), I took a look at the article. It wasn't long until I reached what looks like a caveat: "This command has to be executed in the repository that had the lost branch. If you consider the remote repository situation, then you have to execute the reflog command on the developer’s machine who had the branch." Joe, who works on another continent, quit last week and his computer was a VM...
OK, so we have backups of his VM and we can recreate a clone of it, but will that be satisfactory? Are there any issues with hardware MAC addresses or CPU ids? How far down the rabbit hole of git minutiae do you have to go before you are confident that you can do all basic source-control operations safely?
The main thing that people fail to understand is that commits are immutable and the overall commit graph is immutable (with the caveat that pathways in the graph that don't end in a branch head are subject to garbage collection).
A rebase does not destroy information. It creates new commits and moves the branch head to a different spot on the graph.
The reason git is seen as painful is because you can't claim expertise until you develop the ability to form a mental map of the graph. But once you do this the lights turn on and everything starts to make sense.
This is why the mantra "commit early and often" still holds. The more experienced git user will tell the newer people this, so when they come with a mess it will always be recoverable.
That GC is a pretty big caveat! Combined with the fact that unreferenced objects are never pushed.
reflog is like undelete in filesystems, it's a probabilistic accident recovery mechanism for an individual computer (repo checkout in this case) that you can try to use if you don't have backups.
In any case, git garbage collection isn't a common phenomenon. It usually triggers every few weeks, even in repos with high activity. The chance of hitting a GC that deletes an untracked commit you need is extremely small.
How would it be gone? You can't rebase onto a dirty working directory so you can't blow away uncommitted changes accidentally and any state prior to and during the rebase is always recoverable via the reflog.
It takes trying to do anything that's not easily recoverable as long as you commit before you start messing around and don't rm -rf .git.
It won't be gone until you actively prune dangling commits. The commit may not be reachable from the HEADs of existing branches, but go through the reflog and your commits will be there. You can then create a branch to make the commit reachable.
You have a branch named <branchname>
you do your operation that fouls up your branch.
If you check with
`git log --reflog --all`
you will see with this magical command that git DOESN'T REMOVE any commit. Your old tree is still there, only normally hidden.
The commit that was there before you fouled up your branch is still there.
You now only need to set your branch to the old commit. A branch is nothing else than a pointer in the tree.
You have two possibilities to change the commit a branch points to:
1. git branch --force <branchname> SHA1 (works only if <branchname> is not the currently checked-out branch. Simply checking out the SHA1 also works, as it detaches the HEAD).
2. replace the SHA1 in the text file in .git/refs/heads/<branchname> by the SHA1 where you want the branch to point to.
With that, your repo is in the same state it was before your error.
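A third option, equivalent to hand-editing the file for a normal (non-packed) ref, is to let git move the pointer for you:

```
# same effect as editing .git/refs/heads/<branchname> by hand
git update-ref refs/heads/<branchname> <SHA1-of-old-commit>
```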
Git is a complex tool because it’s tackling a complex problem. I don’t see a way of making it “easier” without massively reducing what it can do. It’s like saying we should dumb down a Formula One car so people can use it without reading up on it, etc.
If something happens once, it happens. If something happens multiple times then it means you’re not evaluating why it occurred in the first place and learning from it. No tool in the world can solve this problem because it’s not a problem with the tool, rather the user.
Git is really not so hard, but it requires a little reading.
Git isn’t something which you can generally be successful using in a shallow way. Most developers will need to devote significant time and energy to mastering it. There really needs to be a better layer on top of it in order to make it easier for developers to figure out how to do what they want to do. Some of the commands and switches don’t seem to be orthogonal and/or intuitive.
Are we seriously going to refer to “reading the manual” as “significant time and energy”? In this case you don’t even have to read the manual, just a primer on how git works. You know, on how the tool that you’re using works. Why are people so allergic to spending even a modicum of time on learning a tool that massively simplifies their life and makes their work possible?
Do plumbers complain about having to read manuals for the equipment that they use? Electricians?
As programmers our tools are easier to learn and use, yet we complain about having to do any work at all.
Why even be a programmer? If reading about git is so hard, what about the rest of the field that doesn’t even have documentation?
How about we don’t make tools that cater to the lowest common denominator, in this case people who basically can’t be assed to do anything? RTFM.
Because a lot of us have used tools besides git that enable the workflow we need without that complexity, and without the fragility that often necessitates going to stackoverflow or asking on a slack channel.
I have a way of picking the losing side so I've been using mercurial for everything until now, and until now Bitbucket offered hg. They're decommissioning it so I'm moving over to git and I feel like my workflow has been hampered, not just in the immediate complexity of learning the new tool, but in the ongoing complexity of using a less good tool for my needs.
I'm dealing with it, but the situation you're describing isn't really the one that I and a lot of other whiners are dealing with.
> Because a lot of us have used tools besides git that enable the workflow we need without that complexity
I spend ages unfucking local svn working copies and long running branches on both windows and linux. git needs some serious flaws to keep up with that experience.
A lot of people get by just staging and pushing/pulling commits, myself included. That’s 3 commands, 4 if you count git status. You do not need to dig deep to get a lot of use out of git as a basic remote sync.
> their thing: Sprawling, incoherent, and inefficient
> our thing: Self-contained and efficient
This is not biased in any way and makes me want to continue reading. /s
Also, you can’t claim something to be “efficient” when it’s doing many different things like scm, issues/tickets, a web forum/ui ....
Then you have non-issues like git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).
And then you take Gitlab and conflate Gitlab’s issues with problems with Git. I guess gogs/gitea don’t exist?
This page needs to be rewritten to simply list the differences in neutral language. There are good points but they’re lost in unnecessary epithets like “caused untold grief for git users”. I get it: git bad, our product good. Switch!
—
Personally, I don’t want something that tries to do many different things all at once.
Of course we're biased, but every row in that table corresponds to a section below where we lay out our argument for the few words up in the table at the top.
Now, if you want to debate section 2.2 on its merits, we can get into that.
> you can’t claim something to be “efficient” when it’s doing many different things
We can when all of that is in a single binary that's 4.4 MiB, as mine here is.
A Git installation is much larger, particularly if you count its external dependencies, yet it does less. That's what we mean when we say Git is "inefficient."
But I don't really want to re-hash the argument here. We laid it out for you already, past the point where you stopped reading.
> git being installed via a package manager instead of dragging and dropping a binary. Yeah, this is such a huge problem that concerns people, better switch to Better Project (tm).
It is on Windows, where they had to package 44-ish megs of stuff in order to get Git to run there.
On POSIX platforms, the package manager isn't much help when you want to run your DVCS server in a chroot or jail. The more dependencies there are, the more you have to manually package up yourself.
If your answer to that is "just" install a Docker container or whatever, you're kind of missing the original point. `/home/repo/bin/fossil` chroots itself and is self-contained within that container. (Modulo a few minor platform details like /dev/null and /dev/urandom.)
> This page needs to be rewritten to simply list the differences in neutral language.
We accept patches, and we have an active discussion forum. Propose alternate language, and we'll consider it.
> unnecessary epithets like “caused untold grief for git users”
You don't have to go searching very hard to find those stories of woe. They're so common XKCD has satirized them. We think the characterizations are justified, but again, if you think they're an over-reach, propose alternate language.
> I don’t want something that tries to do many different things all at once.
I haven't used nor looked at fossil in maybe 5 years, but had a couple of questions.
Does fossil now have any kind of email support built in to the ticket manager? I remember when I tried to use fossil for actual production use, there was no way to trigger emails sent when, e.g. tickets were submitted, and one of the devs said to just write a script to monitor the fossil rss feed and send the appropriate email, which seemed like a baroque and fragile (and time-consuming) solution.
And is any more of the command-line behavior configurable (like the mv/rm behavior -- affecting the file on disk as well as the repository, or just marking the file as (re)moved in the repository)?
By the way, the "one checkout per repository" is not strictly true. You can use "git worktree"; this is a lightweight way to reuse an existing git repository and have each worktree use a different branch. It's a nice feature, and I use it daily.
Also, a comment about the argumentation in "test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow. Certainly, one can push untested stuff to the remote server by mistake; but, even so, this should be OK, because if one can push directly to important branches like master or similar without going through any reviews and other sanity checks, one has a problem... and the problem isn't really Git :)
> By the way, the "one checkout per repository" is not strictly true.
You must be referring to just the table at the top, not to the detailed argument below, which mentions git-worktree and then points you to a web search that gives a bunch of blog articles, Q&A posts, project issue reports and such talking about the problems that come from using that feature of Git.
I suspect this is because git-worktree is a relatively recent feature of Git (2.5?) so most tutorials aren't written to assume use of it, so most tools don't focus on making it work well, so bugs and weaknesses with it don't get addressed.
Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.
> test before commit". It feels a bit artificial wrt. what can be done locally, what git commit and git push do and what their relation is in a sane workflow.
> Fossil is made to work that way from the start, so you can't run into these problems with Fossil. You'd have to go out of your way to use Fossil in the default Git style, such as by cloning into ~/ckout/.fossil and opening that repo in place.
That's unfortunate. Reading the comments here, switch-branch-in-place is seen as some kind of flaw, but I don't think I would voluntarily use a VCS that doesn't let me easily do that (it's most sensible way for me to work, from way before Git was a thing).
> switch-branch-in-place is seen as some kind of flaw
You're conflating two separate concepts:
1. Git's default of commingled repo and working/checkout directory
2. Switch-in-place workflow encouraged by #1
Fossil doesn't do #1, but that doesn't prevent switch-in-place or even discourage it. The hard separation of repo and checkout in Fossil merely encourages multiple separate long-lived checkouts.
A common example is having one checkout directory for the active development branch (e.g. "trunk" or "master") and one for the latest stable release version of the software. A customer calls while you're working on new features, and their problem doesn't replicate with the development version, so you switch to the release checkout to reproduce the problem they're having against the latest stable code. When the call ends, you "cd -" to get back to work on the development branch, having confirmed that the fix is already done and will appear in the next release.
Another example is having one checkout for a feature development branch you're working on solo and one for the team's main development branch. You start work from the team's working branch, realize you need a feature branch to avoid disturbing the rest of the team, so you check your initial work in on that branch, open a checkout of that new branch in a separate directory and continue work there so you can switch back to the team's working branch with a quick cd if something comes up. Another team member might send you a message about a change needed on the main working branch that you're best suited to handle: you don't want to disturb your personal feature branch with the work by switching that checkout in place to the other branch, so you cd over to the team branch checkout, do the work there, cd back, and probably merge the fix up into your feature branch so you can work with the fix in place there, too.
These are just two common reasons why it can be useful to have multiple long-lived checkouts which you switch among with "cd" rather than invalidate build artifacts multiple times in a workday when switching versions.
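In Fossil terms, the setup might look roughly like this (the URL, file, and branch names are just illustrative):

    fossil clone https://example.org/project project.fossil   # one repository file
    mkdir trunk release
    cd trunk      && fossil open ../project.fossil trunk
    cd ../release && fossil open ../project.fossil release-1.0
    # from here on, switching versions is just "cd" between the two directories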
Git can give you multiple long-lived working checkouts via git-worktree, but according to the Internets it has several well-known problems. Not being a daily Git user, I'm not able to tell you whether this is still true, just that it apparently has been true up to some point in the past.
Since no one is telling me those issues with git-worktree are all now fixed, it remains a valid point of comparison in the fossil-v-git article.
> You must be referring to just the table at the top, not to the detailed argument below
Well, not entirely, because in my opinion the detailed argument kind of hand-waves away the entire git worktree. Continuously switching branches inside a single large Git repo is certainly a suboptimal way to work with Git, but most of the time one should be able to avoid that with the worktree (though the worktree stuff is, of course, not a miracle cure for everything).
Getting a "Fossil-like workflow with Git" is not really the point, is it? One could argue that Fossil does not support "Git like workflow". It is not really a good argument either way.
It is not like Fossil's workflow is a global optimum, it's just something Fossil does well.
The core problem with it is that very few people can get paid more by being better at using their [D]VCS, whereas those more skilled with their programming language(s) of choice often do get paid more to wield that knowledge.
Consequently, most people do not fully master their version control system to the same level that they do with their programming language, their text editor, etc.
To be specific, there are many more C++ wizards and Vim wizards than there are Git wizards.
In situations like this, I prefer a tool that lets me pick it up quickly, use it easily, and then put it back down again without having to think too much about it.
You see this pattern over and over in software. It is why all OSes now have some sort of Control Panel / Settings app, even if all it does is call down to some low-level tool that modifies a registry setting, XML file, or whatever, which you could edit by hand if you wanted to. These tools exist even for geeky OSes like Linux because driving the OS is usually not the end user's goal, it is to do something productive atop that OS.
[D]VCSes are at this same level of infrastructure: something to use and then get past ASAP, so you can go be productive.
> I think it's simple and elegant as a data structure, when what people need and want is something that is (at least also) simple and elegant in its UX and most importantly VERY simple and elegant for the 80/20 use cases.
That's what UIs (whether CLIs or otherwise) for standardized workflows like git-flow are, IMO.
It doesn't nearly go all the way there though. Why do people need to use a command line and a gui tool (usually) for git? Because it's fundamentally not written to be used with a GUI. That I think is one of its biggest flaws. Using a GUI with git always feels like you are missing vital information and just trying to poke a cli underneath to do what you want.
Some design decisions also shine through like "no branch is more important than any other branch" which is completely mental considering how people actually use git.
Most of the GUIs are crap because everybody who builds them thinks that a GUI should just be, more or less, a visual representation of the command line.
"Why do people need to use a command line and a gui tool (usually) for git?"
You don't. The reason is that you're using a tool that didn't budget the time to directly work on git data files and it uses the command line under the hood, because that's a hard business case to make for most small tools. This is not fundamental to git; the very top-end git-based tools like Github or Bitbucket all do their own internal, direct implementation of git functionality for this reason. It's not a characteristic of git, it's a characteristic of the GUI tools you're using.
A perfectly sensible one based on perfectly sensible engineering tradeoffs, let me add; no criticism of such tools intended. Git's internals from what I've seen are not particularly difficult to manipulate directly as such things go, but you are simply by the nature of such a thing taking on a lot more responsibility than if you use the command line-based UI.
Another simple thing Git doesn't track, and which would easily make it much better, is cherry-picks: if it recorded a cherry-pick of another commit in the commit graph (not just as some kind of metadata comment in the commit message, but as a kind of soft parent), then you could actually "reverse engineer" a rebase if and only if you absolutely needed to, such as to track what happened during a squash, or to automate re-applying the rebase correctly for someone who was tracking one of the prior commits. It would largely solve the question of "merge or rebase" with "por que no los dos".
I saw a web designer check in a huge hierarchy of empty directories which would be the structure of the new project that their team should work on. They were quite surprised when it didn't show up on any of the other designers' computers after a "pull". They had to go to the "Git guru" for help.
Windows and Mac both have directories as a major fundamental concept. Everyone knows them and is familiar with them. Subversion tracks directories. Git does not.
I also can't figure out why it doesn't: An empty tree object should be sufficient to do the job. I actually had to write extra code in git9[1] to avoid accidentally allowing empty directories.
According to this[1], which I think might be an official FAQ:
> Currently the design of the Git index (staging area) only permits files to be listed, and nobody competent enough to make the change to allow empty directories has cared enough about this situation to remedy it.
Hm. I should see how git behaves when it gets a repository with empty directories. If it doesn't blow up, I may just add support -- it'd be useful for me.
It kind of does, iff you have a file there. In which case it tracks the path to the file, and then creates the relevant directory structure to get to it.
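You can see the behavior for yourself; note that the `.gitkeep` name below is only a convention, not something Git treats specially:

    mkdir -p assets/icons
    git status                      # the empty directory is invisible to Git
    touch assets/icons/.gitkeep     # placeholder file; any name works
    git add assets/icons/.gitkeep
    git status                      # now the path shows up, because a file lives under it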
Of course git is also incredibly painful and brittle if you want "exclude/except" behavior on the gitignore involving subdirectories.
Another thing Git does not and cannot even attempt to do - file locking.
The assumption behind Git is, everyone develops on their machines and/or branches, and then things are merged. This only works for files which can be merged.
There are plenty of things pretty much any project wants to track which cannot be merged, for example Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.
With a centralized system, that's easy, just go over to using file locks ("svn lock") for those files. With a distributed system, that's impossible.
> Another thing Git does not and cannot even attempt to do - file locking.
That's a seriously hard problem for a DVCS if you're serious about the "D".
This topic turned into [the single longest thread in the history of the Fossil forum](https://www.fossil-scm.org/forum/forumpost/2afc32b1ab) because it drags in the CAP theorem and all of the problems people run into when they try to have all three of C, A, and P at the same time.
To the extent that Fossil based projects are usually more centralized than Git ones, Fossil has a better chance of solving this, but I'm still not holding my breath that Fossil will get what a person would naively understand as file locking any time soon.
> Word documents (documentation), Photoshop files (source of graphics), PNGs (icons in webapps), and so on.
You want to avoid putting such things into a VCS anyway, because it [bloats the repo size](https://fossil-scm.org/fossil/doc/trunk/www/image-format-vs-...). I wrote that article in the context of Fossil, but its key result would replicate just as well under Git or anything else that doesn't do some serious magic to avoid the key problem here.
Instead of PNG, check in BMP, uncompressed TIFF, etc., then "build" the PNG as part of your app's regular build process.
This has the side benefit that when you later change your mind on the parameters for the final delivered PNGs, you can just adjust the build script, not check in a whole new set of PNGs. My current web app has several such versions: 8-bit paletted versions from back before IE could handle 24-bit PNG, then matted 24-bit PNGs from the days when IE couldn't handle transparency in PNG, and finally the current alpha-blended 24-bit PNGs. It'd have been better if I'd checked in TIFF originals and built deliverable PNGs at each step.
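The build step itself can be as dumb as a loop over the checked-in originals; this sketch assumes ImageMagick is available and the paths are made up:

    # derive deliverable PNGs from lossless checked-in originals at build time
    for f in art/src/*.tif; do
        convert "$f" -resize 512x512 "art/out/$(basename "${f%.tif}").png"
    done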
> Instead of Word files, check in Markdown or [FODT]
Another fun option is to unzip the DOCX and check that in, since it is mostly a collection of XML files in a zip container. I built a tool to automate zipping/unzipping files like DOCX years ago as pre-commit/post-checkout/post-merge hooks. [1] It's an interesting way to source control some types of files if you can find a way to deconstruct them into smaller pieces that merge better. Admittedly, merging Office Open XML by hand is not a great experience (and dealing with subtly broken or corrupt internal contents is not fun, because programs like Word can be fussy when things are even slightly wrong), but you sometimes get better diffs than you would expect.
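The gist of the hook approach, very roughly (the real tooling linked above does more, e.g. re-zipping on checkout; the paths here are made up):

    #!/bin/sh
    # .git/hooks/pre-commit: explode DOCX containers so their XML parts get versioned too
    for doc in docs/*.docx; do
        rm -rf "${doc%.docx}_xml"
        unzip -q -o "$doc" -d "${doc%.docx}_xml"   # unpack the zip container
        git add "${doc%.docx}_xml"                 # stage the XML parts alongside the .docx
    done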
> You want to avoid putting such things into a VCS anyway, because it [bloats the repo size]
How do you suggest projects like games handle this, where data files are naturally linked to source files? Imagine trying to sort out an animation bug when you only have source level tracking and no idea which version of the animation data corresponds to the animation source files of the bug report. These data files are not 'built' from the 'build' step as they are the product of artists.
I’d guess not one user in 100.000 uses git decentralized (as in, doesn’t have a blessed “central” repo). It’s the disabling of locking that should be the special case! The big problem with git is that you can’t mark a repo as a master repo/blessed repo (which would be the one where lockfiles are stored). A lot of functionality would be helped if the commands could know which end is the important/central one.
> I’d guess not one user in 100.000 uses git decentralized
I understand your sentiment, but the denominator in that fraction is probably much lower than your guess.
Consider even simple cases like the disconnected laptop case. You may work at a small office with only local employees, and so you have one central "blessed" repo, but if one person locks a file and then goes off to lunch, working on the file while at the restaurant, you still have a CAP problem:
CA: Because the one guy with a laptop went off-network, you have no full quorum, so no one can use the repo at all until he gets back and rejoins the network. (No practical DVCS does this, but it's one of the options, so I list it.)
CP: When the one guy went off to lunch, we lost the ability to interact with his lock, and that will continue to be the case until he gets back from lunch. Also vice versa: if someone still at the office takes out a lock, the guy off at lunch doesn't realize there is lock, so he could do something bad with the "locked" file. (This is the mode DVCSes generally run in by default.)
AP: No locking at all, thus no consistency, thus your original problem that inspired the wish to have file locking.
Not sure I understand the problem. If I lock fileX and go to lunch, then I own the lock on that file while I’m out to lunch. It’s basically analogous to me pushing the file fileX.lock to the repo next to fileX, with my user id as content. I can only do it if it isn’t there.
Everyone else will only see that lock if they fetch, and if they don’t, they might edit their local copy of fileX too, but they would be prevented from pushing their version to the blessed repository by the lock. They can push a copy under another name, or wait until I have removed the lock (but probably can’t resolve the conflict anyway because it’s likely a binary document). So the user will remember to never start editing without taking the lock in the future.
It’s not perfect by any stretch of the imagination but it’s all anyone asks for in terms of file locking. It’s what Subversion always did.
Same thing obviously. But this is just a method of communication. It’s instead of emailing/shouting “don’t edit the background image today please” across the office.
An admin can remove the lock. Or you can allow force-pushing by anyone to replace it or whatever.
Not sure why this is seen as so complicated, version control systems have done it since forever. It’s not trying to solve some distributed lock system in a clever way. It’s dumb centralized mutex per file. And yet again this is all that’s needed (and it’s also added to git in git-LFS!).
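For reference, the git-LFS version of that dumb centralized mutex looks roughly like this (it needs an LFS server that supports the locking API, and the file names are just examples):

    git lfs track "*.psd" --lockable      # mark PSDs as LFS-managed and lockable
    git lfs lock art/background.psd       # take the central lock before editing
    git lfs locks                         # see who holds which locks
    git lfs unlock art/background.psd     # release it when done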
You can set up custom merge drivers for different file types.
More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault. That everything looks like a nail does not speak against the value of a hammer.
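For what it's worth, the custom merge driver mentioned above is just a couple of lines of setup; the driver name and file path here are made up, and `true` as the driver command simply keeps the local side's version:

    git config merge.keep-ours.driver true              # driver that always keeps "our" side
    echo 'generated/config.xml merge=keep-ours' >> .gitattributes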
> You can set up custom merge drivers for different file types
True, but there are inevitably some files which still cannot be merged, so the problem remains.
> More importantly, if you've got stuff in your decentralised repo that shouldn't be decentralised, that's not the fault of the DVCS you're using, it's your fault.
Indeed, if you want to store the history of your files - the whole software including the icons it uses, so that you can go back to any previous version and build it - and you chose a DVCS like Git, I would agree the fault was yours.
That's basically what I was arguing, that Git is the wrong choice if you have any binary assets like icons (even if those assets have small filesize) due to the lack of locking, sorry if I was unclear.
Being distributed isn’t a feature for the binary files' contents. I don’t want 100 historical versions of a huge game texture, just the latest one. The history is distributed however, so I can see who changed the texture even when disconnected. Centralized binary storage like git LFS works like any package/asset manager.
There is no one that would want distributed binaries in git. But people also don’t want to switch from git to something else just because they have a 100GB or 10TB repo. Tooling (build tools, issue management) everywhere has decided that git is all that’s needed.
Not putting binaries in git isn’t a solution at all. Binaries are part of the source in many applications (e.g game assets, web site images...). Distributing every version of every binary to everyone is also not a solution.
Maybe for your particular problem it's worthwhile to set up a CM policy for the (topic) branch naming. For example in your case something like: topic/alkonaut/version-1.0/foo or topic/alkonaut/master/bar.
Or use something like: git log --all --graph --oneline
One tip for "zigzag spiderweb" is to always rebase your topic branch to the target branch prior to a fast-forward merge to the target branch (e.g. master). To clarify: while in your branch topic/foobar: "git rebase master", "git checkout master", "git merge --ff-only topic/foobar".
(There's surely a clever shorthand for the above procedure but when it comes to the command line, I like to combine small things instead of complicated memorized things, it's some kind of Lego syndrome)
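One possible shorthand, for the curious, is a shell-function alias; the alias name here is made up and "master" as the target branch is an assumption:

    git config alias.land '!f() { git rebase master "$1" && git checkout master && git merge --ff-only "$1"; }; f'
    # usage:
    git land topic/foobar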
Rebase + FF solves the spiderweb problem by removing the branches. But some insist on keeping the branches and I don’t get why a “git log” doesn’t have (and default) to showing important branches as straight lines.
Also, with dozens of tiny commands but only a handful of actually desired outcomes, the high-level operations should be explicit commands. E.g. “rebase this branch on master, then squash it and commit on master”.
A lot of the local/remote could also be hidden. The number of times I want to rebase on my local master which is behind origin by 2 commits is... zero.
Said 80/20 was the idea behind Fossil, along with "easy to learn because it's similar to svn". Seemed a good idea. But it lost out to Git having a famous user, which the Linux kernel obviously is.
If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since the interface is how all users will interact with it. And personally I prefer a VCS with fewer ways to shoot myself in the foot than git.
> If its interface is not simple and elegant, I don't see how you can call git simple and elegant, since the interface is how all users will interact with it.
At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity. (Even if it doesn't work quite the same as traditional source code control systems. Failing to work with directories is a problem of the git approach. Having a nice offline story is a distinct advantage.)
> And personally I prefer a VCS with fewer ways to shoot myself in the foot than git.
Oddly, the thing I love about git is how easy it makes it to recover from mistakes. Even if there are more ways to shoot yourself in the foot, there are also more ways to put your foot back exactly the way it was before you shot it. (If only real life worked that way!) This is what the immutable content storage under the hood of a git repository gets you.
If you know the commit hash (and there are a bunch of ways to easily keep track of these), you can get back to the state that's represented by that hash. Commands like merge/rebase/cherry-pick make this particularly easy by providing an '--abort' option that means "I've screwed this operation up beyond repair and need to bail out." And the abort works. As long as you had your target state committed, you can get back to it. (And if that's just a transient state that you don't want to persist, it's easy enough to squash it into something coherent.)
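A couple of the escape hatches I lean on, for illustration (the reflog entry below is obviously situation-dependent):

    git rebase --abort           # bail out of a rebase gone wrong
    git reflog                   # every place HEAD has been, with hashes
    git reset --hard HEAD@{2}    # jump back to where you were two moves ago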
>the interface makes a lot more sense if you understand the underlying data structure
Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?
And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might have had to do it once or twice with SVN, though).
> Except that I don't have to understand the underlying data structure to use a more basic VCS like Mercurial. What makes git so special that I would have to do that before being able to use it?
I don't think it is special. Generally after a while using a given tool, library, etc. I find it useful to dig in a bit and see what's happening under the hood to help understand why it works the way it does. git just happens to be the tool under discussion at the moment.
> And for recovery from mistakes, I meant stashing the changes somewhere, deleting the repository and downloading a clean copy to start again, which I had to do a few times with Git and never with Mercurial (I might have had to do it once or twice with SVN, though).
I think we're talking about the same sort of mistakes. It's hard for me to imagine a case where you'd need to blow away a local git repository entirely. Worst case scenario, there should be good refs available in a remote that are just a 'git fetch' away. (If there's no remote, then blowing away the local repo is essentially just starting from scratch anyway.)
You also don't have to understand the underlying structure for a similarly powerful DVCS like bitkeeper. Yes, it isn't open source, but moving from bk to git was a major step back in usability for my group.
Yes, this. I actually tried bk before git, and actually used bazaar and then mercurial before git as well. I was stunned at how arcane the UI in git was made (And how arrogant the community of users around it could be, too). Bk was clean and elegant frankly. I'm no idiot when it comes to the concepts -- but git's CLI interface is just awful.
Bitkeeper is in fact open source now, BTW. Too late, but it is.
You're right. The arrogance was hilarious. People with no experience with bk saying, "What's your problem?" Now, git is super fast, because its core is written by Linus, but I think he is just so much better technically and so far into the internal weeds of Linux for so long in so many areas that he had trouble creating an API for mere mortals.
We used bk when I was with the group for 3 years and then switched to git. Been using git for 8 years. I know git, but the ergonomics and basic English semantic meaning of commands is much worse. I have to look up git commands and subflags _all_ the time still, for checking out old versions of files to a new file, looking at tags, committing to a new branch, etc. Bk's version of gitk was superior and the usage was nicer. I've used mercurial, svn, cvs, git, and bk. Git is hard, but it is the standard now, so of course I'll continue to embrace it. Just not as ergonomic.
> At least in my experience, the interface makes a lot more sense if you understand the underlying data structure, which does have a certain elegant simplicity.
Yeah. Take an afternoon to read through gittutorial(7), gittutorial-2(7), and gitcore-tutorial(7). Git is a tool, and just like any other tool (car, tablesaw), you will be much better off if you take the time to learn to use it properly. Once you see "The Matrix" behind Git, it becomes an incredibly easy to use and flexible tool for managing source code and other plaintext files.
The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling. And why does Git require this whereas other VCS don't? Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.
> The fact that you put Git in the same category as tools having a potential of inflicting grievous bodily harm if misused is telling.
They're just examples of tools.
> Mercurial was incredibly easy to use nearly right out of the gate, not after an afternoon of work.
I talk about this elsewhere in this thread, but I disagree with this assertion. I find Mercurial baffling and Git very elegant, though it could be an artifact of the order in which I learned the tools.
> And why does Git require this whereas other VCS don't?
It doesn't. It just works better when you take the time to learn how it works. (Which is an experience I commonly have with the tools I use, for whatever that's worth.)
Yes and no. I'm a git guy and a fan, but you can really, really mess things up. Usually, this is only when using features like force push; however, there are arguably legitimate use cases for that.
Buddy had a teammate that almost force pushed references from a slightly different repo. What a mess that could have been! I agree regarding the usefulness of reflog, and think the complaints about messing things up with rebase, reset, etc. are overblown. It really isn't an issue for intermediate users.
Hence almost always. It’s not a common situation to delete commits from history, etc.
I don’t see the capability to force push as a negative. There are situations in which it’s necessary, like forcibly removing history (something I had to do just today).
Git gives you the ability to shoot yourself in the foot, so it’s up to the operator to not make a mistake like that without backing up the repo to a different place first, etc. Something something only a poor carpenter blames their tools.
Git is neither easy nor really elegant. It is useful for projects like Linux™, but for the vast majority of projects, tools like Mercurial or Fossil would be a much better fit.
After svn, git was a breath of fresh air; far easier to use and reason about, not to mention much faster.
I don't think much of your all-in-one solution like fossil - that's a competitor for GitHub (without the bits that make GH good), not git.
I tried to use hg at one point in the early days, and found it much slower than git. Git's low latency for commands made a substantial difference, perceptually. In principle I think git encourages too much attention to things like rebases, which fraudulently rewrite history and lie about how code was written, just so the diff history can look neater. Working code should be the primary artifact, not neat history, and rebases and other rewrites make it too easy to cause chaos with missing or duplicated commits in a team environment. So ideologically, mercurial is a better fit, but that's not enough to make me use it.
Fit is a function of an environment; when we say survival of the fittest, we mean fitness as adapted to an environment. Feature set isn't the only aspect; at this point, the network effects of git are insurmountable without a leap forward in functionality of some kind.
(I think git & hg are just as elegant as one another; to me, the elegance is in the Merkle tree and the conceptual model one needs to operate the history graph.)
Can you explain what you mean by fossil being a competitor for github, rather than git? Fossil is an SCM with additional features, but (the last time I used it, and to my memory) it was just the command-line fossil, very much like git, and that's how I used it.
What makes it the case that fossil cannot be a competitor to git (or hg), in that they are both a vcs?
edit I haven't had a lot of sleep. What I'm trying to ask, I suppose, is why can't you use fossil just like git and ignore any all-in-one features it provides? (This is not to comment on how good, scalable, fast, correct, or robust it is.)
You can, though I suspect the OP's focus on speed means you'd want to turn off Fossil's autosync feature, which makes it operate more like Git: checkins go only to the local repository initially, and then you must later explicitly push them to the repo you cloned from.
This is why Subversion was "slow": your local working speed was gated by the speed of the central repo, which could be slow if it was under-powered or overloaded, as was common with the free Subversion hosts of the day. At least with Git, you can batch your local changes and push them all at some more convenient time, such as when you were going off for a break anyway.
I have never used Fossil, but I used to be a strong proponent of Mercurial. My advice is don't - Mercurial lost, git has won, and fighting against the current is just going to make your life harder.
The main advantage Mercurial has over git is a command line syntax that makes consistent sense. The operations you want to do are easy and as you try and do more complicated things, the new commands will be unsurprising and predictable. If you already know how to use git then this advantage is (mostly) irrelevant.
There are some other features that are interesting - Mercurial has a couple of different types of branches. Bookmarks are like git branches, whereas named branches are a completely different concept which can be useful. 'Phases' tracks whether commits have been shared, and prevents you rewriting (rebasing) them when appropriate.
If you do experiment, note that many 'power user' features are turned off by default. There is a robust extension system, and the default mercurial installation includes a load of standard ones. My config file includes the following to turn on some useful stuff ('record' is the most useful, for a staging-area-like facility):
    [extensions]
    pager =
    color =
    convert =
    fetch =
    graphlog =
    progress =
    record =
    rebase =
    purge =
I know Git inside and out, but I had to use Mercurial for a client a couple years ago. I found it to be the most baffling and nonsensical source control experience of my life. It might be a case of cross-contamination. Like you said, each SCM uses similar terms for different concepts, so my Git knowledge may have unfairly colored how I expected similar terms to work in Mercurial.
But stuff like: "hg log" gives you _every commit in the repo_?? When is that ever useful? How do I get only the commits that lead to the current state of the repo? Mercurial doesn't have branches; instead you're supposed to _copy the whole directory_[1] at the filesystem level?? Of course this is ridiculous, so they invented "bookmarks" which are actually Git branches. The extensions thing you mention is also a ridiculous chore. Just have sane defaults. I also found hg's output very dense and hard to understand and read, poorly suited for human consumption.
I dunno. I'm sure Mercurial is fine, many people use it every day, and likely my strong Git bias was affecting my ability to learn Mercurial. But I found it far easier to just clone into Git, use Git to do source control, and then export back to Mercurial when I'm ready to share my work.
Mercurial does absolutely not require you copying at the fs level. You're not the first person to be caught out by that tutorial, which I think would serve us best by being deleted.
The 'original' branching method for Mercurial is called Named Branches. The big difference with Git is that every commit is labelled with what branch it is on. This has advantages - if you imagine looking at the train track of 'master' with its divergence for a few commits and then a merge, in Mercurial you can see that the 3 commits were on a branch called 'performance', whereas with git that history is completely lost. See: https://www.mercurial-scm.org/wiki/NamedBranches
Branching by cloning was copied from bitkeeper. It was also early git's only branching mechanism. If you listen to Linus's talk when he introduced git at Google, you'll hear him conflate "branch" with "clone" because that's what he was thinking of at the time.
I associate that particular madness with Bazaar rather than Mercurial. It stopped being standard practice a while ago, and those old tutorials should be updated or removed.
That is exactly the point. For git you need the ecosystem to cope with its shortcomings, and in addition some experts to help you out of the pickles this software gets you into.
I mainly use fossil for personal projects.
What's nice about it is that it's not only a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list, and user management. The setup is ridiculously easy and everyone always has everything in the repository.
In addition, fossil never loses data, unlike git, which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing, and so on.
And fossil has a sane command-line interface so that everyone in the team is expert enough to work with it. No need for heroes that save the day from git fricking everything up.
> What's nice about it is that it's not only a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list, and user management.
That is not nice. That is way more things that might not match me, more attack surface, more irrelevant cruft I'll probably have to look up how to disable. Project management, wiki and issue tracking preferences are very personal and often don't map particularly well to specific repositories. And _blog_ and _mailing list_? Why, you're spending time on stuff most of your users will hate, not because it's bad, but because they either don't need it or would like it different.
> In addition, fossil never loses data, unlike git, which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing, and so on.
Which is why Git is successful. That's by design, not accident. We want to, and sometimes _have to_, delete stuff.
>What's nice about it is that it's not only a very capable VCS but also a complete project management tool with tickets/issues, wiki, blog, mailing list, and user management.
Not at all. There's a lot of nice stuff that falls out of Fossil's integration of these features, things which you don't get when you lash them up from separate parts.
For example, if I have a check-in comment "Fixes [abcd1234]" I get an automatic link from that check-in comment to ticket abcd1234 from the web UI's timeline view. If I then close that ticket, the comment in the timeline view is rendered in strikethrough text, so I don't have to visit the ticket to see that it's closed.
Similarly, a built-in forum means the project's developers can discuss things with easy internal reference to wiki articles, tickets, checkins...
A recent feature added to Fossil is the ability to have a wiki article bound to a particular check-in or branch, so that whenever someone views that artifact in the web UI, they get a link to the ongoing discussion about it. This is useful when you have more to say about the check-in or branch than can reasonably fit into a comment box. This solves a common problem with experimental features, where you want to discuss it and evolve the idea before it's merged back into the parent branch.
Fossil's user management features are also highly helpful.
http://fossil-scm.org/fossil/doc/trunk/www/caps/
With raw Git (no Github, GitLab, etc.) it's pretty much all-or-nothing, but with Fossil, you can say "this user can do these things, but not these other things." Thus you can set up a public project, giving anonymous users the ability to file tickets and read the forum but not make check-ins.
These features are as seductive as what Github, GitLab, BitBucket, etc. add to Git, but whereas those are all proprietary services with some of the roach hotel nature to them, with Fossil, you clone the repo and now you've got all of that locally, too. If the central repo goes down, you can stand your local clone up as a near-complete replacement for it.
It's not 100% because Fossil purposely doesn't clone a few things like the user table, for security reasons. You can build a new user table from the author names on the check-ins, though.
>In addition, fossil never loses data, unlike git, which can easily destroy branches that are not pushed, delete stuff while stashing or unstashing, delete stuff when rebasing, and so on.
I have created a lot of feature branches that contain useless commits which I then later corrected with a simple git merge --squash. Preserving those commits sounds like a drag.
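For anyone unfamiliar, that squash flow is just the following (the branch name is made up):

    git checkout master
    git merge --squash feature/foo   # apply the branch's net change, staged but not committed
    git commit -m "Add foo"          # one clean commit; the messy branch history stays behind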
The simplest way to do that is to just use it for a local project. Say, your local ~/bin directory, containing your local scripts, or your editor's config files that you want sync'd everywhere, or a novel you're writing on the side.
If you're like me, you'll find yourself increasingly wondering, "Why would I put up with Git any time I'm not forced to by some outside concern?"
It is immensely useful. That doesn't mean that some other tool might not be better for most cases.
I love git, and don't know most other post-SVN version control systems, but I do recognise the complaints people have about git. There's clearly still room for improvement.
To use git you need to know clone, pull, commit, push. For larger projects branch and merge. Those fall into a lot of boxes that say "easy" or "elegant," and I really wouldn't hesitate to recommend git to a lot of projects, big or small, discounting specific needs, but I guess you've got some specific concerns that really don't translate well into simple statements.
I've used mercurial only to get some external requirements or tools going, and never used fossil. Could you elaborate a bit on why git is worse than either of them and why I should consider switching?
> To use git you need to know clone, pull, commit, push. For larger projects branch and merge
What if you committed to the wrong branch? What if you tried to merge commits from another user and made a mess of it all? What if you pushed something you want to roll back? What if you committed with the wrong commit message and want to fix it? What if you followed the policy of "commit often" but ended up with lots of irrelevant commits, and want to fix this so that it only has meaningful commits. How can you find who committed what? Or which branches contain a commit?
I know how to do all of this. But these are genuine questions a user of git will need to get answered, and git quickly becomes confusing/inconsistent once you're off the "happy path".
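To illustrate a couple of those, here are the kinds of incantations you end up memorizing (branch names and the commit hash are placeholders, and the wrong-branch recipe assumes the stray commit is the latest one):

    git commit --amend                      # fix the last commit's message or contents
    # committed on the wrong branch? park it on a new one and rewind:
    git branch actually-wanted-branch
    git reset --hard HEAD~1
    git branch --contains <hash>            # which branches contain a commit?
    git log -S"some string" --author=alice  # who committed what?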
You need to know much more to use git in anything involving more than one branch. You need to know git checkout, git stash, you need to know how to fix conflicts, you need to know rebase vs merge and how to understand git log when you use merge, you need git reset, you probably need git cherry pick occasionally.
One of the major day-to-day annoyances is the fact that (by default) I can't work on multiple branches without committing or stashing all the time, since switching branches, instead of being a simple 'cd' like in other VCSs, is a destructive operation on my entire repo (also causing re-indexing for all content-aware tools...). And if I want the normal behavior? I need to learn some other commands to set that up, and learn their filesystem restrictions...
Disagree. I could teach a child to use git. I don't agree it's elegant in a face value way. Yet if you've used other version control systems, git has features that you'd dream of (I "invented" some of the features of git on my own). So in a way it really is an elegant solution. I can't think of much that I really hate, or wish to change; and I can't think of any serious proposal to "fix" it.
People have probably been happy with their tools for centuries. Just because one cannot imagine something better doesn't mean there's no possibility for it to exist. If anything, this defeatist attitude may prove the author right.
With the exception of very, very singular people, isn't every maximum going to be local? Even then they'll be stuck in a bunch of other local maxima outside their own area of expertise.
Asymptotes. Sometimes it turns out we can solve a particular problem so comprehensively that "solve this problem better" is never a reasonable step. You can try it anyway, of course, but you're unlikely to get acknowledgement much less praise.
The answer to "Why doesn't my music sound as good as I wanted?" isn't going to be "CD's 44.1kHz and 16-bit PCM isn't enough". It might be "This cable has been chewed by a dog" or "These speakers you got with a cheap MIDI system in your student dorm are garbage" or even "the earbuds you're wearing don't fit properly" but it won't be the 44.1kHz 16-bit PCM.
Likewise, it is plausible that Git is done technology-wise. That doesn't mean there won't be refinements to how it's used, recommended branching strategies, auto-complete, or even some low-level stuff like fixing the hash algorithm - but the core technology is done.
> The answer to "Why doesn't my music sound as good as I wanted?" isn't going to be "CD's 44.1kHz and 16-bit PCM isn't enough".
Yeah, try telling this to a fan of 1960s or 1970s rock. You'll get an earful about rich guitars and fat synths, which only a 100% analog, tube-amp process from studio to ear is capable of replicating.
> Yeah, try telling this to a fan of 1960s or 1970s rock. You'll get an earful about rich guitars and fat synths, which only a 100% analog, tube-amp process from studio to ear is capable of replicating.
And anyone with a basic understanding of electronics should laugh in the faces of these people. The idea that a signal can be carried on a wire or recorded on tape but can not be replicated digitally is absolute nonsense.
If someone wants to claim that their preferred format captures higher frequencies than 44.1kHz sampling allows for, that's at least plausible, but that can be solved by using higher sampling rates like 96 or 192 kHz. At that point you've exceeded the capabilities of all mainstream analog storage media.
If they are looking for specific effects created when pushing the limits of analog hardware, like the "crunch" of a tube amp, that's fine too, but they need to acknowledge that they're treating the amp as an instrument in that case and its output can still be recorded digitally just fine.
I was happy copying my files back in the days for versioning (final_draft, final_draft01, final_draft_absolute_final, final_draft_use_this_one), but that was because I didn't know of anything better.
What I'm saying is that even though we don't see it now, there's probably something better out there waiting to be discovered.
I hope it's the last DVCS, if it can save me the hassle of learning a new one. There is some learning curve, but it works just fine once you know how to use it.
Over time you realize there are no transient states. You cannot neglect, for example, complex install just because it "happens once". Nothing ever happens just once. You will always have to reinstall, probably multiple times. So when people say the same thing about installation complexity, they are being naive.
In the same way, learning is not a transient state either. You will always have to relearn. Those impossible barriers that you eventually got through will reduce to speedbumps - but they will always be there, slowing you down. And if you don't use it enough, you'll have to relearn.
Also, be aware that once you've climbed a learning curve, at least unconsciously you are no longer incentivized to simplify it for those who come after. Why reduce the barrier-to-entry for others, after all? You got through it, so why can't they? And this is why generations of kids learn bad music theory, and generations of physicists learn bad particle names. It's important to be aware of this effect so you can counter-act it.
I agree. I’m always embarrassed to say that I still use darcs but it’s just entirely obvious how to use it. There is no mystery. The choice to prompt the user for things makes usability insanely high.
Yes it’s slow for large projects but honestly I just deal with that.
>> The possibility of git being the last mass-market DVCS within my lifetime leaves me with warm fuzzy feelings.
Agreed. Technology should converge on a best solution so we can stop chasing things and get work done. Stable open source standard solutions are what we need more of.
Are you referring to the Vulkan API? If so, why do you see it as an anti-pattern example?
I like the API and I think it is a great and necessary improvement over OpenGL. I actually hope to see Vulkan become the ‘stable open source standard solution’ for graphics.
here's me sincerely hoping for pijul and/or darcs to deliver something much better. (obviously if it's only a little better then there isn't much point in switching.)
The interface is arguably the most important part. I.e. it is a tool developed for humans to use, so would ideally have simple, consistent, and by extension intuitive ergonomics. Elegance of internal implementation is secondary.
The interface can be swapped out if the underlying storage is fine. There is nothing preventing you from writing a different front end where “checkout” doesn’t do all the things.
Which is why there are dozens of Git front-ends, fragmenting the market, reducing the benefit of cross-training within a team. "Oh, you can't do that in your TortoiseStudioCodeThingy? I just right-click and select Frobnicate File, and it fixes all that!"
Thus editor wars, language wars...
Alternative: a tool like Fossil where the CLI is sensible from jump so the whole team doesn't replace it with something better, uniquely per team member.
Yup, doesn't sound like a negative to me. People will still try to build simpler ecosystems like GitLab/GitHub/Atlassian, with similarly mixed results.
Every version control system that's become dominant in my lifetime became popular because it fixed a major obvious flaw in the previous dominant system (RCS, CVS, SVN).
From where I sit, Git has a couple obvious flaws, and I expect its successor will be the one that fixes one of them. The most obvious (and probably easiest) is the monorepo/polyrepo dichotomy.
If I were to pie in the sky dream up a replacement for git, I'd have it store the AST of the parsed code instead of a text file. It would solve a lot of problems with refactoring crapping all over the history. Like I said, pie in the sky. Probably never gonna happen.
Personally I don't see git's problems with large binary files and tens of millions of commits as being major issues. Those two alone are way less valuable than git's ecosystem and mindshare.
What you want has nothing to do with git (which is a storage model). You can use arbitrary diff and merge resolution algorithms with git's plumbing, which would give you the AST-aware functionality that you want.
I used VisualAge's Envy for 3 years while working in a Smalltalk project.
Envy does versioning of classes and methods, and you can programmatically access the model.
It allowed us to build tools around the VCS. For example, we had tools to merge multiple feature branches and resolve conflicts automatically.
We also used the same tools to produce migration scripts for our database (GemStone).
That was 18 years ago! And today it sounds unreal.
You can build tools on top of git, but the versioning “unit” gets in the way (e.g. imagine being able to encode refactorings in your change history and reapply or roll them back).
I’m not trying to criticize git. I think it is the best file-based DVCS. My point is that many dev tools that we use today are extremely rudimentary, because of the lack of good abstractions. And I don’t think that git provides a good model to build those abstractions on top of it.
That's like saying DVCS has nothing to do with VCS -- it's just the server model. Every major advance has been accomplished by increasing the scope of version control.
Arbitrary diff/merge in Git is a great example of the Turing Tar-Pit. It's possible, but prohibitively inefficient for many things I want to do. You can't add your own types, index, or query optimizations.
Today, if I want to store data for my application, I have a choice between good support for rich object types and connections (e.g., Postgres), or good support for history and merging (e.g., Git). There's no one system that provides both.
> Today, if I want to store data for my application, I have a choice between good support for rich object types and connections (e.g., Postgres), or good support for history and merging (e.g., Git). There's no one system that provides both.
I like the way you put this. In case anyone's interested in brainstorming I'm dabbling in this problem with a thing called TreeBase (https://jtree.treenotation.org/treeBase/). It's still a toy at this point, but it stores richly typed data as plain text files to leverage git for history and merging and then can use SQLite (or others) for querying and analysis. A very simple database in the wild looks like this: https://github.com/treenotation/jtree/tree/master/treeBase/p...
Couldn't you store the exported database as sql commands? I'm not familiar with every git hook, but if there aren't enough to automate that, I guess you could wrap it.
The slowness of destroying a whole database and then recreating it when checking out should be something you can handle by relying on the diff to generate a series of delete commands and a series of insert commands.
But yeah, I guess committing will be slow if you have a lot of data to export. For the time being, it's a trade off to be made.
[I might consider testing this with my current database project. But I'm using SQLite so I guess that implies a lot less data than Postgres.]
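For SQLite the hook really can be that simple; a rough sketch with made-up file names:

    #!/bin/sh
    # .git/hooks/pre-commit: version a text dump instead of (or alongside) the binary database
    sqlite3 app.db .dump > app.sql
    git add app.sql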
frutiger is correct that the diff algorithm has nothing to do with git itself, in that git can accept pretty arbitrary diff algorithms in the first place for all the commands that take one.
Check out git-diff(1) and --diff-algorithm. --anchored is the one I find the neatest.
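For example (the anchor text is whatever line you care about, and the revision is just illustrative):

    git diff --diff-algorithm=histogram     # alternative diff heuristics
    git diff --anchored='def main' HEAD~1   # try to keep lines starting with the anchor text out of the -/+ pairs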
DVCS indeed has nothing to do with VCS, it has a lot to do with the data model used by the VCS.
A modern but still centralized VCS like Subversion or Perforce is what you get if you first add networking (CVS) and then atomic commits. Without atomic commits you are pretty much forced to keep a centralized server, and Subversion didn't try to change the server model after adding atomic commits.
DVCS instead is what you get if you start with local revision tracking like RCS, and add atomic commits before networking. Now the network protocol can work at the commit level and is much more amenable to distributed development.
Why do we have byte code? Why not run everything in interpreters? Because parsing pure text takes a lot of work. So we store it in an intermediate form to economize.
Saying just parse it every time is denying that there are very real costs associated with that decision.
It would have to be specific to certain languages, which would, in turn, hinder adoption of new languages to some degree if the git-next took off. So, I'd prefer not to have that be a feature. :)
I think you could write it as the ability to do a diff on a binary AST without fussing too much about what the AST represents. Then you merely need to write a parser/serialiser combo for your language to the AST as a repo plugin.
Otherwise it won't just be new languages which suffer, but users of supported languages will suffer when there's an upgrade.
Most language ASTs don't encode unfinished or work-in-progress code very well (the difference in AST shape between missing one `{` and the fixed code can be substantial). You may think it better to always only commit working code, but your source control system is also a backup system if you need to save a work in progress branch to come back to it, and also sometimes a communications system if you want to request a coworker examine your code to help you pinpoint bugs you can't find or review work in progress.
Most language ASTs also don't encode useful-to-the-programmer but useless-to-the-compiler information like comments and whitespace. There's been good progress on that (the Roslyn AST system has some neat features), but in practice an AST is always intended more for the compiler than for the user/source writer. This is also often reflected in speed: a lot of languages have relatively slow AST generation, which would sometimes add very noticeable wall-clock time to commits, depending of course on language and hardware.
Plus, of course, all the usual caveats that ASTs vary enormously among themselves (some are weirder DAG shapes than trees, for instance).
An experiment I ran was to use the "next step down" from a full AST, which is your basic tokenizer / syntax highlighter. Those are designed to deal well with malformed/unfinished/work-in-progress input, and to do it very quickly. Years back I built a simple example diff tool that can do token-based diffs for any language Python's commonly used syntax highlighter Pygments supports. [1] In my experiments it created some really nice character-based diffs that seemed "smart" like you might want from an AST-like approach, just by doing the dumb thing of aligning diff changes to syntax-highlighting token boundaries.
You could even use it (or something like it, or something based on it) today as your diff tool in git if you wanted, with the hardest part being configuring which language to use for which file. (I never did do that, though, partly because the DVCS I experimented with this for didn't have a pluggable diff system like git does, nor did it support character-based unidiff as a storage format, and the experiment was partly to prove both ideas could be useful.)
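For anyone curious what that looks like in practice, here's a rough sketch of the idea (not the actual tool from [1]; the names and structure are mine) using Pygments' lexers plus difflib, so changes snap to token boundaries instead of whole lines:

    import difflib
    from pygments.lexers import guess_lexer_for_filename

    def token_diff(filename, old_text, new_text):
        # Tokenize both versions with whatever lexer Pygments picks for the
        # filename; get_tokens() copes fine with broken / half-written code.
        lexer = guess_lexer_for_filename(filename, old_text)
        old_toks = [value for _ttype, value in lexer.get_tokens(old_text)]
        new_toks = [value for _ttype, value in lexer.get_tokens(new_text)]

        # Align the two token streams; every change now starts and ends on a
        # token boundary (identifier, operator, string, ...), not mid-line.
        matcher = difflib.SequenceMatcher(None, old_toks, new_toks)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "equal":
                print(op, repr("".join(old_toks[i1:i2])),
                      "->", repr("".join(new_toks[j1:j2])))

    # token_diff("foo.py", open("foo_old.py").read(), open("foo_new.py").read())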
Look into Unison, a language that stores the AST and immutable history of all functions to provide a combination of package manager, IDE, and DVCS. Once you store the AST and all history, some fascinating side effects happen!
I want an editing environment which operates on the AST of my code (obviously it would have to support every language I wanted explicitly to do this), so that files become entirely irrelevant, I never have to worry about formatting differences or where things are or whether that function is in that file or that file. A bit like working in a Smalltalk image.
If that was then extended into the version control system that'd be even better. Oh yes.
But getting a new language into these things would probably be a nightmare.
Large files and long histories hinder its total dominance in the game and art industries. Because of Git's shortcomings, polyrepo is a near necessity, not simply a stylistic choice. LFS is a bolt-on solution that could/should have better support.
> Because of git's shortcomings polyrepo is a near necessity
I'm intrigued by this claim. I've come to the opposite conclusion - that monorepo is a near necessity with git because there are no tools for branching/rebasing multiple repos at once.
After using both I can say both have problems. Polyrepos lack tools for working with multiple repos simultaneously, and require more attention to versioning. Monorepos have longer histories, and the large number of objects can hurt performance.
Which one you should use depends on which downsides are less impactful for your use case.
Git stores the diffs in chronological order, doesn't it? I recall reading about someone doing a commercial implementation where the commits are stored in reverse chronological order. I'd been thinking that was GitHub, but I've never been able to find the article again.
Git's model (which it copied from monotone IIRC) is not diff-based, it's snapshot-based. That is, commits are not stored as a diff to the previous commit, but as the whole state of the tree plus a pointer to the previous commit(s).
As an optimization, when it packs several objects together in a pack file, it can store objects as a delta to other (possibly unrelated) objects; there's a whole set of heuristics used to choose which objects to delta against, like having the same file name. And yes, one of these heuristics does have an effect similar to "reverse chronological order"; see https://github.com/git/git/blob/master/Documentation/technic... for the details.
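If you want to see the snapshot model for yourself, the plumbing makes it easy to dump a raw commit object; this little wrapper (mine, not part of git) just shells out to `git cat-file`:

    import subprocess

    def show_commit_object(rev="HEAD"):
        # A commit object is only a `tree` pointer, zero or more `parent`
        # pointers, author/committer lines and the message. No diff is stored
        # here; the diff you see in `git show` is computed on demand.
        return subprocess.run(
            ["git", "cat-file", "-p", rev],
            capture_output=True, text=True, check=True,
        ).stdout

    print(show_commit_object("HEAD"))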
I don't think that's true. I've been using git's CLI since I started using git a few years ago, and exactly zero of my problems with git could've been solved by a different user interface (be it GUI or a "better" designed CLI). Pretty much all of my problems have been with my lacking understanding of the abstractions that git uses to make all of the powerful things it can do possible.
You are in the minority. It's such a ubiquitous experience, a running joke in the industry. Saying a tool is useful and powerful, is fine and good. That sentiment has nothing to do with usability.
Isn't that the same thing? If you need to be taught the underlying abstraction to be able to understand the UI, you've got a textbook example of a leaky abstraction.
After I had to unfuck a repository for the n-th time, I trialed a switch to Mercurial, and we switched shortly after. I can count on one hand how many times I've had to intervene in the last few years.
> It would also be nice to have a repo that isn't language-agnostic. It's too easy to track non-semantic changes, like white space.
In my opinion this is a problem with programming languages rather than version control. Namely we mix presentation and representation when using text as our source code. In the case of whitespace we have an infinite number of syntactic presentations which all correspond to the same semantic representation. Tooling has been created to try to deal with this such as code formatters which canonicalize the syntactic presentation for other tools. Git itself even has to deal with this because of platform differences, i.e. LF and CRLF.
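You can see the presentation/representation split directly with Python's own parser; a line-based diff of these two snippets reports a change, while the parsed representation is identical:

    import ast

    # Same function, two presentations: different indentation and spacing.
    version_a = "def area(w, h):\n    return w * h\n"
    version_b = "def area(w, h):\n        return w*h\n"

    # ast.dump() ignores whitespace, so both parse to the same tree.
    print(ast.dump(ast.parse(version_a)) == ast.dump(ast.parse(version_b)))  # True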
I tend to think that it's fine, because I remember what it used to be like... what we have now is the "easy" UI!
And they do occasionally put some new stuff in that helps. Like the recent version which adds new commands to split out the two completely different uses for `git checkout` (making/switching branches and reverting files).
There is zero doubt in my mind that it's Github that "made" Git. Without it, it would be just another one of many DVCSes. Git's value isn't inherent, it's all down to network effects.
Git submodules have lots of problems. I wanted it to work like a symbolic link to another repository so that I could develop both at the same time on the same machine. Just like how pip allows me to install a Python package in editable mode. If I make a change in the submodule, the superproject should automatically see it.
How about the fact that history tracking for file renames doesn't work well? It's hit and miss whether git blame --follow works.
Subversion did this better even before Git existed.
Git is completely oblivious to moving code from one file to another. Git blame will never show you the original commit if you just relocated a method to another file. Because of this, refactoring often puts additional hurdles in the way of exploring the code history.
yep, nemo got it right. git basically hacks cherry-picks in the same way previous VCS’s hacked branches.
unfortunately there are no patch-theory based VCS’s with a practical level of usability. what git was to monotone, X is to darcs/pijul, where X hasn’t been created yet.
Merge conflicts are not so scary, and are an elegant way to handle distributed changes with simultaneous edits to a single file.
If the conflict is huge, rebasing can help you by "playing" the commits from one branch one at a time so the conflicts are smaller / easier to fix.
At a previous job, a team was forced to use checkout-style VCS due to their manager's unfounded fear of merge conflicts; I couldn't go in that office without hearing one developer shout to another: "Hey, can you finish up and check in that file so I can get started on my changes?"
I’ve spent too much time helping others fix bad merges, and I still catch myself making mistakes. There’s a lot of work that could be done for clarity and error avoidance.
The only way to make that simple is to centralize the version control system so that you can have a single arbiter of who has what locked. To add easy locking to Git, you'd have to turn it back into a non-distributed VCS.
I don't think I need to sell the value of DVCS over VCS, but what seems to get lost is that buys you a certain amount of essential complexity, expressed in the CAP theorem and its consequences.
I'm talking about purely advisory locks. Accessing a locked file would let you know who locked it, and if you unlock it, it would just notify them. So it's just a communications mechanism in addition to regular merges.
The alternate method is that locking a file marks you as an interested party to a merge, allowing you to review the correctness of a merge.
This would be purely to avoid changes being lost during merges.
However, in some cases (like non-mergeable binary files), it is actually better to have a system that allows one user to take a lock on a file and have exclusive editing rights. The git protocol has no support for those workflows, and so people end up using a Google Doc or something to track who is modifying what file. Definitely a place for improvement.
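(For what it's worth, git-lfs does ship an advisory locking feature these days, though it needs an LFS server that implements the locks API and people still have to opt into using it. A tiny wrapper might look something like this, with the helper names being mine:)

    import subprocess

    def lock(path):
        # Record on the LFS server that you're editing this file; a teammate
        # trying to lock the same path is told who already holds it.
        subprocess.run(["git", "lfs", "lock", path], check=True)

    def unlock(path):
        subprocess.run(["git", "lfs", "unlock", path], check=True)

    def current_locks():
        # List all held locks and their owners.
        return subprocess.run(["git", "lfs", "locks"],
                              capture_output=True, text=True, check=True).stdout

    # lock("art/logo.psd")  ...edit, commit, push...  unlock("art/logo.psd")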
This consideration is actually irrelevant to locking non-mergeable binary files. It doesn't matter what branch we're on or where the file is located, only that you and I both want to edit the logo. Eventually, either your version must be based on mine, or mine based on yours, since they will be merged.
So it's probably better not to have that file in Git, since it doesn't support the workflow around which Git is based.
It's actually right to store your design documents in Google Docs or a wiki and your code in Git, rather than everything in Git.
It is easy to have one filestore to rule them all and in the darkness bind them, but if you want to do different things with them, you have to do different things with them. I'm not sure that it's possible to unify text file and binary doc based workflows, but it seems we don't have to worry because users automatically use the best tool for the job and it's only hackers who tie themselves in knots trying to make git do everything.
I've worked in teams where developers seemed terrified of merge conflicts, to the point of telling each other not to edit a particular file, which seems absurd. Maybe I don't know any better, but they seem like part of life.
When at least one person is making changes that touch large parts of the file, it's very sensible to ask others to not touch it, if you don't want to spend hours merging it manually later.
I've read a couple of descriptions of the internals of git that don't seem to conflict with the design of SVN, so I'm not sure why you can't theoretically check out a single subtree. Either the documentation is too hand-wavy or some implementation details have blocked that possibility.
Can you explain more about that? I've worked with both and I feel like monorepo is kind of a pain, but I don't understand where Git fits into either directly. It seems like it just snapshots files.
It's only a problem at a really large scale. At the scale of Microsoft or Facebook, there are factors that make the use of a monorepo more efficient. At that big a scale, companies have enough resources to develop internal tooling to deal with the problem (e.g. the use of Mercurial at Facebook).
FWIW, in case of Microsoft at least, it's more a question of product size than company size. Microsoft doesn't use a single monorepo for everything, like Google (so far as I know) does - just look at http://github.com/microsoft/; and that's not even counting all the VSO repos! It uses product-specific monorepos for some large products.
It would be nice if you and everyone else stopped gatekeeping this problem.
We have 600 devs and face these problems. I can assure you we sure as hell don't have the resources spare to reroll git. We're way too busy rerolling everything else.
In this context "monorepo" means a huge repository with many many revisions. Git has several well documented deficiencies in this scenario. See [1] for microsoft's experiences with git.
When people talk about killer features missing in Git, there is more beyond the UX and mono/poly repo.
One thing is code review. There is no code review in Git.
What I expect in 2020 is that I should be able to specify reviewers for the commit (which I pick out of a list of people who can approve it). These people should be able to leave comments on the commit. I should be able to both respond to comments and modify the code before the commit gets checked in. The history of the comments and changes should be maintained.
There is nothing in Git that supports this flow in a natural way.
A replacement for (or evolution of) Git can be a tool that would support this code review flow from the get-go.
And yes, there are external tools for code review. But that all should be a part of version control.
I don't think this should be part of version control, simply because it is too tied to the environment and development practices, which may not be shared by the whole set of current and future developers of any given project.
The version control should keep the code history, not the paperwork history.
What I think you're looking for could, however, use git as a platform for that. That's what GitHub, GitLab and the likes do using a web interface and there's enough extensibility and power on git for command-line or desktop tools to do it.
The internals of git are very much akin to a filesystem, by the way, and the plumbing gives you more than enough access to use that in creative ways.
Off the top of my head, maybe a system like this could automatically generate tags for code reviews, use merges and branching for answering these reviews, and hooks for notifying interested parties about particular areas of code. All while the messages and reviews themselves travel on another data layer, which references git but does not mingle with it.
This separation (and even tag cleaning, for example) would be especially useful on huge distributed projects where a company or small team may have whatever development process it needs internally, sharing only the results without having to completely hide the code history.
I think the distinction you're drawing between code history and paperwork history is more arbitrary than you give it credit.
If all we cared about was code history, a super pure "version control" system would have one trunk, no branches, and a sequentially increasing version number with no commit messages or author information.
But if you can annotate an entire commit with a descriptive message, why not annotate a specific hunk of code changes (comments), or the discussion over those changes in a specific version?
Environments and development practices may change, but the code review that led to a particular code decision at some point in history is a valid representation of the context at that time in history.
Don't get me wrong, I'm not lacking these features in Git, and am happy to get them from other platforms like Github, but I think the comment you're responding to is astute that one can imagine a git replacement that incorporates code review functionality as a first-tier feature.
> I think the distinction you're drawing between code history and paperwork history is more arbitrary than you give it credit.
I don't agree. The responsibility of a version control system is to manage the changes made to the source code. The paperwork that goes with it is an entirely different, and separate, responsibility, just like ticketing systems and keeping track of tasks, epics, sprints, etc.
> If all we cared about was code history, a super pure "version control" system would have one trunk, no branches, and a sequentially increasing version number with no commit messages or author information.
That doesn't make sense because you're ignoring the fact that branches are used to host versions that are being developed independently at a specific moment in time, and merges are used to finally join contributions when they are ready to be added to the main branch. If you look at single branches and ignore the work being developed in any other branch, the history is as linear as you expected it to be.
> But if you can annotate an entire commit with a descriptive message
If you can annotate an entire commit with a message, why not allow annotating parts of a commit (file/span) with a message, and why not allow those messages to form threads?
At which point you get code review that is logged entirely in the history of the repo. Which is a very useful feature, and a huge value proposition of GitHub over raw git: git blame gives you the commit, but GitHub will also give you the pull request for that commit, and you can go and look at the comments there to understand why it was done the way it was.
You could actually do that today with the Git model if you want to, so, arguably, the question is, why doesn't your code review tool already put its history in git?
(Probably some combination of nobody asking for it, and in some cases, wanting to keep your code review process locked in to the tool.)
I kind of agree with grandparent, though. When the IDE, source control, code review and task tracking work together seamlessly, it's glorious. If your software says "it's not my job", it will be outcompeted by software that says "heck yeah it's my job".
I would like a workflow with all of the features of GitHub (code reviews, issue tracking, etc), but everything stored in Git.
Perhaps every project might have a branch called "issues" or "reviews", etc. Not sure the best setup but generally the more I can do in my code editors the faster I can work.
Perhaps commits should support key:value metadata? (I was about to say "tags", but that means something different here) That would let you support "reviewer:person@foo.bar" or whatever you want, without baking workflow assumptions into the VCS.
I disagree. Git handles this through the power to push/pull, and muddying that with a bunch of alternative baked-in user flows seems like a mistake. Third-party software can handle adding meta-information to pull requests.
> And yes, there are external tools for code review. But that all should be a part of version control.
There are many different code-review workflows, and for teams that do full-time pair-programming, code review happens in real-time as the code is written.
Trying to bake support for all of that into something that is also a good VCS sounds like a recipe for one of those clock-birdfeeder-machete-flashlight-massager tools that you used to see on the back pages of outdoor magazines in the 90s.
This is exactly the issue with it: its author(s) lament that git doesn't have all of these things integrated, but they are ignoring that this is exactly why Fossil cannot gain wider adoption.
It has opinions about how the team and the project should be run / managed / documented. It is not for the tool developer to have these opinions.
Code review comments really should live next to the code, and proper editor integration goes a really long way. Traditional review processes make “just fixing the code” have a very high inertia.
I think it doesn't exist because there is no demand for that to be part of the codes history. Nothing you've described sounds very useful after a month or so.
All this talk of Git-sympathetic code review tools and nobody has mentioned Gerrit, which seems at least somewhat close to what is being described. Each patchset of each review is its own Git ref and is often referenced in the final commit.
Separately, I know many including myself who would love for code review comments to more seamlessly be integrated into the code browsing experience.
People also seem to be conflating review comments associated with lines of code with an opinionated code review tool.
You can keep the banter with the code and the go/no-go decision separate, even external. But post mortems have worked better when someone realized that one of the team repeatedly calls out a class of errors that bite us later and they're being ignored. You have the ability to prevent this error. Wise up, or that person will rightfully decide that we are a bunch of clowns and leave.
That's the one major point where Fossil probably falls short: Extending a repository's capabilities would add overhead to every single one of them (and, potentially, to each commit's meta data).
Yes there is. You can block pushing directly to a branch of the repository and demand that things can only be merged through pull requests. You code review the pull requests.
This is very well supported by Atlassian's git tooling, for example. It does everything you mention. It's true that the comments don't become part of the git repository itself, but I'm not convinced they should be.
It is necessary but not sufficient for any new contender to do at least the following to have any chance of taking over:
- Interoperate with the major player(s), currently Git and in many places unfortunately still Subversion. svn2git probably did more for Git adoption than any other feature or tool, because it allowed a fairly painless transition without losing information.
- Solve at least one big problem with the current contenders. Git made it possible to run VCS without a separate server program and sped up VCS operations massively. Both of those were huge. Looking at the Fossil home page[1] it does have some features I personally have wanted in VCSes, such as integrated "bug tracking, wiki, forum, and technotes," but the devil is very much in the details of how that actually works (How easy is it to write your own custom frontend or add business-critical bug tracking fields, for example?), and it's not like we don't already have good bug trackers, wikis etc.
Just as a tangent, some issues (but not necessarily major or fundamental, depending on who you ask) with Git as it works right now:
- Does not use cryptographically secure hashes, and has no clear migration path to a different hashing mechanism.
- Git Annex is not yet built in.
- The command line is complex, including many niche subcommands and tons of rarely used options.
- The command line is inconsistent, such as `git rm` vs `git branch --delete` vs `git remote remove`.
- It is based on a less than ideal theoretical model of patches[2]. IMO this is the most exciting development in VCSes since Git.
Amending changes the hash of the commit. That doesn't matter if you haven't pushed it yet, but if you have, you can't push the change upstream without force-pushing.
And if I want to change a commit message further back in history (regardless of my own or of somebody else), it would branch off at that point from the upstream git history.
What I want is to be able to put multiple messages onto the same commit with different timestamps. Just as git versions files, it should also allow versioning commit messages.
Not necessarily. Fossil's `amend` command works by adding additional information to the repo that the web UI and commands like `fossil info` look at when building up information intended for direct consumption by the user.
In this way, you can edit commit messages, rename branches/tags, add/remove tags, etc. to historical check-ins without breaking the blockchain / Merkle tree of commits.
This allows Fossil to keep all of the historical information about what happened to a given file, commit, ticket, etc. while still allowing a coherent presentation of the current state of affairs to the user.
You can accomplish something like this by using git-notes (https://git-scm.com/docs/git-notes). You'd have to have your own tooling to read/write these in a convenient way of course.
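As a rough sketch of what that tooling could look like (the helper names and trailer format are made up, but `git notes append`/`show` are real subcommands), review comments could ride along in the repo like this:

    import subprocess

    def append_review(commit, reviewer, comment):
        # Attach a review comment to an existing commit without rewriting it.
        # Notes live under refs/notes/commits and can be pushed/fetched like
        # any other ref, so the review trail can travel with the repository.
        note = f"Reviewed-by: {reviewer}\n\n{comment}"
        subprocess.run(["git", "notes", "append", "-m", note, commit], check=True)

    def show_reviews(commit):
        result = subprocess.run(["git", "notes", "show", commit],
                                capture_output=True, text=True)
        return result.stdout  # empty if no notes are attached yet

    # append_review("a1b2c3d", "person@foo.bar", "LGTM, but rename the helper.")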
Yes but they are just a patch over the missing functionality.
The key point is "your own tooling". Git has a great tool universe but unfortunately you immediately lose many tools if you go beyond what the core offers.
Everyone has their own tastes and preferences, of course, and I respect that yours are different from mine. That said, I used and loved CVS and then SVN for years and didn't get why all the kids were fussing around with this new Git thing. I finally made myself try it for about a week. At the end of that experiment, I ported all my repos from SVN to Git and quickly set to purging all Subversion-related knowledge from my brain. There's literally nothing about SVN that I prefer to Git, other than its UI being a little more pleasant.
I actually LIKED having a central repository, which many of us still seem to prefer (i.e. GitHub, GitLab, Bitbucket). I switched to git mainly because my colleagues were all using it. I found it difficult to use, at first, because of my expectation of a central repo. Many years in, however, I see extreme value in having all your history locally. Specifically, never having to worry about a server crashing or your "host" going out of business.
Same. I didn't get what was better about a distributed system for small projects, until it fully clicked with me that my copy was just as "official" as any other sitting around, and that any "server" was a copy that we collectively decided was going to be the one of record. I can't imagine any plausible scenario in which I'd go back to having a central SVN-style server.
Having an official central server is precisely what you want in most development environments. Being able to go back to any prior version of the code is tremendously useful and something git can't do.
For most of us, even with DVCS there is one copy of the repos that is more important than all the rest - the one attached to the CI or deployment tools. It’s only decentralized as long as this copy has only short periods of unavailability. Hours are bad, days are much worse.
Linus seems to have a very different development model than almost all of the rest of us. And if Linus is asynchronous, then so is the codebase he maintains.
If you or I have eight different PRs going it’s because we have eight different promises being made to customers. Rejecting them has real consequences that we feel. For Linux many of those are externalities.
> I actually LIKED having a central repository, which many of us still seem to prefer (i.e. GitHub, GitLab, Bitbucket).
You're confusing a hosting service with being forced to use a centralized repository.
Take GitHub, for example. If git were centralized then you would not have forks, multiple remotes or multi-hosting, or even be able to work independently of the remote server.
With Git, you can even set up a repo on a network file system somewhere, or even a USB thumb drive.
Git makes it easier to do a fork, but you could certainly fork a Subversion repo. I'm fairly certain you could sort out a merge process across multiple upstreams.
The "fork" concept is not native to neither Subversion nor git.
You are probably thinking of branches and tags, and those are used similarly in both systems. They are a bit more convenient in git since they are created in constant as opposed to linear time.
Copies in Subversion are only metadata. Since partial checkouts are native to the system, it is simple to present both branches and tags as file paths. It was likely considered an easy user interface: everything is a file, and all that. In comparison, git users must learn the git object naming scheme, otherwise things can end up very confusing should you have a directory and a branch with the same name.
No, I was referring to forking projects, in the sense that Git allows for creating brand new and independent repositories that clone the version history up to a point in time and allow adding the origin Git repository as a remote.
SVN also does not support branching or tagging, as it actually supports only copying directories around a file system.
Having history available locally also means you can perform interesting operations on history -- like "git blame" -- without making the server do all the heavy lifting.
Right, especially after a "squash", which seems to be the standard way to merge branches in the companies I've been working with recently. (Which is, ironically, also the way Subversion merges branches. With the exception that "svn blame -g" will go into the commits which were squashed if you want. An option which doesn't exist after a "squash" on Git.)
Fossil's opinion on this is that history is an immutable record of project history. It may be messy and unfortunate at times, but it is what happened, and it shouldn't be altered in place any more than you'd do that with an accounts ledger.
If you are their superior, other users disregarding your orders is a social problem, not a technical one. If you aren't, it's a good thing they are able not to do what you want.
Tools being more flexible is strictly a good thing. If they are misused, the person that misused them is responsible. It is that simple.
> Which is a better choice than Git for most projects, to be honest.
That's pretty subjective. Most engineers I interact with personally know how to use Git but not SVN. I'd argue the opposite is true based on my subjective experience.
Really depends on where/how you're using versioning, team familiarity, etc.
It took me maybe eight months before I was willing to perform open heart surgery on a subversion repo. Two years to dare to try the same thing with git, and I was much more anxious about the whole thing. At the three year mark I have the same ambivalence about things working out that I did in less than a quarter of the time with the previous system.
That’s a little too much to explain away with variability.
With time, the standard requirements rise and we get more demanding.
How about a VCS that not only diffs line by line but saves the diffs as editing deltas?
Or that understands the syntax of the code and can differentiate a variable renaming from an actual code change?
Or connecting different repositories is hardly solved with Git. Submodules suck, grafting even more so. Or you can't make one repository out of two and keep their Git history intact.
I’ve been wondering if we need a slightly more general tool for tracking changes, that happens to have a VCS commandline as a first class client of that system. The problem is how to do bindings in many languages, or share one binding with some sort of IPC/RPC protocol.
I have some collaboration tools I’d like to write but creating my own edit histories and conflict resolution is daunting.
I am happy to use whatever everyone else starts using, provided it works at least as well. But I also do not have many problems that git won't solve, so I don't feel a burning need to switch. I think git is good enough that source control is no longer a very interesting problem.
Speaking personally, definitely no. Even before I heard of git, I did not like SVN. In particular, it made an error I don't hear many other people mention where directories were checked out to revisions, rather than entire repositories. Consequently it was very easy to accidentally update just a subset of your repository, and have your whole repository in a state that didn't exist in the version control system at all. There were several times there were builds that were completely unreproducible because it turned out that they were, technically, r83726, except this directory was r84713, but this other directory was r78372 except for a subdirectory that was r84299, etc.
That problem I saw even before I saw a replacement.
But once I saw git, having a full copy of the repo was pretty killer. It was also better at merging, partially for having the history local. In fact I recall some SVN people protesting at the time that merging wasn't so hard, but in a nutshell, they were just wrong. It was harder. Source control that can't merge fluidly is pretty limited from the get-go. There's literally entire dimensions of things we do with git that were so impractical with SVN that only the very largest projects could afford to do them, with the whole "workflow" question. Now you don't even hear about the difficulties of merging since all the open source SCMs copied each other and "good merging" is just table stakes now.
The problem is that git doesn't really have fatal flaws; it has annoyances, and that's not the same thing. "The UI is difficult" is an annoyance for its target audience. Most every other criticism I've seen of it is an annoyance, not a fatal flaw. Fossil doesn't solve a problem I have with git. It has some neat ideas and arguably does solve some problems, but it doesn't solve the problems I have with git; you could lift Fossil's solutions to wiki and bug tracking and just put them in Git and I'd probably be happier with that than with Fossil itself.
The problem isn't that Fossil is an SCM competing with git; the problem is that Fossil is a monolith competing with the rapidly-moving git ecosystem. The former is perhaps beatable, the latter is a juggernaut. All of Fossil's other features sitting in git is something that might get somewhere, but if the base source control isn't git-based, I can't sneak it into a work project to try it out. I've got a corporate mandate that all source belongs somewhere standard, and it's a perfectly sensible requirement for someone paying me to do a job.
I would check, double-check, and triple check whatever use case you miss that for. It is a serious misfeature, and I find any workflow that critically depends on it to be very likely a process smell.
Handling of blobs I can at least see preferring the differences, though git-lfs mostly acceptably shims that for me. The need to use git-lfs is definitely an annoyance, but it's another place where it's probably not a fatal flaw.
Or, to the extent that it arguably is, it would be when you're using git as something that it really, really isn't... it's a source code management tool. It's a good enough source code management tool that it's useful for many other things as well, like the way some people directly drive their personal websites with it. But it isn't a tool for managing lots of large files that can't be textually diffed, nor a generic "content" management tool. I wouldn't expect git to store terabytes of video files, but then, it's not like Fossil is going to do that either, from the sounds of it. git-lfs shims content blobs well enough to make it just an annoyance for source code management.
In the case you mentioned of "just a few commits" it's of marginal utility, because you can do that manually almost as quickly, but it's a killer when it comes to "I don't really know when we introduced this" and you have to sift over thousands of commits. Unfortunately, it requires that you have the discipline to keep every commit on the relevant branches buildable (enough so to find the bugs), which is something you can do going forward but can't retroactively apply to a code base very well.
None of the other big VCS had fatal flaws. If they had fatal flaws, they wouldn't have been used.
Sure, not being able to rename a file easily in CVS was quite an annoyance but nothing that made it completely useless.
There were always workarounds. But exactly those areas where workarounds or better tooling patch over stuff Git doesn't do are the areas that a possible successor will address.
"None of the other big VCS had fatal flaws. If they had fatal flaws, they wouldn't have been used."
That's just quibbling about definitions in fuzzy English. If you want to insist they weren't "fatally flawed" at the time, fine, but they certainly are relative to 2020 even by your definition.
You're really doubling down on applying deliberately hostile definitions to what I said, aren't you? I get to tell you what I meant. You don't get to pick the definitions I was using, especially after I told you explicitly that you picked the wrong one.
You didn't really define what you meant with "fatal flaws". You just claimed that git didn't have them.
But if your Subversion example is one of those fatal flaws, then I'll just link to this example of Git usage, which probably happens more often than we want to admit: https://xkcd.com/1597/
Using your tool wrong is not a fatal flaw of the tool.
I don't think there are many people that considered CVS good enough, even back in the days. For example, there was no way for end-users to retain the history of a file when renaming/copying it. When the FreeBSD project still used CVS, committers always had to file tickets against the admins to ask them to do it server-side through a so-called 'repocopy':
One big thing that DVCS brought to the table were local serverless repos. Now you could version all the things, with very little effort, and promote it to a "proper" repo later.
The other big thing was free cloud hosting for the repos.
Neither of these are unique to Git, obviously, or even introduced by it. It just happened to be the right combination of features and speed at the right time to become the winner of the popularity contest. Kinda like C.
And, just as C is still around, for all its numerous horrible quirks, Git will likely be around for a long time as well.
When I started developing, the options people were taking seriously were Mercurial and git. At first I was using Mercurial, but since everyone else was using git I switched to that. So those are the only ones I've seriously used.
These fell by the wayside as they never achieved a consensus of being better than SCCS. Now git has the weight of the Linux kernel behind it, which pretty much EOL'd all other source control mechanisms.
I don’t think Linux kernel matters that much - only a tiny fraction of the total number of developers use it.
From what I saw, it was git vs hg - which had different philosophies. Hg gave you nice, polished workflow for supported tasks. Git gave you building parts that you can make your own system from. It turned out that enough programmers wanted to build from parts.
Not sure what variant of SVN you were using that didn't support branching, everyone I knew used branches.
It did have problems with occasional inexplicable tree conflicts, which I do not miss. But that's the same as the pain of trying to rebase in git when things have diverged and it just starts spewing repeated conflicts for each and every commit. In either case, the easiest thing to do is make a fresh branch, patch your changes over, and go from there vs. trying to reintegrate the broken branch.
Subversion did support branching, but it was so fragile (especially before version 1.5 or 1.6, I don't remember which) that some went out of their way just to avoid using branches in SVN. Those dreaded tree conflicts…
I actually prefer SVN and HG style branches where you keep the branch history as a known formal branch, then have one merge back to Trunk.
It is unclear to me how far back you are saying SVN did not have branching.
I don't remember a time when Subversion didn't support branches. I'm sure Subversion 1.0 supported branches (although I can't find evidence of that right now, but I also can't find any release notes for Subversion 1.n containing "now we support branches!")
One can argue about what a branch is, for sure. Subversion branches, Mercurial branches and Git branches all have very different implementations.
But the main workflow of creating a feature branch, developing a feature, then merging it back into the trunk/master, has, as far as I know, always been supported in Subversion.
> I don't remember a time when Subversion didn't support branches.
SVN's official response to branching and tagging was to copy directories around in the repository. Arguably that means it does not support branching, at least according to the concept that has been in place for the last decade.
That's true that there was no command "svn branch", there was only the command "svn copy". It's true that command could have been better named, like many Git commands could have been better named.
However, don't let that confuse you. The "svn copy" command creates a new place where you can do independent development, and where afterwards merging back can occur. So it's the same as a branch by that definition.
An "svn copy" is a lightweight copy, with a reference back to the place the copy was made from.
- "svn merge" does its merge by looking at what commits have been made at the original location (since the copy), and the commits to the copy.
- "git merge" does its merge by looking at what commits have been made on the original branch (since the new branch was created) and the commits on the new branch.
So "svn merge" and "git merge" act in the same manner. (Of course there are differences in the algorithm, but I wish to refute the point that Subversion "does not support branching [at all]".)
And if you never use "svn merge" or "git merge", then a branch is just a copy, in either system.
> That's true that there was no command "svn branch", there was only the command "svn copy". It's true that command could have been better named, like many Git commands could have been better named.
It was not a naming issue. It was a bona fide lack of support for a basic feature. The manual itself states quite clearly that in SVN land you create a branch by copying the entire working directory around within your repository. SVN's manual is quite clear on how SVN actually tracks the state of a file system, not the state of a source code tree. By copying the working directory around the repository you're creating new revisions of your file system. That's it. Just because you can diff two directories, it does not mean copying a directory around in a file system is a branch. And nor is it a tag.
Considering CVS had branches, Subversion probably had them from the start or at least some early development version. Very different under the hood though, as you say.
Bazaar and Mercurial are dying...
Bazaar is dead, and Mercurial is near death.
Linus wrote the Linux kernel and it became the "de-facto standard" for web application stack servers (and phones, and watches, and Chromecast and all sorts of `Internet things`)
Linus wrote git and it became the "de-facto standard" for version control.
If the world had 23 more Linuses, we would have full control over our SaaS and "Clouds" and not be locked in as we are at the moment.
It's likely that there are many Linuses out there, but they are not in positions (third-world poverty, poor parenting, dictatorships) to achieve the same. That's why I think the biggest progress in tech/innovation will come from lifting as many people as possible out of those situations.
Facebook[1] and Google[2] have both publicly stated that they are using Mercurial internally, so "near death" seems like an exaggeration. Mercurial has some significant advantages over Git if you want to implement a scalable backend for really large repos.
Fossil's new(ish) semi-automatic bidirectional interaction with Git mirrors has finally made Mercurial replaceable for me. However, it is hard to ignore that the IT world has mostly decided to settle with whatever is the most commonly used right now, no matter if it is actually the best solution - so at least when it comes to DVCS, the war seems to be over.
Does not Fossil’s “no rebases” philosophy bother you?
IMHO, the ability to “commit early, commit often” and to squash/rebase original work later into clean and understandable commits is one of the best features of Git.
Right - and I think that points 6 and 7 are exactly why I use rebase daily, and why I will never use Fossil.
The version control history is made for reading by other people, so it is a story. After all, we only write the commit once, but people will read it many more times. (This is especially true if there are code reviews involved)
I am an imperfect programmer. My work in progress is often broken, and sometimes doesn't even compile. Sometimes I will refactor only the interface, and will want to checkpoint my work before I go and refactor the implementation as well. Sometimes I will choose a totally wrong approach and revert it later. Sometimes I will disable/break a large part of the system on purpose, to make testing easier. And I often make stupid data-destroying mistakes, so I want the ability to save/store all the past versions, even if they are completely broken.
My “raw” commits may look like: “start on feature X”, “refactor interface Y”, “more work on feature X”, “wip commit”, “fix tests”, “fix performance”, “fix more tests”. Does any future reader care that it took me 3 commits to get the tests right? Do they care that I discovered the need to refactor only when I was halfway into the feature X implementation? Do they want to see a repo that won’t even compile? Do they want to have to cherry-pick dozens of commits to get the tests to pass? I don’t think so.
The final version will only have two commits, “refactor” and “feature X”. It would be obvious to everyone which lines of code are associated with which change. Each revision will be buildable, and will pass all tests - so bisect will actually work.
(If rebase support is missing, it is possible to “fake” it by having multiple checkouts and manually copying files around. But this is much more error prone and dangerous. I have spent plenty of time with SVN/CVS, manually copying files and applying patches - and I can tell that having this integrated with version control is much more pleasant)
But this doesn't hold for things like languages; adoption of new tech over old (no matter how prominent the old is) happens all the time, but it usually starts at the small scale, like start-ups. It definitely takes a hell of a long time, and it would probably never usurp 100% of git's user base (or even 50%), but it would probably be used in some capacity. Of course Java's still as big as ever, but we have seen an uptick in relatively new languages coming into use (I think of things like Elm/Elixir).
There was always an "uptick in relatively new languages coming into use", as long as I remember. The problem is that most of them die before they mature, and some achieve a "peak cool" but then fade into obscurity, or find a relatively small niche. Very few new languages ever get popular enough to get into the top 10 on a permanent basis, and even fewer displace a large player that was previously popular. Meanwhile, most code running on any given computing device is still written in C, C++, or Java.
The article is basically arguing that we're likely to see the same state of affairs in VCS land. Sure, there will still be new ones coming into use, and fading out, or finding small niches - but Git will remain dominant for decades to come.
Exporting to Git: You set a "Git mirror" once, optionally with a remote URI, and call fossil's export routine. Example: I wrote a cronjob that runs once every night that does nothing but update my Git mirrors from the current Fossils.
cd $FOSSILDIR
for f in *; do
    # refresh the configured Git mirror for each Fossil repository
    /usr/local/bin/fossil git export -q -R $f
done
It feels that way, and it's heartbreaking that our entire species ended up locked into a tool with such a horrible interface.
In a way, the git monopoly is worse than Windows or x86 or IPv4, because it's not just a piece of technical infrastructure. Its arcane commands and its branching model have infected all of our brains. You can choose a different editor, you can choose a different operating system, but for as long as we all live, we will never escape the fact that "git reset" does a half a dozen confusingly different operations, or that renaming cannot be tracked, or that most users don't fully understand most of the commands they regularly use.
> 1. Metcalfe's original Ethernet has been replaced a bunch of times...
These replacements were seamless to users. New Ethernet adapters were compatible with at least the previous spec. The Git import/export of Fossil is not seamless at all. It actually adds quite a bit of complexity if you want to introduce it into your regular workflow.
> 2. Microsoft's long-term stalwarts Windows and Office are dying...
Citation needed.
> 3. Adobe's having a hard time hanging onto its old market...
Citation needed.
> 4. IPv4 still won't go away...
There is a lot of hardware out there that only works with IPv4. The costs and risks of switching your org's internals, product or services from IPv4 to IPv6 are phenomenally higher than switching your org from Git to Fossil or adding Fossil support to your product or service.
If Fossil is truly superior to Git and people are not switching to it, then there's no hope they'll switch to IPv6. Not until the cost of not switching is greater.
Wifi mostly replaced ethernet for user facing applications.
There was partial protocol compatibility between ethernet and wifi, but the user experience was very different. Some features were lost or degraded (eg speed, reliability, security, configuration complexity) but a pain point (cables) was fixed.
The problem with Git is that the software and API aren't separated. Like all modern software, it should have an API or interface to which all distributed source control engines comply, making the specific DVCS engine an implementation detail. Want the old-school one written in C by Linus? Fine. Want a revamped version of Mercurial that uses the same commands and creates the same repository format? Cool! Want to Show HN your ability to implement the same thing in Rust with lower memory use and better safety? Awesome!
But with Git this isn't possible because there's no concept of the API or interface for distributed version control being separate from the implementation -- and the discussion of "could the API be better than what it is now?" has never been had in a meaningful way.
Of course it has; the other implementation that comes to mind is JGit [1], a git implementation in Java. A lot of IDEs and build plugins use it for interfacing with git without the mess that is JNI.
Git has a low level plumbing layer and a high level porcelain layer. The design allows for writing alternative porcelain layers. I believe magit is an example of this.
For what it's worth, the linked page is itself running on Fossil, which of course has built-in forum facilities and several other suchlike goodies - being a complete solution, packed into a single smallish executable with no significant dependencies apart from SQLite. The Fossil site runs on very moderate hardware, and gets no hiccups from hitting the HN front page.
I like this kind of untroubled minimalism, and so far have never encountered a reason not to run every personal project on Fossil. The real world will occasionally force me onto Git territory, but I can't really say I have ever enjoyed the experience.
Git is here to stay for any foreseeable future, of course. And while I do understand points often made about the benefit of one de facto standard to rule them all, monolithic dominance always tends to unsettle me. My SE friends in general simply use 'Git' as a given synonym for 'version control'. And it does annoy and somewhat worry me that they've never even heard of Fossil until I roll out my sermon.
We use Fossil as well @work and we love it, although I don't think it's the best choice for huge codebases such as the FreeBSD ports tree, which I tried to import once to see how it scales and gave up after >2 GB and an hour of crunching. Maybe importing into nested repositories would work better?
I'm working on a git replacement, in a way. The thing that makes git powerful is that it's just text. Git is probably the most powerful thing for code as text. When code no longer is just text (and by "text" I mean bytes on disk, not that we're switching to coding with emoji or VR) you get to do more powerful stuff.
Our plan in Dark (https://darklang.com) is to combine all the different ways that people "branch" (deployment, feature flags, git branches, staging/dev/prod environments) into a single concept. And then we also plan to combine all the ways to "comment" (PRs, slack messages, commit message, code comments) into a single concept.
Not sure if you'd call that a git replacement, but it's a displacement of sorts - the function of git is replaced by non-git.
The second one is just an idea right now, suggestions welcome. The observation is that comments on a particular line of code are spread in as many as a dozen places (a google doc, slack, trello, the code itself, an old version of the same code, PRs on github, comments on commits on github, commit messages, another place in the codebase referencing this one, the docs folder in your repo, another repo that uses this API, your 3rdparty docs on README.io, etc). This is weird and bad, and it must be possible to do better.
For comments, fully agree that we don't need several isolated systems (source files, docs, commit messages, ...)
How about one system which is a hypermedia system and can hold many different kinds of objects? Annotation objects could then reference 'code objects' such as types, fields, functions or even blocks directly. Not sure if Dark gives each of these an identity (it should), which would make it possible to refer to them via hard links rather than text snippets. Once you have all code and comment objects in one hypermedia system, creating views from that is about multiple projections of the interconnected objects.
Wouldn't it be great if I can use a query to refer to 'all functions that reference this type' inside some docs? Or list all annotations that reference a function? These could be embedded inside annotations as well. Gtoolkit does something similar.
Perhaps even a 'branch' can be thought of as a subset of the hypermedia graph. E.g. using a code block X2 instead of a code block X1 within the same function.
It's not irreplaceable. But it's very well put together, and a great ecosystem of tooling exists around it. So any potential replacement would have to be a lot better for people to move.
And for those of us that remember the dark days of CVS and the pain of migration (I helped migrate drupal.org from CVS to Git), that is a good thing.
I don't miss the days where you had to make difficult SCM choices when starting a new project, deciding whether bzr, Mercurial or Git was the right choice.
Git has become the lingua franca of SCM, and that is great. So many great developer tools exist that integrate seamlessly because it is safe to assume that 99% of the audience for it will be using Git.
I really doubt Git is going anywhere for the foreseeable future, but I could imagine a more approachable VCS catching on. Git is extremely opaque to most new developers and even for experienced devs looking up a new command. Sometimes I look up how to perform an unfamiliar task with Git and find 4-5 competing answers on Stackoverflow with no real clear explanation of why one is better than another.
If an easier VCS caught on enough to be used in schools and boot camps, many younger devs would start with that one and just keep using it as they progress in their career. The new VCS wouldn't have to be better or more powerful, just easier to use for the basic stuff that makes up 95% of Git usage, in order to gain traction. That said, Git works perfectly fine and is so entrenched that I don't see it going anywhere.
Git's UI is unbelievably bad. However, it's practically impossible to avoid learning how to use it these days, so "much better UI than git" will never be a compelling selling point for an alternative VCS: almost its entire target market has already paid the cost of learning git.
That problem could be gotten around if there was some enormous pool of potential VCS users who aren't using VCS currently but would if there was a better one. I don't think there is.
> However, it's practically impossible to avoid learning how to use it these days, so "much better UI than git" will never be a compelling selling point for an alternative VCS
I think this is true if you think the target market for VCSs is professional software developers only. But there are other people writing code that might care less about the fact that Git is the standard for professional software development and don't want to pay the cost of learning or using Git: data scientists, scientists in general, new media artists, most high school students, etc. I've had a hard time selling Gitless (https://gitless.com) to undergrad CS students, but it is easy to sell to non-CS students that write code.
That said, GitHub is a big thing so any new VCS probably needs some story for Git-compatibility. Even tools that have built-in version control like Overleaf have some way of synchronizing with Git repos.
Thanks to Github, git is the standard for non-professional software development as well. My kids are in high school and they're using it.
Gitless looks great. It looks like you nailed all the main issues I have with git. If you ensure that people never have to use the regular git interface, maybe it'll take off. I hope so!
> Thanks to Github, git is the standard for non-professional software development as well. My kids are in high school and they're using it.
Yes, but maybe they care less about using Git compared to another VCS as long as the other VCS interacts with Git and they can put their repo on GitHub. The question is:
If you can use a VCS that is easier to learn/use than Git and that is compatible with Git so that you can put your repo on GitHub if you want to, would you use it? If no, why?
My guess is that most professional programmers would answer "No, because Git is an industry standard and I need to know Git to get a good job", while other people who write code but have no intention of becoming professional programmers are much more likely to answer "Sure, why not!".
If you ask your kids, I'd love to know what they said :)
The obvious pool of people who don't currently use VCS's is everybody who mostly deals with files that aren't plain text. Using git to collaborate on Photoshop documents, Word docs, Jupyter notebooks, videos, or any other non-plain-text format is a frustrating nightmare, and a VCS solution that provided revision tracking and collaboration for all those filetypes would open up a new set of users.
It doesn't have a UI. It's just a program that takes instructions and does what you tell it. If you want a nice fancy GUI for Git there are plenty of reasonable options. Fork is the one my coworkers seem to be enamoured with at this point in time. I myself don't see the need to use any sort of GUI for Git the vast majority of the time.
A better Git UI is possible, like Gitless. A GUI that just has the functionality of the Git CLI doesn't solve the problems I care about, like the staging area being unnecessary and complex.
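For readers who haven't hit it, here is a minimal illustration of the kind of staging-area surprise being referred to (the file name is made up):

    echo "version 1" > notes.txt
    git add notes.txt             # the index now holds "version 1"
    echo "version 2" > notes.txt  # edit again after staging
    git commit -m "update notes"  # commits "version 1" from the index,
                                  # not the "version 2" sitting on disk

A Gitless-style UI with no separate staging area would simply commit what is in the working tree.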
Bob Martin has claimed at various points in time that half of all developers have less than five years' experience, and that attrition and the expansion of CS degrees have maintained this.
Five years after introducing a replacement, you could have half a team that never used Git at all, just as half of today's teams never used SVN.
Yes. I've been programming for nearly 40 years and I only had to use Subversion a couple of times; I never really learned it. OTOH I have to use git for all kinds of projects. Learning git is essential in a way that SVN never was.
I think part of the reason is that SVN never took over from CVS as completely as git took over from everything else. I think another part of the reason is that the number, size and scope of open-source projects exploded since git and Github appeared, and they mostly chose git.
Also relevant: the only project that I recently interacted with that was using SVN is LLVM, and it was faster/easier to use the Github mirror than to use SVN.
I wonder how possible it is to retain the Git data model, but completely replace the CLI? Git gets some flak for its conceptual decisions (e.g. what "branch" means), but IMO most of the real-world friction comes from numerous little inconsistencies and general weirdness.
I would probably think the same if I were stuck in my Linux bubble, but $WORK allows me to observe the average computer worker. They're incredibly deep in the Microsoft ecosystem and not at all interested in switching.
SVN seemed irreplaceable not so long ago and CVS before that and RCS before that... When someone comes up with something demonstrably better and gets the right people to start using it, people will switch over.
SVN also had serious difficulties and performance limitations which git does not.
Git is somewhat confusing to use, but not enough so that anyone really cares all that much (besides a few people who really care) and that is not a recipe for easy replacement.
There were/are slightly less confusing version control systems (mercurial) but they didn't catch on for whatever reason.
I would love to see a merger of Pijul and Darcs again. Having two diverging patch-based approaches (one declining, one not lifting off) is not a good thing.
Git doesn't work very well when trying to version control things which aren't text and are large binary files. There's an acceptable workaround which is only a little awkward with git-lfs.
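For anyone who hasn't used it, the git-lfs workaround looks roughly like this (the tracked pattern and file path are made up):

    git lfs install                    # one-time hook setup per machine
    git lfs track "*.psd"              # route matching files through LFS
    git add .gitattributes             # the tracking rule itself is versioned
    git add art/character.psd          # stored in git as a small pointer;
    git commit -m "add character art"  # the real blob goes to the LFS server on push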
The general consensus is that putting big files in git means you're doing something wrong and the problem is with your environment not git. (or there are special-purpose tools for your kind of workflow which handle the specifics of your use case, like CAD/CAM/etc.)
Workflows like that generally don't fit into nice little boxes anyway the way source code management does.
> The general consensus is that putting big files in git means you're doing something wrong and the problem is with your environment not git.
This is not a “general” consensus. It’s a consensus among hardcore proponents of git. I love git, it has made my life better. I still think its large-file support story is shitty/suboptimal, and there are valid use cases where a general purpose VCS is used to track large binary assets alongside code, and git would do well to be a general purpose VCS. It’s a limitation of git. It’s not a fatal limitation, and git still has enough benefits (which include availability and mindshare), but it is still an unfortunate limitation and somewhat ironic for a tool born in a world where everything is just a “sequence of bytes”.
I mean, so is Unix, and it's still around after all these years. Heck, there's nothing especially obvious about a for loop, yet every language has one. Once everyone's accustomed to the weird interface they are just going to demand everything else have the same weird interface they're used to. This is one of the lamentations of the Unix-Haters Handbook.
If you’re talking about a “for (initialize; condition; step) { body }” for loop, most newer languages don’t have them AFAICT.
I would say the other kind of for loop, a “for x in y { body }” loop, is very intuitive and obvious to most English speakers. We use idioms like “for each A, do B” all the time.
Git is probably destined to be the Unix of our generation, the one born after the invention of Unics/Unix. It is not a coincidence that the designer/programmer of the most popular Unix version/clone of our generation is the same guy who designed and programmed Git.
I find it ironic that in the same post the author complains about a git monopoly ("Git effectively has a global monopoly on DVCSes") while wishing the same thing for fossil ("Fossil's world domination will not come from...").
"world domination", specifically, refers to a long-running joke about the corporate-sponsored capitalist competitors to FLOSS. See [0] for an example of the joke being used to title a talk and frame business decisions. [1] contains a history of the phrase in the Linux world. Note that Linux and git share the same original author/designer.
Git can easily be replaced in my company if there was business need for it. For now it does do its job, but there is no reason we wouldn't migrate if there was a better alternative. Right now there isn't one.
Perforce is still in use in content heavy industries because Git still struggles with large binaries. Something with git branch semantics that could handle those files would be huge.
Another feature of Perforce I liked was view specs. If you're dealing with massive repos (as, for example, games tend to have) you often only want some small portion of the entire repo. You may also want to alter the repo layout on your local machine, and a client view spec can allow you to slice and dice the content in many interesting ways.
This is also very useful for working with external contractors. Not only can a user specify their own view specs, there is admin control for client views that can make portions of the repo read-only or effectively invisible on a per-user basis.
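For those who haven't used Perforce, a client view spec is essentially a set of depot-to-workspace path mappings; roughly something like this (all paths invented for illustration):

    Client: contractor-ws
    View:
        //depot/game/code/...           //contractor-ws/code/...
        //depot/game/art/characters/... //contractor-ws/art/characters/...
        -//depot/game/art/source/...    //contractor-ws/art/source/...

The leading '-' on the last line excludes that subtree, which is the mechanism behind making parts of the depot effectively invisible to a given user.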
I really miss being able to check in chunks instead of full files, and streams are a poor replacement for git branches, but Perforce does let you do a monorepo. P4V is a buggy mess though.
I find it amusing that the author mentions Git and Excel in the same breath. One is a GPLv2 piece of software that you can submit patches to, fork, or build your own copy and sell it (so long as you release the source); while the other is a closed-source piece of software that provides largely static utility to many people, yet keeps increasing in price (latest is ~$100/year for an "Office 365" subscription if I remember correctly).
Imagine if you had to pay $100/year to use Git. I would probably hand it over, begrudgingly. But the fact that I don't is probably one of the more important triumphs of Free Software over proprietary in recent years.
Does Fossil do sub-modules correctly? Specifically, does it let you compose a project of several repositories and do configuration management on that composition? That is something git kinda sorta does, but it gets out of control easily.
Fossil's intended use case is different from Git's; it was originally built as the VCS for SQLite. In terms of its target audience, it's basically GitHub-in-a-box for small or medium-sized teams. Implementation/usability-wise, it does some things better than Git, does other things worse, and yet others simply different (in a way that some may find better, some may find worse).
I like the idea of storing commit history in a database. Why write all of the bits from scratch?
But I don’t like how difficult databases make it to represent graphs of data. To pull a subtree cheaply, you need a graph, not unlike SVN’s data structure.
It is something that I prefer to have in a VCS. At one end of the spectrum you treat a repository as an object which a containing repository can subclass with specific changes. Such a system would allow things like basic driver frameworks for a bunch of devices to live in their own repository, while the 'critical bits' that make them work in a system of type 'x' would end up being that repository + changes. You can kind of do this with git, where you create a branch for each system x, y, and z in the repository and then your submodule is driver repo branch x, but that doesn't put the maintenance burden in the right place (the system X maintainer).
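You can get partway there with git today; a rough sketch (the URL, branch and path names are invented for illustration):

    # approximating this with git: each system keeps a branch in the driver repo
    git submodule add -b system-x https://example.com/drivers.git drivers
    git submodule update --init --recursive
    # the system-x branch has to live (and be maintained) in the driver repo itself,
    # which is the misplaced maintenance burden described above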
I wish there was a version control system that didn't need user input beyond a "push". Only branches and pull requests, nothing more. No commit messages, you have one master branch and however many sub-branches you need.
Whatever code is in your branch is what it is. And then there's a layer of magic on top of it all, maybe a UI or command line tool or both, just to be able to easily rewind time. Either per file, per line of code, or folder, or just a folder but not recursive, or the entire branch.
Think of Apple's Time Machine, it should be that simple.
Honestly, I rarely, if ever, read commit messages to begin with. The way I navigate older code is never by searching commit messages. That's not reliable.
Instead I simply go to a point in time where I think the thing I'm looking for might be. I'll look at the code, recognise its state, and continue the search up or down the timeline.
And that would cover the needs of most projects I'd say. And it would save us a shitload of time.
Hell, I've been working with git for almost a decade now. I never needed to rebase or merge things until recently, when I finally did. It's too arcane to make intuitive sense, to me anyway.
Git is good, I'd just welcome a breath of fresh air...
Do you work on a team? Or in a job where you need an audit trail for your work? Commit messages and many other features of Git, like branching and tags, are indispensable for sharing and collaborating on code.
Flip your questions around. Might someone who isn't on a team, and doesn't need an audit trail, want something simpler?
Versioned file systems, including Apple's Time Machine and Dropbox's version history, have no extra UI to save versions. A small, short-lived project with at most a few collaborators (e.g., working on a small scientific paper) might find those more useful than git or another VCS.
I would have to agree with this. I have been programming for decades and I haven't had to work on a team for a project. "Branches" have not happened, nor have I ever felt the lack of them.
I would not mind something simple. I find the arguments, flags, whatever in git to be rather opaque. That, coupled with the lack of need for its features, have not thrilled me.
It seems to be pretty effective if you have tons of programmers working on a single project, but at the bottom end of the scale, I find it baffling.
I particularly enjoy IntelliJ's local history, which will go so far as to tell me whether the tests were passing or failing (and how many failed) on a specific file at a specific point in time.
But Git isn't great for audit trails, given that you can go back and alter previous commits, in contrast to pretty much any other VCS, including Subversion and Fossil.
(I know, technically, altering commits creates new commits in Git, but it comes to the same thing.)
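As a small illustration of that point, nothing stops a committer with force-push rights from doing this on a shared branch:

    git commit --amend -m "innocuous message"  # replaces the tip commit outright
    git push --force-with-lease                # the old commit is gone from the published branch
    # the original commit survives locally in the reflog until garbage-collected,
    # but anyone fetching now sees only the rewritten history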
Eventually git will be replaced by something better. It might not be replaced everywhere, but eventually it will be replaced in most places. This is true of almost all things in tech.
I expect that git's eventual replacement will initially boast compatibility with existing git repos.
Taking the question literally, obviously not. If git became a problem due to some unforeseen licensing issue or whatever, mercurial does the job just fine right now and has for years.
As a heads up, if you're stuck on cvs, svn or some other stupid VCS due to "old-codgers" in your office, mercurial has a shallower learning curve and an easier UI to get the same job done, which may make it easier to switch.
Either git or mercurial, who cares? Either of them until there's something better. Never deal with SVN & CVS branches and merges again. Feel the immediate team productivity boost which will pay for the initial learning curve costs by day 2. Seriously.
Unfortunately, you have to settle on some workflow for that. And this settling will take some experience and discussion. So "day 2" sounds very overambitious to me.
Use the existing svn/cvs workflow with the trivial changes for git/mercurial. Central repo, push to master when you're done. Branches are all but unusable on cvs/svn anyway - if you do use them, feel the win and laugh with glee. When everyone is on top of that basic workflow, then you do something incrementally better. Really it's day 2, really. I've seen this more than once now. The fight to get to that point where the changeover happens is getting easier, but I'm sure it still sucks.
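Concretely, the "day 2" workflow being described is just the old centralized routine spelled with new commands (the repo URL is made up):

    git clone https://example.com/central.git
    # ...edit files...
    git commit -am "describe the change"
    git pull --rebase         # pick up teammates' work, roughly svn update
    git push origin master    # publish it, roughly svn commit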
GitHub is what mainstreamed Git to what it is today. Who knows what our industry would look like (with regards to source control) if GitHub never existed.
Probably still SVN and Team Foundation Server (shudder).
If you take a look at Subversion vs. Git's interest on Google [1] and you agree it has some correspondence to the technology adoption lifecycle [2] ... Git's got a long way to go.
Of course Git is replaceable. When a new tool that provides a substantial benefit over using Git comes around, it will be replaced.
If that never happens, then it's a pretty strong indication that Git is working perfectly fine -- so why worry about whether or not you can replace it?
Absolutely not. Give me something that can more elegantly handle sub-projects and sub-repositories but still works as well as Git, and I'll stop using Git today.
fossil is a really interesting project, but the switching cost from git is pretty massive. The two killer issues for me:
* community/history: any issue I have with git, someone else probably has had. Being a pioneer is very expensive, time wise.
* ecosystem/tooling: aside from git host provided tooling, there're integrations for vscode, slack, CI, etc.
One major fundamental issue with git that nobody has brought up is that its data model is close-to-incompatible with some of the current legal and moral requirements around data privacy today. For example, as far as I can tell, GDPR allows any European citizen who has ever committed to the Linux kernel to request that their name be permanently expunged from their contributions, and everyone with a git clone of the kernel is legally obligated to perform a rebase to do so. It's possible to make a DVCS where that is an easy operation, but git very much isn't it. That means it can't be used to store personal data, and even standard corporate policies of "we delete old historical records so they don't bite us in court" aren't supported.
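To make the cost concrete: with today's tooling, expunging an author means rewriting history with something like git-filter-repo, and every clone then has to adopt the rewrite (the mailmap file name is hypothetical):

    git filter-repo --mailmap expunge.mailmap  # rewrites author info on every affected commit
    # every descendant commit gets a new hash, so all forks, clones and open
    # branches have to re-fetch and rebase onto the rewritten history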
There are definitely times you want to store the full history forever, but it would be nice to have a DVCS that gave other options.
The legal requirements in the EU are at odds with the moral requirement. Changing or erasing history in Git is an anti-feature. Checked in secrets should be assumed to be compromised and changed immediately. Removing contributions should be treated as any other form of censorship.
Those who forget (or rebase away) history are doomed to repeat it.
I'll believe it when I see it and it's an actual improvement. It's not like "cloud" is some sort of Platonic embodiment of goodness and anything that is "cloudier" than an alternative is automatically better. All the attempts to put code in "databases" I've ever seen make easy things sorta easy, make medium things way harder, and you can just forget about hard things; if I can't do the equivalent of `perl -pi -e 's/OldVariable/NewVariable/g'` then it just isn't going to displace a file system for me. Not because I need that exact command or perl, but for what it represents: the ability to just do things to my code as needed, without needing to wait for someone to implement an API or sluggishly do it over a network. Another example would be just throwing a code analysis tool at my code without needing some sort of permission from my "cloud code provider".
File systems haven't been disappearing, they've just been getting hidden from users. I have seen exceedingly little evidence that anything beyond merely hiding them is actually happening in the industry. "Stick all the files in databases" seems to be seeing no penetration beyond music collections, which at this point, I think we can call a "mere use case" rather than the vanguard of a revolution, since it's a good 10-15 years old minimum.
I agree with the first half of the statement: we indeed do not have a cloud-first version control system. It could bring some real benefits from tight integration with continuous builds and other things considered essential nowadays.
I don't see files going anywhere. I would like to hear examples of "code-in-database is already halfway here", if you happen to have them.
Low-code is suddenly going to be a big thing in the next couple of years, and when it happens there will be 750M more "coders" - and those types of people are not interested in git pull --rebase --fucked --whatdidido
The git internals are good for some kinds of projects, but for the actual majority of projects git is a very bad fit. Most people don't get that because they are blind to much better tools like mercurial and fossil.
The git UI/UX, on the other hand, is the worst piece of crapware known to man. This piece of shit has probably destroyed more data than any other tool ever written. People who think that git is any good are obviously wrong and can easily be proven wrong. It's sad that git has become the default.