1) Difficult to track changes to the code I'm interested in. Every day there are hundreds of changes in the repo and almost all of them have nothing to do with what I'm working on.
2) all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
3) Frequently have to update the world at once. Unless the repo can store multiple versions of the same module, then all the consumers have to be updated at once, even if it's inconvenient. Sometimes migrations are better done gradually.
4) Encourages sloppy dependency management. There are frequently unclear boundaries between software layers.
I'm sure people will say "if you're having those problems, you're doing it wrong" but the same thing could be said to people who find the distributed model problematic.
Note that even where Google are forced to use git (e.g. Android, Chrome) they use a many-repo approach.
Everywhere I've seen a monorepo, the monorepo was better than multi-repo.
They all built special tooling and have dedicated teams to support it.
The only downside is that it's not open-source, and as a result has a much smaller community. It's free for up to 5 users, then "email us" for any more. But if a very flexible VCS model is something you need, it's the same as anything else you need to pay for.
Google used to use Perforce until they hit a certain scale, so it's likely it'll work for you until you hit that scale and can build your own tools too.
- it requires a certain discipline: we need branching in our workflow and this is handled mostly by convention in a subversion repository. We have "branches" that were created by less careful colleagues by copying subdirectories of trunk to the branches folder.
- all the tooling developers fled to work on making git bearable. It seems that there is good money in sugarcoating git and none in making good tools for Subversion (awareness of branches in Jenkins, decent code review...). We have a budget, but that does not compensate for the lead that git has in that regard.
Other than that, subversion fits our needs. It just works.
Subversion was adopted by basically every open source project as a replacement for CVS, which had previously been the most widely used system.
Subversion was better than CVS, but still bad in many aspects; slow synchronization and poor branching and merging support come to mind.
Because of these shortcomings, and because the idea of decentralized versioning was emerging, many systems like git, Mercurial, and others appeared, and git seems to be the most successful of these by now.
Should have: Access control to view and change files on a subdirectory basis. Everyone can see the repo, so you can't permission users per repo anymore. It's optional, but these companies have that.
Recommended: Global search tools, global refactoring tools, global linting that can identify file types automatically and apply sane rules, unit test checks and on commit checks available out of the box for everything and that run remotely quickly, etc...
It's regular tooling that every development company should have, but only big companies with mono repos have it.
It's not that the tooling is needed to deal with the mono repo, it's that the tools are great and you want them. But they can't be implemented in a multi repo setup.
Think about it. How could you have a global search tool in a multi-repo setup? Most likely, you can't even identify what repos exist inside the company.
Makes me realize. If I ever go back to another tech company, the shit tooling is gonna make me cry.
Global refactoring seems a lot less necessary if you have clean separation among your processes. Maybe this is me coming from a more microservices perspective, but I'm inclined to say that needing to do a refactor that cuts across several different functional areas is a sign that things are becoming hopelessly snarled together.
Or if they've found some bad pattern, they would pull it too. I do remember when a certain Java hash map was replaced, and they replaced it across the codebase. It broke some tests (that were relying on specific orders, and that was wrong) - and people quickly jumped in and fixed them.
This level of coordination is great. And it's not just "let's do it today" - things are prepared in advance: days, weeks, months, and years if it has to be. With careful rollout plans, getting everyone aware, helping anyone get to their goal, etc.
It's also easy to establish code style guides and remove the bikeshedding over tabs/spaces, brace styles, switch/case statement styles, etc. Once a tool has been written to reformat (either the IDE, or other means), and another to check style and some semantics, then people - like it or not - soon get on that style and keep going. There are more important things to discuss.
Regarding global refactorings, think new language features or library versions.
Support for punctuation in search is something we knew wasn't ideal when we first added code search. As with all software, there were some technical constraints that made it hard to do.
We plan to have support for full stops and underscores in a future version and are exploring how best to handle more over the longer term. Our focus, based on feedback, is on "joining" punctuation characters to better allow searching for tokens. Support for a full range of characters threatens to blow out index sizes, but if we get more feedback on specific use cases we're always happy to consider them.
Being a self-hosted product we have to make tradeoffs for the thousands of people operating (scaling, upgrading, configuring, troubleshooting...) instances. In short, we try to keep the system architecture fairly simple using available technology and keeping the broad skillsets of admins in mind.
It was a somewhat difficult call to add Elasticsearch for its broad search capability, but its use for other purposes helped justify it. Adding Hound or similar services that were considered would have added more administrative complexity and wouldn't have provided for a broader range of search needs.
We continue to iterate on search, making it better over time.
I mean, when you get big, sure. But until you're big, git is fine. Working at fb, I don't use some crazy invocation to replace `hg log -- ./subdir`, I just do `hg log -- ./subdir`. Sparse checkouts are useful, but their necessity is based on your scale - the bigger you are, the more you need them. Most companies aren't big enough to need them.
> Should have: Access control to view and change files on a subdirectory basis. Everyone can see the repo, so you can't permission users per repo anymore. It's optional, but these companies have that.
Depends on your culture (and regulatory requirements). I prefer companies where anyone can modify anyone's code.
> Recommended: Global search tools, global refactoring tools, global linting that can identify file types automatically and apply sane rules, unit test checks and on commit checks available out of the box for everything and that run remotely quickly, etc...
I'd bump this up to `should have`. The power of a monorepo is being able to modify a lib that is used by everyone in the company, and have all of the dependencies recursively tested. Global search is required, but until you're big, ripgrep will probably be fine (and after that you just dump it into elasticsearch).
This is still true at Google, except for some very sensitive things. However, every directory is covered by an OWNERS file (specific or parent) that governs who needs to sign off on changes. If I’m an owner, I just need any one other engineer to review the code. If I’m not, I specifically need someone that owns the code. IMHO, this is extremely permissive and the bare minimum any engineering organization should have. No hot-rodding code in alone without giving someone the chance to veto.
Having something understand syntax when indexing makes these tools feel blunt. SourceGraph is making a good run at this problem.
Agree that any small to medium company could have a mono repo without special tooling. Yet they don't.
There are companies that care about development and there is the rest of the world.
Unless designed to search source code, most search tools will be lacking.
The startup I'm working for now is roughly half ex-googlers, so it is a different story. Of course we can't afford Google level infrastructure, but there is at least a strong cultural value around internal tooling, and a belief that issues with repetitive or error-prone tasks are problems with systems, not the people trying to use them.
Build systems, release systems, integration tests, etc. - everything works more easily, as you refer to things just by global, path-like names.
Blaze helps a lot - one language for linking protobufs, java, c++, python, etc., etc., etc.
Lately docs are going in it too, with renderers.
Best features I've seen: code search lets you jump by clicking on any reference, lets you "debug" things running directly on servers, lets you link specific versions, check history, changes, diffs.
GitHub is very far from this, if for no other reason than that it naturally can't even know how things are linked. Even if github.com/someone/somelibrary is used by github.com/someone-else/sometool, GitHub would not know how things are connected - is it CMake, Makefiles, .sln, .vcxproj? It may be able to guess, but that would be lies in the end... Not the case at Google - you can browse things better than in your IDE, as you can't even produce this information for your IDE (a process that runs every few hours updates it, and uses huge MapReduce jobs to do that).
Then local client spaces - I can just create a dir, open a space there, and virtually everything is visible from it (the whole monolithic depot) + my changes. There are also a couple of other ways to do it (a git-like include), but I haven't explored those.
What's missing? I dunno... I guess the whole overwhelming thing that such a beast exists, and it's already tamed by thousands of SREs, SWEs, managers, and just the most awesome folks.
I certainly miss the feeling of it all, back to good ole p4, but the awesome company that I'm in also realized that single depot is the way to go (with perforce that is). We also do have git, but our main business is game development, so huge .tiff, model files, etc. files require it.
Also, ReviewBoard and now Swarm (the p4 web interface and review system) are nice so far. Not as advanced as what Google had internally for review (no, it's not Gerrit, I still can't get around that thing), but it's getting there.
One last point - a monotonically incrementing changelist number will always be easier to work with than random SHAs with no ordering - you can build whole systems of feature toggles, experiments, and build verifications around it, like:
This feature is present if built with CL > 12345, or if it has cherrypicks of CL 12340 and CL 12300. You may come up with ways to do this with SHAs too, but imagine what your configuration would look like. It's also easier to explain to non-eng people - it's just a version number.
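As a rough illustration of that kind of gating (every name and number here is made up, not anyone's real system), the check can literally be an integer comparison plus an explicit cherry-pick list:

    # Hypothetical gate matching the rule above: the feature is on if the
    # binary was built after a cutoff CL, or if the named fixes were
    # cherry-picked into an older build. All numbers are illustrative.
    BUILD_CL = 12410              # changelist the binary was built at
    CHERRYPICKS = {12340, 12300}  # CLs back-ported into this build

    def feature_available(cutoff_cl, required_cherrypicks=frozenset()):
        if BUILD_CL > cutoff_cl:
            return True
        return bool(required_cherrypicks) and set(required_cherrypicks) <= CHERRYPICKS

    if feature_available(12345, {12340, 12300}):
        print("new feature enabled")

Try writing the same rule against unordered SHAs and you end up shipping an ever-growing allowlist of hashes instead of a single comparison.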
From my time at Google the first thing that came to mind was citc. But I couldn't remember if citc was publicly known, so I did an Internet search for "google citc". The first search result was this article.
"CitC supports code browsing and normal Unix tools with no need to clone or sync state locally."
"I work at Facebook, and can confirm we keep all code in a monorepo".
Google uses a many-repo approach for Android and Chrome because you cannot fit everything in a single git repo (well, you can, but it will be a pain in the ass to work on that repo). Git is just not designed for huge repos. Google is also working on tools to make the many repos of Android or Chrome work like a monorepo.
To upgrade to format G2 we must change both software A and B.
First, software B version 2 must accept both G1 and G2. To do this we may need to build software A version 2 and try them in a sandbox environment to gain confidence that ∀F1 we produce the correct G2. If F1 is complete, we may be able to do this exhaustively, but if F1 is sufficiently diverse, Monte Carlo simulation might be used.
Then, if there's a 1:1 relationship between A/B we can upgrade pairs.
If there's an N:M relationship, we need to upgrade all of the instances of software version B1 to B2 (at least within a shard). If you're running in a non-stop environment, this might have its own challenges. Only then can we begin the upgrade from A1 to A2.
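For the first step - B version 2 accepting both G1 and G2 - a minimal sketch might dispatch on the payload's declared format (the version marker and field names here are hypothetical, just to show the shape of the migration window):

    # Sketch of software B, version 2: reads either the old (G1) or new (G2)
    # payload during the migration window. Field names are invented.
    def read_g(payload: dict) -> dict:
        if payload.get("format_version", 1) >= 2:
            return {"id": payload["id"], "amount_cents": payload["amount_cents"]}
        # G1 carried a float amount; normalise it into the G2 shape.
        return {"id": payload["id"], "amount_cents": round(payload["amount"] * 100)}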
Something, somewhere needs to record what and where we are in this journey. It is relatively straightforward how to do this with a monorepo, but it is very unclear how to do it with a distributed repository:
Almost everyone I know punts and uses some other golden record (like a continuous integration server, or a ticketing system, or an admin/staging system), and like it or not: that's your monorepo.
If you’re doing the multirepo strategy it’s best imho to make the projects truly independent, as if they were developed by different companies. That way every project only needs to think about its own dependencies and consumers, and how to do migrations, without needing to have the big picture mapped out.
This can be impractical if G is a database table that is very large.
> it’s best imho to make the projects truly independent, as if they were developed by different companies.
One of our systems might cost £300k, so completely desynchronising them so that code paths can build both G1 and G2 simultaneously (allowing B to develop separately) means "simply" doubling the costs. That might put our team at a disadvantage against someone who figures out another way.
If this is true, you have no choice, and must run things side by side while you convert to G2. Or shut everything down to make the migration atomic, which is increasingly not an option.
It depends on your database.
If you imagine a simple postgres or mysql server and an "alter table G..." then you're right.
If (however) G1⊂G2 then a document store or a column-based database can usually partition the table somehow.
Unless the new format G2 is a superset of the old format G1. Read the protobuf best practices to learn more about this.
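That's the protobuf-style argument: if G2 only adds optional fields to G1, readers never need a lock-step upgrade, because unknown fields are ignored and missing ones get defaults. A toy sketch (field names invented):

    # If G2 ⊇ G1 (new optional fields only), an old reader ignores the extra
    # fields in G2 data, and a new reader defaults the fields missing from G1 data.
    def read_record(payload: dict) -> dict:
        return {
            "id": payload["id"],                         # present in both G1 and G2
            "currency": payload.get("currency", "USD"),  # new in G2, defaulted for G1 data
        }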
If A and B have nothing to do with each other - other than that for some circumstantial reason they consume data from each other - then why would we care if A or B starts to support a new output format?
If we want to do a format change for some reason, maybe because it'll allow better security/traceability, then sure, make a project and track the tasks (like: make A able to produce/consume the new format, make B able to produce/consume the new format, deploy A2 and B2 to the test environment, promote to prod), but I don't see why you would track that at the source code versioning level.
A and B have separate tests to ascertain that they can deal with the new format, and then you do the integration testing, which might catch problems that should then be covered by unit tests in A or B. (Or in a fuzzer for said format.)
> If A and B have nothing to do with each other - other than that for some circumstantial reason they consume data from each other - then why would we care if A or B starts to support a new output format?
First, even if the coordination between A and B is recorded in the ticketing system, the coordination between F and G is probably not.
> I don't see why you would track that at the source code versioning level.
Pretend F and G are tables in a database (or other data storage system) if that makes it easier.
Where is the schema stored? Who records the migration path?
Many people like to record migrations in a version control system, but it is tricky to link those migrations to the (otherwise) independent A and B.
If these are file formats, where does the code to consume and produce them live? Or network formats? The problem remains the same -- do we break this up into additional libraries?
There's a very real ordering between the releases of A and B that isn't properly encoded; we're relying on process diligence (as opposed to tooling) to be correct.
If they are independent tables, then I don't care, show me the API between the projects.
If these are file/network/serialization/wire/in-memory/binary/codec formats, then there are conformance checkers (passive and active, like fuzzers). Those are separate projects, but they can be used like tools during testing and development.
Rely on tooling to make sure that the stated goal of the project is reached. (It now supports F or G or X,Y,Z formats. It supports output-format G by processing input-format F. If that's a project requirement, test it in that project.)
You can use a top-level repo for the integration tests. But there's no need to make it one flat repo.
Lock-stepping two otherwise unrelated applications because they both share support for a data structure is silly at best, and often impractical, especially if development for only one of the projects is "in-house". Consider the possibility that "A" is a commercial product produced by another company.
Anyway, in my experience most software upgrades don't involve a schema change, so it's worth optimising for the common case while still supporting the difficult case.
Versioning through branching and tagging, while having some drawbacks - at least the fact that you have to DO an operation and that this is not automatic - seems to solve this problem, and is not, in my eyes, a form of monorepo. Overall you get more flexibility at the cost of a bit more repo-management work.
If the problem is retrieving the right version automatically, externals or submodules should be able to solve this problem. If A and B have no clear dependency direction, a top level repo might help.
This is the way I generally do it: A repository that represents my system/environment that has submodules for A1, A2, B1, and B2, and scripts for updating the environment.
Git only allows you to check out/commit/view the entire repo at once. Also, some git operations are superlinear in the number of files or revisions; they are slow on a large repo to the point of being unusable.
It's mandatory to have operations at a per-file or per-subdirectory level in a monorepo approach. Companies that have monorepos all built tooling to support it. CVS/SVN used to do that out of the box, but everyone hates them now.
It's "CVS" and it lacks a concept for a repository-wide version (except, maybe, a timestamp). A repository-wide version is –I guess– the single best reason to have monorepo in the first place.
SVN is okay.
Also, `git log -- subdir/`.
Building a monorepo out of submodules should solve these problems, or not?
Also, there is subtree.
Once you've done that, they are great!
In general / light use, yeah, they're great. Unfortunately, they have a very large number of edge cases where they essentially require either a) everyone to be experts in the edge cases, or b) tons of new tooling (because existing tools won't take these steps for you).
I like submodules, but the whole feature needs a lot more polish before it's widely adopted.
Wouldn't the right tooling be able to show you changes to the slice of code you're interested in? I remember SVN would allow you to checkout just a single subdirectory, for example.
> all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
Makes sense about the pulling, but again, wouldn't grepping be configurable to only search where you need to?
> Frequently have to update the world at once. Unless the repo can store multiple versions of the same module, then all the consumers have to be updated at once, even if it's inconvenient. Sometimes migrations are better done gradually.
I'm not sure I understood you here, can you expand on that? Do you mean all the devs have to update their module? Why not use tagged/branched version of libs instead of working off trunk?
What kind of tooling do you use for the distributed approach?
The only solution I can think of is to create a copy within the monorepo to create de facto branches without regular VCS support. This would be kind of terrible.
I still do not see the appeal of monorepos, since you're heavily dependent on discipline to not introduce spaghetti dependencies, where you fix one bug but introduce 4 new ones in unrelated parts of the code. Then you solve that with an if statement, and thus introduce a great deal of technical debt.
I’ve never worked in a monorepo, so may be wrong, but this point presumes a dependency on “latest” at all times. I’d assume the components in the mono repo still release versions to the various package management systems (maven, pypi, npm, etc), allowing dependencies to be more stable.
Has that not been the common experience of those who have worked in them? I see a lot of merit in having to update everything at once (less code rot, hopefully), but it does seem to have drawbacks (many have commented on these as well).
While at Google, we used that kind of development for the project I was on. Someone would push source code changes for new features, but preferably behind a flag (normally a command-line flag, driven by a configuration, like the one ksonnet has). The configuration file would say: enable this flag only if the binary was compiled with this CL version, and/or these cherrypicks, or some other rule.
This also allows a feature to be quickly disabled by SRE, SWE, or other personnel if it's found to be not working well.
> 1) Difficult to track changes to the code I'm interested in.
What's wrong with 'git log $PATH'?
> 2) all sorts of operations take longer (pulling, grepping source, etc.) to support code I couldn't care less about.
A different format could help here, as can different tools (e.g. ripgrep or ag instead of grep). The time spent on those operations has to be balanced with the time spent updating your code to deal with someone else's incompatible library changes, again, when the other person is on vacation and you have no idea what the new philosophy of his library is. And you don't have any choice about updating, because another one of your dependencies that you really must update has already been updated to rely on his changes.
> 3) Frequently have to update the world at once.
IMHO that's a feature, not a bug. The person or team responsible for breaking the world is responsible for fixing it, rather than getting to break the world, then pop off down to Barton-on-Sea for an extended holiday while everyone else in the company gets to update his code to use an entirely different idiom.
> 4) Encourages sloppy dependency management.
My experience has been that multiple repos tend to encourage sloppy dependency management, while a monorepo tends to encourage deliberative, collaborative, professional dependency management. That's just my own experience, and of course different organisations will differ.
> I'm sure people will say "if you're having those problems, you're doing it wrong" but the same thing could be said to people who find the distributed model problematic.
My own experience has been that multirepos tend to be like dynamic typing and monorepos tend to be like static typing: multirepos can in theory be done right, but in practice they never are, while monorepos work, but at the cost of people having to colour within the lines. Which one makes sense for any particular organisation may actually be a function of its maturity: if a place is trying to move fast and break things, maybe multiple repos make sense; if it's trying to deliver quality software, maybe a single repo makes sense.
3 and 4 are pretty fundamental though, especially 3 - if you don't want to force everybody to keep up with head, you probably don't want to use a monorepo.
My team owns a framework and set of libraries that are widely used within the Google monorepo. We confidently forward-update user code and prune deprecated APIs with relative ease — with benefits of doing it staged or all-at-once atomically.
It's imperfect, but maintenance in distributed repositories is infinitely worse. Still, I remember the earlier days of the monorepo and keeping Perforce client file maps; that was a pain! https://www.perforce.com/perforce/r15.1/manuals/dvcs/_specif...
Some languages and ecosystems are more tolerant of this problem than others. That said, incremental cleanup still has an advantage when bisecting regressions.
As I said, it is not perfect, but making broad-based changes quickly is relatively easy.
In my time maintaining open source, I never had these luxuries, which is why I said the monorepo is infinitely easier. Another consequence: if global cleanups are easy, perhaps that reduces the barrier to experimentation. Perfect is no longer the enemy of the good and the good enough. For me, I felt in open source where I had zero control over dependent code and its callgraph, the reverse was true: hesitance to publish something for fear of cost.
Interestingly, they only do that for java code. Java has good analysis and refactoring tools.
Not sure what you mean. I work at Facebook, and can confirm we keep all code in a monorepo (or, rather, one of two big monorepos) rather than just Java code.
This lets us easily do React API changes: we can deprecate an API internally, and update all JS code that references the old APIs in a single commit.
Other languages are much harder to process.
I can definitely see this.
Edit: And although this is a multi step process, it still allows you to de-couple modules and work on them separately.
i.e. what stops your current tools from `for each repo, run...`, or how is monorepo fundamentally more capable than building automated library management / releases / etc with the same level of tooling?
In a multi-repo world, people are probably linking against old revisions of your library, and against certain tags/branches etc. There is probably no overarching code search to find all users of the API. You're gonna have to grep the code and hope to find all uses. You might miss some repos/branches. Everyone has their own continuous integration/testing procedures, so you can't easily migrate their code for them. You're gonna have to support both API's for probably months until you have persuaded every other user to upgrade to the latest 'release' of your code which supports the new API before finally turning off the old API. The work involved in the migration is spread amongst all the project owners, which is probably much less efficient.
As others have said, it's the fully integrated version consistent codesearch with clickable xrefs across gigabytes of source code, cross repo code review, cross-repo testing, etc. which really makes a monorepo work well.
With the exception of cross-repo code review (I hadn't thought of that one - would be useful for multi-repo too, but I've honestly never seen a multi-repo tool for this, thanks!), this is all just the benefits of standardization, plus a massive injection of tooling enabled by the standards.
Standardization of projects brings huge benefits when it's done right, absolutely agreed. But that's entirely orthogonal to mono vs multi.
for repo in $(ls repos); do ...; done
Imagine I have Repos A,B,C. A is a base repo. B and C depend on A, and C also depends on B. If I modify some API in A, and also update all the callsites in B and C, I also have to bump the version of A depended on by B and C, and also bump the version of B that C depends on, otherwise I'll get version mismatch/api compatibility breakages.
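To make that cascade concrete, here is a hedged sketch of the pinned manifests involved (repo names and version numbers are invented for illustration):

    # Illustrative pinned dependency manifests for the three repos above.
    manifests = {
        "A": {"version": "1.4.0", "deps": {}},
        "B": {"version": "2.1.0", "deps": {"A": "1.4.0"}},
        "C": {"version": "0.9.0", "deps": {"A": "1.4.0", "B": "2.1.0"}},
    }
    # One breaking API change in A means: release A 1.5.0, then bump B's pin
    # on A and release B 2.2.0, then bump C's pins on both A and B -- three
    # coordinated, ordered releases for a single change.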
To make this work that means that nothing can depend on latest, everything has to have frozen dependencies, and you either need to manually, or via some system, globally track all of the dependencies across repos, and atomically update all of them on every breaking change.
In other words, you reinvent blaze/bazel at the repo level instead of the target level, and you have to add an additional tool that makes sure your dependencies can never get mismatched.
The monorepo sidesteps this issue by saying "everything must always build against latest".
No, you cannot. That's my entire point. Here's a minimal example:
Repo one contains one file, provider.py:

    def five():
        return 5

Repo two contains one file, consumer.py:

    import provider  # assume path magic makes this work

    def test():
        assert provider.five() == 5

    if __name__ == '__main__':
        test()
Now I want to change `five` to actually be `number`, such that `number(n) == n`, ie. I really want a more generic impl. What sequence of changes can I commit such that tests will always pass, at any point in time?
There is no way to atomically update both provider and consumer. There will be some period of time, perhaps only milliseconds, but some period of time, at which point I can run my build script and it will pick up incompatible versions of the two files.
This is a reductive example, but the function `five` in this case takes the role of a more complex API of some kind.
but yes, cross-project commits are dramatically easier in a monorepo, I entirely agree with that - they essentially come "for free".
Consequences of this are, for example, that you cannot run all affected tests at every commit.
My point here is that you're describing a known problem with known solutions, and saying it's impossible. I'm saying it requires work, as does all this in a monorepo.
edit: to be technical: yes, you're correct, it can't always build at latest at every instant. Agreed. I don't see why that's necessary though. Simplifying, sure; necessary? No.
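For reference, the usual non-atomic way out of the provider/consumer example above (the "known solutions" being alluded to) looks roughly like this sketch:

    # Step 1 (provider repo): introduce the new API alongside the old one.
    def number(n):
        return n

    def five():           # kept temporarily so existing consumers still work
        return number(5)

    # Step 2 (consumer repo): switch call sites from five() to number(5).
    # Step 3 (provider repo): once nothing references five(), delete it.

Three commits, in order, across two repos, instead of one atomic commit in a monorepo.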
The value from this is the ability to always know exactly which thing caused which problem. If you know things are broken now, you can bisect from the last known good state, and find the change that introduced a breakage. With multi-repo, you can't do that, since it's not always a single change that introduces a breakage, but a combination.
Ensuring that everything always builds at latest allows you to do a bunch of really cool magical bisection tricks. If you don't have that, you can't bisect to find breakages or regressions, because your "bisection" is
1. now 2 dimensional instead of 1
2. may/will have many false positives
In any case, unless you have atomic deploys across all services, this is generally untrue. Bisecting commit history won't give you that any more in a monorepo than in a multirepo.
To your second point, nothing I've said has anything to do with deployment. We're still entirely in the realm of continuous integration.
I'm asking because I wouldn't know how to set up a monorepo at my 50-person startup even if we deemed it necessary.
Sorry if this is a really dumb question. If you only have 50 people I'm assuming your codebase isn't that big, so why can't you just make a repo, make a folder for each of your existing repos, and put the code for those existing repos into the new repo?
I imagine there's a way to do it so that your history remains intact as well.
For a huge, non-open codebase there are some pretty large downsides to a fully distributed VCS in exchange for relatively few benefits.
It's important to stress that Google uses Perforce and not git (at least for that monorepo, they use git/gerrit for Android).
A monorepo this size would simply not scale on git, at least not without huge amounts of hacks (and to be fair, Google built an entire infrastructure on top of Perforce to make their monorepo work).
You are exactly right that git doesn't scale though, go see the posts on git that Facebook's engineers made while trying, only to be met with replies to the extent of "you're holding it wrong, go away, no massive monorepo here", at which point they made it work with mercurial instead. Good read though, lot of good technical details. Can't find the link at the moment though :(, but it was from somewhere around 2012-13 ish.
Edit: here, looks like the original thread is deleted but here's the hn pointer: https://news.ycombinator.com/item?id=3548824
If all those things continue I think the only reason to use git over hg would be github. How long until they decide to support Mercurial too and people abandon git?
Yes. End of story. People will abandon things that don't support them for things that do and those that want to continue using something that fits their application will do so. Nothing to see here; we get it, you don't like git -- don't use it if it doesn't fit your needs. However, don't expect those who do like it to go out of their way in a way they don't want to please you. Just because there is a community developed around something and that something is open source does not mean they are required to accept whatever patches come their way -- often the best projects know what to keep out as much as what to let in. In this case, the git community has decided it doesn't want to do those things; more power to them.
I think you nailed the problem with Git here: it was created by one guy to support his pet project and as long as it works well for him all the other feature requests are low priority.
Edit: scratch that, that works but has no threading. Take two.
"The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google's entire 18-year existence. The repository contains 86TBa of data, including approximately two billion lines of code in nine million unique source files."
It prompted me to do a quick afternoon experiment with how git would handle a billion lines of code:
And there are other, non-Perforce-like Piper interfaces.
What are the other ones and the main differences, really curious
Was pretty much used exclusively back when I was in gamedev, not sure if that's still the case.
Now, imagine you're a huge corporation. Your code consists of millions of files that have been edited millions of times. It's never going to be released to the public. It's never going to be forked, much less by a stranger. You're going to have only one main branch and main build ever, except for maintenance branches. The complete history of everything that has ever happened on that repo would take up many gigabytes, and developers are probably only ever going to need to look at and/or build locally 0.01% of that code themselves.
If you were going to design a version control system from scratch for the latter scenario and you had never heard of git or any other existing VCS, how would you design it? Would you come up with something like git? Probably not. People would just have local copies of the minimum of what they needed to get their work done; anything else would call some server on the VPN they were always on. And you would probably come up with some specialized server architecture, with databases and such, that wasn't that similar to the corresponding client architecture it would also need.
A lot of companies don't use git.
I think one aspect of Git that is really important is forking, and having your own local commits. Merging commits and patches in svn was awful. You wouldn't ever allow someone random to join your svn repo, but if they can reasonably provide a patch, you can take it. Git makes that massively easier.
For me the main feature was distributed nature. SVN is OK on a gigabit corporate LAN with dedicated people to manage & maintain the servers + network. Anything less than that, and it becomes slow and unreliable.
Builds can (and usually do) depend on things that aren't part of your local checkout.
I'd say CitC is a much more accurate representation of the way Piper and blaze "expect" things to work.
Until very recently there was a versioning system for core libraries so those wouldn't typically be at HEAD (minimizing global breakage). Even that has been eliminated now and it's truly just the presubmit checks and code review process that keeps things sane.
also rollbacks :)
Another issue with git monorepos is access control, does anyone know of good solutions for this, does GVFS solve this also?
The same is true of svn, which many people like to bash nowadays, even in this discussion.
1) Transparency. I can see what everybody else is doing and if somebody has an interesting project I can find it quickly. You can also learn a lot from looking at other peoples changes.
2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.
3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.
Many have only used CVS or SVN, which are horrible. I'd rather use Git or Mercurial, but Perforce is really good.
This doesn't require a single central repository, just that all repositories live in a common location.
> 2) Faster. To check out the source code for the project I now work on takes an hour in the distributed system, while it only took 5 minutes in the centralized system.
What distributed repository management system do you use, and what centralized system did you use?
> 3) Always backed up. All code that is checked into the central repository is backed up. It has happened twice that employees have left and code was lost because they only checked it in locally.
As with point 1, this doesn't require a single central repository, just that all repositories live in a common location.
It's more a matter of the tool you use to visualize changeset history.
This again is an issue with tool quality. There need to be meta git repos. Groups in GitHub and GitLab attempt to create a shallow sense of that.
Always push. That's not an issue that is resolved by a single central repo.
Even better, if every project includes a DOAP file (or something similar) and/or you publish commit messages using ActivityStrea.ms or something, you could easily have an interface that shows project activity around the organization, regardless of how many repositories and/or servers you use. Of course it's probably easier if all the repositories live in a common location...
There is no comparison. But let me count the ways
a) checking out stuff
It is faster than just downloading a directory using SVN.
b) just trying something out (ie. branch)
Creating a branch, making a few changes takes me seconds, and does not require me to change paths like it does for the svn victims I work with. Throwing it back out again takes seconds, and all operations are reversible for when I fuck up (which is often).
Git's merging. Oh my God. In half the cases I just have to check stuff over, if that.
We use code review. Unlike most of the subversion folks, I can easily have 5 co-dependent changes in flight (5 changes, each depending on the previous one) without going insane, and I have gone up to 13, not counting experimental branches. I observe around me that it takes a good developer to manage 2 with subversion. 5 is considered insane; I bet if I showed them the 13 that were in flight at the same time they'd have me taken away as a danger to humanity.
2) always backed up
Subversion doesn't back up until you commit and people don't commit anywhere near quickly enough ... The way people lose code around here 99.9% of the time is by accidentally overwriting their in-flight code contributions (the remaining 0.1% involves laptop upgrades and overenthusiastic developers. Even then cp -rp will just copy my environment and just work, and yet the same is absolutely not true for the subversion guys).
Now with Git, I commit every spelling fix I make, every semicolon I have forgotten, on occasion separately, other times with "--amend". And only then do I make my share of stupid mistakes, after committing - something that's technically not impossible on subversion but not practical, mostly because of code review ("just commit it" on subversion takes ~5 minutes in the very fast case (that requires a colleague dropping everything that very second, AND can't involve any actual code changes, as that trips a CI run that takes 3 minutes assuming zero contention), and 20-30 minutes is a more typical time (measured from "hey, I'd like to commit this" to actually in the repository)). Committing on git takes me the time to type "<esc>! git commit % -m 'spellingfix'". The subversion commit time means that developers often go for weeks without committing. Weeks, as in plural.
I get that a git commit isn't the same thing as a subversion commit. But it does allow me to use the functionality of source control, and that's exactly what I'm looking for in a source control system. Subversion commit doesn't allow me to use source control without paying a large cost for it, that's what I'm getting at.
So I have backups guarding against the 99.9% problem (and an auto-backup script that does hourly incremental backups for the 0.1% case). The subversion guys are probably better covered for the 0.1% problem. Good for them !
3) actual version control
Git's branches, rebase, merge, etc mean I can actually work on different things within short time periods in the same codebase.
The fact that other developers are using subversion means I can have my own git hooks that I use for various automated stuff. Some fixing code layout, some warning me about style mistakes, bugs, ... (you'd be surprised how much your reputation benefits from these). Some updating parts of the codebase when I modify other parts, ... you have to be careful as these are part of the reason subversion is so slow (esp. the insistence on CI, I hear a CI run at big G, which is required before even code review can happen, takes upwards of an hour on many projects with some taking 8-9 hours)
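As a generic example of the kind of hook being described (this is a sketch, not the commenter's actual scripts), a pre-commit hook can inspect the staged files and abort the commit when a check fails:

    #!/usr/bin/env python3
    # Sketch of a .git/hooks/pre-commit hook: reject commits whose staged
    # Python files contain trailing whitespace (a stand-in for a real style check).
    import subprocess
    import sys

    staged = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    bad = []
    for path in staged:
        if path.endswith(".py"):
            with open(path) as f:
                if any(line.rstrip("\n") != line.rstrip() for line in f):
                    bad.append(path)

    if bad:
        print("style check failed (trailing whitespace):", ", ".join(bad))
        sys.exit(1)  # non-zero exit makes git abort the commit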
Certainly, CI takes a long time for certain changes, but those are changes that affect everything. You'd have the same problem in a multi-repo approach if you updated a repo that everything else depended on. At some point, you have to run all of the tests on that change.
Chained code review changes, I refuse to believe that in Google version control (which is perforce according to Linus' git talk at Google) chained changes are easy. Branching in perforce is literally worse than SVN, it's a bit more like the old CVS model, and they've sort-of tried to get the SVN copy-directory model forced into the design afterwards. Also the tool support (merges ...) is bad compared to subversion and stone-age compared to Git's tools.
The one reason I keep hearing for using perforce is that perforce allows the administrator to "lock off" parts of the repository to certain users.
I've done branches and merges in Git, Subversion and CVS (and I've had someone talk me through one in Perforce, but I don't really know). Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard and you can't expect me to believe (normal developer) people can reasonably do that in Perforce.
Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests. Perforce, like subversion and cvs doesn't have any way of tracking stuff unless you commit it and you can almost never commit without CI and code review, so would you track changes like that, or would you just leave them in your client untracked until you're ready for a code review ?
Well, like I said, it's possible to modify things that have a lot of dependencies, at which point you run a lot of tests, but that would be true-ish anyway. Consider the hypothetical situation where you're modifying the `malloc` implementation in your `/company/core/malloc.c`. Everything depends on this, because everything uses malloc. If you have a monorepo, you make this change, and run (basically) every unit and integration test, and it takes a while.
Alternatively, if `core` is its own repo, you run the core unit tests, and then later, when you bump the version of `core` that everything else depends on, you run those tests too. But if there's a rarely encountered issue that only certain tests exercise, in the monorepo you notice it immediately when you run all the tests, and can be sure that the malloc change is the breakage. If you don't do that, then you notice breakages when you update `core`, or maybe you don't notice at all, because it's only one test failing per package, and it could just be flakiness. So noticing it is harder, identifying the issue once you've decided there is one is harder, and now you need to roll back instead of just not releasing.
>Chained code review changes, I refuse to believe that in Google version control (which is perforce according to Linus' git talk at Google) chained changes are easy. Branching in perforce is literally worse than SVN, it's a bit more like the old CVS model, and they've sort-of tried to get the SVN copy-directory model forced into the design afterwards. Also the tool support (merges ...) is bad compared to subversion and stone-age compared to Git's tools.
Google no longer uses perforce; we use Piper (note that this is a Google-developed tool called Piper, not the Perforce frontend called Piper; yes, this is confusing; afaik, Google's Piper came first). Piper is inspired by Perforce, but is not at all the same thing. (See CitC in the article.) The exact workflow I use isn't public (yet), but suffice to say that while Piper is Perforce-inspired, Perforce is not the only interface to Piper. This article even mentions a git-style frontend for Piper.
>Google's branch/merge experience is very likely to be somewhere between SVN and CVS, and those can accurately be referred to as "disaster" and "crime against human dignity". It's certainly not impossible, but it's very hard and you can't expect me to believe (normal developer) people can reasonably do that in Perforce.
Suffice to say you're totally mistaken here.
>Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests. Perforce, like subversion and cvs doesn't have any way of tracking stuff unless you commit it and you can almost never commit without CI and code review, so would you track changes like that, or would you just leave them in your client untracked until you're ready for a code review ?
So, Piper doesn't have a concept of "untracked". Well it does, in the sense that you have to stage files to a given change, but CitC snapshots every change in a workspace. Essentially, since CitC provides a FUSE filesystem, every write is tracked independently as a delta, and it's possible to return to any previous snapshot at any time. One way to think of this concept is that every "CL" is vaguely analogous to a squashed pull request, and every save is vaguely analogous to an anonymous commit.
This means that in extreme cases, you can do something like "oh man, I was working on a feature 2 months ago, but stopped working on it and didn't really need it, but now I do", and instead of starting from scratch, you can, with a few incantations, jump to your now-deleted client and recover files at a specific timestamp (for example: you could jump to the time that you ran a successful build or test).
>Also: what would happen if you send out 20 chained commits, 10 of which are spelling corrections, 5 of which are trivial, compile-fixing bugs (forgot semicolon, "]" that should have been ")", etc ...), 2 of which are small changes to single expressions and 3 of which introduce a new function and some tests.
I'd logically group them so that each resulting commit-set was a successfully building, and isolated, feature. Then, each of those would become its own CL and be sent for independent review.
I always try to have the absolute minimum in my client specs, but sometimes you do need to operate over the world.
The perforce docs are generally well written, worth looking at them.
That way, you can still have a distributed repository (Git, Mercurial, etc.) if you want. Even if some code exists only in some developer's local repository, it's presumably not that big of a deal since that code can never have made it to production.
When people talk about Google's monolithic repo they're talking about Google3. This excludes ChromeOS, Chrome and Android, which are all Git repos that have their own toolchains. Google3 here consists of several parts:
- SrcFS. This allows you to check out only part of the repo and depend on the rest via read-only links to what you need from the rest.
- Blaze. Much like Bazel. This is the system that defines how to build various artifacts. All dependencies are explicit, meaning you can create a true dependency graph for any piece of code. This is super-important because of...
- Forge. Caching of built artifacts. The hit-rate on this is very good and it consumes a huge amount of resources given the number of artifacts produced. Forge turns build times for some binaries from hours (even days) into minutes or even seconds.
- ObjFS. SrcFS is for source files. ObjFS is for built artifacts.
This all leads to what is usually a pretty good workflow like the ability to check out directories if you want to modify them and just use the read only version if you don't. You can still step through the read only code with a debugger however.
Now Facebook I have less experience with (<6 months) but broadly there are four repos: www, fbobjc, fbandroid and fbcode (C++, Java, Thrift services, etc). At one point these were Git but for various reasons ended up being migrated to Mercurial some years ago.
The FB case (IMHO) highlights just how useful it can be to have one repo. Google uses protobufs for platform independence. FB uses GraphQL at a client level and Thrift at the service level.
So one pain point is that, for example, you can modify a GraphQL endpoint in one repo, but it's used by clients in others (i.e. mobile clients). There are lots of warnings about making backward-incompatible changes, some of them excessively pessimistic, because deterministically showing that something will break some mobile build in another repo is hard.
Google3 has fewer of these problems because the code is in the same repo. On top of that, Google has spent a vast amount of effort making it so the same build and caching systems can handle C++ server code as well as Objective-C iOS app code. Basically, if you're working on Google3 you compile very little to nothing locally.
Engineers on Android, Chrome and ChromeOS however compile a lot of things locally and thus get far beefier workstations.
At FB the mobile build system doesn't seem to be as advanced in that there is a far higher proportion of local building.
IIRC the Git people seemed to reject the idea of large code bases. Or, rather, their solution was to use Git submodules. There were (and maybe are?) parts of the Git codebase that didn't scale because they were O(n). Apologies if I'm misspeaking here, but I peripherally followed these discussions on HN and elsewhere years ago as someone from the outside looking in, so I'm no authority on this.
The problem of course is that Git submodules don't give you the benefits of a single repo and I've honestly not heard anyone say anything good about Git submodules.
Just to stress, the above is just my personal experience and I hope it's taken as intended: general observations rather than complaints and definitely not arguing that one is objectively better than the other. There are simply tradeoffs.
Also, there are definite issues with Google3, like the dependency graph getting so large that even reading it in and figuring out what to build is a significant performance cost and optimization issue.
First, like in other areas, I see companies that want to "google scale" and blindly copy the idea of monorepos but without the requisite tooling teams or cloud computing background / infrastructure that makes this possible.
Second, I worry about the coupling between unrelated products. While I admit part of this probably comes from my more libertarian world view, I have seen something as basic as a server upgrade schedule that is tailored for one product severely hurt the development of another product, to the point of almost halting development for months. I can't imagine needing a new feature or a bug fix from a dependency but being stuck because the whole company isn't ready to upgrade.
I've read of at least one less serious case of this from google with JUnit
> In 2007, Google tried to upgrade their JUnit from 3.8.x to 4.x and struggled as there was a subtle backward incompatibility in a small percentage of their usages of it. The change-set became very large, and struggled to keep up with the rate developers were adding tests.
I even worry about coupling among related products.
I could see monorepos working out well for a company that just does SaaS, and is able to get away with nice things like maintaining a single running version of the app, and continuous delivery.
Having mostly worked in companies that do shrinkwrap software or that allow different teams or clients to manage their own upgrade schedule, though, monorepo seems to me like a recipe for a codebase that is horribly resistant to change. Not just in the "big bang upgrades like JUnit4 are awful" ways described above, but also in a, "We never clean up old stuff, because most of the time when we try it breaks a bunch of other teams' code and we just nope out of that whole hassle, so barely-supported code sort of collects continuously, like dead underbrush in a forest that's never allowed to burn, until eventually it all explodes in a horrible conflagration," sort of way.
Seeing the list of things that Google keeps in a monorepo, vs things that Google keeps in Git repos, it seems like they might be thinking similarly. They've really only got a precious few products that typically run on non-Google-owned hardware, and apparently the major ones live outside the monorepo.
Breaking changes would then lead to a discussion with your team, rather than your fruitlessly trying to binary search to find the commit that broke you.
Over time, the culture at Google became that all teams need to write tests at the unit, functional, and (usually) integration level.
This depends on what you mean. Most/all consumer android applications don't run on google-owned hardware, but are in the monorepo.
That said, you're right that the whole "keep things up to date" thing is important. That's where tools like rosie and even bots come in.
I always thought it was more that the things which take open source contributions are hosted in Git while the internal things would be hosted in Google3.
Second, source-level dependencies vs. binary-level dependencies is a choice and a commitment.
Even at large companies release schedules can really hinder you.
I didn't hear about the JUnit issue but I can believe it. With code bases this large you have to get really good at static analysis (dynamic languages are your enemy here), tooling for refactoring, and just general hygiene of the code base.
If anything, the stage of company where it makes most sense to have many small repos is when you have a large company with multiple unrelated products, services, teams, etc.
A monorepo will work fine for most small and medium companies without issue, even on top of git.
The need for special tooling, and the performance issues, will only pop up when you have millions upon millions of lines of code.
This used to be true, but today these are all in fact the same hg repo (www as a possible exception, I'm unsure). The "sparse checkout" machinery disguises it, but for engineers working cross platform (e.g. React Native) it's routine to make commits that span platforms.
(All of it is kind of depressing when one's comparison is non-tech companies, though.)
I haven't heard much about Microsoft or Amazon, though I do know from a friend working at Apple that their tooling is not always consistent from team to team. I would appreciate it if someone from these other big tech companies could discuss their development workflow.
As an SRE/DevOps, I love working on internal tooling because I get to feel like I'm creating my own programming language - I can be creative while focusing on solving problems in my domain.
Yes, Google runs hosted Jenkins internally.
Startups don't need most of the things that big companies use. Trying to use them before you need them seems like an absurd waste of time.
Google could hand you the source code, and you still wouldn't be able to implement what they have and compete against them.
Execution is more important than almost anything else.
Occasionally you need to cough up a really clever idea. Those days are really rare, though.
Google has put a lot of money and effort into making their system nice. Working in a mono repo without that much effort is very frustrating, doubly so because there's nothing individual teams can do about it. It's even worse if you can't make team-specific branches on the mono repo to try to isolate yourself from the steady stream of breaking changes elsewhere.
However if you're a team lucky enough to get out and do most things on your own git repo, then you're now the only ones responsible for making that better or worse. Fortunately there's a ton of open source to learn from and use, so taking control of your own team's destiny to get to a point better than before doesn't have to mean much work.
Sure there is. They can architect code in a way that it doesn't break heavily when other people do things.
I.e., abstract things reasonably. They can test things well.
And they can complain when other teams aren't doing either and it's making them less effective.
This is indeed true, the various efforts to upscale git (msft gvfs, etc) run into this and try to upstream things, but it is slow going.
This was kind of cumbersome to maintain TBH, and the fact that changes to different repos can be dependent on one another seems to strongly suggest that the code should be together in the same repo. Personally, I opt for mono-repos until I'm forced to change for whatever reason.
To me, Piper is a monolithic version control system which is geared towards good engineering practices.
As far as I know there are only two such systems in use today and the other one is very dated and older than a lot of things out there.
When people say they have worked in a monolithic repo, they typically mean one repo under one of the open source version control systems, but none of these actually support what is needed when working with a monolithic repo AND modern/good engineering practices.
For that a specialised VCS is required, and there are very few examples of that, none of which are open source.
Git could probably be made to do this kind of stuff, but it would require some extensions to the DAG as well as additions to its already verbose command-line set. But I think it is doable.
The question is who can do it? Most are probably under some strict NDAs.
I suppose you could do it if you had a very strict rule where absolutely everything that could affect a "unit" was inside its own directory (and never anything higher up than that "project root").
So you could check which sub-directories are affected by a commit, and so on.
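That "which sub-directories did a commit touch" check is something plain git can already answer; here's a minimal Python sketch of the idea (the helper name and defaults are mine, nothing standard):

    import subprocess

    def touched_top_level_dirs(commit="HEAD", repo="."):
        # Ask git for the files changed by the commit, then keep only the
        # top-level directory of each path (root-level files map to ".").
        out = subprocess.run(
            ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", commit],
            cwd=repo, capture_output=True, text=True, check=True,
        ).stdout
        return {path.split("/", 1)[0] if "/" in path else "."
                for path in out.splitlines() if path}

    print(touched_top_level_dirs())

Mapping changed paths to owning directories like this is roughly the first thing a monorepo CI has to do before deciding which targets to rebuild and retest.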
$ bazel query 'rdeps(//..., //foo/my:target)'
Of course, this query in the monorepo will take a long time or not work, because the target universe of "//..." is far too large. This is where other systems come in.
However, from first principles: yes, there is no reason to re-query the transitive closure of unchanged targets' reverse deps, so caching can happen here.
For surface level changes this is often quite small. For changes to core libraries, well, you run a lot of tests.
It's usually quite nice though, and changes that break lots of projects are rolled back quite fast. Usually.
If there have not been any changes to your commit/CL and there is an already-passing run, it will just skip the checks and submit.
Still, seems you could keep a handful of integration test environments always running? Time spent waiting your turn for one of them could well be less than time spent spinning up a whole bunch of servers.
I don't think all tests should be hermetic - the time saved usually does not outweigh the effort it takes to make them so, but hey - that's what we are doing.
At least in our project, each integration test has a certain amount of overhead. Some backends are fakes (when I request X, you provide Y), some are actually booted up with the test, e.g. persistence.
Multiply this across N integration tests, have lots of demand for the same CPUs, and you're up to 30-40 minutes of integration test time.
Though, that said, some integration tests can be crazy long if they have a lot of "waitFor" style conditions. "Do this, then wait for something to happen in backend Z. Once that's done, do this, and this, and this..."
But in theory with enough servers all the integration tests could be run in parallel. So it would only take as long as the longest single test.
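As a toy illustration of that "only as long as the longest single test" point - the suite paths are hypothetical, and local processes stand in for a fleet of servers:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical integration suites; in real CI each would go to its own machine.
    suites = [
        ["pytest", "tests/integration/billing"],
        ["pytest", "tests/integration/search"],
        ["pytest", "tests/integration/auth"],
    ]

    def run(cmd):
        # Run one suite and report whether it passed.
        return subprocess.run(cmd).returncode == 0

    # With one worker (or machine) per suite, wall-clock time is roughly
    # max(per-suite time) rather than the sum of them.
    with ThreadPoolExecutor(max_workers=len(suites)) as pool:
        passed = list(pool.map(run, suites))

    print("all green" if all(passed) else "failures")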
Parallelizing tests has diminishing returns unless you manage to dramatically reduce the setup time.
I find the larger factor is what kind of test has to run. Feature tests can take a while on CI if you're spinning up lots of embedded services (dependent services, MySQL, storage layers, etc).
Google today has separate repos (android, chrome, chromeos, google3), each with its own build system: Gradle, gyp/ninja, Portage, Blaze. There's hysterical raisins, but I wonder if Google considers it to be a good thing these projects are so different, or a wart they would prefer to fix?
in an open source system http://bitkeeper.org
What nested brings to the table is the semantics of a mono repo with the advantages of a multi repo. The whole thing walks in lockstep: if you have a 3-week-old version of the kernel and you add in the testing component (subrepo), then you get the 3-week-old version of the testing component, all lined up with the same heads in the same tip commit.
I get it that git won but at least steal the ideas.
Edit: BTW, bk has a bk fast-import that usually works (it doesn't like octopus merges but other than that....)
Here's my problem:
I can be working on multiple projects at the same time. Each project has multiple modules (core, api, www, admin, android, etc -- I use microservices on Google App Engine). Sometimes, some modules have "feature" branches. Oh, did I tell you I work on both Desktop and Laptop?
The problem is syncing. Before traveling, I need to make sure the projects/modules I'll be working on the go all have latest commit from Desktop.
Is there some "dashboard/overview" for all Git projects? So I can quickly tell, "Ok, all projects are at latest commit, and oh I'm working on feature branch for project X and Y."
I think if you want to make sure that all your changes are in, you need to do what most programmers do and learn to finish a programming session with a commit and push, just like you finish a sentence with a period. Once you are used to it, the chance of forgetting is really low.
(the desktop app does, but you need to check one by one for each project)
I agree with regards to committing regularly, but sometimes life happens and one can forget.
I’ve been considering writing a Python script that checks my local repos for uncommitted/unpushed changes and - now that I think about it - perhaps also runs when I start a new terminal session, just for good measure.
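A rough sketch of that script, assuming all checkouts live under a single directory (~/src here, purely a placeholder) and that the branches you care about have an upstream configured:

    import subprocess
    from pathlib import Path

    # Hypothetical place where all local checkouts live; adjust to taste.
    repos = [p for p in (Path.home() / "src").iterdir() if (p / ".git").exists()]

    def git(repo, *args):
        # Run a git command in the given repo and return its stdout.
        return subprocess.run(["git", "-C", str(repo), *args],
                              capture_output=True, text=True).stdout.strip()

    for repo in sorted(repos):
        dirty = git(repo, "status", "--porcelain")           # uncommitted changes
        unpushed = git(repo, "log", "--oneline", "@{u}..")   # commits upstream lacks
        if dirty or unpushed:
            flags = ", ".join(f for f, hit in
                              [("uncommitted", dirty), ("unpushed", unpushed)] if hit)
            print(f"{repo.name}: {flags}")

Calling it from your shell's startup file would cover the "runs when I start a new terminal session" part.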
Stuff that just made you say, "Great, I wouldn't have been able to do that if it was in separate repositories!".
These sorts of small, general, large scale cleanup commits are quite common at Google, and they're encouraged. They help keep the codebase healthy. There are special groups that review them so that all of the individual teams affected don't have to bother, and there are tools to manage the additional testing and approval requirements for such a change.
At my previous company, making such a change would have been a major undertaking. I never would have considered a refactor of that scale without a critical need. They had thousands of packages, each of which had its own repository and an incredibly complex web of build and runtime dependencies. It was a nightmare, and fiddling to find a working set of versions of internal dependencies took up way, way too much of my time each day.
With a fragmented large code base you're in a world of hurt because you're dealing with versioning. There is no guarantee of when every other dependency will migrate to the latest code path.
But again, if you're in a 1-10 person team working on some trivial codebase, a monorepo might not be helpful. If you have 500 engineers working on a single codebase, tradeoffs change.
 - https://buckbuild.com
2. Having no friction to change anything makes you far more productive and ambitious.
3. Scripting at an org level means you can automate things more easily and in more depth.
We run an entirely node stack so Lerna enables this in the first place. Given that, I'd never move to more than one repo if possible. It's almost all downside: more overhead/fragmentation, less control, more wasted time/mental overhead moving between things, API friction that reduces ambitious change.
Only downside of monorepo is Github not supporting them well. If you want to release some sub-packages as OSS, or want to use GH to track issues you're stuck using one big repo to handle everything. I'd bet Github fixes this within the next year or so though.
Let's say you introduce a breaking change in lib A that is used in libs B and C. The first problem is visibility: A does not necessarily see that it is used in B and C. Second, the build should break immediately, not only when someone builds B/C.
How are my changes shared with the reviewer if there is no feature branch? Is my local code uploaded to that review tool mentioned in the article? And then what happens if the reviewer requests changes?
I probably did get this completely wrong, so thanks in advance for pushing me in the right direction.
This architecture assigns each line of code a nested history: the public commit log, and also the sub-history of each commit, which evolved during code review.
IMO it would be better if the code review changes were manifest in the public commit log (e.g. via feature branches), instead of being tracked separately. The code review layers add duplicative complexity.
Unsubmitted changes at Google usually come in one of two flavors, short-lived (abandoned or submitted within a few days) or perpetual. The latter flavor is often for "I think we might want this". It's not uncommon for those to be completely rewritten if they're actually needed. There's usually a preference for submitting useful things (with tests!) and flag gating them to cut down on bitrot.
I have seen exceptions -- I reported a bug in a fiddly bit of epoll-related code and an engineer on my team had a multi-year-old fix -- he hadn't submitted it because he wasn't confident he'd found an actual bug. The final changelist number was more than double the original CL number (unsubmitted changes get re-numbered to fit in sequence when they're submitted -- the original number redirects to the final submitted version in our tooling).
It's pretty much the same as GitHub pull requests - the changelists are supposed to be decently short lived and if the master code changes it's up to you to resolve conflicts and get it into a mergeable state again.
When the CL is first uploaded or sent for review, the OCL is assigned; the CL number is assigned on commit.