
Is it just me, or are a lot of people here conflating source control management and dependency management? The two don't have to be combined. For example, if you have Python Project X that depends on Python Project Y, you can either A) have them in different SCM repos, with a requirements.txt entry pointing at a server that hosts the wheel artifact, B) have them in the same repo and refer to each other from source, or C) have them in the same repository, but still have Project X list its dependency on Project Y in a requirements.txt file at a particular version. With the last option, you get the benefit of mono-repo tooling (easier search, versioning, etc.) but you can still control your own dependencies if you want.
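
To make option C concrete, here's a minimal sketch (the layout, package name and version are all made up):

    monorepo/
      project_x/
        requirements.txt
      project_y/
        setup.py          # published to an internal index as "project-y"

    # monorepo/project_x/requirements.txt
    project-y==1.2.0      # pinned, even though the source lives one directory over

Project X still resolves Project Y through a package index at a pinned version, while both live in the same repo for search and tooling purposes.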

edit: I do have one question though: does Google's internal tool handle permissions on a granular basis?




The key here is reverse dependency management: “If I change X, what would this change affect?”.

This can be achieved better with a single repo than with multiple repos, due to the completeness of the (dependency) graph.
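
As a concrete sketch, with a monorepo build tool like Bazel (which comes up further down the thread), answering that question is a single query (the target names here are invented):

    # every target in the repo that would be affected by a change to //libs/a
    bazel query "rdeps(//..., //libs/a)"

Getting the same answer across hundreds of separate repos means stitching that graph together yourself.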


Exactly this. Or at least it's a way this can be achieved, assuming solid testing & some tooling in the mix.

For folks unfamiliar with it, the issue is something like:

1. You find a bug in a library A.

2. Libraries B, C and D depend on A.

3. B, C and D in turn are used by various applications.

How do you fix a bug in A? Well, "normal" workflow would be something like: fix the bug in A, submit a PR, wait for a CI build, get the PR signed off, merge, wait for another CI build, cut a release of A. Bump versions in B, C and D, submit PRs, get them signed off, CI builds, cut a release of each. Now find all users of B, C and D, submit PRs, get them signed off, CI builds, cut more releases ...

Now imagine the same problem where dependency chains are a lot more than three levels deep. Then throw in a rat's nest of interdependencies so it's not some nice clean tree but some sprawling graph. Hundreds/thousands of repos owned by dozens/hundreds of teams.

See where this is going? A small fix can take hours and hours just to propagate. Remember this pain applies to every change you might need to make in any shared dependency. Bug fixes become a headache. Large-scale refactors are right out. Every project pays for earlier bad decisions. And all this ignores version incompatibilities, because folks don't stay on the latest & greatest versions of things. Productivity grinds to a halt.

It's easy to think "oh, well that's just bad engineering", but there's more to it than that I think. It seems like most companies die young/small/simple & existing dependency management tooling doesn't really lend itself well to fast-paced internal change at scale.

So having run into this problem, folks like Google, Twitter, etc. use monorepos to help address some of this. Folks like Netflix stuck it out with the multi-repo thing, but lean on tooling [0] to automate some of the version bumping silliness. I think most companies that hit this problem just give up on sharing any meaningful amount of code & build silos at the organizational/process level. Each approach has its own pros & cons.

Again, it's easy to underestimate the pain when the company is young & able to move quickly. Once upon a time I was on the other side of this argument, arguing against a monorepo -- but now here I am effectively arguing the opposition's point. :)

[0] https://github.com/nebula-plugins/gradle-dependency-lock-plu...


> So having run into this problem, folks like Google, Twitter, etc. use monorepos to help address some of this.

I think you’re retroactively claiming that Google actively anticipated this when they originally chose Perforce as their SCM. They may believe that it’s still the best option for them, but as I understand it, to make it work they bought a license to the Perforce source code, forked it, and practically rewrote it.

Here’s a tech talk Linus gave at Google in 2007: https://youtu.be/4XpnKHJAok8

My theory (I wonder if someone can confirm this) is that Google was under pressure at that point with team size and Perforce’s limitations. It would have been an entirely different direction had they chosen to ditch p4 and use Git instead. What would have happened in the Git space earlier if they had? Fun to think about... but maybe Go would have had a package manager earlier ;)


> I think you’re retroactively claiming that Google actively anticipated this when they originally chose Perforce as their SCM.

Oh I didn't mean to imply exactly that, but really good point. I just meant that it seems like folks don't typically _anticipate_ these issues so much as they're forced into it by ossifying velocity in the face of sudden wild success. I know at least a few examples of this happening -- but you're right, those folks were using Git.

In Google's case, maybe it's simply that their centralized VCS led them down a certain path, their tooling grew out of that & they came to realize some of the benefits along the way. I'd be interested to know too. :)


Maybe Google’s choice of a monorepo was pure chance. However, the choice was challenged on many occasions, and these kinds of arguments were (successfully) made for it to stay.


There's a subtler, and potentially more important thing that can crop up with your scenario:

Library A realises that its interface could be improved, but the change would not be backwards compatible. In the best-case scenario, with semver, there is still a cost to this change. Users have to bump versions and rewrite code, and maybe the maintainer of Library A has to keep 2 versions of a function to ease the pain for users. It may just be that B, C and D trust A less because the interface keeps changing. All this can mean an unconscious pressure not to change and improve interfaces, and adds pain when they do.
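
For instance, "keeping 2 versions of a function" often looks something like this deprecation shim (names invented, Python just as an illustration):

    import warnings

    def fetch(url, timeout=None):
        """New, improved interface."""
        ...

    def fetch_url(url):
        """Old interface, kept around only so existing callers don't break yet."""
        warnings.warn("fetch_url() is deprecated; use fetch()", DeprecationWarning)
        return fetch(url)

And the shim itself then has to be maintained, documented and eventually removed, which is exactly the ongoing cost described above.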

Doing it in a monorepo can mean that the developers of A can just go around and fix all the calls if they want to make the change, allowing for greater freedom to fix issues with interfaces between modules. And that is really important in large complex systems with interdependent pieces.


This is my biggest gripe in discussions like this as well: dependency management and source control are two completely different things. It should be convenient to use one to find the other, but they should not necessarily be coupled 1:1 with each other.

1. A single repo should be able to produce multiple artifacts.

2. It should be possible to use multiple repos to produce one artifact.

3. It should be possible to have revisions in your source control that don't build.

4. It should be possible to produce artifacts that depend on things not even stored in a repo, think build environment or cryptographic keys etc. An increase in version number could simply be an exchange of the keys.


Number three I disagree with. Bisection depends on build (and test) always working on trunk.


A single repo is one design that coherently addresses both source control management and dependency management.

The key is to let the repo be a single comprehensive source of data for building arbitrary artifacts.


> The key is to let the repo be a single comprehensive source of data for building arbitrary artifacts.

By that do you mean it's one way of doing it, or that it's the only way?

Seems clear to me that it's not the only way. For instance, .NET code tends to use Git for the project source + NuGet for external dependencies. It works pretty well.


It's one way. There isn't any problem that can only be solved in one way.


I don't know what this means.

How is "single repo" a "design" and how does this design dictate dependency management?

Yes, if you have a single repo then that would be a single source of data for building your stuff. That seems redundant.


See Bazel: you have the deps manifested as source-controlled data, and then you can build everything as deterministically as possible.

Then you can manage dependencies as part of the normal source control process.
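
A rough sketch of what that looks like (target and file names are invented):

    # libs/a/BUILD
    py_library(
        name = "a",
        srcs = ["a.py"],
    )

    # apps/x/BUILD
    py_binary(
        name = "x",
        srcs = ["x.py"],
        deps = ["//libs/a"],  # the dependency is just another path in the same repo
    )

Changing a dependency edge is an ordinary edit to a BUILD file, reviewed and versioned like any other change.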


A single repo makes it a bit tricky to use some library in version A for project X and version B for project Y.


Correct.

You can consider that a bad thing or a good thing.

Most languages' package systems (C/C++, Java, Python, Ruby) don't permit running multiple versions of a library at runtime. The single-version policy is one way of addressing dependency hell.


I think that's actually a good thing. Allowing different projects to use different versions of a 3rd-party package may be convenient for developers in the short term, but it creates bigger problems in the long term.


It depends on the industry. In some places changing a dependency, no matter how trivial the change, entails a lot of work. Think for example about embedded systems, where deploying is a lot harder than pushing a Docker image somewhere. It is often far cheaper to analyze whether the fixed bug can even be triggered, and avoid upgrading unless necessary.


In those situations, why not go ahead and keep the code up-to-date and consistent, and simply not deploy when you don't need to?


Because that costs money now that could be spent on something that actually produces a profit.


If I recall correctly, in Google's build system a dependency in the source tree can be referenced at a commit ID, so you can actually depend on an earlier version of an artifact in source control.


No, that hasn't been true since at least 2013 (the year I joined Google).


Yes, Google's internal tool handles permissions based on directory owners.

They use the same OWNERS-file model as the Chromium project [1], the only difference being the tooling (Chromium is git, google3 is ... its own Perforce-based thing).
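
For anyone who hasn't seen the model, an OWNERS file is just a text file checked into a directory; a made-up example:

    # libs/payments/OWNERS
    alice@example.com
    bob@example.com
    per-file *_test.py=*   # anyone can approve test-only changes

Changes under that directory need sign-off from one of the listed owners, which is what gives per-directory granularity inside a single big repo.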

[1] https://chromium.googlesource.com/chromium/src/+/lkcr/docs/c...


I can't comment specifically on Google's tool, but I know it's based on Perforce, and Perforce does have granular permissions: https://www.perforce.com/perforce/r15.1/manuals/p4sag/chapte...
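
For reference, Perforce permissions are path-based entries in the protections table; a rough (invented) example:

    write  group  payments-dev  *  //depot/payments/...
    read   user   auditor       *  //depot/...

so access can be scoped down to individual directories or even files within the one depot.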


> The two don't have to be combined.

They do have to be combined in some way, at least if you want builds to be reproducible. Your requirements.txt example is one way of combining version control + dependencies: give code an explicit version and depend on it elsewhere by that version.

Google has chosen to combine them in a different way, where every commit of a library implicitly produces a new version, and all downstream projects use that.

> does Google's internal tool handle permissions on a granular basis?

Not sure what you mean... its build tool handles package visibility (https://docs.bazel.build/versions/master/be/common-definitio...). Its version control tooling handles edit permissions (https://github.com/bkeepers/OWNERS).
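
For the build-tool half, visibility is an attribute on each target; a rough sketch (names invented):

    py_library(
        name = "internal_helpers",
        srcs = ["helpers.py"],
        # only targets under //payments/... may depend on this
        visibility = ["//payments:__subpackages__"],
    )

So who may depend on a package is controlled in the BUILD files, while who may edit it is controlled by OWNERS, two separate knobs.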


It is very tempting to believe that a monorepo will solve all your dependency issues. If you have a project that's, say, pure Python, consisting of a client app, a server app, and then a dozen libs, that might actually be true, since you force everyone to always have the latest version of everything and always be running the latest version. Given a somewhat sane code base and a smart IDE, refactoring is really easy and updates everything atomically.

In reality you often have different components, some written in different languages. At a certain size, not everyone has the whole build environment set up, some people may be working with older binaries, and now it's just as easy to end up with version mismatches, structural incompatibilities, etc. So you need strong tooling and an integration process to go along with your monorepo. The repo alone doesn't solve all your problems.


Maybe this is a reflection of modern tools using the version control system to store built artifacts, like npm and "go get" do. Anyway, depending on the programming language, you can have a monorepo and still bind your modules with artifact dependencies, not necessarily depending on the code itself.



