
We need a new generation of source control - ariehkovler
https://www.rookout.com/cant-git-no-satisfaction-why-we-need-a-new-gen-source-control/
======
hliyan
Are we mistaking a dependency management problem for a revision control
problem?

In a previous life, before microservices, CI/CD etc. existed, we did just fine
with 20-30 CVS repositories, each representing a separate component (a running
process) in a very large distributed system.

The only difference was that we did not have to marshal a large number of 3rd
party dependencies that were constantly undergoing version changes. We
basically relied on C++, the standard template library and a tightly version
controlled set of internal libraries with a single stable version shared
across the entire org. The whole system would have been between 750,000 -
1,000,000 lines of code (libraries included).

I'm not saying that's the right approach. But it's mind-boggling to me
that we can't solve this problem easily anymore.

~~~
brightball
You’re not wrong. Part of it is people's willingness to reach for a
dependency just to avoid writing a few lines of code themselves.

It would be nice if there were a tool that could show you just how much of
each dependency you actually use, so you could trim it.

~~~
joshuamorton
These things all exist if you use something like bazel/pants/buck to manage
your dependencies. When you can construct a DAG of the entire dependency
structure you can see exactly how much you depend on any given thing (and get
fun dot-graphs of it!). But that requires being precise with dependency
declaration in a way that a lot of people don't want to be.
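
For example, a quick sketch (assuming a Bazel workspace; the //app:main
label is a stand-in for your own target):

    # Transitive closure of everything //app:main depends on
    bazel query 'deps(//app:main)' | wc -l

    # Render the dependency DAG with Graphviz
    bazel query 'deps(//app:main)' --output graph > deps.dot
    dot -Tpng deps.dot -o deps.png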

~~~
geezerjay
> But that requires being precise with dependency declaration in a way that a
> lot of people don't want to be.

Some programming language stacks already handle that transparently. Take
Microsoft's .NET Core+NuGet stack: developers can add packages to a
project without specifying a version number (implicitly it's the latest
release), and version conflicts are caught when the dependencies are
restored.

IIRC Rust's cargo follows a similar approach, and so do npm and yarn. So
that's pretty much standard at this point.
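
For example (the package names are just illustrative):

    # .NET: no version given, so the latest stable release is used
    dotnet add package Newtonsoft.Json

    # Rust: cargo-edit (now built into recent cargo) does the same
    cargo add serde

    # npm: records the latest version with a ^ range in package.json
    npm install lodash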

------
jrockway
The source control system is not the piece of the equation that matters to
most people. The build system is the important part. That's what prevents you
from rebuilding the repository when you only change one Kubernetes config
file, or what causes 100 docker images to be built because you changed a file
in libc.

I think the tooling around this is fairly limited right now. Most people
seem to be hoping Docker caches things intelligently, which it doesn't.
People should probably be using Bazel, but language support is hit-or-miss
and it's very complicated. (It's aggravated by the fact that every
language now considers itself responsible for building its own code. Go
"just works", which
is great, but it's hard to translate that local caching to something that can
be spread among multiple build workers. Bazel attempts to make all that work,
but it basically has to start from scratch, which is unfortunate. It also
means that you can't just start using some crazy new language unless you want
to now support it in the build system. We all hate Makefiles, but the whole
"foo.c becomes foo.o" model was much more straightforward than what languages
do today.)
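
For contrast, the entire classic contract fits in a couple of make rules
(a minimal sketch; recipes are tab-indented):

    # "foo.c becomes foo.o": one pattern rule covers every source file
    %.o: %.c
    	$(CC) $(CFLAGS) -c $< -o $@

    app: foo.o bar.o
    	$(CC) $^ -o $@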

------
arianvanp
The argument of why monorepos suck seems to largely rely on "CI Sucks" in this
article. But I beg to differ. Monorepos only work in combination with a build
system that tracks dependencies carefully.

I contribute a lot to Nixpkgs, which is a monorepo with almost 50000
subcomponents [1], but because the build tool and CI track changes through
hashes, changing a package only triggers rebuilds of other packages that
depend on it and builds are super quick. It accomplishes this by heavily
caching previous builds and sharing those between all builders.
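
As a sketch of how that works (using nixpkgs' hello package as the
example):

    # Every build is named by a hash over all of its inputs:
    nix-instantiate '<nixpkgs>' -A hello
    # -> /nix/store/<hash>-hello-2.10.drv

    # Same hash => the cached result is reused as-is; any changed
    # input => a new hash, and only the affected packages rebuild.
    nix-build '<nixpkgs>' -A hello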

No, monorepos are not going to work with a CI and build tool that always
builds everything from scratch and does no caching. Instead, you should pick
the right tool for the job, and go with a build system like Nix, Buck, Bazel
or Please which were designed with monorepos in mind.

I think the second point the author makes, but only very briefly, is way more
important to look at. Is git itself up to the job for such large repositories?
One problem I've started running into in nixpkgs is that `git blame` takes
considerable time to even execute, due to the enormous volume of commits in
the repository. I would love to see a version control system that is optimised
for storing lots of loosely connected components, and has better support for
partial checkouts. I haven't found it yet, and I would love to hear what
others are using for this.
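
For what it's worth, newer versions of git do get part of the way there
(the checkout path below is just an example):

    # Partial clone: fetch history but defer file contents
    git clone --filter=blob:none https://github.com/NixOS/nixpkgs
    cd nixpkgs

    # Sparse checkout: materialize only the subtrees you work on
    git sparse-checkout init --cone
    git sparse-checkout set pkgs/development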

I hear Facebook has a modified version of Mercurial, and Google probably
created something in-house. But is there anything open-source that
supports these workflows at scale?

[1]
[https://repology.org/repository/nix_unstable](https://repology.org/repository/nix_unstable)

~~~
ljm
I agree with the first part of this. If by CI you mean something like Circle
or Google Cloud Build or Travis, then your CI is pretty much limited to
whatever you can fit in a YAML file, and what the CI service will support in
that.

YAML in and of itself is not the easiest thing to parse when you have multiple
layers of nesting and a lot of lines.

I don't really want to see what a CircleCI config would look like for Nixpkgs.

Once you get to the point of scaling your CI you're looking at tailored
infrastructure to make sure you're only building what needs to be built.

~~~
arianvanp
I'm honestly surprised that Google Cloud doesn't offer a "CloudBazel" product!

~~~
jsty
Something along those lines seems to be in the works:

[https://blog.bazel.build/2018/10/05/remote-build-execution.html](https://blog.bazel.build/2018/10/05/remote-build-execution.html)

~~~
jingwen
More information on how to get access to Remote Build Execution for Bazel on
GCP: [https://docs.bazel.build/versions/master/remote-execution.html](https://docs.bazel.build/versions/master/remote-execution.html)

(Disclaimer: I'm an engineer on the Bazel team.)

------
mikece
I like the idea of creating a source control _protocol_ that can be
implemented with any number of tools rather than having wars over particular
implementations of source control products.

(And would Git really have beaten Mercurial if GitHub had been HgHub instead?
GitHub's success was more about process than the technology of Git, IMO.)

~~~
weberc2
> And would Git really have beaten Mercurial if GitHub had been HgHub instead?
> GitHub's success was more about process than the technology of Git, IMO.

Hg is a much better user experience than git, that's for sure. Git won because
of Github, which may have beaten any HgHub simply because Git has an actual
API while Mercurial's "API" is "use subprocessing". In other words, if
Mercurial had given a damn about the _developer_ experience earlier on, it
might well have won the war.

~~~
masklinn
> which may have beaten any HgHub simply because Git has an actual API

Git doesn't though. A bunch of shell scripts calling shell scripts calling a
few native binaries is pretty much "use subprocessing". libgit came much
later, it wasn't part of the original git.

 _However_ what git _did_ provide was an open, stable, fairly simple and
officially supported physical model with which you could easily interact
directly, plus transfer protocols that either worked directly on that
model (file and "dumb http") or used a relatively simple exchange format
(the "pack protocol",
[https://github.com/git/git/blob/9b011b2fe5379f76c53f20b964d7a3cecb9d8c79/Documentation/pack-protocol.txt](https://github.com/git/git/blob/9b011b2fe5379f76c53f20b964d7a3cecb9d8c79/Documentation/pack-protocol.txt)).
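
To illustrate how approachable that physical model is, here's a sketch
that reads a loose object straight off disk, no git binary involved (the
hash is just a placeholder):

    import pathlib, zlib

    # A loose object is just zlib-deflated b"<type> <size>\0<content>"
    sha = "9b011b2fe5379f76c53f20b964d7a3cecb9d8c79"  # placeholder
    path = pathlib.Path(".git/objects") / sha[:2] / sha[2:]
    header, _, body = zlib.decompress(path.read_bytes()).partition(b"\0")
    print(header)  # e.g. b"commit 257"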

Hell, if anything hg has always provided more _API_ than git; the
extension model wouldn't be possible without it. E.g. stdout coloration
could be an hg plugin, while it had to be implemented in each git command.

~~~
weberc2
It looks like you're right about the history. According to the git repo,
libgit's first commit was in October of 2008 while Github was incorporated in
early 2008 (according to Wikipedia).

Github's popularity was probably due to Git's popularity in the Ruby community
which may have been due to the official support of the physical model and
simple protocols.

That said, an "officially supported physical model" _is_ an API even if I
originally had libgit in mind. Also, none of this invalidates the broader
point, which is that Git won because of Github, not because of user
experience.

------
zdragnar
The title felt a bit misleading: this is more a gripe about git and the
approaches to mono- and multi-repos with git.

I'm not following the call for something new, though:

> A source control that treats CI, CD, and releases as first-class citizens,
> rather than relying on the very useful add-ons provided by GitHub and its
> community.

I'm not a die-hard every-tool-should-do-exactly-one-thing-the-way-the-unix-
gods-intended type of person, but in this case, I really feel that source
control should stick to being source control. Hooks and add-ons are great
precisely because things like CI and CD came after, and who knows what the new
rage will be 5 or 10 years from now.

Building everything for today's workflow into a single tool means that by
the time it's ready, today's workflow won't be cool anymore, and we'll
have other, newer tools and processes that this new source control can't
support :/

~~~
sanderjd
You can already adapt mercurial to do different stuff very much like the
author suggests. As a user (but not developer) of such adaptations, I think it
works really well.

------
rabi_penguin
I don't know if I really follow the conclusion of this blog post, although
I sympathize with the complaints. Let's take one point: "In fact, not only
will Git CI tools rebuild and redeploy your entire repo, they are often
built explicitly for multi-repo projects." This seems patently wrong. On
Buildkite, which we use, you can explicitly set up build steps to trigger
based on directory patterns.

In my experience on teams at growing companies, the pure tech/infra pain
points I've seen are around continuous integration, configuration
management, integration testing, dev/prod parity, feature flagging and
releasing, and provisioning staging servers. Beyond that, I've seen more
general organizational issues around tech debt, software design
collaboration, architectural debt, and code review processes -- all
pressing and valid concerns. But I just find the conclusions of this blog
post flat out wrong. Conflating an unsatisfactory CI choice and
configuration with a failure of version control is a pretty serious
mistake, even if the dissatisfaction itself is totally reasonable. It
doesn't fully disprove the thesis, but it certainly doesn't lend it
support.

If you've installed a wheel onto a poorly set up suspension and get handling
issues, does it mean you should reinvent that wheel, or does it mean you
should check if your suspension may need some tuning?

------
apostacy
This is such a pain point for me.

I would LOVE something between subtrees and submodules.

I have explored this many times, and if I had the ability to write something
like this, I would.

I would love it if I could have a child repo that did not require an
external remote and could be bundled and stored within a parent repo,
unlike a submodule. But I would also like it to be more decoupled from the
parent's history, unlike a subtree.

I can get most of what I want from submodules and subtrees, but not really
enough.

It might be possible without even having to change git. Perhaps if there were
a way to have branch namespaces of some kind, and I could have a subtree have
completely separate history, but have it checked out within the same working
tree. Many of my projects that are submodules only make sense within their
parent repo, and it is really redundant to have an external repo for them. But
I also don't like to have to do expensive surgery to deal with subtrees, and
it would be nice to not have it be completely merged.

My dream is to be able to drop a repo inside another repo and have git just
treat it as if it were part of the parent repo. And then to be able to bundle
the child repo to the parent and push it.

I know that it is mostly possible to do this already, but it is not easy or
intuitive.
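
For reference, the two existing extremes being discussed (the URLs are
placeholders):

    # Submodule: child stays a separate repo, needs its own remote
    git submodule add https://example.com/lib.git vendor/lib

    # Subtree: child's content (and history, unless squashed) is
    # merged into the parent
    git subtree add --prefix=vendor/lib https://example.com/lib.git master --squash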

~~~
Rotareti
_> My dream is to be able to drop a repo inside another repo and have git just
treat it as if it were part of the parent repo. And then to be able to bundle
the child repo to the parent and push it._

I'm not sure if I understand you right, but I think I made what you describe:
[https://github.com/feluxe/gitsub](https://github.com/feluxe/gitsub)

It's a simple wrapper around git that allows nested git repositories with
almost no overhead.

I use it for a private library (the parent repo), which itself contains
modules (the child repos) that I open-sourced on GitHub. It works fine for
my use case. I wrote it because I found "submodule" and "subtree" too
complicated. 'gitsub' is still in alpha.

~~~
apostacy
Thank you for sharing that! That's really cool!

I'm just very attracted to the idea of bundling repos together. I frequently
use git-annex and datalad, and try to keep binaries and helper scripts in
different repositories.

------
lucozade
I know it's wrong of me. Genuinely, I know. But when your second paragraph
states that Google and Netflix pioneered horizontally scalable processes...

It makes it so hard to read the remainder untainted by a certain amount of
scepticism.

Fortunately, he's not actually saying anything much in the article so I don't
think my irrational reaction to ignorance will mean I've missed something
important. But still...

------
hashkb
I'm not sure the article supports its thesis with anything concrete. It's the
author's opinion that submodules and some scripts are inadequate, but in my
experience, a variety of reliable and flexible developer experiences are
possible.

Many devs barely scratch the surface of what git can do anyway. Onboarding
them onto a few extra scripts seems better than an entirely new SCM tool.

------
booleandilemma
I’m just now getting comfortable with Git, please don’t do this to me.

~~~
intertextuality
Your opinions should not be swayed by one article. Have some resolve in what
you do.

Git is great. It also has issues. Scaling has issues. Changing a tool won't
solve scaling issues.

Anecdotally I think submodules work just fine, although the git submodule tool
is not intuitive. Then again, I work in a very small team on small projects
compared to these mammoths being discussed with monorepos and the like.

~~~
majewsky
> Your opinions should not be swayed by one article.

Opinions should ideally not be swayed by the volume of text making arguments,
but by the coherence and logic of those arguments.

~~~
intertextuality
Sure. But in order to see all sides of the argument and make an informed
decision, one should read at least a few different sources, no?

------
zamalek
> Take no prisoners: mono-repos suck too

We are transitioning to multi-repos because we have been burned so hard by our
mono-repo. Builds used to take 3 hours on the monster; we managed to get
them down to 30 minutes, but we are truly at the bottom of the barrel. God
help us if there is a build failure: every subsequent build fails while we
scramble to identify the problem (and we can only sample success or
failure every 30 minutes). It's a house of cards and it's horrible.

> Shots fired: multi-repos suck

We've already had debugging woes with this combined with internal package
feeds (you have to pull down the code, build it, remove the package and
replace it with the local code), which has made us very bearish on code re-
use. That rigmarole sucks way less than mono-repos.

> You can’t have your cake and Git it too

Combine version control and package managers IMO. Go does one half of this. If
you work under GOPATH with _all_ your code, you can easily jump across repos
to make changes and have those changes immediately propagate to the initial
repo. Your hard-disk becomes the mono-repo. What Go doesn't do is pull
binaries down from package feeds. There needs to be some simple mechanism to
switch between builds and code.
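
A sketch of that GOPATH workflow (the paths are hypothetical):

    # All repos live in one tree under $GOPATH/src:
    #   $GOPATH/src/github.com/example/lib      <- the shared library
    #   $GOPATH/src/github.com/example/service  <- imports .../example/lib

    # Edit lib, then rebuild the consumer; the change is picked up
    # immediately, with no publish/version/update cycle in between:
    cd $GOPATH/src/github.com/example/service
    go build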

------
sanderjd
Isn't the argument here that Git is not good at mono-repos rather than that
mono-repos suck? This seems true to me, but there are already other options
that suck less if you want the advantages of a mono-repo.

I would also suggest that mono-repos work better with statically typed
languages with module boundaries and visibility control. The problem of
anything being able to touch anything else is not so bad when you can hide
implementation details behind small APIs.

I have definitely felt some pain with having Ruby projects in a single repo
using git, but much less so with Java projects using Hg.

------
dclowd9901
I don't really understand the problem with multi-repos. Maybe because I'm an FE
developer, I'm shielded from some of the pain of launching an external service
locally, but I find the process of needing to update an external module to be
rather similar to open source dev: clone, link, fix, PR, approve, merge,
update package.json. It's that simple. The only part that can present itself
as somewhat tricky is how your environment handles linked dependencies, but
that can be resolved if you've configured your webpack or whatever correctly.

------
joshjb17
The answer to your problem is Nix:

[https://nixos.org/nix/](https://nixos.org/nix/)

~~~
majewsky
Fun fact: "Nix" is a colloquial variant of the German word for "nothing".
Whenever we talk about NixOS at our hackerspace, hilarity ensues.

------
netheril96
Google uses a monorepo for most of its code and I find it a much better
experience than what I’ve had in the past. But that good experience is
predicated on a lot of Google-only internal tools. If Google open sources
enough of those such that people outside can have the same experience, maybe
the debate will end decisively in favor of monorepo.

~~~
khazhou
What kinds of tools?

~~~
grillermo
You can read about all the tools in this article:
[https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext](https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext)

------
majewsky
> Each side of this debate classifies the other as zealous extremists (as only
> developers can!), but both of them miss the crux of the matter.

I take it the author has never had an interest in politics.

------
1023bytes
I think Git submodules are to blame. The idea is great, but the
implementation is cumbersome. If they were easier to use, it would solve a
lot of these problems.

~~~
jrockway
I think it's irrelevant. If your build depends on inputs from two
repositories, it's the same level of complexity as having the inputs all come
from one repository. You might have two .git directories, but if the code is
coupled, it's just a monorepo in two directories.

Ultimately the problem is in scaling the number of build inputs, not the
number of .git directories.

------
niftich
The field does suffer a bit from version control, dependency management,
language compilers, and build and packaging tools all being single-purpose
tools that are layered, where one can't introspect others beyond the public
API, and manual effort or simplistic not-always-true assumptions have to be
used to bridge information from one to the other.

It's tempting to imagine an integrated system where making changes to a piece
of source code automatically commits every change, every commit will attempt
to compile and build, every successful build auto-packages into a new artifact
with a new build version. The language and the build system would ensure that
all builds are reproducible. Because of this, all builds can be addressed by
identity (content hash) too, not just a name and a build number within some
namespace.
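
A toy sketch of that content-addressing idea (the file names are
illustrative):

    import hashlib, pathlib

    def artifact_id(input_paths):
        """Name a build by the content of all of its inputs, so the
        same inputs always map to the same artifact."""
        h = hashlib.sha256()
        for p in sorted(input_paths):  # stable order
            h.update(p.encode() + b"\0")
            h.update(pathlib.Path(p).read_bytes())
        return h.hexdigest()

    # Any copy of these inputs, under any name or host, yields the
    # same id -- and can therefore share one cached artifact.
    print(artifact_id(["src/main.c", "Makefile"]))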

When any dependency of the current project has newer builds, one could choose
to pull up an interactive diff experience to step through the code of newer
versions. This would aid in selecting a different version on which to depend,
if desired. If a different version of a dependency is picked up, a new build
gets triggered too, and a successful build gets a new build version.

The strong linkage between source code revision and build version, the
deterministic builds, and content-based artifact addressing work together to
ease the traceability of changes and the reusability of artifacts, and
sidestep concerns about the hosting and namespacing of source code and build
artifacts interfering with the project's "single source of truth", because any
copy of an artifact, known by any name, irrelevant of its location, will share
the same hash.

There would still be usability problems with such a system. There would be
no way to strip data out. A shelve, replay, and cherry-pick frontend would be
necessary to allow the doctoring of input before it's committed permanently --
but in such a system, only permanently committed code can be built. The
workflow to prepare a project for public consumption would be to author and
test all the changes in a 'scratch' project that doesn't auto-disseminate its
build artifacts elsewhere, and cherry-pick the changes into a public project.
Public projects could only have public dependencies.

Configuration files, data files, and pieces making up a larger environment may
need a different approach. Nonetheless, a lot of these problems take the same
shape: some input should deterministically produce some output, and a running
system may choose to alter its own state by interfacing with a stateful
outside world (e.g. load or write files, communicate through a network). The
sensible places of drawing a boundary between the inside world and outside
world will differ for every use-case.

------
decebalus1
What a bullshit pointless article. And oh god the cringey git wordplay... I'm
starting to feel this is basically blogspam.

~~~
kadendogthing
It IS blog spam. The article doesn't say anything besides "everything sucks."
There is not a single constructive point being made besides some fantasy world
where our tools are drop-dead gorgeous and the build pipelines are
well-oiled and never have to be paid attention to.

------
kadendogthing
As I've stated in another post on here, what's the point of these
articles? They just say everything sucks, but don't really dive into why,
or how we could fix the issues they point out. Also, it kind of sounds
like the author doesn't have any idea what GitLab is or does, so maybe he
should check it out.

But allow me to retort these bald assertions presented in the article:

Monorepos are great.

Multirepos are great.

Git is the best source control system ever. And if you think it could do
something better, well, have I got news for you: it's completely open
source and extensible with various script entry points and an easily
accessible API.

Thanks for reading my blog.

~~~
klodolph
> Git is the best source control system ever.

To be clear, I'm not disagreeing. But it is simply not good enough. Any new
generation of source control needs to be able to do things that are difficult
with Git, and Git simply isn't extensible enough. Microsoft has a Git VFS,
and there's Git LFS, but these just don't go far enough.
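
For concreteness, the LFS workflow in question, which can also bolt on the
locking mentioned below, given server support (the file patterns and paths
are examples):

    # Track large binary assets as pointers instead of blobs
    git lfs install
    git lfs track "*.psd"
    git add .gitattributes

    # LFS can also provide file locking, if the server supports it
    git lfs lock art/hero.psd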

There are good technical reasons why you would use Perforce or even Subversion
these days.

The people who made Git made it for working on large, but not huge, open-
source code repositories with a traditional model. It doesn't work so well for
vendoring, it doesn't work well for artists, it doesn't have locking, it
doesn't have access controls (and there's only so much you can add). You can
argue that these features don't make sense, or that we're using Git
"wrong", or that I can write a bunch of hooks, but at some point I just
want these things to work, and I'm tired of fighting with Git to make that
happen.

Just personal background, these days I work with closed source and open
source, monorepos and multirepos, Git, Subversion, and Perforce all on a
regular basis (and sometimes use weird custom setups). Git is by far the most
familiar of the three, and I've published some tools for Git repo surgery.

~~~
hashkb
> There are good technical reasons why you would use Perforce or even
> Subversion these days.

Can you say more? What are some of those reasons? Or link to some data or
examples?

~~~
Tempest1981
With a monorepo, how do you avoid the situation where almost every time you
want to commit, you have to pull-and-rebase first? Because somebody has always
pushed a change, every minute or two.

~~~
majewsky
In a repo that large, you don't want to have random people pushing to master
anyway. Have people commit to branches, and then automation merges the
approved branches into master. ("Automation" may be as simple as the "Merge"
button in Github's UI, or more complex if necessary.)

