
Git submodules revisited - fanf2
https://dev.to/dwd/git-submodules-revisited-1p54
======
alexhayes
I'm unsure why most folks seem to get into a pickle over git submodules but I
think it's essentially due to the mental model of how they think about the
structure of the "root" project and its relation to the submodules. This in
turn seems to govern how they work with the root project and its submodules.

I kind of think of the relationship as a pointer into what should be an
entirely separately managed project. Changes to that project should happen in
that project and only then should the "pointer" be modified. I usually go as
far as checking out the submodule separately (to my "projects" directory) when
I need to work on it, which is of course entirely unnecessary, but for me helps
keep that separation.

Another approach I've found that helps to combat issues some people have with
them is to not have them littered throughout a project but have a very clear
delineation between what are submodule and "root" project git managed files -
ie. an appropriately named, usually top level, directory.

In practice I find them an extremely useful way of getting stuff done
effectively and efficiently!

~~~
pknopf
I agree. I use them extensively with Yocto and the different layers
(submodules) that make up my operating system. People have an aversion to
them, but I find them easy to work with and very useful.

------
lilactown
git submodules is just /begging/ for tooling.

On the surface, the commands are fairly verbose. And (as the article pointed
out) the documentation on it could use some love.

However the biggest problem I have encountered while using it, is that what
code I actually have checked out is opaque without doing the song and dance of
actually checking out each submodule & inspecting the dir. This sucks when
evaluating the codebase without cloning e.g. in code review on GitLab/GitHub.
And the commands/UX for updating them is just painful and error prone.

Really, the difference between putting something in a git submodule and
putting something in a private npm/maven/etc. repo is that I can look at a
file and read:

    
    
        my-dependency v1.1.5
    

And my human-brain can kind of know what that means.

Whereas if I look at a repo with a submodule, I just see a URL and a SHA. And
while actually I think that this is a better model for keeping track of
versions, it's terrible from a UX perspective.

There's also a whole host of tooling around your language's chosen
artifact/package/dependency management that git submodules don't have yet.
It's often supported by the same tools we use for compiling and other task
running. Git submodules require another thing that lives outside of that
ecosystem.

I kind of wish that a mainstream language would just adopt git submodules as
part of their de facto package management strategy and build the tools on top
of it we need to make it livable.

~~~
CMCDragonkai
You can use tags for human readable names of commit hashes. But never rely on
tags for security!

~~~
lilactown
I don't think the tag is actually a property of the submodule setup itself (as
in, it's not written in the .gitmodules), but yes, you can manually check out a
tag.
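
For anyone who wants to see the "URL and a SHA" point concretely, here is a minimal sketch, not taken from the thread. It builds two throwaway repositories and then inspects what the root project actually records. All names (`lib`, `app`, `vendor/lib`) are hypothetical, and the tag `v1.1.5` is borrowed from the example above. The `protocol.file.allow=always` setting is only there because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository and path name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Run git with a throwaway identity and local-path submodule clones allowed.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A tiny "dependency" repository with one tagged commit.
g init -q lib
echo "hello" > lib/README
g -C lib add README
g -C lib commit -q -m "lib: release"
g -C lib tag v1.1.5

# The "root" project pins lib under vendor/lib.
g init -q app
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "app: add lib"
cd app

# The root project records a URL in .gitmodules ...
url=$(g config -f .gitmodules --get submodule.vendor/lib.url)

# ... and a commit SHA, stored in the tree as a "gitlink" entry (mode 160000).
entry=$(g ls-tree HEAD vendor/lib)
mode=${entry%% *}
sha=$(g rev-parse HEAD:vendor/lib)

# The tag name is not stored in the root project at all; it can only be
# recovered by asking the submodule's own repository what points at that SHA.
tag=$(g -C vendor/lib describe --tags --exact-match "$sha")

echo "url:   $url"
echo "entry: $entry"
echo "tag:   $tag"
```

The last step is the catch: the human-readable name is only available once the submodule has been fetched, which is exactly what a reviewer reading a diff on a hosting site does not have.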

------
Too
_"Git's submodules are so universally derided that there's practically an
entire industry devoted to providing alternatives for managing dependencies."_
Stop right there and repeat after me:

Git is not a dependency manager, git is not a dependency manager, git is not a
dependency manager, git is not a dependency manager.

Seriously, use the package manager provided by your build system. Some of them
can point to git repos if you don't have a proper package registry; this is a
better solution than plain submodules for many reasons: transitive
dependencies, diamond dependencies, semantic versioning, no way to forget
`git submodule update --recursive`, etc.

~~~
usr1106
>Seriously, use the package manager provided by your build system.

Git is about source code. How would you use the package manager to manage
source code?

The original author seems to be worried that some part of the system is not
being updated to the newest available source before being built. I don't know
of any system that would somehow automate that process.

The suggested submodule system might work as long as branch names don't
change. But in real life they do change, old branches go out of maintenance.
So in the end there must be a human making sure you don't miss anything.

~~~
tatersolid
> The original author seems to be worried that some part of the system is not
> being updated to the newest available source before being built. I don't
> know of any system that would somehow automate that process.

Subversion with _svn:external_ does exactly this by default, and generally you
only peg a revision in an external dependency when you tag for release.

------
newnewpdro
The worst thing about using git submodules is it's no longer a simple matter
of `git clone url://to.repo` to clone a project. Then, if the person doing the
clone is unfamiliar with submodules, it's unobvious how they go about fetching
the submodules separately.

The next worst thing about submodules is when you add a submodule to a
project, if that submodule has submodules of its own it's completely unobvious
how to perform the submodule addition recursively. The `git submodule add`
command doesn't recognize `--recursive`. IIRC the way you work around this is
via the magic incantation `git submodule update --init --recursive` after
adding the submodule that has its own submodules.

I really like submodules conceptually but the current UX surrounding their
implementation is _awful_.

~~~
keithnz
try using [https://github.com/ingydotnet/git-subrepo](https://github.com/ingydotnet/git-subrepo),
which gives you the ability to simply clone a repo.

~~~
newnewpdro
The point is that git doesn't do the right thing.

A random git user with just the URL for my submodule-using repository isn't
going to know to use some special thing to clone the repository. They're going
to run `git clone URL` and then be frustrated by the results.

~~~
keithnz
my point was, that if you use git-subrepo instead, then you get exactly that.

~~~
newnewpdro
git already supports `git clone --recurse-submodules URL` (also spelled
`--recursive`), so there's no need to use git-subrepo to achieve the recursive
clone.

The problem is the requirement of prior knowledge about the repository's use
of submodules. The plain clone won't even report any sort of feedback about
the submodules being present and skipped, nor any hint as to how to retrieve
them: `git submodule update --init --recursive`.

The presence of git-subrepo and `git clone --recursive` is of little
consequence from the perspective of the many users who are now familiar with
the ubiquitous `git clone URL`.

Does github tell people to use git-subrepo to clone a given repository? Hell,
does github even tell people to add `--recursive` when a repository uses
submodules? I haven't checked, but don't recall ever seeing it do so in the
past.

These are not difficult technical issues, it's just the sad state of the
submodules UX in git. I presume it will improve eventually.

~~~
keithnz
with git-subrepo, you just clone as normal, you don't even need git-subrepo
installed. It just looks like a normal repo. You require 0 prior knowledge.
You can update all the code in all the subrepos as a normal repo. Never ever
knowing it has subrepos. The only people who need to know and have git-subrepo
installed are those that need to sync the subrepos.

~~~
newnewpdro
Thank you for the clarification, sounds like it's worth taking a closer look.

------
speedplane
Git is already too horrendously complex. At this point, any feature whether
worthy or not, has to be weighed against the learning curve required by new
git users (already formidable).

Can't even count the number of times that we brought on a reasonably decent
programmer who wrote decent working code but didn't have a clue about git,
which resulted in the work being a mess of commits across multiple branches
and forks.

To make git more powerful, the developers should make it easier to learn, not
add power-user features.

~~~
koolba
I have yet to meet any programmer I’d consider “reasonably decent” that didn’t
easily learn about branching, merging, and rebasing.

On the flip side every single person I’ve met that is either too stupid or too
ignorant to learn the basics of using git also writes terrible code.

~~~
speedplane
Totally disagree. The general flow of fork (sometimes?), branch, commit,
merge, rebase, and/or squash is ridiculous. Eventually you'll also need to
learn stashes, tags, remotes, merge requests, .gitignore, and who knows what.
Most engineers just want to get their code into the main working repository. I
use git every day, but it comes with so much baggage of vocabulary and ways
things can go wrong. I get that git is powerful, it's just far too powerful
for the vast majority of projects.

> every single person I’ve met that is either too stupid or too ignorant to
> learn the basics of using git also writes terrible code.

With git, a beginner programmer struggling to bang out some code now has to
learn a whole additional system, just to save and share their code. Now,
instead of teaching them on the language/product, you have to spend your time
teaching them git. Of course experts know how to use git, but everyone else
has to spend a week-plus learning a system that's not directly related to
their job responsibilities.

It's somewhat similar to a lawyer learning Microsoft Word, an accountant with
Excel, or teaching a draftsman how to hold a pencil... sure the good ones know
how to do this already, but people would move so much faster if they could
jump into things faster.

~~~
chiefalchemist
> "It's somewhat similar to lawyer learning Microsoft word, an accountant with
> Excel, or teaching a draftsman how to hold a pencil... sure the good ones
> know how to do this already, but people would move so much faster if they
> could jump into things faster."

Agree. However, the difference is that __all__ of those have a universal
de facto UI. Even when you write code, you see the result.

On the other hand, (CL-based) git requires you to keep a bunch of extra stuff
in your head. You have to know which questions (read: commands) to ask in order
to "see" what's going on, etc.

We live in a UI / UX world. In that context (CL-based) git feels like a fax
machine. Couldn't / shouldn't there be something better? If disruption is such
a great thing, why does git get a free pass with "this is how we've always
done it"?

The irony baffles me.

~~~
s_kilk
> shouldn't there be something better?

Mercurial?

> why does git get a free pass with "this is how we've always done it"?

I'd argue that the git monoculture is a much more recent phenomenon than we
remember, and driven largely by the hegemonic status of GitHub in pop-dev
culture. Now the possibility of using varied tools seems remote because
everyone wants to be in the same place as everyone else.

Even the way we talk about teaching and onboarding new devs is couched in
terms of GitHub, and GitHub alone.

~~~
beevai142
> Mercurial?

It's not.

The basic feature set is pretty much the same, and requires similar effort to
grok. Some things are much harder to do in Mercurial, and some things are just
confusing (multiple heads, pushing branches, multiple branching models, the
"tip", etc.). Online platforms are worse (Bitbucket vs Github) in practice.

------
sligor
The best alternative to submodules and subtrees I found is git-subrepo:
[https://github.com/ingydotnet/git-subrepo#benefits](https://github.com/ingydotnet/git-subrepo#benefits)

------
rwmj
I finally found a use case for git submodules: We have a git repo which
provides a wrapper around compiling and packaging the Linux kernel for RISC-V,
and we use a submodule to link to a specific release of Linux.
([https://github.com/rwmjones/fedora-riscv-kernel](https://github.com/rwmjones/fedora-riscv-kernel))

However this also reveals the awkwardness of submodules:

* Everyone who clones the git repo either forgets or doesn't know that they have to do 'git submodule init' (and maybe update too? even I'm forgetting ...). So everyone asks us on IRC why it "doesn't work" and has to be told what to do. IMO git clone should also clone the submodules and set things up.

* You cannot add downstream patches to Linux this way. For a while we needed a small bug fix which wasn't in the Linux submodule (it's controlled by another company) so we had an awkward workaround in the build system to copy the whole submodule and apply the patch before building.

* The link is to a non-fast-forward branch of Linux, so the commit hash sometimes becomes invalid, which we don't notice until someone is trying to build it from scratch. You'd think that because both the module and submodule are hosted on github, that github wouldn't garbage collect the old commit hash, but that's apparently not how it works.

I don't know if there's some tool which solves this use case better, but git
submodules are what we have.

~~~
Arnavion
>they have to do 'git submodule init' (and maybe update too? even I'm
>forgetting ...)

`git submodule update --init --recursive` combines init and update, and also
recurses into nested submodules.
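
A minimal sketch of the difference, not taken from the thread, using two throwaway local repositories. The names `lib`, `app` and `vendor/lib` are hypothetical, and `protocol.file.allow=always` is only needed because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository and path name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Run git with a throwaway identity and local-path submodule clones allowed.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# A tiny repository that will become the submodule.
g init -q lib
echo "hello from lib" > lib/README
g -C lib add README
g -C lib commit -q -m "lib: initial commit"

# The "root" project, which pins lib under vendor/lib.
g init -q app
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "app: add lib as a submodule"

# A plain clone creates vendor/lib but leaves it empty, without any hint.
g clone -q "$work/app" plain
plain_before=$(ls -A plain/vendor/lib | wc -l | tr -d ' ')

# One command initialises, fetches and checks out, nested submodules included.
g -C plain submodule --quiet update --init --recursive
plain_after=$(cat plain/vendor/lib/README)

# Alternatively, ask for the submodules at clone time.
g clone -q --recurse-submodules "$work/app" full
full_readme=$(cat full/vendor/lib/README)

echo "files in vendor/lib after a plain clone: $plain_before"
echo "after submodule update: $plain_after"
echo "after recursive clone:  $full_readme"
```

Both routes end in the same state; the only difference is whether the person cloning knew about the submodules up front.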

------
SOLAR_FIELDS
submodules always seem good in theory, but fall apart anytime they need to be
updated on a regular basis as part of a dependency ecosystem. We tend to use
them as shorter term solutions when we have something we know we want to be an
“official artifact” eventually but don’t want to spend the overhead setting up
a build/publish for that library.

In the sense of working with git submodules, it can be quite painful
sometimes, as most HN readers here know. The process of installing and
updating them, in my experience, does not follow the principle of least
astonishment. When dealing with a project using multiple submodules and
switching between different branches, it’s difficult to get the right
combination of submodule dependencies on the first try.

submodules are a tool like any other and should be used as such. For us that
usually means that we want to make a semantic separation between two different
projects, but in reality only one project uses the other for a while, so it
makes sense to take the lower overhead of a submodule dependency over a “real”
dependency.

~~~
haolez
Submodules simply ask you to explicitly set the revision of the module that
you want to work with, especially when you need to upgrade them. That seems
reasonable.

------
swsieber
We have a situation at work where for security concerns, we want to grant a
set of employees access to only a subset of a monolithic git repo. We
investigated using subtrees, but it requires a little more manual intervention
than we'd like.

So now I'm working on a project to write a single binary that can be installed
as a server-side git hook, which would publish a subdirectory as its own repo
and then sync commits and branches between the two.

Of course, it's not perfect. It won't be able to handle signed commits or
tags. And the repositories have to be on the same box (it uses symlinks, but
that could in theory be worked around at the cost of a more complex setup
script).

But it should be completely transparent to the repository users.

~~~
perfmode
Check out copybara

~~~
swsieber
Oh, that does look cool. I can see how that would be pretty useful, but it
does look like it's a manual process that needs to be triggered, and not
automatic. Still, it looks to be more widely applicable than what I'm
building.

------
keithnz
I use [https://github.com/ingydotnet/git-subrepo](https://github.com/ingydotnet/git-subrepo);
I find it much nicer.

It means most people using the repo don't have to worry about the other
repos.

------
Myrmornis
My experience is that submodules are useful and unproblematic. The easiest way
is to cd into the submodule and treat it as a normal git repo. Then when
you’re done, cd .. and commit the new submodule commit hash.
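
A minimal sketch of that workflow, not taken from the thread, using throwaway local repositories. The names `lib`, `app` and `vendor/lib` are hypothetical, and `protocol.file.allow=always` is only needed because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
set -eu

# Scratch area; every repository and path name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Run git with a throwaway identity and local-path submodule clones allowed.
g() {
    git -c user.name=demo -c user.email=demo@example.com \
        -c protocol.file.allow=always "$@"
}

# Set-up: a library repository and a root project that pins it.
g init -q lib
echo "v1" > lib/VERSION
g -C lib add VERSION
g -C lib commit -q -m "lib: v1"

g init -q app
g -C app submodule --quiet add "$work/lib" vendor/lib
g -C app commit -q -m "app: pin lib at v1"
old_pin=$(g -C app rev-parse HEAD:vendor/lib)

# 1. cd into the submodule and work in it like any other repository.
cd app/vendor/lib
echo "v2" > VERSION
g commit -q -a -m "lib: v2"
# (In real use, push this commit to the submodule's remote at this point,
#  so that other clones of the root project can fetch it.)

# 2. cd back out; the root project now sees vendor/lib as modified.
cd ../..
g add vendor/lib
g commit -q -m "app: bump lib to v2"

new_pin=$(g rev-parse HEAD:vendor/lib)
lib_head=$(g -C vendor/lib rev-parse HEAD)

echo "old pin: $old_pin"
echo "new pin: $new_pin"
```

The commit in the root project contains nothing but the changed pointer, which is why the order matters: the submodule commit has to exist somewhere reachable before the root project's commit that refers to it is shared.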

------
carapace
Git submodules is just some hack apenwarr banged together one day. It is what
it is.

Same with Pip (the Python package installer thingy) BTW: it's just a hack
somebody put together to scratch an itch.

~~~
icebraining
I think that's git subtrees. Unless he did both?

~~~
carapace
D'oh! You're right. My bad.

It's too late to delete the comment. Eternal proof of my pitiful human
fallibility.

