
Why your company shouldn't use Git submodules - aiiane
http://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/
======
saurik
There are serious problems with git submodules. This article, however, is
simply concentrating on "you can forget to do X or not understand Y, in which
case you can cause yourself minor irritations", which is just silly: if you
understand how to use submodules, all of the problems in this article go away
and get replaced with more serious issues like "the submodule update mechanism
doesn't get rid of obsolete submodules", "submodules can only exist in the
root folder", "the mechanism for migrating between different upstream sources
of a submodule (which will happen: this is a distributed version control
system) requires coordination with people using the code", and "for many
people, who are attempting to use this in the context of a unified company, the
lack of a solution to moving code back/forth _between_ submodules makes moving
to git a major step down from Subversion, where the subtree repository support
made it entirely reasonable to store an entire organization's worth of
projects, with binary art assets, together in a single master repository".

~~~
ryanpetrich
git submodule add <remote> <subpath/into/the/project> will create a submodule
in a subfolder.
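A minimal sketch of that command, using throwaway local repositories so it runs offline. The names `lib`, `app`, and `vendor/libs/lib` are made up; in a real project `<remote>` would be the library's URL, and the `protocol.file.allow` setting would not be needed.

```shell
set -e
# Throwaway identity and repositories (all names are hypothetical).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Stand-in for the <remote> library repository.
git init -q lib
git -C lib commit -q --allow-empty -m "lib: initial commit"

# The parent project.
git init -q app
cd app
git commit -q --allow-empty -m "app: initial commit"

# Add the library at a nested path instead of the repository root.
# protocol.file.allow is only needed because <remote> is a local path here.
git -c protocol.file.allow=always submodule add "$work/lib" vendor/libs/lib
git commit -q -m "Add lib as a submodule under vendor/libs/lib"

# The nested path is recorded in .gitmodules.
git config -f .gitmodules submodule.vendor/libs/lib.path
```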

~~~
saurik
Ah, touché: I forgot that that one was just an issue with the tool and not a
fundamental limitation. (Sadly, that's exactly the kind of "that isn't a real
problem: you just don't know enough" objection I was faulting the article for
;P.)

------
exDM69
I've said this before but I'll say it again. Stay the hell away from Google's
Repo tool. It's a half-baked, badly maintained piece of ad-hoc software. It
will completely destroy your git workflow. You'll also be married to the
crappy review tool called Gerrit.

Repo was made prior to Git submodules to do the exact same thing for Android.
Now that Git has submodule support, Repo is useless. It does pretty much the
same thing as submodules, but it does it in a very crappy way.

For example, you cannot go back to a specific set of versions of
subdirectories with Repo. In other words there's no "global" git bisect as
there would be with submodules. This stops you from automatically finding a
problem in one of the submodule repos if there are dependencies between the
problematic repo and other repositories.

~~~
valley_guy_12
Repo seems to work the way it does on purpose:

"...the reason we made repo was because we didn't want to deal with commits in
the super project, or trying to merge concurrent branches in the super
project. Instead we wanted each subproject to use a floating branch as the
revision it is tracking."

[https://groups.google.com/d/msg/repo-discuss/ZpqOOE5mLXo/Sw0NHh8VibYJ](https://groups.google.com/d/msg/repo-discuss/ZpqOOE5mLXo/Sw0NHh8VibYJ)

~~~
exDM69
If you ask me, it was the wrong choice. I work on a huge project that's managed
with Repo (full Android OS + big proprietary driver tree), and the lack of this
feature has really cost me lots of time doing manual labor that could be
automated.

Another decision the Repo designers made that I disagree with is the silly
Change-Id added to every commit message(!). Since when do globally unique
identifiers (GUIDs) solve problems instead of creating more?

~~~
duskwuff
That's an artifact of Gerrit, not Repo.

------
buddydvd
Git's submodule feature definitely has rough edges; however, I think the
benefits outweigh the drawbacks.

One of the best benefits I see is that submodules make embedding forks more
manageable. For example, when you include an open source library in your own
project, it's common that you'd want to modify the library in some way. If you
commit the modified library into your own project's repository, you'll have a
harder time absorbing bug fixes/features from upstream later on. Instead of a
simple merge, you'd have to check out the updated library somewhere else and
use a diff tool to compare the changes. And, if the library changed much, you
may need to find the specific commit your modifications were based on so you
can work out how to rebase them.

In addition, submodules make it easier to contribute bug fixes/patches, share
a modified open source library across different projects, and identify bugs
introduced in updated submodules (since the history is preserved).

~~~
alexchamberlain
If you want to make changes to the upstream, I suggest adding an extra mirror
in between the upstream and the repo you want to add the submodule in. That
way, you can maintain your version without worrying about how your other code
is affected. Only after it is tested do you then merge it into your other
repo.

~~~
buddydvd
The projects I add as submodules are all hosted on Github. For those that
require modification, I would fork (mirror) the project, commit the changes to
my fork, and add the fork as a submodule to the parent project I'm working on.
So, yes, I do add an extra mirror in between but it's really because I lack
contributor permission to the main repository.

Also, if repository A and B reference repository C as a submodule and I update
repository C for A, B would not be affected since submodule references are
just commit IDs. There's no need to create two additional mirrors (C-for-A and
C-for-B) if that's what you meant.
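A small sketch of that last point, using throwaway local repositories named after the comment's A, B, and C (every other name is made up). Each superproject stores only a commit ID for C, so moving A's pointer leaves B where it was.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Wrapper: local-path submodule URLs need protocol.file.allow on newer Git.
sub() { git -c protocol.file.allow=always submodule "$@"; }
work=$(mktemp -d)
cd "$work"

git init -q C
git -C C commit -q --allow-empty -m "C: v1"

# Both A and B add C as a submodule at the (hypothetical) path libC.
for p in A B; do
  git init -q "$p"
  git -C "$p" commit -q --allow-empty -m "$p: initial commit"
  (cd "$p" && sub add "$work/C" libC && git commit -q -m "Add C as libC")
done

# New upstream work in C.
git -C C commit -q --allow-empty -m "C: v2"
v2=$(git -C C rev-parse HEAD)

# Move only A's pointer to v2 and record that in A.
git -C A/libC fetch -q
git -C A/libC checkout -q "$v2"
git -C A add libC
git -C A commit -q -m "Update libC to C v2"

# Each superproject records a gitlink: mode 160000 plus a commit ID.
git -C A ls-tree HEAD libC
git -C B ls-tree HEAD libC
```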

------
aoprisan
So "if you forgot to run git submodule update, you’ve just reverted any
submodule commits the branch you merged in might have made"... Yes, if you
forget to type the correct commands then undesired effects will happen. Now
that's a git design flaw? Give me a break.
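The situation in the quoted sentence can at least be spotted before committing: `git submodule status` prefixes a submodule with `+` when its checked-out commit differs from the one the superproject records. A sketch with throwaway local repositories (`lib`, `app`, and `mine` are made-up names):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Wrapper: local-path submodule URLs need protocol.file.allow on newer Git.
sub() { git -c protocol.file.allow=always submodule "$@"; }
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib commit -q --allow-empty -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial commit"
(cd app && sub add "$work/lib" lib && git commit -q -m "Add lib")

# A teammate's clone, with the submodule checked out at lib v1.
git -c protocol.file.allow=always clone -q --recursive app mine

# Upstream, someone moves the submodule pointer to lib v2.
git -C lib commit -q --allow-empty -m "lib v2"
git -C app/lib fetch -q
git -C app/lib checkout -q "$(git -C lib rev-parse HEAD)"
git -C app add lib
git -C app commit -q -m "Update lib to v2"

# The teammate merges that change but forgets "git submodule update".
cd mine
git -c protocol.file.allow=always pull -q --ff-only
before=$(git submodule status lib)
echo "$before"    # leading "+": checkout differs from the recorded commit

# Bringing the checkout in line with the recorded commit clears the marker.
sub update lib
after=$(git submodule status lib)
echo "$after"     # leading space: checkout matches the recorded commit
```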

~~~
brazzy
So you'd also say that a car that explodes if you forget to press the "don't
explode" button after refuelling is a perfectly good design?

Even if you think you're perfect and never make mistakes, it's not a good idea
to add easy opportunities to make mistakes.

~~~
exDM69
Unlike an exploding car, Git has undo. See git reflog.
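For the superproject's own history, that undo looks roughly like this: the reflog records where `HEAD` pointed before each move, so a bad reset or commit can be walked back. A throwaway sketch (the repository and commit messages are made up):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m "one"
git commit -q --allow-empty -m "two"
good=$(git rev-parse HEAD)

# A mistake: discard the latest commit.
git reset -q --hard HEAD~1

# The reflog still lists the commit HEAD pointed at before the reset.
git reflog -n 3

# HEAD@{1} is where HEAD was one move ago; reset back to it.
git reset -q --hard 'HEAD@{1}'
git log --oneline -n 1
```

The reflog only covers states that were actually committed, and `git gc` eventually expires old entries, so it is a safety net rather than a substitute for care.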

------
bryanlarsen
It appears that one of the solutions he recommends, git-subtree, is going to
be merged into git soon:

[http://git.661346.n2.nabble.com/git-subtree-Next-Round-Ready-td7404309.html](http://git.661346.n2.nabble.com/git-subtree-Next-Round-Ready-td7404309.html)

This is excellent news. I've been using git subtree for a couple of years now
without incident, and highly recommend it.
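A rough sketch of the usual `git subtree` round trip, assuming a Git installation that ships the `subtree` command (it lives in contrib, so some packages omit it). The repositories `lib` and `app` and the prefix `vendor/lib` are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # don't open an editor for the pull's merge
work=$(mktemp -d)
cd "$work"

# Stand-in upstream library.
git init -q lib
echo "hello v1" > lib/README.txt
git -C lib add README.txt
git -C lib commit -q -m "lib v1"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
cd app
git commit -q --allow-empty -m "app: initial commit"

# Bring lib's history into the vendor/lib directory of app.
git subtree add --prefix=vendor/lib "$work/lib" "$branch"

# Later: merge new upstream commits into the same directory.
echo "hello v2" > "$work/lib/README.txt"
git -C "$work/lib" commit -q -am "lib v2"
git subtree pull --prefix=vendor/lib -m "Merge lib v2 into vendor/lib" "$work/lib" "$branch"

cat vendor/lib/README.txt
```

Unlike a submodule, the library's files end up as ordinary tracked files in `app`, so a plain `git clone` of `app` already contains them.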

~~~
harshreality
Recent discussions about submodules and the subtree script on the mailing
list:

[http://thread.gmane.org/gmane.comp.version-control.git/196484](http://thread.gmane.org/gmane.comp.version-control.git/196484)

[http://thread.gmane.org/gmane.comp.version-control.git/195604](http://thread.gmane.org/gmane.comp.version-control.git/195604)

------
irrationalidiom
Having a controlled state for foreign repos is absolutely essential.

Having worked with all-in-one repos, where external stuff is thrown in... then
rots... submodules are a better way, making keeping external code up to date
simple yet controlled.

~~~
andywhite37
Yeah, I agree, submodules prevent the copy and paste rot that can happen when
you copy library code directly into your repo. We've been using submodules for
most of our projects, and I don't think they are that bad as long as you are
just a little extra careful with managing them.

------
moe
Very good article. The question is, will it ever be fixed?

As it stands submodules don't even work sanely for the most trivial use-case
of tracking a (slow changing) vendor-repo.

Whoever designed this (Linus?) had a real brainfart here.

~~~
exDM69
> As it stands submodules don't even work sanely for the most trivial use-case
> of tracking a (slow changing) vendor-repo.

That is not the use case it was designed for. And submodules still work just
fine for that. The fact that you have to do a separate submodule update does
not make it broken. In fact, it makes it better when you start thinking about
all the corner cases there are.

If you truly want to build software out of subrepositories, you need to have
a sane way of tracking working sets of revisions of all the submodules there
are. And that is what git submodules do.

If you were blindly tracking the head of a repository, it would be useful
_only_ for the use case you mentioned: tracking slowly changing external
vendor repos. That would be a whole lot less useful.

~~~
moe
The problem, as I see it, is that it was designed for a rather complex/obscure
use-case.

Most people need only a fraction of that (akin to mercurial subrepo or even
svn externals) and don't want to be exposed to all the corner cases and
terrible usability.

~~~
exDM69
> The problem, as I see it, is that it was designed for a rather
> complex/obscure use-case.

Managing software version dependencies is a complex problem and will
inevitably have a complex solution, you can't make it any simpler by dumbing
it down. If it were simple, there would be even more annoyed people when Git
takes a long time checking for new versions in external repositories when
doing something unrelated. And finally when the "simple use case" of tracking
the remote HEADs and silently updating would give you an incompatible set of
dependencies, someone would get furious and write an angry blog post.

What most people are asking for here (get rid of "submodule update") is solvable
by adding a shell alias. Or with the .gitignore and simple clone method
discussed in this thread.

I don't think "git submodule update" is terrible usability. Complaining about
it is like complaining about having to "git add" before a commit. Removing that
feature would make git worse, not better (albeit a little simpler). And as
said earlier, if you don't like the "usability" it's a matter of adding a
shell or a git alias to do what you want.
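A minimal sketch of such an alias; the name `pullall` and the repositories around it are made up. Wrapping the commands in a shell function lets extra arguments such as `--ff-only` reach `git pull`.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Wrapper: local-path submodule URLs need protocol.file.allow on newer Git.
sub() { git -c protocol.file.allow=always submodule "$@"; }
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib commit -q --allow-empty -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial commit"
(cd app && sub add "$work/lib" lib && git commit -q -m "Add lib")
git clone -q app mine     # plain clone: mine/lib starts out empty

# Upstream moves the submodule pointer to lib v2.
git -C lib commit -q --allow-empty -m "lib v2"
git -C app/lib fetch -q
git -C app/lib checkout -q "$(git -C lib rev-parse HEAD)"
git -C app add lib
git -C app commit -q -m "Update lib to v2"

# The alias: pull, then bring every submodule to its recorded commit.
cd mine
git config alias.pullall '!f() { git pull "$@" && git submodule update --init --recursive; }; f'
git -c protocol.file.allow=always pullall -q --ff-only
```

With `git config --global` instead of a per-repository setting, the alias would be available in every repository.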

------
etherealG
I think git subtree is the sanest solution here: the ability to split
updates to the subtree back out upstream makes it just as useful as
submodules without all the pain.
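A sketch of that split step, assuming a Git installation that ships `git subtree`. The prefix `vendor/lib`, the branch `lib-only`, and the remote `lib-upstream` are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git commit -q --allow-empty -m "app: initial commit"

# Two commits touch the vendored library, one does not.
mkdir -p vendor/lib
echo "v1" > vendor/lib/README.txt
git add vendor/lib
git commit -q -m "lib: add README"
echo "app code" > main.txt
git add main.txt
git commit -q -m "app: add main.txt"
echo "v2 with a local fix" > vendor/lib/README.txt
git commit -q -am "lib: local fix"

# Extract only vendor/lib's history, rooted at the library's top level.
git subtree split --prefix=vendor/lib -b lib-only

git log --oneline lib-only
git ls-tree --name-only lib-only

# Sending it upstream would then be an ordinary push, for example:
#   git push lib-upstream lib-only:master
# or in one step:
#   git subtree push --prefix=vendor/lib lib-upstream master
```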

------
jacobr
Submodules are pretty annoying if you modify the submodule repo a lot, but a
pre-commit hook would help against the "forgot to push in submodule" issue,
no?
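Rather than a hook, Git also has a built-in guard for this mistake: `git push --recurse-submodules=check` refuses to push the superproject when a submodule commit it points to is not available on any of that submodule's remotes, and `git config push.recurseSubmodules check` makes that the default. A sketch with throwaway local repositories (`lib`, `lib.git`, `app`, and `app.git` are made-up names):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Wrapper: local-path submodule URLs need protocol.file.allow on newer Git.
sub() { git -c protocol.file.allow=always submodule "$@"; }
work=$(mktemp -d)
cd "$work"

# Bare "remote" repositories for the library and the superproject.
git init -q lib
git -C lib commit -q --allow-empty -m "lib v1"
git clone -q --bare lib lib.git
git init -q --bare app.git

git init -q app
cd app
git remote add origin "$work/app.git"
git commit -q --allow-empty -m "app: initial commit"
sub add "$work/lib.git" lib
git commit -q -m "Add lib"
git push -q origin HEAD

# Commit inside the submodule and record it, but never push lib itself.
git -C lib commit -q --allow-empty -m "lib: local fix"
git add lib
git commit -q -m "Use the lib fix"

# The check refuses to publish a pointer that nobody else could fetch.
if git push --recurse-submodules=check origin HEAD; then
  pushed=yes
else
  pushed=no
fi
echo "superproject push allowed: $pushed"
```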

~~~
exDM69
Or a simple git or shell alias which would do the submodule update when you
want it. That would solve most of the sources of complaints in the article and
in this thread.

However, most likely you don't want an automatic submodule update because of
all the issues there are. It would only be useful for tracking a very stable
slow moving external dependency.

------
sharken
I agree with most of the posters that Git submodules are very hard to work
with.

The key to submodules is that you should not update them on a regular basis. A
good example is the Gitflow project that uses the shFlags repo as a submodule.

A small gotcha is that you need to use --recursive when cloning the repo, so
that you get the submodule cloned as well.
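A sketch of the difference, with throwaway local repositories standing in for the real ones (`lib`, `app`, `plain`, and `full` are made-up names). A clone made without the flag can catch up later with `git submodule update --init --recursive`.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib commit -q --allow-empty -m "lib v1"
git init -q app
git -C app commit -q --allow-empty -m "app: initial commit"
(cd app && git -c protocol.file.allow=always submodule add "$work/lib" lib && git commit -q -m "Add lib")

# Without --recursive the submodule directory is created but left empty.
git clone -q app plain
plain_contents=$(ls -A plain/lib)
echo "plain/lib contains: [$plain_contents]"

# With --recursive the submodule is cloned and checked out as well.
git -c protocol.file.allow=always clone -q --recursive app full
ls -A full/lib

# An existing plain clone can catch up afterwards.
git -C plain -c protocol.file.allow=always submodule update --init --recursive
```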

------
rogerbinns
I've also tried and failed several times to use submodules. Life would be a
lot easier for my situations if the parent could always point to the head of
the child instead of a specific revision.

~~~
exDM69
No, it would not be a good idea for the submodules to be always pointing to the
HEAD of their repos. It might make sense for syncing with some very stable
external projects but that is a very limited use case.

I need to have a consistent set of all the submodules I'm working with. I need
to reliably get the exact same versions of all the modules in the big
repository. This allows me to do a "git bisect" to search for problems in
submodules.

Doing "git submodule update" is not as bad as the OP suggests it is. It makes
perfect sense to have it as it is.

~~~
etherealG
I think there's usefulness in being able to specify both things. I like the
way it is now for a default, but like you say for some stable branch of
another repository, it would be nice if submodules could be updated
automatically.

~~~
exDM69
The problem in automatically running "git submodule update" is that it might
require network access, which may be down and/or slow. And the latest head
might not be compatible with the rest of your software. Since there is no
smart way of managing these problems, it's best not to do that implicitly.

~~~
etherealG
I think you misunderstand my intent. I'm not saying to run that automatically,
I'm saying to point a submodule to the head of a branch of another repository,
not a specific commit.
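Later Git releases added something close to this: with a branch recorded by `git submodule add -b <branch>`, `git submodule update --remote` moves the submodule to that branch's current tip. The superproject still stores a specific commit, so the move shows up as an ordinary change to review and commit. A sketch with throwaway local repositories (`lib` and `app` are made-up names):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Wrapper: local-path submodule URLs need protocol.file.allow on newer Git.
sub() { git -c protocol.file.allow=always submodule "$@"; }
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib commit -q --allow-empty -m "lib v1"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q app
cd app
git commit -q --allow-empty -m "app: initial commit"
# Record which branch of lib this submodule should follow.
sub add -b "$branch" "$work/lib" lib
git commit -q -m "Add lib, following its $branch branch"

# New upstream work on that branch.
git -C "$work/lib" commit -q --allow-empty -m "lib v2"

# Fetch and check out the tip of the recorded branch...
sub update --remote lib
# ...which is still an ordinary pointer change to commit.
git add lib
git commit -q -m "Update lib to the tip of $branch"
```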

------
cpt1138
gitslave probably is closer to what you want

~~~
zoul
The problem with all wrappers and extensions is that you're no longer using
vanilla Git, which probably breaks other tools like GUI clients. Plus the
wrappers and extensions come with their own caveats, so it's still perfectly
possible to get the repo into some broken state, only this time your scenario
is even more esoteric. I _so_ wish submodules got a better treatment in the
vanilla Git.

