Why your company shouldn't use Git submodules (codingkilledthecat.wordpress.com)
64 points by aiiane on April 29, 2012 | 45 comments



There are serious problems with git submodules. This article, however, is simply concentrating on "you can forget to do X or not understand Y, in which case you can cause yourself minor irritations", which is just silly: if you understand how to use submodules all of the problems in this article go away and get replaced with more serious issues like "the submodule update mechanism doesn't get rid of obsolete submodules", "submodules can only exist in the root folder", "the mechanism for migrating between different upstream sources of a submodule (which will happen: this is a distributed version control system) requires coordination with people using the code", and "for many people, who are attempting to use this in a context of a unified company, the lack of a solution to moving code back/forth between submodules makes moving to git a major step down from Subversion, where the subtree repository support made it entirely reasonable to store an entire organization's worth of projects, with binary art assets, together in a single master repository".


git submodule add <remote> <subpath/into/the/project> will create a submodule in a subfolder.


Ah, touché: I forgot that that one was just an issue with the tool and not a fundamental limitation. (Sadly, the same kind of "that isn't a real problem: just a 'you don't know enough'" that I'm complaining about from the article ;P.)


I think that the issue is that <subpath/into/the/project> always has to be relative to the root directory. You can't do something like:

  % cd subpath
  % git submodule add <remote> <into/the/project>


  > a major step down from Subversion, where the subtree
  > repository support
You could always check out git-subtree. The functionality is there under the hood in git (git-subtree is literally a shell script, though not a minor one).


I've said this before but I'll say it again. Stay the hell away from Google's Repo tool. It's a half-baked badly maintained piece of ad-hoc software. It will completely destroy your git workflow. You'll also be married to the crappy review tool called Gerrit.

Repo was made prior to Git submodules to do the exact same thing for Android. Now that Git has submodule support, Repo is useless. It does pretty much the same thing as submodules, but it does it in a very crappy way.

For example, you cannot go back to a specific set of versions of subdirectories with Repo. In other words there's no "global" git bisect as there would be with submodules. This stops you from automatically finding a problem in one of the submodule repos if there are dependencies between the problematic repo and other repositories.


Repo seems to work the way it does on purpose:

"...the reason we made repo was because we didn't want to deal with commits in the super project, or trying to merge concurrent branches in the super project. Instead we wanted each subproject to use a floating branch as the revision it is tracking."

https://groups.google.com/d/msg/repo-discuss/ZpqOOE5mLXo/Sw0...


If you ask me, it was the wrong choice. I work on a huge project that's hosted on Repo (full Android OS + big proprietary driver tree), and the lack of this feature has really cost me lots of time doing manual labor that could be automated.

Another decision the Repo designers made that I disagree with is the silly Change-Id added to every commit message(!). Since when do globally unique identifiers (GUIDs) solve problems instead of creating more?


That's an artifact of Gerrit, not Repo.


Well, since the maintainer of repo wants to kill it off, it is perhaps not surprising that it is badly maintained at the moment.


Do you have a source for that? If this information is true, I could use it to help convince our management at work that we should move off Repo and into Git submodules.


Not on paper but Shawn Pearce (Gerrit/repo maintainer) said it at GitTogether 2011. But wanting to kill something off and actually doing it, they are of course different.

Also, if you do use Gerrit (even though you don't seem to like it) it has some better support for submodules from 2.3 onwards. See https://gerrit-review.googlesource.com/Documentation/user-su...


Git's submodule feature definitely has rough edges; however, I think the benefits outweigh the cons.

One of the biggest benefits I see is that submodules make embedding forks more manageable. For example, when you include an open source library in your own project, it's common that you'd want to modify the library in some way. If you commit the modified library into your own project's repository, you'll have a harder time absorbing bug fixes/features from upstream later on. Instead of a simple merge, you'd have to check out the updated library somewhere else and use a diff tool to compare the changes. And, if the library changed much, you may need to find the specific commit that your modifications were based on so you can understand how to rebase your modifications.

In addition, submodules make it easier to: contribute bug fixes/patches, share a modified open source library across different projects, and identify bugs introduced in updated submodules (since the history is preserved).


If you want to make changes to the upstream, I suggest adding an extra mirror in between the upstream and the repo you want to add the submodule in. That way, you can maintain your version without worrying about how your other code is affected. Only after it is tested do you then merge it into your other repo.


The projects I add as submodules are all hosted on Github. For those that require modification, I would fork (mirror) the project, commit the changes to my fork, and add the fork as a submodule to the parent project I'm working on. So, yes, I do add an extra mirror in between but it's really because I lack contributor permission to the main repository.

Also, if repository A and B reference repository C as a submodule and I update repository C for A, B would not be affected since submodule references are just commit IDs. There's no need to create two additional mirrors (C-for-A and C-for-B) if that's what you meant.


So "if you forgot to run git submodule update, you’ve just reverted any submodule commits the branch you merged in might have made"... Yes, if you forget to type the correct commands then undesired effects will happen. Now that's a git design flaw? Give me a break.


So you'd also say that a car that explodes if you forget to press the "don't explode" button after refuelling is a perfectly good design?

Even if you think you're perfect and never make mistakes, it's not a good idea to add easy opportunities to make mistakes.


Unlike an exploding car, Git has undo. See git reflog.
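
A minimal sketch of what that undo could look like in the superproject, assuming the unwanted change (say, a commit that recorded a stale submodule pointer) was the most recent thing to move HEAD. The function name is made up for illustration, and `git reset --hard` also discards uncommitted changes, so treat this as an example rather than a recipe:

```shell
# Hypothetical helper (name is illustrative): move the current branch back
# to where HEAD pointed before its most recent change (commit, merge, ...).
# Warning: `reset --hard` also throws away uncommitted changes.
undo_last_head_move() {
    git reflog show -n 5           # list the last few positions of HEAD
    git reset --hard 'HEAD@{1}'    # jump back one reflog entry
}
```

After the reset, a `git submodule update` would still be needed to bring the submodule working trees in line with the restored commit.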


It appears that one of the solutions he recommends, git-subtree, is going to be merged into git soon:

http://git.661346.n2.nabble.com/git-subtree-Next-Round-Ready...

This is excellent news. I've been using git subtree for a couple of years now without incident, and highly recommend it.




she.


having a controlled state for foreign repos is absolutely essential.

having worked with all-in-one repos, where external stuff is thrown in... then rots... submodules are a better way, making keeping external code up to date simple yet controlled.


Yeah, I agree, submodules prevent the copy and paste rot that can happen when you copy library code directly into your repo. We've been using submodules for most of our projects, and I don't think they are that bad as long as you are just a little extra careful with managing them.


Very good article. The question is, will it ever be fixed?

As it stands submodules don't even work sanely for the most trivial use-case of tracking a (slow changing) vendor-repo.

Whoever designed this (Linus?) had a real brainfart here.


> As it stands submodules don't even work sanely for the most trivial use-case of tracking a (slow changing) vendor-repo.

That is not the use case it was designed for. And submodules still work just fine for that. The fact that you have to do a separate submodule update does not make it broken. In fact, it makes it better when you start thinking about all the corner cases there are.

If you truly want to build software out of subrepositories, you need to have a sane way of tracking working sets of revisions of all the submodules there are. And that is what git submodules do.

If you were blindly tracking the head of a repository, it would be useful _only_ for the use case you mentioned. Tracking slowly changing external vendor repos. That would be a whole lot less useful.


The problem, as I see it, is that it was designed for a rather complex/obscure use-case.

Most people need only a fraction of that (akin to mercurial subrepo or even svn externals) and don't want to be exposed to all the corner cases and terrible usability.


> The problem, as I see it, is that it was designed for a rather complex/obscure use-case.

Managing software version dependencies is a complex problem and will inevitably have a complex solution, you can't make it any simpler by dumbing it down. If it was simple, there would be even more annoyed people when Git takes a long time checking for new versions in external repositories when doing something unrelated. And finally when the "simple use case" of tracking the remote HEADs and silently updating would give you an incompatible set of dependencies, someone would get furious and write an angry blog post.

What most people are asking here (get rid of "submodule update") is solvable by adding a shell alias. Or with the .gitignore and simple clone method discussed in this thread.

I don't think "git submodule update" is terrible usability. Complaining about it is like complaining about having to "git add" before a commit. Removing that feature would make git worse, not better (albeit a little simpler). And as said earlier, if you don't like the "usability" it's a matter of adding a shell or a git alias to do what you want.
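
A rough sketch of such an alias, for illustration only. The alias name `pullall` and the wrapper function are invented here; the alias is stored in the current repository's config and simply chains a pull with a recursive submodule update:

```shell
# Hypothetical helper: register a repo-local Git alias named "pullall"
# (the name is made up) that pulls and then checks out, in every
# submodule, the commit recorded by the superproject.
install_pullall_alias() {
    git config alias.pullall \
        '!git pull && git submodule update --init --recursive'
}
# Usage inside a clone:
#   install_pullall_alias
#   git pullall
```

The leading `!` makes Git run the alias as a shell command, which is what allows two Git commands to be chained in one alias.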


I wonder. How difficult can it be to mimic mercurial's subrepos?

http://blog.codekills.net/2011/07/14/nested-repository-handl...


Yes!

That should be built straight into git. The abstraction leaks are no worse than with submodules (it's just making different trade-offs) but the usability is so much more sane, it's not funny.

The importance of having this in core-git can not be overstated. When it's not in core then it's not getting used. Case in point: count the number of projects on github using one of the external training wheels (subtree etc.).

Submodules have proven inadequate for reality (again: count the number of subprojects on github). I wish one of the core devs would make a kickstarter for a solution...


I think git subtree seems like the most sane solution here, the ability to split out updates to the subtree back upstream makes them just as useful as submodules without all the pain.
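
For readers who have not used it, a small sketch of the subtree round trip, assuming git-subtree is installed alongside Git. The wrapper names are invented for illustration; in each one, `$1` is the directory inside your repository, `$2` the upstream repository, and `$3` the upstream branch:

```shell
# Hypothetical wrappers around git-subtree; every function name is illustrative.
# $1 = directory in your repo, $2 = upstream repository, $3 = upstream branch

# Import the upstream project as one squashed commit under directory $1.
subtree_add()  { git subtree add  --prefix="$1" --squash "$2" "$3"; }

# Merge newer upstream commits into the same directory.
subtree_pull() { git subtree pull --prefix="$1" --squash "$2" "$3"; }

# Split out the commits that touched $1 and push them to upstream branch $3.
subtree_push() { git subtree push --prefix="$1" "$2" "$3"; }
```

The imported files are ordinary tracked files in the parent repository, so a plain `git clone` gets them without any extra step.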


Submodules are pretty annoying if you modify the submodule repo a lot, but a pre-commit hook would help against the "forgot to push in submodule" issue, no?
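
A sketch of what such a hook could check, under the assumption that "pushed" means "reachable from at least one remote-tracking branch". Both function names are invented for illustration. Git also has a built-in `git push --recurse-submodules=check`, which refuses to push the superproject when a referenced submodule commit is not available on a remote.

```shell
# Hypothetical check: succeed only if the commit checked out in repository $1
# is contained in at least one remote-tracking branch, i.e. was pushed.
head_is_pushed() {
    [ -n "$(git -C "$1" branch -r --contains HEAD 2>/dev/null)" ]
}

# Hypothetical hook body: run the same test inside every submodule.
# `git submodule foreach` stops with a non-zero status at the first failure.
check_submodules_pushed() {
    git submodule --quiet foreach \
        'test -n "$(git branch -r --contains HEAD)" || { echo "unpushed commit in $PWD" >&2; exit 1; }'
}
```

Calling `check_submodules_pushed` from `.git/hooks/pre-commit` (or `pre-push`) would abort the operation whenever a submodule has local-only commits.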


Or a simple git or shell alias which would do the submodule update when you want it. That would solve most of the sources of complaints in the article and in this thread.

However, most likely you don't want an automatic submodule update because of all the issues there are. It would only be useful for tracking a very stable slow moving external dependency.


Agree with most of the posters, that Git submodules are very hard to work with.

The key to submodules is that you should not update them on a regular basis. A good example is the Gitflow project that uses the shFlags repo as a submodule.

A small gotcha is that you need to use --recursive when cloning the repo, so that you get the submodule cloned as well.
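
For illustration, the two usual ways to end up with populated submodules, wrapped in functions whose names are made up here: clone recursively from the start, or fill in the submodules of a clone that was made without `--recursive`.

```shell
# Hypothetical wrappers; the function names are illustrative.

# Clone repository $1 into directory $2, including all nested submodules.
clone_with_submodules() {
    git clone --recursive "$1" "$2"
}

# Populate the submodules of an existing clone at $1 that was made
# without --recursive.
init_submodules() {
    git -C "$1" submodule update --init --recursive
}
```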


I've also tried and failed several times to use submodules. Life would be a lot easier for my situations if the parent could always point to the head of the child instead of a specific revision.


No it would not be a good idea for the submodules to be always pointing to the HEAD of their repos. It might make sense for syncing with some very stable external projects but that is a very limited use case.

I need to have a consistent set of all the submodules I'm working with. I need to reliably get the exact same versions of all the modules in the big repository. This allows me to do a "git bisect" to search for problems in submodules.

Doing "git submodule update" is not as bad as the OP suggests it is. It makes perfect sense to have it as it is.


I think there's usefulness in being able to specify both things. I like the way it is now for a default, but like you say for some stable branch of another repository, it would be nice if submodules could be updated automatically.


The problem with automatically running "git submodule update" is that it might require network access, which may be down and/or slow. And the latest head might not be compatible with the rest of your software. Since there is no smart way of managing these problems, it's best not to do that implicitly.


I think you misunderstand my intent. I'm not saying to run that automatically, I'm saying to point a submodule to the head of a branch of another repository, not a specific commit.


You could specify a directory in .gitignore and create a script that clones/pulls the repository of your choosing into that directory.


This is essentially what I do. A separate target in my build script fetches external libraries with simple git clone/checkout commands just as if you were doing it from the shell. I normally have it check out a particular commit but if I wanted to check out the HEAD it would just be a simple modification of the command. Whatever your intentions regarding external dependencies, better to have them stated explicitly in a script that sits alongside your source files than lost in the metadata of your version control system, I say. Mind you, my requirements are probably quite simple compared to some people here, and I've gone through a lot of aggro in the past to settle on a comfortable build system (based on Fabricate, http://code.google.com/p/fabricate/) that I use wherever possible; when I was still using Make I didn't ever enjoy editing those fragile makefiles.
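
A minimal sketch of that approach as a plain shell function rather than a Fabricate target. The function name, URL, path, and revision below are placeholders, not anything from the poster's actual build script:

```shell
# Hypothetical helper: clone dependency $1 into directory $2 (or update an
# existing clone there), then check out the explicit revision $3.
fetch_dep() {
    if [ -d "$2/.git" ]; then
        git -C "$2" fetch --quiet origin    # existing clone: get new commits
    else
        git clone --quiet "$1" "$2"         # first run: clone from scratch
    fi
    git -C "$2" checkout --quiet "$3"       # pin to the requested revision
}
# Example (all values are placeholders):
#   echo 'vendor/' >> .gitignore
#   fetch_dep https://example.com/lib.git vendor/lib 1a2b3c4
```

Passing a remote branch name such as `origin/master` instead of a commit ID would follow that branch's latest fetched state, which is the "simple modification" for tracking the head.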


> No it would not be a good idea ...

Did you notice I specifically said "my situations"? I wasn't talking about your situations, and it appears that the existing mechanism suits your needs. It does not suit mine and has been so problematic that I had to abandon use of submodules.


Absolutely. I certainly understand why it works the way it does. Especially when dealing with 3rd party modules, it's good to be explicit about which version of a submodule should be included. But having a setting that allows me to keep a submodule at HEAD at all times would certainly help me save quite a few keystrokes for my own internal libraries - most of which I quite regularly find myself crawling up the tree to add, commit, and push.


gitslave probably is closer to what you want


The problem with all wrappers and extensions is that you're no longer using vanilla Git, which probably breaks other tools like GUI clients. Plus the wrappers and extensions come with their own caveats, so it's still perfectly possible to get the repo into some broken state, only this time your scenario is even more esoteric. I so wish submodules got a better treatment in the vanilla Git.



