I kind of think of the relationship as a pointer into what should be an entirely separately managed project. Changes to that project should happen in that project, and only then should the "pointer" be modified. I usually go as far as checking out the submodule separately (to my "projects" directory) when I need to work on it, which is of course entirely unnecessary, but for me it helps keep that separation.
Another approach I've found that helps combat the issues some people have with them is to not have them littered throughout a project, but to keep a very clear delineation between submodule files and the "root" project's git-managed files, i.e. an appropriately named, usually top-level, directory.
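As a rough illustration of that delineation, here is a minimal Python sketch that reads the text of a `.gitmodules` file and flags any submodule living outside one agreed top-level directory. The `vendor/` directory name, the sample URLs and the helper name `submodules_outside` are all made up for the example, and it assumes the usual layout with one `path` key per section.

```python
import configparser

# Hypothetical .gitmodules content; names and URLs are placeholders.
SAMPLE_GITMODULES = """\
[submodule "vendor/libfoo"]
    path = vendor/libfoo
    url = https://example.com/libfoo.git
[submodule "tools/helper"]
    path = tools/helper
    url = https://example.com/helper.git
"""


def submodules_outside(gitmodules_text, allowed_dir="vendor"):
    """Return submodule paths that are not under allowed_dir/."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    prefix = allowed_dir.rstrip("/") + "/"
    stray = []
    for section in parser.sections():
        path = parser.get(section, "path", fallback=None)
        if path is not None and not path.startswith(prefix):
            stray.append(path)
    return stray


if __name__ == "__main__":
    # Lists the submodules that break the "everything under vendor/" rule.
    print(submodules_outside(SAMPLE_GITMODULES))
```

Something like this could run in CI to keep the convention from eroding, though that is just one way to enforce it.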
In practice I find them an extremely useful way of getting stuff done effectively and efficiently!
Your hacks are certainly worth exploring. But they are also solutions to why other people get pickled. Good for you. Not so for them.
I'd be curious to know if you use the CL or a UI-based git tool? It would seem to me - once you get to the necessity / complexity of something that entails submodules - seeing would be believing. Trying to juggle a detailed picture in your head, __and write good code__, __and get your git commands right__ is not exactly a recipe for success.
In many ways git is a great tool. But the ways in which it is not expose its Achilles heel as we edge up on the second decade of the 21st century.
p.s. thx for sharing. I'm going to see about using your approaches.
On the surface, the commands are fairly verbose. And (as the article pointed out) the documentation on it could use some love.
However, the biggest problem I have encountered while using it is that the code I actually have checked out is opaque without doing the song and dance of actually checking out each submodule and inspecting the directory. This sucks when evaluating the codebase without cloning, e.g. in code review on GitLab/GitHub. And the commands/UX for updating them are just painful and error-prone.
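A partial workaround, sketched in Python under the assumption that you already have the text output of `git ls-tree -r HEAD` from the parent repository: submodules appear there as entries with mode `160000` and type `commit`, so the pinned SHA can be read without checking out the submodule itself. The sample listing, the SHAs and the function name `pinned_submodules` are invented for illustration.

```python
# Hypothetical `git ls-tree -r HEAD` output; SHAs and paths are placeholders.
SAMPLE_LS_TREE = (
    "100644 blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad\tREADME.md\n"
    "160000 commit 9fceb02d0ae598e95dc970b74767f19372d61af8\tvendor/libfoo\n"
    "100755 blob 8ab686eafeb1f44702738c8b0f24f2567c36da6d\tscripts/build.sh\n"
)

GITLINK_MODE = "160000"  # mode git uses for submodule (gitlink) entries


def pinned_submodules(ls_tree_text):
    """Map submodule path -> pinned commit SHA from ls-tree output."""
    pins = {}
    for line in ls_tree_text.splitlines():
        if not line.strip():
            continue
        # Each line is "<mode> <type> <sha>\t<path>".
        meta, _, path = line.partition("\t")
        parts = meta.split()
        if len(parts) != 3:
            continue
        mode, obj_type, sha = parts
        if mode == GITLINK_MODE and obj_type == "commit":
            pins[path] = sha
    return pins
```

This only tells you which commit is pinned, not what that commit contains, so it does nothing for the code review case.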
Really, the difference between putting something in a git submodule and putting something in a private npm/maven/etc. repo is that I can look at a file and read:
Whereas if I look at a repo with a submodule, I just see a URL and a SHA. And while actually I think that this is a better model for keeping track of versions, it's terrible from a UX perspective.
There's also a whole host of tooling around your language's chosen artifact/package/dependency management that git submodules don't have yet. It's often supported by the same tools we use for compiling and other task running. Git submodules require another thing that lives outside of that ecosystem.
I kind of wish that a mainstream language would just adopt git submodules as part of its de facto package management strategy and build on top of it the tools we need to make it livable.
Every language has a package management system already. Git should be made easier to use, not harder. I see the utility of submodules to power users, but they're not really necessary, and they increase the complexity of git by adding yet another abstract concept.
Love you git, but you should be working on removing features, not adding them.
It also sucks once you start building projects that are multi-language/ecosystem. It's why so many front-end technologies have just sucked it up and adopted NPM, because it has the most market share and continuing to bifurcate the package ecosystem is too cumbersome for consumers.
It would be much better to solve it /once/, with a tool that provides the lower level constructs. Then tools could be built on top of that to serve specific needs & improve the UX.
Maybe git submodules could be that? Whether it could or not, though, I do think that providing the ability to manage the versions of my dependencies in the same tool that I manage the versions of my own project... makes a lot of sense!
I think of them as "Make done right".
Agree. The general concept is sound and needed but the execution / implementation is subpar.
Providing software solutions is difficult enough. Your tools shouldn't add friction.
Git is not a dependency manager, git is not a dependency manager, git is not a dependency manager, git is not a dependency manager.
Seriously, use the package manager provided by your build system. Some of them can point to git repos if you don't have a proper package registry, and this is a better solution than plain submodules for many reasons: transitive dependencies, diamond dependencies, semantic versioning, no way to forget `submodule update --recursive`, etc.
Git is about source code. How would you use the package manager to manage source code?
The original author seems to be worried that some part of the system is not being updated to the newest available source before being built. I don't know of any system that would somehow automate that process.
The suggested submodule system might work as long as branch names don't change. But in real life they do change, old branches go out of maintenance. So in the end there must be a human making sure you don't miss anything.
Subversion with svn:external does exactly this by default, and generally you only peg a revision in an external dependency when you tag for release.
Well, one could argue that diamond dependencies are just bad design and that transitive dependencies are just submodules of submodules, which leads to the conclusion that using submodules is like pinning the entire dependency tree to a particular commit. So for a given commit there is only one way to download the entire tree, and it is cryptographically secure without trusting third parties.
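To make the diamond case concrete, here is a small Python sketch, assuming each project pins its submodules as (URL, commit) pairs. It walks the tree and reports any repository that ends up pinned at more than one commit. The data structure, URLs and short SHAs are hypothetical, and keying nested projects by URL alone is a simplification, since in reality the nested pins depend on which commit is checked out.

```python
from collections import defaultdict

# Hypothetical pinned tree: project -> list of (submodule URL, pinned commit).
# The URLs double as keys for the nested projects.
PINS = {
    "app": [
        ("https://example.com/a.git", "aaa111"),
        ("https://example.com/b.git", "bbb222"),
    ],
    "https://example.com/a.git": [("https://example.com/common.git", "c0ffee1")],
    "https://example.com/b.git": [("https://example.com/common.git", "c0ffee2")],
    "https://example.com/common.git": [],
}


def conflicting_pins(pins, root):
    """Return {url: sorted commits} for repos pinned at more than one commit."""
    seen = defaultdict(set)
    visited = set()
    stack = [root]
    while stack:
        project = stack.pop()
        if project in visited:
            continue
        visited.add(project)
        for url, commit in pins.get(project, []):
            seen[url].add(commit)
            stack.append(url)
    return {url: sorted(commits) for url, commits in seen.items() if len(commits) > 1}
```

With plain submodules both copies of `common` would simply be checked out side by side, each at its own commit, which is the "pin the whole tree" behaviour described above; a package manager would instead have to pick one version or fail.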
I still wouldn't recommend submodules if your build system has packages, but it's an interesting idea nonetheless.
Cabal runs a dependency solver to satisfy most dependencies, but for compiler-provided things like `base` it simply checks whether or not the requirement is satisfied by the compiler being used (if not, it bails out).
The next worst thing about submodules is that when you add a submodule to a project, and that submodule has submodules of its own, it's completely unobvious how to perform the submodule addition recursively. The `git submodule add` command doesn't recognize `--recursive`. IIRC the way you work around this is via the magic incantation `git submodule update --init --recursive` after adding the submodule that has its own submodules.
I really like submodules conceptually but the current UX surrounding their implementation is awful.
But I agree, maybe it should be turned on by default.
A random git user with just the URL for my submodule-using repository isn't going to know to use some special thing to clone the repository. They're going to run `git clone URL` and then be frustrated by the results.
Of course there's a little more than just this wrong with submodules, but given how much hate they get, you'd think someone would be interested in actually fixing them.
The problem is the requirement of prior knowledge about the repository's use of submodules. The plain clone won't even report any sort of feedback about the submodules being present and skipped, nor any hint as to how to retrieve them: `git submodule update --init --recursive`.
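For illustration, the missing hint is not hard to produce. The Python sketch below assumes you have the text output of `git submodule status`, where a leading `-` marks a submodule that has not been initialized, and builds the message a plain clone could have printed. The sample output and the function names `uninitialized_submodules` and `clone_hint` are hypothetical, not anything git ships.

```python
# Hypothetical `git submodule status` output with one uninitialized submodule.
SAMPLE_STATUS = (
    "-9fceb02d0ae598e95dc970b74767f19372d61af8 vendor/libfoo\n"
    " 3b18e512dba79e4c8300dd08aeb37f8e728b8dad vendor/libbar (v1.2.0)\n"
)


def uninitialized_submodules(status_text):
    """Return paths whose status line starts with '-' (not initialized)."""
    missing = []
    for line in status_text.splitlines():
        if line.startswith("-"):
            # Line shape: "-<sha> <path>"; split once so paths with spaces survive.
            _sha, _, path = line[1:].partition(" ")
            missing.append(path)
    return missing


def clone_hint(status_text):
    """Build the hint a plain clone could have printed, or '' if none is needed."""
    missing = uninitialized_submodules(status_text)
    if not missing:
        return ""
    return (
        "Submodules not checked out: " + ", ".join(missing) + "\n"
        "Run: git submodule update --init --recursive"
    )
```

A project could wire something like this into its build script so that the first failed build at least says what to run.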
The presence of git-subrepo and `git clone --recursive` is of little consequence from the perspective of the many users who are now familiar with the ubiquitous `git clone URL`.
Does github tell people to use git-subrepo to clone a given repository? Hell, does github even tell people to add `--recursive` when a repository uses submodules? I haven't checked, but don't recall ever seeing it do so in the past.
These are not difficult technical issues, it's just the sad state of the submodules UX in git. I presume it will improve eventually.
> Questions or comments for the Git community can be sent to the mailing list by using the email address firstname.lastname@example.org. Bug reports for git should be sent to this mailing list.
This also makes it easier for the git devs to focus on actual development and not explain git to people unwilling to use Google (the same people are too lazy to use email).
I can't even count the number of times we brought on a reasonably decent programmer who wrote decent working code but didn't have a clue about git, which resulted in the work being a mess of commits across multiple branches and forks.
To make git more powerful, the developers should make it easier to learn, not add power-user features.
On the flip side every single person I’ve met that is either too stupid or too ignorant to learn the basics of using git also writes terrible code.
> every single person I’ve met that is either too stupid or too ignorant to learn the basics of using git also writes terrible code.
With git, a beginner programmer struggling to bang out some code now has to learn a whole additional system, just to save and share their code. Now, instead of teaching them on the language/product, you have to spend your time teaching them git. Of course experts know how to use git, but everyone else has to spend a week-plus learning a system that's not directly related to their job responsibilities.
It's somewhat similar to a lawyer learning Microsoft Word, an accountant learning Excel, or teaching a draftsman how to hold a pencil... sure, the good ones know how to do this already, but people would move so much faster if they could jump into things sooner.
Agree. However, the difference is that __all__ of those have a universal de facto UI. Even when you write code, you see the result.
On the other hand, (CL-based) git requires you to keep a bunch of extra stuff in your head. You have to know which questions (read: commands) to ask in order to "see" what's going on, etc.
We live in a UI / UX world. In that context (CL-based) git feels like a fax machine. Couldn't / shouldn't there be something better? If disruption is such a great thing, why does git get a free pass with "this is how we've always done it"?
The irony baffles me.
> why does git get a free pass with "this is how we've always done it"?
I'd argue that the git monoculture is a much more recent phenomenon than we remember, and driven largely by the hegemonic status of GitHub in pop-dev culture. Now the possibility of using varied tools seems remote because everyone wants to be in the same place as everyone else.
Even the way we talk about teaching and onboarding new devs is couched in terms of GitHub, and GitHub alone.
The basic feature set is pretty much the same, and requires similar effort to grok. Some things are much harder to do in Mercurial, and some things are just confusing (multiple heads, pushing branches, multiple branching models, the "tip", etc.). Online platforms are worse (Bitbucket vs Github) in practice.
However this also reveals the awkwardness of submodules:
* Everyone who clones the git repo either forgets or doesn't know that they have to do 'git submodule init' (and maybe update too? even I'm forgetting ...). So everyone asks us on IRC why it "doesn't work" and has to be told what to do. IMO git clone should also clone the submodules and set things up.
* You cannot add downstream patches to Linux this way. For a while we needed a small bug fix which wasn't in the Linux submodule (it's controlled by another company) so we had an awkward workaround in the build system to copy the whole submodule and apply the patch before building.
* The link is to a non-fast-forward branch of Linux, so the commit hash sometimes becomes invalid, which we don't notice until someone is trying to build it from scratch. You'd think that because both the module and submodule are hosted on github, that github wouldn't garbage collect the old commit hash, but that's apparently not how it works.
I don't know if there's some tool which solves this use case better, but git submodules are what we have.
Use `git submodule update --init --recursive` to combine init and update, and to do both recursively.
Or use `git clone --recursive`. Add that to your README's getting started section. It sounds a bit like you aren't providing a good onramp for contributors.
> * You cannot add downstream patches to Linux this way...
> * The link is to a non-fast-forward branch of Linux, so the commit hash sometimes becomes invalid... You'd think that because both the module and submodule are hosted on github, that github wouldn't garbage collect the old commit hash, but that's apparently not how it works.
Commits will not get GCed if you have a ref pointing at them. Relying on commits that do not have a ref means they are not part of the official history of that project. You should fork linux and have a branch where you land your custom changes. You can track whatever upstream you want, third party or mainline linux, and rebase your changes on top of them.
I don't understand why you cannot add your own patches though? Just need to push your fork somewhere available and then ref that in your submodule?
In the sense of working with git submodules, it can be quite painful sometimes, as most HN readers here know. The process of installing and updating them, in my experience, does not follow the principle of least astonishment. When dealing with a project using multiple submodules and switching between different branches, it’s difficult to get the right combination of submodule dependencies on the first try.
Submodules are a tool like any other and should be used as such. For us that usually means we want a semantic separation between two projects, but in reality only one project uses the other for a while, so it makes sense to take the lower overhead of a submodule dependency over a "real" dependency.
So now I'm working on a project to write a single binary that can be installed as a server-side git hook, which would publish a subdirectory as its own repo and then sync commits and branches between the two.
Of course, it's not perfect. It won't be able to handle signed commits or tags. And the repositories have to be on the same box (it uses symlinks, but that could in theory be worked around at the cost of a more complex setup script).
But it should be completely transparent to the repository users.
means for most people using the repo they don't have to worry about the other repos.
Same with Pip (the Python package installer thingy) BTW: it's just a hack somebody put together to scratch an itch.
It's too late to delete the comment. Eternal proof of my pitiful human fallibility.