Git Subtree merged into mainline git (github.com/gitster)
113 points by moonboots on May 4, 2012 | 38 comments



If you want to know what the git subtree command is, see http://apenwarr.ca/log/?m=200904#30


Additionally useful: Text file explaining the command and usage (from the commit): https://github.com/gitster/git/blob/634392b26275fe5436c0ea13...

Examples are at the bottom.


Way more useful than the linked article. Many thanks.


Subtree > Submodule in so many ways. Git submodule is a mess and now a legacy of git.

I've been using subtree in a fork of Git personally, but I can't (safely) use it at work with everyone else (we still use submodules there).

This is going to be a problem for github for reasons I put in my blog a few months back: http://zbowling.github.com/blog/2011/11/25/github/

Because Github has their explicit, top down, a-fork-is-only-a-fork-by-clicking-the-fork-button kind of graph between projects, using subtree won't work easily with their online tools to do pull-requests and see the network graph of forks.

Basically, my repo is going to have the histories of 7 different projects combined in one repo, and from that single repo I will be pushing changes back to all 7 (and to other versions of those 7). Github's pull request feature is going to have trouble with that concept because they make the invalid assumption of a single upstream.

There are workarounds for sure, like pushing your changes to a staging repo before finally doing a pull request upstream, but that is cumbersome. I'll probably write a shell script to automate it.


> Git submodule is a mess and now a legacy of git.

I hear this kind of thing all the time, but I don't understand why. I've never had any problem whatsoever with git submodule. Can you elaborate a bit on what you mean by this?


Off the top of my head:

Ever tried removing a submodule? Why is there no git submodule rm? (No, "it's confusing" is not a valid argument.)

If the pull fails (say, from the wrong protocol), you end up with a half-baked submodule. Impossible to move, impossible to remove, impossible to update unless you dive into the config.

Misspell the folder you wanted the submodule in? That sucks. Go delete it from two separate locations, and try again!
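For the removal pain described above, here is a minimal sketch of a one-step removal. It assumes a git newer than this thread (1.8.5 or later, where git submodule deinit exists and git rm also drops the .gitmodules entry) and that the submodule's name equals its path, which is the default. remove_submodule is a hypothetical helper, not a git command.

```shell
# Hypothetical helper; assumes git >= 1.8.5 and submodule name == path.
# Run from the top level of the superproject, then commit the result.
remove_submodule() {    # usage: remove_submodule <path>
  sm="$1"
  # Unregister the submodule from .git/config and empty its directory
  git submodule deinit -f -- "$sm"
  # Remove the gitlink from the index and its section from .gitmodules
  git rm -f -- "$sm"
  # Delete the submodule's repository that git keeps under .git/modules
  rm -rf ".git/modules/$sm"
}
```

Usage: remove_submodule vendor/lib, then git commit. A misspelled path can be fixed the same way: remove it, then add the submodule again at the right path.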


If that is the worst problem -- one that is easy to solve and easy to repeat the few times you need it -- then all the more love for submodules. I'll take that over any other available solution any day (I don't count subtrees as 'available' yet).


Yes these problems suck.

Sometimes I have deleted my entire repository and cloned it again to update the submodules just because I changed the .gitmodules file.

Note: This actually takes fewer commands than messing around with .git/config.
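Re-cloning after a .gitmodules edit can usually be avoided. A minimal sketch, assuming git 1.8.1 or later (for sync --recursive); refresh_submodules is a hypothetical helper name:

```shell
# Hypothetical helper: apply edits made to .gitmodules (e.g. a new URL)
# to an existing clone. Run from the superproject's top level.
refresh_submodules() {
  # Copy each URL from .gitmodules into .git/config and into the
  # submodule's own "origin" remote
  git submodule sync --recursive
  # Clone any submodule that is missing and check out the recorded commits
  git submodule update --init --recursive
}
```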


Speaking of git submodule rm, here's a basic script I use for that [1], based on Stackoverflow's answer [2].

1. https://github.com/byrongibson/scripts/blob/master/git-rm-su...

2. http://stackoverflow.com/questions/1260748/how-do-i-remove-a...


Ever had an upstream git submodule move or shut down? I can't go back 6 months in my SCM and rebuild exactly what I had because the repo is gone now. This is bad and goes against the concept of having an SCM in the first place.

There is a horrible habit of people forking projects on github just so their submodules stay stable. It's broken.


  > There is a horrible habit of people forking
  > projects on github just so their submodules
  > stay stable. It's broken.
People are forking projects on github in that way because it's a quick, easy, cheap way to create a mirror/hosted copy of the repository they want to use.

I don't see how this is an example of git-submodule being broken. If you want to use code that is under someone else's control in your repository without keeping your own backup of it, then you're the one taking the risk.

If the git-submodule is mission-critical to you then you should either:

1. Always keep a separate mirror of the 3rd party repo.

or

2. Mirror the 3rd party repo to a hosted location, so that your submodule can point to the mirror instead of the source (basically creating a caching layer under your control).

This is no different from the guy who keeps all of his email in Gmail and, when Google shuts down his account, complains that 'email is broken' because it's possible for this to happen.
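Option 2 (pointing the submodule at a mirror you control) can be scripted. A minimal sketch, assuming the submodule's name equals its path (the default) and that the second argument is the URL of an empty repository you own; mirror_submodule is a hypothetical helper name:

```shell
# Hypothetical helper; assumes submodule name == path and an empty target repo.
mirror_submodule() {    # usage: mirror_submodule <path> <hosted-url>
  sm="$1"
  hosted="$2"
  upstream=$(git config -f .gitmodules --get "submodule.$sm.url")
  scratch=$(mktemp -d)
  # Full bare copy of the upstream: every branch and tag
  git clone -q --mirror "$upstream" "$scratch/mirror.git"
  git -C "$scratch/mirror.git" push -q --mirror "$hosted"
  rm -rf "$scratch"
  # Point .gitmodules at the mirror, then propagate the URL to .git/config
  # and to the submodule's own "origin" remote
  git config -f .gitmodules "submodule.$sm.url" "$hosted"
  git submodule sync -- "$sm"
  git add .gitmodules
}
```

To keep the cache fresh, re-run git push --mirror from a mirror clone whenever you want to pick up upstream changes.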


Sure, which is why I explained the workaround. Subtree includes the full history of the other project without workarounds.


subtree also includes the full history of the other project in your project's history unless you squash all of the commits (and if you squash them, then you haven't stored the full history). There might also be licensing reasons for wanting to keep the repositories separate, though not always.
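The trade-off described above maps onto git-subtree's --squash flag. A minimal sketch, assuming git-subtree is installed (it lives in contrib/); the helper names and arguments are hypothetical:

```shell
# Hypothetical helpers; assumes git-subtree is installed.
# usage: import_full|import_squashed <repo-url> <branch> <prefix>
import_full() {
  # Every upstream commit becomes part of this project's history
  git subtree add --prefix="$3" "$1" "$2"
}
import_squashed() {
  # Upstream history collapsed into one synthetic commit, which is then merged in
  git subtree add --prefix="$3" --squash "$1" "$2"
}
```

Importing a library with three commits into a project with one commit gives five commits the first way (project, three upstream, merge) and three the second way (project, squash, merge).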


Which is why you should only use submodules with repositories under your control. At my work we happily use git submodules. Much better than svn externals and much better than any alternative that existed so far.


>There is a horrible habit of people forking projects on github just so their submodules stay stable. It's broken.

Yes I do this and I hate doing it!


It's been merged into the contrib/ directory, so it's not installed by default. Saying it's merged into the mainline has the connotation that it's available like any other tool; it isn't.


So this looks pretty awesome. I can see myself using it a lot. There is one workflow I use with submodules that I don't see how to do with subtrees:

Clone a repo to the target dir. Add it as a submodule. Decide to play with a branch of the submodule, so switch to that branch in the cloned repo. Then, if I decide to work with that, update my submodules; otherwise, switch back.

Is that sort of workflow available in subtrees? How do I do it?


I've never used git in-the-large, which may be why I don't understand:

Why must some other library/module be part of my project? Why not reference and maintain the external lib/mod externally? We've been doing that for decades. It seems a solved problem.

Bringing an external library within the fold of your project feels like unnecessary coupling.


Dependency control is a solved problem, which is why SVN has externals, Hg has hgsubs, and Git has submodules and now subtrees. This isn't a git-in-the-large problem; it's an almost-any-nontrivial-project problem. It only seems unnecessary until you've had it and tried to live without it.

Bringing other libraries into my project is beneficial because then my build system is able to wrangle those just as easily as my code. (And it's also simply useful in the case where I've written both modules but maintain a separation for whatever reason--one is an open-source project and the other is not, whatever--to be able to make changes to one from within the other, run the tests for the submodule, and push it up to staging or upstream, without having to leave my current project.)


Sometimes when I'm at work I forget that not all languages have maven.


I really like maven (and run my own repo for my own projects), but it has its own problems in the latter case I described.


Yes exactly.

Once you have divided your project into independent modules, you have agreed that changes in these modules are going to have minimal impact on each other, so what exactly is the point of merging the history of all those changes? In my use case, that would actually create a bigger mess.

Now, I think this could perhaps be useful if there are modules that I have forked from elsewhere and the fork is going to be used only in my project. Although even then, I don't see any downside to using submodules.

The downside of submodules is the lack of good commands. For example, a command to check out a different branch in each of my submodules: the branch that is used in this project. This could be done using git submodule foreach, but it's not trivial.

EDIT: In this thread, zbowling makes a great argument against submodules:

There is a horrible habit of people forking projects on github just so their submodules stay stable
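The per-project branch checkout mentioned above can be done with foreach in a few lines. A minimal sketch, assuming each project records its branch under the submodule.<name>.branch key in .gitmodules (a real key in later git versions); submodules without that key are left alone, and checkout_project_branches is a hypothetical name:

```shell
# Hypothetical helper; assumes branches are recorded as submodule.<name>.branch.
checkout_project_branches() {
  # foreach runs the quoted script inside each submodule, with $name and
  # $toplevel set by git
  git submodule --quiet foreach '
    if branch=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch"); then
      git checkout -q "$branch"
    fi
  '
}
```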


For instance because you have componentized your product into different repositories, use different combinations of the components in different projects, but are still actively developing many of them simultaneously on each of those projects. Having to build, deploy and fetch gems (or whatever other method of distribution your language uses) is cumbersome if development is still really active.

I don't think you should ever want to bring unreliable external components, like a random github project that you aren't actively contributing to, into your tree like that.


"For instance because you have componentized your product into different repositories, use different combinations of the components in different projects, but are still actively developing many of them simultaneously on each of those projects."

Funny, I thought this was the beginning of an argument agreeing with me. :)


One example would be:

  * https://github.com/altercation/solarized

  * https://github.com/altercation/vim-colors-solarized
The vim-colors-solarized repo is a subtree of the solarized repo. This is mostly a convenience for vim users that use something like pathogen + git-submodule to keep their plugins up-to-date. This way you can create a submodule @ ~/.vim/bundle/vim-colors-solarized and it would be the root of the bundle tree. If the vim colorscheme was only part of the larger repo, then users would be forced to create their own repos, or else do something like:

  git submodule add https://github.com/altercation/solarized .vim/bundle/.vim-colors-solarized
  ln -s .vim-colors-solarized/vim-colors-solarized .vim/bundle/


It's important to not only reference an external project (e.g. library) but also to reference a particular version of that library. (e.g. newer versions could remove deprecated methods which you are using, i.e. which weren't deprecated when you wrote your code.)

Different versions of your code could rely on different versions of the library (e.g. you update your code to a newer version of the library.) So which version of the library you rely on also needs to be version-controlled.
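With submodules, that version record is the gitlink the superproject commits. A minimal sketch, assuming the submodule's remote is called origin; pin_submodule is a hypothetical helper name:

```shell
# Hypothetical helper; assumes the submodule's remote is "origin".
pin_submodule() {    # usage: pin_submodule <path> <tag-or-commit>
  git -C "$1" fetch -q --tags origin
  git -C "$1" checkout -q "$2"
  # Stages the new gitlink (a commit id), not the library's files
  git add "$1"
  git commit -q -m "Pin $1 at $2"
}
```

Afterwards, git rev-parse HEAD:<path> shows the exact library commit that this version of your code was built against.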


It makes it a little easier to maintain one or more forks of some library. Without submodules, you must have a separate repo for that, and there's no clear connection to the repo of your particular project.

This gets even more fun when you have a second project that needs an incompatible fork of the same library.


Here's a blog post that illustrates how Git Subtree works in a visual way: http://psionides.eu/2010/02/04/sharing-code-between-projects...

So, to avoid duplicate commits, you'd either have to rebase the main branch or squash the commits before adding it back to the subtree (and lose history)?


Maybe I'm stupid, but gitster/git doesn't sound like the mainline for git.


It's the account of Junio C Hamano, who is the main maintainer of git.


and the changes are now also in git/git


I've been trying out git subtree and I understand its advantages. But I've wondered if there's a way to use it without adding the merge commits to the timeline when I update the subtree repository. Something similar to what git rebase does. (Maybe this is a dumb question and I'm not using git subtree right) :)


Honestly, I hope they actually do what the article title is suggesting and bring subtree into mainline. Congrats on making it into contrib, Avery.


Good explanations of subtree merge:

    http://progit.org/book/ch6-7.html
    http://help.github.com/subtree-merge/


git-subtree is not the same as a subtree merge. The author himself likes to point that out. A subtree merge is a one-time operation. git-subtree, on the other hand, enables you to continue to merge in upstream changes. You can also split the subtree (including its history) from your repo and make it a standalone repository again.
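For comparison, here is a minimal sketch of the subtree merge strategy import in the style of the Pro Git chapter linked above. It needs only core git (2.9 or later for --allow-unrelated-histories), has no way to split the directory back out, and subtree_merge_once is a hypothetical helper name:

```shell
# Hypothetical helper; assumes git >= 2.9. Run from your project's top level.
subtree_merge_once() {    # usage: subtree_merge_once <repo-url> <branch> <prefix>
  git fetch -q "$1" "$2"
  # Record a merge with the library's history without changing any files yet
  git merge -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
  # Read the library's tree into <prefix>/ in both the index and working tree
  git read-tree --prefix="$3/" -u FETCH_HEAD
  git commit -q -m "Merge $1 into $3/"
}
```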


Thank you for the clarification. Quoting the author:

“[Subtrees] are also not to be confused with using the subtree merge strategy. The main difference is that, besides merging the other project as a subdirectory, you can also extract the entire history of a subdirectory from your project and make it into a standalone project. Unlike the subtree merge strategy you can alternate back and forth between these two operations. If the standalone library gets updated, you can automatically merge the changes into your project; if you update the library inside your project, you can ‘split’ the changes back out again and merge them back into the library project.”

https://github.com/apenwarr/git-subtree/blob/master/git-subt...

I guess I will have to get used to the idea that linking external repositories is harder than with Subversion externals :) They have serious limitations, I know, but at least they are so simple.
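The round trip described in the quote (merge upstream changes in, split local changes back out) looks roughly like this. A minimal sketch, assuming git-subtree is installed; the vendor/lib prefix and the helper names are hypothetical:

```shell
# Hypothetical helpers; assumes git-subtree is installed.
# Run from the top of your project.
# usage: lib_add|lib_pull|lib_push <repo-url> <branch>
lib_add()  { git subtree add  --prefix=vendor/lib "$1" "$2"; }
# Merge new upstream commits into vendor/lib
lib_pull() { git subtree pull --prefix=vendor/lib "$1" "$2"; }
# Split vendor/lib's history back out and push it as a branch of the library
lib_push() { git subtree push --prefix=vendor/lib "$1" "$2"; }
```

Pushing to a new branch name (rather than the library's checked-out branch) keeps the split changes reviewable before they are merged upstream.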


Reading the docs now, this looks like excellent work. Thanks to everyone involved =)


I hope to see Android's 'repo' tool improved to (optionally) make use of this.



