

Rework git core for native submodules - sdqali
http://thread.gmane.org/gmane.comp.version-control.git/220047

======
drewcrawford
I am not a git maintainer, but as someone interested in improving submodules I
can try to summarize the thread.

Submodules are difficult to use in practice for a wide variety of reasons.
There are serious, complex proposals that have made it into git-contrib to
build a "better" submodule, but for various reasons these have produced
systems that merely make the tradeoffs in a different way that some people
prefer.

This is not like any of those proposals. His problem is that "git add", "git
diff", etc., don't "understand" submodules. It would be as if ls, cd, etc.
don't "follow" symlinks, so that you had to navigate to the correct directory
yourself before you can use standard unix tools.

This is a serious problem, but his solution is essentially "we should use
hardlinks instead of symlinks". That is, he wants to take the code that
understands submodules out of the individual tools, and pop them in the
filesystem somewhere where they are "shared" among more of the tools and don't
have to exist in any of them.

There are many objections to this proposal. The chief one seems to be that
this does not seem to directly address any particular problem. I think
Ramkumar perceives the reason git add/diff/rm don't support submodules as a
metaproblem: "it is too hard to add submodule support to an arbitrary tool".
Whereas the git maintainers are saying "It is _possible_ to add submodule
support to an arbitrary tool." So that's the initial standoff.

Another problem is that this requires a filesystem change, and the filesystem
is essentially the most stable part of git; changing it breaks compatibility
with other versions. If you read Linus's rants, you know that he generally applies an
enormous amount of scrutiny to breaking compatibility. And so from his desk,
you would need not just one clear benefit, but an overwhelming number of them,
to break the contract like this.

But what I suspect is the True Rejection here is that this will pan out like
all the proposals before it: to be different, but not strictly better, than
the current implementation. To return to the POSIX analogy: we have both
symlinks and hardlinks, and which one is better depends on what you are doing,
there is no "one true link". If you replace all the symlinks with hardlinks, I
think you will run into trouble with the hardlinks too.

Finally, it is unfortunate that the flamewar is about the monolithic patch
rather than about some of the principles that led to the patch. I think
Ramkumar has had (at least) two very good insights: that "git add" and friends
should understand submodules a lot better than they do, and also that they
should have this understanding by way of consuming some API that understands
them rather than incorporating separate code for submodules into every tool.
These strike me as a concrete improvement over the existing system, and I wish
that the energy that leads to huge unusable patches like this could be
redirected into usable ones.
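
To make the "some API that understands them" idea a little more concrete, here is a minimal Python sketch. It is not git's code and not anything from the patch: `parse_gitmodules` and `submodule_paths` are hypothetical names, the sample URL is made up, and the parser only handles the simple `[submodule "name"]` / `key = value` layout of a typical .gitmodules file. The point is only that one shared helper could answer "which paths are submodules?" for every tool.

```python
import re

# Sample .gitmodules content; the submodule name and URL are made up.
SAMPLE = '''\
[submodule "clayoven"]
    path = clayoven
    url = https://example.com/clayoven.git
'''

SECTION = re.compile(r'^\[submodule "(?P<name>[^"]+)"\]$')
KEYVAL = re.compile(r'^(?P<key>[A-Za-z][A-Za-z0-9-]*)\s*=\s*(?P<value>.*)$')


def parse_gitmodules(text):
    """Return {submodule name: {key: value}} for a simple .gitmodules file."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        section = SECTION.match(line)
        if section:
            current = modules.setdefault(section.group("name"), {})
            continue
        keyval = KEYVAL.match(line)
        if keyval and current is not None:
            current[keyval.group("key")] = keyval.group("value").strip()
    return modules


def submodule_paths(text):
    """Hypothetical shared helper: the paths that tools should treat as submodules."""
    return sorted(m["path"] for m in parse_gitmodules(text).values() if "path" in m)


print(submodule_paths(SAMPLE))
```

A tool in the style of "git add" would call only `submodule_paths`, while a tool that manages submodules would want the full mapping from `parse_gitmodules`.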

~~~
tinco

        The chief one seems to be that this does not seem to directly address any particular problem.
    

Except that you later say:

    
    
         I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool.
    

This is exactly the problem this solution solves. Instead of having a weird
configuration file in the working tree for something that should be an
integral part of the repository, there will be a generic system for adding
links. With this generic system in place it is much easier to implement "git
add" and friends support for submodules.

He repeatedly makes this clear but no one reacts to this point.

    
    
        But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation.
    

Implementing code in a different but not strictly better way that allows you
to more easily understand and extend your library is called refactoring. This
'True Rejection' is essentially rejecting the merit of refactoring code.

I also don't think that the hardlinks/symlinks analogy holds very well.
Hardlinks and symlinks are both features in their own rights. Having
submodules be defined as a weird file instead of as a part of your
repository's objects is a superficial change; he also states this. Everything
the current submodules do could be achieved using the proposed solution. (As
he repeatedly has to make clear to Linus and Junio)

~~~
drewcrawford
There is a complicated set of problems that are preventing us from
understanding each other. I am going to do my best.

> weird configuration file

One of the disputes here is that the maintainers are of the opinion that
config files are actually _good, on the face of them_. They point to examples
of well-settled uses like .gitignore to claim that config files are The Git
Way.

It may very well be that configuration files are in fact weird, or are weird
in this particular case, but since the convention is and has been for git's
history that config-files-are-good it would require a well-reasoned essay to
move the needle of discourse on this subject, not just to use "they are weird"
as a claim to prove something else.

> This 'True Rejection' is essentially rejecting the merit of refactoring
> code.

I don't want to get into a big meta-meta flamewar here, but there are many
people who _do_ reject the merits of refactoring working code, for some
definitions of "refactor", for some definitions of "working", and this has
been the subject of many popular essays, most notably Spolsky et al. This is
another place where moving the needle of discourse would require writing a
well-reasoned essay that quotes the appropriate authorities, and it is not
sufficient just to appeal to a particular view of the merits of refactoring as
a claim to prove something else.

> Hardlinks and symlinks are both features in their own rights.. [this] is a
> superficial change.

This is another one of those thorny semantic problems that are preventing us
from understanding each other. There is a _sense_ in which it is superficial,
and another sense in which it is a substantial change. If you are using "git
add", or are implementing it, it is a superficial change. If you are writing
subtree-merge or git-submodule or something that really needs to understand
the storage of submodules, it is substantial.

And so they _are_ both features in their own right, in the sense that:
git-add-and-friends will want to access things with a certain pattern, and
git-submodule-and-friends will want to access things in a very different pattern.
This is why I suspect the solution here is to have two distinct APIs, that
access the same underlying storage mechanism. And if it makes sense to
continue to support something very much like the old API, it probably does not
make sense to redesign the FS to look like the new API.

Of course, there is a lot of resistance in the git community to have two ways
to do the same thing. So when I say "I suspect the solution is to have two
APIs" I mean only that it would address most of the objections raised thus
far, not that it would actually be implemented in mainline.

> Everything the current submodules do could be achieved using the proposed
> solution. (As he repeatedly has to make clear to Linus and Junio)

And as Linus and Junio have repeatedly made clear, merely doing everything the
current implementation does is not within a few galaxies of meeting the burden
for breaking FS compatibility. The compatibility-break burden is extremely
high.

~~~
tinco
> I am going to do my best.

Great :)

> One of the disputes here is that the maintainers are of the opinion that
> config files are actually good, on the face of them. They point to examples
> of well-settled uses like .gitignore to claim that config files are The Git
> Way.

Yes but .gitignore only configures your git client, the gitsubmodules say
something about the repository instead. If that was the git way, wouldn't
branch names be in a .gitbranches as well?

> I don't want to get into a big meta-meta flamewar here, but there are many
> people who do reject the merits of refactoring

I might be an extremist on this topic, so it's good to just leave it be.

> This is why I suspect the solution here is to have two distinct APIs, that
> access the same underlying storage mechanism.

I agree, but I think Ram. is correct in asserting that both ways could be
achieved by having a link object with some configuration in it. (it could just
be the .gitmodules file moved to the .git directory for all the end users
care)

> The compatibility-break burden is extremely high.

I understand, and it should not be taken lightly. But no one was suggesting
this feature would be added to the master and shipped in the next release of
git. It could even be delayed until there is another compatibility breaking
change. Ram. never pretended his current work would be the final way of doing
it.

Thank you for elaborating your understanding of the discussion :)

~~~
drewcrawford
This is one of the nicest disagreements I have ever had. If we don't already,
we should compare notes and find something to work on together, because when
two people can disagree but still understand each other, that is where you
make progress on complex problems. :-)

> Yes but .gitignore only configures your git client, the gitsubmodules say
> something about the repository instead.

This feature is often used to configure the repository, and I in fact use it
that way. By way of example, <https://github.com/new> operates under the
assumption that you use .gitignore to configure a repository. Perhaps it is
best to say that config files offer flexibility in this dimension, whereas a
link file is more rigid.

> It could even be delayed until there is another compatibility breaking
> change.

I believe that perhaps the discussion on the point of backwards
incompatibility has been framed in a way that is nonproductive. Of course,
once one has decided on a course of action, it is proper to consider how to
reduce the impact of that decision. I agree with you that there are a wide
variety of harm reduction strategies available here.

But these inquiries only become relevant once one has decided that the patch
is in general an improvement in some dimensions. As an outside observer, I do
not see an improvement.

I can see the logic that _if_ it is true that git-add-and-friends have omitted
support for submodules _on the basis_ that such support is difficult, this
patch could solve that problem. But I have not been convinced of the premise;
there is no citation of the people who maintain the UI tools making claims of
difficulty. Furthermore, Junio seems to argue at least that add's behavior is
_by design_. I do not know enough about it to know if that is a sensible
design, but it does suggest to me that the problem with UI tooling is not a
function of implementation difficulty, but there is perhaps some design or
ideological reason for the behavior of these tools that explains the state of
them today.

The other problem that I have is as follows: if I accept the premise that the
trouble with git-add is a matter of implementation difficulty, it seems to me
that the trouble can be resolved at some other tool layer rather than in the
FS proper. So _if_ the hypothesis underlying the patch is correct, it seems to
me that one should adopt the implementation that doesn't break compatibility
over the implementation that does.

It is unfortunate that the matter of backwards compatibility was raised early
and vociferously in the thread, because as you have pointed out there is a lot
that can be done about backwards compatibility that doesn't address the real
merits of whether the idea is good or bad. (Although I can understand why
compatibility would be at the top of any maintainer's mind.) Perhaps this
exchange between Junio and Ram. is an example of two people being far enough
along their own lines of inquiry that they are having trouble making any sense
of one another.

------
niggler
"This is going nowhere. You're stuck at making the current submodule system
work, not answering my questions, diverting conversation, repeatedly asking
the same stupid questions, labelling everything that I say "subjective", and
refusing to look at the objective counterpart (aka, the code). It's clear to
me that no matter how many more emails I write, you're not going to concede.

I'm not interested in wasting any more of my time with this nonsense.

I give up."

[http://thread.gmane.org/gmane.comp.version-control.git/22051...](http://thread.gmane.org/gmane.comp.version-control.git/220514/)

~~~
Tobu
Hah, I thought you quoted a maintainer, but this is from the original
submitter. Cheeky.

~~~
tinco
It's not cheeky, it's desperate. He proposes an honest and well thought
through idea that he spent a lot of time on, and someone he looks up to just
behaves like a complete ass. Junio et al. do nothing but think against him,
instead of with him. They'd do better just not responding at all.

~~~
jedbrown
I followed the discussion and read it exactly opposite. Ram was putting the
cart in front of the horse ("I really need you to start reviewing the code
now." see replies [1,2]) and everyone else involved in the discussion wanted
to understand the benefits first. Junio was never dismissive of the idea, he
just requested a coherent argument of the benefits so that real issues could
be discussed. It is understood that submodules are not a smooth workflow in
many cases, but Ram's proposed change would be very disruptive and most stated
"benefits" of his design are red herrings.

[1] [http://permalink.gmane.org/gmane.comp.version-control.git/22...](http://permalink.gmane.org/gmane.comp.version-control.git/220275)
[2] [http://permalink.gmane.org/gmane.comp.version-control.git/22...](http://permalink.gmane.org/gmane.comp.version-control.git/220299)

~~~
kelnos
I agree with you on the tone of the conversation, but from my -- admittedly
biased -- view, submodules are an abomination, and any serious proposal to
come up with an alternative should be welcomed with open arms.

Ramkumar may have taken the questions directed at him the wrong way, but IMO
the questioner shares equal fault for that. Know (or learn) your audience, and
tailor your responses so you achieve a good outcome. "I'm super frustrated
and feel like I've wasted my time so I give up" is not a good outcome, for
_any_ of the parties concerned.

------
artagnon
Who's who, for those of you just joining in:

- Linus is the original author of Git, and he wrote it in April 2005. He
doesn't contribute anymore, and is rarely seen on the Git mailing list these
days (except when something like this happens). In number of patches, he's #4,
after Junio, Jeff, and Shawn.

- Junio is the maintainer of the Git project. He took over maintainership of
Git a few months after it was originally built, in July 2005.

- Jonathan is a very big contributor at #6. He doesn't focus on any one part
of the codebase, and contributes to a wide spectrum.

- Jens primarily contributes to submodule.c/ git-submodule.sh, the current
submodule implementation. Along with Heiko, he's one of the authorities on the
current submodule system.

- Ram is a small contributor. He started out in Jan 2010 with two GSoC
projects: one in 2010, and another in 2011 (neither were in submodules).

------
qznc
He wants to unify submodules and subtrees? Sounds fishy to me, since these are
for completely different use cases.

Submodules are for tying project parts together, where you have control over
all of them. For example, the clang compiler frontend could submodule the LLVM
backend. Both are under the LLVM project, so people usually work on both of
them at the same time. They should not be in the same repo, since LLVM also
has other users unrelated to clang.

Subtrees are for integrating external projects, which are not really under
your control, but you probably want to follow upstream developments. Since a
subtree includes all the repo data, you can cleanly check out, even if the
external origin repository vanishes.

~~~
scribu
`git subtree` seems like the perfect tool to complement `git submodule`.

Too bad it's not enabled by default:
<http://engineeredweb.com/blog/how-to-install-git-subtree/>

~~~
dustingetz
parent comment is out of date; git subtree is part of git since roughly git
1.8.

~~~
Tobu
It's part of git/contrib. Depending on the packaging you still need to enable
it manually.

------
Tobu
I've started the thread on Linus's first reply, and the guy is completely
unconvincing. He was after a quick feature improvement (I don't really know
what but Linus seemed to) and implemented it, but he gave little thought to
the overall design (either his or Git's).

Big meh, and I'm normally interested in the evolution of Git.

~~~
tinco
I don't know man, anyone who uses gitmodules knows that they are a pain and
really unlike anything else in git. This guy had an idea on how to improve it,
introduce a cool new basic object type to git and he even wrote a PoC, I'd say
hats off to that.

Linus makes a rather unconvincing argument against the system, saying the
current system allows submodules to be different for local sites. As if the
proposed system would not support that, and as if the current 'dirty
submodule' system is a better solution. He's being an absolute moron.

And Junio is just being very unproductive, he seems fully incapable of
inducing anything from the design Ramkumar proposes and fails to see
implications that anyone could see, even though he is a core git guy. And
frankly he's being an ass too.

What I see is someone enthusiastically trying to fix a core problem of git in
an ambitious but well constructed way, and a bunch of old guys just bashing
the life out of him.

I think he's better off just not asking Junio or Linus for advice and just keep
on hacking on his fork. I know I would use it.

~~~
snprbob86
> just keep on hacking on his fork. I know I would use it.

Correct me if I'm wrong, but isn't the problem that Linus brought up this: If
you introduce a new object type, you need to get it right. A new object type
would create non-backwards-compatible repositories, so you'd have a new
minimum Git version. If you were to use this fork, then everyone who checks
out your code would have to use it. Also, it would preclude tooling support
(eg GitHub). Once such important repository versioning decisions are made,
they can't be unmade. Git, at its core, is basically just a well designed
repository model.
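
A rough Python sketch of why a new object type is a compatibility matter: git names an object by hashing a header made of the type name, the size, and a NUL byte, followed by the content. The four known types below are git's; the type name "link", the placeholder payload, and the helper `readable_by_stock_git` are assumptions for illustration only and are not taken from the patch.

```python
import hashlib

# Object types that stock git understands.
KNOWN_TYPES = {"blob", "tree", "commit", "tag"}


def object_id(obj_type, content):
    """SHA-1 object id: hash of the header '<type> <size>' plus a NUL byte, then the content."""
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def readable_by_stock_git(obj_type):
    """Hypothetical check: would a git without the patch recognise this type?"""
    return obj_type in KNOWN_TYPES


payload = b"hello\n"  # placeholder content, not the real format of a link object
print(object_id("blob", payload))     # the id `git hash-object` computes for a blob
print(object_id("link", payload))     # same bytes, different type, different id
print(readable_by_stock_git("link"))  # False: the type is not in the known set
```

Because the type name is part of what gets hashed and stored, a repository containing such objects cannot simply be read by a git that predates the type.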

~~~
tinco
Yes, precisely :)

------
alexchamberlain
I would like to applaud this guy; he has got insightful and polite answers
from Linus.

~~~
k3n
I noticed that too; after getting my popcorn ready, I could find only mild
technical disagreements.

I'd be honored to have an idea shot down so mercifully by Torvalds.

~~~
alexchamberlain
As would I...

~~~
akkartik
I think both of you are being unfair. Find me an example when some newcomer
submits a patch and gets flamed by Linus. His flames tend to steer clear of
actual code (at least at the start), and of outsiders.

------
drtse4
This thread is a mess... and I'm not sure statements like this one "'git add'
should not go past submodule boundaries. I should not be able to 'git add
clayoven/' or 'git add clayoven/LICENSE'" are a good start. He gives a simplified
description of what he wants to do without going too in-depth about why that
path was chosen and starts coding right away.
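
Whatever one thinks of that rule, the boundary check it implies is small. A minimal Python sketch, assuming the tool already has a list of submodule paths; `owning_submodule` is a hypothetical helper name, not a function in git:

```python
from pathlib import PurePosixPath


def owning_submodule(path, submodule_paths):
    """Return the submodule path that contains `path`, or None if it is outside all of them."""
    target = PurePosixPath(path)
    for sub in submodule_paths:
        sub_path = PurePosixPath(sub)
        # Compare whole path components, so "clayoven-notes" is not mistaken for "clayoven".
        if target == sub_path or sub_path in target.parents:
            return sub
    return None


for candidate in ["clayoven/", "clayoven/LICENSE", "clayoven-notes/README", "README"]:
    print(candidate, "->", owning_submodule(candidate, ["clayoven"]))
```

A command could then refuse, or hand off to the submodule, whenever the result is not None.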

~~~
tinco
Why would he need to go in-depth about why that path was chosen, isn't it
obvious? The workflow he proposes is miles better than how gitmodules is
working now.

~~~
drtse4
Why? Simply to discuss it and evaluate alternatives that could be better. I'm
referring to the solution he proposed, not to the fact that git modules have a
lot of room for improvement. Considering that we are talking about git modules,
"miles better" is not really that hard to devise.

------
plorkyeran
Mostly unrelated to the topic, but I'm always amused by things like "teach
ce_compare_gitlink() about OBJ_LINK". I've never seen any other project that
anthropomorphizes the code like that, and I sort of like how it makes the
resulting changelog read.

~~~
davvid
_I've never seen any other project that anthropomorphizes the code like that_

Git's SubmittingPatches document says to use an imperative tone in commit
messages. That's why it reads the way it does.

------
stormbrew
So, I'm curious. In response to Linus' comment that "... .gitmodules was
always a bit of a hack, but it's a _working_ hack ...", does anyone who's
actually used them really feel that they are indeed a 'working' hack? I find
that whenever I interact with a git repo with submodules I spend an inordinate
amount of time wrangling them to do things they clearly weren't meant to do. I
find that most people I talk to about them have experienced the same. And then
I go and do something like try to use bisect in concert with them and I
basically want to shoot my computer.

Am I missing something?

------
richardwhiuk
I'd really like something like this to happen, but I agree that this set of
patches isn't likely to get included. Submodules are my biggest gripe with git
usage, and what persuades me not to suggest people roll git out more widely.
I've seen various strategies to avoid submodules (build scripts that clone sub
repos instead is one example alternative) but it'd be much nicer if there
was a One True Way which worked properly.

------
lnanek2
Sure would be nice. Sometimes I'm working on projects and get sent repos to
work on with all the deps missing, because people just cloned the deps into
subdirectories and git ignored them or something. Would be much better if they
had a .git in every folder like Subversion does nowadays instead of trying to
have a special root that includes and ignores certain children.

------
comex
Just to comment on one of the issues in the thread: not everyone uses a
command line editor or even an editor which can be easily invoked from the
command line (though I do), so requiring a special command, "git edit-link",
to edit some inherently textual data that seems to work perfectly well being
stored as a normal text file in the repository, is a little gross.

