See the History of a Method with Git log -L (calebhearth.com)
116 points by caleb_thompson 30 days ago | 25 comments

If you're curious how Git knows the syntax of different languages in order to support this kind of feature, take a look in https://github.com/git/git/blob/master/userdiff.c

Here's how support for Python and Ruby is defined:

        "^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$",
        /* -- */
        /* -- */
        "^[ \t]*((class|module|def)[ \t].*)$",
        /* -- */
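To get a feel for what those two patterns accept, here is a minimal sketch that runs them over a few sample lines. It assumes Python's `re` module behaves like Git's regex engine for these particular patterns, which is not guaranteed in general; the `context_line` helper and the sample lines are made up for illustration.

```python
import re

# Patterns copied from the userdiff.c excerpt above, written as raw strings.
PYTHON_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")
RUBY_FUNCNAME = re.compile(r"^[ \t]*((class|module|def)[ \t].*)$")


def context_line(pattern, line):
    """Return the captured context text if the line matches, else None.

    Hypothetical helper for illustration; not part of Git.
    """
    m = pattern.match(line)
    return m.group(1) if m else None


samples = [
    "class Foo(Base):",
    "    async def fetch(self, url):",
    "    x = define(1)",
    "module Billing",
]

for line in samples:
    # Print what each pattern would treat as the hunk context for this line.
    print(repr(line),
          "| python:", context_line(PYTHON_FUNCNAME, line),
          "| ruby:", context_line(RUBY_FUNCNAME, line))
```

Note that neither pattern knows anything about nesting or where a definition ends; each one only recognizes a line that opens a scope.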

it's a fantastic feature in theory. in practice, it's imprecise and error-prone, and I believe these regular expressions are probably why. I hadn't looked at the implementation before, but I approached it from the other end: I set up a bunch of test cases, and I was pretty disappointed.

there were two disappointments. first, `git log -L` seems to prioritize tracking blocks of code over lines of code. that's just a design choice I disagree with, so it wasn't a big deal. but it also lost track of lines of code for me quite often, and produced a number of false positives to boot.

to be fair, I haven't tried using `diff=LANG` (per a comment below), and that might get more reliable results.

Yeah, this has been my experience too. Easily confused by common constructions in some codebases, and that can make it almost completely useless. I would happily sacrifice a lot of speed to get a difftastic level of precision.

I've attempted something similar to your ast-search tool, but it instead iterates through git history, pulls out the relevant text and then provides the diff to the user.

It's a tricky problem because it sits somewhere between text, where a function name could get renamed and it's obvious because it is textually similar, and an AST where 'similarity' is a difficult concept.

I struggled to make it usable, but of course there's a module to do half of it that I didn't find initially - https://pypi.org/project/pyastsim/

Interesting. Also I am surprised at how short the list is!

Oh wow, this is very cool!

The way this works boils down to the following: by default, Git has a heuristic for determining the "context" of a diff hunk by looking for lines that start with certain non-whitespace characters. This context is printed out after the "@@" marker in the hunk header. Within git, this context is referred to as the "function name", but that's a bit inaccurate as the patterns will usually match other scopes like namespaces and classes.

Setting "diff=LANG" activates a different (regular expression) pattern which is used to identify context; for example, in Python, this will look for "class" and "def" keywords. Git ships with a bunch of built-in patterns (defined in https://github.com/git/git/blob/master/userdiff.c), and the "diff.LANG.xfuncname" config option can be used to specify a custom pattern.

-L can then be used to look for hunks which have context matching a certain pattern. For example, if you want to look for function "foo", you could use -L ':\bfoo\b:file.py' (note that if you don't use \b you'll get every function that contains the word foo). Also related is the -W flag, which will show the entire function/class/scope in the diff, again based on context.

Note some limitations: the matching is line-by-line, so it will pick up "context" from things like string literals and comments, and you will only get the first line of the context (so multi-line signatures will be truncated). Also, since -L takes a regular expression to match against the context line, you'll want to take care to use an appropriate pattern to avoid matching unwanted functions (e.g. use \b to avoid substring matches, or even "def foo(" to ensure you only match to methods and not to classes or parameter names).
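The limitations above can be sketched in a few lines of Python. This is a simplified model, not Git's code: it assumes a line is a candidate when it matches both the Python context pattern from userdiff.c and the user's regex, and it lists every candidate, whereas Git starts the range at the first matching one. The sample source and the `matching_context_lines` helper are hypothetical.

```python
import re

# Python funcname pattern as quoted from userdiff.c elsewhere in this thread.
FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")

SOURCE = '''\
def foo(x):
    return x

def foobar(y):
    """
    def foo is deprecated, call foobar instead
    """
    return y

class Handler:
    def run(self, foo):
        return foo
'''


def matching_context_lines(source, user_regex):
    """Lines that look like context AND match the user's regex (simplified model)."""
    user = re.compile(user_regex)
    return [
        line.strip()
        for line in source.splitlines()
        if FUNCNAME.match(line) and user.search(line)
    ]


# Bare "foo" also hits foobar, a docstring line, and a parameter name.
print(matching_context_lines(SOURCE, "foo"))
# Word boundaries drop foobar but keep the docstring line and the parameter.
print(matching_context_lines(SOURCE, r"\bfoo\b"))
# Anchoring on "def foo(" leaves only the real definition.
print(matching_context_lines(SOURCE, r"def foo\("))
```

The docstring line shows the line-by-line problem: it starts with `def`, so the pattern treats it as context even though it sits inside a string literal.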

See also https://stackoverflow.com/questions/28111035/where-does-the-... for a very comprehensive overview of this feature.

> Within git, this context is referred to as the "function name", but that's a bit inaccurate as the patterns will usually match other scopes like namespaces and classes.

Thank you for mentioning this (and the other details). The userdiff.c file was mentioned elsewhere in the thread, but I was doubting it since its regexes also matched classes, Perl POD blocks, etc. Good to have it clarified that it's the Git man pages that are inaccurate; that helps me understand this file (userdiff.c) and this feature better.

I love using the `-G` flag for tracking the history of any occurrence of a given regex across all directories/files. It feels more flexible than `-L`. As an example:

  git log \
    -G "$some_regex" \
    --patch \
    --stat \
    --source \
    --all \
    --decorate=full \
    --pretty=fuller \
    -- . ":(exclude)*.lock"

I often have to remind myself that the pathspec is defined here in the glossary: https://git-scm.com/docs/gitglossary#Documentation/gitglossa....
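For anyone wondering what `-G` actually selects: the Git docs describe it as looking for differences whose patch text contains added or removed lines matching the regex. Here is a rough Python model of that rule for a single file. It assumes `difflib` as a stand-in for Git's diff machinery, so the exact set of changed lines can differ from what Git computes; `changed_lines` and `would_select` are hypothetical names.

```python
import difflib
import re


def changed_lines(old_text, new_text):
    """Return the added and removed lines between two versions of a file."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes: "+ " added, "- " removed, "  " unchanged, "? " hint lines.
    return [d[2:] for d in diff if d.startswith(("+ ", "- "))]


def would_select(old_text, new_text, regex):
    """True if any added or removed line matches the regex (rough model of -G)."""
    pattern = re.compile(regex)
    return any(pattern.search(line) for line in changed_lines(old_text, new_text))


old = "timeout = 30\nretries = 3\n"
new = "timeout = 60\nretries = 3\n"

print(would_select(old, new, r"timeout"))  # the changed line mentions timeout
print(would_select(old, new, r"retries"))  # retries appears only in unchanged context
```

The second call shows why `-G` is narrower than a plain grep over history: a match in an unchanged line does not count.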

Huh, TIL Git can do this for non-C-like languages out of the box. Is there a reason these custom hunk handlers are not enabled by default for the languages Git ships support for?

This is a great capability. I wonder how many more are lurking within tools I use every day but whose man pages I've never read.

Git really needs an equivalent to Perforce's Timelapse View.

How can we handle function overloading case with this?

I had no idea about this. This is amazing

How can this be used with PHP?

According to `man gitattributes` [1], a `*.php diff=php` line in `.gitattributes` should be enough.

[1] https://git-scm.com/docs/gitattributes

It's just a regex to match the line you want to see the history of. Function would be a common case, but it's completely arbitrary.

For example, `git log -L:members:Cargo.toml` will show the history of constituent rust projects in a Cargo workspace.

It's a regex? Is it even guaranteed to work then? I would think you have to parse non-regular languages to always find the end of a function in some languages?

Indeed, it does not always work correctly for Julia, as an arbitrary example I tried. Seems like it goes by indentation? Still nice though and worked out of the box!

There's a comment about that here: https://github.com/git/git/blob/bc5204569f7db44d22477485afd5...

    When writing or updating patterns, assume that the contents these
    patterns are applied to are syntactically correct.  The patterns
    can be simple without implementing all syntactical corner cases, as
    long as they are sufficiently permissive.

Wow, that file must be paradise for regex nerds, assuming there are any such...

there are, I am, and it's not (sorry). some languages have the ability to comment regexes, and that would be very useful here.

It has lots of comments inside the regexes. What would better comment support look like?

These are not technically "comments inside the regexes", that would be something like the "Delete (most) C comments." regex here: https://perldoc.perl.org/perlre#/x-and-/xx

Here, instead, they've used string juxtaposition cleverly to write comments between parts of the regex/string. It effectively serves the same purpose though.
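The difference between the two styles is easy to show in Python, which supports both. This is only an analogy to the C source: the first form mirrors string juxtaposition with comments between the pieces, and the second uses `re.VERBOSE`, Python's counterpart to Perl's `/x`. The Ruby pattern is the one quoted earlier in the thread; the split points and comments are my own.

```python
import re

# Style 1: adjacent string literals with comments between the pieces,
# analogous to the juxtaposed C strings in userdiff.c.
JUXTAPOSED = (
    r"^[ \t]*"              # leading indentation
    r"((class|module|def)"  # keyword that opens a scope
    r"[ \t].*)$"            # whitespace, then the rest of the line
)

# Style 2: comments inside the regex itself via re.VERBOSE.
# Whitespace outside character classes is ignored in this mode.
VERBOSE = re.compile(
    r"""
    ^[ \t]*                 # leading indentation
    ((class|module|def)     # keyword that opens a scope
    [ \t].*)$               # whitespace, then the rest of the line
    """,
    re.VERBOSE,
)

for line in ["  def total(items)", "module Billing", "defined?(x)"]:
    # Both styles should agree on every line.
    print(repr(line), bool(re.match(JUXTAPOSED, line)), bool(VERBOSE.match(line)))
```

In the first style the comments belong to the host language and never reach the regex engine; in the second they are part of the pattern string and the engine skips them.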

there are, I am, and it is. (Except for the proliferation of backslashes due to C not having "raw" strings.)

It's just to match the (starting) line. I assume the +n context uses the same method diffs do anywhere else, however that works (and sometimes doesn't).
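On where the range ends: the `git log` docs describe `-L:<funcname>:<file>` as the range from the first funcname line that matches, up to the next funcname line. A minimal Python sketch of that rule, assuming the Python pattern from userdiff.c and ignoring whatever extra handling real Git applies (blank lines, for instance); `funcname_range` and the sample lines are hypothetical.

```python
import re

# Python funcname pattern as quoted from userdiff.c earlier in the thread.
FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def funcname_range(lines, user_regex):
    """Return (start, end) as 1-based inclusive line numbers, or None."""
    user = re.compile(user_regex)
    start = None
    for i, line in enumerate(lines):
        # First line that is both a context line and a match for the user's regex.
        if FUNCNAME.match(line) and user.search(line):
            start = i
            break
    if start is None:
        return None
    end = len(lines)  # default: run to the end of the file
    for j in range(start + 1, len(lines)):
        if FUNCNAME.match(lines[j]):
            end = j  # stop on the line just before the next context line
            break
    return (start + 1, end)


lines = [
    "def first(a):",
    "    return a",
    "",
    "def second(b):",
    "    def helper(c):",
    "        return c",
    "    return helper(b)",
    "",
    "def third():",
    "    pass",
]

print(funcname_range(lines, r"\bfirst\b"))
print(funcname_range(lines, r"\bsecond\b"))
print(funcname_range(lines, r"\bthird\b"))
```

Under this model the nested `def helper` cuts the range for `second` down to a single line, which is the kind of imprecision people describe elsewhere in the thread.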
