Here's how support for Python and Ruby is defined:
"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$",
/* -- */
/* -- */
"^[ \t]*((class|module|def)[ \t].*)$",
/* -- */
there were two disappointments. first, `git log -L` seems to prioritize tracking blocks of code over lines of code. that's just a design choice I disagree with, so it wasn't a big deal. but it also lost track of lines of code for me quite often, and produced a number of false positives to boot.
to be fair, I haven't tried using `diff=LANG` (per a comment below), and that might get more reliable results.
It's a tricky problem because it sits somewhere between text, where a function name could get renamed and it's obvious because it is textually similar, and an AST where 'similarity' is a difficult concept.
I struggled to make it usable, but of course there's a module to do half of it that I didn't find initially - https://pypi.org/project/pyastsim/
The way this works boils down to the following: by default, Git has a heuristic for determining the "context" of a diff hunk by looking for lines that start with certain non-whitespace characters. This context is printed out after the "@@" marker in the hunk header. Within git, this context is referred to as the "function name", but that's a bit inaccurate as the patterns will usually match other scopes like namespaces and classes.
Setting "diff=LANG" activates a different (regular expression) pattern which is used to identify context; for example, in Python, this will look for "class" and "def" keywords. Git ships with a bunch of built-in patterns (defined in https://github.com/git/git/blob/master/userdiff.c), and the "diff.LANG.xfuncname" config option can be used to specify a custom pattern.
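To make the context lookup concrete, here is a minimal Python sketch of the idea, not Git's actual implementation. The pattern is the Python one from userdiff.c quoted elsewhere in this thread, transcribed to Python's `re` syntax; `hunk_context`, `PY_FUNCNAME`, and the sample `source` are hypothetical names made up for illustration, and scanning upward from a single line index is a simplifying assumption about how Git picks the header.

```python
import re

# Python funcname pattern from Git's userdiff.c, transcribed to Python's re syntax.
PY_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def hunk_context(lines, start):
    """Return the nearest matching line strictly above index `start`, or None.

    Simplified model (an assumption): scan upward from where the hunk starts
    and take the first line that matches the funcname pattern.
    """
    for i in range(start - 1, -1, -1):
        m = PY_FUNCNAME.match(lines[i])
        if m:
            # Group 1 drops the leading indentation, like the text after "@@".
            return m.group(1)
    return None


source = [
    "class Greeter:",
    "    def __init__(self, name):",
    "        self.name = name",
    "",
    "    async def greet(self):",
    "        return 'hi ' + self.name",
]

print(hunk_context(source, 2))  # def __init__(self, name):
print(hunk_context(source, 5))  # async def greet(self):
print(hunk_context(source, 0))  # None
```

Note that the enclosing `class Greeter:` is never reported for a change inside a method, because the nearest match wins.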
-L can then be used to look for hunks which have context matching a certain pattern. For example, if you want to look for function "foo", you could use -L ':\bfoo\b:file.py' (note that if you don't use \b you'll get every function that contains the word foo). Also related is the -W flag, which will show the entire function/class/scope in the diff, again based on context.
Note some limitations: the matching is line-by-line, so it will pick up "context" from things like string literals and comments, and you will only get the first line of the context (so multi-line signatures will be truncated). Also, since -L takes a regular expression to match against the context line, you'll want to take care to use an appropriate pattern to avoid matching unwanted functions (e.g. use \b to avoid substring matches, or even "def foo(" to ensure you only match to methods and not to classes or parameter names).
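The limitations above can be sketched in Python. This is only an illustration under stated assumptions: `matching_contexts` and the sample `source` are hypothetical, the pattern is the Python one from userdiff.c transcribed to Python's `re`, and the function lists every candidate line, whereas Git starts the range at the first funcname line that matches.

```python
import re

# Python funcname pattern from Git's userdiff.c, transcribed to Python's re syntax.
PY_FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")


def matching_contexts(lines, name_regex):
    """Return every context line (per the funcname pattern) matching name_regex."""
    found = []
    for line in lines:
        m = PY_FUNCNAME.match(line)
        if m and re.search(name_regex, m.group(1)):
            found.append(m.group(1))
    return found


source = [
    "def foo(x):",
    "    return x",
    "def foobar(y):",
    "    return y",
    "class Foo:",
    "    def bar(self, foo):",
    "        pass",
    'USAGE = """',
    "def foo in a docstring",  # inside a string literal, but still matches
    '"""',
]

# Plain substring: also hits foobar, a parameter name, and the docstring line.
print(matching_contexts(source, r"foo"))
# Word boundaries drop foobar, but the parameter and docstring still match.
print(matching_contexts(source, r"\bfoo\b"))
# Anchoring on "def foo(" leaves only the real function.
print(matching_contexts(source, r"def foo\("))  # ['def foo(x):']
```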
See also https://stackoverflow.com/questions/28111035/where-does-the-... for a very comprehensive overview of this feature.
Thank you for mentioning this (and the other details). The userdiff.c file was mentioned elsewhere in the thread, but I was doubting it since its regexes also matched classes, Perl POD blocks, etc. Good to have it clarified that it's the Git man pages that are inaccurate; that helps me understand this file (userdiff.c) and this feature better.
git log \
-G "$some_regex" \
-- . ":(exclude)*.lock"
For example, `git log -L:members:Cargo.toml` will show the history of constituent Rust projects in a Cargo workspace.
Indeed, it does not always work correctly for Julia, as an arbitrary example I tried. Seems like it goes by indentation? Still nice though and worked out of the box!
When writing or updating patterns, assume that the contents these
patterns are applied to are syntactically correct. The patterns
can be simple without implementing all syntactical corner cases, as
long as they are sufficiently permissive.
Here, instead, they've used string juxtaposition cleverly to write comments between parts of the regex/string. It effectively serves the same purpose though.
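The same trick works in Python, which also concatenates adjacent string literals. A small sketch, using the Python pattern from userdiff.c as the example (the split points, the comments, and the `FUNCNAME` name are mine, not Git's):

```python
import re

# Adjacent string literals are joined into one string, so comments
# can sit between the pieces of a single regex.
FUNCNAME = (
    r"^[ \t]*"                      # leading indentation
    r"((class|(async[ \t]+)?def)"   # keyword: class, def, or async def
    r"[ \t].*)$"                    # whitespace, then the rest of the line
)

# The pieces add up to exactly the one-line pattern.
assert FUNCNAME == r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$"

print(re.match(FUNCNAME, "async def fetch(url):").group(1))  # async def fetch(url):
```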