
Show HN: FixCache – keep track of bug-prone files from Git commit history - aavshr
https://github.com/aavshr/fixCache
======
nmstoker
Looks really cool, although it's a shame the description isn't a bit more
"plain language" - a simple summary of what it is intended for, before going
into jargon, would help people realise why it's valuable. The fact it uses a
cache isn't immediately important: I doubt anyone is saying "I hope I can find
a solution which uses a cache", they'll be saying e.g. "I'm struggling with QA,
what can streamline it?".

The paper's introduction section is more useful, but I wonder if people will
get that far. Likewise the repo screenshot is useful but it's way down the
README page.

Anyway, these are just well intentioned comments. It's easy to lose track that
potential users won't get what you've been building until you pitch it in
their terms.

~~~
aavshr
I agree, I will update the README to showcase its intended use and value
more.

Thanks for the input!

------
aavshr
Hello HN, this is a GitHub app implementation of
FixCache ([https://people.csail.mit.edu/hunkim/images/3/37/Papers_kim_2...](https://people.csail.mit.edu/hunkim/images/3/37/Papers_kim_2007_bugcache.pdf)),
which I built as a side project that might prove useful for pull
request reviewers.

The app maintains a fixed-size cache of bug-prone files from fix-commits and
updates a pull request with information about the cache if these bug-prone
files have been updated in the pull request.

------
remram
I read the whole README but have no idea what this does. Is it letting you
know when you change files that have had a large number of bug fixes recently?
With the assumption that a code location that was recently fixed is likely to
have more bugs?

Wouldn't the fact that a location was fixed recently imply that it now has
fewer bugs? And wouldn't a location that hasn't been touched recently be
likely to be problematic?

~~~
aavshr
The core assumptions of the algorithm:

- if a file introduced a bug recently, it will tend to introduce bugs again
- new files added with the bug-introducing file will tend to introduce bugs
- other files changed with the bug-introducing file will tend to introduce bugs
- files often changed together with the bug-introducing file will tend to
  introduce bugs soon

fixCache maintains a fixed-size cache of these bug-prone files based on bug
fix-commits. This helps in prioritizing verification and testing resources
(right now it only updates a pull request with a comment and a label). If a
file no longer introduces bugs, it will eventually be evicted from the
cache.
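
To make the cache idea concrete, here is a minimal sketch in Python. It is not the app's actual code: the class name `BugProneCache`, its methods, and the least-recently-fixed eviction rule are all assumptions made for illustration (the paper discusses several replacement policies).

```python
from collections import OrderedDict


class BugProneCache:
    """Hypothetical fixed-size cache of file paths (illustration only)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._files = OrderedDict()  # path -> number of fix commits seen

    def record_fix_commit(self, changed_files):
        """Load every file touched by a bug-fix commit into the cache."""
        for path in changed_files:
            hits = self._files.pop(path, 0) + 1
            self._files[path] = hits  # re-insert at the most-recent end
            if len(self._files) > self.capacity:
                # Assumed policy: evict the least recently fixed file.
                self._files.popitem(last=False)

    def flagged(self, pr_files):
        """Return the files in a pull request that are currently cached."""
        return [path for path in pr_files if path in self._files]


cache = BugProneCache(capacity=2)
cache.record_fix_commit(["auth.go", "db.go"])
cache.record_fix_commit(["cache.go"])  # cache is full, so auth.go is evicted
print(cache.flagged(["auth.go", "db.go", "cache.go"]))  # ['db.go', 'cache.go']
```

A file that stops showing up in fix commits drifts to the old end of the cache and drops out once enough other files have been fixed.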

~~~
remram
Are you sure it's not discovering bad developers?

Maybe critical areas, e.g. that have the same amount of bugs as the rest but
are complained about more? (since the algorithm can only consider bugs that
have been reported, so biased to areas important to users) Or maybe that are
prioritized by management? (since it considers _fixed_ bugs, so bias towards
bugs that were fixed first)

Hopefully increased scrutiny on new patches to those areas leads to fewer
bugs getting in, which breaks the feedback loop, but if bugs are fixed in
separate commits this sounds like it could have negative effects (specific
developers/areas getting all the attention, leading to the discovery of more
bugs/nitpicks there, reinforcing the bias...).

------
rcthompson
Based on the existence of the config option "FIX_KEY_WORDS", it seems like
this detects bugs just using keyword matching? Github also tracks when issues
are fixed by commits (and pull requests, I think). Is it possible to use that
metadata in addition to (or instead of) keyword matching? (Does Github even
make that information available through an API?)

~~~
shoo
One way to support this would be to make FixCache support an optional hook
that would be responsible for deciding if a given commit was a bug fix or not.
Naively, it could have an interface like "IsBugfix(commit SHA1Hash) bool". The
hook could be implemented as a function call to a user-defined plugin or
something, or perhaps by executing an external process (e.g. shell exec
"my_custom_isbugfix.sh abcd1234")

Then users could write a custom hook that looked up the given commit using
github's APIs or whatever other crazy scheme your team uses for bug tracking
(e.g. cross reference JIRA ticket number baked into commit message with JIRA
and look at the type of the JIRA ticket), but FixCache itself could be kept
clean and pure from these integrations, which most users wouldn't want.

Github has lots of APIs, I'd bet it is possible to do this with github
provided the data defining the relationship between the commit and the bug is
encoded somehow -- either in commit message or in github issue or PR metadata
or comment text.
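
A rough sketch of what such a hook could look like, written in Python for brevity. Everything here is hypothetical: the function names, the `(sha, message)` signature, the keyword list (the real FIX_KEY_WORDS values are not shown in this thread), and the convention that exit status 0 from the external script means "bug fix".

```python
import re
import subprocess

# Hypothetical keywords; not the app's real FIX_KEY_WORDS values.
DEFAULT_KEYWORDS = ("fix", "fixes", "fixed", "bug")


def keyword_is_bugfix(sha, message, keywords=DEFAULT_KEYWORDS):
    """Default hook: whole-word, case-insensitive keyword match."""
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, message, re.IGNORECASE) is not None


def external_is_bugfix(command):
    """Build a hook that runs `command <sha>` as an external process."""
    def hook(sha, message):
        # Assumed convention: exit status 0 means "this commit is a bug fix".
        result = subprocess.run([*command, sha], check=False)
        return result.returncode == 0
    return hook


def fix_commits(commits, is_bugfix=keyword_is_bugfix):
    """Keep the SHAs of (sha, message) pairs that the hook accepts."""
    return [sha for sha, message in commits if is_bugfix(sha, message)]


commits = [
    ("abcd1234", "Fix nil pointer in cache"),
    ("ef567890", "Add README screenshot"),
]
print(fix_commits(commits))  # ['abcd1234']
```

A team with its own tracker would pass something like `external_is_bugfix(["./my_custom_isbugfix.sh"])` as `is_bugfix`, and the core tool would never need to know about JIRA or the GitHub API.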

------
bjackman
Neato. How is "spatial locality" defined? Just distance in the filesystem
tree?

~~~
aavshr
With logical coupling, if two entities are changed together many times, they
get a shorter distance.

distance(e1, e2) = 1/c(e1, e2) if c(e1, e2) > 0, otherwise infinity, where
c(e1, e2) is the count of times e1 and e2 have been changed together.

I have not implemented spatial locality in v0, as it is a bit tricky to
identify the exact file that introduced a bug from a fix commit. For now, only
the files modified in a bug-fix commit are put in the cache.
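
For reference, the distance formula above can be sketched in a few lines of Python. This is only an illustration and is not part of v0; the helper names `co_change_counts` and `distance`, and the representation of history as a list of file lists, are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def co_change_counts(commits):
    """Count how often each unordered pair of files changed in one commit."""
    counts = Counter()
    for files in commits:
        for pair in combinations(sorted(set(files)), 2):
            counts[pair] += 1
    return counts


def distance(counts, e1, e2):
    """1/c(e1, e2) if the pair ever changed together, otherwise infinity."""
    c = counts[tuple(sorted((e1, e2)))]
    return 1 / c if c > 0 else math.inf


history = [["a.go", "b.go"], ["a.go", "b.go", "c.go"], ["c.go"]]
counts = co_change_counts(history)
print(distance(counts, "a.go", "b.go"))  # 0.5
print(distance(counts, "a.go", "d.go"))  # inf
```

Files that change together often end up close to each other regardless of where they live in the directory tree.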

