
Show HN: Statistical tool for analyzing a Git repository - arzzen
https://github.com/arzzen/git-quick-stats/
======
qznc
I made a tool myself [0] with a slightly different focus. It answers two
questions: Who are the relevant coders, and what parts of the code are the
hotspots?

Here is an example run on sqlite:

    
    
        Top Committers (of 28 authors):
        D. Richard Hipp      13359 commits during 19 years until 2019-09-17
        Dan Kennedy          5813 commits during 17 years until 2019-09-16
         together these authors have 80+% of the commits (19172/20987)
    
        Files with most commits:
        1143 commits: src/sqlite.h.in      during 19 years until 2019-09-16
        1331 commits: src/where.c          during 19 years until 2019-09-03
        1360 commits: src/btree.c          during 18 years until 2019-08-24
        1650 commits: src/vdbe.c           during 19 years until 2019-09-16
        1893 commits: src/sqliteInt.h      during 19 years until 2019-09-14
    
        Files with most authors:
        11 authors: src/main.c          
        11 authors: src/sqliteInt.h     
        12 authors: configure.ac        
        12 authors: src/shell.c         
        15 authors: Makefile.in         
    
        By file extension:
        .test: 1333 files
           .c: 379 files
         together these make up 80+% of the files (1712/2138)
    

[0] [https://github.com/qznc/dot/blob/master/bin/git-overview](https://github.com/qznc/dot/blob/master/bin/git-overview)
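A rough sketch of how a "top committers" cutoff like the one in the output above could be computed. This is not the code from git-overview; the function names and the 0.8 threshold parameter are hypothetical, and the only git call assumed is `git log --format=%an`, which prints one author name per commit.

```python
from collections import Counter
import subprocess


def top_committers(authors, threshold=0.8):
    """Return (selected, covered, total) for a list of author names.

    `selected` is the shortest list of (author, commit_count) pairs, taken
    in descending order of commit count, whose commits together reach
    `threshold` of all commits.
    """
    counts = Counter(authors)
    total = sum(counts.values())
    selected, covered = [], 0
    for name, n in counts.most_common():
        # Stop once the authors picked so far already reach the threshold.
        if total and covered / total >= threshold:
            break
        selected.append((name, n))
        covered += n
    return selected, covered, total


def authors_from_git(repo="."):
    """Hypothetical helper: one author name per commit, read from git log."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

With the sqlite numbers quoted above (13359 and 5813 commits out of 20987), the loop stops after the second author, since 19172/20987 is already above 80%.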

------
eatonphil
Pull Panda [0] is another tool we've been using. It is offered as a SaaS and
has been free since GitHub acquired it this year. It tells you average PR
review time, average PR diff size, who is most requested for review,
review-comment ratio, etc. I can't believe it's taken GitHub so long to make
progress on dashboards like this for engineering managers, but I'm looking
forward to the time Pull Panda is fully integrated.

[0] [https://pullreminders.com](https://pullreminders.com)

------
bastijn
I was hoping to find an open source alternative to CodeScene [0], a tool that
uses predictive analytics to find hidden risks and social patterns in your
code, all from your git repo (and some manual mapping of teams).
Unfortunately, that's not the case.

CodeScene was built on top of an open source tool, by the way. But the UI is
nice to have.

[0] [https://codescene.io/](https://codescene.io/)

~~~
nephrenka
CodeScene isn't built on the open source tool. CodeScene's engine is
implemented from scratch in order to deal with large-scale codebases. The
basic metrics are the same, but CodeScene adds plenty of information and
pattern detection on top of them. If you're interested, the story of CodeScene
is written down here:
[https://empear.com/blog/happy-birthday-3-years/](https://empear.com/blog/happy-birthday-3-years/)

------
adim86
I love this project because it brings up a big issue. There is a treasure
trove of information buried in git that isn't easily accessible or usable;
even for developers it is hard to make use of this data. That's why a friend
and I came up with Ship Scoop [0], which is in beta testing now. If anyone is
interested in testing it out and getting a forever-free account, hit me up.
The idea is to leverage information that is already being created to do more
for you.

[0] [https://www.shipscoop.com/](https://www.shipscoop.com/)

~~~
itake
Is this for teams that don't use a project management tool like JIRA?

------
surfsvammel
I love that it’s bash and not Go or Rust. Great stuff!

I must admit, I was playing around building something similar while learning
Rust, just for the fun of it. Now I probably won’t finish that:)

------
hbt
I wrote
[https://github.com/hbt/git-forks-analysis](https://github.com/hbt/git-forks-analysis)
to analyze a git repo and its forks.

I modified existing tools to scan all branches.

The most interesting tool is
[https://github.com/src-d/gitbase](https://github.com/src-d/gitbase) which
turns your git repo into a database you can query. This is more effective than
parsing git log output.
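For comparison, the "parsing git log output" approach might look like the minimal sketch below. It assumes the text comes from `git log --name-only --pretty=format:`, which prints the paths touched by each commit with blank lines in between; the function name is hypothetical and renames are not tracked.

```python
from collections import Counter


def commits_per_file(name_only_log):
    """Count how many commits touched each path.

    Expects the text printed by `git log --name-only --pretty=format:`,
    i.e. one path per line with blank lines separating commits. A path
    appears at most once per commit, so counting lines counts commits.
    """
    return Counter(
        line for line in name_only_log.splitlines() if line.strip()
    )
```

Calling `commits_per_file(text).most_common(5)` would then give a "files with most commits" list similar to the hotspot reports mentioned elsewhere in this thread.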

I keep a list of similar tools here:
[https://github.com/hbt/git-forks-analysis#other-git-data-min...](https://github.com/hbt/git-forks-analysis#other-git-data-mining-tools-worth-a-mention)

Happy to add yours.

------
bsg75
This and [https://pydriller.readthedocs.io](https://pydriller.readthedocs.io)
have been very useful for extracting project metrics embedded in Git.

