Show HN: Statistical tool for analyzing a Git repository (github.com)
120 points by arzzen 30 days ago | 9 comments

I made a similar tool myself [0] with a slightly different focus. It answers two questions: who are the relevant coders, and which parts of the code are the hotspots?

Here is an example run on SQLite:

    Top Committers (of 28 authors):
    D. Richard Hipp      13359 commits during 19 years until 2019-09-17
    Dan Kennedy          5813 commits during 17 years until 2019-09-16
     together these authors have 80+% of the commits (19172/20987)

    Files with most commits:
    1143 commits: src/sqlite.h.in      during 19 years until 2019-09-16
    1331 commits: src/where.c          during 19 years until 2019-09-03
    1360 commits: src/btree.c          during 18 years until 2019-08-24
    1650 commits: src/vdbe.c           during 19 years until 2019-09-16
    1893 commits: src/sqliteInt.h      during 19 years until 2019-09-14

    Files with most authors:
    11 authors: src/main.c          
    11 authors: src/sqliteInt.h     
    12 authors: configure.ac        
    12 authors: src/shell.c         
    15 authors: Makefile.in         

    By file extension:
    .test: 1333 files
       .c: 379 files
     together these make up 80+% of the files (1712/2138)
[0] https://github.com/qznc/dot/blob/master/bin/git-overview
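For anyone curious how a summary like the SQLite run above can be computed, here is a minimal Python sketch. It is not the git-overview code itself. It assumes the log was produced with `git log --pretty=format:@%an --name-only`, so each commit starts with a line `@<author>` followed by the files it touched, and that no file path starts with `@`. The function names (`parse_log`, `top_share`, `summarize`) are hypothetical, chosen for illustration.

```python
from collections import Counter, defaultdict


def parse_log(text):
    """Parse `git log --pretty=format:@%an --name-only` output into
    (author, [files]) tuples. Assumes no file path starts with '@'."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate commits
        if line.startswith("@"):
            commits.append((line[1:], []))
        elif commits:
            commits[-1][1].append(line)
    return commits


def top_share(counter, share=0.8):
    """Smallest run of most-common items whose counts together reach
    at least `share` of the total (the '80+%' line in the output)."""
    total = sum(counter.values())
    picked, running = [], 0
    for item, count in counter.most_common():
        picked.append((item, count))
        running += count
        if running >= share * total:
            break
    return picked, running, total


def summarize(text, n=5):
    """Top committers, files with most commits, files with most authors."""
    commits = parse_log(text)
    commits_per_author = Counter(author for author, _ in commits)
    commits_per_file = Counter(f for _, files in commits for f in files)
    authors_per_file = defaultdict(set)
    for author, files in commits:
        for f in files:
            authors_per_file[f].add(author)
    most_authors = sorted(authors_per_file.items(),
                          key=lambda kv: len(kv[1]), reverse=True)[:n]
    return {
        "top_committers": top_share(commits_per_author),
        "most_commits": commits_per_file.most_common(n),
        "most_authors": [(f, len(a)) for f, a in most_authors],
    }
```

The 80% cutoff mirrors the "together these authors have 80+% of the commits" line; it is a reporting choice, and `share` can be changed to taste.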

Pull Panda [0] is another tool we've been using. It's offered as SaaS and is now free after GitHub acquired it this year. It tells you average PR review time, average PR diff size, who is most requested for review, review-comment ratio, etc. I can't believe it's taken GitHub so long to make progress on dashboards like this for engineering managers, but I'm looking forward to Pull Panda being fully integrated.

[0] https://pullreminders.com

I was hoping to find an open-source alternative to CodeScene [0], a tool that uses predictive analytics to find hidden risks and social patterns in your code, all from your Git repo (plus some manual mapping of teams). Unfortunately, this isn't one.

CodeScene was built on top of an open-source tool, by the way, but the UI is nice to have.

[0] https://codescene.io/

CodeScene isn't built on the open source tool. CodeScene's engine is implemented from scratch in order to deal with large-scale codebases. The basic metrics are the same, but CodeScene adds plenty of information and pattern detection on top of them. If you're interested, the story of CodeScene is written down here: https://empear.com/blog/happy-birthday-3-years/

I love this project because it brings up a big issue: there is a treasure trove of information buried in Git that isn't easily accessible or usable, and even developers find it hard to make use of this data. That's why a friend and I came up with Ship Scoop [0], which is in beta testing now. If you're interested in testing it out and getting a forever-free account, hit me up. The idea is to leverage information that is already being created to do more for you.

[0] https://www.shipscoop.com/

Is this for teams that don't use a project management tool like JIRA?

I love that it’s bash and not Go or Rust. Great stuff!

I must admit, I was playing around building something similar while learning Rust, just for the fun of it. Now I probably won’t finish that:)

I wrote https://github.com/hbt/git-forks-analysis to analyze a git repo and its forks.

I modified existing tools to scan all branches.

The most interesting tool is https://github.com/src-d/gitbase which turns your git repo into a database you can query. This is more effective than parsing git log output.
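To illustrate the "repo as a database" idea without installing gitbase, here is a small Python sketch using the standard-library `sqlite3` module. This is not gitbase's schema or query interface: the `changes(author, file)` table is a hypothetical layout, and the rows are assumed to have been extracted from `git log` beforehand (one row per file touched per commit).

```python
import sqlite3


def load_changes(rows):
    """Load (author, file) pairs into an in-memory SQLite table.
    The `changes` table is a hypothetical schema, not gitbase's."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE changes (author TEXT, file TEXT)")
    conn.executemany("INSERT INTO changes VALUES (?, ?)", rows)
    return conn


def files_with_most_authors(conn, limit=5):
    """Rank files by the number of distinct authors who touched them;
    ties are broken alphabetically by file name."""
    return conn.execute(
        """
        SELECT file, COUNT(DISTINCT author) AS n_authors
        FROM changes
        GROUP BY file
        ORDER BY n_authors DESC, file
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    rows = [
        ("Alice", "src/main.c"),
        ("Bob", "src/main.c"),
        ("Carol", "src/main.c"),
        ("Alice", "Makefile.in"),
        ("Bob", "Makefile.in"),
        ("Alice", "README"),
    ]
    conn = load_changes(rows)
    print(files_with_most_authors(conn))
    # prints [('src/main.c', 3), ('Makefile.in', 2), ('README', 1)]
```

Once the history is in tables, new questions become one-line SQL changes instead of new text-parsing code, which is the advantage over scraping `git log` output directly.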

I keep a list of similar tools here https://github.com/hbt/git-forks-analysis#other-git-data-min...

Happy to add yours

This and https://pydriller.readthedocs.io have been very helpful for extracting project metrics embedded in Git.
