Personal Git Commit Statistics (stravid.com)
30 points by forknore on Sept 6, 2011 | 15 comments



I once spent a weekend hacking together something like this.

http://jarofgreen.co.uk/2011/04/introducing-creepycoder/

The code was done over several quick sessions (there's a visualisation of that in the link above) so it's really scrappy, but it pulls out the timestamps of a programmer's commits and presents them in several different formats. It uses the GitHub API or just the SVN log.

https://github.com/jarofgreen/CreepyCoder

I agree with you that LOC is not really a great metric.

Next I was going to look at separating variables in the data & comparing them, e.g. seeing two programmers' commit habits side-by-side.

I've also been to a talk by a company that pulled stats from commit logs, will try to find it and post it.


> http://jarofgreen.co.uk/2011/04/introducing-creepycoder/

hg has a pretty great extension shipped with it doing that kind of stats collecting: churn. Git might have something similar already, no?

For instance, your graphs 2 and 3 (commits by hour of day and commits by day of week) can be obtained through the following (tested on pypy's repo):

    > hg -R pypy churn -csf "%H"
    00    850 **************
    01    534 *********
    02    385 ******
    03    299 *****
    04    202 ***
    05    181 ***
    06    217 ***
    07    449 *******
    08   1336 *********************
    09   2421 ***************************************
    10   2893 ***********************************************
    11   3295 *****************************************************
    12   3277 *****************************************************
    13   3596 **********************************************************
    14   3915 ***************************************************************
    15   4105 ******************************************************************
    16   3955 ****************************************************************
    17   3812 *************************************************************
    18   3259 ****************************************************
    19   2091 **********************************
    20   1713 ****************************
    21   1630 **************************
    22   1466 ************************
    23   1225 ********************

    > hg -R pypy churn -csf "%u"
    1   6985 ************************************************************
    2   7741 *******************************************************************
    3   7594 ******************************************************************
    4   7404 ****************************************************************
    5   7470 *****************************************************************
    6   5213 *********************************************
    7   4699 *****************************************
(-c counts changesets rather than diff lines; -s sorts by aggregation key rather than by result count; -f aggregates on an strftime-like date format)
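
For Git, here is a rough Python sketch of the same hour-of-day histogram. It assumes you feed it the Unix author timestamps that `git log --pretty=format:%at` prints, one per line. The function names and the bar width are made up for illustration, and hours are computed in UTC, so the numbers may not line up with what churn reports for local commit times.

```python
from collections import Counter
from datetime import datetime, timezone


def commits_by_hour(timestamps):
    """Count commits per hour of day (UTC) from Unix timestamps."""
    counts = Counter(
        datetime.fromtimestamp(int(ts), tz=timezone.utc).hour for ts in timestamps
    )
    # Include hours with zero commits so the histogram always has 24 rows.
    return [counts.get(hour, 0) for hour in range(24)]


def render_histogram(counts, width=60):
    """Render one 'HH count ****' row per hour, scaled to the busiest hour."""
    peak = max(counts) or 1  # avoid dividing by zero for an empty history
    rows = []
    for hour, count in enumerate(counts):
        bar = "*" * round(count * width / peak)
        rows.append(f"{hour:02d} {count:6d} {bar}".rstrip())
    return "\n".join(rows)
```

To try it on a real repository, split the output of `git log --pretty=format:%at` on whitespace and pass the pieces to `commits_by_hour`, then print the result of `render_histogram`.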


Cool, never used hg, thanks. Git might have a module; I've never seen it if it does. However, keeping it separate from Git means we can pull in other data to build up a complete picture of the user's activity, such as the comments made in an issue tracker.



Maybe I'm missing something about the desired data, but why not just do something like:

    git shortlog -s | grep "My Name"
This shows how many commits you've made on a repo. Omit the grep to see everyone's counts.


That's a perfectly fine solution, but not exactly what I want. With this solution I would have to keep a list of repositories up to date. And in order to update my data I would always need to clone every repository, run the command and put all numbers together.

I like automation, so I looked for a different solution.


Fair enough. I wouldn't consider having a clone of the repo to be a blocker. You could very easily automate the cloning, reporting, deleting of a repo.
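
A minimal sketch of that automation in Python, assuming `git` is on the PATH and that you keep your own list of clone URLs. The function names are hypothetical, and matching authors by exact name is a simplification (the same person may commit under several names).

```python
import subprocess
import tempfile
from collections import Counter


def parse_shortlog(output):
    """Turn `git shortlog -s` output ("  2791<TAB>Name" per line) into a Counter."""
    counts = Counter()
    for line in output.splitlines():
        if not line.strip():
            continue
        number, name = line.strip().split("\t", 1)
        counts[name] += int(number)
    return counts


def commits_in_repo(url):
    """Clone `url` into a temporary directory, count commits per author, clean up."""
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(["git", "clone", "--quiet", "--bare", url, workdir], check=True)
        # HEAD is given explicitly: without a revision and without a terminal,
        # `git shortlog` reads a log from standard input instead.
        result = subprocess.run(
            [f"--git-dir={workdir}".join(["git ", ""]).split(" ")[0],
             f"--git-dir={workdir}", "shortlog", "-s", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    return parse_shortlog(result.stdout)


def commits_across_repos(urls):
    """Sum per-author commit counts over several repositories."""
    total = Counter()
    for url in urls:
        total += commits_in_repo(url)
    return total
```

Calling `commits_across_repos` with a list of clone URLs returns one combined count per author name, and the temporary clones are deleted as each `with` block exits.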


That's true. A big advantage of this solution over, let's say, a self-written post-commit hook is the past. It doesn't matter when you start collecting this data: thanks to Git, you will have data about yourself from the past. With a post-commit hook that's not the case.

I'm still not sure what's the best way to go, but thanks for your comments!


But then it's just a list of dates and it's hard to see any patterns - that was my motivation ... I wanted graphs, iCal data you can put on a calendar and so on.


A list of dates?

Using the command I showed above, I get the following output:

      2791	Ryan Funduk
Still, yeah, if you want data you could put on a calendar, I would instead suggest `git log` with a `--pretty=format:` string, whose output you could then parse pretty easily.

Docs: http://www.kernel.org/pub/software/scm/git/docs/git-log.html

E.g., here's a command that will output CSV with the abbreviated commit SHA, subject line, Unix author timestamp and author name:

    git log --pretty=format:'"%h", "%s", "%at", "%an"' | grep "Your Name"
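
Parsing that output could look roughly like the Python sketch below. It assumes lines exactly in the shape produced by the format string above; the function name and the dictionary keys are my own. Filtering on the author field, rather than with grep, avoids matching commits that merely mention the name in the subject. Subjects that contain double quotes are not escaped by Git here, so such rows may not parse as intended.

```python
import csv
from datetime import datetime, timezone


def parse_log_lines(lines, author=None):
    """Parse lines shaped like '"<sha>", "<subject>", "<unix time>", "<author>"'."""
    commits = []
    # skipinitialspace handles the space that follows each comma in the format.
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) != 4 or not row[2].isdigit():
            continue  # skip rows that did not split cleanly
        sha, subject, timestamp, name = row
        if author is not None and name != author:
            continue
        when = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        commits.append({"sha": sha, "subject": subject, "when": when, "author": name})
    return commits
```

Pass it the lines of the command's output (without the grep), plus `author="Your Name"` to keep only your own commits.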


Ahh, sorry, I wasn't familiar with the shortlog command.

I still want to be able to pick out patterns in the data tho. Getting the data out is only half the battle, then you have to be able to pick out something interesting in it.

If you see my other comment here, I already wrote some scripts that pulled timestamps into iCal files and graphs.
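
For anyone curious what the iCal side can look like, here is a minimal Python sketch. It is not the CreepyCoder code, just an illustration: the function name, the PRODID string and the `commits.example` UID suffix are placeholders, and long lines are not folded as RFC 5545 recommends.

```python
from datetime import datetime, timezone


def commits_to_ical(commits, prodid="-//example//commit-calendar//EN"):
    """Build a minimal iCalendar document with one event per commit.

    `commits` is an iterable of (sha, subject, unix_timestamp) tuples.
    """
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{prodid}"]
    for sha, subject, timestamp in commits:
        stamp = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
        utc = stamp.strftime("%Y%m%dT%H%M%SZ")
        # Escape the characters that are special in iCalendar TEXT values.
        summary = (
            subject.replace("\\", "\\\\")
            .replace(";", "\\;")
            .replace(",", "\\,")
            .replace("\n", "\\n")
        )
        lines += [
            "BEGIN:VEVENT",
            f"UID:{sha}@commits.example",
            f"DTSTAMP:{utc}",
            f"DTSTART:{utc}",
            f"SUMMARY:{summary}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # RFC 5545 uses CRLF line endings.
    return "\r\n".join(lines) + "\r\n"
```

Writing the returned string to a `.ics` file should give something most calendar applications can import, with each commit shown as an event at its commit time.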


I love this collection of Python scripts that provides Git stats for my projects. It's appropriately named, too: http://gitstats.sourceforge.net/


The question is how does he get all the data? I have a bash function that does a git commit, and sends off the message / repo name to a webserver that provides a reasonable interface for it.

http://cl.ly/3u3w1h3x2e1P1L3d0g31


You could use Ohloh. As an example, here is one of stravid's projects on Ohloh - https://www.ohloh.net/p/stravids_mapgenerator


If you don't mind having your statistics in public, have a look at Masterbranch http://www.masterbranch.com/

It also serves very well as a CV for programmers.



