
Amount of profanity in git commit messages per programming language - AndrewVos
http://andrewvos.com/2011/02/21/amount-of-profanity-in-git-commit-messages-per-programming-language/
======
edw519
_Out of 929857 commit messages, I found 210 swear words (using George Carlin's
Seven dirty words)._

That set only includes [shit, piss, fuck, cunt, cocksucker, motherfucker,
tits], so these are probably not meaningful results.

I have personally commented "asshole forgot to increment the counter" 527
times in 4 different languages.

[EDIT: 528 times in 5 different languages. Sorry, bitches.]

~~~
ZachPruckowski
Yeah, any viable list of swear words has to include "damn" (and derivatives),
"hell", and "ass" (and derivatives). I'd even go so far as to say that "crap"
and "retard" (and derivatives) are sufficiently unprofessional that they
belong on the list.
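
To make the word-list point concrete, here is a minimal sketch of how the count shifts with the list. It assumes whole-word, case-insensitive matching; the original script's matching rules are not described in this thread, and the sample messages and the `count_profanity` name are made up for illustration.

```python
import re

# George Carlin's seven words, as quoted above.
CARLIN_SEVEN = ["shit", "piss", "fuck", "cunt", "cocksucker", "motherfucker", "tits"]
# A few of the additions suggested in this thread (derivatives not included).
EXTENDED = CARLIN_SEVEN + ["damn", "hell", "ass", "crap"]


def count_profanity(messages, words):
    """Count whole-word, case-insensitive matches of `words` across `messages`."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(message)) for message in messages)


# Hypothetical commit messages, for illustration only.
sample_messages = [
    "Fix the damn build",
    "Shit, I missed a ]",
    "Tweaks the NUM_PATHS config value",
    "what the hell is this class",
]

print(count_profanity(sample_messages, CARLIN_SEVEN))
print(count_profanity(sample_messages, EXTENDED))
```

Whole-word matching keeps "class" from counting as "ass", but it also misses derivatives like "fucking" unless they are listed explicitly, so the matching rule matters about as much as the list does.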

~~~
reinhardt
Add "wtf" to this list and you match more than 50% of all commit messages ;-)

~~~
cema
Oh no, _wtf_ is professional terminology, cannot remove.

------
jakevoytko
I never curse in my commit messages. That doesn't mean I don't want to!
Cursing is a vice of mine, acquired through summers of cleaning bathrooms and
picking up trash at a state park in high school. I use euphemisms when coding
professionally, but it's easy to map my commit messages at old companies back
to my original swear.

"Blameless" bug:

    
    
       Original: Now recalculates the height of the container element after repopulating
           the content.
       Translation: Did Bob test this fucking thing ONCE before he committed this?
    

Fixing my own mistake:

    
    
       Original: Tweaks the NUM_PATHS config value.
       Translation: Wow, I apparently have shit-for-brains. I hope nobody ran a build in
           the past 20 minutes.
    

Overdesigning:

    
    
       Original: Updates the object creation code per Bob's feedback.
       Translation: Another Goddamn FactoryFactoryBuilder?! I officially don't 
           understand this codebase.
    

Major cleanup needed:

    
    
       Original: Style tweaks needed for GCC compilation.
       Translation: OMFG. This isn't even valid C++. It doesn't even compile.
    

OK, I'm not perfect:

    
    
       Original: Fuck IE7.
       Translation: No seriously, fuck IE7.

------
nostrademons
I'd kinda like to see _which_ swear words appear most often in commit
messages. I'm guessing that "shit" and "fuck" are much more common than
"cocksucker" and "motherfucker", and if that's not true, I want to know which
language has the most cocksuckers and motherfuckers.

~~~
AndrewVos
From what I remember there were only one or two "motherfuckers".

I will post up some more data if anyone is interested.

~~~
WalterGR
I run a slang dictionary website which lets users assign an offensiveness
score to each term. That would be an interesting bit of data to add: not only
the raw word count, but _how offensive_ the swearing is for each language.

(For sample data, the 100 most vulgar words on the site are in a table here:
<http://onlineslangdictionary.com/lists/most-vulgar-words/> )

------
stcredzero
How to offend members of 3 different programmer communities in 9 different
ways with just one sentence: "It somehow makes sense that C++, Ruby, and
JavaScript are all equally profane."

~~~
bad_user
Or that they are so relevant in 2011 :-)

~~~
stcredzero
If you think that's clever, then perhaps you're lacking a few bits of
programming language history. If you think it's only partly clever, you're
perhaps lacking fewer pieces. If the parent comment just makes you roll your
eyes and think it's probably not worth explaining, you probably understand
what I'm getting at.

------
twymer
Given that there were only 210 total swear words, the accuracy of this seems
pretty questionable. It's possible that one guy could be responsible for a
large percentage of swearing for a given language.
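
One way to check for that, sketched under the assumption that each swearing commit can be tied to an author (the dataset discussed here may not expose that; the names and counts in the usage line are hypothetical):

```python
from collections import Counter


def top_author_share(swearing_commits):
    """Return (author, share) for the author with the most swear words.

    `swearing_commits` is an iterable of (author, swear_count) pairs.
    `share` is that author's fraction of all swear words, from 0.0 to 1.0.
    """
    totals = Counter()
    for author, swear_count in swearing_commits:
        totals[author] += swear_count
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    author, count = totals.most_common(1)[0]
    return author, count / total


# Hypothetical data: one committer accounts for most of the swearing.
print(top_author_share([("alice", 6), ("bob", 1), ("alice", 2), ("carol", 1)]))
```

If the top share for a language comes out close to 1.0, that language's slice says more about one person than about the language.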

~~~
rosser
It's code comments vs. commit messages, but the prevalence of profanity in the
Linux kernel tree suggests developers' use of blue speech is pretty
widespread.

------
csphy
I want to know the proximity of the curse to 'IE' in the Javascript commits

------
snprbob86
Pie chart? I have no idea how to interpret this...

<http://www.flickr.com/photos/amit-agarwal/3196386402/sizes/l/>

~~~
DanielStraight
Pie charts have a lot of drawbacks, sure, but it's ridiculous that we're at
the point now where the first (and highest rated) response to a pie chart is
always a negative comment about pie charts, regardless of how good or bad the pie
chart is.

This one in particular is very clear:

C++, Ruby and Javascript have the most profanity. They're relatively equal to
each other and collectively account for more than 50% of the swearing in
commit messages.

C is next, with significantly less swearing.

C# and Java are roughly tied a bit below C.

Python and PHP have, comparatively, almost no swearing.

Was that really so hard? When the data is already subjective (what is and
isn't a swear word) and intended almost solely for humor, do we really need
more precision than a pie chart offers?

It is at best hyperbolic and at worst dishonest to say you "have no idea" how
to interpret this. You have an idea. You just don't have precision.

~~~
rbanffy
> Python and PHP have, comparatively, almost no swearing.

Of course. Python users are happy people.

I wonder what happens with PHP... ;-)

~~~
slig
For PHP, you'd have to search for "ass_hole", "fuckmother", "cock_sucker", and
so on to be fair.

~~~
stcredzero
Reading the ones with the underscores makes me think this mode of swearing
deserves its very own accent for when it's read aloud.

------
rflrob
A neat idea, although I think the pie chart isn't really the right format. I'd
prefer to see a bar graph, with the y-axis as (swears/million messages) or
similar.
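
A rough sketch of that normalization, with a plain-text bar chart standing in for a real plot. The only figures taken from the article are the overall totals (210 swear words in 929857 messages); the per-language numbers in the usage lines are placeholders, since the thread does not give per-language message counts.

```python
def swears_per_million(swear_count, message_count):
    """Normalize a raw swear count to swears per million commit messages."""
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    return swear_count / message_count * 1_000_000


def text_bar_chart(rates, width=40):
    """Render {label: rate} as text bars, highest rate first."""
    top = max(rates.values())
    lines = []
    for label, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(rate / top * width) if top else ""
        lines.append(f"{label:<12}{bar} {rate:.0f}")
    return lines


# Overall rate from the article's totals.
overall = swears_per_million(210, 929857)

# Placeholder per-language (swears, messages) pairs, not real data.
placeholder_counts = {"LangA": (40, 100_000), "LangB": (10, 100_000)}
rates = {lang: swears_per_million(s, m) for lang, (s, m) in placeholder_counts.items()}
for line in text_bar_chart(rates):
    print(line)
```

With equal message counts per language the bars carry the same information as the pie slices, but the axis makes the absolute rate visible too.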

~~~
m0hit
And please, fix the colors. Why the two greens and two blues? Either use two
alternating colors to separate adjacent slices (which you would not need in a
bar chart, which this should be) or use 8 markedly different colors.

------
r00k
I wrote a post on my blog 4 years ago (!) with lots of examples of profanity
in code comments.

It took a half-hour to write and has consistently gotten more traffic than the
rest of my blog.

Ah well, give the people what they want:
<http://codeulate.com/2007/12/fcking-programming/>

~~~
JonnieCache
To be fair, some of those are hysterical.

In particular:

    
    
        # no, no, no, no, no, no, no, no
        # no. fuck no. I am a fucking
        # moron.
    

I think it's the punctuation. It's like angry Japanese poetry.

------
PonyGumbo
I'm completely stunned that PHP is on the bottom here.

~~~
phamilton
A small child doesn't think a crayon is badly designed until he has used a
pencil or a pen. Without a frame of reference, a PHP developer has little
reason to swear at the code.

~~~
6ren
That's assuming that most profanity is directed at the language. It might be
the case, but one may also be annoyed by a codebase, by an algorithm, by a
bug, by a co-worker, by the business reason for the commit, or by something
totally unrelated.

There can also be humorous swearing, and swearing for emphasis (see DHH),
neither of which conveys annoyance at all.

Hypothesis A: PHP projects tend to be smaller and simpler, with fewer
interactions with other code/requirements/people, and therefore fewer
opportunities for annoyance.

Hypothesis B: PHP is actually a very simple and close fit to its use-case, of
unambitious webapps, and so often the tool just does its job, and disappears.
There's no _space_ to swear at a tool that you don't notice.

~~~
dmoney
At what point does a webapp become "ambitious" and not suited for PHP? What
about PHP with a framework such as, e.g., Cake? I'm curious because I'm
embarking on a project and probably choosing Rails over Cake, but only because
the other developer is a Rails guy.

------
jrockway
No Perl? I fucking swear all the time...

~~~
JustinSeriously
I ran it for Javascript, Ruby, and Perl, and I got this:

{"JavaScript"=>48, "Ruby"=>46, "Perl"=>28}

------
cfontes
Is it OK statistically to take, for example, all Ruby commits and 25% of the
C++ ones and compare them? Another kind of chart would be nice... also some
other params.

~~~
jhamburger
Why not? As long as the sample is random and equal in size.
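
A minimal sketch of drawing random, equal-size samples per language, assuming the commit messages are already grouped by language. The function name and the generated data are hypothetical, and this is not how the original script sampled.

```python
import random


def equal_size_samples(commits_by_language, seed=None):
    """Draw a random sample of the same size from every language.

    The sample size is the size of the smallest group, so no language
    contributes more messages than any other.
    """
    rng = random.Random(seed)
    size = min(len(commits) for commits in commits_by_language.values())
    return {
        language: rng.sample(list(commits), size)
        for language, commits in commits_by_language.items()
    }


# Hypothetical data: 100 Ruby messages against 25 C++ messages.
data = {
    "Ruby": [f"ruby commit {i}" for i in range(100)],
    "C++": [f"cpp commit {i}" for i in range(25)],
}
samples = equal_size_samples(data, seed=1)
print({language: len(messages) for language, messages in samples.items()})
```

The alternative is to keep unequal samples and compare rates (swears per message) instead of raw counts.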

------
jefe78
Well played!

I wonder if this has anything to do with:
<http://news.ycombinator.com/item?id=2247962>

I know I plan to comment my python code a little differently now! Maybe that
will help balance the numbers?

I know I'd be pretty vulgar if I programmed in C++/Javascript all day!

------
mgrouchy
I'm surprised there are so few commit messages with curse words in them. 210
out of 929857, that's about 0.02%. I would have thought that developers were
more vulgar than that (I know I am).

Maybe if we looked at comments in source code we would get a better
representation of the vulgarity of developers.

------
m0hit
Interesting. Of course I am thinking of the many ways that the results might
not be representative, but that doesn't make it any less of a cool weekend
project.

Would be great to see some context around where the most _profanities_ occur
by language, and the kind used.

~~~
jhamburger
One that came to mind is certain languages being more popular in English vs.
non-English speaking countries.

~~~
phamilton
Most swear words are international.

Especially the F bomb.

------
jcw
Yeah, this is funny, there is novelty here, etc. A story counting profanities
in source code/commits/etc. pops up every now and then.

I've found that the only real profanity in a source code comment is "HACK".

My swear jar overflows with quarters.

------
JCB_K
Next time they do a test they should include "git". Let's see what happens.

------
damoncali
My favorite:

\- fuck it. let's release

------
khingebjerg
The commit messages:
<https://github.com/AndrewVos/github-statistics/blob/master/profanity.yml>

------
chrismetcalf
Relevant:

<https://gist.github.com/198320>

A one-liner I wrote that uses git blame to seek out who swears the most in a
given codebase. Pretty fun.
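
For readers who want the general idea without opening the gist, here is a rough Python sketch. It is not the gist's one-liner: it parses the output of `git blame --line-porcelain`, and the word list, function names, and sample text are placeholders chosen for illustration.

```python
import re
import subprocess
from collections import Counter

# Placeholder word list; substitute whatever list you consider fair.
WORDS = ["damn", "hell", "crap"]


def swears_by_author(porcelain_text, words=WORDS):
    """Count matching words per author in `git blame --line-porcelain` output.

    In that format every source line is preceded by header lines, including
    one starting with "author ", and the source line itself starts with a tab.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter()
    author = None
    for line in porcelain_text.split("\n"):
        if line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("\t") and author is not None:
            hits = len(pattern.findall(line))
            if hits:
                counts[author] += hits
    return counts


def blame_file(path):
    """Run git blame on one tracked file and return its porcelain output."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Inside a repository, `swears_by_author(blame_file("some/file.py"))` gives a per-author tally for one file; summing the counters over all tracked files gives the per-codebase ranking.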

~~~
antihero
When I ran it, it just said "5" :S

------
jxcole
This is an example of poor graphical representation. The proper way to do
this would be to take the swear words per word for each language and then map
this to a bar graph, then you could easily see which has the highest vs the
lowest.

A pie chart is good for things that add to 100%. The number of swear words
that occur in something is not appropriate for this type of graphic.

~~~
daeken
Considering that there were an equal number of commit messages for each
language, this is a perfectly adequate representation. In this case, the pie
chart's total is the number of swears, with each slice representing a
language's share of that.

------
jerguismi
Not statistically significant, but an interesting idea.

------
davidw
Hah, maybe I should add that to <http://langpop.com> \- are you interested,
Andrew?

------
mrcharles
This would be more interesting if it scanned comments from source files for
profanity.

~~~
cookiecaper
This is per commit message, which is mandatory in git, not occurrences inside
the source of the project.

Perhaps PHP and Python have fewer occurrences because the employers of people
that use Python or PHP are less likely to tolerate inappropriate language in
the code.

~~~
rbanffy
> Perhaps PHP and Python have fewer occurrences

Maybe Python programmers are just happy and well adjusted people...

~~~
jackolas
Or we swear in comments not commits :D

------
tomwans
Seems that python devs are practicing the Zen of Python :-)

------
michaelschade
Even ignoring that which other commenters have pointed out, I simply don't buy
it for one reason: that PHP slice is way too small.

~~~
AndrewVos
Yeah it's oddly disturbing. My thoughts are that they just don't care that
much about life anymore to swear in their commits?

~~~
jasonlotito
> Yeah it's oddly disturbing. My thoughts are that they just don't care that
> much about life anymore to swear in their commits?

Or maybe we're all just mature enough to realise swearing in comments is
stupid, and trying to read anything from it is equally stupid?

------
boctor
No Objective-C love for the iPhone/iPad crowd?

~~~
tudorizer
"Shit, I missed a ]"

------
mcarlin
Sample size is too fucking small. Also, as others have pointed out, you've
allowed far too few swearwords.

Good job though :-)

------
billybob
One might assume that more profanity in a language = more frustration with that
language. But I'd bet that proportion of business use has something to do with
it, too.

If you're hacking on a personal project, you might feel freer to swear in your
commits. And my guess is that you're more likely to code Ruby for personal
projects than C#. But I could be wrong.

------
dcdan
Numbers of committers sampled per language might be helpful in identifying
potential bias.

------
mleonhard
Perl might beat C++ if it was included.

------
edge17
i'd like to see this same comparison, except comparing different version
control systems

------
dustingetz
how about 'chainsaw'?

------
kunley
Go Ruby go!! ;D

