
Linux kernel swear counts - crashoverdrive
http://www.vidarholen.net/contents/wordcount
======
gizmo686
It's worth looking at the scale of the swear count. Linux has about 16629976
lines of code, and I'd estimate from the graph that it has about 370* swear
words (excluding penguin). If you look at the second graph, that is less than
1 swear in 300000 lines.

I checked this on the source tree for 3.8.0. The numbers appear to be inflated
by allowing the swear words to be part of other words.

For example, "shit" appears in 121 lines, but " shit " only appears in 10
lines. Looking at the offending lines, there is only one swearword that is
missed by excluding spaces.

"fuck" appears 29 times, all of which are some conjugation of the verb (and
some lines have duplicates I'm not counting).

"crap" appears 161 times, 20 of which are part of "scrap"

"bastard" appears 17 times, 6 of which go to email addresses hosted at
"lazybastard.org" and "you-bastards.com"

"penguin" appears 99 times, two of which are jokes.

~~~
kleiba
If you want to check the various words in isolation, surrounding spaces might
cost you some matches, e.g. at the end of a sentence ("It's a piece of shit.")
or when followed by a comma. Also, did you ignore case ("Shit happens.")?

How about trying \b[Ss][Hh][Ii][Tt]\b and the likes?
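
A minimal sketch of how the matching strategies discussed so far differ, written in Python rather than grep. The sample lines are made up for illustration and are not taken from the kernel tree; `count_lines` is a hypothetical helper.

```python
import re

# Made-up sample lines, not taken from the kernel tree.
lines = [
    "It's a piece of shit.",
    "Shit happens.",
    "this is some shit code",
    "Matsushita Electric Industrial",
]

def count_lines(pattern, flags=0):
    """Count lines containing at least one match, similar to grep -c."""
    rx = re.compile(pattern, flags)
    return sum(1 for line in lines if rx.search(line))

substring = count_lines(r"shit")                     # plain substring, case-sensitive
spaced = count_lines(r" shit ")                      # requires a space on both sides
bounded = count_lines(r"\b[Ss][Hh][Ii][Tt]\b")       # the word-boundary pattern above
bounded_i = count_lines(r"\bshit\b", re.IGNORECASE)  # same idea, using a flag

print(substring, spaced, bounded, bounded_i)
```

The substring search picks up "Matsushita" but misses the capitalised "Shit", while the space-delimited search misses the match before a full stop. Note that a strict word-boundary pattern also skips derived forms such as "shitty", so it may undercount in the other direction.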

~~~
gizmo686
There were few enough curse words that I manually checked the output of not
requiring spaces. Regarding the case sensitivity, it looks like I missed 12
instances of swearing because of that. Also, grep has a "-i" parameter, which
makes it case insensitive.

~~~
archangel_one
Also a -w parameter, to match whole words only, which is generally better than
adding spaces :)
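
For anyone who wants to repeat the count outside of grep, here is a rough Python sketch of a recursive, case-insensitive, whole-word line count. It only approximates what `-i` and `-w` do together; `count_matching_lines` is a hypothetical name, and the lookarounds are an assumption about what counts as a "word" character.

```python
import os
import re

def count_matching_lines(root, word):
    """Count lines under root containing word as a whole word, ignoring case."""
    # (?<!\w) and (?!\w) require that no word character touches the match,
    # which roughly mirrors whole-word matching.
    rx = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Source files are not always valid UTF-8, so undecodable
                # bytes are replaced instead of raising an error.
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += sum(1 for line in f if rx.search(line))
            except OSError:
                # Skip unreadable files and broken symlinks.
                continue
    return total
```

Calling it as `count_matching_lines("path/to/linux", "crap")` (the path is a placeholder) would skip "scrap" but still count "Crap." at the end of a sentence.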

------
DarMontou
I'm curious about the motivations for coding profanity. I've occasionally
included comments like "don't f* with this unless you understand x, y, and z"
in an attempt to protect fragile sections of code from careless collaborators.
Nearly identical comments without profanity seemed ineffective. Within common
conversations I know that profanity often carries an implication of violence,
usually for the purpose of intimidation. I also find that profanity is
frequently used for comedic relief.

Hopefully these counts don't indicate increasing fragility or violent
disagreements within the kernel. Does anyone with kernel experience have any
insight into common purposes for kernel profanity?

~~~
gizmo686
From a quick grep of the source, it looks like a contributing factor is Matsu
_shit_ a Electric Industrial. If you are interested, I posted the output of
grep to pastebin [0].

[0] [http://pastebin.com/MNZF1Vz0](http://pastebin.com/MNZF1Vz0)

EDIT: This is against linux-3.8.0 from Mint's repository.

~~~
anabis
The group's name has changed from Matsushita to Panasonic, so this would bring
the count down in the future.

~~~
gizmo686
I doubt they retroactively change code/comments because the original
organization changed its name. The Matsushita references will probably stay
there until they bitrot and get removed/rewritten.

------
shaggyfrog
Can anyone explain the inclusion of "penguin"? I know Tux is the mascot and
all, but is it some kind of inside-joke-swear-thing?

~~~
DarMontou
If you look at the grep output that gizmo686 posted to pastebin (see below),
you can search for penguin. There are a lot of web addresses, email addresses,
maintainer roles (chief penguin), and logo references. There are a few
instances of variable names containing penguin as well.

[http://pastebin.com/MNZF1Vz0](http://pastebin.com/MNZF1Vz0)

------
foobarbazqux
I like jwz's classic post about swear words in Mozilla:

[http://www.jwz.org/doc/censorzilla.html](http://www.jwz.org/doc/censorzilla.html)

~~~
valleyer
I was laughing until I saw the swears in variable names. Now I'm mad.

------
ColinWright
When this was submitted some years ago there was some discussion. It might be
worth comparing that with the comments here:

[https://news.ycombinator.com/item?id=850761](https://news.ycombinator.com/item?id=850761)

It has been submitted a few more times, none with comments:

[https://news.ycombinator.com/item?id=2070056](https://news.ycombinator.com/item?id=2070056)

[https://news.ycombinator.com/item?id=4045103](https://news.ycombinator.com/item?id=4045103)

[https://news.ycombinator.com/item?id=6307849](https://news.ycombinator.com/item?id=6307849)

The last of these was just 14 hours ago - the trailing slash defeating the HN
dup detector.

------
m_ram
A casual search of code in Debian [1] shows that this is not limited to the
Linux kernel. Thankfully there's no equivalent to the Parents Television
Council [2] or Focus on the Family [3] for open source projects.

[1]
[http://codesearch.debian.net/search?q=fuck](http://codesearch.debian.net/search?q=fuck)

[2]
[https://en.wikipedia.org/wiki/Parent%27s_Television_Council](https://en.wikipedia.org/wiki/Parent%27s_Television_Council)

[3]
[https://en.wikipedia.org/wiki/Focus_on_the_Family](https://en.wikipedia.org/wiki/Focus_on_the_Family)

------
aylons
I wonder what caused the peak of shit just before 3.0.9, and what reversed it.

The spike just before 3.2.17 must be a glitch, but if it isn't, it's very
intriguing.

~~~
ucarion
It seems as if a lot of lines of code were removed or not measured at that
particular measurement; despite the drop in the usage of 'shit', its
occurrence per line in fact jumps up at that moment.
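
A tiny worked example of that effect, with purely hypothetical numbers that are not read off the graph: the raw count can fall while the per-line rate rises, as long as the line count falls faster.

```python
def per_line_rate(count, lines):
    """Occurrences per line of code."""
    return count / lines

# Hypothetical figures for illustration only.
before = per_line_rate(100, 10_000_000)  # 100 hits in 10M lines
after = per_line_rate(90, 6_000_000)     # fewer hits, but far fewer lines

print(after > before)
```

Here the count drops by 10% while the measured line count drops by 40%, so the rate goes up.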

------
forkrulassail
I love how penguin is halfway between bastard and crap, I'll update my swear
word dictionary.

------
NAFV_P
There is an area around 2.4.36.5 where the fall and rise of "fuck" and "shit"
closely resemble each other. I'm guessing it's a macro which concatenates the
two expletives for use in the code.

~~~
ajkjk
or a variable named 'fuckshit'

~~~
NAFV_P
That's what I'm thinking, they used the ## preprocessor operator to stick
"fuck" and "shit" together. Oh, if this is linux, then the identifier is more
likely to be "fuck_shit".

------
chatman
With people like Linus Torvalds at the helm of the project, what else can be
expected?

~~~
primelens
That's a little harsh. Yes he has a tendency to be - erm, 'vehement' about the
quality of code, but there is no one else anyone would rather have at the helm
of the kernel project.

