

Most Pressed Keys and Programming Syntaxes - talmirza
http://www.mahdiyusuf.com/post/9947002105/most-pressed-keys-and-programming-syntaxes-2

======
sltkr
I admit, the Lisp one at the end made me laugh.

For most other languages the focus seems to be mainly on letters (typing out
keywords/identifiers), so in that sense there is little evidence that one
language would be easier to write than another.

Additionally, I imagine that some of the code is auto-completed by an IDE,
which this analysis fails to account for.

~~~
sordina
Shouldn't shift be twice as hot as the parentheses?

~~~
snprbob86
No because this heat map is clearly generated "offline". That is, this is
built from a source-file dataset. An "online" dataset would be the output of a
key logger. Online analysis would show the number of right parentheses to be a
tiny fraction of left parentheses due to auto insertion and other paredit-like
operations.

~~~
kleiba
Still, the OP is right - every character that can only be typed by
pressing SHIFT should also increase the counter for SHIFT.

~~~
snprbob86
If you want to be pedantic...

"Shouldn't shift be twice as hot as the parentheses?"

For offline analysis? No. Shift should be as hot as the _SUM_ of both types of
parentheses. In practice, both parentheses will be equal in count modulo some
epsilon for unmatched parentheses in strings and comments. Therefore, shift
will be _close to_ twice as hot as either parenthesis.

For online analysis? No. Shifted characters can come into existence without
being typed. For parentheses, autocompletion is one way. Automatic bracket
matching is another. There are many more, including template expansion,
copy/paste, and several paredit operations.
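
A minimal sketch of the offline case, assuming a US QWERTY layout: every
shifted symbol or capital letter is credited to its base key and also to a
"shift" counter. The `SHIFTED_TO_BASE` table and the `key_counts` name are
illustrative only and are not taken from the original heat map code.

```python
from collections import Counter

# Assumption: US QWERTY. Each shifted symbol is paired with its unshifted key.
SHIFTED_TO_BASE = dict(zip('~!@#$%^&*()_+{}|:"<>?', "`1234567890-=[]\\;',./"))


def key_counts(source: str) -> Counter:
    """Count physical keys implied by a source text, including Shift."""
    counts = Counter()
    for ch in source:
        if ch in SHIFTED_TO_BASE:
            # A shifted symbol costs its base key plus one Shift press.
            counts[SHIFTED_TO_BASE[ch]] += 1
            counts["shift"] += 1
        elif ch.isascii() and ch.isupper():
            # A capital letter costs the letter key plus one Shift press.
            counts[ch.lower()] += 1
            counts["shift"] += 1
        else:
            counts[ch] += 1
    return counts


counts = key_counts("(f (g x))")
# Shift equals the sum of both parentheses here: 2 opening + 2 closing.
print(counts["9"], counts["0"], counts["shift"])
```

This only models the finished file; it says nothing about auto-inserted
parentheses, Caps Lock, or holding Shift across several characters.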

~~~
technolem
If you want to be pedantic...

Shift is used for many combinations besides just parentheses, such as that
capital at the start of this sentence. It would likely be more than twice as
much.

~~~
z92
A lot of emotions here. Making some of us jump to keyboard without reading
carefully.

------
kjhughes
If this were actual keys pressed while programming rather than an analysis of
completed code, the action on app switch and editor meta-keys would rival that
of the highest ranking regular keys.

------
the_cat_kittles
It would be nice to see these heat maps normalized by the average frequency of
each key. Then you can really see what stands out about each particular
language.

~~~
nhebb
... and normalized against the frequency of each letter in a dictionary of
common English words. Most likely, 'e' is common in programming because it's
the most common letter in English:

<http://en.wikipedia.org/wiki/Letter_frequency>

------
citricsquid
_Vaguely_ related, if anyone is interested in tracking their own typing (and
individual key counts) check out <http://whatpulse.org> it's pretty great :-)
(my profile, <http://whatpulse.org/stats/users/210575/>)

~~~
zer01
...so this captures every key and mouse movement on your computer, then sends
it to a 3rd party server for 'analysis'.

Seems legit.

~~~
positr0n
I realize you're probably joking :) but the software just "pulses" the number
of keystrokes and clicks to the server whenever you tell it to.

It can generate keystroke data like this, but the data is stored locally.

------
alpb
I skimmed comments here and couldn't find anything that talks about why e is
so popular. Is it because of ETAOIN? Because almost no "{" "}" are pressed in
C/Java. Almost no ":" is used in Python. The "/" key is more popular than "["
"]" in Objective-C. This makes no sense. I don't believe that blog post.

~~~
Heinleinian
Just off the top of my head, I'd guess it's due to a lot of e's in words that,
depending on the language, you'll see in almost every single method or
function. Things like return, end, else, true, include, self, private, etc.

~~~
zxoq
Don't forget set / get, which are ubiquitous in almost every language.

------
simias
I would like to see what a heatmap generated from a "real" typing session
would look like (with a keylogger). You could see the influence of the editor
as well.

Since these are generated offline, the keyboard heatmaps are meaningless and
the representation is slightly misleading IMO.

------
schme
A live sample would be much more interesting; both would be best. I'd be most
interested in the meta keys. As a Scandinavian, I find the {}, [] etc. buttons
especially awkward to press. In fact, so are most special characters used in
programming.

As mentioned, auto-complete and similar functionality change the heatmap, but
that's what people actually press. This data would be a lot better for actual
use.

Though I don't mean it as a scold: it wasn't really in the hands of the author
to collect such vast amounts of live data, and it would surely be a lot more
work than was his intention.

------
mattparlane
It's not really measuring "pressed keys", it's measuring a final product --
I'd be interested to see which languages highlight the backspace/delete keys
more.

------
eddie_the_head
I'd be interested in seeing which keys I press most when I'm programming in
APL.

~~~
drivingmenuts
They tried measuring that and accidentally created the runes to summon an
angry Elder God, with predictable consequences.

------
einhverfr
Next, it would be interesting to look at the hand position of programmers in
different languages. I know when I am programming Perl my right hand tends to
move back and forth into different positions while my left hand stays in the
standard position. Wondering if I am unique there or if it is common, and how
other languages affect this.

------
joestringer
I'd be curious to see the same analysis done but with alternate keyboard
layouts, such as dvorak or colemak.

~~~
lmm
It'd come up exactly the same, unless you're going to pick out the dvorak-
users on github. Even then I wouldn't expect much difference. We're measuring
characters in code, not actual keypresses.

------
nix
"Shift" is a big omission, though you can guess at it from the emphasis on
certain numeric keys. One of the great things about Python is that there are
fewer chorded characters. It's also one of the worst things about Lisp on a
standard keyboard.

~~~
Zakiazigazi
Hi, I added shift counting to Mahdi's code a while ago:
(<https://github.com/zaki/Keyboard-Heatmap-1>), so it should be easy to
generate more correct heatmaps (in multiple keyboard layouts too) if you are
interested.

------
nkoren
Interesting, but a different visualisation would be even more interesting:
heat maps showing deviations from the mean. This would highlight the
differences between the languages, which (except for the case of Lisp) are
rather subtle.
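
A rough sketch of that idea, assuming each language is represented by one
blob of source text: compute per-character relative frequencies, average them
across languages, and keep only the difference from that average. The
function names and the tiny sample strings are made up for illustration.

```python
from collections import Counter


def relative_freq(text: str) -> dict:
    """Share of each character in the text (empty dict for empty text)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def deviation_from_mean(samples: dict) -> dict:
    """Map language -> {char: frequency minus the cross-language mean}."""
    freqs = {lang: relative_freq(src) for lang, src in samples.items()}
    chars = set().union(*freqs.values())
    mean = {
        ch: sum(f.get(ch, 0.0) for f in freqs.values()) / len(freqs)
        for ch in chars
    }
    return {
        lang: {ch: f.get(ch, 0.0) - mean[ch] for ch in chars}
        for lang, f in freqs.items()
    }


# Hypothetical two-language corpus, far too small to mean anything.
dev = deviation_from_mean({"lisp": "((", "other": "ab"})
print(dev["lisp"]["("], dev["other"]["("])
```

Positive values mark keys a language uses more than average, negative values
mark keys it uses less; a diverging color scale would suit the resulting map.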

------
tunnuz
It would be nice to consider how auto-completion actually biases the real
distribution of key-presses, e.g. I wouldn't expect closing parenthesis or
brackets to be pressed as often as their opening counterparts.

------
mahmud
Not if you're an emacs user. I use key-chords for most programming language forms.

Also, for Lisp, I never touch the closing paren. M-( does both at the same
time.

------
godDLL
Here are the keys directly under my fingers on the home row: A R S T N E I O

And I can visually see the reasoning behind Colemak being like this, now.

------
iamwil
It'd be useful to use the histograms to distinguish between different
programming languages for automatic language detections of something like
gists.
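
As a toy sketch of what that could look like: build a character histogram
per language from known samples, then pick the language whose histogram has
the highest cosine similarity to the snippet. The profile snippets and
function names below are hypothetical, whitespace is ignored by assumption,
and a real detector would need far more training data.

```python
import math
from collections import Counter


def histogram(text: str) -> Counter:
    """Character counts, ignoring whitespace (an assumption of this sketch)."""
    return Counter(ch for ch in text if not ch.isspace())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[ch] * b[ch] for ch in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(snippet: str, profiles: dict) -> str:
    """Return the profile name whose histogram is closest to the snippet."""
    hist = histogram(snippet)
    return max(profiles, key=lambda lang: cosine(hist, profiles[lang]))


# Hypothetical one-line "training sets"; real profiles need whole corpora.
profiles = {
    "lisp": histogram("(defun f (x) (+ x 1))"),
    "c": histogram("int f(int x) { return x + 1; }"),
}
print(guess_language("(car (cdr (list 1 2)))", profiles))
```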

~~~
obtu
highlight.js [1] does this, though by running highlighters rather than using
some learning-based mechanism. It feels wasteful to run this on display rather
than storage, though. SourceClassifier [2] also works, though with fewer
languages. And here's [3] an implementation made with Bayes and Go.

[1] <http://softwaremaniacs.org/soft/highlight/en/>
<http://softwaremaniacs.org/media/soft/highlight/test.html>

[2] [http://blog.chrislowis.co.uk/2009/01/04/identify-
programming...](http://blog.chrislowis.co.uk/2009/01/04/identify-programming-
languages-with-source-classifier.html)

[3] <https://github.com/octplane/go-code-classifier>

------
philwelch
Interesting how "i" is more common in some languages than others. C and C++
make sense (for(i = 0; i < n; i++)), but Ruby is a puzzler.

~~~
riffraff
"nil" "if" and "elsif"

EDIT: though looking at my sources, where it seems to also be popular, it
mostly seems to match inside non-syntax (field, nickname, to_i, strip, index,
client).

~~~
minikomi
How about |i|?

~~~
riffraff
not popular in my sources apparently, though there are many |obj, idx| :)

------
jakejake
I can understand 0 and 1 being frequently used. But I wonder why 5,6 & 7 seem
to be under-used compared to the other numbers?

~~~
zdw
I was thinking the same thing until I looked at Perl, which has 4 being quite
frequent.

The reason, of course, is that 4 is also $, which is used to denote a scalar
in Perl.

Thus, because 5,6,7 correspond to %,^,&, which generally get used to a lesser
degree for things like modulo, hashes, exponentiation and logical-and, they're
used less.

------
DeepDuh
What looks strange to me is that semicolon is not one of the most common in
C-derived languages.

------
PerryCox
Why is E the most common across multiple languages (except Lisp (which I
assume is due to the parentheses))? I assume its usage is higher because it's
a vowel, but none of the other vowels are nearly that high.

~~~
sosuke
E is the most common letter in the English language. There was even a fun book
written without the e: <http://en.wikipedia.org/wiki/Gadsby_(novel)>

~~~
tikhonj
There is also _A Void_ : it was originally written in French, which (I think)
uses the letter "e" more often than English and then translated to English. In
both cases it did not use the letter "e" at all.

I only know about this because it was referenced in a book on cryptanalysis.
The simplest sort of cipher can be broken by paying attention to the relative
frequency of letters in the original text. I remember a useful mnemonic for
remembering the most common letters: the sentence "a sin to err" contains
them. E, followed by t and a, are the most common out of those (t and a are
very close).

~~~
zeroonetwothree
'r' and 'h' are very close, and some sources have 'h' as more common.

------
cstavish
What kind of C programming is this guy doing? '*' is relatively untouched.

~~~
boryas
Guess: maybe all the pointers to structs are typedef-ed away?

------
AndyKelley
Dear author: would you consider also generating the results for Dvorak?

~~~
ibotty
and programmer dvorak :D

------
cpeterso
I wonder what a programming language designed to minimize shifting would look
like. Python does a pretty good job because it uses few curly brackets and no
semicolons.

~~~
riffraff
sadly, highly dependent on keyboard layout, e.g. my keyboard has square
brackets and the equals sign only accessible via a key combination, while the
US keyboard does not.

It seems the only safe characters across many countries are 0-9a-z.,-\ plus
space/tab/return. Not a lot to work with :)

~~~
lucian1900
That's part of why I always use a US layout, even on UK keyboards. Also
because I've grown up with it.

------
stigi
Gotta say, I expected more square brackets for Objective-C.

