
What science says about naming in programming - fagnerbrack
http://www.felienne.com/archives/5452
======
gluczywo
Naming is related to scope and I'm surprised there is so little about it in
the article. While trying to find a universal principle I came up with this
rule: "identifier's entropy should be proportional to its scope".

Global variables (if any) are long explicit phrases while inner loop variables
can be a single letter.
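
A minimal Python sketch of that rule. All names here are hypothetical and chosen only for illustration: the module-wide constant gets a long explicit phrase, while the variable that lives inside a three-line loop is a single letter.

```python
# Module-wide name: visible everywhere, so it spells out exactly what it holds.
MAX_LOGIN_ATTEMPTS_BEFORE_LOCKOUT = 5


def count_lockouts(attempts_per_user):
    """Count how many entries reach the lockout threshold."""
    lockouts = 0
    # Tiny scope: `n` is only used on the next line, so one letter is enough.
    for n in attempts_per_user:
        if n >= MAX_LOGIN_ATTEMPTS_BEFORE_LOCKOUT:
            lockouts += 1
    return lockouts
```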

Without taking scope into account the naming guidelines become dogmatic and
impractical trivia.

EDIT: wording

~~~
eecks
I always name my variables descriptive names. For example, if I am in a loop
and the variable represents a "record" I will name it record rather than r.

It's much more readable and you're less likely to get name clashes with other
functions (r could be lots of things (even in the same file) - render, record,
row, red, right)

~~~
tomatsu
I always go with plural for lists and singular for iteration variables.

    
    
      for (let thing of things) {...}

~~~
TeMPOraL
Personally I have a habit of avoiding that, especially in dynamic languages,
because a single typo could have nasty consequences there. So I always figure
out a synonym, or use a name for the variable that's either more descriptive:

    
    
      for (let particularThing of things)
    

or at least visibly different:

    
    
      for (let th of thigns)
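
A rough sketch of how such a typo can go unnoticed, written in Python rather than JavaScript, with hypothetical function names. It assumes an operation that accepts both the list and the item, so nothing fails at runtime and the result is just wrong.

```python
def total_length_buggy(words):
    """Sum of word lengths, with a singular/plural typo inside the loop."""
    total = 0
    for word in words:
        # Typo: `words` (the list) instead of `word` (the item).
        # len() accepts both, so no error is raised.
        total += len(words)
    return total


def total_length(words):
    """Same loop with a visibly different iteration variable."""
    total = 0
    for current_word in words:
        total += len(current_word)
    return total
```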

~~~
tomatsu
The dynamic languages I primarily use are TypeScript and Dart. I always have
enough type information floating around to catch this kind of thing. But even
in JavaScript it wouldn't be much of an issue since it will instantly explode
on the first run, which is certainly inconvenient, but nothing major.

------
barrkel
Naming is a social problem, and there's a strong risk of correlation with
hidden social variables in any empirical study, while any non-empirical study
is suspect because humans are what counts.

In particular the most likely explanation for correlation between bad names
and bad code is unskilled or inexperienced programmers. This needs to be
controlled for.

My personal observation is that well-factored code can use shorter identifiers
because it deals with fewer concerns. Long identifiers frequently indicate
muddy thinking that mixes concepts, where the identifiers are burdened with
disambiguation.

Further, in languages with particularly verbose identifiers (Java especially),
it can get hard to see the wood for the trees: the density of text obscures
code and data flow in the program. There's a similar argument to be made about
the tradeoff between symbolic operators and textual operators: the more
frequently they're used in a domain, the better they're represented by a
symbol. Symbols, of course, are especially short identifiers.

~~~
sh_tinh_hair
"In particular the most likely explanation for correlation between bad names
and bad code is unskilled or inexperienced programmers..."

Totally disagree with this assertion: Laziness and two fingered typing are to
blame.

~~~
barrkel
Two fingered typing causes bugs directly? Or it causes bugs because it
encourages short names? The article is about the relationship between names
and code quality.

Naming is one of the hardest things to get right; I often have to comment on
it in code reviews. Long names are usually lazier than concise ones. The
other problem is naming things more specifically than necessary, obscuring
genericity of functions.
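
To illustrate the over-specific naming problem, here is a small Python sketch with made-up function names. Both functions contain the same logic; only the second name admits that it works on any hashable items.

```python
def remove_duplicate_customer_emails(customer_emails):
    """Over-specific name: nothing in the body depends on customers or emails."""
    seen = set()
    result = []
    for email in customer_emails:
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


def unique_in_order(items):
    """Same logic, named for what it does: order-preserving de-duplication."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```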

------
nv-vn
I disagree with a lot of this, but I'll focus on the worst offender. The rule
against identifiers shorter than 8 characters specifically disallows `name',
which makes absolutely no sense to me. This particular word says a lot, and it
can easily have its context implied with no problems. Example, if we're
working on a database that stores people's personal information, it's easy to
guess what `name' means. If we're processing HTML forms, again, name has a
very clear meaning here. We could also use it to describe syntax (i.e.
identifiers, or as an alternate term for JSON-style keys). If I tried for
another couple minutes I'm sure I could think of hundreds of use cases for
that identifier with a clear context. What I don't understand, though, is the
exceptions they list. Such as "c, d, e ... in, inOut, ...". These, on the
other hand, have very few contexts where they're at all clear. What happened
to the whole stigma against 1-char identifiers? How could a 7-char identifier
be any worse than a 1-char identifier? This whole rule seems like nonsense.

~~~
junke
Yes, many useful names are short: point, input, output, x, y, z, origin, r, g,
b, alpha, color, vector, pi, string, file, buffer, open, close, print, format,
copy, move, list, load, run, id, pop, push, stack, context, env, user, draw,
update, refresh.

Also, notice how R, G, and B have a very clear meaning when put near "color"?
Context is a thing our brain manages very well.

~~~
vorg
Many of those short names are keywords (or pre-defined identifiers) in many
programming languages, e.g. string, close, print, copy in Go. Having to use
another name instead can also be difficult.

~~~
junke
Poetry time:

    
    
        (let ((copy (copy-list list)))
          (list copy 'copy 'let))
    

A 13 syllabic verse (7+6, with caesura)

(but yes, you are right)

~~~
fiddlerwoaroof
FWIW, I never have really found the ambiguity caused by using the same symbol
as both a variable and a function to be a readability problem: I think people
are generally pretty good at things like noun/verb ambiguity and so this kind
of ambiguity sort of "clicks" with us.

What has bitten me many times is accidentally overwriting a builtin function
like max or len in python and then scratching my head about the strange errors
that result from this.
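
A minimal reproduction of that kind of accident, assuming a hypothetical helper named `widest`. The local variable `max` hides the builtin for the whole function, so the later call fails with a TypeError that does not mention shadowing at all.

```python
def widest(words, floor):
    """Hypothetical helper that accidentally shadows the builtin max()."""
    max = 0  # from here on, `max` is an int inside this function
    for word in words:
        if len(word) > max:
            max = len(word)
    try:
        # Intended: the builtin max(). Actual: an attempt to call an int.
        return max(max, floor)
    except TypeError as error:
        # Caught here only so the sketch can show the confusing message.
        return f"TypeError: {error}"
```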

~~~
junke
Overwriting is not so much a problem of naming than one of dynamic updates:
some environments do actually display a warning when you change a global
binding, which helps.

~~~
fiddlerwoaroof
That helps, but I find that a Lisp-2 like Common Lisp (i.e. a language with
different namespaces for variables and functions) is a more intuitive
programming environment than languages with a single namespace.

~~~
junke
I prefer Lisp-2 too.

~~~
kazinator
I'm extremely well informed about Lisp-1 versus Lisp-2. I chose Lisp-2 for my
own Lisp dialect, with special support for working in the Lisp-1 style: there
is a way to code with higher-order functions without ever using the dialect's
equivalents of the funcall function or function operator. I thereby consider
the Lisp-1 vs. -2 question to be a solved problem.

The requirements which drive the preference for either are all valid. I, the
designer, have simply taken both requirements and integrated them. Of course,
requirements like, "I want to use list as a variable name without shadowing
the lisp function" directly conflict with "I want to reference functions and
variables the same way without function or funcall". However, the conflict can
be confined down to the argument list of an individual form. I.e. "Sure, you
can have both, just not in the argument list of the same form."

~~~
fiddlerwoaroof
Incidentally, I've been using TXR a bit for text-processing recently, it's
generally been pretty nice once I've figured out what exactly needs to be done
to get the matching to work, but the various failure modes for the pattern
language aren't very intuitive.

~~~
kazinator
I don't disagree. If debugging that sort of thing were intuitive, we'd all be
coding in Prolog. It's like an invisible "if" hidden in every statement which
branches elsewhere if there is no match. @(assert) can help; if you know that
something that follows must match, you get an exception if it doesn't. I used
asserts in the man page checker:

[http://www.kylheku.com/cgit/txr/tree/checkman.txr](http://www.kylheku.com/cgit/txr/tree/checkman.txr)

------
wruza
What we need is better tooling. First of all, stop pretending that a structured
description of commands and expressions (one of the world's most complex) is
"just text". If programming environments stored sources as
structured items, it would be easy to give an entity many names (e.g. human
description, fully qualified name, short name in current module, and shortest
scope/mind-local alias). And then switch modes from 'first time reading' to
'familiar development' (or 'lawyer style', where descriptive names appear only
in appropriate places and assertions are visible). We have similar issues with
formatting, ridiculous tabs-vs-spaces wars, encodings, even with line endings.
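
A very rough Python sketch of what "one entity, many names" could look like as data. Every field name and mode name below is an assumption made up for illustration; it is not taken from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityNames:
    """All the names a single entity could carry (hypothetical fields)."""
    description: str  # human description
    qualified: str    # fully qualified name
    short: str        # short name in the current module
    alias: str        # shortest scope-local alias


# Reading mode -> which stored name an editor would display.
MODE_TO_FIELD = {
    "first_time_reading": "description",
    "familiar_development": "alias",
}


def display_name(names, mode):
    """Pick the name to show for the given reading mode."""
    return getattr(names, MODE_TO_FIELD[mode])
```

The source would store the entity once, and switching modes would only change which of its names is rendered.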

Try to open raw html/css instead of using a browser to view a modern website. Hard
to get the picture? Yeah, and that's where we are in programming today.

~~~
alistproducer2
upvote for the interesting idea. To your knowledge is there any IDE out there
with such a feature?

~~~
wruza
Nope. At least my annual efforts to dig in that direction always fail. It
seems that there is something: literate programming, html-based prolog posted
here recently (someone please help to recall its name). But it is not exactly
that. And it always contains esoterics under the hood. I want C, C++, Perl,
Java, anything mainstream; not Prolog.

I'm too busy and too single to make a prototype on my own (and more related
ideas on top of that in thoughts.txt). I've posted this in forum comments for
years, and sometimes got criticism that made me temporarily think it is just
another view on existing techniques (as of company-wide, not person-wide). Now
I _run_ the company and this got only more demand.

Honestly, maybe I'm just doing existing things wrong, idk.

~~~
ilaksh
[http://projectured.org](http://projectured.org)

[https://www.jetbrains.com/mps/](https://www.jetbrains.com/mps/)

[https://en.m.wikipedia.org/wiki/Structure_editor](https://en.m.wikipedia.org/wiki/Structure_editor)

[https://en.m.wikipedia.org/wiki/Intentional_programming](https://en.m.wikipedia.org/wiki/Intentional_programming)

[https://martinfowler.com/bliki/ProjectionalEditing.html](https://martinfowler.com/bliki/ProjectionalEditing.html)

[https://news.ycombinator.com/item?id=12205757](https://news.ycombinator.com/item?id=12205757)

~~~
wruza
The word 'JetBrains' in this list looks very promising, I'll definitely give it a
try in my spare time. Thanks!

------
xkxx
The article cites some scientific papers, but I'm quite skeptical about them.
How do they define "good" and "bad" identifiers? In the end, what's a good
naming convention for one person or actor may not be good for another. For a
human actor `getBase` is an undoubtedly better identifier than
`sdfjkjlfsdkjfdskjsdfkjfs`; for a computer, though, there is not much
difference between the two. For somebody from the UK `getBehaviour` is better
than `getBehavior`; for an American it's vice versa. Those papers take some
arbitrary naming convention and postulate that it's either "good" or "bad".
Postulating something unless absolutely necessary is not how science is
supposed to work.

DISCLAIMER: I may be wrong because my judgment of those scientific papers is
based only on this article. If they do define "good" and "bad" in some
meaningful way, then it's the article's fault for not pointing it out.

~~~
falsedan
That's not how references work in scientific articles: you're expected to read
the referenced papers (or at least the abstract) & understand the
implications/conclusions, or get someone else to summarise it for you.

------
stabbie_mcgee
DB is cactus:

[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://www.felienne.com/archives/5452)

~~~
transposed
hn hug of death. I tend to use shorter names, but maybe because most of my
code isn't very complex. Sometimes I just change the spelling if I want a
similar but new variable.

------
userbinator
_In general, programming language does not influence the quality of names,
although Java code tends to exhibit better naming, perhaps due to the early
popularity of Java coding style guidelines._

I completely disagree, having actually worked with a lot of Java once. The
language in general is extremely verbose, and the names make it worse. What
saddens me is that it seems a lot of the "better naming" discussed here is the
complete opposite of what I'd consider good --- I mean, one of the guidelines
listed there is "Identifiers should not consist of fewer than eight
characters, with the exception of {short list containing very few dictionary
words}"!? In the (not Java) code I work with, the majority of identifiers,
i.e. locals, are below 8 characters.

IMHO identifiers should be short and memorable, preferably even having a
unique and equally memorable pronunciation (I think "strlen", given as a
"flawed" identifier in one of the tables, is an example of a good one, as are
the others in the C library), but it seems a lot of "modern" code is going in
the opposite direction. "Long and descriptive" identifiers may seem more
"friendly" at first, but I feel it gets rather tiring after having to read
plenty of code in that style --- especially if many of the identifiers share a
common prefix, forcing you to read through the extra noise but having to
compare to ensure they're actually the same.

Thus there appear to be two distinct style groups, one with the long and
verbose "dictionary word" names and traditionally represented by Java and C#.
This group is spreading, I think partly because of the "ease of entry" noted
above. The other group uses the short, succinct, abbreviated naming
traditionally represented by C, Asm, sh, awk, and most of the existing Unix
conventions. (As another example to contrast sh, PowerShell is a member of the
former group.) This article seems to say the former is better than the latter.
I am not convinced.

Some examples for concreteness:

Former style:
[https://github.com/dotnet/roslyn/blob/master/src/Compilers/C...](https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Portable/Compiler/Compiler.cs)

Latter style: [http://v6.cuzuco.com/v6.pdf](http://v6.cuzuco.com/v6.pdf)

Edit: Another example of the latter style, from something more vaguely related
to the example in the former style:
[https://news.ycombinator.com/item?id=5748672](https://news.ycombinator.com/item?id=5748672)

~~~
manyxcxi
I use probably 70% Java in a given week, there was a time when I was 100%
.NET, and there have been long stretches where the majority was overwhelmingly
C, Node, Python, PHP. This is a long way of saying I've spent a lot of time
floating around some very stylistically different languages so I don't really
adhere to one dogma over another and I've grown to write code that mostly kind
of/sort of feels the same between languages. Naming convention-wise at least.

Java/.NET naming verbosity is great when I'm trying to figure out what the
heck I'm receiving without tracking down the class definition. The thing is,
you're only willing to be that verbose because you've got autocomplete in an
IDE so peeking into that long object/class/parameter name is only a click
away.

The naming conventions like
OncePerRequestHttpWebHandlerSecurityContextAwareInterceptorFactoryImpl are
absurd, but there's no reason you can't come to a more concise descriptive
name. Frankly, if you're implementing the interface you can call it whatever
you want.

On the other hand I've seen some C and JS code that I could SWEAR was run
through an obfuscator before I got to it, but no, the devs refused to use more
than about three letters max. This is not helpful.

Names like strlen, idx, propCnt are concise AND descriptive. I generally like
to put units on things if they're some kind of measurement so I typically do
stuff like intvlMS, distKM.
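
For example, in Python's snake_case the same habit might look like the sketch below. The names `interval_ms`, `dist_km` and `speed_kmh` are stand-ins for intvlMS and distKM and are purely illustrative; the point is that the unit suffix makes each conversion checkable at a glance.

```python
MS_PER_SECOND = 1000


def wait_time_seconds(interval_ms):
    """Convert a polling interval from milliseconds to seconds."""
    return interval_ms / MS_PER_SECOND


def travel_time_hours(dist_km, speed_kmh):
    """Kilometres divided by kilometres per hour gives hours."""
    return dist_km / speed_kmh
```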

There's nothing wrong with being descriptive, but you can really do a
disservice to those (including yourself) coming after you when it becomes very
hard to fit the mental model.

~~~
milansm
English is not my native language, and it took some time for me to actually
parse those short variable names you provided as examples. Though, I still
don't know what propCnt is supposed to mean. But even if I did, it takes some
cognitive effort to parse shortened names. I think that's the case with native
speakers as well, but it's more noticeable for non-natives because the time
interval is longer. IMO those little parsing efforts add up and are tiresome
if you have to do it all day long. Imagine if books were written that way.

~~~
andreareina
I suspect it's supposed to be popCnt (no "r"), which is short for "population
count", which is the number of set bits in a byte/short/int/long/
what-have-you.
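
For reference, a population count is short to write down. This is a minimal Python sketch, with `pop_cnt` as an illustrative name, that counts the 1 bits in the binary representation of a non-negative integer.

```python
def pop_cnt(value):
    """Population count: the number of set (1) bits in a non-negative integer."""
    if value < 0:
        raise ValueError("pop_cnt is only defined here for non-negative integers")
    # bin(11) is '0b1011'; the '0b' prefix contains no '1' characters.
    return bin(value).count("1")
```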

~~~
userbinator
"prop" is a common abbreviation for "property".

~~~
janekm
Kind of a perfect example of the ambiguity in over-shortened names ;) I also
read it as propertyCount (which in my style shouldn't even be a variable...)

------
GnarfGnarf
"[http://www.felienne.com/archives/5452](http://www.felienne.com/archives/5452)"
leads to "Error establishing a database connection".

~~~
briantakita
[https://web.archive.org/web/20161227035000/http://www.felien...](https://web.archive.org/web/20161227035000/http://www.felienne.com/archives/5452)

------
ninjakeyboard
Busted under load. Anyone have a working cache? edit: here she is:
[http://webcache.googleusercontent.com/search?q=cache:http://...](http://webcache.googleusercontent.com/search?q=cache:http://www.felienne.com/archives/5452)

------
mannykannot
Judging programs by looking at the words in isolation is about as effective as
it is for any other form of writing.

------
akkartik
Science? _Click on link._ Oh, they mean _computer_ science. In particular,
empirical "software engineering" studies. That's not science at all. Computer
science is not science, it's math + art + alchemy. And I strongly believe
empirical "software engineering" studies fall under the alchemy category.
(Which is why I can't stand to take those quotes out.)

Take this survey, for example. Any paper that purports to provide naming
recommendations "scientifically" has to explain the Linux Kernel at least on a
level footing with Eclipse or whatever other Java programs they're looking at.
Otherwise it's hard to take them seriously.

In general, programming has just too many variables to be amenable to
scientific analysis at this time.

a) Start with the question of how you define success. Are projects that have
lots of adoption what we should be trying to emulate, even if the insides are
ugly and the project just happened to be first to market, and its users don't
care about how clean its code is?

b) Then there's the question of whether a study is reproducible using other
codebases. Most such studies are not. The ones in OP demonstrably are not as I
have hopefully proved above.

c) Software engineering studies are extremely weak at identifying or
controlling for confounding variables. Most have no such discussion. The ones
that have some mild discussion fail to put into context what variables they
failed to account for; often the reader can come up with a new one with a few
moments of thought. Given this, the conclusions they arrive at are
overreaching, where good science has a tradition of modesty, of under-stating
the applicability of one's conclusions.

Summary: science doesn't just happen when you want it to. A field becomes
scientifically "rationalized" past a certain point in maturity, when a
consensus emerges after many iterations of theory, experiment and argument
between alternatives. It seems unjustifiable to say "science says" something
if you can find opposing hypotheses about it in equivalent journal articles.
Software engineering is early in this life cycle. I suspect it's hindered by
bad axioms, such as that what we do with software today can be termed
"engineering".

------
morbidhawk
> Name suggests boolean but type does not

I've seen something similar with xUnit in C# where an Assert statement returns
a value. `var single = Assert.Single(collection);` It does the assertion and
returns the single item in the collection, which is awkward, but also familiar
(from Linq's Single) and convenient. Here there are tradeoffs and I couldn't care
less about the dogmatic "never do this" complainers. This might break the
'Principle of Least Astonishment' the first time you see it, but after seeing
it once it makes sense and is no longer surprising.
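
To make the pattern concrete, here is a hypothetical Python analogue. It is not xUnit's API, only a sketch of the behaviour described above: check that there is exactly one item, then hand that item back.

```python
def assert_single(collection):
    """Fail unless `collection` has exactly one item; return that item."""
    items = list(collection)
    if len(items) != 1:
        raise AssertionError(f"expected exactly one item, got {len(items)}")
    return items[0]
```

Usage would read like `single = assert_single(results)`, which is where the "assertion that returns a value" surprise comes from.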

Additionally, in the case of this criterion, where there is a boolean-looking
function that doesn't return boolean, it can often be quickly assumed that
something more than just a boolean needed to be returned.

This also reminds me of command-query separation which makes a lot of sense to
follow in most cases ('get' shouldn't change state and state changing methods
like 'save' shouldn't return a value). But there are going to be exceptions to
this rule and there are scenarios where no matter if you return data or not
you might need to know the HTTP status code, etc. Grabbing code and doing
dogmatic research on it doesn't make a whole lot of sense unless a developer
who really knows the code well can justify themselves and discuss the why.
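
A small Python sketch of command-query separation and one of its pragmatic exceptions. The `Inbox` class and its method names are invented for illustration.

```python
class Inbox:
    """Hypothetical message queue used to illustrate command-query separation."""

    def __init__(self, messages):
        self._messages = list(messages)

    def peek(self):
        """Query: returns the next message and changes nothing."""
        return self._messages[0]

    def discard_next(self):
        """Command: changes state and returns nothing."""
        del self._messages[0]

    def pop(self):
        """The exception: changes state AND returns the removed message."""
        return self._messages.pop(0)
```

`pop` breaks the rule, yet it is the method most callers would reach for, which is the kind of tradeoff a static naming check cannot judge.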

------
guenp
google cache version for those who also encounter a database error:
[http://webcache.googleusercontent.com/search?q=cache:EpuQn8E...](http://webcache.googleusercontent.com/search?q=cache:EpuQn8EBUlMJ:www.felienne.com/archives/5452+&cd=1&hl=en&ct=clnk&gl=dk)

------
lolive
My experience is that poor naming sometimes comes as a consequence of poor
types modeling. Poor naming then makes IRL discussions about the code very
cumbersome. So step 1 of good naming (and good discussions) is good types
modeling. And yes, from that perspective, languages with poor typing suck
BADLY!
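
One way to picture this, as a Python sketch with hypothetical names: when the data is an anonymous tuple there is nothing good to call its parts, while a modeled type supplies the names for free.

```python
from dataclasses import dataclass


def describe_untyped(rec):
    """Poor modeling: the caller has to remember that rec is (name, age)."""
    return f"{rec[0]} ({rec[1]})"


@dataclass(frozen=True)
class Person:
    """Modeled type: the field names become the shared vocabulary."""
    name: str
    age: int


def describe(person: Person) -> str:
    return f"{person.name} ({person.age})"
```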

------
fallous
Hard sciences also have to deal with naming conventions, but manage to do so
without resorting to the recommendations provided by the parent.

The periodic table for example manages to do quite well with one or two-letter
variables, which then leads to concise equations for chemistry.

~~~
chris_st
Given the size and expansion rate of the Periodic Table, this makes sense.

Programs, on the other hand...

~~~
jstimpfle
Programs might expand (mostly by addition of parts), but not so much the
individual parts (functions, classes, modules, etc) they're made of, which are
the enclosing scope for most variables.

~~~
jstimpfle
I should not be as concerned about getting downvoted as I am, but I think the
discussion culture in this thread is really mad.

Why am I being downvoted for a civilized and reasonable objection which
refutes the parent's claim? (of course without a counter-argument).

------
ilaksh
What if we apply this context to names used in mathematics? Such as classics
like a single greek letter.

------
mavhc
[http://www.byteshift.de/msg/hungarian-notation-doug-klunder](http://www.byteshift.de/msg/hungarian-notation-doug-klunder)

