What science says about naming in programming (felienne.com)
93 points by fagnerbrack on Jan 8, 2017 | 117 comments

Naming is related to scope and I'm surprised there is so little about it in the article. While trying to find a universal principle I came up with this rule: "an identifier's entropy should be proportional to its scope".

Global variables (if any) are long explicit phrases while inner loop variables can be a single letter.

Without taking scope into account the naming guidelines become dogmatic and impractical trivia.

EDIT: wording
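A rough Python sketch of the rule, for illustration only. The constant, the function, and the data shape are all made up here, not taken from the article:

```python
# Module-wide name: visible everywhere, so it spells out exactly what it is.
MAX_FAILED_LOGINS_BEFORE_LOCKOUT = 5


def count_locked_accounts(failed_logins_by_user):
    """Count users whose failed logins reached the lockout threshold."""
    n = 0  # lives for a handful of lines, so one letter is enough
    for f in failed_logins_by_user.values():  # inner-loop variable, tiny scope
        if f >= MAX_FAILED_LOGINS_BEFORE_LOCKOUT:
            n += 1
    return n
```

The short names only work because the whole function fits on screen; the same `n` at module level would tell the reader nothing.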

In addition to scope, I think language syntax plays a role and naming is thought of differently in JS vs C#. In C# you could have a method that takes in 2 points: Point p1 and Point p2 parameters. Since you can see the Type is Point in the parameters it doesn't matter that you abbreviated it so much. In JS though you don't specify types, so point1 and point2 (or pointA/pointB) might be better, unless of course the function name is so obvious (like getMidpoint) in which case p1/p2 or a/b naming would still be clear.

I actually really like reading/writing code that uses really small variable names that are still very obvious given their context (usually never acronyms though, I despise those). I'd always rather work with something small like p1 than something very descriptive like sceneOffsetPoint1, especially if I can read the code above it that shows it being assigned based on the scene center point: let p1 = scene.center + Point(2,3);
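To make the p1/p2 case concrete, here is a small sketch in Python with type hints standing in for the C# parameter types. The `Point` class and `get_midpoint` are hypothetical, written just for this example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def get_midpoint(p1: Point, p2: Point) -> Point:
    # The annotations and the function name carry the meaning,
    # so the short parameter names stay readable.
    return Point((p1.x + p2.x) / 2, (p1.y + p2.y) / 2)
```

Without the annotations and with a vaguer function name, `p1` and `p2` would have to do more of the explaining themselves.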

I've been reading Code Complete recently, and this is exactly what the book recommends for variable names :).

I had no idea the rule is a classic. It's frustrating that with a little more education I would have spared plenty of time spent reinventing the wheel.

I don't shy away from reading programming books and I have close to two decades of programming experience now (including hobby times), and this is the first time I've actually read this advice. It doesn't seem to be widely spread. So I guess the choice of books matters...

I identify OO culture as a possible culprit. OO is not so fond of structuring and scoping mechanisms besides classes.

I always name my variables descriptive names. For example, if I am in a loop and the variable represents a "record" I will name it record rather than r.

It's much more readable and you're less likely to get name clashes with other functions (r could be lots of things (even in the same file) - render, record, row, red, right)

I always go with plural for lists and singular for iteration variables.

  for (let thing of things) {...}

Personally I have a habit of avoiding that, especially in dynamic languages, because a single typo could have nasty consequences there. So I always figure out a synonym, or use a name for the variable that's either more descriptive:

  for (let particularThing of things)
or at least visibly different:

  for (let th of thigns)

The dynamic languages I primarily use are TypeScript and Dart. I always have enough type information floating around to catch this kind of thing. But even in JavaScript it wouldn't be much of an issue since it will instantly explode on the first run, which is certainly inconvenient, but nothing major.

If you name a variable with broad scope "record", though, you deserve hell.

This is the best single guide on naming ever.

why "entropy" and not "length" ?

I guess he means entropy as in probability of uniqueness. Which is related to length.

AbstractLocalObjectFactory is a long name but carries little meaning.

Naming is a social problem, and there's a strong risk of correlation with hidden social variables in any empirical study, while any non-empirical study is suspect because humans are what counts.

In particular the most likely explanation for correlation between bad names and bad code is unskilled or inexperienced programmers. This needs to be controlled for.

My personal observation is that well-factored code can use shorter identifiers because it deals with fewer concerns. Long identifiers frequently indicate muddy thinking that mixes concepts, where the identifiers are burdened with disambiguation.

Further, in languages with particularly verbose identifiers (Java especially), it can get hard to see the wood for the trees: the density of text obscures code and data flow in the program. There's a similar argument to be made about the tradeoff between symbolic operators and textual operators: the more frequently they're used in a domain, the better they're represented by a symbol. Symbols, of course, are especially short identifiers.

I also think that smaller scopes/function bodies help to reduce variable name length. When you write small functions that solve a single problem, the variable names tend to be very concise. This approach helps me try to keep my code under 80 columns, and I feel that it greatly improves the readability.

Regarding the verbose identifiers, I recall the first time I started looking at Objective C (and NextStep I believe) while attempting to get into iPhone/iOS development. I couldn't believe what I was seeing. IIRC there were function names that were damn near 80 characters long. I played around with it and built some simple iphone apps, but I just couldn't take it.

"In particular the most likely explanation for correlation between bad names and bad code is unskilled or inexperienced programmers..."

Totally disagree with this assertion: Laziness and two fingered typing are to blame.

Two fingered typing causes bugs directly? Or it causes bugs because it encourages short names? The article is about the relationship between names and code quality.

Naming is one of the hardest things to get right, I often have to comment on it in code reviews. Long names are usually more lazy than concise ones. The other problem is naming things more specifically than necessary, obscuring genericity of functions.

I disagree with a lot of this, but I'll focus on the worst offender. The rule against identifiers shorter than 8 characters specifically disallows `name', which makes absolutely no sense to me. This particular word says a lot, and it can easily have its context implied with no problems. Example, if we're working on a database that stores people's personal information, it's easy to guess what `name' means. If we're processing HTML forms, again, name has a very clear meaning here. We could also use it to describe syntax (i.e. identifiers, or as an alternate term for JSON-style keys). If I tried for another couple minutes I'm sure I could think of hundreds of use cases for that identifier with a clear context. What I don't understand, though, is the exceptions they list. Such as "c, d, e ... in, inOut, ...". These, on the other hand, have very few contexts where they're at all clear. What happened to the whole stigma against 1-char identifiers? How could a 7-char identifier be any worse than a 1-char identifier? This whole rule seems like nonsense.

Yes, many useful names are short: point, input, output, x, y, z, origin, r, g, b, alpha, color, vector, pi, string, file, buffer, open, close, print, format, copy, move, list, load, run, id, pop, push, stack, context, env, user, draw, update, refresh.

Also, notice how R, G, and B have a very clear meaning when put near "color"? Context is a thing our brain manages very well.

There's also the implicit assumption of underlying knowledge. I remember writing some graphics routines and looking at some examples, but they always seemed to consist of a mess of single-letter variable names. After reading up on the theory it became clear that they were all the conventional mathematical constants used in the formulas to derive the algorithm. If you were intimate with the math these names would be the obvious choice, yet if you were an outsider they would seem indecipherable. I now prefer to use simple English terms and put the math terms next to them as a comment, rather than the other way around. A caveat is that actual formulas might become a bit lengthy.

Many of those short names are keywords (or pre-defined identifiers) in many programming languages, e.g. string, close, print, copy in Go. Having to use another name instead can also be difficult.

Poetry time:

    (let ((copy (copy-list list)))
      (list copy 'copy 'let))
A 13 syllabic verse (7+6, with caesura)

(but yes, you are right)

FWIW, I never have really found the ambiguity caused by using the same symbol as both a variable and a function to be a readability problem: I think people are generally pretty good at things like noun/verb ambiguity and so this kind of ambiguity sort of "clicks" with us.

What has bitten me many times is accidentally overwriting a builtin function like max or len in python and then scratching my head about the strange errors that result from this.
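A minimal Python sketch of how this tends to happen. The function names are invented for the example:

```python
def spread_shadowed(values):
    # The local name `max` shadows the builtin for the rest of this function.
    max = values[0]
    for v in values[1:]:
        if v > max:
            max = v
    # `max` is now a number, so calling it raises TypeError.
    return max(values) - min(values)


def spread(values):
    # No local reuses a builtin name, so max() and min() behave as expected.
    return max(values) - min(values)


def demo():
    """Return the name of the exception raised by the shadowed version."""
    try:
        spread_shadowed([3, 1, 7])
    except TypeError as exc:
        return type(exc).__name__
    return None
```

The error shows up at the call site rather than at the assignment, which is what makes it confusing to track down.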

Overwriting is not so much a problem of naming than one of dynamic updates: some environments do actually display a warning when you change a global binding, which helps.

That helps, but I find that a Lisp-2 like Common Lisp (i.e. a language with different namespaces for variables and functions) is a more intuitive programming environment than languages with a single namespace.

I prefer Lisp-2 too.

I'm extremely well informed about Lisp-1 versus Lisp-2. I chose Lisp-2 for my own Lisp dialect, with special support for working in the Lisp-1 style: there is a way to code with higher-order functions without ever using the dialect's equivalents of the funcall function or function operator. I thereby consider the Lisp-1 vs. -2 question to be a solved problem.

The requirements which drive the preference for either are all valid. I, the designer, have simply taken both requirements and integrated them. Of course, requirements like, "I want to use list as a variable name without shadowing the lisp function" directly conflicts with "I want to reference functions and variables the same way without function or funcall". However the conflict can be confined down to the argument list of an individual form. I.e. "Sure, you can have both, just not in the argument list of the same form."

Incidentally, I've been using TXR a bit for text-processing recently, it's generally been pretty nice once I've figured out what exactly needs to be done to get the matching to work, but the various failure modes for the pattern language aren't very intuitive.

I don't disagree. If debugging that sort of thing were intuitive, we'd all be coding in Prolog. It's like an invisible "if" hidden in every statement which branches elsewhere if there is no match. @(assert) can help; if you know that something that follows must match, you get an exception if it doesn't. I used asserts in the man page checker:


I've gotten into the habit of removing the last vowel in those cases. Not always pretty, but when documented it's pretty transparent.

If you're processing an HTML form then 'name' could be the form name attribute, the input element name attribute, the person's name, the person's username, the person's identifier within the application or database, and so on. Try reading some badly written PHP form handling code where the logic and the HTML are in the same script and you see this precise problem.

This is definitely a problem when you mix different semantic concerns in one level of code.

Doctor, it hurts when ...

Surname is only seven letters and it doesn't suffer those problems

It suffers a different problem though: Name implies your given name, possibly including your family name. Surname is much more specific (family name only) and thus not suitable in any case where that's not what you meant.

This is not an easy problem to solve. For every "easy" example of a good name, one readily encounters an easy counterexample that renders the name ambiguous or unfit for a particular purpose.

Names must necessarily include context to be meaningful. If "name" is an unambiguous identifier in your scope and it's obvious what it does, great! Don't fix what isn't broken. But when you catch ambiguity or uncertainty, revisit the code, so future developers have an easier time understanding what you meant.

> It suffers a different problem though: Name implies your given name, possibly including your family name. Surname is much more specific (family name only) and thus not suitable in any case where that's not what you meant.

And this digs out a tangential problem - that often, "name" is as specific as you can get (or at least as specific as you should get) - http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b....

I like the rule-of-thumb that short identifiers are fine if their context is limited in length - particularly if proper lexical scoping is supported. Humans do context very well, and imposing some arbitrary length limit ignores this.

This is important.

When you're inside a function, dealing with very limited scoped variables, people get that.

People also 'get' attributes of classes and can grasp that context.

You know what's rational - that the 'length' of an array is just called 'length' :)

Anything else would be missing the point.

I would argue that "name" by itself is actually quite a bad name. What is it naming? It should be "firstName" or "databaseName", etc.

I have learned over many years that longer, highly readable and descriptive names make life extremely easier in the long run, at the expense of a few extra characters to type.

Surely the context it's in would help you understand whether it's referring to a person or a database?

So many unreadable and confusing code bases exist precisely because people made that assumption.

For example, in the case of "name" there are typically many different things that need a "name" in the same context. First name, last name, etc. "name" by itself (and other short names) is honestly asking for lots of headaches in the long run.

The CRUD database tables I make usually have "id" and "name" columns. "name" is always a human-readable descriptive string (few words). No need to have a more specific name for what "name" is.

If you write SQL, you can disambiguate by table name prefix. If you iterate over result sets for example in python, you can easily be more specific on tuple destructuring time ("for personName, carName in ...").

When defining a table "Person", I consider it bad style to prefix all column names with "person". Btw this overzealous disambiguation falls flat as soon you do a self-join.
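A sketch of both points using Python's sqlite3 module with an in-memory database. The tables, columns, and rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    CREATE TABLE car (id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    INSERT INTO person (id, name, manager_id) VALUES (1, 'Ada', NULL), (2, 'Grace', 1);
    INSERT INTO car (id, name, owner_id) VALUES (1, 'Old Volvo', 1), (2, 'Blue Beetle', 2);
""")

# Both tables have a plain "name" column; the table prefix disambiguates in SQL.
owner_rows = conn.execute("""
    SELECT person.name, car.name
    FROM car
    JOIN person ON person.id = car.owner_id
    ORDER BY person.name
""").fetchall()

# The more specific names appear only where the two meanings meet.
labels = [f"{personName} drives {carName}" for personName, carName in owner_rows]

# Self-join: a "person" column prefix would be identical on both sides,
# so table aliases have to do the disambiguation anyway.
manager_rows = conn.execute("""
    SELECT employee.name, manager.name
    FROM person AS employee
    JOIN person AS manager ON manager.id = employee.manager_id
""").fetchall()
```

In the self-join, a hypothetical `personName` column would still need the `employee.` / `manager.` aliases, so the prefix buys nothing there.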

A DB has a lot more obvious and strict context than a file of source code, and the law of demeter typically doesn't affect DB contexts as much as it does code, in practice.

In real code, it's unwise to expect the reader to study context just to parse a name.

Are you agreeing with me here? I think so but can't tell for sure...

I'm saying that a DB's organization is not the same as that of source code, and that surrounding context is a fundamental part of working with a DB and reading its layout, so perhaps the naming rules can be relaxed a bit in some cases.

>The rule against identifiers shorter than 8 characters specifically disallows `name', which makes absolutely no sense to me. This particular word says a lot, and it can easily have its context implied with no problems. Example, if we're working on a database that stores people's personal information, it's easy to guess what `name' means.

Is it "first name", "last name" or "full name"?

'name' is one of the worst variable names possible. It's like calling your program 'program', your function 'function'. There is always a better choice.

Depends on the context. If other vars are 'address' and 'phoneNumber', 'name' looks pretty reasonable.


Is it a person's full name? legal name? first name? last name? nick name? desired call name? a company name? the name (alias) of the stored address?

If you're digging into that specific meaning of the word "name" (as a property of a person), you're likely doing something wrong:


> If you're digging into that specific meaning of the word "name", you're likely doing something wrong:

If you're not you're definitely doing something wrong.

> (as a property of a person)

The original comment does not specify that it's "a property of a person". It could be a company, it could be a non-specific delivery address, etc...

legal name in which country? name order varies between UK and China. What's a "call"? Whose company? what kind of address?

Following this line of thinking, communication is impossible because everything is ambiguous.

> Following this line of thinking, communication is impossible because everything is ambiguous.

And making everything as ambiguous as possible makes communication easier… how?

When I taught programming to freshmen, the example I used to use when they called variables "number" or "string" was: "that's like calling your baby 'baby'."

People do that and in many situations it's the most suitable choice.

Why would anyone downvote this? Without any comments? Did you never say or hear "the baby is crying"?

I will now end my concern about being downvoted and get a life. The discussion culture in this thread is mad.

I didn't downvote, but what I really meant is "giving your baby the name 'Baby'".

Very fair - thanks for the update. Maybe it helps to think not as giving variables fixed names (like Luke or Lisa) but seeing them as data accessed by descriptors that describe their qualities (the ones that matter in the current context).

Btw humans also have many names, depending on context. You might have had funny nicknames in your childhood, your friends speak to you by your first name, other might speak to you by your last name, others might call you "darling" etc... likewise it's also not totally uncommon to speak to babies by the handle "Baby", like "how are you doing Baby?".

What we need is better tooling. First of all, stop pretending that a structured description of commands and expressions (one of the world's most complex structures) is "just text". If programming environments stored sources as structured items, it would be easy to give an entity many names (e.g. human description, fully qualified name, short name in current module, and shortest scope/mind-local alias). And then switch modes from 'first time reading' to 'familiar development' (or 'lawyer style', where descriptive names appear only in appropriate places and assertions are visible). We have similar issues with formatting, ridiculous tabs-vs-spaces wars, encodings, even with line endings.

Try to open raw html/css instead of using browser to view modern website. Hard to get the picture? Yeah, and that's where we are in programming today.

This isn't a million miles from what you get with Javadoc + Eclipse, where if you hover over an item you see the snippet of Javadoc attached to it where it was defined (i.e., in some other sourcefile, or jar).

A few key points that make this useful:

The documentation snippets are attached to specific methods, parameters, classes, etc - like in Javadoc or Doxygen. You don't have to wade through stuff you don't care about to find out about the thing you care about (under your mouse).

upvote for the interesting idea. To your knowledge is there any IDE out there with such a feature?

Nope. At least my annual efforts to dig in that direction always fail. It seems that there is something: literate programming, the html-based prolog posted here recently (someone please help to recall its name). But it is not exactly that. And it always contains esoterica under the hood. I want C, C++, Perl, Java, anything mainstream; not Prolog.

I'm too busy and too single to make prototype on my own (and more related ideas on top of that in thoughts.txt). I've posted this in forum comments for years, and sometimes got critics that made me temporarily think it is just another view on existing techniques (as of company-wide, not person-wide). Now I run the company and this got only more demand.

Honestly, maybe I'm just doing existing things wrong, idk.

Word 'Jetbrains' in this list looks very promising, I'll definitely give it a try on spare time. Thanks!

The html based prolog thing you mention is probably eve.

I could see this being an eclipse or notepad++ plug in.

Closest aspects I saw that touch this idea:

- automatic refactors in Java IDEs, especially "rename" - they understand what you mean by "this variable" or "this function", at least most of the time

- Paredit mode in Emacs, for working with Lisp code; if you learn it, you start feeling you're actually working with a tree, and not its textual representation

Charles Simonyi has been trying for over 20 years


http://www.intentsoft.com/us/ <-- in "stealth mode" for 10 years!

The article cites some scientific papers, but I'm quite skeptical about them. How do they define "good" and "bad" identifiers? In the end, what's a good naming convention for one person or actor may not be good for another. For a human actor `getBase` is undoubtedly a better identifier than `sdfjkjlfsdkjfdskjsdfkjfs`; for a computer, though, there is not much difference between the two. For somebody from the UK `getBehaviour` is better than `getBehavior`; for an American it's vice versa. Those papers take some arbitrary naming convention and postulate that it's either "good" or "bad". Postulating something unless absolutely necessary is not how science is supposed to work.

DISCLAIMER: I may be wrong because my judgment of those scientific papers is based only on this article. If they do define "good" and "bad" in some meaningful way, then it's the article's fault for not pointing it out.

That's not how references work in scientific articles: you're expected to read the referenced papers (or at least the abstract) & understand the implications/conclusions, or get someone else to summarise it for you.

>The article cites some scientific papers, but I'm quite skeptical about them. How do they define "good" and "bad" identifiers?

Wouldn't that be answered in the content of those papers?

hn hug of death. I tend to use shorter names, but maybe because most of my code isn't very complex. Sometimes I just change the spelling if I want a similar but new variable.

In general, programming language does not influence the quality of names, although Java code tends to exhibit better naming, perhaps due to the early popularity of Java coding style guidelines.

I completely disagree, having actually worked with a lot of Java once. The language in general is extremely verbose, and the names make it worse. What saddens me is that it seems a lot of the "better naming" discussed here is the complete opposite of what I'd consider good --- I mean, one of the guidelines listed there is "Identifiers should not consist of fewer than eight characters, with the exception of {short list containing very few dictionary words}"!? In the (not Java) code I work with, the majority of identifiers, i.e. locals, are below 8 characters.

IMHO identifiers should be short and memorable, preferably even having a unique and equally memorable pronunciation (I think "strlen", given as a "flawed" identifier in one of the tables, is an example of a good one, as are the others in the C library), but it seems a lot of "modern" code is going in the opposite direction. "Long and descriptive" identifiers may seem more "friendly" at first, but I feel it gets rather tiring after having to read plenty of code in that style --- especially if many of the identifiers share a common prefix, forcing you to read through the extra noise but having to compare to ensure they're actually the same.

Thus there appear to be two distinct style groups, one with the long and verbose "dictionary word" names and traditionally represented by Java and C#. This group is spreading, I think partly because of the "ease of entry" noted above. The other group uses the short, succinct, abbreviated naming traditionally represented by C, Asm, sh, awk, and most of the existing Unix conventions. (As another example to contrast sh, PowerShell is a member of the former group.) This article seems to say the former is better than the latter. I am not convinced.

Some examples for concreteness:

Former style: https://github.com/dotnet/roslyn/blob/master/src/Compilers/C...

Latter style: http://v6.cuzuco.com/v6.pdf

Edit: Another example of the latter style, from something more vaguely related to the example in the former style: https://news.ycombinator.com/item?id=5748672

Maybe "good" is relative? Good for whom? For the active developers that work with the same identifiers frequently for a long time, the short names are convenient and sufficient. All abbreviations are obvious to them, and all contextual implications are clear.

For new developers or developers that frequently switch between working with many different environments the short names can be a huge problem.

"strlen" is not the biggest problem; you can look up what that means, as it's well documented. The problem is much worse when short names and abbreviations are introduced ad hoc without sufficient contextual clues.

Actually bothering to document abbreviations helps. Does "proc" mean procedure, process, processor, ...?

    double chnopsnrg;
    double chnopsEnergy;
    double carbonHydrogenNitrogenOxygenPhosphorusSulfurEnergyInKiloJoulePerGram;
    double chnopsnrg; // C.H.N.O.P.S. (Carbon, Hydrogen, Nitrogen, Oxygen, Phosphorus, Sulfur) Energy in kJ/g
I'll take no. 4 over the others, but anything over no. 1.

Also scope matters. In local scope, names can (and should) be shorter. The context is clear.

    let fooBar = ... where fooBarPath = ..., fooBarFile = File(fooBarPath)
    let fooBar = ... where path = ..., f = File(path)
Public global names are a different matter.

Also tooling matters. In old shells and editors without auto-completion long names of course quickly get annoying. With auto-completion and highlighting of identical words this is less problematic. On the other hand a good "go-to-definition" or "peek-documentation" feature makes short (documented) names less problematic.

I give credit to PowerShell to at least attempt to balance the two aspects, as it supports both long and short names for most things. The tooling aspect seems to have been ignored though.

> With auto-completion and highlighting of identical words this is less problematic.

Autocomplete only helps when writing, not reading; and it doesn't solve the problem of long names which have the same or similar prefixes, obscuring the important differences. Long names still require reading. Here's a common pattern in Java and C# code:

    FooBarBazIntervalFactory fooBarBazIntervalFactory = new FooBarBazIntervalFactory(...);
    fooBarBazIntervalFactoryType = fooBarBazIntervalFactory.GetFooBarBazIntervalFactoryType();
    fooBarBazIntervalFactorySize = fooBarBazIntervalFactory.GetFooBarBazIntervalFactoryType();
I prefer something more like

    FooBarBazIntervalFactory fbbif = new FooBarBazIntervalFactory(...);
    fbbifType = fbbif.type();
    fbbifSize = fbbif.size();

As I said, highlighting of identical words helps a lot with that. Example: https://code.visualstudio.com/images/language-support_docume...

Also, I always wondered if something like this would work well: https://github.com/ankurdave/color-identifiers-mode

IMHO something is broken with these guidelines and the IDEs we have. Isn't this a contradiction? On the one hand it is argued that long variable names are needed to improve readability. On the other hand tools are needed to read the resulting mess.

I've had to develop team codebases in both Eclipse and Visual Studio, and the experience was not good. Not only were the IDEs slow and buggy, but also the code (which followed the long names cult) was terrible to work with. No cohesion, no separation of concerns.

Personally, most of the time I'm happy to use vim without any plugins. Most time is not spent typing, but thinking about factorization of the program. Depending on scope I mostly use short variables. I don't expect to be able to dive into code without understanding the context -- the idea that this is possible is a misconception.

But it could very well be that the people arguing for long names do e.g. business code (where concepts might or might not be less clearly scoped and cohesive) and people arguing for short names do e.g. more algorithmic code. Without proper scope (ha!) this discussion is worthless.

There is a copy-paste error ;-)

I use probably 70% Java in a given week, there was a time when I was 100% .NET, and there have been long stretches where the majority was overwhelmingly C, Node, Python, PHP. This is a long way of saying I've spent a lot of time floating around some very stylistically different languages so I don't really adhere to one dogma over another and I've grown to write code that mostly kind of/sort of feels the same between languages. Naming convention-wise at least.

Java/.NET naming verbosity is great when I'm trying to figure out what the heck I'm receiving without tracking down the class definition. The thing is, you're only willing to be that verbose because you've got autocomplete in an IDE so peeking into that long object/class/parameter name is only a click away.

Names like OncePerRequestHttpWebHandlerSecurityContextAwareInterceptorFactoryImpl are absurd, but there's no reason you can't come up with a more concise descriptive name. Frankly, if you're implementing the interface you can call it whatever you want.

On the other hand I've seen some C and JS code that I could SWEAR was run through an obfuscator before I got to it, but no, the devs refused to use more than about three letters max. This is not helpful.

Names like strlen, idx, propCnt are concise AND descriptive. I generally like to put units on things if they're some kind of measurement so I typically do stuff like intvlMS, distKM.
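As a quick illustration of the units-in-names habit, a Python sketch. The function and parameter names are invented for the example and use Python's snake_case rather than the camelCase above:

```python
MS_PER_HOUR = 3_600_000


def travel_time_ms(dist_km: float, speed_km_per_h: float) -> float:
    """Travel time in milliseconds for a distance in km at a speed in km/h."""
    hours = dist_km / speed_km_per_h
    return hours * MS_PER_HOUR
```

With the unit in each name, a call like `travel_time_ms(dist_km=90, speed_km_per_h=60)` can be checked by eye without opening the definition.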

There's nothing wrong with being descriptive, but you can really do a disservice to those (including yourself) coming after you when it becomes very hard to fit the mental model.

English is not my native language, and it took some time for me to actually parse those short variable names you provided as examples. Though, I still don't know what propCnt is supposed to mean. But even if I did, it takes some cognitive effort to parse shortened names. I think that's the case with native speakers as well, but it's more noticeable for non-native speakers because the time interval is longer. IMO those little parsing efforts add up and are tiresome if you have to do it all day long. Imagine if books were written that way.

You're not taking something into account with the shortened names: they are common and have consistent meaning. So after a few exposures it becomes trivial to read.

I find it frustrating. Why not make it trivial to read the first time? I didn't get that intvl was interval until I read the reply. Why not just write interval or propertyCount? It doesn't make it much longer, and it's much easier to read.

Long names only become a problem when you're concatenating more than three words for one variable. And that's really the problem, the number of words in a variable, not its character length.

You can answer it yourself, but here goes: Because the trivial investment you have to make to learn these ubiquitous abbreviations might (no general judgement made) be outweighed by the benefit of better readability once you made that investment.

"propertyCount" is 6 characters longer than "propCnt". In arithmetic expressions this can easily amount to 12 or 18 characters saved.

I suggest you point out a specific example and show how you would do it in a better way.

I suspect it's supposed to be popCnt (no "r"), which is short for "population count", which is the number of set bits in a byte/short/int/long/what-have-you.
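For anyone unfamiliar with the term, a minimal Python sketch of a population count. The function name is my own choice for the example:

```python
def pop_count(value: int) -> int:
    """Return the number of set bits in a non-negative integer."""
    if value < 0:
        raise ValueError("pop_count is only defined here for non-negative integers")
    # bin(11) == '0b1011'; the '0b' prefix contains no '1', so counting is safe.
    return bin(value).count("1")
```

Python 3.10 and later also provide `int.bit_count()`, which does the same thing for the absolute value.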

That one specifically strikes me as a counter-example if anything, you don't lose anything by adding two more characters and calling it popCount, and it's significantly easier to read and understand.

"prop" is a common abbreviation for "property".

Kind of a perfect example of the ambiguity in over-shortened names ;) I also read it as propertyCount (which in my style shouldn't even be a variable...)

I'm going to second that "intvlMS" is terrible. I didn't confuse it with "intval", but it's like making people play Wheel of Fortune with your variable names. It takes so long to read. Even if you called it "intervalMS", that is not how you pluralize things (which would be "Ms") or how you abbreviate meters ("m"). It could be "intervalMicroSeconds". Just call it "intervalMilliseconds" and save everyone from parsing your "concise" hand-minified code.

That was probably a bad off the cuff example, my point was that there's a happy medium between concise and verbose, and I think we're better off erring on the side of verbose.

"strlen" is a bad name partly because it's not immediately obvious what is meant, and partly because it contains the name of the type.

One of the things I think Java actually does well is to provide good namespacing facilities for methods (e.g. classes). That allows you to omit the type in method names.

C: strlen(x)

Java: x.length()

But strlen is not a method. Why are you arguing? Besides, it's a very good name for a free-standing function.

That it's a free-standing function is no excuse, generic functions have been a thing for decades, likewise namespaces.
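To make "generic free-standing function" concrete, here is a rough Python sketch using `functools.singledispatch` from the standard library. The function name `length` and the two registered types are picked only for illustration; the point is that one untyped name can dispatch on the argument's type, so the type doesn't need to be baked into the name.

```python
from functools import singledispatch


@singledispatch
def length(value) -> int:
    """Generic entry point: one name, dispatched on the argument's type."""
    raise TypeError(f"length() is not defined for {type(value).__name__}")


@length.register
def _(value: str) -> int:
    # Implementation chosen when the argument is a string.
    return len(value)


@length.register
def _(value: list) -> int:
    # Implementation chosen when the argument is a list.
    return len(value)
```

Callers write `length("abc")` or `length([1, 2])` without a `str` or `list` prefix in the name; an unsupported type raises `TypeError`.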

It's a C function. C doesn't have generic functions.

C's a shitty language, nothing new there, that doesn't make "strlen" any good it just means you get a historical baggage obsessed with nonsensically short names (hello creat(2), fcntl(2), fgetwln(3), wcsftime(3) or strpbrk(3)) and the additional constraints of a namespace-less language with no generics.

You are not even disagreeing with me.

> You are not even disagreeing with me.

You apparently aren't reading my comments. The first part of the comment is specifically about that: the naming has nothing to do with C's constraints and everything to do with the community surrounding it. While I am loath to praise WinAPI, its designers did at least demonstrate that you can use descriptive symbol names in a C API.

It depends heavily on the type of code. C code with complex bit manipulations is different than C# or Java business code.

But as a general rule I like to think that if there are many variables whose purpose is so hard to remember that their names must be super descriptive, then the problem decomposition is probably lacking.

I also think that bad decomposition is often a consequence of OO style (where concerns tend to be intermingled rather than separated because objects tend to be trashcans for everything "related" to ill-defined concepts).

A good decomposition leads to source files with distinct concepts that lend themselves to short and memorable (but not overly descriptive) names within their limited scope.

Example: The following code has a very clearly defined scope and implements a data transformation with 5 distinct concepts (cols, rows, objs, spec, database). Incidentally these show up in just about every function of the file.


(I haven't bothered to document these concepts yet. Assuming documentation I'm convinced 4-letter variable names are just right, and repeating the documentation at every usage location is wrong).

I'm not sure if the examples support your case :)

The 'former style' is instantly readable code. You can see roughly what is going on and make a mental picture of it.

The 'latter style' reads like Chinese. Everything requires a lot of context.

I should point out a major difference: in Java etc. you're often dealing with abstractions that have analogues in the real world, e.g. 'isEditable'. Well, that means something can be edited or not, obviously. But in C, you might be dealing with so many low-level things like memory addresses, pointers, complex data structures etc.

Software has to be maintained, and I'd personally rather have a little dose of 'tedious' than to have to decrypt some byzantine names every time I glance at some code.

I wonder what most people think about this?

Perhaps combine the benefits of both the 'former style' and 'latter style' into one -- use actual Chinese so it's still readable by most of the world's programmers (who live in China) but also utilizes the short context-required names (e.g. two characters '会编' instead of 10 'isEditable').

To me, naming has two competing and somewhat contradictory goals. Names need to be (1) descriptive and (2) short and memorable.

Often people who try to write real "good" code overdo the descriptiveness and end up with technically perfectly precise names that are awful to read and remember.

Learning how to find a short, yet descriptive name for things is real hard, and very important. It's the closest I come to being a poet in my work.

A third goal I aim for these days is that names should be unique, if possible. When I search the project, it's best if I don't find several things with the same name.

"http://www.felienne.com/archives/5452" leads to "Error establishing a database connection".

Busted under load. Anyone have a working cache? edit: here she is: http://webcache.googleusercontent.com/search?q=cache:http://...

Judging programs by looking at the words in isolation is about as effective as it is for any other form of writing.

Science? Click on link. Oh, they mean _computer_ science. In particular, empirical "software engineering" studies. That's not science at all. Computer science is not science, it's math + art + alchemy. And I strongly believe empirical "software engineering" studies fall under the alchemy category. (Which is why I can't stand to take those quotes out.)

Take this survey, for example. Any paper that purports to provide naming recommendations "scientifically" has to explain the Linux Kernel at least on a level footing with Eclipse or whatever other Java programs they're looking at. Otherwise it's hard to take them seriously.

In general, programming has just too many variables to be amenable to scientific analysis at this time.

a) Start with the question of how you define success. Are projects that have lots of adoption what we should be trying to emulate, even if the insides are ugly and the project just happened to be first to market, and its users don't care about how clean its code is?

b) Then there's the question of whether a study is reproducible using other codebases. Most such studies are not. The ones in OP demonstrably are not as I have hopefully proved above.

c) Software engineering studies are extremely weak at identifying or controlling for confounding variables. Most have no such discussion. The ones that have some mild discussion fail to put into context what variables they failed to account for; often the reader can come up with a new one with a few moments of thought. Given this, the conclusions they arrive at are overreaching, where good science has a tradition of modesty, of under-stating the applicability of one's conclusions.

Summary: science doesn't just happen when you want it to. A field becomes scientifically "rationalized" past a certain point in maturity, when a consensus emerges after many iterations of theory, experiment and argument between alternatives. It seems unjustifiable to say "science says" something if you can find opposing hypotheses about it in equivalent journal articles. Software engineering is early in this life cycle. I suspect it's hindered by bad axioms, such as that what we do with software today can be termed "engineering".

> Name suggests boolean but type does not

I've seen something similar with xUnit in C# where an Assert statement returns a value. `var single = Assert.Single(collection);` It does the assertion and returns the single item in the collection, which is awkward, but also familiar (from Linq's Single) and convenient. Here there are tradeoffs and I couldn't care less about the dogmatic "never do this" complainers. This might break the 'Principle of Least Astonishment' the first time you see it, but after seeing it once it makes sense and is no longer surprising.
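For readers who don't use xUnit, the same "assert and return" pattern can be sketched in Python. The helper `assert_single` below is hypothetical (it is not part of any test library I know of); it only illustrates an assertion that also hands back the value it checked.

```python
def assert_single(collection):
    """Check that the collection holds exactly one item, then return that item."""
    items = list(collection)
    if len(items) != 1:
        raise AssertionError(f"expected exactly one item, got {len(items)}")
    return items[0]


# The assertion and the lookup happen in one step.
only_user = assert_single(["alice"])
```

The name reads like a pure check, yet it returns something; whether that trade-off is acceptable is exactly the judgment call discussed above.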

Additionally, in the case of this criterion, where there is a boolean-looking function that doesn't return boolean, it can often be quickly assumed that something more than just a boolean needed to be returned.

This also reminds me of command-query separation, which makes a lot of sense to follow in most cases ('get' shouldn't change state and state changing methods like 'save' shouldn't return a value). But there are going to be exceptions to this rule and there are scenarios where no matter if you return data or not you might need to know the HTTP status code, etc. Grabbing code and doing dogmatic research on it doesn't make a whole lot of sense unless a developer who really knows the code well can justify themselves and discuss the why.
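A tiny Python sketch of command-query separation, for anyone unfamiliar with the term. The `Counter` class and its method names are invented for this example: the command changes state and returns nothing, the query returns a value and changes nothing.

```python
class Counter:
    """Toy class that keeps commands and queries apart."""

    def __init__(self) -> None:
        self._count = 0

    def increment(self) -> None:
        """Command: changes state, returns nothing."""
        self._count += 1

    def current(self) -> int:
        """Query: reports state, changes nothing."""
        return self._count


counter = Counter()
counter.increment()
counter.increment()
```

A method that both saved something and returned an HTTP status code would break this split, which is the kind of pragmatic exception mentioned above.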

google cache version for those who also encounter a database error: http://webcache.googleusercontent.com/search?q=cache:EpuQn8E...

My experience is that poor naming sometimes comes as a consequence of poor type modeling. Poor naming then makes IRL discussions about the code very cumbersome. So step 1 of good naming (and good discussions) is good type modeling. And yes, from that perspective, languages with poor typing suck BADLY!
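One possible illustration of types carrying what a name would otherwise have to spell out, sketched in Python with the standard `datetime.timedelta`. The function `describe_interval` and the variable `poll_interval` are made up for this example; the idea is that the unit lives in the type, so the name doesn't need an "Ms" suffix.

```python
from datetime import timedelta


def describe_interval(interval: timedelta) -> str:
    """Format an interval in milliseconds; the unit comes from the type."""
    # Dividing one timedelta by another yields a plain float ratio.
    milliseconds = interval / timedelta(milliseconds=1)
    return f"{milliseconds:g} ms"


poll_interval = timedelta(milliseconds=250)
print(describe_interval(poll_interval))  # 250 ms
```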

Hard sciences also have to deal with naming conventions, but manage to do so without resorting to the recommendations provided by the parent.

The periodic table for example manages to do quite well with one or two-letter variables, which then leads to concise equations for chemistry.

Given the size and expansion rate of the Periodic Table, this makes sense.

Programs, on the other hand...

Programs might expand (mostly by addition of parts), but not so much the individual parts (functions, classes, modules, etc) they're made of, which are the closing scope for most variables.

I should not be as concerned about getting downvoted as I am, but I think the discussion culture in this thread is really mad.

Why am I being downvoted for a civilized and reasonable objection which refutes the parent's claim? (of course without a counter-argument).

What if we apply this context to names used in mathematics, such as classics like a single Greek letter?
