Hacker News
Long Names Are Long (2016) (stuffwithstuff.com)
94 points by tosh on Dec 21, 2019 | 41 comments


This touches on one of my least favorite things when reviewing code: "stutter".

A common example is what happens when you have an object that should just be a function (or a static method in Java-land):

    from io import reader
    
    reader = reader.Reader()
    reader.read(file)
Oof. Just

    import io
    io.read(file)
Similarly, when you have method names that repeat information from the class and the arguments:

    BufferedReader bufferedReader = new BufferedReader();

    bufferedReader.readBuffer(buffer);
Many languages now support features that reduce this, like `val`, but still! When designing APIs/interfaces, consider what they'll look like to a user, and don't force your end users to stutter.
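A minimal Python sketch of the contrast, assuming a hypothetical module that could expose either style (`Reader` and `read` are illustrative names, not the standard `io` API); an in-memory `io.StringIO` stands in for a file so the snippet runs as-is:

```python
import io


# Stuttering style: a class that exists only to wrap a single method,
# so callers must create and name an object before they can read.
class Reader:
    def read(self, stream: io.TextIOBase) -> str:
        return stream.read()


# Non-stuttering style: a plain function does the same job with nothing extra to name.
def read(stream: io.TextIOBase) -> str:
    return stream.read()


reader = Reader()
print(reader.read(io.StringIO("hello")))
print(read(io.StringIO("hello")))
```

Called from another module, the function version would read as `module.read(file)`, with the module name supplying the context.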


    from io import reader
    
    reader = reader.Reader()
    reader.read(file)
Isn't this why you can say:

  from io import reader as r
And then:

  reader = r.Reader()
  reader.read(file)

?


Sure, but you still have `read` 4 times (down from 6), while I have it only once. And that comes at the cost of forcing your users to alias an import, which can be a maintenance burden long term.


Fair enough. Though two of the "read" are "reader" and "Reader" and you can tell them apart easily from the capitalisation. But I don't completely disagree with your point.


I've previously called one aspect of this, repeated prefixes within a scope, "Smurfing", where you have

    SmurfLib::SmurfPtr<SmurfObj> pSmurf
        = new SmurfLib::SmurfObj::CreateSmurf();


My least favorite would be putting the name of the pattern in the class name. E.g. "FooService", "BarRepository", "XInterface". Code is most readable when names describe business and not implementation detail.


`golint` will complain about stutter of this kind


Unfortunately I don't think this is particularly complete advice, because it totally ignores the question of scope.

Yes, variable names serve to disambiguate. But at what level -- function? File? Component? Business process? There's no clear rule here. And are they disambiguating merely between other variables that have already been defined, or from the wider scope of potential variables that a programmer might likely otherwise accidentally confuse them with?

If I have 20 different functions that all deal internally with only a single "price" variable, it may be legitimately helpful in preventing bugs to explicitly name these "priceWithTax" and "priceWithoutTax", or "priceLocalCurrency" and "priceForeignCurrency", or "priceList" and "priceSold". Or even use "priceWithTax" even if I never once use or define a "priceWithoutTax", because I know from experience that people are going to make a mistake otherwise.
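A small Python sketch of that naming idea, with a hypothetical flat tax rate and hypothetical function names: the parameter and local names state which kind of price is expected, so passing a tax-inclusive price where a tax-exclusive one belongs looks wrong at the call site.

```python
from decimal import Decimal

# Hypothetical flat tax rate, chosen only for illustration.
TAX_RATE = Decimal("0.20")


def add_tax(price_without_tax: Decimal) -> Decimal:
    # The parameter name tells callers which kind of price to pass in.
    return price_without_tax * (1 + TAX_RATE)


def checkout_total(item_prices_without_tax: list[Decimal]) -> Decimal:
    # A bare `price` here would make it easy to add tax twice.
    price_with_tax = sum((add_tax(p) for p in item_prices_without_tax), Decimal("0"))
    return price_with_tax


print(checkout_total([Decimal("10.00"), Decimal("5.00")]))
```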

In the end we're all trying to balance brevity against the descriptiveness that keeps someone from misinterpreting, and therefore misusing, the variable when calling or modifying the code later on.

Sometimes I agree that, yes, variable names seem too long... but then I realize the peace of mind may well be worth it, knowing they reduce the chance of mistakes and bugs.


The article is explicitly about object oriented code, which pretty much tells you exactly how scoping will work, and in which contexts you drop parts of a name because they're contextually obvious.

If you have 20 different functions that all internally deal with `price`, then either you're not writing OO code, or each of those functions is in its own object type, providing a clear context on what your code is talking about.
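A minimal Python sketch of that argument, with hypothetical `CatalogEntry` and `Invoice` classes: each class fixes what its own `price` means, and the qualifying object at each use site carries the context instead of a longer attribute name.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class CatalogEntry:
    # Within this class, "price" means the advertised list price.
    price: Decimal


@dataclass
class Invoice:
    # Within this class, "price" means what the customer was actually charged.
    price: Decimal


def discount_given(entry: CatalogEntry, invoice: Invoice) -> Decimal:
    # entry.price and invoice.price are told apart by their owners, not by longer names.
    return entry.price - invoice.price


print(discount_given(CatalogEntry(Decimal("20.00")), Invoice(Decimal("15.00"))))
```

Whether that owner-supplied context is enough for someone reading across classes is exactly what the replies go on to dispute.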


Readers of your code are not reading only a single class's code. When I hear "20 different functions that all internally deal with price", that doesn't mean the state is shared, but the linguistic context of the project is still shared. Just because in the real world we have conversations among small groups of people, within rooms that encapsulate sound and information, doesn't mean that within a large organization we don't find ourselves confused when someone says "price" without indicating what might or might not be included in that price. If you are using terminology inconsistently within the implementations of separate classes (or, as the person you are responding to notes, even if you aren't but the mistake is common), you could at least provide a glossary, if not use a longer name.


> The article is explicitly about object oriented code

You're wrong, it's absolutely not.

It never once mentions being about objects, and the first half of examples don't involve objects at all.

I don't know where you're getting that from.

And even if it were, it has nothing to do with my point. Different objects could use different standards for a price variable. You claim that objects would provide a "clear context", but if it's not through variable names, then how? Just through comments around the variable names?


From TFA:

"Obviously it’s an object. Everything is an object. That’s kind of what “object-oriented” means".

So ... TFA is perhaps not explicitly, but _implicitly_ about object oriented code?


Well, I would say that goes under the second point made in the article. It doesn't mention scopes directly though.


While I certainly think that naming is important, I don't quite agree with all the improvements suggested here:

> // Bad:

> List<DateTime> holidayDateList;

> Map<Employee, Role> employeeRoleHashMap;

> // Better:

> List<DateTime> holidays;

> Map<Employee, Role> employeeRoles;

I'm not sure this is better. Yes, it's obvious that both are collections, but it's not obvious what you can do with them. Is `holidays` a map of date to information about the holiday? Is `employeeRoles` a set of all possible roles?

And once you remember that `employeeRoles` is a map, you're still a little lost about how to look into it. Is the key the employee id? Email?

Perhaps it could be called employeeToRoleMap? That distinguishes it from employeeIdToRoleMap, or possibleEmployeeRoles.

Maybe this is me disagreeing with a later statement by the author:

> Some people tend to cram everything they know about something into its name. Remember, the name is an identifier: it points you to where it’s defined. It’s not an exhaustive catalog of everything the reader could want to know about the object. The definition does that. The name just gets them there.

I don't know if I want to know everything, but I think I lean towards more than the author does. A variable name I changed today was originally called `initial-points`. It held the intermediate points along the direct line from a source to a destination, which would later be offset randomly (example: https://imgur.com/a/UxkGPfX).

While refactoring related code, I realized that although it was descriptive, it was descriptive _temporally_, rather than describing the identity of the points. I ended up changing it to `points-in-direct-line`. It still feels subpar to me, but at least it's clearer: the points exist not because they'll be changed later (initial-points), but as a direct line between the source and destination.
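A rough Python sketch of the situation described, assuming (hypothetically) evenly spaced points and a uniform random offset; the original code is Clojure and its details aren't shown. The first function's name says what the points are (on the direct line), not when they are used:

```python
import random

Point = tuple[float, float]


def points_in_direct_line(source: Point, destination: Point, count: int) -> list[Point]:
    # Evenly spaced intermediate points on the straight segment, excluding the endpoints.
    (x0, y0), (x1, y1) = source, destination
    steps = count + 1
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(1, steps)
    ]


def offset_randomly(points: list[Point], max_offset: float, rng: random.Random) -> list[Point]:
    # Displace each point independently by up to max_offset on each axis.
    return [
        (x + rng.uniform(-max_offset, max_offset), y + rng.uniform(-max_offset, max_offset))
        for x, y in points
    ]


line = points_in_direct_line((0.0, 0.0), (4.0, 0.0), count=3)
wobbly_path = offset_randomly(line, max_offset=0.5, rng=random.Random(42))
```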


In a strongly typed language, you shouldn't encode type information in the name, it's statically derivable. Holidays isn't a map, it's a list. The name doesn't need to tell you that, you already know and the language will prevent you from misusing it.

With employees, you might be right. The comments on the original article give two suggestions:

1. `employeesById`. This may imply a mapping type, but you aren't stuttering (saying map twice in the decl) at least.

2. Create an employee ID type and make the type declaration Map<EmployeeId, Employee>. (Your ID type might just subclass int.) This way the semantic information is encoded into the type, and the type system prevents you from misusing things. For example, you'd need to explicitly cast from int to employee ID.
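In Python, one lightweight way to get suggestion 2 is `typing.NewType`, sketched below with a hypothetical `Employee` record (subclassing `int`, as suggested above, would also work). One caveat: `NewType` is enforced only by a static type checker such as mypy; at runtime `EmployeeId(1)` is just the int `1`.

```python
from dataclasses import dataclass
from typing import NewType

# A distinct ID type: a static checker rejects a plain int where an EmployeeId is expected.
EmployeeId = NewType("EmployeeId", int)


@dataclass
class Employee:
    employee_id: EmployeeId
    name: str


def index_by_id(employees: list[Employee]) -> dict[EmployeeId, Employee]:
    # The key type, not only the variable name, records what the dict is keyed by.
    return {employee.employee_id: employee for employee in employees}


employees_by_id = index_by_id([Employee(EmployeeId(1), "Alice"), Employee(EmployeeId(2), "Bob")])
print(employees_by_id[EmployeeId(2)].name)
```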


> In a strongly typed language, you shouldn't encode type information in the name, it's statically derivable.

This is completely true!

> The name doesn't need to tell you that, you already know and the language will prevent you from misusing it.

I think this is where we differ! I prefer to know what I can do with something without having to ask a compiler or IDE. This helps, for example, when looking at a pull request -- you don't have the code easily accessible for the compiler or IDE.

Possibly a difference is that I prefer to use languages that are more dynamically typed -- Clojure, Emacs Lisp, etc. And so you don't have `Map<EmployeeId, Employee> employeesById`; you only have the variable name.

But I do wonder: even in the most explicitly typed, statically typed languages, what is the user experience for finding this out? It seems that unless one is looking at the declaration, there must be at least one level of indirection to find out what the type of something is.

Say you're looking at a use of the variable `employees`. Here's the ways I can think of that would let you know what the type is:

1. Scan upwards until you find the declaration, if it even is on screen. Then look back to where the use was.

2. Move the cursor to the variable, and use the "go to definition" functionality built into the IDE. Then look at the declaration, and use the "go back" IDE function.

3. Move the cursor to the variable, and the IDE has somewhere that tells you the type. This is relatively simple, but still requires you to move to the variable, and to look somewhere else and back.

On the other hand, with a name like `employeesById`, all you need is in those thirteen characters.


> I think this is where we differ! I prefer to know what I can do with something without having to ask a compiler or IDE. This helps, for example, when looking at a pull request -- you don't have the code easily accessible for the compiler or IDE.

In the spirit of "variables won't and constants aren't", I would propose the less pithy: "any invariant that is not enforced is broken". The compiler doesn't check that the capabilities or roles advertised by your variable names are actually present, so eventually they won't be. That means you can't just check the variable name when reviewing code, but must check the declaration (and possibly elsewhere) in case the variable name lies; and once you're checking that, what have you gained in reviewability?


Nothing can prevent all errors. Having explicit typing does not prevent `List<Users> bankAccount`, but almost all programmers prefer variable names that encode things not in the data types.

So I disagree that you must check that `employeesById` or similar are still a map type if a PR uses it. If it wasn't a map type, I would have expected that to be flagged in the PR that introduced the variable.


> I think this is where we differ! I prefer to know what I can do with something without having to ask a compiler or IDE. This helps, for example, when looking at a pull request -- you don't have the code easily accessible for the compiler or IDE.

Same (I mostly use Python, which puts your dynamic languages to shame), so structure your code such that this is possible.

Keep your functions relatively short. Then the declaration of a type, if it's a local or an argument, will be close by (usually on the same screen).

Globals should be defined at the top, so those are easy too. That leaves only class members that might require spelunking. Be judicious with those, they're often more pain than is worth it.

As for your point that employeesById not being a map should be called out in review: that's true. But over time those things can change unless enforced. You're not just relying on the initial review, but on every follow-up change not breaking the invariants.


It depends on who the readability is for, if it is outside the IDE like on GitHub you might have to explore a project to find that information.


Mostly agree, but I still think we should be using shorter variable names and put all that information and context elsewhere.

Opinion: It's (nigh) 2020. If you aren't using an editor which supports inspection and jump-to-declaration, you are doing it wrong. Similarly, if you are using a dynamic language and not leveraging type hinting or similar, you are doing it wrong. Where "doing it wrong" == doing a disservice to yourself, your coworkers, and future you.

We have so much powerful metadata available right now. So with the examples above, you should be able to focus on `employeeRoles`, hit `hotkey`, and see a gist of the types, key format, what to do with that thing, and why it's there.

Names should be unique and memorable enough that once you form that mental schema, you can quickly scan through code and understand its role in the greater machinations, without too much mental re-parsing.


The benefit of naming things to explain them is to obviate the need for jump-to-declaration. Yes, any time I need to know what 5*6 is, I can walk into the other room and get my calculator -- but if I have that knowledge in my head, I don't need to go look it up.

I think it's relatively agreed-upon that single-character variable names are bad, because they don't tell us enough about what the variable holds. I feel the same way about the type, when it's not obvious from context. So `age` would be clear enough to tell you that it's numeric[1]. But what about the other example of `employees`? I would not know what I can do with that without having to jump back to the definition! Naming it `employeesById` tells you what you can do with it. And that seems like a win to me.

[1] It might not differentiate between integers and floats, but that feels too nitpicky to me to put in the name.


as someone who works on a large c++ project, I disagree, mainly just because it can take a long time for intellisense to actually jump me to the definition. there are also many ways intellisense can get confused and present me with a page of 100 similar function definitions that I need to scroll through. few things break my concentration like waiting 10+ seconds for intellisense to decide whether it's actually going to be able to jump me to the definition.

imo, functions and variables should be named in such a way that the reader can get a pretty good idea of what's going on without jumping into a bunch of other scopes. try not to make them too long, but err on the side of too long rather than too short.


Map<Employee, Role> employeeToRole?


I often use the name pattern `rolesByEmployee` in that situation.


Personally, working in a variety of python and javascript shops, I have seen problematically-short variable names way more often than problematically-long variable names, probably on the order of 10:1.


I've been burned far too many times by ambiguous names. The edict on hashmaps in here is one that has gotten me more than once, so I always name them in "ValuesByKey" style, like "AccountsByID".

It's a risk/reward thing. Overlong names are annoying. Brief names are destructive. So I always err on the side of length.


This is one of the pros of using a strictly typed language: there, errors like these can only go unnoticed if the other object has the same type. And in languages like Rust, with a concept of ownership, even those might be caught at compile time.

Naming things is hard. But it is a lot easier if you can rely on it not breaking things all the time.


Hashtables full of record-like objects are the usual problem. I often find myself throwing records into a hashtable and keying it by one of their properties; that key is usually a simple primitive, and a record may have several properties of the same type.

If I've a record with a half-dozen int properties that are semantically IDs, I have to clarify which is the key for this hashtable.
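A minimal Python sketch of that situation, with a hypothetical `Account` record that has several integer ID fields: the generic helper `index_by` (also hypothetical) does the keying, and the `ValuesByKey`-style variable name is what tells the reader which ID is the key.

```python
from collections.abc import Callable, Hashable, Iterable
from dataclasses import dataclass
from typing import TypeVar

T = TypeVar("T")
K = TypeVar("K", bound=Hashable)


@dataclass
class Account:
    account_id: int
    owner_id: int
    branch_id: int


def index_by(records: Iterable[T], key: Callable[[T], K]) -> dict[K, T]:
    # Later records win if two share a key; use a dict of lists if keys can repeat.
    return {key(record): record for record in records}


accounts = [Account(10, 1, 7), Account(11, 2, 7)]
# All three fields are ints, so the name is the only thing saying which one is the key.
accounts_by_owner_id = index_by(accounts, lambda account: account.owner_id)
accounts_by_account_id = index_by(accounts, lambda account: account.account_id)
```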


This is a symptom of OO languages. You don’t get nearly as much word soup in good functional languages.

People really diss Erlang for its syntax but I find it to be extremely readable and understandable in most cases because pattern matching is a first class citizen in the language. Humans are excellent at recognizing patterns.

Same goes for the ML family.


Interesting observation. Do you have some real-world FP codebase examples where one can see how they avoid word soup (without sacrificing understandability)?


FWIW, and definitely somewhat off-topic: when they say code is reviewed "twice", they don't necessarily mean it literally. When someone gets "readability" in a language, they can approve changes in that language for readability; if neither the reviewer nor the person writing the code has readability in a given language yet, they need an additional approval from someone who does. The idea is that someone with good knowledge of the style and conventions for a given language has reviewed it.

More information is available online if you're curious. A cursory DuckDuckGo search leads me here: https://www.pullrequest.com/blog/google-code-review-readabil...


I personally rather like the idea of making type names long but variable names short.

Instead of

    DockableModelessWindow window;
I might write

    DockableModelessWindow dmw;
if this is a local variable whose type is declared nearby. Using acronyms also helps people remember its type.

For global variables, though, it would not be appropriate.


If abbreviating, why 'dmw' and not 'win'? Is the fact that it's dockable and modeless really important enough to use two-thirds of the variable name on a cryptic abbreviation? (And if it is, why not call it by what it is rather than how it's shaped, like 'toolbar' or 'tbar'?)


I agree (at least as far as function/block local variables are concerned), except that I'd probably just name it w. If it's not obvious from context that w stands for window, then the function probably needs to be refactored anyway. That said, dmw is fine if the function needs to deal with more than one type of window.


That's why I like Prolog:

  factorial(0,F,F). 
  factorial(N,A,F) :-  
      N > 0, 
      A1 is N*A, 
      N1 is N - 1, 
      factorial(N1,A1,F).
Because 90% of variables in 90% of the Prolog code I've ever read * are named A, B, C, D, H, I, J, K, L, M, N, P, Q, R, S, T, U, V, W, X, Y or Z. Or with an "s" (as in Xs, Ys, Zs...) to denote a list. Or a number to denote a new version (where you'd like to put a "'" but syntax doesn't let you).

Theeere you are. No need to think hard.

:dusts hands:

(Tail-recursive factorial predicate from here: https://www.cpp.edu/~jrfisher/www/prolog_tutorial/2_2.html)

______________

* Including my own code that I've written before reading it.
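For readers who don't read Prolog, the accumulator-passing factorial above corresponds roughly to the following Python sketch. CPython does not eliminate tail calls, so the tail recursion becomes a loop, and raising on a negative argument stands in for the Prolog predicate simply failing. The single-letter names are kept on purpose to mirror the Prolog.

```python
def factorial(n: int, a: int = 1) -> int:
    # Mirrors factorial(N, A, F): `a` accumulates the product, and the result
    # is the accumulator once n reaches 0 (the base clause factorial(0, F, F)).
    if n < 0:
        raise ValueError("factorial is undefined for negative n")
    while n > 0:
        n, a = n - 1, n * a  # N1 is N - 1, A1 is N * A
    return a
```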


https://talks.golang.org/2014/names.slide

Very cool presentation by Andrew Gerrand, with the following rule of thumb:

The greater the distance between a name's declaration and its uses, the longer the name should be.
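A small Python illustration of that rule of thumb, with hypothetical names: a module-level constant that may be used far from its definition gets a descriptive name, while a loop variable used only on the next line can be a single letter.

```python
# Read from anywhere in the module (or imported elsewhere): spell it out.
DEFAULT_REQUEST_TIMEOUT_SECONDS = 30


def total_seconds(durations: list[int]) -> int:
    # `t` and `d` are declared and used within a few lines, so short names are clear enough.
    t = 0
    for d in durations:
        t += d
    return t


def timeout_budget(retries: int) -> int:
    # Far from its declaration, the long name still explains itself.
    return total_seconds([DEFAULT_REQUEST_TIMEOUT_SECONDS] * (retries + 1))
```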


Funny that there are so many typos in a treatise about short, precise names.


> We have rightfully abandoned Hungarian notation. Let it go.

Earth to FreeRTOS ...


I had a look at FreeRTOS for a project, and liked the architecture, with support for concurrent tasks and all the tooling included in Atmel Studio... I would not let something as superficial as naming conventions influence my tooling choice, but I was certainly left wondering who in the world would opt for such awkward and (almost objectively) ugly names for their RTOS project, and why.

It seems as if they tried hard to combine the worst things from all worlds: mix ALLCAPS with lowercase, add a pinch of CamelCase mingled with some snake_case, and top it with a confusing blend of semantic and syntactic prefix notation.

Seriously, wtf were they thinking?

EDIT: Adding the first example that I could find, which already shows all of the mentioned "features" (scroll down for the code): https://www.freertos.org/Hardware-independent-RTOS-example.h...


Well, Google must've changed course, seeing as Go made keyboard farts fashionable again. I guess that's where Go's C heritage comes in.



