Programming: Choosing the right name is everything (sites.google.com)
84 points by clawrencewenham on Sept 16, 2009 | 58 comments

  "There are only two hard problems in Computer Science:
   cache invalidation and naming things."

   -- Phil Karlton
The article is absolutely touching a worthwhile subject but imho an important tip is missing: Don't be afraid of renaming things later. Modern editors make mass-renaming easy and it's almost impossible to get everything right from the start.

"Modern editors make mass-renaming easy"

I'm in Java-land where we have, bar none, the best refactoring support in the world.

Unless you consider the settings files, documentation, publicly exposed APIs, existing customer installations, projects dependent on the current project, web services, Apache configurations, and XML soup.

But yeah, other than that, renaming something is a simple matter of alt-shift-R and typing the new name.

- Documentation: Is yours not auto-generated? We document our code pretty extensively here and have never had a renaming problem. Use the right tools and code flux won't ever bother you.

- Exposed APIs: I don't think the OP was talking about public APIs, which are admittedly much harder to change after the fact. There is plenty of code that is still easy to refactor underneath. That's why we have APIs, right?

- Existing installations: Barring API changes, what changes internally would affect your customers?

Renaming stuff is still really simple.

Is Java renaming better than Objective C renaming? I would imagine few people can answer that question with any authority.

Java/Mac programmers out there?

The expensive IntelliJ editor makes refactoring a non-event.

Distributed algorithms that have to tolerate random machine failure and network partitions are incomparably harder than either of those.

I'm pretty sure that quote is a bit old.

It's definitely true that one of the largest problems facing computer science today is how to parallelize across cores/systems/networks. This problem may only be particularly hard because we're still used to thinking in terms of discrete processors each doing discrete tasks in sequence.

It's also somewhat tongue in cheek.

You're right, I think I'll put that in.

I learned about 7 years ago that "choosing the right name" for things was slowing me down. I was breaking flow every 2 minutes or so to come up with function names. It was like an unintentional form of procrastination.

It was a hard habit to break, but now I just write a first draft, and go fix the names afterwards if I have to.

I disagree. I figure if I can't come up with a meaningful and reasonable name, then my understanding of the problem is incomplete. That is, if I've never named something before, that's because I've never had to think about it explicitly before. I'd rather sit and make sure my understanding is correct than plow on through.

My understanding of the problem is always incomplete when I start writing, even if my lizard brain is telling me the opposite. So why waste time?

When my understanding of the problem is incomplete, I like to go through and just write functions and data structures first. It builds the high-level structure of the program before I get bogged down by details, and it's easier to change the way it's structured when you only have function definitions than when you have half the functions written and realize you should have organized your data structures differently. When I write like this, function names are pretty obvious because each function is named after exactly what it's supposed to do.

If you think about it, this is basically outlining. You know, that thing your English teacher always bugged you to do because it would make your essays more coherent. If one insists on writing code before understanding the problem, this is probably one of the better ways to do it. Recognizing that function names are obviously correct because you know exactly what their purpose is in relation to the whole is a good sign.

When I first started programming I wrote a lot of code without much planning, and it was good because it allowed me to learn a language faster by interacting with it. When what should have been pretty straightforward programs started to get unwieldy, I realized that I had to do something different. Now I go at it with pencil and paper before writing any code, and I find it much more productive and fun because the level of understanding is better.

It's so easy to write code and it seems like it's so easy to change, but with any non-trivial problem I think that this is really an illusion. There are many times when it is useful to write exploratory code without much planning, but I think it's mostly useful for learning a new environment or banging out a minor variation of something you've done before. Otherwise it seems more useful to do some fairly systematic planning, even if it's for a prototype that may get completely reworked over a number of iterations where requirements are explored and may change.

I think you're right that it's basically outlining.

As a practice, it started for me when I first started working on writing some literate code. I would basically write a paragraph about what the function should do, then write up the function definition, then move on to the next program.

It has definitely made my code more coherent and feels like a necessity for anything more than a few hundred lines.

Lots of stuff I write I still just dive in and code because it's "obviously" simple. Sometimes that simple problem grows into a mess of 1k+ LoC, sometimes not.

Because I want to make sure I understand the implications of what I've written so far.

I would agree, and add to this comment. If I can't figure out what the sensible name is then what exactly is it that I am writing? I have found that writing the wrong name even for a local private variable will often get me at least a little confused about what I am doing because it doesn't read correctly.

I'm someone else who renames things after writing the code to start with. It's not that my understanding is incomplete - it's more like I'm using different parts of my brain for holding the general shape of the code and coming up with words to explain it.

If I'm trying to get the shape of the code down before I lose it, then I'm thinking about what it does and how it does it. When I'm naming variables etc I'm thinking about explaining what it does in a way that's clear to me now, me later, and whoever else has to understand / maintain my code.

Yeah, that's what I was trying to say.

This is a common problem that I have heard many folks describe with tools like PowerPoint and Word Processing software. People tend to putz around with formatting and layout and get distracted from the actual content.

This even goes back to tools as early as the hypertext tool Notecards. I remember they had a qualitative study where they found that people got really sidetracked trying to think of the name for a Notecard. I think the title of the paper with the study was "Reflections on NoteCards: seven issues for the next generation of hypermedia systems". I can't seem to find an electronic copy online.

Anyway, I have found a similar thing with the naming in my own programming. It is very hard to break habits with stuff like this. For example, for writing tasks, I finally had to switch to something like Notepad to write the first draft so I wouldn't be tempted to muck with formatting, spell check, grammar check, etc.

Totally agree with this point. I understand the importance of naming and so I try to put some thought into it, only to realize I’ve lost my focus on the initial problem. Now I get a first draft down and then go through and rename methods & variables appropriately. I usually end up breaking apart classes and methods on my revision.

Not to detract from the actual point of the article, which is extremely good, but this gem popped out at me:

"Creating a root namespace named after your corporation will haunt you. Companies and brands get renamed, acquired [...] Oracle's Vending Machine division was once Sun Microsystems"

CamelCase in the DataBase? Insane in the brain!

Pushing type information and other metadata into names is bad software engineering. Names should be from the problem domain and not co-mingled with implementation concerns.

The database section is so wrong!

(1) "driverLicenses": it's a "driver's license", you don't want to put an apostrophe in the table name, so go with driversLicenses

(2) SQL's not real good with case-sensitivity, use underbars to separate words in SQL, so DRIVERS_LICENSES

(3) Since you're going to use the table name in your query anyway, there's no reason to repeat it in the column names; naming your tables and columns consistently makes maintenance and automation easier: ID, not driverLicense_id. If you really, really want to push metadata in there, do it as a rename so you don't step on everyone else's toes: SELECT id as myReallyLongAndUnnecessaryTablePrefixedIdentity_id…

(4) Pluralization of tables stalls your mental cache especially in combination with the table-name-in-the-column-name idea: SELECT driverLicense_id from driverLicenses, SELECT medium_id from Media? Some nouns have multiple valid plurals (do I select from the Persons or People table to find a Person?), some nouns are already collective, etc.

(5) Prefix views with "v_": uh yeah, because it's much easier to rewrite queries and mappings if a view is replaced with a table than it is to type, say in postgres, "\dv".

(6) "Use a postfix to show the kind of key": uh no, use the database's type system to show the kind of key through the use of types, domains and check constraints.
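To make point (6) concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the license_number column, and the eight-character rule are all made up for illustration, and SQLite has no CREATE DOMAIN, so only a CHECK constraint is shown; a database with domains could express the same rule once and reuse it.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema enforces the key format."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE drivers_licenses (
            id INTEGER PRIMARY KEY,
            -- Hypothetical rule: a license number is exactly 8 characters.
            license_number TEXT NOT NULL UNIQUE
                CHECK (length(license_number) = 8)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, license_number: str) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO drivers_licenses (license_number) VALUES (?)",
            (license_number,),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The column is still just called license_number; what kind of key it is lives in the schema (UNIQUE plus CHECK) rather than in a postfix on the name.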


_skidmarks for private fields? uh, if that's what your language does, fine, otherwise how about using the language's built-in visibility keywords?

StudlyCaps for method names and class properties: better to stick to the language's style guides, teaching a junior programmer to write [myDocument SendToHNews: dvmby_timeStamp] or myDocument.SendToHNews(dvmby_timeStamp) is giving them some unlearnin' to do at their next job.

"Don't make them singular because you have an ancient ORM": this is a straw-man argument that applies equally to all his naming advice as well. If you have a good ORM you can call tables "quetzalcoatl" and remap to "MesoamericanDeity". Use appropriate names in the schema and object model and let the ORM sort out the differences.

"Oracle's Vending Machine division was once Sun Microsystems" that's funny!

1) Fair enough, up to the designer's taste

2) Do you mean "not real good with case-sensitivity" or just "case insensitive"? If the latter, and the database still preserves the case, then the underscore on an all-caps name is even uglier.

3) driversLicenses.id is not the same thing as vehicles.id. The point is to name the column what it is so the meaning is preserved everywhere. If you don't use aliases then you can't distinguish them in the result set. If you do use aliases then an app that reads and updates has to now remember that the alias isn't its real name.

4) I haven't noticed the mental cache problem myself, sorry. But a table called "vehicle" isn't a vehicle. I don't believe the issues with "person" vs. "persons" or even "sheep" vs. "sheeps" are worth the mental cache problem of remembering if you're dealing with a row or the entire table in the context of the application.

5) Conceded

6) The point of the article is about using naming to improve clarity, including where metadata like domains and check constraints aren't visible, such as printouts. The postfixes don't even encode type, but information about how the value is defined. A books table with a "ddc" column isn't hurt by adding "_class", but you're also communicating that the value comes from an international standard classification system (Dewey Decimal). I don't know of a database system that can encode this meaning in types, domains or check constraints.

On the critique of _skidmarks and StudlyCaps, I guess I should have emphasized the concept of coding to what the toolchain's and environment's conventions suggest, such as putting "When In Rome, Always Do As Romans Do" in boldface at the top of the sidebar. Like I had.

(1) is a ticky-tack foul, just pointed it out because "Choosing the right name is everything" is pretty hyperbolic and I was being hyperbolic, too, with the "insane" bit.

(2) I'm biased toward _-naming in general, I think it's less ugly than camelCase. But my point (2) is just from experience: developing on MySQL on a Mac (case-insensitive filesystem, case-insensitive naming by default) and Linux (case-sensitive) and porting from MySQL to Oracle. Under-barring is a least-common-denominator approach that's likely to save some time and frustration.

(3) You'll have name-collided columns in the result set regardless of what they represent. PERSON and PET may both have a nickname column, when you join pets and their owners, you're going to need to qualify or alias nickname.

(4) You're right, a table called "vehicle" isn't a vehicle. But I don't think of it as a list of vehicles either. I think of the table definition as being an SQL approximation of the definition of "vehicle-ness": each tuple is a vehicle and a select from the table will get you a list of vehicles, but the definition is "what it means to be a vehicle", approximated in SQL as "CREATE TABLE VEHICLE (...)". This helps to maintain the "relational mindset" and avoid slipping into the VisiCalc mindset when dealing with relationships between relations.

(6) You're right on about the importance of clarity, but I think clarity improves with concision. Qualified names are clearer in this respect than names overloaded with metadata. I realize that a lot of databases don't have good support for defining types and domains (postgresql seems to, though), but when there are ways to encode the metadata you want without polluting the name space, use them.

You're right on about the prominence of "When In Rome…", sorry, it didn't even reach my consciousness.

4) I think of the table name as describing the tuples it contains, not the table itself. SQL reads better that way.

Regarding (1): Pull out the card and look at it. It says "driver license".

Mine says "Driver's License"

"If your table is called "driverLicenses" and needs an ID column for its primary key, then call it "driverLicense_id" instead of just "id". This will make the origin of the column clearer in result sets"

I really would rather type in "id" when I'm writing a SQL query against my data than "driverLicense_id".

Naming isn't about what you'd rather type now, but what you'd rather read later.

When you have to join two tables, both with an unrelated field called "id", you'll regret that.

Or when you have to join a foreign key, and you end up with two fields both called "id" in the same table - and don't ever give a foreign key a different name in the parent and child tables.

And even if you say: I'll just prefix the column name with the table name in the query, remember that when your app gets the field, it'll just be called "id", and you'll be tempted to alias it just so you know which id field it is, at which point you'll realize that you should have called it something else from the start.

When you join those tables, you won't regret it. You may have to think about it for a moment; if you need the ids in the join they're there: SELECT A.id FROM A JOIN B ON A.project = B.project; this is concise and makes sense.

When you do a self-join, the table-name-prefixed-column-names strategy doesn't buy you anything anyway, you'll have to explicitly qualify or rename the column(s) regardless.

Don't name foreign keys the same as the referenced column, name them to make the represented relationship clear, if you have a parent-child relationship, name the column "parent" (or if you must, "parent_id"; redundant if every table has a synthetic primary key named "id").

And when you do: SELECT A.id, B.id FROM A JOIN B ON A.project = B.project

How are you going to know which id field your app has?

And there are a lot more relationships than just parent/child. You often have a table with many child keys, each to a different attribute.

I should tell you, I used to do like you - I would name all the primary keys ID, and the foreign keys tablename_id.

I've learned it's not actually a good way to structure a database. It's much, much clearer if parent/child columns have the same name. You'll see it once you start making complicated databases.

"How are you going to know which id field your app has?"

Depends on how your interface to the data layer behaves. If you're going right into an associative array, you may get "A.id" and "B.id" for free or you may have to explicitly rename A.id and B.id, e.g., "SELECT A.id as A_id...". That's not a consequence of using "id" as a column name; you'd have the same problem with any other name collision in a join unless you named every column databaseName_schemaName_tableName_columnName, and you'd still have the problem in a self-join.
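For anyone who wants to see the collision both sides are arguing about, here is a small sketch with Python's sqlite3 module. The tables a and b, the project column, and the sample rows are invented; other drivers and ORMs may label columns differently, so treat this as one data layer's behaviour, not a general rule.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Two hypothetical tables that both use a bare "id" primary key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER PRIMARY KEY, project TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, project TEXT);
        INSERT INTO a VALUES (1, 'apollo');
        INSERT INTO b VALUES (7, 'apollo');
        """
    )
    return conn


def fetch_as_dicts(conn: sqlite3.Connection, sql: str):
    """Run a query and return (column_names, rows_as_dicts)."""
    cur = conn.execute(sql)
    names = [col[0] for col in cur.description]
    # dict(zip(...)) keeps only the last value for a repeated column name,
    # which is how the collision silently loses data in an associative array.
    rows = [dict(zip(names, row)) for row in cur.fetchall()]
    return names, rows


UNALIASED = "SELECT a.id, b.id FROM a JOIN b ON a.project = b.project"
ALIASED = (
    "SELECT a.id AS a_id, b.id AS b_id "
    "FROM a JOIN b ON a.project = b.project"
)
```

Running fetch_as_dicts with UNALIASED leaves a single "id" key per row, while ALIASED keeps both values. SQLite's documentation also notes that result column names are only guaranteed when an AS clause is present, which is another reason to alias in joins whatever the columns are called.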

"name all ... the foreign keys tablename_id."

Sorry, I wasn't clear, I'm saying don't do that! Use a name that describes the relationship between the entity in the referencing relation and the entity in the referenced relation!

In the parent-child case, I meant the parent column as a foreign key back to PERSON stored in a column called "parent", like CREATE TABLE PERSON (id int primary key.... parent int foreign key references PERSON(id)...)

Using tablename_id is only going to get you one relationship per ordered pair of tables, so it's a non-starter; you may want to have parent and spouse relationships between PERSON and PERSON for instance, and naming them both PERSON_ID... well you don't want to marry your mother :)
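A runnable sketch of the PERSON idea, again with Python's sqlite3 module. The name column and the three sample people are assumptions added for the demo, and the column-level REFERENCES form is used because that is the syntax SQLite accepts.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """One table, two differently named foreign keys back to itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript(
        """
        CREATE TABLE person (
            id     INTEGER PRIMARY KEY,
            name   TEXT NOT NULL,
            parent INTEGER REFERENCES person(id),
            spouse INTEGER REFERENCES person(id)
        );
        INSERT INTO person VALUES (1, 'Ada',   NULL, NULL);
        INSERT INTO person VALUES (2, 'Brook', 1,    NULL);
        INSERT INTO person VALUES (3, 'Casey', NULL, 2);
        """
    )
    return conn


# The foreign key name says which relationship is being followed.
PARENTS = """
    SELECT child.name AS child_name, parent.name AS parent_name
    FROM person AS child
    JOIN person AS parent ON child.parent = parent.id
"""

SPOUSES = """
    SELECT p.name AS person_name, s.name AS spouse_name
    FROM person AS p
    JOIN person AS s ON p.spouse = s.id
"""
```

Both queries are self-joins, so the table has to be aliased and the output columns renamed no matter how the primary key is named; the relationship-named columns are what keep "parent" and "spouse" from being confused.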

Hungarian Notation? Gaah! If I ever have to read a mluspidBlah = ConvertToSomething( mlusptBleah ) again I'll barf. That's the kind of stuff that keeps me up at night.

Do you mean Hungarian notation to track types or do you mean Hungarian notation to track semantics?

If you mean tracking types, I agree. If you mean semantics, I am nowhere near certainty about it being either good or bad. I can certainly see the use of using sInput, uCheckLimit(sInput) and uOutput. It is usually just a single character, and thus, not reading it does not require much effort, so I don't think it is diabolically evil.

I agree that systems Hungarian (tracking types) is a lose. Apps Hungarian (tracking semantics) is a win in languages like Python, but the need for it is a symptom of a wimpy or non-existent static type system. If you can encode a semantic property into a variable name, then you also ought to be able to reify that property into a type and get the type checker to catch your mistakes. Here's the most impressive example of this I've ever seen: http://www.cis.upenn.edu/~stevez/papers/LZ06a.pdf

Yes, I very much agree with this. My major problem with type systems is, though, that I have yet to see a static type system which does not get in the way all the time. Haskell is getting close, and I am sure that Haskell will reach the point of allowing pretty much sane python in haskell, which will be impressive and interesting :)

Another place where apps Hungarian notation saves me pretty often is in C, with multidimensional arrays and tracking the meaning of each dimension in the array. "rcFoo" (which encodes that it is rcFoo[row][column]), combined with rIndex or cIndex (resulting in rcFoo[rIndex][cIndex]), drastically reduced the number of transposition errors I made, because the order of the indices is just in plain sight. :) Again, of course, you are right that the right way would be to declare this as Foo :: Array RowIndex * ColumnIndex -> Value.
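A rough Python sketch of what "reify it into a type" could look like for the row/column case. RowIndex, ColIndex, and Grid are hypothetical names invented here, and since Python has no static checker built in, the swap is caught at run time with isinstance rather than at compile time as it would be in Haskell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowIndex:
    value: int


@dataclass(frozen=True)
class ColIndex:
    value: int


class Grid:
    """A fixed-size 2D grid that only accepts (RowIndex, ColIndex) lookups."""

    def __init__(self, rows: int, cols: int, fill=0):
        self._cells = [[fill] * cols for _ in range(rows)]

    @staticmethod
    def _check(r, c) -> None:
        # Reject transposed or untyped indices instead of silently misreading.
        if not isinstance(r, RowIndex) or not isinstance(c, ColIndex):
            raise TypeError("expected (RowIndex, ColIndex)")

    def get(self, r: RowIndex, c: ColIndex):
        self._check(r, c)
        return self._cells[r.value][c.value]

    def set(self, r: RowIndex, c: ColIndex, value) -> None:
        self._check(r, c)
        self._cells[r.value][c.value] = value
```

With this, grid.get(ColIndex(2), RowIndex(1)) raises a TypeError, which is the same mistake the rc prefix is meant to make visible to the eye.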

I rarely use it myself, but it's been useful in the few cases where I wanted a shorthand to note something about how the data should be used.

And I, too, think it's counterproductive when it gets overdone to your example's extent.

I very much agree with the author's "promotes insight" assertion. Accurately naming at least functions and classes increases the value of your conceptualization of a problem. I have even taken to writing a non-programming description of my problem first, perfecting that, and then programming it.

The only useful naming convention was described by Bjarne Stroustrup in his C++ book: "capitalize type names, don't capitalize anything else".

If you want to prefix your private members with "_", you are using the wrong IDE (or the wrong language, like here: http://news.ycombinator.com/item?id=826119). If you want to prefix sanitized variable names with "s", you should find a better way of sanitizing them and read less of Joel. If you want to name a hard-working method something like "DistinctRequests", remember that you are "hiding behind your names" and contradicting yourself at once.

Your second point makes sense only if everyone who reads your code only does so in your preferred IDE. That kind of tool dependence is unhealthy.

For myself, I rather like things like the prepended underscore (or m_*, the microsoft convention for object members) because it makes up for a shortcoming of C++ and its derivatives: there are multiple non-local scopes for symbols, and it's not always obvious which one you mean. Is that unbound variable a class member? static class member? top-level member of a namespace? global? Having prefixes that make this obvious can be really helpful.

Now, that's secondary to the point of whether this scoping rule is a good idea. I'm generally of the opinion that it's not, and that languages (e.g. python) which force object scoping to be explicit are more readable. But that's just because "self.field" is 1:1 with the "_field" convention from C++.

Sometimes issues like this are less about the language than the code conventions for a company/project, though. If (for example) prefacing variables with a _ is a short convention for saying, "While the language we're using doesn't enforce private variables, we only expect this to be used internally, and will feel free to make major changes to it. Consider yourself warned.", that's still highly useful.

Of course, if you have such conventions, be consistent.

Everybody Always Has The Wrong Language (tm)

I believe a lot of this is just an attempt to make up for what the language lacks, in this case it lacks stronger scoping and typing systems. It would be a bit of a pain to create a SanitizedString type for a sanitized string that is only used in two functions, but this is a good compromise between speed and efficiency. It may not always be the right trade off, but that's how trade offs work.

Granted, the SanitizedString type is checked by the compiler, while the s* variable name is checked by the programmer, each having its own problems.
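To show roughly how small a SanitizedString type can be, here is a Python sketch. The class, from_untrusted, and render_greeting are hypothetical names, "sanitized" is assumed to mean HTML-escaped via the standard library's html.escape, and the check happens at run time because Python has no compiler to do it.

```python
import html


class SanitizedString:
    """Wraps text that has already been HTML-escaped."""

    def __init__(self, escaped: str):
        self._escaped = escaped

    @classmethod
    def from_untrusted(cls, raw: str) -> "SanitizedString":
        # The only intended way in: escape first, then wrap.
        return cls(html.escape(raw))

    def __str__(self) -> str:
        return self._escaped


def render_greeting(name: SanitizedString) -> str:
    """Refuse raw strings so unescaped input cannot slip through."""
    if not isinstance(name, SanitizedString):
        raise TypeError("render_greeting needs a SanitizedString")
    return f"<p>Hello, {name}</p>"
```

Nothing stops someone from calling the constructor directly with unescaped text, so this is still partly a convention, just one that the code can check at the function boundary instead of relying on an s prefix.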

I like using _name for private variables, because when I have a constructor that accepts that value, I can do this:

MyClass(string name) { _name = name; }

"The correct naming of things [...] makes you attractive to women."

I'm curious about this kind of writing in a programming related article. Did any women who read this article feel discomfort when they read this sentence?

I am guessing no, but I would like for the women of this community to answer.

The only naming guide you'll ever need:



I'm all in favor of finding good names for things. In fact, I obsess over it. But it's so far from everything that this title is just dumb.

Funny this ends up here. I was asking earlier on twitter today about whether there was a C# convention for naming private variables.

Seems that it is indeed _variablename. While I don't know if I like it, it would seem that it is more common than I thought and is specifically mentioned in this article.

The official .NET class design guidelines only cover public interface: http://msdn.microsoft.com/en-us/library/czefa0ke(VS.71).aspx

_privates is common, but in my experience at the largest of all C# shops, the explicit this.privates is more common. I use Resharper to enforce this style, but then again I use Resharper for every damn feature that wonderful, wonderful, magical tool provides.

Because Perl's OO stuff is screwed up and has no private/public/protected distinction, I used a _ and __ convention for protected and private member variables.



There is also a trend to use variablename_ instead, a subtle change, but feels a bit better.

I disagree with "Avoid discussing hard work". Things should have names indicative of what they do.

Giving a costly operation an attribute-sounding name is going to give some poor future developer an unpleasant surprise.
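A toy illustration of that surprise in Python. RequestLog and the scans counter are made up for the demo; the counter exists only so the hidden cost can be observed.

```python
class RequestLog:
    """Hypothetical log that can report its distinct requests two ways."""

    def __init__(self, requests):
        self._requests = list(requests)
        self.scans = 0  # counts full passes over the log, for illustration only

    @property
    def distinct_requests(self):
        # Reads like a stored attribute, but rescans the whole log on every access.
        self.scans += 1
        return sorted(set(self._requests))

    def compute_distinct_requests(self):
        # Same work, but the verb in the name warns the caller about the cost.
        self.scans += 1
        return sorted(set(self._requests))
```

A caller who writes log.distinct_requests inside a loop pays for a full scan on each iteration without any hint from the name, whereas compute_distinct_requests() at least looks like something worth calling once and storing.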

"you shouldn't even be using that because it means you're in a for loop when you should be using foreach."

I am not sure whence arises this nugget of alleged wisdom...

IMHO, Android has one of the worst mainstream APIs

The statement about Hungarian notation and identifying datatypes in names is wrong.

Simonyi's intention was for the notation to encode semantics, not data-type. Some say it was Petzold who popularized what's now called "Systems Hungarian"
