
Programming: Choosing the right name is everything - clawrencewenham
http://sites.google.com/site/yacoset/Home/naming-tips
======
moe

      "There are only two hard problems in Computer Science:
       cache invalidation and naming things."
    
       -- Phil Karlton
    

The article is absolutely touching a worthwhile subject but imho an important
tip is missing: Don't be afraid of _re_ naming things later. Modern editors
make mass-renaming easy and it's almost impossible to get everything right
from the start.

~~~
patio11
_Modern editors make mass-renaming easy_

I'm in Java-land where we have, bar none, the best refactoring support in the
world.

Unless you consider the settings files, documentation, publicly exposed APIs,
existing customer installations, projects dependent on the current project,
web services, Apache configurations, and XML soup.

But yeah, other than that, renaming something is a simple matter of alt-
shift-R and typing the new name.

~~~
potatolicious
- Documentation: Is yours not auto-generated? We document our code pretty
extensively here and have never had a renaming problem. Use the right tools
and code flux won't ever bother you.

- Exposed APIs: I don't think the OP was talking about public APIs, which are
admittedly much harder to change after the fact. There is _plenty_ of code
that is still easy to refactor underneath. That's why we have APIs, right?

- Existing installations: Barring API changes, what changes internally would
affect your customers?

Renaming stuff is _still_ really simple.

------
tptacek
I learned about 7 years ago that "choosing the right name" for things was
slowing me down. I was breaking flow every 2 minutes or so to come up with
function names. It was like an unintentional form of procrastination.

It was a hard habit to break, but now I just write a first draft, and go fix
the names afterwards if I have to.

~~~
scott_s
I disagree. I figure if I can't come up with a meaningful and reasonable name,
then my understanding of the problem is incomplete. That is, if I've never
named something before, that's because I've never had to think about it
explicitly before. I'd rather sit and make sure my understanding is correct
than plow on through.

~~~
tptacek
My understanding of the problem is always incomplete when I start writing,
even if my lizard brain is telling me the opposite. So why waste time?

~~~
Periodic
When my understanding of the problem is incomplete, I like to go through and
just write functions and data structures first. It builds the high-level
structure of the program before I get bogged down by details, and it's easier
to change the way it's structured when you only have function definitions than
when you have half the functions written and realize you should have organized
your data structures differently. When I write like this function names are
pretty obvious because each function is named after exactly what it's supposed
to do.
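
A minimal sketch of that stubs-first approach, assuming a hypothetical
log-summarizing program. Every name below (Request, Report, parse_request and
so on) is invented for illustration and does not come from the article or the
thread. Only the data structures and signatures exist; the bodies are left
unwritten on purpose.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Request:
    """One parsed log line (hypothetical data structure)."""
    client: str
    path: str


@dataclass
class Report:
    """Aggregated result the program will eventually print."""
    hits_per_path: Dict[str, int] = field(default_factory=dict)


def parse_request(line: str) -> Request:
    """Turn one raw log line into a Request."""
    raise NotImplementedError


def count_hits_per_path(requests: List[Request]) -> Report:
    """Count how many requests hit each path."""
    raise NotImplementedError


def format_report(report: Report) -> str:
    """Render the report as plain text."""
    raise NotImplementedError
```

At this stage, reorganizing the data structures costs a few edited signatures
rather than a rewrite of half-finished function bodies.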

~~~
toadpipe
If you think about it, this is basically outlining. You know, that thing your
English teacher always bugged you to do because it would make your essays more
coherent. If one insists on writing code before understanding the problem,
this is probably one of the better ways to do it. Recognizing that function
names are obviously correct because you know exactly what their purpose is in
relation to the whole is a good sign.

When I first started programming I wrote a lot of code without much planning,
and it was good because it allowed me to learn a language faster by
interacting with it. When what should have been pretty straightforward
programs started to get unwieldy, I realized that I had to do something
different. Now I go at it with pencil and paper before writing any code, and I
find it much more productive and fun because the level of understanding is
better.

It's so easy to write code and it seems like it's so easy to change, but with
any non-trivial problem I think that this is really an illusion. There are
many times when it is useful to write exploratory code without much planning,
but I think it's mostly useful for learning a new environment or banging out a
minor variation of something you've done before. Otherwise it seems more
useful to do some fairly systematic planning, even if it's for a prototype
that may get completely reworked over a number of iterations where
requirements are explored and may change.

~~~
Periodic
I think you're right that it's basically outlining.

As a practice, it started for me when I first started working on writing some
literate code. I would basically write a paragraph about what the function
should do, then write up the function definition, then move on to the next
program.

It has definitely made my code more coherent and feels like a necessity for
anything more than a few hundred lines.

Lots of stuff I write I still just dive in and code because it's "obviously"
simple. Sometimes that simple problem grows into a mess of 1k+ LoC, sometimes
not.

------
tow21
Not to detract from the actual point of the article, which is extremely good,
but this gem popped out at me:

"Creating a root namespace named after your corporation will haunt you.
Companies and brands get renamed, acquired [...] Oracle's Vending Machine
division was once Sun Microsystems"

------
kevbin
CamelCase in the DataBase? Insane in the brain!

Pushing type information and other metadata into names is bad software
engineering. Names should be from the problem domain and not co-mingled with
implementation concerns.

The database section is so wrong!

(1) "driverLicenses": it's a "driver's license", you don't want to put an
apostrophe in the table name, so go with driversLicenses

(2) SQL's not real good with case-sensitivity, use underbars to separate words
in SQL, so DRIVERS_LICENSES

(3) Since you're going to use the table name in your query anyway, there's no
reason to repeat it in the column names; naming your tables and columns
consistently makes maintenance and automation easier: ID not driverLicense_id
If you really, really want to push metadata in there, do it as a rename so you
don't step on everyone else's toes: SELECT id as
myReallyLongAndUnnecessaryTablePrefixedIdentity_id…

(4) Pluralization of tables stalls your mental cache, especially in combination
with the table-name-in-the-column-name idea: SELECT driverLicense_id FROM
driverLicenses, SELECT medium_id FROM Media? Some nouns have multiple
valid plurals (do I select from the Persons or People table to find a
Person?), some nouns are already collective, etc.

(5) Prefix views with "v_": uh yeah, because it's much easier to rewrite
queries and mappings if a view is replaced with a table than it is to type,
say in postgres, "\dv".

(6) "Use a postfix to show the kind of key": uh no, use the database's type
system to show the kind of key through the use of types, domains and check
constraints.

upshot: CREATE TABLE DRIVERS_LICENSE (id ...)

_skidmarks for private fields? uh, if that's what your language does, fine,
otherwise how about using the language's built-in visibility keywords?

StudlyCaps for method names and class properties: better to stick to the
language's style guides, teaching a junior programmer to write [myDocument
SendToHNews: dvmby_timeStamp] or myDocument.SendToHNews(dvmby_timeStamp) is
giving them some unlearnin' to do at their next job.

"Don't make them singular because you have an ancient ORM", this is a straw-
man argument that applies equally to all his naming advice as well: if you
have a good ORM you can call tables "quetzalcoatl" and remap to
"MesoamericanDeity". Use appropriate names in the schema and object model and
let the ORM sort-out the differences.

"Oracle's Vending Machine division was once Sun Microsystems" _that's_ funny!

~~~
clawrencewenham
1) Fair enough, up to the designer's taste

2) Do you mean "not real good with case-sensitivity" or just "case
insensitive"? If the latter, and the database still preserves the case, then the
underscore on an all-caps name is even uglier.

3) driversLicenses.id is not the same thing as vehicles.id. The point is to
name the column what it is so the meaning is preserved everywhere. If you
don't use aliases then you can't distinguish them in the result set. If you do
use aliases then an app that reads and updates has to now remember that the
alias isn't its real name.

4) I haven't noticed the mental cache problem myself, sorry. But a table
called "vehicle" isn't a vehicle. I don't believe the issues with "person" vs.
"persons" or even "sheep" vs. "sheeps" are worth the mental cache problem of
remembering if you're dealing with a row or the entire table in the context of
the application.

5) Conceded

6) The point of the article is about using naming to improve clarity,
including where metadata like domains and check constraints aren't visible,
such as printouts. The postfixes don't even encode type, but information about
how the value is defined. A books table with a "ddc" column isn't hurt by
adding "_class", but you're also communicating that the value comes from an
international standard classification system (Dewey Decimal). I don't know of
a database system that can encode this meaning in types, domains or check
constraints.

On critique of _skidmarks and StudlyCaps, I guess I should have emphasized the
concept of coding to what the toolchain and environment's conventions
suggests, such as putting "When In Rome, Always Do As Romans Do" in boldcase
at the top of the sidebar. Like I had.

~~~
kevbin
(1) is a ticky-tack foul, just pointed it out because "Choosing the right name
is everything" is pretty hyperbolic and I was being hyperbolic, too, with the
"insane" bit.

(2) I'm biased toward _-naming in general, I think it's less ugly than
camelCase. But my point (2) is just from experience: developing on MySQL on a
Mac (case-insensitive filesystem, case-insensitive naming by default) and
Linux (case-sensitive) and porting from MySQL to Oracle. Under-barring is a
least-common-denominator approach that's likely to save some time and
frustration.

(3) You'll have name-collided columns in the result set regardless of what
they represent. PERSON and PET may both have a nickname column, when you join
pets and their owners, you're going to need to qualify or alias nickname.

(4) You're right, a table called "vehicle" isn't a vehicle. But I don't think
of it as a list of vehicles either. I think of the table definition as being
an SQL approximation of the definition of "vehicle-ness": each tuple is a
vehicle and a select from the table will get you a list of vehicles, but the
definition is "what it means to be a vehicle", approximated in SQL as "CREATE
TABLE VEHICLE (...)". This helps to maintain the "relational mindset" and
avoid slipping into the VisiCalc mindset when dealing with relationships
between relations.

(6) You're right on about the importance of clarity, but I think clarity
improves with concision. Qualified names are clearer in this respect than
names overloaded with metadata. I realize that a lot of databases don't have
good support for defining types and domains (postgresql seems to, though), but
when there are ways to encode the metadata you want without polluting the
namespace, use them.
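
A rough sketch of point (6), keeping the rule in the schema instead of the
column name. It uses SQLite only because it ships with Python; SQLite has
CHECK constraints but no CREATE DOMAIN, so this is not the PostgreSQL domain
mechanism itself. The book table, the plain "ddc" column, and the 0 to 1000
range are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE book (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        -- Assumed rule: a Dewey Decimal class is a number in [0, 1000).
        ddc   REAL NOT NULL CHECK (ddc >= 0 AND ddc < 1000)
    )
    """
)

# A value inside the allowed range is accepted.
conn.execute("INSERT INTO book (title, ddc) VALUES (?, ?)", ("SQL Primer", 5.756))

# A value outside the range is refused by the database itself.
try:
    conn.execute("INSERT INTO book (title, ddc) VALUES (?, ?)", ("Bad Row", 1234.5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
print(rejected, row_count)
```

The constraint enforces the range, but as the parent comment notes, it does
not show up on a printout of the result set the way a name suffix does.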

You're right on about the prominence of "When In Rome…", sorry, it didn't even
reach my consciousness.

------
euroclydon
"If your table is called "driverLicenses" and needs an ID column for its
primary key, then call it "driverLicense_id" instead of just "id". This will
make the origin of the column clearer in result sets"

I really would rather type "id" when I'm writing a SQL query against my data
than "driverLicense_id".

~~~
ars
When you have to join two tables, both with an unrelated field called "id",
you'll regret that.

Or when you have to join a foreign key, and you end up with two fields both
called "id" in the same table - and don't ever give a foreign key a different
name in the parent and child tables.

And even if you say: I'll just prefix the column name with the table name in
the query, remember that when your app gets the field, it'll just be called
"id", and you'll be tempted to alias it just so you know which id field it is,
at which point you'll realize that you should have called it something else
from the start.

~~~
kevbin
When you join those tables, you won't regret it. You may have to think about
it for a moment; if you need the ids in the join they're there: SELECT A.id
FROM A JOIN B ON A.project = B.project; this is concise and makes sense.

When you do a self-join, the table-name-prefixed-column-names strategy doesn't
buy you anything anyway, you'll have to explicitly qualify or rename the
column(s) regardless.

Don't name foreign keys the same as the referenced column, name them to make
the represented relationship clear, if you have a parent-child relationship,
name the column "parent" (or if you must, "parent_id"; redundant if every
table has a synthetic primary key named "id").

~~~
ars
And when you do: SELECT A.id, B.id FROM A JOIN B ON A.project = B.project

How are you going to know which id field your app has?

And there are a lot more relationships than just a parent/child. You often have
a table with many child keys, each to a different attribute.

I should tell you, I used to do like you - I would name all the primary keys
ID, and the foreign keys tablename_id.

I've learned it's not actually a good way to structure a database. It's much
much clearer if parent/child columns have the same name. You'll see it once
you start making complicated databases.

~~~
kevbin
"How are you going to know which id field your app has?"

Depends on how your interface to the data layer behaves. If you're going
right into an associative array, you may get "A.id" and "B.id" for free or you
may have to explicitly rename A.id and B.id, e.g., "SELECT A.id as A_id...".
That's not a consequence of using "id" as a column name, you'd have the same
problem with any other name collision in a join unless you named _every_
column databaseName_schemaName_tableName_columnName and you'd _still_ have the
problem in a self-join.
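
The collision under discussion can be reproduced with a small sketch, assuming
Python's sqlite3 module with SQLite's default column-naming settings. Tables A
and B come from the query quoted above; the sample rows and the aliases a_id
and b_id are made up for the example. Other drivers may label unaliased
columns differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows that can be read by column name
conn.executescript(
    """
    CREATE TABLE A (id INTEGER PRIMARY KEY, project TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, project TEXT);
    INSERT INTO A VALUES (1, 'apollo');
    INSERT INTO B VALUES (7, 'apollo');
    """
)

# Qualifying the columns in the query does not qualify them in the result.
plain = conn.execute(
    "SELECT A.id, B.id FROM A JOIN B ON A.project = B.project"
).fetchone()

# Explicit aliases give each column its own name in the result set.
aliased = conn.execute(
    "SELECT A.id AS a_id, B.id AS b_id FROM A JOIN B ON A.project = B.project"
).fetchone()

print(plain.keys())    # the two names collide
print(dict(plain))     # so an associative array keeps only one entry
print(aliased.keys())
print(dict(aliased))
```

Both values are still reachable by position in the unaliased row; only lookup
by name is ambiguous.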

"name all ... the foreign keys tablename_id."

Sorry, I wasn't clear, I'm saying _don't_ do that! Use a name that describes
the relationship between the entity in the referencing relation and the entity
in the referenced relation!

In the parent-child case, I meant the parent column as a foreign key back to
PERSON stored in a column called "parent", like CREATE TABLE PERSON (id int
primary key.... parent int foreign key references PERSON(id)...)

Using tablename_id is only going to get you one relationship per ordered-pair
of tables, so it's a non-starter, you may want to have parent and spouse
relationships between PERSON and PERSON for instance, naming them both
PERSON_ID... well you don't want to marry your mother :)
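
A sketch of the relationship-named foreign keys described above, again using
SQLite from Python as an assumed stand-in for whatever database is in use. The
"parent" and "spouse" columns follow the comment; the "name" column, the
sample people, and the query aliases are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        parent INTEGER REFERENCES person(id),  -- named after the relationship
        spouse INTEGER REFERENCES person(id)   -- second link to the same table
    );
    INSERT INTO person VALUES (1, 'Ada',  NULL, NULL);
    INSERT INTO person VALUES (2, 'Bo',   NULL, 1);
    UPDATE person SET spouse = 2 WHERE id = 1;
    INSERT INTO person VALUES (3, 'Cleo', 1,    NULL);
    """
)

# Self-joins need table aliases whatever the columns are called; the
# relationship names keep each ON clause readable.
rows = conn.execute(
    """
    SELECT child.name, parent.name, parent_spouse.name
    FROM person AS child
    JOIN person AS parent ON child.parent = parent.id
    LEFT JOIN person AS parent_spouse ON parent.spouse = parent_spouse.id
    """
).fetchall()
print(rows)
```

With a single column named person_id there would be no way to tell the two
relationships apart in the schema.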

------
wheaties
Hungarian Notation? Gaah! If I ever have to read a mluspidBlah =
ConvertToSomething( mlusptBleah ) again I'll barf. That's the kind of stuff
that keeps me up at night.

~~~
tetha
do you mean hungarian notation to track types or do you mean hungarian
notation to track semantics?

If you mean tracking types, I agree. If you mean semantics, I am nowhere near
certainty about it being either good or bad. I can certainly see the use of
using sInput, uCheckLimit(sInput) and uOutput. It is usually just a single
character, and thus, not reading it does not require much effort, so I don't
think it is diabolically evil.

~~~
dfranke
I agree that systems Hungarian (tracking types) is a lose. Apps Hungarian
(tracking semantics) is a win in languages like Python, but the need for it is
a symptom of a wimpy or non-existent static type system. If you can encode a
semantic property into a variable name, then you also ought to be able to
reify that property into a type and get the type checker to catch your
mistakes. Here's the most impressive example of this I've ever seen:
<http://www.cis.upenn.edu/~stevez/papers/LZ06a.pdf>
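
As a much smaller illustration than the linked paper, here is a sketch of
turning a naming prefix into a type, assuming the prefixes mark safe versus
unsafe text. UnsafeText, SafeHtml, sanitize and render are hypothetical names.
Python does not enforce annotations at run time, so the isinstance check
stands in for what a static checker would report before the program runs.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class UnsafeText:
    """Raw text from the outside world (the unsafe prefix, as a type)."""
    value: str


@dataclass(frozen=True)
class SafeHtml:
    """Text that has been escaped for HTML (the safe prefix, as a type)."""
    value: str


def sanitize(text: UnsafeText) -> SafeHtml:
    # The only intended way to obtain a SafeHtml from outside input.
    return SafeHtml(html.escape(text.value))


def render(fragment: SafeHtml) -> str:
    # Run-time stand-in for a static type check.
    if not isinstance(fragment, SafeHtml):
        raise TypeError("render() needs SafeHtml; call sanitize() first")
    return "<p>" + fragment.value + "</p>"


user_input = UnsafeText("<b>hi</b>")
print(render(sanitize(user_input)))
```

Passing user_input straight to render raises TypeError here, and a static
checker reading the annotations would flag the same call without running it.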

~~~
tetha
Yes, I very much agree with this. My major problem with type systems is,
though, that I have yet to see a static type system which does not get in the
way all the time. Haskell is getting close, and I am sure that Haskell will
reach the point of allowing pretty much sane python in haskell, which will be
impressive and interesting :)

Another thing where apps hungarian notation saves me pretty often is in C,
with multidimensional arrays and tracking the meaning of each dimension in the
array. "rcFoo" (which encodes that it is rcFoo[row][column]), combined with
rIndex or cIndex (resulting in rcFoo[rIndex][cIndex]), drastically reduced the
number of transposition errors I made, because the order of the indices is in
plain sight. :) Again, of course, you are right that the right way would be to
declare this as Foo :: Array RowIndex * ColumnIndex -> Value.
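
A rough Python rendering of that last idea, assuming small wrapper classes for
the two kinds of index. RowIndex and ColumnIndex are the names from the
comment; the Grid class and its at method are hypothetical. The run-time check
stands in for a static type checker.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowIndex:
    value: int


@dataclass(frozen=True)
class ColumnIndex:
    value: int


class Grid:
    """A 2-D array that only accepts (RowIndex, ColumnIndex) lookups."""

    def __init__(self, rows: List[List[int]]) -> None:
        self._rows = rows

    def at(self, row: RowIndex, column: ColumnIndex) -> int:
        # Transposed arguments are rejected instead of silently misread.
        if not isinstance(row, RowIndex) or not isinstance(column, ColumnIndex):
            raise TypeError("expected at(RowIndex, ColumnIndex)")
        return self._rows[row.value][column.value]


grid = Grid([[1, 2, 3], [4, 5, 6]])
print(grid.at(RowIndex(1), ColumnIndex(2)))
```

Swapping the arguments, as in grid.at(ColumnIndex(2), RowIndex(1)), raises
TypeError, which is the mistake the rc prefix was guarding against by eye.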

------
chasingsparks
I very much agree with the author's _promotes insight_ assertion. Accurately
naming at least functions and classes increases the value of your
conceptualization of a problem. I have even taken to writing a non-programming
description of my problem first, perfecting that, and then programming it.

------
smikhanov
The only useful naming convention was described by Bjarne Stroustrup in his
C++ book: "capitalize type names, don't capitalize anything else".

If you want to prefix your private members with "_", you are using the wrong
IDE (or the wrong language, like here:
<http://news.ycombinator.com/item?id=826119>). If you want to prefix sanitized
variable names with "s", you should find a better way of sanitizing them and
read less of Joel. If you want to name a hard-working method like
"DistinctRequests", remember that you are "hiding behind your names" and
contradicting yourself at once.

~~~
ajross
Your second point makes sense only if everyone who reads your code _only_ does
so in your preferred IDE. That kind of tool dependence is unhealthy.

For myself, I rather like things like the prepended underscore (or m_*, the
microsoft convention for object members) because it makes up for a shortcoming
of C++ and its derivatives: there are multiple non-local scopes for symbols,
and it's not always obvious which one you mean. Is that unbound variable a
class member? static class member? top-level member of a namespace? global?
Having prefixes that make this obvious can be really helpful.

Now, that's secondary to the point of whether this scoping rule is a good
idea. I'm generally of the opinion that it's not, and that languages (e.g.
python) which force object scoping to be explicit are more readable. But
that's just because "self.field" is 1:1 with the "_field" convention from C++.

------
iman
"The correct naming of things [...] makes you attractive to women."

I'm curious about this kind of writing in a programming related article. Did
any women who read this article feel discomfort when they read this sentence?

I am guessing no, but I would like for the women of this community to answer.

------
BigZaphod
The only naming guide you'll ever need:

[http://developer.apple.com/mac/library/documentation/Cocoa/C...](http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/CodingGuidelines/CodingGuidelines.html)

:P

------
gruseom
I'm all in favor of finding good names for things. In fact, I obsess over it.
But it's so far from _everything_ that this title is just dumb.

------
mgrouchy
Funny this ends up here. I was asking earlier on twitter today about whether
there was a C# convention for naming private variables.

Seems that it is indeed _variablename. While I don't know if I like it, it
would seem that it is more common than I thought, and it is specifically
mentioned in this article.

~~~
snprbob86
The official .NET class design guidelines only cover public interface:
<http://msdn.microsoft.com/en-us/library/czefa0ke(VS.71).aspx>

_privates is common, but in my experience at the largest of all C# shops, the
explicit this.privates is more common. I use Resharper to enforce this style,
but then again I use Resharper for every damn feature that wonderful,
wonderful, magical tool provides.

------
njharman
I disagree with "Avoid discussing hard work". Things should have names
indicative of what they do.

Giving a costly operation an attribute-sounding name is going to give some
poor future developer an unpleasant surprise.
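
A small sketch of that surprise, assuming a hypothetical RequestLog class. The
property name echoes the article's "DistinctRequests" example mentioned
elsewhere in the thread; everything else is invented. The scans counter exists
only to make the hidden work visible.

```python
from typing import List


class RequestLog:
    def __init__(self, requests: List[str]) -> None:
        self._requests = requests
        self.scans = 0  # counts full passes over the data, for illustration

    @property
    def distinct_requests(self) -> int:
        # Reads like a stored attribute, but scans everything on each access.
        self.scans += 1
        return len(set(self._requests))

    def count_distinct_requests(self) -> int:
        # Same work, but the verb warns the caller that work happens.
        self.scans += 1
        return len(set(self._requests))


log = RequestLog(["/a", "/b", "/a"])
first = log.distinct_requests
second = log.distinct_requests  # looks free, costs another full scan
print(first, second, log.scans)
```

A caller who reads log.distinct_requests inside a loop pays for a scan per
iteration with nothing in the name to hint at it.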

------
abalashov
"you shouldn't even be using that because it means you're in a for loop when
you should be using foreach."

I am not sure whence arises this nugget of alleged wisdom...

------
binarycheese
IMHO, Android has one of the worst mainstream APIs

------
kakal
The statement about Hungarian notation and identifying datatypes in names is
wrong.

~~~
clawrencewenham
Simonyi's intention was for the notation to encode semantics, not data-type.
Some say it was Petzold who popularized what's now called "Systems Hungarian"

