Making Wrong Code Look Wrong (2005) (joelonsoftware.com)
112 points by aleyan 3 days ago | 46 comments

Most modern typed languages support encoding this information in the type of the variable, so that the compiler catches it, instead of in the variable name. Alexis King wrote a blog post about it that reached the front page a few days ago. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...

SMALL BRAIN: Hungarian notation is stupid and just duplicates the work of the type system

NORMAL BRAIN: Systems Hungarian may be silly, but Apps Hungarian has some merit

GALAXY BRAIN: Apps Hungarian is stupid and just duplicates the work of the type system

Yes, though the notation approach is worth following if you cannot use the type system (lang doesn't support it, or incurs unacceptable performance or complexity overhead).

But yeah, zero cost compiler magic is the reason Rust is so good.

I'll be using it in my JavaScript extensively. Very useful in languages with useless type systems.

Are you sure you cannot handle this using conditional types in typescript?

Any typed language can do this, not just Rust ;)

> GALAXY BRAIN: Apps Hungarian is stupid and just duplicates the work of the type system

Is this correct? From the article, Apps Hungarian represents your intent for the var (the kind), not the type. Systems Hungarian represents the type (duplicative).

He meant that the intent (kind) can be factored into the type system too. Instead of using type string for both “unsafe string” kind and “safe string” kind, you’d use two different types. So the type system aka compiler can catch a mistake, in addition to your eyes.

For example, see https://talks.golang.org/2012/goforc.slide#43.
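A minimal sketch of the same idea in Python, assuming a static checker such as mypy is run over the code. The names UnsafeStr, SafeStr, encode and write are made up for illustration. typing.NewType gives two distinct static types that are both plain str at runtime, so the mistake is caught by the checker and not by the interpreter.

```python
import html
from typing import NewType

# Hypothetical names: two "kinds" of string sharing one runtime representation.
UnsafeStr = NewType("UnsafeStr", str)  # raw user input
SafeStr = NewType("SafeStr", str)      # HTML-encoded, ready to output


def encode(s: UnsafeStr) -> SafeStr:
    """The only intended way to turn an UnsafeStr into a SafeStr."""
    return SafeStr(html.escape(s))


def write(s: SafeStr) -> None:
    print(s)


raw = UnsafeStr("<script>alert(1)</script>")
write(encode(raw))  # fine
# write(raw)        # a static checker such as mypy rejects this call
```

At runtime both types are ordinary strings, so there is no wrapper overhead, but also no protection if the checker is not run.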

Oh I see. Thank you!

Also this: Making illegal states unrepresentable


Very interesting, thanks for the link. Also, another instance of the word "cromulent" cropping up! Springfield teachers would be proud of their feat.


(Edit: Add link)

Amazing. I'm not a native speaker so I didn't follow the language's changelog... thanks for the info!

I currently use a mix of thin typedefs (using ClockT = uint32_t, the compiler doesn't catch mixups) and putting units in variable names. I've used this approach in Python, Rust, and C++, and in over 1 year have had zero bugs that could've been caught via compiler-enforced units. (I don't know if this approach will work for others though.)

I've tried using a unit system library in Rust, for audio programming. It was not worth it for me. I had to define types for clock cycles, audio sample durations, and amplitudes. Then cycles per second and samples per second were expressed as ratios. Then you need (cycle / second) to have type (cycles per second). The typenum-based error messages were dreadful. You also sometimes need different base numeric types to store these dimensioned quantities, and you have to write extra typecasts to box and unbox between unit-wrapped quantities and plain integers.

Did you read the article?

The correct way of doing Hungarian Notation, as demonstrated in this article, is to encode in the name of the variable what it does, not its type.

No compiler can know that a string variable is a "name" or a "password".

> No compiler can know that a string variable is a "name" or a "password".

In pretty much any language supporting user-defined types and/or type aliases, types can be used for that information (with aliases, providing documentation the same as hungarian notation does; with real types, providing the possibility for enforced protection against accidental misuse.)

But the article doesn't actually use hungarian notation for that kind of thing, it uses it for traits that are not "what it does", but instead metadata analogous to Ruby's "taint". Which, again, is something easily encoded in typing, in a static language, either as a core feature of typing or via user-defined (possibly wrapper, depending on other features of the type system or general language semantics) types; type aliases, even, would work as well as hungarian notation, but using actual types lets you put guarantees in place rather than just providing documentation.

It still might be possible to have types like `UserId` or `UserCredential`, which get parsed at the request time and never roll back to plain strings. It is evident that they are indeed different subtypes of strings: user identifiers can't have a space for example.
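Roughly what that could look like, as a Python sketch. UserId and load_profile are hypothetical names, and the "no spaces" rule is the only constraint taken from the comment above. The check runs once, when the wrapper is built, and everything downstream accepts only the wrapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserId:
    """Hypothetical wrapper: holding one means the value was already checked."""
    value: str

    def __post_init__(self) -> None:
        # Assumed rule from the comment above: identifiers contain no spaces.
        if not self.value or " " in self.value:
            raise ValueError(f"invalid user id: {self.value!r}")


def load_profile(user_id: UserId) -> str:
    # Downstream code takes UserId, so it never re-validates a plain str.
    return f"profile:{user_id.value}"


uid = UserId("alice42")  # done once, at the request boundary
print(load_profile(uid))
```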

That said, I generally agree that conventional type systems are not (yet) capable of encoding all of that information in a type. As a practical matter, I prefer a mix of actual types and coding conventions.

As a thought experiment, I like to assume that there is a hypothetical language where everything is encoded in types, to the point where you don't have or need variable names anymore. Every typed language (and programming style) we use is some point in the spectrum of practical compromises between that hypothetical language where all pre-runtime information is in formal types and scripting languages where all pre-runtime information is in informal names.

Java for example has to leave quite a lot of that information in names because there are no type aliases and final types exist.

I'm curious what type of capabilities are lacking in the Ada or the F# type system to handle this. You can create 'new' (here: incompatible) types instead of subtypes and they won't be assignable to each other. So

   type User_Id_Type is new String;
   type User_Credential_Type is new String;
Then in your request deserializer convert once from String to the correct type, and at the use site only allow taking the correct type?

You can also add static predicates to ensure User_Id_Type doesn't contain spaces. You can also make the type 'opaque' and it won't be possible to convert to a String without a bespoke interface.

I'm not sure what's missing. I almost feel like it would be a great challenge :-)

I think in this particular case they suffice, because they are clearly different subtypes as I've mentioned.

It is much harder, if not impossible, if you have to describe a relation between two or more values of the same subtypes. For example, imagine that you already "know" certain indices never go off the boundary of given array but want to encode that information to types. Sure, there exists a Rust crate [1] that tracks a relation between indices and originating array via lifetime, but it is complicated.

[1] https://github.com/bluss/indexing

Mmmh thanks for the link and the paper about generative types. Seems very interesting!

I'm wondering if using type predicates or type invariants (+ proof for static verification, otherwise the check will mostly be at runtime) would help here.

Look up https://blog.adacore.com/spark-2014-rationale-type-predicate... if interested.

Hungarian notation solves one problem: it makes wrong code look wrong.

Type aliases can also do that, but they can do so in a way where correctness can be statically enforced.

As an example, let’s say you have a Write function like in the article, but it only takes an HTMLString, which can’t be implicitly cast from String. Then, you can have Encode return an HTMLString, but take a String. Finally, you can use static analysis to only allow whitelisted functions to construct non-literal HTMLStrings (so anyone could write an HTML literal, but every other usage would require encoding or whitelisting.)

This is a bit rudimentary of an example, but it is widely used in practice.
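A rough Python sketch of the Write/Encode arrangement, with the check done at runtime instead of by a compiler. HtmlString, encode and write are hypothetical names, and the "only whitelisted constructors" rule is left to convention here, since that part needs a linter or static analysis.

```python
import html


class HtmlString:
    """Hypothetical wrapper marking text as already HTML-encoded."""

    def __init__(self, encoded: str) -> None:
        # By convention only encode() calls this; a linter rule or code
        # review would have to police other call sites.
        self.encoded = encoded


def encode(raw: str) -> HtmlString:
    return HtmlString(html.escape(raw))


def write(out: list, s: HtmlString) -> None:
    # No implicit conversion: a plain str is rejected at runtime.
    if not isinstance(s, HtmlString):
        raise TypeError("write() needs an HtmlString; call encode() first")
    out.append(s.encoded)


page: list = []
write(page, encode("<b>Tom & Jerry</b>"))
print(page[0])
```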

Here is a safe type used in Closure:


Actually it can, you just have to define types for it.

In Scala you have value classes, which get replaced by the underlying type during compilation, so there is no overhead.


It may sound like a hassle, but you only have to handle it at the interfaces (for example, for a web app, in the controller and data layer); in the rest of the code you just manipulate your value like any other type.

But a type is a machine-readable way to say what a variable does or is. A compiler can certainly know a name from a password if its type support is versatile enough and you use separate types for them. Or, for example, for things like "unchecked C string", "valid UTF-8 C string", "NFKD-normalized UTF-8 C string", etc. A separate type for passwords may be a jolly good idea with a logging system you can dump any data to; you certainly wouldn't want it to log passwords.

Using prefixes is a poor man's type system, a loose convention to be enforced by humans, while with types it would be a machine-readable and enforceable requirement.
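To make the logging point concrete, here is a small Python sketch. Password and reveal are hypothetical names, not from any library. The wrapper's text form is always redacted, so dumping it into a log line cannot leak the secret by accident.

```python
import logging


class Password:
    """Hypothetical wrapper whose text form never reveals the secret."""

    def __init__(self, secret: str) -> None:
        self._secret = secret

    def __repr__(self) -> str:
        # str() and "%s" formatting fall back to this as well.
        return "Password(<redacted>)"

    def reveal(self) -> str:
        # The one explicit, greppable place where the raw value escapes.
        return self._secret


pw = Password("hunter2")
logging.basicConfig(level=logging.INFO)
logging.info("login attempt with %s", pw)  # logs the redacted form only
```

Code that really needs the raw value has to call reveal(), which is easy to search for in review.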

In Go:

    type Password string
    type Name string

    func SayHello(name Name) {
        fmt.Printf("Hello %s\n", string(name))
    }

    func Login(password Password) bool {
        // bcrypt.magicstuff is a placeholder for a real hash comparison.
        if bcrypt.magicstuff([]byte(password)) {
            return true
        }
        log.Println("error: wrong password: ", string(password))
        return false
    }

Now the type system does know whether it's a name or a password, and will throw a compiler error if you try to use the wrong one in the wrong place.

edit: really HN, no code formatting options? :(

Just looking at the safe/unsafe string example I wonder if another usable approach would be to ditch plain strings and instead use thin wrappers like UnsafeString and SafeString and a bunch of operations on them. But not assignment, for instance.

There wouldn't be much need to care about whether the code looks wrong or not (well, with regard to safe vs unsafe strings :), because the compiler would do it for you (or the runtime I guess, depending on which language it gets implemented in). I think all the examples Joel writes (the ones 'xxx is always ok' and 'xxx is not ok') are covered by it. It does mean you need a Write which only takes SafeString I guess, and it probably doesn't mean you can't still do something wrong, but it should be much harder.

I had a similar thought- Rust has "tuple structs" which are effectively structs without property names. You could make singletons:

  struct Relative(i32);
  struct Absolute(i32);

  fn use_absolute(coord: Absolute) { /* ... */ }

  fn main() {
      let foo = Relative(12);
      let bar = Absolute(12);

      use_absolute(bar); // OK
      use_absolute(foo); // Err! expected `Absolute`, found `Relative`
  }
and effectively get this idea enforced at the type level. Of course support for this pattern would vary by language, and there would probably be some small overhead, but it could be worth it.

You’ll be glad to learn that there is absolutely no overhead in execution time or memory consumed (and a minuscule one in compilation time).

This is my #1 complaint with static type systems. Static typing advocates say they allow the compiler to check for correctness, but all these type systems are only the thinnest possible wrappers on machine types.

An HTML-escaped string is not at all the same type as an untrusted user-entered string. They just happen to both be represented by a sequence of characters. An integer that represents my age in years is not at all the same type as an integer that represents my height in centimeters. They just both happened to be represented by a 64-bit two's-complement integer (probably).

The fact that the underlying physical machine types match is not helpful in determining whether it makes sense to add or concatenate these. (If anything, it provides a false sense of security.) The most advanced programming languages on my computer are still worse at dimensional analysis than a middle school kid with a pencil.

I'm starting to play around with a new type system to remedy this. I don't have a programming language for it yet, though.
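For what it's worth, the pencil-and-paper version of dimensional analysis can be approximated at runtime in a few lines. This is only a sketch in Python with made-up names (Quantity, years, cm): each value carries its unit exponents, addition requires matching units, and multiplication or division combines them.

```python
from dataclasses import dataclass


def _combine(a: tuple, b: tuple, sign: int) -> tuple:
    """Add (sign=+1) or subtract (sign=-1) unit exponents."""
    exps = dict(a)
    for unit, exp in b:
        exps[unit] = exps.get(unit, 0) + sign * exp
    return tuple(sorted((u, e) for u, e in exps.items() if e != 0))


@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple  # sorted (unit, exponent) pairs, e.g. (("cm", 1),)

    def __add__(self, other: "Quantity") -> "Quantity":
        if self.dims != other.dims:
            raise TypeError(f"unit mismatch: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.value * other.value, _combine(self.dims, other.dims, +1))

    def __truediv__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.value / other.value, _combine(self.dims, other.dims, -1))


def years(n: float) -> Quantity:
    return Quantity(n, (("year", 1),))


def cm(n: float) -> Quantity:
    return Quantity(n, (("cm", 1),))


growth = cm(150) / years(10)  # cm per year: allowed
print(growth)
# cm(150) + years(10)         # raises TypeError: unit mismatch
```

The checks happen when the code runs, not when it compiles, so this catches the mistake later than a static unit system would.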

I fully agree that the types offered by the programming language (wrapping the underlying types) are not enough. However, a statically typed language allows you to define your own types. So (as your sibling comment by brundolf suggests) you can create your own types around the existing types such as HtmlEscapedString and UntrustedUserEnteredString. The syntax is particularly convenient in Rust, but you can do the same in e.g. Java, or any other statically typed language, albeit more verbosely:

    class HtmlSnippet {
        public String html;
    }

    void setWidgetContents(HtmlSnippet s) { .. }

    void foo() {
        // Compile error: a plain String is not an HtmlSnippet
        setWidgetContents("<b>hello</b>");
    }

This is one of those "Turing tar-pit" scenarios. As you point out, it's technically possible, but in most languages, it would drastically increase the verbosity of the code.

None of the built-in control structures will work with your custom types. In Java, for example, "int" is a primitive, and "Integer" is final. All the built-in functionality that works with numeric types won't work with your types. All of the standard and third-party libraries expect the built-in types, too. Java doesn't support operator overloading, so you wouldn't be able to + your numbers, either. Instead of helping you with every arithmetic operation, it would be a massive pain point at every arithmetic operation.

Strings and integers are fundamental types. Programming languages just aren't designed to let people substitute their own.

In a previous job, I wrote an autotrader in untyped Python 2 which by now I believe will have handled hundreds of millions of pounds. Certainly tens of millions.

There were a number of approaches I used. This is a few years back now, so apologies if anything is unclear.

You can fake a type system to some extent in py2 by using methods and copious 'isinstance' checking.

For example, Money<EUR> + Money<GBP> can be made illegal by overloading addition operators. Strings which require some sort of meaning can be given classes and functions which use them can use isinstance and friends to perform runtime type checks.
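A sketch of the Money idea, written in Python 3 with dataclasses for brevity (the original system was py2, and this is not its code). The class layout and the use of a currency string are assumptions for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # e.g. "EUR" or "GBP"

    def __add__(self, other: object) -> "Money":
        # The isinstance check stands in for a static type check.
        if not isinstance(other, Money):
            return NotImplemented
        if other.currency != self.currency:
            raise TypeError(f"cannot add {other.currency} to {self.currency}")
        return Money(self.amount + other.amount, self.currency)


total = Money(Decimal("10.50"), "EUR") + Money(Decimal("2.25"), "EUR")
print(total)
# Money(Decimal("1"), "EUR") + Money(Decimal("1"), "GBP")  # raises TypeError
```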

Another is indeed the use of 'raw_whatever' and 'whatever' in identifiers, what I now know to be "apps hungarian" notation. 'raw_whatever' would have a similar definition to Joel's unsafe user input. It might come from an API of some sort that you don't truly "trust".

Similarly, that sort of variable naming approach applied to function parameters. Passing a 'dog_id' to a 'cat_id' function _may_, in certain cases, be possible if both were flat strings (and not objects that could be isinstance checked), but I favoured a variable naming approach (along with calling functions by keyword argument) that would result in this at least being visible after problems came up (e.g. you'd see myfn(dog_id=cat_id) and feel an urge to hold your nose).

There were tons of these sorts of things all over the codebase, tests, etc, and the system outperformed the previous ones by a significant margin. My understanding is that it still hasn't suffered any significant losses; only some minor API issues that were outside of our control.

Super fun project. Nowadays I'd just use a typed language for it and interface with the py2 stuff via an API. Or at least make use of mypy. But that autotrader was what the company needed at the time.

Details are in my bio if anyone has further interest.

Another way to solve the escaping problem is by some kind of interpolation mechanism that takes care of escaping on its own: like JSX or template literals in JavaScript (although you have to remember to tag the latter), or prepared statements in SQL. Why fix a coding convention when you can fix the language?
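The prepared-statement case is easy to show with Python's built-in sqlite3 module. A minimal sketch, with a made-up table and input: the ? placeholder hands the value to the database as data, so no naming convention is needed to keep it out of the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# Untrusted input that would break a query built by string concatenation.
name = "Robert'); DROP TABLE students;--"

# The ? placeholder passes the value as data, never as SQL text.
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)
```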

I used to hate code linters. Now, when collaborating with others, I set an aggressive one.

They make wrong code look wrong, literally.

Of course, they do not catch all wrong code, but they do catch some of the most glaring examples that would otherwise need manual inspection.

This will sound simplistic but just avoid using C++. You'd have to pay me twice my current salary to go back to that language (or C).

What are you coding in right now? Haskell? F#? OCaml/ReasonML? ...maybe Go? Or Rust?

...because in almost all other languages the issues mentioned by OP are valid! You can use "exceptions for business logic" and have invisible goto-s in your Python or C# or Java or Scala code just fine. I don't think the advice is language dependent at all, it applies to any language!

Sure, with a "sufficiently advanced type system" you can have the compiler catch these kinds of errors and have wrong state unrepresentable. But in practice you either don't have that "sufficiently advanced type system", or you know that using it to prevent these errors will take a ton of extra work, or end up with something so complicated that you'd spend 80% of your brainpower wrangling abstract types algebra in your brain instead of spending that brain power on solving the actual domain problem, and have every new developer spend weeks before being productive in your overengineered system... variants of stuff like "Apps Hungarian" are often a lightweight and practical solution in any language.

Also, if operators overloading is what freaks you out, try writing code heavy on matrix/vector/tensor algorithms without it... it will end up looking so ugly that you'll drown in logic bugs hidden in plain sight simply because the code looks so different from the nice math it came from that your brain is not powerful enough to diff it correctly anymore... the zero cost operator overloading abstraction in C++ can also be very powerful at preventing algorithm bugs if used in the right domain - for "solid abstraction" with sane origins in mathematics it works fine!
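A tiny illustration of the readability gap, sketched in Python with a hypothetical Vec2 type. Both lines compute the textbook expression p + v*t + a*t^2/2; only one of them still looks like it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    def __mul__(self, k: float) -> "Vec2":
        return Vec2(self.x * k, self.y * k)


# The same operations as plain functions, for comparison.
def add(a: Vec2, b: Vec2) -> Vec2:
    return Vec2(a.x + b.x, a.y + b.y)


def scale(a: Vec2, k: float) -> Vec2:
    return Vec2(a.x * k, a.y * k)


p, v, a, t = Vec2(0.0, 0.0), Vec2(1.0, 2.0), Vec2(0.0, -10.0), 0.5

# p + v*t + a*(t^2/2), written both ways:
with_operators = p + v * t + a * (t * t / 2)
with_functions = add(add(p, scale(v, t)), scale(a, t * t / 2))
```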

> variants of stuff like "Apps Hungarian" are often a lightweight and practical solution in any language.

Except in an untyped language, the type system usually presents a better solution with no more effort than “Apps Hungarian”, and for many cases even without static typing you can enforce rules rather than do advisory annotations with similar effort (e.g., for the specific unsafe-data case addressed in the article, Ruby's taint system.)

> Also, if operators overloading is what freaks you out, try writing code heavy on matrix/vector/tensor algorithms without it...

While overloading may be ideal for some subset of that, and tolerable for a larger subset, I think what that really calls for is custom operator definition, as supported in Haskell, Scala, Raku and some others, more than overloading.

The checker framework (https://checkerframework.org/) can make the compiler understand about stuff like this without introducing additional types in Java.

I have never tried it myself, maybe someone with experience can chime in?

I was a PhD student in the group that works on the Checker Framework.

It's actually pretty easy to use it to automatically enforce Hungarian notation. I have a blog post/sample implementation here: https://toddschiller.com/java-hungarian-notation-checker.htm...

Most people commenting here seem to be missing Joel's point that the Hungarian notation makes it possible to read the code without having to continually search elsewhere for information about what it means. I'm pretty sure that he would be all in favour of declaring sub-types in languages that support it but an awful lot don't or don't make it convenient.

This was a really good read! Highly recommended for everyone. I'll very likely incorporate it into my own code.

Nice. It's the most lucid explanation I've seen of what Hungarian notation really is, what it can do, and how different it is from sticking "ul" in front of every unsigned long variable.

WRT exceptions I'm curious what his thoughts are on the Go way of handling them (just return an error value, don't throw)? Clearly this article was written pre-Go.

Error values have a long history. Plenty of C stdlib functions return a status/error code, with the result being provided via a pointer the function was passed. Common Lisp lets you return multiple values, which can be used to signal success/failure (it's how retrieving nil from a hash map is disambiguated from not finding the key in the map) or include related data, e.g. the fractional part of a truncated number.
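The hash-map case translates directly to any language with multiple return values. A Python sketch, where lookup is a hypothetical helper and not a standard function: the second value says whether the key was found, so a stored None is not confused with a missing key.

```python
from typing import Optional, Tuple


def lookup(table: dict, key: str) -> Tuple[Optional[str], bool]:
    """Return (value, found) instead of raising KeyError."""
    if key in table:
        return table[key], True
    return None, False


settings = {"theme": None}  # a key that is present but holds None

value, found = lookup(settings, "theme")
print(value, found)  # None True: key exists, value is None

value, found = lookup(settings, "font")
print(value, found)  # None False: key is missing
```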
