
Is Hungarian notation a workaround for insufficient type systems? (2011) - rcthompson
http://programmers.stackexchange.com/questions/113576/is-hungarian-notation-a-workaround-for-languages-with-insufficiently-expressive
======
tikhonj
One thing that often comes up in these discussions is using types to manage
units of measure. I think this is an awesome example, so here's a sample of my
favorite library for it (unittyped)[1]:

    
    
        *Main> (1 meter / second) * 5 second
        5.0 m/s⋅s
        *Main> 2 meter + (1 meter / second) * 5 second
        7.0 m
    

One really cute trick is that the various SI prefixes are actually functions.
So instead of defining every combination of milli-, centi-, deci- and so on,
you can write them like this:

    
    
        *Main> 1 kilo meter + 324 centi meter
        1.00324 km
    

The library can convert between compatible units automatically, so you never
have to worry about the conversion factor:

    
    
        *Main> 1.23 kilo meter `as` mile
        0.7642865664519207 mile
    

Since the prefixes are systematic, you can come up with your own bastardized
units:

    
    
        *Main> 1 centi mile `as` kilo inch
        0.6336000000000002 kin
    

Of course, if we try to combine incompatible units, we get a type error. And
this shows the most glaring problem with this library :P.

    
    
        *Main> 10 second + 2 inch
    
        <interactive>:38:11:
            Couldn't match type 'GHC.Types.False with 'GHC.Types.True
            When using functional dependencies to combine
              UnitTyped.And 'GHC.Types.False 'GHC.Types.False 'GHC.Types.False,
                arising from the dependency `a b -> c'
                in the instance declaration in `UnitTyped'
              UnitTyped.And 'GHC.Types.False 'GHC.Types.False 'GHC.Types.True,
                arising from a use of `+' at <interactive>:38:11
            In the expression: 10 second + 2 inch
            In an equation for `it': it = 10 second + 2 inch
    

This message is _absolutely hideous_. But it's still better than nothing. I
think some mechanism for letting libraries customize type error messages would
be awesome, but I don't know what that would look like.

[1]: [https://bitbucket.org/xnyhps/haskell-unittyped/wiki/Examples](https://bitbucket.org/xnyhps/haskell-unittyped/wiki/Examples)

~~~
johnbender
The difficulty with type errors partly arises out of inference.

The program assumes NOTHING about the type of the expressions and instead
collects constraints as it descends into sub-expressions. Once it has finished
collecting constraints for all the sub-expressions, it attempts to satisfy
those constraints.

If the constraints can't be satisfied there are two problems. First, you have
to know where you got the constraint from. This is fairly easy to solve in
the implementation: you can tag the constraints, for example. Second, the
way in which the constraint "fails" or is unsatisfiable doesn't always give
you enough information.

For example:

\x -> if x then x else 1

Here the constraints basically say that x has type Bool and 1 has type Int.
This is obviously wrong because the `if x then x else 1` has to be either Bool
or Int. Now the question is, which one is wrong? Is the use of the second x
wrong or is the use of the 1 wrong? You can't know, so you have to report
something, and this is only a trivial example.

[edit]: I said "mostly" but I changed that to be "partly" because type systems
like Haskell's are very complex and in some cases type inference is
undecidable (rank-n polymorphism), so that obviously makes reporting an error
very hard :)
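
To make the blame problem concrete, here is a minimal sketch in Go of tagged constraints for the `if x then x else 1` example. It is not how GHC or any real checker is implemented; `Type`, `Constraint`, `resolve` and `solve` are hypothetical names, and the type language has only Bool, Int and inference variables.

```go
package main

import "fmt"

// Type is a deliberately tiny type language: base types plus inference variables.
type Type struct {
	Name string // "Bool", "Int", or a variable name such as "x"
	Var  bool   // true when Name is an inference variable
}

// Constraint says two types must be equal and remembers where the demand came from.
type Constraint struct {
	Left, Right Type
	Origin      string
}

// resolve follows variable bindings until it reaches a base type or an unbound variable.
func resolve(t Type, subst map[string]Type) Type {
	for t.Var {
		next, ok := subst[t.Name]
		if !ok {
			return t
		}
		t = next
	}
	return t
}

// solve processes constraints in order and reports the first one that cannot hold.
func solve(cs []Constraint) error {
	subst := map[string]Type{}
	for _, c := range cs {
		l, r := resolve(c.Left, subst), resolve(c.Right, subst)
		switch {
		case l == r:
			// Already equal, nothing to record.
		case l.Var:
			subst[l.Name] = r
		case r.Var:
			subst[r.Name] = l
		default:
			return fmt.Errorf("cannot match %s with %s (from %s)", l.Name, r.Name, c.Origin)
		}
	}
	return nil
}

func main() {
	x, res := Type{"x", true}, Type{"result", true}
	boolT, intT := Type{"Bool", false}, Type{"Int", false}
	cond := Constraint{x, boolT, "condition of if"}
	thenB := Constraint{res, x, "then branch"}
	elseB := Constraint{res, intT, "else branch"}

	// Same three constraints, two solving orders, two different culprits.
	fmt.Println(solve([]Constraint{cond, thenB, elseB}))
	fmt.Println(solve([]Constraint{elseB, thenB, cond}))
}
```

The tag tells you where a constraint came from, but which constraint gets blamed depends only on the order in which they were solved: the first call blames the else branch, the second blames the condition.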

------
jerf
I haven't really done a lot of professional-scale programming in C or C++, and
never in a context where I'm designing the types, so I assume that it must be
a legitimate annoyance in C or C++. But in better languages, it's just as easy
to use types. Here's how easy it is in Go, for instance:

    
    
        package main
    
        import "fmt"
    
        type inches uint
        type feet uint
    
        func main() {
            i := inches(12)
            f := feet(1)
    
            fmt.Printf("i + i value: %d, type: %T\n", i + i, i + i)
            fmt.Printf("f + f value: %d, type: %T\n", f + f, f + f)
    	
            // automatically will promote bare numbers, a matter of taste, I suppose
            fmt.Printf("i + 5 value: %d, type: %T\n", i + 5, i + 5)
    
            // but will refuse this:
            // "invalid operation: i + uint(5) (mismatched types inches and uint)"
            // fmt.Printf("Value: %d, type: %T\n", i + uint(5), i + uint(5))
    
            // will not compile: "invalid operation: i + f
            // (mismatched types inches and feet)"
            // fmt.Printf("Value: %d, type: %T\n", i + f, i + f)
        }
    

Prints:

    
    
        i + i value: 24, type: main.inches
        f + f value: 2, type: main.feet
        i + 5 value: 17, type: main.inches
    

If you'd like to fiddle with it:
[http://play.golang.org/p/_VPWoE901Q](http://play.golang.org/p/_VPWoE901Q)

It is not _quite_ as convenient to be clear about types in Go as it is to
continue just sloppily passing ints and strings everywhere, but it's not
_that_ much harder, and it does catch things!

~~~
vinkelhake
I would say that adding inches and feet is a well-defined operation, you just
need to take the different scales into consideration. How much more
complicated does the Go code get if you want to allow for addition of inches
and feet, but still prevent nonsensical things like adding inches and seconds?

How much more complicated does the code get if you then want to allow for the
division of length and time to produce velocity?

An example:
[http://www.boost.org/doc/libs/1_54_0/doc/html/boost_units.ht...](http://www.boost.org/doc/libs/1_54_0/doc/html/boost_units.html)

~~~
pcwalton
> How much more complicated does the Go code get if you want to allow for
> addition of inches and feet, but still prevent nonsensical things like
> adding inches and seconds?

You can't do it, because Go doesn't have operator overloading.
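
What you can do instead of `+` and `/` is spell the operations as methods on a type that stores one base unit. A rough sketch, assuming a struct wrapper; `Length`, `Duration`, `Velocity` and every function name below are made up for illustration, not taken from any library:

```go
package main

import "fmt"

// Length stores a distance in inches; the unit only shows up in constructors and accessors.
type Length struct{ inches float64 }

// Duration stores a time span in seconds.
type Duration struct{ seconds float64 }

// Velocity stores a speed in inches per second.
type Velocity struct{ inchesPerSecond float64 }

func Inches(n float64) Length    { return Length{n} }
func Feet(n float64) Length      { return Length{n * 12} }
func Seconds(n float64) Duration { return Duration{n} }

// Add works for any two lengths, whatever unit they were built from.
func (l Length) Add(o Length) Length { return Length{l.inches + o.inches} }

// Per divides a length by a duration and yields a velocity.
func (l Length) Per(d Duration) Velocity { return Velocity{l.inches / d.seconds} }

func (l Length) InInches() float64          { return l.inches }
func (v Velocity) InFeetPerSecond() float64 { return v.inchesPerSecond / 12 }

func main() {
	total := Inches(6).Add(Feet(1))
	fmt.Println(total.InInches()) // 18

	v := Feet(10).Per(Seconds(2))
	fmt.Println(v.InFeetPerSecond()) // 5

	// Inches(6).Add(Seconds(2)) // does not compile: a Duration is not a Length
}
```

Every new derived dimension needs its own type and its own methods written by hand, which is the cost compared to the Haskell and Boost libraries.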

Also, I'd argue that the automatic promotion of bare numbers to a wrapped type
is not really "a matter of taste" but rather severely hurts the ability of
Go's newtype feature to catch bugs. In languages that have stricter newtypes,
such as ML or Haskell, you can design your APIs such that the caller has to
think about which unit to use. Someone using your code might accidentally
write the number "12" to mean 12 inches when your API takes feet, but the Go
compiler will silently let this mistake pass.
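
A small sketch of that failure mode and one way around it, assuming a struct wrapper in place of a defined integer type. `LooseFeet`, `Feet`, `NewFeet` and the ladder functions are hypothetical names:

```go
package main

import "fmt"

// LooseFeet is a defined integer type; untyped constants convert to it implicitly.
type LooseFeet uint

// Feet wraps the value in a struct, so no numeric constant converts to it implicitly.
type Feet struct{ n uint }

// NewFeet is the place where the caller has to name the unit.
func NewFeet(n uint) Feet { return Feet{n} }

func ladderLoose(height LooseFeet) string { return fmt.Sprintf("%d ft ladder", height) }
func ladderStrict(height Feet) string     { return fmt.Sprintf("%d ft ladder", height.n) }

func main() {
	fmt.Println(ladderLoose(12))           // compiles even if the caller meant 12 inches
	fmt.Println(ladderStrict(NewFeet(12))) // the unit is visible at the call site

	// ladderStrict(12) // does not compile: an untyped constant cannot be used as a Feet value
}
```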

~~~
jerf
"Also, I'd argue that the automatic promotion of bare numbers to a wrapped
type is not really "a matter of taste" but rather severely hurts the ability
of Go's newtype feature to catch bugs."

I agree, but in general, Go is a pragmatic language rather than a pure one. In
Haskell it would certainly be the wrong choice. You can pull some fun tricks
in Go, such as
[http://www.jerf.org/iri/post/2923](http://www.jerf.org/iri/post/2923) , but
it's not interested in a Haskellian "correctness" in general.

~~~
pcwalton
I see nothing pragmatic about the automatic wrapping in Go, though. It feels
like a pure loss--language complexity, safety, correctness--over not doing it.

------
ivanhoe
Speaking of this, I've always liked Perl's $, @ and % prefixes that distinguish
between scalars, arrays and hashes. It is like a very minimalistic Hungarian
notation, but that's exactly why it's so great: just a single extra character
gives you all the extra info, while imposing a minimal additional mental
strain on your brain while scanning the code.

------
ggchappell
This is actually a pretty thought-provoking question. Here's my take on it.

First of all, the _expressiveness_ of a type system can mean two rather
different things. It can refer to what kinds of information the type system is
_capable_ of encoding/checking/etc. It can also refer to what kinds of
information it can handle _conveniently_.

With that in mind, I think Hungarian notation gets us three things.

(1) Hungarian notation allows the encoding of information that some type
systems cannot encode.

(2) Hungarian notation allows for easy, simple, readable encoding of
information that many type systems cannot encode _conveniently_.

(3) Hungarian notation presents type information at the point of _use_ , as
opposed to _declaration_.

Of course, Hungarian notation has disadvantages as well, but these are
thoroughly discussed all over the place. (Regardless, anyone who wants to talk
intelligently about Hungarian notation needs to read the Joel on Software
article.[1])

So, certainly HN (not referring to this site!) can be a workaround for an
inexpressive type system. It can also be a crutch for poor coding tools (as in
#3 above).

But even assuming awesome type systems and tools, I take issue with anyone who
answers the question with an unwavering, "Yes, so don't use HN." To a person
who makes such an argument, I must ask: are you really arguing that
identifiers need not give any hint as to what they are bound to? Because that
idea is kinda problematic.

[1]
[http://www.joelonsoftware.com/articles/Wrong.html](http://www.joelonsoftware.com/articles/Wrong.html)

P.S. Regardless of our answer to the original question, I think the fact that
arguments like this occur at all underscores a point: naming things well is
_hard_. I think that, in practice, this is one of the harder problems involved
in software design.

~~~
tel
I agree there's a grey line, but I'll take a stab at arguing that, yes, any
time you rely on your parameter name for semantics it is an unforced error.

I'll argue in metaphor to documentation with similar caveats. Documentation is
valuable and lack of it is confusing, but it's also impossible to ensure
documentation is current (unless it's encoded in some kind of static
verification system like a type system). This makes it so that your
documentation can be a liability when describing your program semantics as it
may simply be wrong.

Likewise, using semantically meaningful variable names can be misleading as
there's no way to ensure that the name is correct (unless you augment your HN
system with a static verifier, which I've heard of people doing). These names
can aid understanding but also can desynch and become confusing.

I think of both like the colors of wires on a breadboard. They can be helpful,
but fundamentally the value is in how the wires connect, not what their color
is.

~~~
gfodor
So to extend your breadboard analogy, Hungarian says you should have a
dictionary of colors and everyone should learn the colors, since coloring a
wire provides an information-dense, well-suited way of expressing
information unambiguously to those versed in the wire color convention.
Non-Hungarian says you should attach a sticker with a small free-form written
sentence explaining what each wire does, and each person can have their own
general style for how they write these things, but the upside is they are
intuitive and everyone can read them without learning a color dictionary.

In either case the labeling/coloring of the wire is not fundamentally
enforcing the wire's use, but experts are better served by conventions that
may take time to adopt but convey information as efficiently as possible given
the medium they must be used in.

~~~
tel
I'm definitely not trying to defend the stickers, nor do I feel that taking
this argument to its ultimate point retains any logic. My argument is that
colors and tags are both potentially misleading, while the actual information
and structure comes out of their connectivity diagram.

Any time you put trust in anything else you're trading off safety for
convenience. That isn't a bad thing, but it's a tradeoff that should be done
conscientiously.

Finally, I think it's illustrative to think about what would happen without HN
(or, in analogy, without colored wires or little tags) because that drives you
to solve the harder problem of how to represent the connectivity diagram of a
breadboard directly, expressively, and in a way that solves the problems
people face when building them.

Analysis trumps documentation, except when it doesn't or is too expensive.

------
strlen
That may be the Platonic ideal of Hungarian notation, but most code bases I've
dealt with that use Hungarian notation would simply do (not a real example,
but reminiscent of Win32 sockets APIs):

    
    
      bool bRateLimitViolation(uint32_t ui32Count, DWORD* dwOutPtrAddr, 
        std::string& sRefId);
    

In other words, this does nothing that hitting M-. in emacs+etags (or ctags
equivalent for vi(m), etc...) won't do. Instead, consider this code:

    
    
      bool checkRateLimit(uint32_t attempts_per_sec, 
        const std::string* resource_hash,
        DWORD* raw_violator_ip);
    

Note a few changes here: there's an implicit convention. References are only
used if they're const references (otherwise pointers are used), out parameters
go last, names of functions are lowerCamelCase, while local variables and
parameters are snake_case (presumably classes can be UpperCamelCase), and
variable names are meaningful (if a variable name is too long I can hit M-/ or,
again, the vi(m) equivalent, plus the Hungarian notation advocates aren't into
the whole brevity thing either).

What does work is using Hungarian notation to document exceptions rather than
the rules:

e.g.,

    
    
      enum {
        kMajorVersion = 3
        ...
      };
      
      SomeThingStupid* g_YesIUsedAGlobalHere;
    
    

I do also see this argument, but I am not sure even the best type systems
(e.g., Haskell) do a great job here. Yes, this:

    
    
      data HashString = HashString String
      unHashString (HashString s) = s
      hs = HashString "foo"
    

... is fewer lines than:

    
    
      struct HashString {
         std::string s;
         std::string as_string() const { return s; } 
      };
      auto hs = HashString {"foo"};
    

But fundamentally it may be easier for most to simply do:

    
    
      type HashString = String {- merely an alias, like typedef -}
    
      checkHash :: HashString -> Bool
      checkHash hashString = ....
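
For comparison, the same alias-versus-wrapper trade-off sketched in Go, which has both an alias form and a defined-type form. The names and the length-8 check are placeholders, not a real hash validation:

```go
package main

import "fmt"

// HashAlias is only another name for string, like the `type` synonym above.
type HashAlias = string

// HashString is a distinct defined type; a string variable needs an explicit conversion.
type HashString string

// The length check stands in for whatever validation a real checkHash would do.
func checkAlias(h HashAlias) bool { return len(h) == 8 }
func checkHash(h HashString) bool { return len(h) == 8 }

func main() {
	raw := "deadbeef"

	fmt.Println(checkAlias(raw))            // accepted: the alias adds no checking at all
	fmt.Println(checkHash(HashString(raw))) // the conversion marks where a string becomes a hash

	// checkHash(raw) // does not compile: raw is a string variable, not a HashString
}
```

The alias is easier to adopt but only documents intent, so it is closer to Hungarian notation than to a type check.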

~~~
strlen
s/const std::string* resource_hash/const std::string& resource_hash/

------
kybernetikos
Depends what you mean by insufficient.

You could argue that unit tests are workarounds for type systems that are not
sufficient to encode all of the acceptance criteria, but then again, encoding
all the acceptance criteria into the type system is massively difficult even
in languages that allow it, and most practical languages don't even try.

So yes, theoretically you could put all of the information you would otherwise
have put in prefixes (or unit tests!) into a 'sufficient' type system and have
it checked for you and that would be excellent, but then again, the
difficulties of creating a type system expressive enough to allow you to do
that, plus the difficulties of actually allowing you to specify all that
information in a sensible fashion would put some serious constraints on your
language, and many language designers don't feel that it's worth the effort.

Expressiveness and ease of use of the type system are just a small part of
programming language choice, and if other factors cause you to choose
something with an 'insufficient' type system, then Hungarian notation can
definitely help you avoid bugs.

~~~
rcthompson
The title here is a truncated version of the original question because HN only
allows 80-char titles. The question clarifies what I mean.

~~~
kybernetikos
The question seems to indicate that you think of Haskell's type system as
sufficiently expressive (as opposed to some dependently typed language like
Agda or Coq). There's lots of possible semantic information that might be
useful to maintenance developers that would be a massive pain to encode into
Haskell's type system.

More generally though, all of this seems to ignore what it might take to get
the type system to do something for you. Just because you can make the type
system check some property doesn't mean that it's a good use of your time. I
can indicate to other developers a bunch of useful information by simply
prefixing with c for Celsius if that's the convention for the codebase, but
actually encoding all the rules that go along with that into the type system
(e.g. no values allowed below -273.15) might not be a good use of my time.
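
As a rough idea of the cheapest version of that effort, here is a Go sketch that assumes a validating constructor is good enough. `Celsius`, `NewCelsius` and `ErrBelowAbsoluteZero` are hypothetical names, and the range rule is checked at run time, not by the compiler:

```go
package main

import (
	"errors"
	"fmt"
)

// Celsius is meant to be built through NewCelsius so that values have passed the range check.
type Celsius struct{ degrees float64 }

const absoluteZero = -273.15

// ErrBelowAbsoluteZero is returned for temperatures that cannot exist.
var ErrBelowAbsoluteZero = errors.New("temperature below absolute zero")

// NewCelsius rejects out-of-range input at run time.
func NewCelsius(d float64) (Celsius, error) {
	if d < absoluteZero {
		return Celsius{}, ErrBelowAbsoluteZero
	}
	return Celsius{d}, nil
}

func (c Celsius) Degrees() float64 { return c.degrees }

func main() {
	room, err := NewCelsius(21.5)
	fmt.Println(room.Degrees(), err)

	_, err = NewCelsius(-300)
	fmt.Println(err)
}
```

Even this small wrapper adds an error path at every construction site, and code inside the same package can still build an invalid value directly, which is the kind of cost being weighed against a one-letter prefix.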

I think in the final analysis, the answer is no, because I could have a
Hungarian prefix that meant 'is a string encoding a Turing machine that halts
when run on itself', but that's not something you're going to want your type
system to try to check for you.

Hungarian notation is all about encoding arbitrary semantic intention about
variables into their name. Type systems can also encode arbitrary semantic
intention (if they're expressive enough), but the effort associated with using
them like that is much higher, just as the benefit is greater (since they're
automatically checked). Therefore, while there is overlap, from an engineering
point of view they don't cover exactly the same ground.

~~~
rcthompson
I don't necessarily think of Haskell's type system as sufficiently expressive,
it's just the most expressive type system that I'm familiar with, and it's
also the type system in which I've seen cool examples implemented, such as the
"catch SQL injections at compile time" example. I think you make some
excellent points, and perspectives like this are exactly what I was looking
for when I originally asked the question.

So, I accepted an answer that says "yes", but "yes" won by default because
every answer was either "yes", "no because HN is not even good enough to be a
workaround", or irrelevant.

------
zamalek
I can't be bothered to find the source, but I have read that:

Interestingly the Windows kernel type of Hungarian notation (iFoo) was set up
because of a misunderstanding of the Office team's process. The Office team
used the "sensible" form of Hungarian notation (e.g. rPrice, cPrice - row and
column respectively). The kernel team decided to borrow the idea; however,
they didn't actually understand it completely when they did, leading to the
mess of Hungarian notation in the Windows APIs we have today.

All of this would have been avoided if we had language support for units of
measure[1] back then.

 _Edit:_ Seems like it was Spolsky's article:
[http://www.joelonsoftware.com/articles/Wrong.html](http://www.joelonsoftware.com/articles/Wrong.html)

[1]: [http://msdn.microsoft.com/en-us/library/dd233243.aspx](http://msdn.microsoft.com/en-us/library/dd233243.aspx)

------
amaks
Hungarian notation is not helpful at all, especially when you could rely on a
descriptive name, an IDE with rich IntelliSense, or just good code organization.
Does anyone use Hungarian notation besides the Win32 APIs and Windows user-level
code? (FWIW the NT kernel does not use Hungarian notation.)

~~~
strager
In Haskell, I often suffix lists with 's' or 'es':

    
    
        f (x:xs) = ...
        g args = ...
    

I also often prefix 'Maybe' types with 'm':

    
    
        mValue <- gets (Map.lookup key)

------
rcthompson
So, I came across this today, and my first thought was that HN would probably
like it. My second thought was "Oh wait, that's me 2 years ago."

------
wirrbel
One aspect to consider with Hungarian notation is the general quality of
compilers and development tools such as IDEs or editors. It's just way easier
nowadays to jump to the declaration of a variable.

------
aswanson
I find Hungarian useful only in shitty environs like Java and C++. In sane
dynamically typed languages, the cognitive burden is reduced enough that it
becomes overkill.

~~~
CmonDev
Now that's a cool oxymoron "sane dynamically typed"!

PS: AFAIK Java doesn't generally use Hungarian notation because of the power
of being statically typed and the first/second best IDE in the world -
IntelliJ IDEA.

------
gfodor
Hungarian notation is not a workaround, but a _medium_ for expressing
semantics about something that cannot be done with the type system. It is
literally by definition this. If you think about it, this is the point of
naming variables in the first place. No matter how advanced your type system
is, there will be semantic information that it does not or cannot encode.
Hungarian serves to systematize the definition and use of this semantic
information. Hungarian is intended to maximally leverage the semantic bits of
information (in a literal sense) we can put into names to the fullest extent
possible. You're already making this tradeoff when you name something, it's
unavoidable, and Hungarian is an attempt to find a local optimum that falls out
of specific assumptions around the purpose of names.

For example, in a simple discussion board app, you may have a string of text
in memory that has the following attributes:

\- It is null terminated

\- It is ASCII encoded

\- It contains properly formatted HTML

\- It has been sanitized against script injection attacks

\- It was received from an end user, not rendered from a template

\- It is one in a specific series of strings being processed by an overarching
process

\- It contains content that is intended to be injected onto a page

\- It has been found to contain swear words that need to be censored

\- It is longer than a certain number of characters and hence needs to be
stored in a text field, not a varchar

\- It potentially contains copyright infringing text

I could go on and on. The point is not to get hung up on specific examples,
but to note that there is a _ton_ of semantic information about a piece of
data, any of which may be a) relevant to the user and b) irrelevant to the
goal or capabilities of the compiler's type system. In many of the cases
above, one has a hard time imagining what exactly a type system would even do
with this information, since it's fuzzy and not really something that should
be used to strongly enforce contracts.

This is where the system of Hungarian comes in: it forces you to build a
taxonomy of this semantic information so you have a way to leverage it and
reduce the amount of entropy a system can have in variable names.

The bottom line is you are going to name a variable _something_. So what do
you name it? If you think about it logically, it makes sense to name it in a
way that the most semantic information that is the most relevant is conveyed
to the user. English is poor at being efficient and packing in dense
information, but it is comfortable and ramp-up time is near zero. Hungarian
optimizes this differently, and trades ramp-up time and comfort for
efficiency, precision, and density. It is really the vim of variable naming
conventions, and like vim, it is not for everyone.

~~~
quanticle
_I could go on and on. The point is not to get hung up on specific examples,
but to note that there is a ton of semantic information about a piece of data,
any of which may be a) relevant to the user and b) irrelevant to the goal or
capabilities of the compiler's type system._

But doesn't that just mean you have a bad type system? I mean, the entire
point of a type system is that it automatically checks these sorts of
constraints for you. Otherwise, why have types at all? Why not just use chunks
of bits?
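
To take just one item from that list, "sanitized against script injection" can be carried by a type. A minimal Go sketch, with plain escaping via the standard `html.EscapeString` standing in for real sanitization; `SafeHTML`, `Sanitize` and `Render` are hypothetical names:

```go
package main

import (
	"fmt"
	"html"
)

// SafeHTML marks text that has been escaped for inclusion in a page.
type SafeHTML struct{ s string }

// Sanitize is the intended way to obtain a SafeHTML in this sketch.
// Escaping is used here as a simple stand-in for real HTML sanitization.
func Sanitize(userInput string) SafeHTML {
	return SafeHTML{html.EscapeString(userInput)}
}

// Render accepts only SafeHTML, so passing a raw string is a compile error.
func Render(body SafeHTML) string {
	return "<p>" + body.s + "</p>"
}

func main() {
	fmt.Println(Render(Sanitize("<script>alert(1)</script>")))
	// prints <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>

	// Render("<script>alert(1)</script>") // does not compile: a string is not a SafeHTML
}
```

The fuzzier items on the list, such as "potentially contains copyright infringing text", have no obvious constructor like this, so the sketch only covers the attributes that have a clear yes-or-no check.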

~~~
gfodor
You can flip it around and ask why have names at all? What is the point of a
name? Why not just have randomString() choose your names? The name is an
important dimension that is provided by a programming language to describe
some piece of information. Hungarian is a way of leveraging this dimension,
which is separate from the type system, that is optimized for efficiency at
the cost of intuitiveness.

Also, many of the examples I gave above have little relevance to enforcing
constraints on the use of data, which is the job of the type system, but are
still relevant to the programmer in helping them understand and properly work
on the code.

~~~
noblethrasher
Among the things that names/identifiers do:

(1) They act as indexes or discriminators.

(2) They describe.

(3) They prescribe.

(1) will be useful for at least as long as we program in a textual medium. (2)
is also useful, but type systems make it ancillary. (3) is dangerous since
it's merely an _ought statement_ that can easily get divorced from reality;
decent type systems obviate the need for that role.

~~~
gfodor
You're completely ignoring the role and importance of notation. Type systems
do not take on the role of efficiently conveying information to a reader
visually through symbology, which is part of the role of notation. Types in a
type system have names, after all, and the choice of those names is also one
of notation.

Good notation comes in large part from good names and has little to do with
type systems. Hungarian is an exercise in notation, hence the name.

------
laveur
Absolutely not

