
An Old Article I Wrote: "What To Know Before Debating Type Systems" - blasdel
http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/
======
xiongchiamiov
Everyone saying that there's no real definition of strong and weak typing
always confuses me. In my view, strong typing prohibits implicit conversion
between types, and weak typing allows it.

This is, of course, a scale - for instance, Python, generally viewed as a
strongly-typed language, allows you to say something like

    
    
      foo = 'lolsomestring'
      if foo:
          pass  # [code]
    

where a string is implicitly converted to a boolean (as opposed to using
bool()). However, a TypeError is raised when attempting to add 5 and "5",
whereas a language like PHP will happily provide an integer result.
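
A small Python sketch of both behaviors described above: the implicit
truthiness test, and the refusal to add an integer to a string. The helper
name `try_add` is made up for illustration.

```python
foo = 'lolsomestring'

# A non-empty string counts as true in a boolean context,
# with no explicit bool() call needed.
truthy = True if foo else False


def try_add(a, b):
    """Return a + b, or the name of the exception Python raises instead."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


print(truthy)
print(try_add(5, 5))
print(try_add(5, "5"))
```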

~~~
jholman
> In my view, strong typing [means such-and-such].

In your view, it means this. In someone else's view, it means something else.
Unlike "static", "dynamic", "explicit", "implicit", "structural", even "duck",
all of which people more-or-less agree about (well, sort of), there's this
problem with the definitions of "strong" and "weak" typing.

1) If you use them in conversation without a definition, there's a very high
chance that they mean something different to the listener. (Unless of course
it's a listener with whom you've already had this conversation.)

2) If you stop to define them, the whole conversation gets derailed to argue
about the definitions.

That is why people say these words "have nearly no meaning at all". Does that
help your confusion?

So, if you want to communicate about type systems, your communication will
contain less misunderstanding if you simply omit the words "strong" and
"weak". And for myself (although this may be too snotty a move for you), if
someone else uses them, I refuse to hear them, and ask them to use different
words and define those words.

My experience, in practice, is that when people say "strong" and "weak", I do
indeed find that as cdsmith says, "strong" mostly corresponds to "it makes me
comfortable".

~~~
derefr
> My experience, in practice, is that when people say "strong" and "weak", I
> do indeed find that as cdsmith says, "strong" mostly corresponds to "it
> makes me comfortable".

But even people who are very comfortable in C do not say that C is
strongly-typed.

In my experience, people agree on _which_ languages are strongly- or weakly-
typed, and that can be used to reverse-engineer the definition, based on the
properties those languages differ on. Assembly, C, and PHP are all weakly-
typed, at the very least, and their proponents agree with this. They don't
agree on _why_ they are weakly-typed, simply that they are.

The first definition I ever heard for weak typing, back when C and
Assembler were the only things that had it, was "a weakly-typed language
allows you to take the _raw, in-memory representation_ of data of type A, and
perform operations on it as if it were data of type B that had an equal _raw,
in-memory representation_." The most famous example of this is Quake's fast
invsqrt(), where the bytes making up a float have int operations applied to
them, as if they were bytes making up an int.
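
For illustration, here is a rough Python rendering of that trick. Python has
no raw casts, so `struct` stands in for the pointer reinterpretation; the
function name, the magic constant, and the single Newton step follow the
widely circulated Quake III version, which is an assumption about what is
being referred to here.

```python
import struct


def fast_inv_sqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive x by treating float bits as an int."""
    # Reinterpret the 32-bit float's bytes as an unsigned 32-bit integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # Integer arithmetic on what is "really" a float: the weakly-typed step.
    i = 0x5f3759df - (i >> 1)
    # Reinterpret the resulting integer bits as a float again.
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration to refine the estimate.
    return y * (1.5 - 0.5 * x * y * y)


print(fast_inv_sqrt(4.0))  # close to 0.5
```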

Of course, PHP doesn't have this capability, but we still call it weakly-
typed. So the definition must have moved on from its previous strict form.
What does PHP allow you to do, that gets it called weakly-typed? This:

    
    
        echo "(" + 5 + ")"; # prints "5"!
    

PHP, here, is taking the strings "(" and ")" and interpreting them as numbers
for the purposes of the addition operator. However, the strings "(" and ")"
are not _valid_ numbers—but PHP is fine with this, and uses the default
numeric value of 0 to represent this invalidity.

The similarity with the original definition is not that the data is
transformed from one type to another _implicitly_, nor that the _raw_
representation of the data is used, but rather that the data is transformed
_without a guarantee that the datatypes have the same capacity for
informational entropy_. In other words, casting "(" to a number _silently
loses data_, just as storing an _int_ in a _char_ variable in C silently
loses data. This property seems to be universal to all languages that get
called "weakly-typed", and absent from any language that has never had that
epithet applied to it.

It has nothing to do, notice, with whether a language will implicitly cast
values—as long as all implicit conversions happen _upward_ to types that have
"room" for all the information in the original representation (char -> int,
int -> String, int -> float) the language is still strongly-typed. And notice
that the datatype _itself_ being lost in a typecast (String -> Object) cannot
be a reason to call a language weakly-typed because, in an interface-like cast
like this, all the _data_ is still retained, and its original form can be
reclaimed simply by casting it back (Object -> String).
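
A minimal Python sketch of the distinction drawn above, assuming
`ctypes.c_ubyte` as a stand-in for an 8-bit C char (plain `char` may be signed
in a real C compiler). The helper names `narrow_to_byte` and `round_trips` are
hypothetical.

```python
import ctypes


def narrow_to_byte(n: int) -> int:
    """Store n in an unsigned 8-bit slot; ctypes does no overflow checking."""
    return ctypes.c_ubyte(n).value


def round_trips(value, up, down) -> bool:
    """True if converting 'up' and then back 'down' recovers the value."""
    return down(up(value)) == value


# Downward: the high bits of 300 are silently dropped.
print(narrow_to_byte(300))

# Upward: the wider type has room, so the original can be reclaimed.
print(round_trips(300, str, int))
print(round_trips(300, float, int))
```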

~~~
blasdel
I like your style but your definition doesn't hold — there's plenty of
implicit downcasts that are still 'strong', like boolean tests on data bigger
than 1 bit or gt/lt comparisons between different numeric types — a type
system that allows those is totally sound.

It's not your fault though, the problem is really just that 'strong' and
'weak' are just fundamentally _unsound_ nomenclature.

~~~
derefr
> The problem is really just that 'strong' and 'weak' are just fundamentally
> unsound nomenclature.

As I said, we all know, and agree upon, which languages _are_ "strong" or
"weak", whatever those terms mean. Thus, the two words really do partition the
world in some way; they do real Bayesian work. We just have to figure out what
that work _is_.

> there's plenty of implicit downcasts that are still 'strong', like boolean
> tests on data bigger than 1 bit or gt/lt comparisons between different
> numeric types

Notice, above, that I said that explicitness _isn't_ a part of the definition,
as much as people like to think it is. In fact, it could be completely
explicit at all times that you're casting (int)s to (char)s—and that would
still be weak typing. The real property a type-system has that tells you that
it is "weak" is that it allows casts between types that _do not have valid
surjections_. "(" has no number it maps to—and yet the type system pretends it
does. IEEE754 Infinity has no integer it maps to—and yet the type system says
that's alright. (Note that in this special case, the _hardware_ notices this
loss of information, and sets a hardware exception flag.)

Operations retain strong-typing if they have _well-defined one-way
transformational characteristics_ , such as when casting (int) to (bool) using
C's "!= 0" branching-requirement. "x % 6" compresses the field of integers
into a set of six elements—but it does this in a way which maps every x to a
valid new value on the "x % 6" ring. However, Infinity _does not_ have a valid
(int) representation—any value the computer does choose to represent it with
will be "wrong."

To sum up: if, in your language runtime, there exists at least one type-
transformation-implementation that goes like this:

    
    
        try
          a_entropy = f( a )
          B.construct( a_entropy )
        catch( DomainError e ) # A is not a well-defined B
          B::SomeDefaultElement # who cares!
        end
    

...then you have weak typing.
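
As a concrete sketch of that pattern in Python: `weak_to_int` below is a
hypothetical conversion that swallows the domain error and substitutes a
default, while `strict_to_int` lets the error propagate. This is a
simplification for illustration, not a description of how any particular
runtime implements its conversions.

```python
def weak_to_int(text: str) -> int:
    """Convert text to int, silently substituting 0 when it is not a number."""
    try:
        return int(text)
    except ValueError:
        return 0  # "who cares!": the information in text is lost


def strict_to_int(text: str) -> int:
    """Convert text to int, raising ValueError when it is not a number."""
    return int(text)


# With the weak conversion, "(" + 5 + ")" quietly becomes 5.
print(weak_to_int("(") + 5 + weak_to_int(")"))
```

Calling `strict_to_int("(")` raises `ValueError` instead of returning a value.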

~~~
blasdel
Except that now you're no longer talking about something that requires the
fuzzy handwaving of 'strong' vs. 'weak', which lack the _well-defined one-way
transformational characteristics_ of which you speak :)

The type-system property you're dancing around is _soundness_. A well-defined
term that you'd know if you'd _read the fucking article_ , which was written
explicitly to teach it to you.

As for your example of what you consider 'weak' type-checking, I can see only
two ways to make it 'strong'. The normal thing to do is a dynamic type-check
resulting in a runtime exception if an invariant is violated: in Python
_int(float("Inf"))_ raises _OverflowError_ and _int(float("NaN"))_ raises
_ValueError_. Would that satisfy you as 'strong'? The other option would be
to use BigFloats and a fancy static type checker to ensure that you can never
construct a zero value that could be processed as a float. Hilariously, GHC
takes neither of these approaches:
<http://hackage.haskell.org/trac/ghc/ticket/3070>
<http://hackage.haskell.org/trac/ghc/wiki/Commentary/CmmExceptions>
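
The Python behavior mentioned above can be checked with a short sketch; the
wrapper name `checked_int` is made up for illustration.

```python
def checked_int(x: float):
    """Return int(x), or the name of the exception raised by the conversion."""
    try:
        return int(x)
    except (OverflowError, ValueError) as exc:
        return type(exc).__name__


print(checked_int(3.9))            # ordinary floats truncate toward zero
print(checked_int(float("inf")))   # no integer to map to
print(checked_int(float("nan")))   # no integer to map to
```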

~~~
derefr
> Except that now you're no longer talking about something that requires the
> fuzzy handwaving of 'strong' vs. 'weak', which lack the well-defined one-way
> transformational characteristics of which you speak :)

You missed a (perhaps subtle) point: the thing that has "well-defined one-way
transformational characteristics" is an _operation_ (an algorithm), not a
type-system. A type-system can be thought of as a set of operations, a digraph
specifying all the ways to go from each type to each other type in your
system. This set has the property of being _strong_ by default—but becomes
_weak_ if any of the operations is non-surjective. This is separate from a
_sound_ system, which is one that _does not permit_ non-surjective operations.

For an example of the crucial difference: I can program in "a _strong_ subset
of C" (C with all the non-surjective functions/casts removed.) I cannot
program in "a _sound_ subset of C." There is no such thing.
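
One way to picture the "set of operations" view is a table of conversion
edges. Everything in this sketch is hypothetical: the type names, the edges,
and the flags, which simply record the claims made earlier in this thread
about which conversions have a valid result for every input.

```python
# Each key is a (source, target) conversion edge. The flag is True when every
# source value has a well-defined result in the target type.
conversions = {
    ("char", "int"): True,
    ("int", "float"): True,
    ("int", "bool"): True,     # the "!= 0" rule covers every int
    ("float", "int"): False,   # Infinity has no int to map to
    ("str", "int"): False,     # "(" has no number to map to
}


def is_strong(ops: dict) -> bool:
    """A set of operations is 'strong' here if no edge is ill-defined."""
    return all(ops.values())


def strong_subset(ops: dict) -> dict:
    """Drop the ill-defined edges, leaving a 'strong' subset."""
    return {edge: ok for edge, ok in ops.items() if ok}


print(is_strong(conversions))
print(is_strong(strong_subset(conversions)))
```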

~~~
blasdel
There is still an even more subtle point that you are absurdly close to
understanding, and that the article touches on a bit — there really is no such
thing as a type system, only a set of times and places where types are(n't)
checked — _operations_. You even say so yourself!

You'll find programming language theorists using the phrase "a sound subset"
with no qualms.

------
blasdel
This wonderful article has helped me help a great many people over the years
and I'm glad he's rehosted it after the original domain got snatched.

There is only one thing I find missing — I think it would be prudent to avoid
the use of "typed" or "typing" wherever possible. I much prefer to refer to
properties being "typechecked" or systems using "typechecking" — it's
otherwise quite easy to forget that the types are only manifest when and where
they are observed.

------
kenjackson
Nice article. Should be renamed, "What to know before debating type systems...
and then don't debate them".

~~~
jerf
Oh, no, type systems need to be debated. It's just that some care about
definitions needs to be taken lest you end up with a megabyte or two of text
between many participants going on for several days with no two participants
actually sharing a definition, and thus with none of the participants actually
understanding each other.

My personal preference is to identify the language in question, which is
usually the important and relevant point anyhow. Strong vs weak typing? Too
vague. Haskell vs Python or Java vs C++, now we've got something we can sink
our teeth into and we're not just spinning in space.

------
steveklabnik
Hey, thanks for posting this somewhere. When I found that it was gone, I
grabbed a cache and posted it with as much attribution as I could. Now I can
actually give you a real link. :)

