
A Primer on Type Systems - hacknrk
https://www.cs.uaf.edu/users/chappell/public_html/class/2018_spr/cs331/docs/types_primer.html
======
DonaldPShimoda
I really liked this brief overview of type systems overall, but I especially
liked that the author took the effort to move away from the terms "strongly
typed" and "weakly typed". Too often these terms cause confusion. Even worse
is when people confuse them for "statically typed" and "dynamically typed"
(respectively), leading to people making claims like "Python is weakly typed"
and "C is strongly typed" (which, depending on your definitions, are not true
statements).

This material wasn't covered in my core curriculum for my CS degrees, which I
think is a shame. Some students might find this stuff interesting, and I think
introducing it to undergrads at least a little bit and saying "By the way,
this is a thing you can research" could help some students with breaking into
the academic side of CS.

~~~
msla
It's more interesting to answer two questions:

1\. Can I easily subvert this type system?

2\. Can I make this type system map to concepts in the problem domain?

In C, the answer to 1 is "yes" and the answer to 2 is "to a limited extent".
In fact, in C, you can subvert the type system so easily that it's considered
impossible _not_ to, to the extent that using the standard library requires
throwing type information away, by using pointers-to-void or generic "char"
buffers, and as for the second... well, Apps Hungarian notation exists for a
reason. (Systems Hungarian exists for a reason, too, but I'll try to keep the
insults out of this post.)

[https://en.wikipedia.org/wiki/Hungarian_notation](https://en.wikipedia.org/wiki/Hungarian_notation)
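
To make the pointers-to-void point concrete, here is a minimal sketch written as C++ that calls the C library's `qsort` through `<cstdlib>`. Only `std::qsort` is a real library function; `compare_ints` and `sort_ints` are names made up for illustration.

```cpp
#include <cstdlib>

// Comparator with the signature std::qsort requires. Both arguments arrive
// as const void*, so the element type is erased at the call boundary and
// has to be restored by hand with a cast.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Hypothetical helper: sorts an int array through the type-erased interface.
void sort_ints(int* values, std::size_t count) {
    std::qsort(values, count, sizeof(int), compare_ints);
}

// Nothing stops a caller from pairing compare_ints with an array of double:
//     std::qsort(reals, n, sizeof(double), compare_ints);
// That line compiles, because the type checker only ever sees void*, but the
// comparator would then misread the data.
```

The cast inside the comparator is the part where the type information is thrown away and then reasserted on trust, which is the subversion described above.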

The first question is interesting because it tells you if the type system can
be used to enforce invariants, and the second question is interesting because
it tells you if those invariants will be _worth_ enforcing. If you're writing
in Standard Pascal, and your type system enforces array length as a part of
array type, well, to quote a metal album: "So Far, So Good... So What?" What
does that tell me about whether my program is semantically correct as regards
the problem domain? Similarly with enforcing size specification types:
Differentiating between a short, an int, and a long tells me Sweet Fanny Adams
about what values of those sizes _mean_ in my program. Typing isn't "strong"
if it's enforcing rules which ultimately make no sense, and if it's "sound",
well, mathematical proofs of something I'm not interested in are themselves
uninteresting.

~~~
bogomipz
>"well, Apps Hungarian notation exists for a reason. (Systems Hungarian exists
for a reason, too, but I'll try to keep the insults out of this post.)"

Can you elaborate? I read the wiki page on Apps Hungarian. I am not following.
How does/would this address issues with the type system in C?

>"Can I make this type system map to concepts in the problem domain?"

I didn't really follow your Pascal example. You first
mentioned "because it tells you if the type system can be used to enforce
invariants," but then went on to talk about array lengths. How does whether or
not the invariants are enforced by the type system help you answer the
question of whether or not it maps to the problem domain? Might you be able to
provide another example? Thanks.

~~~
msla
> Can you elaborate? I read the wiki page on Apps Hungarian. I am not
> following. How does/would this address issues with the type system in C?

Apps Hungarian allows you to encode information about the value a variable
contains which you can't capture in the type system.

For example, imagine you're writing a program which handles user input. You
need to distinguish between Sanitized Strings and Unsanitized Strings because
if you don't, you open yourself up to security problems. Absent a way to
extend the type system to put this information in a type, you do this:

    
    
        char *usStr; /* An Unsanitized String */
    
        char *snStr; /* A SaNitized string */
    

You add a couple letters to the beginnings of the variable names to encode
what the type system doesn't.

You can see that if you have a line of code which says:

    
    
        snStr = usStr;
    

it is wrong, and your brain can print a full-color warning message with
highlighting.

Bottom Line: Apps Hungarian makes it easier for _you_ to be the type checker.
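
For contrast, here is a rough sketch of what it looks like when the language lets you move that prefix into the type system. It is written in C++ rather than C, and every name in it (`UnsanitizedString`, `SanitizedString`, `sanitize`) is hypothetical. The escaping rule is a placeholder, not a real sanitizer.

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper types: the distinction that the "us" and "sn"
// prefixes carry in Apps Hungarian becomes part of the type instead.
struct UnsanitizedString {
    std::string text;
};

class SanitizedString {
public:
    const std::string& text() const { return text_; }

private:
    // Private constructor: only sanitize() may create a SanitizedString.
    explicit SanitizedString(std::string text) : text_(std::move(text)) {}
    std::string text_;
    friend SanitizedString sanitize(const UnsanitizedString& input);
};

// Illustrative sanitizer that escapes '<' and '>' only. A real one would be
// chosen for the actual output context.
SanitizedString sanitize(const UnsanitizedString& input) {
    std::string out;
    for (char c : input.text) {
        if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else out += c;
    }
    return SanitizedString(std::move(out));
}

// The equivalent of `snStr = usStr;` no longer compiles:
//     SanitizedString sn = UnsanitizedString{"<b>"};
```

With this arrangement the compiler prints the warning, and the only route from one type to the other goes through `sanitize`.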

> I didn't really follow your Pascal example explanation of this, you first
> mentioned "because it tells you if the type system can be used to enforce
> invariants." But the went on to talk about array lengths. How does whether
> or not the invariants are enforced by the type system help you answer the
> question of whether or not it maps to the problem domain? Might you be able
> to provide another example? Thanks.

Array lengths are an invariant. They're just one I don't care about, because
nothing in the problem domain is modeled by how long a given array is. The
_contents_ of those arrays are much more important, and the types should vary
based on that, instead.

A somewhat simplistic example:

You have a program which prints travel itineraries for people who must be
addressed _correctly_ along the way, where _correctly_ means using their
prenomial and postnomial titles. For example "Dr. Mary Richards, Ph.D." would
be insulted to be merely "Mary Richards" or "Dr. Mary Richards, MD". You also
have to print route information, which is a list of towns and states, like
"Baltimore, MD".

Therefore, you have to ensure three things: Prenomials are always _prepended_
to names, postnomials are always _appended_ to names, and town names are
always suffixed with state abbreviations. Oh, and if you print a name without
prenomials and postnomials, you'll not escape unscathed.

Wouldn't it be nice if the type system were capable of keeping track of which
strings were prenomials, postnomials, personal names, town names, and state
abbreviations, to ensure you never try to append a state abbreviation onto a
personal name, and never try to print a personal name alone? _Those_ are the
kinds of _interesting_ invariants I'm talking about.
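
A minimal sketch of that idea in C++, assuming one small wrapper type per role a string can play. All of the type and function names are invented for this example, and the output formats are just one plausible choice.

```cpp
#include <string>

// Hypothetical one-field wrappers, one per role a string can play.
struct Prenomial         { std::string text; };  // e.g. "Dr."
struct PersonalName      { std::string text; };
struct Postnomial        { std::string text; };  // e.g. "Ph.D."
struct TownName          { std::string text; };
struct StateAbbreviation { std::string text; };  // e.g. "MD"

// Rendering a person requires all three parts, each in its own slot.
std::string format_person(const Prenomial& pre,
                          const PersonalName& name,
                          const Postnomial& post) {
    return pre.text + " " + name.text + ", " + post.text;
}

// Rendering a stop on the route requires a town and a state.
std::string format_stop(const TownName& town, const StateAbbreviation& state) {
    return town.text + ", " + state.text;
}

// Appending a state abbreviation to a person is rejected by the compiler:
//     format_person(Prenomial{"Dr."}, PersonalName{"Mary Richards"},
//                   StateAbbreviation{"MD"});
```

This only checks which role a string was given, not whether its contents are right, and the `text` fields are public, so a stricter version would hide them to stop a bare name from being printed directly.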

~~~
bogomipz
Thanks for the thorough explanations. This is really helpful. Cheers.

------
munk-a
Interestingly, "statically typed" and "dynamically typed" are becoming less
exclusive. There are code sniffers that can run static analysis of code to
detect potential type incompatibilities during inspection, and PHP now even
has a group of parse errors that will be raised if incompatible class
definitions are detected, e.g.:

    
    
        class Foo { function do(): int { return 1; } }
        class Bar extends Foo { function do(): string { return 'a'; } }
    

while PHP will also do dynamic type checking during execution. I don't think
this is surprising since static type enforcement has never been a requirement
of a compiler/interpreter of an executable code page without run-time type
checking - it's just a sort of bonus utility that is packaged into the process
to make things easier. Whether your static code analyzer is in gcc or is just
some separate utility it all works out in the end, so any sort of code linters
are essentially doing a static type analysis (among other things, and usually
language constraints prevent this analysis from being as much of a solid proof
as languages designed to specifically be statically typed).

So I pretty strongly disagree that type systems must be either static or
dynamic, because I think the separation of manifest vs. implicit is a false
one: lots of languages (again) support partially explicit type declarations
while also allowing some form of generic declaration.

~~~
sifoobar
Snigl [0] has statically typed function signatures, struct fields, bindings
etc; but values still carry their type.

It also traces code before running it to eliminate as many type checks as
possible.

The categories aren't very helpful from my perspective, same goes for compiler
vs. interpreter. All it leads to is endless arguing about definitions and
discouraging of novel approaches.

[0]
[https://gitlab.com/sifoo/snigl#types](https://gitlab.com/sifoo/snigl#types)

~~~
munk-a
I agree, and I don't think this is a bad trend. Once upon a time I think
static vs. dynamic was a clear distinction; that distinction is being worn
down as we learn more about the value and costs of those approaches, and as
that happens we're compromising the approaches to gain more of the value.

In PHP the (mostly*) JIT interpretation exposed a weakness where infrequently
executed pieces of code were harder to have confidence in. Unit tests and such
have raised our confidence, but explicit typing can also raise our confidence
in an easier manner. Both are good to have, and having access to more tools
just makes our lives easier.

* In fact PHP can be pre-cached in bytecode now (which is basically just a
traditional compiled approach) and in the default run mode PHP code files are
interpreted and then cached to minimize the number of times code needs to be
interpreted. The lines between compiled vs. interpreted are getting really
fuzzy now and were pretty fuzzy as far back as Java bytecode (which is as far
as my memory goes, others may have a better handle on earlier experiments in
partial compilation).

~~~
naasking
> I agree, and I don't think this is a bad trend, once upon a time I think
> static vs. dynamic was a clear distinction

I think there's still a very clear distinction, but there's often value in
having access to both in one language. The dynamic typing features can often
make up for limitations of your type system, for instance.

------
fpoling
C++ templates are not typed structurally. The compiler does not assign types to
them at all. Rather they are treated as macros and the type assignment and
checking happens during the expansion. For example, one can write a template
that has no valid expansion, but as long as nobody uses it, the compiler
accepts it.

In a sense, templates are similar to dynamic languages, except that in C++ it
is the compiler, not the runtime, that evaluates the templates.
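
A small sketch of the deferred checking, assuming a made-up function template `describe` and a made-up type `Duck`. It shows a requirement that is nowhere in the template's declaration and is only discovered when a concrete type is substituted.

```cpp
#include <string>

// The body calls speak() on a T, but nothing in the declaration says that T
// must have such a member. The compiler accepts this definition as written
// and only checks the call once a concrete T is substituted.
template <typename T>
std::string describe(const T& thing) {
    return "It says: " + thing.speak();
}

// Hypothetical type that happens to satisfy the unstated requirement.
struct Duck {
    std::string speak() const { return "quack"; }
};

// describe(Duck{}) compiles and works.
// describe(42) is rejected, but only at that call, when int replaces T.
```

This is the duck-typing flavour of the comparison: any type with a suitable `speak()` is accepted, and the error for an unsuitable one appears at the point of use rather than at the template's definition.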

------
trurl
> There does not seem to be any equivalent of the notion of soundness in the
> world of dynamic typing. However, we can still talk about whether a dynamic
> type system strictly enforces type safety.

My understanding is that there are well understood definitions of soundness
for dynamically typed languages. The soundness theorem will generally be
weaker ("your program will evaluate to a value or abort on a type confusion")
but it is certainly possible to be sure a given dynamically typed language
definition will never make mistakes like confusing an integer for a pointer.
Though in the scheme of things such theorems are not much weaker than what you
get for a statically typed language with exceptions ("your program will
evaluate to a value with the given type or throw some exception").

~~~
naasking
It actually is significantly "weaker". You make the wrong analogy when you say
the dynamically typed case maps to a statically typed one with an exception
(which presumably is a value in your language).

The correct analogy is to a segfault, meaning the program exited abnormally
because it's inconsistent in an important way. It's the reason statically
typed languages are faster than dynamically typed languages: expressions have
a fixed meaning that enables optimisation guarantees you can depend on.

------
gryfft
As someone who is extremely unfamiliar with category theory, types, and C++,
something is bothering me that I would love to have clarified.

Polymorphic function templates are introduced with addEm:

    
    
      template <typename T, typename U>
      T addEm(T a, U b)
      {
          return a + b;
      }
    

It seems to me that this is incorrect. From the Wikipedia page on C++
templates [1], a polymorphic function template is described like this:

    
    
      template <typename T>
      inline T max(T a, T b) {
          return a > b ? a : b;
      }
    

The first example seems to define a template so that the function takes two
arguments of _different types,_ then relies on the compiler to reject argument
types for which `+` is not defined. (This does not seem to track with the
explanation following it, which indicates that addEm should take two arguments
of the same type.)

The implication in the first example seems to be that if you want your
template to be compatible with more than one type, you must explicitly
indicate so in your template parameters. However, the Wikipedia entry seems to
indicate that C++ implicitly instantiates object code for as many types as the
template is called on at compile time, a function instantiated for each of the
types (addEm<string>, addEm<int>, addEm<double>, etc.), and a compilation
error resulting if inputs of different types are passed to the template.

Am I missing something? For reference, here is how I would intuit the first
example should look, based on my current understanding.

    
    
      template <typename T>
      T addEm(T a, T b)
      {
          return a + b;
      }
    

I'll add that given my lack of knowledge of C++, it could well be that the
template syntax allows types T and U to be the same in a particular
instantiation of the template, but my understanding is that at least they
_can_ be different, and why would you intentionally do that in this example?

1\.
[https://en.wikipedia.org/wiki/Template_(C%2B%2B)](https://en.wikipedia.org/wiki/Template_\(C%2B%2B\))

~~~
AnimalMuppet
I believe that your understanding is correct.

    
    
       template <typename T>
    

is a template with one "type parameter" (may not be the correct technical
term), whereas

    
    
       template <typename T, typename U>
    

is a template with two type parameters, _and they can be different_. (That's
why you give two parameters, so that they don't _have_ to be the same.)
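
Here is a short sketch putting the two forms side by side. `addEm` is the two-parameter version from the article; `addSame` is a name made up here for the one-parameter version so that both can live in the same file.

```cpp
// Two type parameters, as in the article's example: T and U are deduced
// independently from the arguments and may differ.
template <typename T, typename U>
T addEm(T a, U b) {
    return a + b;  // the sum is converted to T, the first argument's type
}

// One type parameter: both arguments must deduce to the same T.
template <typename T>
T addSame(T a, T b) {
    return a + b;
}

// addEm(1, 2.5)    T = int, U = double; the sum 3.5 is converted to int
// addEm(2.5, 1)    T = double, U = int
// addSame(1, 2)    T = int
// addSame(1, 2.5)  does not compile: T is deduced as both int and double
```

So T and U are allowed to be the same type in a given instantiation, but nothing requires it, and with mixed arguments the return type of `addEm` silently follows the first one.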

~~~
gryfft
I appreciate the sanity check with a depth and viscerality I can't properly
express.

I also apologize to you and anyone reading for any inaccurate terminology on
my part.

------
styfle
I have to post this parody lightning talk because Gary Bernhardt is hilarious
and the slides demonstrate some of the (exaggerated) confusion around static
vs dynamic types.

[https://www.destroyallsoftware.com/talks/useing-youre-types-...](https://www.destroyallsoftware.com/talks/useing-youre-types-good)

------
g33k247
I found this article helpful. Since C is a "strongly typed" language it helps
to review what is meant by type. In addition, like #DonaldPShimoda, I liked
that the author challenged the existing notions of type. Good read.

~~~
ternaryoperator
By all definitions, C is not strongly typed.

The author just points to selected links, without going to the links
themselves. If you go to the link he points you to [1] where C typing is
defined, you find that two of the three links there that claim C to be
strongly typed are broken and the third link immediately qualifies the
assertion with: "To be precise, relatively strongly-typed app-specific
dialects of the not-very-strongly-typed language, C."

[1] [http://wiki.c2.com/?StronglyTyped](http://wiki.c2.com/?StronglyTyped)

------
whage
The way the author describes static and dynamic typing seems to imply that
static == compiled and dynamic == interpreted. I probably misunderstand
something, can someone help me clear things up?

~~~
zozbot123
Dynamic typing boils down to runtime tagging of any value that could be
referred to by a "dynamically-typed" variable. That's not "interpreted" in a
strict sense, but it requires dispatching on _every single value access_,
which will definitely affect performance.
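
As a rough illustration of runtime tagging, here is a sketch using `std::variant` from C++17. The `Value` alias and the `add` function are invented for this example; a real dynamic language runtime would have its own representation.

```cpp
#include <stdexcept>
#include <string>
#include <variant>

// A hypothetical "dynamic" value: the variant stores a tag alongside the
// payload, recording which alternative it currently holds.
using Value = std::variant<long, std::string>;

// Every operation has to inspect the tags at run time before it can act.
Value add(const Value& a, const Value& b) {
    if (std::holds_alternative<long>(a) && std::holds_alternative<long>(b)) {
        return std::get<long>(a) + std::get<long>(b);
    }
    if (std::holds_alternative<std::string>(a) &&
        std::holds_alternative<std::string>(b)) {
        return std::get<std::string>(a) + std::get<std::string>(b);
    }
    // Mismatched tags: the operation aborts with a type error instead of
    // reinterpreting one kind of value as the other.
    throw std::runtime_error("type error: cannot add a number and a string");
}
```

The tag checks in `add` are the per-access dispatch being described, and they run whether the code is interpreted or compiled ahead of time.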

------
amyjess
This is an excellent article.

The only thing I'd want to see added to it is a section on gradual typing, a
la Perl 6.

