
Several types of types in programming languages - chesterfield
http://arxiv.org/abs/1510.03726
======
jasode
Over many years of reading various essays on " _types_ ", this is the list of
synonyms I've accumulated:

= "constraints"

= "rules"

= "policies"

= "contracts"

= "semantics" (separating the concept of MEANING from underlying DATA BIT
representations)

= "compatible semantics"

= "capabilities"

= "propositions" (and functions can be "theorem proofs")

= "facts" about code that the compiler attempts to "prove"

= "tags" or "metatags" for compiler checking

= "documentation" enforced by the compiler, specifying what is NOT ALLOWED,
e.g. "you can't do that"

= "substitution", "compatibility"

= "A set of rules for how to semantically treat a region of memory."

Because these are synonyms, many of the concepts overlap. In my mind, the
highest level of unification about "types" is the concept of " _constraints_ ".
Related topics such as polymorphism are practical syntax patterns that arise
from data types' constraints.

Personal anecdote: I initially learned an _incomplete and misleading_ idea
about "types" from K&R's "The C Programming Language". From that book, I
thought of "types" as different data widths (or magnitudes). An "int" held
16 bits and a "long" held 32 bits, etc.

It was only through later exposure to more sophisticated programming languages
that I saw a much richer expressiveness of "types" is possible, one that the C
language's "typedef" cannot accomplish. For instance, if I want a "type"
called "month", I could encode a "rule" or "constraint" that valid values are
0 through 11 (or 1 to 12). A plain "int" ranges over -32768..+32767, and
" _typedef int month;_ " provides a _synonym_ but not a new _policy_
enforceable by the compiler.
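A minimal sketch of that "month" policy in Haskell (the names and the 1..12
convention are mine, not from the comment): the constructor is hidden behind a
validating function, so the constraint is actually enforceable.

```haskell
-- A "month" type whose policy (valid values 1..12) is enforceable.
-- In a real module you would export Month but NOT its constructor,
-- so that mkMonth is the only way to build a value.
newtype Month = Month Int deriving (Eq, Show)

mkMonth :: Int -> Maybe Month
mkMonth n
  | n >= 1 && n <= 12 = Just (Month n)
  | otherwise         = Nothing
```

Unlike `typedef int month;`, `mkMonth 13` is rejected at run time by the
validator, and passing a plain `Int` where a `Month` is expected is rejected
at compile time.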

~~~
yarvin9
Urbit actually wound up with three different concepts which all want the word
"type," and can't all have it. Worse, none of these quite matches the
PL-theory meaning. So none of them wound up with the word.

One easy way to state a type is to state it as the range of a function. The
function accepts all values, and produces some subset of all values; ideally,
all values in its range are fixpoints, f(x) == f(f(x)). This is kind of nice
because you've defined both a constraint and a validation function. We often
want to validate untyped data and make it typed.
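A hedged sketch of that shape in Haskell (a toy normalizer, not actual Urbit
code): clamping an Int into 0..11 gives a function whose range is exactly the
valid values, and every value in that range is a fixpoint.

```haskell
-- A toy "mold": accepts any Int, produces a value in the range 0..11.
-- Idempotence: clampMonth (clampMonth x) == clampMonth x for all x,
-- so already-valid data passes through unchanged.
clampMonth :: Int -> Int
clampMonth n = max 0 (min 11 n)
```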

On the other hand, this range (Urbit's "span") is not the same concept as the
function itself (Urbit's "mold"). Moreover, when you send data over the
network, you don't want to send the range or even the function; you want to
send the _name_ of the function, which hopefully is bound to something
compatible on the other side.

This namespace becomes a third level of type (Urbit's "mark"): like a MIME
type, if a MIME type was an executable specification. One of the problems with
most PL type systems is that they predate ubiquitous networking; so they're
not designed to address the problem of describing external data, eg with MIME
types, XML DTDs, protobuf schemas or whatever.

~~~
wyager
No offense, but this is exactly the sort of nonsense I associate with urbit.

>ideally, all values in its range are fixpoints, f(x) == f(f(x)).

This doesn't make any sense whatsoever. The _only_ function for which this is
true is identity, and I assure you it takes more than just the identity
function to make a useful program. This isn't even possible unless the
function has the type "a -> a".

The only other thing I can see you trying to express here is some butchery of
set notation, where by f(x) you actually mean {f(x) | forall x of the correct
type}. Even then, what you're saying is still completely nonsensical.

I honestly cannot tell if urbit is some sort of coordinated trolling effort.

~~~
tel
Identity is not the only function for which that holds. As an immediate
example, constant functions have this property too. More interesting examples
exist as well.

Also, what was stated was not an equation for any old fixed point but instead
idempotence, a specific kind of fixed point. In particular, this describes
precisely the shape of a "validation" function, so long as it either succeeds
on all inputs or has a notion of failure in either its computational context
or its range ("span", I guess).

~~~
dllthomas
_' a "validation" function'_

Or "normalization"

~~~
yarvin9
Yeah, that's exactly right. I usually say "normalization."

------
asgard1024
I have realized for quite some time now that there are at least three
different uses of types in programming languages (with different goals),
namely:

1\. Specification of constraints on data - the function arguments and return
values. This corresponds to the usage of types in logic. The goal here is the
correctness of the program (and ease of reasoning about it).

2\. A way to define polymorphic functions, i.e. functions that do the same or
similar operations with different kinds of data. Here we classify data as
being of some type so that we can define two functions with the same name but
different types, where the correct one is selected either at compile time or
at run time based on the parameter type. The goal is conciseness: to avoid
explicit conditional statements everywhere or other forms of code duplication.

3\. Finally, to specify how the computer is to store data, or more generally
how to model abstract concepts (such as integers) within the computer. This
becomes less relevant over time (as languages increase in abstraction), but it
is an important use historically. For example, integers can be modeled with
many different types in C. The goal here is to have the type serve as a
reference to a concrete and efficient representation of the abstract concept.

I think this is pretty much what the paper is suggesting, although IMHO not so
clearly.
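Use #2 can be sketched with a Haskell type class (names are hypothetical):
one name, `describe`, with the correct implementation selected by the
parameter's type rather than by explicit conditionals.

```haskell
-- One name, several implementations; dispatch is by parameter type.
class Describable a where
  describe :: a -> String

instance Describable Int where
  describe n = "int: " ++ show n

instance Describable Bool where
  describe b = "bool: " ++ show b
```

Here `describe (3 :: Int)` and `describe True` run different code under the
same name, which is exactly the conciseness goal described above.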

~~~
arielby
I would split use #3 into two parts: 3a) disambiguation for builtin operators
- e.g. a float local needs to go in a float register, adding 16-bit integers
uses 16-bit addition, a struct local needs to go in an appropriately-sized
memory location. This is basically the same as #2. 3b) schema-ed data storage.
This allows laying out structured data in a compact way in memory (e.g.
`struct x { int a; int b; }` consists of 2 integers one after the other). This
functionality is useful even in unityped code.

~~~
asgard1024
You are right that standard arithmetic operators are typically polymorphic,
but that doesn't mean there is another use case. Another example I had in
mind was little- and big-endian integers, or binary and decimal integers. A
language where arithmetic operators are not polymorphic is, for example,
assembler.

What I would say perhaps is that #2 and #3 are in a way complementary. In 2,
we describe a common operation with a more abstract type, while in 3, we
describe the specific representation with a more concrete type. So both #2 and
#3 address issues of naming things on different levels of abstraction, which
aren't directly related to the program specification or its correctness (use
#1).

~~~
arielby
The arithmetic operators in x86/x86-64 are certainly polymorphic (over word-
length, plus integer vs. x87 vs. SSE).

I think the distinction is that #3b-types, which denote the "encoding" of a
value, mapping it to its meaning, are often used as a basis for a #2-types
system (to parametrize operations).

~~~
asgard1024
> The arithmetic operators in x86/x86-64 are certainly polymorphic

I disagree - at least in machine code, the instruction code determines the
type of the operands. If they were polymorphic, the type of the operands
would determine the specific instructions to be used (and so you could, for
example, reuse the same code for different word lengths). Maybe modern
assemblers can do that (and have a generic instruction name for addition, for
example); it's been 20 years since I programmed x86. I recently used
mainframe assembler, where it is as I describe.

Polymorphism is all about names. You want the same name (and by extension, the
same code) to refer to potentially different operations on data.

> I think the distinction is that #3b-types

I am still not sure how it is different from #3a in your definition. When I
say 16-bit signed integer, I can also mean this as an encoding of abstract
"integer number". The whole point of declaring type in sense of #3 is to be
specific about encoding. In mathematics, you (typically) don't care about
that; but you sometimes care about use case #2, which is typically dealt by
mapping to more abstract concepts with some morphism.

------
chubot
How does this compare with this paper? They are both discussing the meaning of
the word "type".

[http://dl.acm.org/citation.cfm?id=2661154](http://dl.acm.org/citation.cfm?id=2661154)

There was a good blog post by Stephen Kell that I can't find now
([https://www.cl.cam.ac.uk/~srk31/blog/](https://www.cl.cam.ac.uk/~srk31/blog/))

"The concept of "type" has been used without a consistent, precise definition
in discussions about programming languages for 60 years. In this essay I
explore various concepts lurking behind distinct uses of this word,
highlighting two traditions in which the word came into use largely
independently: engineering traditions on the one hand, and those of symbolic
logic on the other. These traditions are founded on differing attitudes to the
nature and purpose of abstraction, but their distinct uses of "type" have
never been explicitly unified. One result is that discourse across these
traditions often finds itself at cross purposes, such as overapplying one
sense of "type" where another is appropriate, and occasionally proceeding to
draw wrong conclusions. I illustrate this with examples from well-known and
justly well-regarded literature, and argue that ongoing developments in both
the theory and practice of programming make now a good time to resolve these
problems."

------
chadaustin
Wow, apropos! I was just thinking about this in the shower.

For most of my life, I equated types with sets of values, but after learning
Haskell and working with higher-kinded types, type classes, and existential
types, I realized I don't know anymore what a type _is_. I know that type
systems provide proof that certain classes of operations are impossible (like
comparing a number to a string, or dereferencing an invalid reference).

It's pretty mindbending to use existentials or GADTs and pull two values out
of a record without knowing anything about those values except that, for
example, they can be compared for equality.

    
    
      {-# LANGUAGE ExistentialQuantification #-}
    
      data HasTwoEq = forall a. Eq a => HasTwoEq a a
    
      equalOrNot :: HasTwoEq -> Bool
      equalOrNot (HasTwoEq x y) = x == y
    
The example is contrived, but it illustrates the point that the types of x and
y are not known, _except_ that they can be compared.

That's not the kind of thing you can express with, say, Java or Go interfaces,
but it makes perfect sense once you start to break down the mental walls
you've built in your head over the years.

I'm thrilled to see a growing body of accessible* PL and type theory
literature, because these things are important to helping us develop software
at increasingly large scales, and it's clear that very few people -- including
myself! -- know enough about this topic.

* e.g. [https://cs.brown.edu/~sk/Publications/Books/ProgLangs/2007-0...](https://cs.brown.edu/~sk/Publications/Books/ProgLangs/2007-04-26/)

I tried to do my part here:
[http://chadaustin.me/2015/07/sum-types/](http://chadaustin.me/2015/07/sum-types/)
and
[http://chadaustin.me/2015/09/haskell-is-not-a-purely-functional-language/](http://chadaustin.me/2015/09/haskell-is-not-a-purely-functional-language/)

~~~
woah
Do these things actually help us develop software at increasingly large
scales?

~~~
chadaustin
Absolutely. As a codebase grows and a larger and larger number of users depend
on it, you want to be able to statically verify things like "I never compare
an experiment ID to a customer ID" or "timeouts are always precisely specified
with their units" or "I never issue a MySQL query from a retriable Redis
transaction" or "this unit test is 100% deterministic" or "all code handles
the case that this optional field is missing", just to pick a few specific
examples that I've run into.
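The first of those guarantees can be sketched with Haskell newtypes (the
names here are hypothetical, not chadaustin's actual code): experiment and
customer IDs share a runtime representation but not a type, so mixing them
is a compile-time error.

```haskell
-- Same runtime representation (Int), distinct compile-time types.
newtype ExperimentId = ExperimentId Int deriving (Eq, Show)
newtype CustomerId   = CustomerId Int   deriving (Eq, Show)

sameExperiment :: ExperimentId -> ExperimentId -> Bool
sameExperiment = (==)

-- sameExperiment (ExperimentId 7) (CustomerId 7)  -- rejected by the compiler
```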

~~~
vezzy-fnord
Those sound like jobs amenable to design by contract and information flow
control, not necessarily type systems and type theory, per se (though there is
some overlap).

~~~
implicit
How is "information flow control" different from a type system?

I've always considered the primary purpose of type systems to be restricting
how data can flow through a program.

~~~
vezzy-fnord
IFC is orthogonal to type systems, having also been modeled in terms of
axiomatic semantics (Hoare logic) independent of a type system. It can also be
practically implemented through runtime monitoring, partial evaluation and
various hybrid approaches. Type systems are certainly _one_ way of providing
foundations for IFC, but few languages have it as a first-class construct.
Notable exceptions are SPARK (based on Ada) and Jif (based on Java).

See also "A Perspective on Information-Flow Control":
[http://www.cse.chalmers.se/~andrei/mod11.pdf](http://www.cse.chalmers.se/~andrei/mod11.pdf)

~~~
elcritch
Interesting, I'll definitely have to check this out. Any input on whether this
has been applied and/or could be applied to actor paradigms of computation?

It seems like modeling data flow via messages would be pretty natural
semantics for something like this. But first I need to dig into what Hoare
logic is and how it applies.

------
brianberns
I don't see why this has to be complicated. A type is just a (potentially
infinite) set of values. Notice, then, that a (pure) function is just a way to
map one set of values to another set of values.

~~~
mikekchar
It is really unfortunate that people downvote something that is wrong without
actually saying what is wrong about it. I think it would help, though, to
acknowledge that your opinion is unlikely to be shared by others and to
append something like, "Can anyone explain where I'm going wrong?"

Now, I don't actually know much about type theory at all and came here to try
to remove some of my ignorance. As such, I can't really help you much, but I
will try my best. You might want to look at:
[https://en.wikipedia.org/wiki/Type_theory](https://en.wikipedia.org/wiki/Type_theory)

If I read it correctly (and if it is written correctly), a "term" is
essentially anything with a "type". So 1, 2 and 3 are terms and they have a
type. A "function" is something that takes terms and "reduces" them to other
terms. You can "apply" a function to some arguments in order to reduce them.
So if we have a function, "myfunc x", we can apply that to 2 and have "myfunc
2". Both "myfunc x" and "myfunc 2" are terms, but they have different types.

Let's make a function that doubles an integer (let's call the type "int"). So
"double x" is a term with a type of "int -> int": it maps an int onto an int.
"double 2" is a term with a type of int. The term "double (double 1)" can be
"reduced" to the term "double 2". "double 2" can be "reduced" to the term
"4". 4 is known as a "normalized term". It cannot be reduced any further.

Now, you may be saying "How is this different from what I said?" From what I
can tell, the main difference is that a type is _not_ a set of "values" (or
normalized terms). Functions applied to terms also have a type. Functions with
free variables also have a type. Anything that is a term in an expression has
a type. So 1, 2 and 4 are of type int. "double 1", "double 2", and "double
(double 1)" are also all of type int. If we had a function called "fortytwo"
that just returned 42, it would also be of type int.
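The running "double" example, written out in Haskell (using Int for the
comment's "int"):

```haskell
double :: Int -> Int   -- the term "double" has type Int -> Int
double x = 2 * x

-- "double (double 1)" reduces to "double 2", then to the normalized
-- term 4; every intermediate term has type Int.
```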

It may seem like a small (possibly insignificant) difference, but it really
depends on your point of view. We can completely ignore the type if we want
to. Many programming languages make that very easy. We could also subscribe to
the idea that a type is just a set of values, like you said. We can write many
fine (and many not-so-fine) programs with that point of view.

These views limit the kind of analysis we can do on the code that is written,
though. The point of using type theory in computer languages is to help us
reason about the code analytically. So my answer to your (unasked) question
is: it doesn't _have_ to be complicated, but it proves useful to include all
of the nuances that type theory gives us. Intentionally putting blinders on
to make it seem simple just deprives us of potentially useful capabilities.

Hopefully someone who actually knows something about this topic will chime in
and correct whatever horrible mistakes I've made.

~~~
brianberns
You're right - I am thinking "How is this different from what I said?".

> Functions applied to terms also have a type.

Yes, and those types are sets of values. For example, "int -> int" is the set
of all functions that map integers to integers. (Remember that functions are
also values, just like any other value, in functional programming.)

------
kpmah
The terminology we have around types is really messed up. What's most annoying
is that 'static type system' and 'dynamic type system' are often seen as
opposites of each other, but the 'type' in each of the names is a completely
different concept.

