
What I Want from a Type System (2016) - grzm
https://gist.github.com/halgari/f431b2d1094e4ec1e933969969489854
======
ufo
There are some important requirements if you want to have "full type
inference" in your language. In order:

1 - For each term in the language, there must be a unique "best type" that
describes it. (principal type property)

2 - If principal types exist, then you also need to show that a type inference
algorithm exists. (decidability of types)

3 - If the type inference algorithm exists, it must execute in a reasonable
amount of time. (tractability)

Many of the type system features that one might want to add to a language can
violate one of these properties, especially when you start combining them
together into a single type system. Subtyping, in particular, is tricky.

Finally, there is also a problem that you can never get away from with full
global type inference schemes: type error messages tend to take the form of
"type A is not equal to B" instead of "found type A, expected type B". When a
type constraint cannot be met, the error message cannot tell you which of the
two sides is the wrong one. Additionally, the way that type constraints
propagate means that the error message might not necessarily point to the true
source of the type error.

For these reasons, even in languages with full type inference it is usually a
good idea to at least have type annotations for top level functions.

------
_halgari
Author here, a few things to keep in mind:

Firstly, I wrote this about 3.5 years ago. I was wondering why people were
suddenly commenting on it and now it all makes sense.

Part of the fun of writing articles about this is watching everyone argue
about what language can be forced into representing types in a given way. Yes,
I assume in any situation that if I want a feature X in a type system, that
_somehow_ Haskell can be forced to give me that feature, but that doesn't
necessarily mean it will fit with the ecosystem of the language or that that's
the only feature I'm looking for in the language. So saying "someone hasn't
done their homework if they think X can't be done" isn't relevant; what is
relevant is that I'm not aware of a language that provides the type system
features I want combined with an acceptable set of trade-offs.

So anyways, I'll stick around for awhile and see if I can answer any
questions. Thanks for the discussion, all!

~~~
contravariant
I'm somewhat curious why you'd want both (2) and (3) to hold. Isn't it
somewhat contradictory to want types to not merely denote the structure,
while requiring that everything with the same structure satisfies the type?

Maybe it's because I'm influenced by C# but viewed from that perspective it
would be like requiring you to explicitly declare that e.g. some value is a
ProductID but when you're declaring a type you wouldn't need to declare it is
a Person, provided it simply implements the right fields (in C# you would have
to explicitly implement some interface to clarify that first-name and
last-name do indeed refer to a person and not, for example, the head and tail of a
list of names). This does mean that any external code can't implement your
interfaces though, which is a bit annoying, though fixable.

~~~
jdmichal
That's commonly called "duck-typing". And since you're discussing C#, I'll
throw in that Typescript (another MS language) has duck-typing for interfaces.
You can explicitly implement an interface, in which case the compiler will
enforce that the interface is implemented. But you can also implicitly meet an
interface.
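For instance, a minimal TypeScript sketch of that implicit matching (the interface and names here are invented for illustration):

```typescript
// An interface describing a shape -- nothing has to declare "implements Named".
interface Named {
  firstName: string;
  lastName: string;
}

function greet(p: Named): string {
  return `Hello, ${p.firstName} ${p.lastName}`;
}

// A plain object with the right fields satisfies the interface implicitly;
// extra fields (like `age`) are fine when the value is passed via a variable.
const person = { firstName: "Ada", lastName: "Lovelace", age: 36 };
console.log(greet(person)); // Hello, Ada Lovelace
```

No `implements Named` appears anywhere; the compiler checks the shape at the call site.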

------
cjfd
I don't agree with this at all. Having the types of every variable written
explicitly in the code is important documentation that answers very quickly
the question 'what is the type of this thing' without having to look
everywhere and anywhere to perhaps find out that it could be any one of three
types. If you do not know the type of something you also do not know what
operations are available on that thing.

~~~
pjc50
It looks like there are two very different styles going on here. Most of us are
used to the OO-type approach (Java, C#) where the operations are "inside" the
object and defined on it. This article is coming from a very different place
where the operations are "outside" the typed things, which are more like
structs, and the programmer can write any operation that takes structs which
have the right fields.

The author doesn't make any comment on dispatch let alone multiple dispatch.

They appear to be writing for the kind of quasi-schema'd information
bureaucracy that's very important outside of SV. Lots of data in XML that
never quite lines up with an object model.

The Haskellite approach would also say that "what operations are available on
that thing" is the wrong way round; you define your operation, and the type
signature tells you what kind of things will work with it.
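A minimal TypeScript sketch of that operation-first style (the function and field names are invented): the generic constraint states exactly which fields the operation needs, so any struct with those fields works.

```typescript
// A generic function whose constraint states exactly what it needs:
// any struct with numeric `price` and `taxRate` fields will do.
function fullPrice<T extends { price: number; taxRate: number }>(item: T): number {
  return item.price * (1 + item.taxRate);
}

// A book, an invoice line, a cart entry -- anything with the right fields works.
const book = { title: "SICP", price: 50, taxRate: 0.25 };
console.log(fullPrice(book)); // 62.5
```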

~~~
gridlockd
> The Haskellite approach would also say that "what operations are available
> on that thing" is the wrong way round; you define your operation, and the
> type signature tells you what kind of things will work with it.

Isn't the common case that you already have "the thing", you have a rough idea
what should be done, and you want to see what operations/transformations are
available? That's when autocomplete and a static type system come in handy.

Some (more obscure) languages unify this, i.e. f(x) is the same as x.f(). I
don't think there's a "wrong way around", the difference is a technicality.

~~~
nicwilson
The difference is not just a technicality: it results in a large difference in
readability, as it linearises nested calls, i.e. it transforms them into a
functional pipeline (a la Unix pipes).

D has this (called Uniform Function Call Syntax, or UFCS) and also has optional
parentheses for calls with no further arguments, which turns this:

send(getInvoice(calculatePrice(x)));

into

x.calculatePrice.getInvoice.send;

It also eliminates the need for extension methods, since any function that
takes a given type as its first argument is automatically callable as if it
were a method of that type; this also applies to templates, with the type of
the first argument determined by a template parameter.

As for a "wrong way around", you can overdo it both ways, use your judgement.
I prefer UFCS for more heavily nested calls, which you get when you do a whole
lot of data transformations in a row. Sometimes I use it for a single call,
but not very often.
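For languages without UFCS, the same linearisation can be approximated with a small wrapper. A TypeScript sketch (the stage functions are invented stand-ins for the D example above):

```typescript
// Invented stand-ins for the stages in the D example above.
const calculatePrice = (x: number) => x * 1.25;
const getInvoice = (price: number) => `invoice:${price.toFixed(2)}`;
const send = (invoice: string) => `sent ${invoice}`;

// A tiny wrapper that threads a value left-to-right, approximating
// the UFCS-style pipeline without language support.
class Pipe<T> {
  constructor(private readonly v: T) {}
  then<U>(f: (x: T) => U): Pipe<U> {
    return new Pipe(f(this.v));
  }
  value(): T {
    return this.v;
  }
}

// Reads in execution order, like x.calculatePrice.getInvoice.send in D.
const result = new Pipe(100).then(calculatePrice).then(getInvoice).then(send).value();
console.log(result); // sent invoice:125.00
```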

~~~
Symmetry
And the weird thing is that it's almost trivial to create an equivalent of (.)
in Haskell that's ordered in the more natural forward way, and I really wish it
were standard. Though (.) itself does get rid of all those parentheses.

~~~
jose_zap
That would probably be the (&) operator, which is in the standard library.

~~~
Symmetry
I was thinking more about composition than application. Thankfully, I just
remembered that you can use Hoogle to search by type signature so I popped in
(a -> b) -> (b -> c) -> a -> c and it directed me to (.>) which is exactly
what I was thinking of in the flow package.

Flow seems like an interesting package to check out if I ever go and start
using Haskell seriously.

~~~
cannabis_sam
There is also >>> for left-to-right composition, which is in base.

My favorite is Elm, which has |> and <| for application, and >> and << for
composition, in the intuitive directions. It makes it easy and natural to eta-
reduce/expand, when you want to.

------
thesz
This is a case of someone not having done their homework.

Most of the things he requested are possible with HList in Haskell. A library,
no less. From 2008, I believe.

PS Structural subtyping is tricky:
[https://pdfs.semanticscholar.org/ff6f/1c49ff00efa483807fae71...](https://pdfs.semanticscholar.org/ff6f/1c49ff00efa483807fae71ef2a0f2baf6a04.pdf)

It took a quarter of a century to get it right. Most probably the author of
the gist just doesn't grasp its trickiness yet.

~~~
gpderetta
In fact, most of those things are possible in C++, of all places.

~~~
ken
Can you explain how? Most of these look like essentially type-inference
through duck-typing, and (AIUI) C++ doesn't even have duck-typing.

~~~
jcelerier
C++ has had duck typing since it had generics.

Here would be an example of the first case :
[https://wandbox.org/permlink/AIgZtkFgTuiQReDT](https://wandbox.org/permlink/AIgZtkFgTuiQReDT)

------
foxes
Honestly you can nearly write Haskell without specifying type signatures. GHC
is very sophisticated. I feel like there have only been a few situations where
I actually needed to specify the type. Really the argument shouldn't be about
having static types vs dynamic, it should be about smarter compilers etc. To
me, GHC seems to be nearly perfect (also GHCi).

~~~
yogthos
HM style type inference is nice, but the root problem is that the static type
system restricts you to a set of statements that it is able to verify at
compile time. This necessarily leads to code that prioritizes the needs of the
compiler over those of the human reader. The more things you try to encode
using the type system, the more complex your code becomes because proving
something correct for all cases is typically significantly more difficult than
just stating it and testing for the set of cases you care about. Furthermore,
there is no way to verify that your type definitions themselves are correct.
When you start moving program logic into types, the compiler cannot help you
ensure that it's correct.

Here's a fully typed insertion sort [1] written in Idris. It's over 260 lines
long, and it takes a non-trivial amount of effort to understand it fully and to
know that what it specifies is indeed what was intended. Meanwhile, I can
trivially read and understand a 10-line Python version in its entirety.

Of course, you could argue that it's an extreme case, and that in practice you
probably wouldn't have such an exhaustive type specification. At that point
you'd be agreeing that relaxing type constraints does provide a benefit, and
we'd just be arguing about our respective levels of comfort.

[1] [https://github.com/davidfstr/idris-insertion-
sort/blob/maste...](https://github.com/davidfstr/idris-insertion-
sort/blob/master/InsertionSort.idr)

~~~
Drup
This is really an apples-to-oranges comparison. What you linked is not an
implementation of insertion sort. It's an implementation _coupled with a
proof of correctness_. Try to make a mechanized proof of correctness of your
Python code, and it'll be as long (or longer). If you just write an
implementation in a statically typed language like OCaml, F#, Haskell, or
TypeScript (or even Idris, if you go less crazy with the typing), you can
get it down to fewer than 10 lines easily.

~~~
yogthos
A proof of correctness is fundamentally what static typing is. The example I
provided simply turns this up to 11. However, if you relax the constraints then
you're losing the benefits of statically checking properties at compile time. If
you agree that having a 10-line version that does less checking is better than
having a 260-line version that does more, then you're agreeing that there is a
cost associated with static checking.

There are plenty of real world scenarios where static checking gets in the way
even if you're not trying to encode complex properties using the type system.

One problem is that static typing is at odds with modularization, since type
declarations apply globally. For example, the Ring HTTP abstraction in
Clojure represents requests and responses as maps. Middleware functions [1]
can update these maps to inject additional keys, or modify existing keys.
These functions often live in separate libraries that know nothing about one
another. A static type system precludes this since it would require you to
provide a static declaration for every possible request and response map.

[1] [https://github.com/ring-clojure/ring/wiki/Middleware-
Pattern...](https://github.com/ring-clojure/ring/wiki/Middleware-Patterns)

~~~
aidenn0
Static typing does not prevent this; the fact that statically typed languages
can parse freeform formats like JSON works as a simple example.

If you statically prove that you check that the keys are there before using
them, then the type system can be satisfied.

~~~
yogthos
You don't know what the keys are up front because each library is an
independent module. Of course, you could just use Any type, but then you're
not doing static typing.

~~~
aidenn0
Example:

If Module X expects the "foo" key, then write a function that checks whether
the map has the foo key and returns a Maybe(HasFoo). It's less good than just
treating the type as the union of HasFoo and HasBar or whatever, but if you
fully decouple, then you have to be able to handle the case in which the
mapping lacks a foo key anyway, and this will statically force you to.

~~~
yogthos
But the point is that you don't know the totality of the keys up front. Module
A might deal with keys X and Y, while module B deals with keys W and Z. These
two modules know absolutely nothing about each other and they're developed by
different people.

~~~
aidenn0
Right, so Module A defines a "TypeA" (which describes a map containing keys X
and Y) and Module B defines a "TypeB" (which describes a map containing keys W
and Z) and they each define their respective functions that return a (Maybe
Type A) and (Maybe Type B). If the map has keys X and Y then you will get a
(Just Type A) and if it does not you will get (Nothing). The type system will
then enforce that you handle the possibility of (Nothing) which is something
you would need to do in a dynamically typed language as well.

Remember, in the sense we are currently talking about[1], _objects_ are
untyped, it's _variables_ that are typed. So there is no need to describe the
type of this map being passed around in concrete terms; it's perfectly
cromulent to treat the map as a TypeA in one place and a TypeB in another
place, so long as you ensure that the prerequisites for those types are
satisfied by the object.

It is true that it is usually preferable in statically typed languages to
demonstrate that the map satisfies the requirements for TypeA before type
erasure (i.e. at compile time), but if you want total decoupling, then that is
not possible (since the requirement that the map have keys X and Y would need
to be enforced outside of Module A). This does not mean that a type system
cannot help you though.
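The scheme above can be sketched in TypeScript, with null playing the role of Nothing (the type and key names are invented for illustration):

```typescript
type Dict = Record<string, unknown>;

// Module A's view of the map.
interface TypeA { x: number; y: number; }
// Module B's view -- defined independently, knowing nothing of Module A.
interface TypeB { w: string; z: string; }

// The Maybe-returning check: the compiler forces callers to handle
// the null case before using the narrowed type.
function asTypeA(m: Dict): TypeA | null {
  return typeof m.x === "number" && typeof m.y === "number"
    ? (m as unknown as TypeA)
    : null;
}

function asTypeB(m: Dict): TypeB | null {
  return typeof m.w === "string" && typeof m.z === "string"
    ? (m as unknown as TypeB)
    : null;
}

// The same untyped map can be viewed as a TypeA in one place and a
// TypeB in another, with no global declaration of all its keys.
const msg: Dict = { x: 1, y: 2, w: "a", z: "b" };
const a = asTypeA(msg); // TypeA | null
const b = asTypeB(msg); // TypeB | null
```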

I think we are getting to the limits of what can be easily communicated via HN
comments, so if you still don't understand, then perhaps I'll write a blog
article explaining it more fully.

~~~
yogthos
I'm saying that there is no way to know up front what all the keys in the map
are going to be, or what types these keys can have.

For example, there is middleware for parsing out request params into a :params
key. There's another piece of middleware that parses the values of these keys.
So, you may or may not have a :params key, and the types inside the params can
be absolutely anything. Then you could just have completely separate
middleware that might add something like a CSRF token to the request map. And
so forth. Your only real option here is to treat the entire structure as Any
type.

If you still don't understand the problem, I really don't know how else to
explain this.

------
tobyhinloopen
What I want from a gist document: Markdown, so I don’t have to scroll
horizontally

~~~
kirubakaran
You can just click the 'Raw' button/link for plain text:
[https://gist.githubusercontent.com/halgari/f431b2d1094e4ec1e...](https://gist.githubusercontent.com/halgari/f431b2d1094e4ec1e933969969489854/raw/823f9ad204729aa00e02cacef20746b690f31357/gistfile1.txt)

------
ben509
I'm working on one [1] and doing the inference now, which is kicking my ass
TBH, so some thoughts:

> 1) Full type inference.

Really, while writing and hacking, you want it, but ideally the system should
be able to fill in your types for you. Note, though, that there's a tradeoff:
the more overloading and other conveniences, the less complete the inference
can be.

In the system I'm working on, it's intended to deploy a library +
documentation, so all types would be annotated in the artifact generated
regardless of whether they are in the source.

> 2). Value types in structures would need to be nominal, not structural.

My answer to this is we want cheap sum types. In Tenet, a sum type can be
constructed without any declaration by writing tag~expr. This implicitly
defines Union[tag~WhateverExprTypeIs].

Since unions implicitly combine, if I have one branch return meters~55 another
can return error~"something broke" and the return type overall is implicitly
Union[meters~Integer, error~String].
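Tenet's notation is hypothetical here; the closest mainstream analogue is probably a TypeScript discriminated union, though TypeScript makes you declare the union explicitly rather than inferring it from the branches:

```typescript
// Roughly analogous to Union[meters~Integer, error~String]: a tagged
// union, declared explicitly rather than implied by the return branches.
type Result =
  | { tag: "meters"; value: number }
  | { tag: "error"; value: string };

function measure(ok: boolean): Result {
  return ok
    ? { tag: "meters", value: 55 }
    : { tag: "error", value: "something broke" };
}

// Switching on the tag narrows the type in each branch, and the
// compiler checks that every tag is handled.
function describe(r: Result): string {
  switch (r.tag) {
    case "meters":
      return `${r.value}m`;
    case "error":
      return `failed: ${r.value}`;
  }
}
```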

The idea of declaring that, e.g. meters~X and centimeters~Y are
interchangeable representations of the same value is something I'm thinking
about; it's getting at a facility for encapsulation.

> 3) Collections of [product types] would need to be inclusively typed, not
> exclusive.

So, if we have syntax for retrieving an attribute `x.a + x.b` and we know +
takes integers, we can deduce an assertion that x is some product type with
two integer attributes a and b. I call that an open tuple, as opposed to a
closed tuple.

But all product types must be closed (exclusive) in order to construct them,
obviously, and for non-dynamic languages, it seems like you're liable to have
code size explosion (multiple copies of functions to support each type used)
if you allow polymorphism this way.

> 4) ... it would be important to have all these structural members be
> namespaced as well.

I'm curious what people think the best namespace schemes are, because most of
them seem like a lot of complexity for what they do. But...

> 5) Now I would still need basic set logic for these types

It won't make the first cut, but the basic operations would be sum and diff
(for sum types) and times and project (for product types); a renaming facility
with globbing seems like it might answer #4 just fine.

[1]: [https://tenet-lang.org/types.html#tenets-approach](https://tenet-
lang.org/types.html#tenets-approach)

~~~
CuriousSkeptic
I found it interesting that you tag the types like that.

I’m working on a small language idea right now where I want to treat
bindings as first class, and have them combine with row polymorphism into a
construct subsuming tuples, argument lists, and environments into a kind of
unified thing, with a similar result in mind. Even to the point of perhaps
demoting scalars from first-class status, the point being that where there
is a value there should be a name (borrowing the RDF model here).
So (n1: v1) might be a value v1 bound to n1. And (n2: v2) another. (n1:
v1)(n2: v2) = (n1: v1, n2: v2), and one could even imagine (n1: v1, n2:
v2)(n3: n1+n2) as a valid expression. Which of course is equivalent to (n1:
v1, n2: v2, n3: n1+n2), not entirely unlike a let-expression.

(I’m thinking of unifying terms and types so a lambda would simply be such an
expression where v1 and v2 are types)

Oh well just a toy in my head at the moment. Just found it interesting to see
similar tag combining semantics.

(As for namespace I was thinking to borrow the plan9 approach of binding
things dynamically, using something like Scalas implicits, for evaluation to
do a kind of dependency injection thing, but a global lexically scoped tree
for typing it)

~~~
ben509
My motivation was to try to accommodate the way I've seen people think and
work. So the type system lets you describe anything using literal expressions,
and if you match everything up, it should Just Work, or at least have
intelligent complaints. (I hope.) So a discriminated union is just a tag stuck
in front of something else.

> Even to the point of perhaps demoting scalars from first class status
> actually, point being that where there is a value there should be a name

In theory, an integer is just an unbounded enumerated type. I haven't done
that to treat it as opaque / atomic, but you could imagine it essentially
being 1~~, 2~~, 3~~, etc.

But that might come back if I want to have sane semantics of floats, as in not
treating NaN as a magic value that doesn't equal itself.

> I wanted to treat bindings as first class, and having them combine with row
> polymorphism into a construct subsuming tuples, argument lists, and
> environments into a kind of unified thing, with a similar result in mind.

This might be helpful just as something to think about, but there is already
an example of One Structure to Rule Them All: relations from the relational
algebra. (You want to consider the mathematical definition; SQL tables break
stuff because of legacy.)

A relation can naturally represent any container, but also any function if you
allow them to be infinite. So if you have a relation that states all values of
y = cos(x), you can join it with an existing expression to apply the `cos`
function. (Or `acos` if the system is clever enough.) And that, obviously,
would get translated into actual function calls.

> So (n1: v1) might be a value v1 bound to n1. ...

I had to read that a couple of times, but I think I get it. I think being able
to combine structures makes a lot of sense, and it's one of those ideas that's
not implemented in most languages (as the OP mentions), so people tend to
engineer around the lack of it; it'd be interesting to see what idioms develop
with it.

> As for namespace I was thinking to borrow the plan9 approach

Thanks, I'll have to look into that!

~~~
CuriousSkeptic
Yeah, must confess to being inspired by the relational model.

Re: plan9, this is in the context of a language I imagine as a kind of
interactive environment (think Smalltalk) where partial evaluation would be
the means to construct deployable artifacts. In this environment I expect to
support metaprogramming and reflection, typing and unit tests as something
very dynamic and integrated, but a purely design-time thing. Again, not
entirely unlike how you work with a relational DBMS and its schema.

------
sdegutis
It's really interesting to see so many people going back and forth between
dynamically typed languages Clojure and Elixir, and statically typed languages
like OCaml and Haskell.

Lately I've started to question whether TypeScript is really worth the
trouble and slowness it's causing. I wrote about it briefly here:

[https://sdegutis.com/2019-06-20-considering-removing-
typescr...](https://sdegutis.com/2019-06-20-considering-removing-typescript)

On one hand, TypeScript's type system is extremely convenient when mixed with
VS Code. On the other hand, it's getting slower and slower, which is making it
actually _counterproductive_. I often wonder if I'd be faster in just plain
JS.

~~~
jzoch
What you've found might be that because you had this explicit contract (types)
that coerced you to write your code in a certain way, your code looks simpler
when you just turn the types off. But would you have written it this way had
there been no types?

This exercise is more similar to converting a typed language like Java to a
type-inferenced one (a la var in Java 10 or auto in C++11). If you didn't have
the guard rails of types, the way you built your system may differ by relying
a lot more on runtime behavior and conventions rather than compile-time type
checking and explicitly enforced rules.

~~~
sdegutis
That's exactly what I aim to find out. But I don't feel very comfortable
trying it on a client project, not yet at least.

Having written in many statically and dynamically typed languages, I feel like
I should have already come to a conclusion by now. Yet somehow I'm still on
the fence after all these years.

------
maxfurman
I'm not an expert on type systems, and maybe I'm misreading this gist, but
doesn't Typescript work this way?

~~~
phiresky
Yep. The only thing missing is nominal aliases to primitive types, which is
requested by many people [1], and even the TS compiler itself uses hacks to
work around the lack of them. No idea why they don't add them already.

Also of course any form of runtime type checking is missing.

[1]
[https://github.com/Microsoft/Typescript/issues/202](https://github.com/Microsoft/Typescript/issues/202)
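For reference, the workaround in question is usually done with "branded" types: intersecting a primitive with a phantom property so two aliases stop being interchangeable. A minimal sketch (the type and helper names are invented):

```typescript
// Intersect a primitive with a phantom "brand" so two string aliases
// are no longer assignable to each other.
type ProductID = string & { readonly __brand: "ProductID" };
type ProductName = string & { readonly __brand: "ProductName" };

// Constructor helpers confine the casts to one place.
const productId = (s: string) => s as ProductID;
const productName = (s: string) => s as ProductName;

function lookup(id: ProductID): string {
  return `looking up ${id}`;
}

const id = productId("p-42");
const label = productName("Widget");
lookup(id); // ok
// lookup(label);  // compile error: ProductName is not a ProductID
// lookup("p-42"); // compile error: a bare string is not a ProductID
```

The brand exists only at the type level; at runtime these are still plain strings.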

------
gambler
I've recently listened to a podcast (2009) with the author of Newspeak.
Mind-expanding stuff that makes a lot of sense in retrospect.

[https://www.se-radio.net/2009/07/episode-140-newspeak-and-
pl...](https://www.se-radio.net/2009/07/episode-140-newspeak-and-pluggable-
types-with-gilad-bracha/)

------
stcredzero
_2) Value types in structures would need to be nominal, not structural. I
really don't care if I get a tuple of [Int32, String], I very rarely care
what the machine level types are, what I really want is [ProductID,
ProductName], what the underlying types are doesn't really matter to me._

In Go, there's a practice called "microtyping" (credit goes to HN user jerf)
where everything is type defined to a semantic name. So you never have a
struct with a string for ProductID and ProductName. Instead, you have a struct
containing a ProductID and a ProductName. This makes it very easy to change
your mind. I used to have a base64 encoded UUID as an EntityID in my MMO. I
implemented microtyping, and after that, changing EntityID to 128 bits was a
breeze!

~~~
dvlsg
F# is good for this, too.

It _kind of_ works in TypeScript, but it's not enforced by tsc. Definitely
helps readability, though.

------
yakshaving_jgt
…Sounds like he just wants row polymorphism, no?

------
karmakaze
> Value types in structures would need to be nominal

The reasoning behind wanting this is good. It can, however, be satisfied
differently. I find structural typing to be more convenient when I also have
the ability to define one type from another such that the two are not
considered equivalent. A good example is how an int can be defined into
different types, with conversions, in Go.

If we were to follow through with nominal typing, we would need transforms to
rename the same structures when working through abstractions for which
different names make sense.

------
tathougies
Number 3 is completely not what you want. I mean, you want that in certain
cases, but you may also want the exhaustiveness checking. There needs to be
two distinct notions here. The notion of a type should convey totality -- this
value has a particular 'shape'. The notion described here is not a type, but
more like an interface or a trait. In either case, it falls under
polymorphism. Person as described here is not a type.

That being said, all these things are implemented in PureScript, I believe.

------
mamcx
Isn't this solved by the relational model?

I have thought about this for my toy relational lang, and I consider the best
is to allow:

    fun print(people:People[id, name])...

to work with any relation/class/struct that at least has those 2 fields. Also,
in the relational model the name of the relation and the header are
significant.

Plus, re-combining of types is built in. Need to extend Person?

    rel Customer Person join Address, Phones...

------
mywittyname
What I want from a type system is the ability to quickly understand the data
structures involved in the code I'm debugging, which was written several
years ago by developers who've long since moved on.

The more clever, independent, and unbound you are, the less you appreciate
types, because you never really feel the pain points that types alleviate.

------
NikkiA
Given his use of s-expressions, Carp-lang is probably exactly what he wants,
but he will probably be disappointed when he discovers just how much
hoop-jumping there is to deal with non-GC type lifetime management.

(unless he finds splattering `ref` and `copy` (or their syntactic sugar of `&`
and `@`) in his code acceptable, in which case, congrats!)

------
mcguire
No parametric polymorphism?

