Hacker News new | comments | show | ask | jobs | submit login
What is type safety? (pl-enthusiast.net)
106 points by tenslisi on Aug 5, 2014 | hide | past | web | favorite | 70 comments



I think "type safety" is one of those things that's so well-defined in theory that there's always a huge disconnect between people who know the theory and people who don't. Therefore, I always like to explain the "progress + preservation" view whenever describing type safety, because it illustrates the term really has a pretty precise meaning at least in the simple cases.

In the simplest case, think of your 'interpreter' as a program that translates programs into values. For example, 1 is a value and \x.x+1 (the lambda expression) is a value, but (\x.x+1)1 is not a value because it evaluates to 2 (or, at least, to 1+1).

Let's say that your interpreter does this a single small step at a time; so for instance, if you have* f(x,y) = x+y, then f(1,2) |--> (x + 2)(1) |--> (1+2) |--> 3.

Type safety goes like this: If x has type T then either x |--> x' or x is a value. Also, if x has type T and x |--> x' then x' also has type T.

The first property is "well-typed programs don't go wrong" and the second property closes a loophole ("if you're a well-typed program, then you're never going to evaluate to anything that isn't also well-typed", so we never escape from the first property by evaluating)

* using the product syntax because some people will get confused by f 1 2, but that's in fact what I mean.


Jeremy Siek has an excellent blog post explaining this called "Crash Course on Notation in Programming Languages": http://siek.blogspot.com/2012/07/crash-course-on-notation-in...


Thanks for sharing, that's very well written!

I'm not surprised, either. Siek's papers are always as accessible as they are insightful.


But this type theoretical definition is not really well-suited for the real world; either you have to include exceptions in the set of values, which could include even things like Segmentation Fault, making ASM well-typed, or you don't allow exceptions as values, which means that not even OCaml and Haskell are well-typed with their DivisionByZero and ArrayOutOfBounds exceptions.


Haskell (at least) is actually fine for throwing Segmentation Faults, DivisionByZero, and ArrayOutOfBounds since they can all masquerade as bottom which inhabits every type. Further, this extra inhabitant isn't (too) bad for semantics (it's "morally correct") since you cannot detect it—any attempt to examine bottom results in bottom, it's contagious.

The problem is that you sometimes can distinguish, say, a segfault from an infinite loop. Any code which does that is pretty dangerous.

That's why "pure" exceptions are considered super taboo in Haskell. If you need exception passing then you should do it in something like IO/Either/Cont to contain that effect.

It's also why the couple partial functions in Haskell are all considered warts and never for practical use:

     head :: [a] -> a
     tail :: [a] -> [a]
     (!!) :: [a] -> Int -> a
should all be replaced by

     head :: [a] -> Maybe a
     tail :: [a] -> Maybe [a]
     (!!) :: [a] -> Int -> Maybe a
which now uses exceptions which are marked inside the type system.

Personally, I kind of wish `(/) :: Fractional a => a -> a -> Maybe a` sometimes. It'd get confusing with IEEE floats though since that type already contains values Inf/-Inf/NaN.

[0] http://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/fa...


To add on to tel's response, another method of supporting exceptions is to rewrite progress as "if x : t then either x |-> x', x is a value, or x is an error".

To do this, you need to define another decently large set of judgments for error propagation (such as "if x is an error and y : t then (x,y) is an error" and vice versa), but ultimately you can maintain type safety via progress and preservation while still accounting for exceptions as we know them.

On the topic of whether it would make ASM well-typed, I figure that the lack of array bounds checking would be one reason why ASM would have problems, but I haven't thought it through fully. However, I found a neat paper that tries to create a type-safe assembly language pretty similar to x86: http://www.cis.upenn.edu/~stevez/papers/MCGG99.pdf


wrt ASM, I think tomp meant something like what Robert Harper is getting at in his post on the linked article.

edit: got grandparent's username wrong.


Yes. A convenient way to do the former is to add another judgement err, so that you can prove theorems about things which go err, and know that there's nothing which isn't safe and also isn't subject to your theorems about going err. (edit: tel provides perhaps a better "irl" example)

But... that's why I added the "In the simplest case".

If you're not familiar with the "progress + preservation" definition, then it's going to be rough going understanding anything else without lots of background in logic or proof theory, especially in the case of programming languages as they relate to software engineering. As a prime example, your comment ("include exceptions in the set of values") really very often means something quite different to someone who has an intuitive grasp on the "progress + preservation" definition, and someone who does not.

See also Robert Harper's post and Michael Hicks's response on the article in question.


I think it's worth talking about the ability to propagate user-defined invariants. To me the most useful aspect of a type system is that it enables you to declare that all Xs have some property, and then the language guarantees that this holds. "Has a method named .foo(...)", while useful, shouldn't be the end of what we try to encode, but there's a big gap between allowing things like "an X is either a Y or a Z" and allowing arbitrary type-properties, liquidhaskell style, which the article just glosses over.

Which is a shame, because it's precisely this range - the gap between non-liquid Haskell or Ocaml and Java - which is most relevant to today's programmers making choices of programming language.


Yes, I've long wanted to be able to attach arbitrary predicates to types. This is great for functional state machines; say the state machine type is sm, you can specify functions like

    fun initialize(sm) -> sm'
    where sm.state == uninitialized, sm'.state != uninitialized
or

    fun ping_clients(sm) -> sm' where has_clients(sm)
in concert with

    fun has_clients(sm) -> bool
The compiler can then track these, and ensure e.g. that "has_clients" is called and returns "true" before "ping_clients" is called on a given value.

An even smarter compiler can, if it knows the definitions of these functions, determine statically whether a given predicate is guaranteed to produce the same result on a modified object. (e.g. simply by inspecting which fields are inspected and modified)


You can get this example working with indexed monads in Haskell. You can also do it quite nicely in dependently typed languages like Idris/Agda.

However, abitrary predicates is a different matter. Even if you have a language which allows arbitrary computation in types there exist undecidable predicates which may weaken your desire for arbitrary ones.


It seems to me that you probably don't want to do arbitrary computation in types at compile time. Even "this type is an integer that must be non-negative" is a problem if that type allows subtraction. Now your compiler needs to know the the value of all such types at the time that subtraction was applied to them, to know if the operation is valid. That's not a good position to put your compiler in.

It seems to me to be better to make this a runtime check, which the compiler makes sure to call every time subtraction is applied to a value of that type.


> Even "this type is an integer that must be non-negative" is a problem if that type allows subtraction.

Well, no.

Its a problem if you assert that the result of subtraction is the same type, because the non-negative integers are not closed under subtraction--if the definition of your data type relies on something that is internally incoherent, you have a problem; but you can define subtraction of non-negative integers in a way that is general, not problematic, and doesn't require compile time knowledge of specific values, e.g.:

  data Nat = Zero | Succ Nat
  data Neg = Neg Nat
  subtract :: Nat -> Nat -> Either Nat Neg
But, yeah, attaching arbitrary predicates as restrictions to types brings the halting problem into your compiler implementation.


> data Nat = Zero | Succ Nat > data Neg = Neg Nat > subtract :: Nat -> Nat -> Either Nat Neg

I think that this is one example where the types don't speak completely for themselves. Is `Neg Zero` meant to stand for the integer `0` (as one might naïvely expect) or the integer `-1` (as one might expect from the pseudocode `Neg n = -(n + 1)`)? If the former, then what is `subtract Zero Zero`? (I.e., is it `Zero` or `Neg Zero`? I hope that we don't get into IEEE 754 (http://en.wikipedia.org/wiki/Signed_zero) territory here!) If the latter, then there's no ambiguity, but I think that it's fair to say that that meaning isn't clear just from the type!


For minimum insanity, I think you'd have to assume that I meant Neg Zero to represent -1, Neg (Succ Zero) to be -2, etc., with an appropriate definition of subtract.


Well, you'd probably have to introduce a type such that every time you subtracted n from m you must prove that m is at least as big as n. Which would be a pain.

As usual, I really suggest playing with Agda or Idris to get a feel for what sophisticated types actually look and feel like.


https://hackage.haskell.org/package/type-level-numbers-0.1.1...

But it's not clear to me what happens when you try to compile and typecheck program that attempts to subtract numbers provided as input at runtime.


That link only goes part way.

What you'd need to do is something like

    createProof : (n : Nat) -> (m : Nat) -> Maybe (n >= m)
which could either succeed or fail to produce the needed proof at runtime. Without the proof it'd be impossible to perform the subtraction, so that branch of your program is statically inaccessible.


I think even better is to have a subtraction that returns a different type, where you can't prove that a > b. Where you can, I don't know that I object to the compiler tracking such things, though I also don't actually use a dependently typed language.


"Even "this type is an integer that must be non-negative" is a problem if that type allows subtraction. Now your compiler needs to know the the value of all such types at the time that subtraction was applied to them, to know if the operation is valid."

This is one of the common misconceptions about the stronger-typed languages. I know because I had it before I learned Haskell. To say that you have a guaranteed non-negative number does not necessarily mean that if you subtract two such numbers that the compiler must be able to prove on the spot that the result is positive.

What exactly it means depends on the language. Haskell is not particularly capable of representing that type (or at least, simple Haskell is not), so it'll happily let you write:

    subtract :: NonNegative -> NonNegative -> NonNegative
and then simply propagate the incorrectness if you do end up with a negative number, doing whatever the underlying system does. But a programmer will notice this, and a correct subtract implementation may be:

    subtract :: NonNegative -> NonNegative -> Maybe NonNegative
in which case the compiler isn't doing anything, really; the implementation will examine the two numbers and return Nothing if the subtraction is invalid.

In Haskell, this is a runtime check... it is simply one you can not ignore. It isn't an exception (very easy to forget to check for), and you can't use the result (if there is one) until you unwrap it, and the language syntax to do so tends to strongly encourage you to handle the error. You can not subtract two such numbers without having it shoved in your face that it may fail. You may then choose to fail to check it, since the language can't actually stop you, but you will be doing so with full knowledge of the fact that you are doing so.

And in general, that's how a lot of the "bad computations" that a type system prevents is done... it isn't "prevented" so much as "you are forced to deal with violations". For instance, imagine you are deserializing input from an external source... in a strongly-typed language, all the deserialization routines will have Maybe (or perhaps Either with errors) pervasively used, since deserialization routines are able to fail at pretty much any point. The language doesn't prevent failures from happening, because it can't. It prevents you from sweeping failures under the rug, accidentally or otherwise.

Thus, in Haskell, the first subtract example up above, while the compiler won't actually stop you from writing it, is incorrect; there will exist no implementation that doesn't behave badly for some inputs. In a dependently typed language, there exist safe implementations, but you will have to provide some sort of proof that it works. However, since in general it's difficult to prove, that would probably involve some local context, and it's not hard to imagine that in practice for such a simple thing you might still end up using the equivalent of the Haskell function. It just depends on what you have in hand.

In a way, programming in something like Go has a very similar feel in this way to programming in Haskell, where the fact that the error is constantly shoved in your face and you are forced to deal with it before moving on is the way it generally works. Haskell has a wide variety of clever ways of dealing with it, and Go mostly uses brute programmer force, but the feel is quite similar, IMHO.


It's also worth noting semantically what the type of `subtract` reads as. It says that for any two NonNegative typed values you can produce a new value in the type `Maybe NonNegative` which is the type of NonNegatives adjoined with one special "failure" value.

So this is a perfectly well-defined notion of subtraction on non-negative numbers. It reflects the obvious fact that NonNegatives are not closed under subtraction.


Common Lisp supports arbitrary type predicates, and I think that SBCL enforces them (not sure) at runtime. For instance, it will call the predicate before setting a struct's field to make sure that it returns true (under high safety compilation, low safety just trusts that it's correct)


That's more of a contract system than a type system, though.


> As such, we can think of these languages as type safe according to the null type system

I think a term like vacuously type safe is a good description of what Python/Ruby are under this set of definitions. So as to distinguish from languages where type safety actually means something slightly stronger than "all syntactically valid programs are accepted by the null type system".


A substantially better description would be "dynamically type safe".


In the context of this article that's just confusing the issue. "Type" is both well-defined and not the same as "type" in a Ruby context.


What is type in a Ruby context?


Usually in a Ruby context you'd refer to something like Fixnum or String as types when really that's their "class"

    irb(main):005:0> 3.class
    => Fixnum
Such a "type" doesn't have much to do with "types" as discussed when referring to static types. They're simply orthogonal concepts.


The result of the #class method does not _fully_ encapsulate an object's type (e.g. #extend, #define_singleton_method, etc may have been used to modify the object's type) but every Ruby expression does have a type in exactly the sense as used when talking about static typing; the only difference is when you can (in general) determine that type. The #class method is not orthogonal to type, it is merely incomplete.


Sure, but that's not really relevant. All of your "typing" occurs at runtime and thus is entwined in the language dynamics and not its statics.


I don't think it's quite right to denigrate the actual safety you get in dynamic languages by referring to their type systems as such.

They are indeed safe, you won't get undefined behavior in the vein of C/C++. In fact, if they weren't such, there wouldn't any point to using them over more performant architectures.

The only time as a Ruby developer I've ever encountered a segfault is when a library calls out to a C extension, the problem is fixed by switching to a different gem. I'm safe in that whenever my program fails, the bug is in something I've written. I've never found an actual bug in Ruby.

Sure, I'm giving up a hell of a lot of performance for this safety, but it's more than worth it for my company's use case.

In fact, I can't even imagine a form of safety you could add to Ruby that isn't a Very Hard Problem, e.g. safety from race conditions. TDD in Ruby gives you a kind of certainty about the correctness your code that is hard to get in other languages. If something breaks, just add another test.


> TDD in Ruby gives you a kind of certainty about the correctness your code that is hard to get in other languages.

? TDD is pretty easy in many other languages, both dynamic and static. In some ways, static type systems can enhance TDD by making testing to validate invariants that are beyond what the type system can statically verify easier, because static types can be used to generate test data automatically, so instead of specify specific tests, you specify the universally quantified invariant that must hold and the testing framework generates numerous tests. Both Haskell and Scala (and no doubt other languages) have testing frameworks that leverage the type system to allow this.


> Sure, I'm giving up a hell of a lot of performance for this safety, but it's more than worth it for my company's use case.

Surely you're not trading performance for safety, as dozens of languages are "safe" while having much higher performance over Ruby including similarly dynamic languages JavaScript and Lua, or others such as Java, Scala, Clojure, Haskell, OCaml, and so on.

There are, of course, other reasons to choose Ruby, perhaps because of Rails, large talent pool, code conciseness, etc.


You're putting words in my mouth by implying I denigrated Ruby as unsafe using a much stronger definition of safety. Ruby is type safe under the article's definitions, but I suggest that this claim doesn't carry much information content because a conclusion about the null type system is such a weak claim, like a claim about the null set.


Reducing dynamic typing's safety to a null set claim is exactly the sort of denigration I'm arguing doesn't make sense. Type doesn't disappear just because you aren't encoding it as a first-class PL design concern.


This discussion is just going to reduce down to a runtime tag vs formal type definition. This particular article is using the term type to mean formal type, and my comment is also under this assumption. Under that definition Ruby does have a null type system. That's not a moral judgement of it's design, it's just a fact that follows from the definitions.

Rather than debate this for the gazillionth time, I'll just point you at tel's article which breaks it down very nicely: http://tel.github.io/2014/07/08/all_you_wanted_to_know_about...


As usual, when freyrs3 says "type" he means something simpler and more specific than what you appear to mean. The whole idea is that "type safety" is both specific and not sufficient to mean anything valuable.


freyrs3 didn't say that these languages were unsafe, he said that they were vacuously type safe. The kinds of safety you mentioned do not arise from (static) typing and so, regardless of their validity, it's just not on topic.


How could it not be on topic, when the article clearly said that a type system is that which programs written using it “cannot go wrong”? He then proceeded to use as examples C's lack of memory safety.

With a definition that broad, you really need to examine every extant approach to solving the problem, even if you don't want to go in that direction.


He starts with an intuition-bound definition then clarifies as time goes on. Ultimately "go wrong" needs to be embedded in the type system before it applies.

Other things are obviously language safety features, but just are not "types" in the sense of how some languages are "type safe".


"Well typed programs don't go wrong" only for sufficiently narrow definitions of "go wrong". They don't go wrong by doing undefined behavior. They can still go horribly wrong by doing unintended behavior, though. Type safety doesn't eliminate all bugs, just a subset of them.


Depending on your type system, though, the subset that is eliminated can be larger or smaller. In Java, it's a smaller subset; in Haskell, it's a larger subset (because e.g. the type system statically rules out the possibility of null), and in a dependently typed language like Coq or Agda, it's a ridiculously large subset of bugs. It depends greatly on your particular type system.


I'll agree that that point may not have been highlighted quite as strongly as it could have been, but that idea is fully included in the original post, and you can view the purpose of examining so many language's type systems as comparing & contrasting the specific "go wrongs" that their type system addresses.


I tried to write a post a little bit similar to this one. I'd like to hear any feedback: http://tel.github.io/2014/07/08/all_you_wanted_to_know_about...


It is also interesting to consider the PL definition of type safety: the progress and preservation theorems.

Progress: if an expression is well typed then either it is a value, or it can be further evaluated (in the sense that "an appropriate evaluation rule exists").

Preservation: if an expression e has type t, and is evaluated to e', then e' has type t.

That's it, really (well, you can do a few extensions to allow errors such as division by zero, or take memory into account).

If your language has both of these properties, they tell you that typed expressions will behave nicely at run time.


That's it for the definition, but it's far from it for the meaning. Python/Ruby make a great example when they're interpreted as vacuously type safe—clearly this definition needs other components in order to be meaningful.

And that's really the rub. You want a language which is type safe and possessing expressive types. Getting both is challenging.


Indeed. In dynamically typed, every expression is well typed and there is always an evaluation rule that works, so this trivially holds but means nothing.


Saying that e.g. Ruby is "vacuously type safe" strikes me as fairly misleading. Consider an extension to Ruby which defines the ++ operator as casting its argument as an integer (e.g. Fixnum) and incrementing it. "Properly" defined, this operator would break type safety in a way that __can not__ currently be done in Ruby (i.e. modifying a string from "abcdefgh" to "bbcedefgh", "abcdefgi", or some other nonsense). This is clearly a type error, and it is one that Ruby does not allow, so describing Ruby's type _safety_ as vacuous strikes me as wrong.

Additionally, Ruby prevents e.g. calling on an undefined method on an object. Given the way the interpreter works this comes trivially, but that does not mean that it does not affect type safety.

To summarize: there are certain classes of type errors related to e.g. casting which would be dumb to allow in a (non-toy) dynamically typed language and others which come with no particular cost. This does not mean the type safety of the language is "vacuous" (though it may negate the need for an explicit "type checker").


What you're referring to may be both (a) true and (b) valuable but simply isn't "type safety" as is being discussed. It's unfortunately confusing to talk about this stuff since "type" is an overloaded word.


To those that know more than I: my understanding before this article was that the only "type-safe" languages were or like Haskell, OCaml, Idris, Agda, etc...

If that's not the case is "type-safe, pure, and functional" a good epithet to extend, say, Haskell's "type-safe" tag with many of its other features that weave with the type system well?


"Type safe" is both a marketing line and a technical definition. As a technical definition, many languages are type safe (including, say, Ruby) but they achieve this by having very low meaning to their (static) types.

Typically, the marketing definition means "type safe and expressively typed" which outlaws trivial type systems and their trivial type safety.


I read your blog post, thanks for the longer description :)


I believe Haskell would be considered have "strong static typing". Unfortunately, "strong" in this context is less precise than one might like: both Haskell and Java are generally considered "strongly typed", but Java allows e.g. type casting in a way that will not be statically type checked.

Haskell also uses type inference rather than requiring type declarations, but that is not _directly_ related.


Java's casting is not totally statically type checked. IIRC, the behavior at runtime is very clear, though - you either get a successful cast, if the object you're casting really is an instance of what you're trying to cast it to, or else you get a ClassCastException (from memory, actual name may vary).

That's not totally static, but within the definitions of the article, it might be good enough to count as "type safety" without falling into the "null type safety" category.


Well, "strict" might be a better name for your "strong".

I don't think it's any less precise a concept. Java type system is less strict than Haskell's, has similar "strictiness" to Python or Ruby (while quite different in static typing), and is more strict than C, or Perl.


Type casting has to be modeled (more or less) as (a) upcasting only (b) a failure of type safety, or (b) an admission that the type system is entirely vacuous.


No, you can have downcasting that is checked for validity at runtime, a la Java (or C++ with RTTI). This is not a failure of type safety, since only valid casts will be performed. It's not vacuous, either - it's not just one big type.


What happens when a cast is invalid?

You have type-safe casts in Haskell, for instance, where the failure is reflected at runtime.

    cast :: (Typeable a, Typeable b) => a -> Maybe b
Those are alright.


In Java, IIRC, the cast fails with a ClassCastException. That seems to me to be "failure is reflected at runtime".

In C++ with RTTI, IIRC it returns null. That's also "failure at runtime" (of a sort), but you have to check it.


RTTI returns nullptr when casting a pointer, but it does throw an exception when casting references. Both of these behaviors are well-defined, although the first is kind of questionable.


Haskell also allows casting in a way that defeats static type checking - `unsafeCoerce`


"Type-safe" as used in the FP community tends to be a much stronger claim than what's described in this post. At the very least, throwing a runtime exception is considered "going wrong" in the Haskell community, and this post keeps throwing out examples of runtime exceptions as avoiding "going wrong".


Article claims that, for Python and Ruby, "language semantics gives meaning to all programs, so the well defined and all circles of our diagram coincide".

How can that be so, if Python and Ruby provide escape hatches that allow C libraries to be called?

Perhaps what is meant that in the absence of using certain features, the core semantics is well-defined.


So type safety is just a synonym for "no undefined behavior"? What about code like "i=i++", which leads to undefined behavior but doesn't seem to have any type errors? I thought type safety referred to only a subset of undefined behaviors, namely using a value of type A as though it had type B.


This seems to be C/C++ code. The article said that C and C++ were not type safe, so "doesn't seem to have any type errors" within C/C++ doesn't mean that it's type safe.

That said, the article did say that Java was type safe, and you can write that same expression in Java. I don't recall whether the result is undefined in Java.


As far as I know, Java has no undefined behavior in the C/C++ sense (nasal demons), and "i=i++" is well defined in Java.


Type safety isn't just "no undefined behavior", but it cannot possibly exist in the presence of undefined behavior.


> A classic English example is Chomsky’s “Colorless green ideals sleep furiously”—perfectly syntactically correct, but meaningless.

Interesting. But is it meaningless? Isn't its very meaninglessness its meaning because it has come do be known due to it being bandied about far and wide as an example of meaninglessness.

Maybe it's something more. Clearly, for all X, X cannot be at one and the same time both green and colorless. Similarly ideals do not _literally_ belong to the class of things which can sleep. Finally, sleeping as an activity is never done furiously as fury signifies intent and whatever else may be true when one is asleep it is certainly true that one lacks intent when asleep.

Thus packed into that sentence are three ways in which it _cannot_ be the case or refer to something _actual_ in the world. Is it in this way that it fails to have meaning and so will always continue to fail to have meaning?

postscript:

> Let’s deconstruct this phrase and define its parts, considering the second part first.

Let's use the word analyse rather than deconstruct when that is the word needed.


These are dramatically philosophical, but also dispatched by epistemologists for a long time now.

1. "Isn't its very meaninglessness its meaning" is a confusion of reference and sense. The quoted sentence "Colorless green ideals sleep furiously" interpreted as an object in its own right holds meaning as a common example of syntax and semantics but if you "remove the quotes" and interpret its sense then we run into problems... and obviously that's what everyone is talking about when they bring it up.

2. "Is it in this way that it fails to have meaning and so will always continue to fail to have meaning?" Yep. If we assert that we're stating this sentence as an assertion or judgement of reality then we would like to evaluate its "truth". Even without getting too mystical it's a long established sense of the definition of truth that it ought to somehow correspond with reality and that any notion of truth which does not is hypothetical or meaningless. This is encapsulated in Dummett's C and discussed by Per Martin-Löf here: https://michaelt.github.io/martin-lof/Truth-and-Knowability-...


> But is it meaningless?

Well, the way I use words it is not meaningless. It has a pretty clear meaning. Admittedly, that meaning may not describe any possible state of the world (although I could perhaps argue that), but I can only spot that because I know what the phrase means.




Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: