Should Your Specification Language Be Typed? (1999) [pdf] (microsoft.com)
45 points by pron on Sept 27, 2015 | 21 comments



The opinion piece (that's what it is) starts with an assertion that types make it easier to find bugs. I think the reality is more nuanced.

When types were pushed into the C language (function prototypes, etc.), they fixed a lot of bugs. The compiler would prevent you from passing a short where an int was expected, and so on. Because those were common and hard-to-debug mistakes, the experience increased the popularity of types.

Typing of higher level objects is less useful. When you pass an object into the wrong function, the code usually doesn't even pretend to work (unlike an integer passed in as a float parameter). Also, because objects are so different, it's rare to accidentally pass an object as something else. The only time I've seen that happen is in very convoluted, multi-layered code bases.

We can see something similar in languages like Ruby or Perl, where it's actually kind of nice that you don't have to worry about whether a variable holds a string or integer or whatever. (For some reason, Javascript's implementation of the same idea seems to be a bit more annoying, but that could just be me).

So strong type systems are not necessarily a boon. (Of course, on the other hand, we have Haskell, where the type system can help reduce bugs).


Types help you enforce invariants. When you need to care about representation, representational types are quite useful. But it turns out, when we don't care about information interchange or a whole lot about performance, we usually don't care about representation as long as it's handled correctly, and a lot of automatic systems are okay (or better) at handling representation.

More of what we care about is domain relevant, and that's a place where enforcing invariants can be important. As an incredibly simple example, passing a "bid" price where an "ask" is expected could be an expensive error, and is unlikely to trigger an exception.
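
Even in plain C, a pair of throwaway wrapper structs is enough to make that mistake a compile error (a minimal sketch with invented names, not code from any real system):

    /* Distinct single-field structs: both are "just a double", but the
       compiler refuses to substitute one for the other. */
    typedef struct { double value; } bid_t;
    typedef struct { double value; } ask_t;

    double spread(ask_t ask, bid_t bid)
    {
        return ask.value - bid.value;
    }

    /* spread(bid, ask), i.e. with the arguments swapped, fails to compile. */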

A type system like Haskell can be a big win, but we can enforce surprising things in something as simple as C. I had a project where I had several threads devoted to different activities, operating on their own static data. With a little creativity, I was able to track at compile time which threads a function could be run from and detect when I accessed data from the wrong thread. Without adding any run-time overhead, and while leaving me a free hand in organizing my data and functions in both my source files and my executable. It was tremendously convenient, when changes in specifications required me to move functionality from one thread to another, to be told everything that needed to change along with that. Over the course of the project, I can count on one hand the number of concurrency issues found in testing and only one - a complicated livelock - ever made it into production.


That sounds really fascinating and educational. Could you possibly provide some more details or examples of how you accomplished that in C?


I'm a bit sleep deprived (new baby) but I'd like to try to respond before the reply link goes away.

There are a few layers to this.

I threaded knowledge of which thread a function could run on through the call chain as the first argument. It was always called `th` and took the form of an empty struct. It turns out that if you pass an empty struct by value and don't do anything but pass it along, modern C compilers seem to generate machine code identical to what you'd get if you removed the argument entirely (obviously, this isn't something you could strictly rely on, but it only affected efficiency, not correctness, and if you're at the level where you care about that scale of efficiency, you should be verifying all manner of assumptions when moving between compilers!).

So now at any given point in any given thread-pinned function, you've got a variable in scope called `th`. Calling another thread-pinned function on the same thread looks like `foo(th, other, args)`. If that function isn't supposed to run on this thread, you get an error about trying to pass something of type `server_thread_t` as an argument of type `main_thread_t` (or whatever).
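
Very roughly, the token types and call pattern looked something like this (names invented for illustration; the real code certainly differed, and note that empty structs are a GCC/Clang extension rather than strict ISO C):

    typedef struct {} main_thread_t;    /* zero-size token types */
    typedef struct {} server_thread_t;

    /* A function pinned to the server thread takes its token first. */
    void handle_request(server_thread_t th, int request_id);

    void server_loop(server_thread_t th)
    {
        handle_request(th, 42);   /* fine: token types match */
        /* some_main_only_fn(th);    rejected: server_thread_t passed
                                     where main_thread_t is expected */
    }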

Controlling data access could be done with getters and setters. I took a slightly different approach (which I'm not convinced is better, but it didn't cause me any pain on this project): I defined a macro for declaring thread-bound global data, which mangled the name and also created a no-op checking function. I had another macro that would take the simple name, check the thread, and resolve to the thing with the mangled name (as an lvalue). Having called that macro `L`, I can now treat `L(foo)` just as if it were a global variable named `foo`; at runtime that's all it is, but at compile time my access to it is checked.
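
The data-side macros might have been shaped roughly like this (a speculative reconstruction; `THREAD_DATA` and the mangling scheme are my inventions for the sketch):

    /* Declare a thread-bound global: mangle the name and emit a no-op
       check function that only accepts the matching thread's token. */
    #define THREAD_DATA(thread, type, name)                                   \
        type name##_pinned;                                                   \
        static inline void name##_check(thread##_thread_t th) { (void)th; }

    /* Checked access: call the (empty) check function, then yield the
       mangled global as an lvalue; zero cost at runtime. */
    #define L(name) (*(name##_check(th), &name##_pinned))

    THREAD_DATA(server, int, request_count)

    void handle_request(server_thread_t th, int request_id)
    {
        L(request_count)++;   /* compiles only where th is a server_thread_t */
    }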

I had a few other macros for putting a "lying" `th` in scope for those places where I really needed to cross threads (`CROSSTH` for "I am accessing thread X's data from thread Y", `BEFORETH` for "I am touching this thread's data before any threads were spawned", &c).

Oh, and the definition of the `th` argument in functions was baked into a macro, so a signature looked like `int foo(TH(main), int something, char other)`. `TH` also added some attributes; I know there was `unused`, and there might have been others...
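
A guess at the shape of those last macros (again invented details, shown only to make the idea concrete):

    /* TH() declares the token parameter and marks it unused, so the
       compiler stays quiet when a function never touches `th` itself. */
    #define TH(thread)  thread##_thread_t th __attribute__((unused))

    int foo(TH(main), int something, char other);

    /* CROSSTH() deliberately forges a token for the rare, explicitly
       marked places that must cross threads ({} needs GCC or C23). */
    #define CROSSTH(thread)  thread##_thread_t th = {}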

Hopefully that gives some sense of it. Let me know if you have further questions! :)


Thank you for that write-up! Sorry that my response is so slow. (Seems I can still reply several days later, wonder what the limit actually is)

That's definitely a nice solution to the problem, and I found it pretty enlightening (having sadly not used anything as strongly typed as Haskell).

In the case of global variables, if performance didn't matter, it would be possible to just pass around a pointer to a struct of the thread-local/thread-locked data and use it directly. I spent a while trying to come up with alternatives to or a generalisation of your approach, and found it really hard to match the same guarantees without a run-time cost. One idea I got from your description is putting access to certain data/functions in scope in a type-checked way. So you could put multiple such contexts (pointers, or other access methods like functions or even macros) in scope at the top of the function/block, rather than having to type-check every access throughout the function using an implicit control (e.g. the `th` variable). But really, that's just equivalent to using the L macro like so:

    void inc_foo(video_thread_t th) {
        video_thread_data_t *vid = &L(video_data);  // checked global access
        vid->foo++;
        vid->bar++;
    }
I think something like this looks more natural though.


> We can see something similar in languages like Ruby or Perl, where it's actually kind of nice that you don't have to worry about whether a variable holds a string or integer or whatever. (For some reason, Javascript's implementation of the same idea seems to be a bit more annoying, but that could just be me).

Well, for Perl that's likely because the conversion of the variable to the desired type is controlled by the operator, and as such Perl makes a distinction between operators that work on numeric types (the usual assortment: +, -, /, *, ==, !=, <, >, <=, >=, etc.) and those that work on strings (".", x, eq, ne, lt, gt, le, ge, etc.). This makes the intended operation obvious, so explicit casting between types is often not required.

For example, in JS, what is the following doing? String concatenation or addition? What if the types are mixed?

    var c = a + b;
In Perl, the equivalent is one of the following, depending on your purpose.

    my $c = $a + $b; # Addition
    my $c = $a . $b; # Concatenation
Note there is no ambiguity. If warnings are enabled and a string can't be cleanly converted to a numeric type, Perl will report as much on STDERR. This all stems from a core tenet of Perl: similar things should look similar, and different things should look different. String concatenation and addition aren't really similar, as much as they may seem like it at first blush (for example, concatenation is not commutative).

As for Ruby... I don't know. It uses + for string concatenation (but also concat, I believe). I suspect having actual methods to disambiguate intention helps for Ruby.


Those functions still pretend to work until you actually call them. I would say that it's a pretty big plus to know that your code is faulty without the evaluation actually having to run through the buggy function.


Here's a nice review of empirical studies on typing (http://danluu.com/empirical-pl/) by Dan Luu.


>the code usually doesn't even pretend to work

It does pretend to work for a while, often failing in another function, in another file, with a cryptic message. All preventable by a simple type check.


> When you pass an object into the wrong function, the code usually doesn't even pretend to work

It very much does! It pretends to work whilst, in many cases, performing buffer overruns, arbitrary code execution, and so on.


> The opinion piece (that's what it is) starts with an assertion that types make it easier to find bugs. I think the reality is more nuanced.

It is, and the article ends with the recommendation that specifications (or any code that requires verification of interesting properties) should be only optionally typed, with various pluggable type systems, because every specific type system would make an interesting class of problems harder to specify than untyped code.


(Somewhat OT, but...)

Constructive logic accepts the truth of "every integer is either even or odd", but only because we have an effective means of determining which alternative holds for any integer. It does not accept the statement "every real number is either rational or irrational"; given a real number, say as a convergent series, we have no effective means of determining whether or not it is rational.

I don't understand this. Consider the smallest number N so that any N points in the coordinate plane in general position contain a subset of 7 points that form the vertices of a convex polygon (taken from [0], recently discussed here on HN): is this number, which is definitely an integer, even or odd? We don't know, because we don't know what the number is. Isn't that the same situation we are in with regard to the convergent series? What's the difference? If we actually computed the decimal expansion of the value of the convergent series, we would be able to look at it and see whether it was rational or irrational. There is the minor difficulty that the expansion is in most cases going to be infinite, but so what? We're talking about mathematics here, not computation.

[0] https://liorpachter.wordpress.com/2015/09/20/unsolved-proble..., "Fifth grade" section


A convergent sequence doesn't have to be a digit expansion and so might not have the nice property that "if it's just repeating, it's rational, otherwise irrational."

Furthermore, it wasn't proven until the 18th century that pi is irrational (https://en.wikipedia.org/wiki/Proof_that_%CF%80_is_irrationa...). To this day, it is unproven that pi+e is irrational, yet its decimal expansion is easy to compute (and you'd be a fool to bet on it being rational).


I don't think that the expansion being sometimes infinite is a minor difficulty. In fact it sounds pretty major, similar to the halting problem. Unless you can prove a priori whether it's infinite or not.

I'm not knowledgeable about this topic but the quote suggests you can't prove that.


"In fact it sounds pretty major, similiar to the halting problem."

I think it's identical to the halting problem. The issue is that, for irrational numbers, there is never a point at which you can stop and say "okay, we know it is irrational now".

In fact, if we had an algorithm that could always tell us whether a sequence converged to a rational number, we could solve the Halting Problem:

For any Turing Machine you wish to check for halting-ness, produce a sequence S by successively appending a full encoding (any will do) of the Turing Machine's state at every step to the decimal expansion of the previous value in S. Apply your algorithm to S. If S is irrational, then we know the machine never halts. If S is rational, then we know it either halts or repeats after a finite number of digits, so we can simply run it until it does one or the other (this may take arbitrarily long, and an arbitrarily large amount of storage, but both are finite).


Everyone has gotten distracted by the infinite expansion and has missed the point of my question. Let me try again, with no infinite expansion.

The constructivist position is to refuse to accept the statement "every real number is either rational or irrational" because they're not in possession of proofs, for each possible real-valued expression, that its value is rational or, alternatively, a proof that its value is irrational. And yet, according to the quoted paragraph, they accept the statement "every integer is either even or odd", quote, "because we have an effective means of determining which alternative holds for any integer". My argument is that they should reject this statement on the same grounds: I named an integer whose value is not known, and therefore it is not known whether it is even or odd. There is not, in fact, an effective procedure for determining the evenness or oddness of the value of an arbitrary integer-valued expression. It's only after one actually knows the value that answering the question becomes easy.


Constructivists don't accept that you've named an integer. They would say: an integer is any member of the smallest constructive set that contains 0 and one more than any member of that set. Since the set is a constructive set, if you have anything you claim to be an integer, you must be able to prove that it is either 0 or one more than an integer. The number you named isn't zero, so presumably it must be one more than some integer ("one less than the number you named"). But if I then ask you whether that number is one more than an integer, and then the same about that one, and so on, at some point you will be unable to answer me. Since you can't provide the evidence that qualifies something as an integer, there's no reason for me to think it is an integer.


I see. The difference, then, between that situation and the situation where I provide a real number named as the sum of an infinite series is that in the latter case I may be able to provide also the modulus of convergence, thus proving that the number is a constructive real number [0]. This is different from a situation where I have a number defined non-constructively and no series is known that converges on the number.

[0] https://en.wikipedia.org/wiki/Constructivism_%28mathematics%...


Constructivists reject the notion of an integer whose value is not known (more precisely, whose value cannot be computed in finite time), so there is no room to even ask whether "it" is odd.

Infinite vs finite is an important distinction. A fact that takes infinite time to prove is not constructive and is likewise not computable.


IANAM, but I think the difference comes down to the fact that we know the integer you are producing is one we possess a proof for. We do not know that for reals.


A bit quaint in the light of the past 15 years. I find it amusing that the paper uses the phrase "type-based formalisms" instead of "type theory".



