Just-In-Time Static Type Checking for Dynamic Languages (arxiv.org)
59 points by pron on April 19, 2016 | 29 comments



Fine, but there is surprisingly no mention of Lisp in the related work, even though its specification defines a COMPILE function. In a language where the compiler is part of your runtime, type checking is equally available at different execution times. I mean, when I connect to a running instance of SBCL (Common Lisp) to compile my programs, that's exactly what happens: static type analysis at runtime. Let's say I define a function which builds a function and compiles it:

      CL-USER> (defun foo (x) (compile () `(lambda (u) (aref ,x u))))
      FOO
Call it with an array:

      CL-USER> (foo #(2 3 2))
      #<FUNCTION (LAMBDA (U)) {100B3C610B}>
      NIL
      NIL
The resulting closure gives access to elements of that array:

      CL-USER> (funcall * 0)
      2
Call FOO again, with a bad input:

      CL-USER> (foo 332)
      ; in: LAMBDA (U)
      ;     (AREF 332 U)
      ; 
      ; note: deleting unreachable code
      ; 
      ; caught WARNING:
      ;   Constant 332 conflicts with its asserted type ARRAY.
      ;   See also:
      ;     The SBCL Manual, Node "Handling of Types"
      ; 
      ; compilation unit finished
      ;   caught 1 WARNING condition
      ;   printed 1 note
      #<FUNCTION (LAMBDA (U)) {100B41E16B}>
      T
      T
COMPILE reports a type error (a condition that I could handle if needed).
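
For instance, the warning can be trapped around the call to COMPILE (a minimal sketch; HANDLER-CASE unwinds out of the compilation as soon as the warning is signaled):

      (handler-case
          (compile nil '(lambda (u) (aref 332 u)))
        (warning (w)
          ;; we land here when the compiler warns about the type conflict
          (format t "rejected: ~a~%" w)
          nil))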


Is it catching the type error before executing the code with the type error? The technique presented by the authors does not wait until there's a type mismatch in an individual expression inside of a function. It catches the type mismatch before executing the function. That's what they mean by "just-in-time static". It happens "just-in-time" because it's during program execution, but it's "static" because it happens before executing the function itself.


Yes, during compilation.

        (defun foo (x) (compile () 
                         `(lambda (u) 
                            (print "Executed")
                            (aref ,x u))))

        (foo 0)
... reports a compilation error and of course does not print anything. If, however, I directly type the offending form (aref 0 0), a runtime type error is signaled.


I believe that is still quite different from the technique presented by the authors. Their technique applies at program execution, but prior to the execution of the individual functions. What you're showing is all prior to program execution.


No, this is during program execution. Lisps blend compile time and run time together. You always have a compiler available, so you can compile new stuff (and recompile existing stuff) in the middle of your program's execution. And compile-time typechecking occurs whenever you compile.
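
For example, nothing stops you from redefining and recompiling a function mid-execution (a minimal sketch):

        (defun version () :v1)  ; initial definition

        ;; ... later, while the program is running:
        (defun version () :v2)  ; redefine it
        (compile 'version)      ; explicitly compile the new definition
        (version)               ; => :V2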


The type analysis happens only after I enter "(foo 0)" in the REPL, which is execution time.


Common LISP is a bit weird because its type system was originally designed primarily as a way to speed up executables (by letting the compiler aggressively optimize memory representation and remove runtime checks).
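
For example, declarations like these let the compiler pick an unboxed array representation and drop runtime checks (a minimal sketch):

      (defun sum-doubles (v)
        (declare (type (simple-array double-float (*)) v)
                 (optimize (speed 3) (safety 0)))
        (loop for x across v
              sum x of-type double-float))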

Nowadays that kind of efficiency isn't as big of a deal as it used to be, and people tend to lean more toward using the type system for error detection, but I still can't shake the feeling that CL's type system is a bit off when you compare it to other modern type systems.

(this is no excuse to not at least cite Common LISP though)


No, it's not. That's exactly the same use case those slow dynamic languages have now. This kind of efficiency is still a big deal in languages designed by people with the same mindset.

Getting errors is the same whether it happens at compile time, in the test suite, or at the customer, though having them at compile time is of course preferred. In Lisp you have all three cases; in static languages, only the first.

These guys solve their type-checking performance problem by deferring the check to dynamic code. Checks are expensive if your type-checking code is slow or the source code is a dynamic mess, and we are talking about Ruby on Rails here, which is the biggest dynamic mess one can think of. It's similar to the problem the JIT guys face: deferring compilation to only the code that is actually invoked, skipping dead branches.


A compiler with some compile-time type checking has been implemented starting around 1985: the public-domain CMUCL compiler (named Python) by Robert MacLachlan.

See 'Types in Python' from the manual:

https://common-lisp.net/project/cmucl/doc/cmu-user/compiler....

A variant of that compiler is used in the now popular SBCL implementation:

http://www.sbcl.org/manual/index.html#Handling-of-Types
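
For instance, given an FTYPE declaration, SBCL (whose compiler descends from CMUCL's Python) warns about a mismatched call at compile time (a minimal sketch):

    (declaim (ftype (function (fixnum) fixnum) twice))
    (defun twice (n) (* 2 n))

    ;; compiling this signals a warning along the lines of:
    ;;   Constant "oops" conflicts with its asserted type FIXNUM.
    (defun bad () (twice "oops"))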


This isn't static type checking, though; if it happens at run time, then it is by definition dynamic. And these languages already do that. What problem is this solving? I'll already get a runtime error now; how is this runtime error better?


It's somewhere between the two. Consider this program (in C-ish pseudocode):

    main() {
      foo();
    }

    foo() {
      if (false) {
        int i = "not an int";
      }
    }

    bar() {
      int j = "not an int";
    }
A statically typed language would report two type errors. A dynamically checked language would report none. Hummingbird would report one.

It gives some of the benefits of static typing: it reports errors in code paths not hit through imperative branching. But it doesn't give all of them, since it doesn't report errors in uncalled functions.

However, what it gives you in return is the ability to handle functions that don't even exist at program startup and are instead defined at runtime later in the program's execution through metaprogramming.


>Hummingbird would report one.

When?

>However, what it gives you in return is the ability to handle functions that don't even exist at program startup and are instead defined at runtime later in the program's execution through metaprogramming.

Those are already dynamically "type checked", how is this adding anything?


> When?

It would report the error in foo() when foo() is invoked, even though the code path containing the error is never executed.

> Those are already dynamically "type checked", how is this adding anything?

Only code paths that actually get executed are dynamically checked. This checks all code paths in the method.


>It would report the error in foo() when foo() is invoked

That's "dynamic typing".

>Only code paths that actually get executed are dynamically checked. This checks all code paths in the method.

Ok, so it is extended dynamic typing. Why on earth is it being described as having anything to do with static typing?


Because it uses static analysis techniques, that's all. You type variables and consider all execution paths at once, instead of executing code and checking the types of values.

In particular, the code that was generated won't get executed if it can be proved beforehand that there is a type error.
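
Concretely, in the Lisp setting, COMPILE's secondary return values give you exactly that: you can refuse to execute code that failed the check (a minimal sketch):

    (multiple-value-bind (fn warnings-p)
        (compile nil '(lambda (u) (aref 332 u)))
      (if warnings-p
          (error "refusing to run: the compiler proved a type error")
          (funcall fn 0)))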

I agree that in effect you won't get an error message until execution, but that does not mean this is dynamic typing. Execution time and compile time are generally interleaved with metaprogramming, so it can make sense to do static analysis of code at runtime.

What I found strange is the fact that it happens "Just-In-Time", which for me means at the last possible moment. That runs counter to what static typing aims to do. I would prefer "ASAP static typing" (the code could be analyzed long before it is run).

That's why I still find the COMPILE approach of Lisp better, since we have precise control over when analysis is done.


vs. "How can a dynamically typed language not actively prevent static checking?" http://lambda-the-ultimate.org/node/5320


Just reading the abstract, it's not entirely clear what's going on. My impression is that it instruments the app at runtime to gather type signatures for methods, and then... it saves those type signatures somewhere they can be used to statically type-check the program later?

Two things come to mind when considering this approach:

1. How does it handle the type signatures changing as the code is modified? Either it must detect when the code is modified and invalidate its saved signatures for that (although dynamic metaprogramming would make that hard), or you'll just have to live with static type-checking temporarily failing until you've run the program again to let it gather a new type signature.

2. This can only be accurate if you've actually tested all of your code paths, because any code path you don't test may call some method using a type signature that Hummingbird hasn't seen before. Granted, the scripting language community already focuses heavily on 100% code coverage (though 100% code coverage doesn't necessarily mean all the code paths have been tested), but it still seems like an important limitation to be aware of.


Point 1 is explained, at a high level, in the introduction:

"In Hummingbird, user-provided type annotations actually execute at run-time, adding types to an environment that is maintained during execution. As metaprogramming code creates new methods, it, too, executes type annotations to assign types to dynamically created methods. Then whenever a method m is called, Hummingbird statically type checks m’s body in the current dynamic type environment. More precisely, Hummingbird checks that m calls methods at their types as annotated, and that m itself matches its annotated type. Moreover, Hummingbird caches the type check so that it need not recheck m at the next call unless the dynamic type environment has changed in a way that affects m."

Personally, I think they're pushing the bounds of what "statically" means, but hey, they do say it's "just-in-time static". It does indeed do the checking before executing the function; it just happens to be immediately before executing the function. A paragraph later:

"To ensure our approach to type checking is correct, we formalize Hummingbird using a core, Ruby-like language in which method creation and method type annotation can occur at arbitrary points during execution. We provide a flow-sensitive type checking system and a dynamic semantics that invokes the type system at method entry, caching the resulting typing proof. Portions of the cache may be invalidated as new methods are defined or type annotations are changed. We prove soundness for our type system."

Based on how their approach works, I don't think Point 2 stands.


Oh, OK. So it's not really static at all; they're just calling it that because it sounds better. It's simply dynamic type-checking that checks the types immediately prior to calling the function instead of checking them at the top of the function body. Offhand, it seems that the only benefit of this approach (checking prior to calling the function instead of checking inside it) is the ability to cache the results and skip checking the same call site in the future if nothing's changed.
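
Roughly, a hypothetical sketch of that caching idea (invented names, not the paper's actual machinery), staying with the Lisp used elsewhere in this thread:

    ;; Bumped whenever a method is (re)defined or an annotation changes.
    (defvar *env-version* 0)

    ;; Maps a function to the environment version it was last checked in.
    (defvar *checked* (make-hash-table :test #'eq))

    (defun call-checked (fn check &rest args)
      (unless (eql (gethash fn *checked*) *env-version*)
        (funcall check fn)                 ; the "static" check of FN's body
        (setf (gethash fn *checked*) *env-version*))
      (apply fn args))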

Being able to significantly optimize type-checking for scripting languages like this, to the point where you can actually run it in production, seems quite valuable. But calling it "static type-checking" is simply wrong. Granted, they're calling it "just-in-time static", but that phrase is contradictory; it would be like saying "dynamic static type checking".


1. When the only way to modify code is to create a new function, this technique would work fine.

2. I think you're right.


Doesn't sound like it solves my main objections to dynamic typing:

1. No compile-time type checking.

2. No autocomplete / intellisense in IDEs (you can try but it's never very good).


My main objection is always readability. Sure, it is nice that when I am writing code I know the type of each variable off the top of my head, but when I am reading code, the lack of type information becomes a nightmare.


> My main objection is always readability.

Strange, that's my objection to type declarations.


I'm not sure I would use the word "readability" there. I'd rather say that types aid documentation.


    Factory<MarketDataService<Stock>> marketDataService = new Factory<MarketDataService<Stock>>()

is objectively less readable than

    def marketDataService = new Factory<MarketDataService<Stock>>()

And even the latter is not the best:

    def MyClass(marketDataService) {
        this.marketDataService = marketDataService
    }

It's IMHO the most readable. If the name is self-describing and there are unit tests that prove I'm receiving the correct type, I could not care less about the type itself.


Alternatively, in MLs (this sample specifically is OCaml) :

    type expression =
    | Number of float
    | Add of expression * expression
    | Subtract of expression * expression
    | Multiply of expression * expression
    | Divide of expression * expression

    let rec eval expr = 
        match expr with
        | Number n -> n
        | Add (expr1, expr2) -> (eval expr1) +. (eval expr2)
        | Subtract (expr1, expr2) -> (eval expr1) -. (eval expr2)
        | Multiply (expr1, expr2) -> (eval expr1) *. (eval expr2)
        | Divide (expr1, expr2) -> (eval expr1) /. (eval expr2)
Hey look, I just built a calculator DSL. And everything is statically typed and typechecked (and if I left off the "Divide" case in my "eval" method, it would warn me).


Point #1 would be very, very hard to do for Ruby, since it's such a highly dynamic language. The University of Maryland team that's behind this paper already tried doing that with Diamondback Ruby, but it didn't really work out in the end.

janvitek.github.io/events/PBD13/slides/JeffFoster.pdf

They apparently changed their focus to gathering type information at runtime and using it to do interesting stuff, like this recent Hummingbird research.

By the way, I think it might be possible to implement autocompletion based on that runtime information they collect.


#2 isn't really true. ETags/GTags provide you with a TAGS file consisting of function definitions. Your IntelliSense can use those however it wants. Autocompletion and "goto definition" are features of certain Emacs packages, for example. (See Emacs company-mode for some pretty nice autocompletion in various dynamic languages.)


I think #2 is true. It's never any good.

There are a lot of rudimentary "caveman completion" packages available for Ruby coding, but nobody has ever written a good "intellisense" feature for Ruby, where "good" means something on the level of, say, Xcode's "CodeSense" auto-completion for Objective-C or Swift, or JetBrains IDEA's support for Java.

I work in Xcode a lot, but I also maintain a couple Rails apps at work. A few times a year, I get so aggravated I spend an afternoon googling the state of the art in code completion for Ruby, and trying out various editors and IDEs.

JetBrains RubyMine is consistently the best, but it is far, far inferior to Xcode in this regard. It suggests methods but not the one I am looking for, or it suggests inappropriate methods that would cause an error, etc. It can't always jump to the definition of something, or tell me all the places the method I am currently editing is called from.

Xcode has a lot of things I could complain about, but the code completion, finding caller/callees, jump to definition/counterpart... all that stuff is incomparably better than it is for Ruby in any editor I have ever seen.

And of course it is — it is incredibly hard and complicated to do a complete or even mostly-complete job of it with Ruby code. There are just too many ways to metaprogram and too little information available to the editor to make it a straightforward task.



