
Predicting Variable Types in Dynamically Typed Programming Languages - gauravanand25
https://arxiv.org/abs/1901.05138
======
satyanash
> _...allows programmers to write code quickly by not requiring to declare the
> types of each variable which are determined at run-time based on the values
> assigned to that variable, thereby increasing programmer productivity_

Modern compilers for statically typed languages are really good at inferring
the types of identifiers deterministically from multiple hints (see Kotlin,
Swift, etc.). The statically typed languages mentioned (C, C++ and Java) are
pretty old and therefore carry some baggage of verbosity that is no longer
needed.

> _...since the variable types are not declared in the source code, the source
> code becomes difficult to understand and extend_

> _For programmers working on the large code base written in dynamic
> languages, it is hard to understand the control flow of the program if the
> types are not available at the compile time._

Dynamic languages, as a result of their "dynamicness", tend to allow much
better expression of control flow than static languages. Only recently have
statically typed languages targeted expressiveness as a first-class design
goal. In fact, statically typed languages are notorious for obtuse control
flow as a result of their type enforcement (see C, C++, Golang).

~~~
jcelerier
> The statically typed languages mentioned (C, C++ and Java) are pretty old and
> therefore carry some baggage of verbosity that is no longer needed.

hahaha. Just getting people to adopt `auto` in C++ is an uphill battle. People
_want_ to write types.

~~~
toast0
I've basically spent my career without static types (except for the little
bits of C that happen here and there), but I keep hearing about how types
would change my world ... But then every time I've looked at a typed language
lately, everything is type 'auto'. Well, maybe not Java, I don't think they
have auto yet?

As an honest question, if types are good, why would you put auto, instead of
telling me the type?

For context, I'm a neanderthal and just use a boring text editor, so I'm not
getting any tool assisted type information if you write auto everywhere.

~~~
pjmlp
Because even with type inference, the compiler shouts at me at compile time
if the types don't match, instead of the program crashing at runtime because
Joe from the 5th floor forgot to add a field when he checked in his code.

So I don't need to write additional unit tests to do the work of the compiler,
which would have prevented Joe from checking in his code.
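
To make the runtime half of that concrete, here is a minimal Python sketch of the failure mode. The `Order` class, the missing `unit_price` field, and both functions are hypothetical names invented for illustration; nothing here comes from the paper.

```python
from dataclasses import dataclass


@dataclass
class Order:
    item: str
    quantity: int
    # Hypothetical scenario: the check-in was supposed to add
    # `unit_price: float` here, but the field was forgotten.


def order_total(order):
    # Nothing verifies this attribute access until the line actually runs.
    return order.quantity * order.unit_price


def run_report(orders):
    # The mistake only surfaces here, at runtime, as an AttributeError.
    try:
        return sum(order_total(o) for o in orders)
    except AttributeError as exc:
        return f"runtime failure: {exc}"


result = run_report([Order("widget", 3)])
print(result)
```

If `order_total` were annotated as taking an `Order`, a static checker such as mypy would typically flag the missing attribute before the code ever ran, which is the same kind of check a compiler does in a statically typed language.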

------
phoe-krk
How does this relate and compare to actual type inference? For example, the
Common Lisp implementation SBCL is able to infer types pretty nicely now, and
it doesn't need neural networks for "predicting" the types with some
probability.

~~~
ScottBurson
Data flow analysis will certainly predict a fair number of types in typical
programs in dynamically-typed languages, but there will also be lots of cases
it can't handle. I don't know the details of the CMUCL/SBCL type inference
algorithm, but I've worked on whole-program data flow analysis, and there are
still lots of types it fails to recover, for various reasons. Often, for
example, the entire program is not available for analysis; if one doesn't know
all the places a particular function can be called, one can't soundly infer
all the types it might be passed. Nonetheless, in practice a given parameter
of such a function is normally passed arguments of a particular type (or
subtypes thereof), and frequently, clues such as the parameter name can help
one take a pretty good guess as to what that type is. Since such a guess is
sometimes going to be wrong, it can't safely be used for optimization, but it
can be very useful for other purposes, such as analyzing the program to find
potential security vulnerabilities.
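
As a rough illustration of guessing from parameter names, here is a toy Python sketch. The hint table and the `guess_type` function are assumptions made up for this example; a real tool (or the paper's model) would learn such associations from data rather than hard-code them.

```python
import re

# Hypothetical, hand-written hints. Order matters: the first match wins.
NAME_HINTS = [
    (re.compile(r"^(is|has|should)_"), "bool"),
    (re.compile(r"(count|index|size|num)$"), "int"),
    (re.compile(r"(name|path|text|msg)$"), "str"),
]


def guess_type(param_name):
    """Return a guessed type name, or None when no hint matches."""
    for pattern, type_name in NAME_HINTS:
        if pattern.search(param_name):
            return type_name
    return None


for name in ["is_valid", "retry_count", "file_path", "payload"]:
    print(name, "->", guess_type(name))
```

A guess like this can be wrong, so it is only suitable for advisory uses such as flagging code for review, not for optimizations that must be sound.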

~~~
seanmcdirmid
DFA isn’t usually used in type inferencing algorithms, at least ones that
generate types for programmers rather than compiler optimizations. It isn’t
that it is expensive, but in general syntax tree walking works well enough.

Whole-program analysis is too expensive for type inference (or much of
anything, for that matter), even with its precision ramped all the way down
via something like 0-CFA. Type inference with subtyping is a very similar
problem to alias analysis, which hints at why doing it in any but very
restrictive contexts is too expensive.

~~~
ScottBurson
If you don't need perfect soundness, whole-program 1-CFA can be made practical
by heuristic pruning. I've done it successfully in a commercial static
analysis product.

------
sifoobar
Snigl [0] traces the code before running it, simulating stack contents and
inferring generic calls based on that information as far as possible, among
other things.

It's not perfect, but it's the only way I've found that makes sense in
combination with Forth-like stack semantics, where function parameters are
never specified explicitly, and that runs fast enough to make sense in an
interpreted language. I also like that it's implemented as a transformation on
VM code, which makes it flexible and easy to debug.
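
The general idea of tracing a stack program with types instead of values can be sketched in a few lines of Python. This is not Snigl's implementation: the instruction format, the `GENERIC_WORDS` table, and the specialised names such as `add_int` are all assumptions invented for the example.

```python
# Hypothetical table: generic word -> {argument types: (specific impl, result type)}.
GENERIC_WORDS = {
    "+": {("int", "int"): ("add_int", "int"),
          ("str", "str"): ("concat_str", "str")},
    "len": {("str",): ("len_str", "int")},
}


def trace(program):
    """Simulate the stack with type names only and specialise generic calls."""
    stack = []     # holds type names, never actual values
    resolved = []  # rewritten program
    for kind, arg in program:
        if kind == "push":
            stack.append(type(arg).__name__)
            resolved.append((kind, arg))
            continue
        impls = GENERIC_WORDS[arg]
        arity = len(next(iter(impls)))  # assumes all impls share one arity
        args = tuple(stack[len(stack) - arity:]) if len(stack) >= arity else None
        if args in impls:
            name, result_type = impls[args]
            del stack[len(stack) - arity:]
            stack.append(result_type)
            resolved.append(("call", name))
        else:
            # Types unknown or mismatched: keep the generic call and
            # conservatively forget everything known about the stack.
            stack.clear()
            resolved.append((kind, arg))
    return resolved, stack


program = [("push", 1), ("push", 2), ("call", "+"),
           ("push", "ab"), ("call", "len"), ("call", "+")]
resolved, final_stack = trace(program)
print(resolved)
print(final_stack)
```

Because the pass takes a program and returns a rewritten program, it can be run, inspected, or skipped independently of the interpreter loop, which is one reason a transformation on VM code is convenient to debug.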

[0] https://gitlab.com/sifoo/snigl#tracing

------
mlevental
Kind of cool, but why is the model trained on such a small data set? Why not
train it on a bunch of codebases that use type hints?
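
For what it's worth, harvesting labelled examples from type-hinted Python code is not hard. The sketch below uses only the standard `ast` module (Python 3.9+ for `ast.unparse`); the `annotated_params` helper and the sample source are hypothetical, and this is not how the paper built its data set.

```python
import ast


def annotated_params(source):
    """Yield (function, parameter, annotation text) for annotated parameters.

    Only ordinary positional parameters are covered; *args, **kwargs and
    keyword-only parameters are ignored to keep the sketch short.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for arg in node.args.args:
                if arg.annotation is not None:
                    yield node.name, arg.arg, ast.unparse(arg.annotation)


sample = '''
def greet(name: str, times: int = 1, verbose=False):
    return name * times
'''

pairs = list(annotated_params(sample))
print(pairs)
```

Each tuple pairs an identifier with its declared type, which is the kind of (name, type) example a predictor could be trained on.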

