
PyAnnotate – Auto-generate type annotations for mypy - psychotik
http://mypy-lang.blogspot.com/2017/11/dropbox-releases-pyannotate-auto.html
======
ipsum2
For any dropboxers (or others), how does this compare with pytype?
[https://github.com/google/pytype](https://github.com/google/pytype).

~~~
yegle
(I work for Google and use pytype daily.)

Pytype is similar to mypy in that it can do type checking with proper
annotations. In addition to using annotations, pytype can also infer types
through static analysis.

I don't have much experience with mypy, but the last time I used it, it could
not infer from `return x == y` that the function returns a bool. Pytype can
correctly infer many simple forms of argument and return types, and even some
more complex ones.

From reading the project, PyAnnotate relies entirely on runtime profiling info
to _help_ you get to the first round of annotations. We also have a similar
project that gathers types at runtime and helps people annotate their code.
Type information gathered this way has its limitations (the PyAnnotate project
calls this out as well: you should only use it on legacy code, not on newly
written code).

To give an example: if PyAnnotate observes the function below accepting a list
of ints and returning an int, it may conclude that the function's type is
`Callable[[List[int]], int]`:

    def foo(xs):
        ret = 0
        for x in xs:
            ret += x
        return ret

But it actually works on any iterable (because of the for-in loop), and the
items in `xs` are numbers (because of the in-place addition onto the integer
0). With static analysis, the correct inferred type might be
`Callable[[Iterable[Union[int, float]]], Union[int, float]]`.
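
For illustration, here is the summing function annotated with that broader type (my own sketch, not actual pytype output). It accepts the list from a traced call, but also tuples, generators and floats, none of which a trace that only ever saw `List[int]` would admit:

```python
from typing import Iterable, Union

Number = Union[int, float]


def foo(xs: Iterable[Number]) -> Number:
    ret: Number = 0
    for x in xs:  # any iterable works here, not just a list
        ret += x
    return ret


print(foo([1, 2, 3]))                # 6
print(foo((0.5, 1.5)))               # 2.0
print(foo(x * x for x in range(4)))  # 14
```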

~~~
patrec
It's not possible to infer that `return x == y` returns a bool, because Python
has rich comparisons (e.g. if x and y are numpy arrays, you get back an array
of the same shape, after broadcasting). So either pytype uses additional
information, or it's sometimes just wrong.
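
A small illustration of the point, using a hypothetical `Vector` class instead of numpy so it runs without dependencies: `__eq__` is free to return something other than a bool, so `x == y` is only a bool if the operand types say so.

```python
class Vector:
    """Hypothetical element-wise vector: == compares item by item."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        # Return a list of per-element results instead of a single bool
        return [a == b for a, b in zip(self.items, other.items)]


result = Vector([1, 2, 3]) == Vector([1, 0, 3])
print(result)  # [True, False, True]
```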

~~~
yegle
Technically true, since you'd need to check the return type of `__eq__()`.
But the following code doesn't trigger any error with
`mypy test.py --check-untyped-defs`.

    
    
      def foo():
        x = 1
        return x == 2
    
    
      def bar(x: bool):
        print(x)
    
    
      def baz(x: int):
        print(x)
    
    
      if __name__ == '__main__':
        bar(foo())
        baz(foo())
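
Worth keeping in mind when reading this test: `foo` has no annotations, so mypy treats its return type as `Any` (`--check-untyped-defs` checks the body but does not change the signature), which means both calls would type-check whatever `foo` returned. Even with `-> bool` added, `baz(foo())` would still be accepted, because `bool` is a subclass of `int`. The runtime side of that relationship is easy to confirm:

```python
def foo():
    x = 1
    return x == 2


result = foo()
print(type(result).__name__)    # bool
print(issubclass(bool, int))    # True
print(isinstance(result, int))  # True
print(result + 1)               # 1 (False behaves like 0)
```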

------
alcari
Here's the easy 50% of it:
[https://github.com/alcarithemad/typetrace](https://github.com/alcarithemad/typetrace)

------
hasenj
Somewhat off topic, but I think that more and more people are learning (the
hard way, unfortunately) how important static typing is, and how dynamic
typing makes it very difficult to develop and maintain large projects.

I think the next generation of successful languages will all be statically
typed (whether they will run natively or in a virtual machine is a different
(even if related) question).

~~~
dkersten
Citation needed?

Anecdotally, I've developed large projects in C++ and Java (I know, they're
pretty lame static type systems -- but certainly the most popular static type
systems) and also in Python and Clojure, and I really haven't seen much benefit
in static typing in regards to software defect rate or quality. Static typing
makes autocomplete and refactoring tools easier, for sure, but it also slows
down ease of experimentation (and writing generic code can be painful,
although other static languages, especially type-inferred ones, fare better
here). I buy into Rich Hickey's view on this topic[1], and that's one reason
why I like Clojure: it gets out of the way, but it provides me with the tools
I need to verify or validate my data (e.g. at module or application
boundaries).

I've played around with languages that have fancier type systems (Haskell,
various MLs, briefly ATS) and am very interested in Rust (but have yet to use
it), but they haven't really provided enough benefits for the effort of
describing the types.

Note that I used to be very heavily in the static typing camp and I still very
much like the idea of static typing, I just don't think we have found a static
type system yet that has the right balance of convenience and safety _and
actually catches the right kinds of errors_ (as described in the below talk).

I guess my point is that it's not quite clear that _the next generation of
successful languages will all be statically typed_. In fact, current trends
would suggest otherwise (most of the popular languages are dynamically typed),
although perhaps that depends on your definition of "successful".

[1]
[https://www.youtube.com/watch?v=2V1FtfBDsLU](https://www.youtube.com/watch?v=2V1FtfBDsLU)

~~~
hasenj
> I really haven't seen much benefit in static typing in regards to software
> defect rate or quality

Hold it right there. I've never seen anyone argue that static type systems
prevent bugs.

I mean they do prevent silly bugs that occur from mistyping variable/property
names but I've never seen anyone claim that they eliminate other classes of
bugs.

The biggest benefit of static type checking is that you know what all the
variables are.

    
    
        def checkout_cart(customer, payment_methods, cart, session):
            ...  # body
    

What the hell is customer? What is payment methods? What fields are available
on these objects? What methods can you call on them? _no freaking idea_.
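
To make the argument concrete, here is a sketch of the same signature with annotations. The `Customer`, `PaymentMethod`, `CartLine`, `Cart` and `Session` classes, their fields, and the placeholder body are invented for illustration; the point is only that the signature itself now answers "what is `customer`, and what fields does it have?"

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    customer_id: int
    email: str


@dataclass
class PaymentMethod:
    kind: str   # e.g. "card" (hypothetical values)
    token: str


@dataclass
class CartLine:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class Cart:
    lines: List[CartLine] = field(default_factory=list)

    def total_cents(self) -> int:
        return sum(line.quantity * line.unit_price_cents for line in self.lines)


@dataclass
class Session:
    session_id: str


def checkout_cart(
    customer: Customer,
    payment_methods: List[PaymentMethod],
    cart: Cart,
    session: Session,
) -> int:
    """Return the amount to charge, in cents (placeholder logic)."""
    if not payment_methods:
        raise ValueError(f"customer {customer.customer_id} has no payment method")
    return cart.total_cents()


cart = Cart([CartLine("widget", 2, 500), CartLine("gadget", 1, 1250)])
print(checkout_cart(Customer(1, "a@example.com"),
                    [PaymentMethod("card", "tok_123")], cart, Session("s1")))  # 2250
```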

Of course, this kind of code is confusing in Java as well, but for a different
reason: Java conventions encourage a kind of obtuse programming style where
everything is hidden behind layers of abstractions of factories and managers,
so that even when everything is typed, you're not sure what anything is doing
because all the data that matters is private and so are all the methods that
actually do anything useful. All you're left with is an abstract interface
that can sometimes be just as bad as an untyped variable. But this is mostly a
cultural problem. (I've digressed).

> Static typing makes autocomplete and refactoring tools easier, for sure, but
> it also slows down ease of experimentation

Java slows down ease of experimentation because it requires tons of
boilerplate code for even the simplest tasks.

It's not the static type checking.

If anything, static type checking helps experimentation because you can change
your mind quickly and the compiler will help you catch all the stupid mistakes
that can occur from mismatching types or mistyping variable names. This
removes a huge cognitive tax and makes programming more enjoyable. Although I
will concede this is subjective.

~~~
crdoconnor
>Hold it right there. I've never seen anyone argue that static type systems
prevent bugs.

Really? I see this every single time the subject is brought up. And, to be
fair, they do catch some bugs, it's just that they do so at a cost.

>What the hell is customer? What is payment methods? What fields are available
on these objects? What methods can you call on them? no freaking idea.

And, if they are all strings, how much more of an idea do you have?

Static typing does not necessarily help solve this problem - a combination of
reduced scope (i.e. looser coupling), more specific variable naming and higher
cohesion (e.g. having a customer object) does.

Moreover, there's a super easy way to figure out what all of those things are
and figure out how you want to change it - run a behavioral test and launch a
REPL when it hits that function.

At that point you can inspect customer, use autocomplete on it and even
experimentally run code.
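
A minimal sketch of that workflow using only the standard library's `code.interact`. The `DEBUG_REPL` environment variable, the tuple-based cart, and the placeholder body are assumptions made for illustration: when the variable is set, the function opens a console with its arguments in scope; otherwise the behavioral test runs normally.

```python
import code
import os


def checkout_cart(customer, payment_methods, cart, session):
    # Opt-in hook (hypothetical DEBUG_REPL variable): open an interactive
    # console with this function's arguments available as local names.
    if os.environ.get("DEBUG_REPL"):
        code.interact(local=dict(locals()))
    # Placeholder body: cart is a list of (name, quantity, unit_price) tuples
    return sum(qty * price for _, qty, price in cart)


def test_checkout_cart():
    cart = [("widget", 2, 500), ("gadget", 1, 1250)]
    assert checkout_cart({"id": 1}, ["card"], cart, {}) == 2250
```

Something like `DEBUG_REPL=1 python -m pytest -s` would then stop inside `checkout_cart` (`-s` turns off pytest's output capturing, which otherwise refuses interactive input). Python 3.7+ also has the built-in `breakpoint()`, which drops into pdb instead.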

>If anything, static type checking helps experimentation because you can
change your mind quickly and the compiler will help you catch all the stupid
mistakes that can occur from mismatching types or mistyping variable names.
This removes a huge cognitive tax and makes programming more enjoyable.

Behavioral tests perform this function equally well, if you have them.

~~~
acdha
> IMHO behavioral tests perform this function equally well.

I think a common pitfall in these discussions is comparing worst-case
examples rather than reasonable-quality codebases. I'd be far more interested
in, say, time/cost-to-correct-result metrics for a well-maintained Python
codebase with reasonable use of tests & linting (e.g. flake8), compared with an
equivalently proficient team using a statically typed language.

~~~
dkersten
If we're discussing well-maintained code, then I would expect that the public
interface is documented, at least in docstrings. Then I also know what the
parameters are.
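
A sketch of what that could look like for the `checkout_cart` example, using Google-style docstring sections. The parameter descriptions and type names are invented for illustration; unlike annotations, mypy does not check them against the code, but tools like `help()` and `inspect.getdoc` can read them.

```python
import inspect


def checkout_cart(customer, payment_methods, cart, session):
    """Charge the customer for the contents of their cart.

    Args:
        customer: the buying ``Customer``; must have an ``email`` attribute.
        payment_methods: list of ``PaymentMethod`` objects, tried in order.
        cart: the ``Cart`` being checked out.
        session: the current web ``Session``.

    Returns:
        The charged amount in cents.
    """
    raise NotImplementedError  # body omitted in this sketch


print(inspect.getdoc(checkout_cart).splitlines()[0])
# Charge the customer for the contents of their cart.
```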

~~~
acdha
Agreed. I'm just wondering about how to quantify the impact of various
changes. A dynamic language project with no tests, etc. is going to look like
a selling point for static typing, but I suspect the real-world bug counts for,
say, a Python project using mypy (or even flake8 + tests + coverage) are going
to be a lot closer than you might think from how heated these discussions get.

~~~
crdoconnor
There was a study that did a line-by-line translation of 4 Python projects to
Haskell and caught some bugs (between 0 and 4 per project):
[http://evanfarrer.blogspot.co.uk/2012/06/unit-testing-isnt-e...](http://evanfarrer.blogspot.co.uk/2012/06/unit-testing-isnt-enough-you-need.html)

I got the impression that the bugs found were either not at all serious (e.g.
throwing a `TypeError` on malformed input instead of some other, nicer kind of
error) or were in areas of the code not covered by tests.

Unfortunately the author does not rate them by severity.

~~~
acdha
Thanks - that's a lot like what I had in mind!

My gut feeling is that dynamic typing + tests & static analysis is faster than
very heavyweight languages (e.g. Java) but probably on par with or slower than
languages with more advanced type systems like Haskell or Rust. Still, I'd
really like to see something more comprehensive than a subjective opinion.

