
Open-sourcing MonkeyType – Let your Python code type-hint itself - ambivalence
https://engineering.instagram.com/let-your-code-type-hint-itself-introducing-open-source-monkeytype-a855c7284881
======
alexchamberlain
Fantastic contribution back to the community; I look forward to trying it out.

I must say this is the first time I've been disappointed with the quality of
discussion on HN. For a community that promotes using the right tool for the
job _at the time_ , I would have thought people would be more open to the
choices the early engineers made. I'm sure that Instagram are using a variety
of tech across their stack.

~~~
blub
Most people would expect software to crash, hang, be slow or somehow leak
their personal information. That's normal behaviour for the products of the
software industry.

For a long time there have been efforts to ensure at least a degree of quality
and robustness through processes, practices and verification tools. One such
tool is a type system which allows encoding requirements and expectations that
will be automatically verified with the help of a checker. This is exactly
what Instagram is attempting in an effort to increase quality and ease
maintenance.

It seems many are wondering why they haven't done this from the beginning. A
typical (and probably correct) answer is development speed and flexibility:
early stage companies need to be nimble and compromise on quality if they want
to survive. Fair enough, I understand that. We can applaud Instagram's success
on the markets, but that doesn't have to mean that they're a role model of
technical excellence for building a million line Python code base.

This tool is the proof that Python has significant problems at scale, which is
something the Python community has denied for a long time. They're still doing
it in this thread, but the lesson looks pretty clear to me: if you plan on
building large scale, don't use Python. Or PHP while we're at it (see
Facebook). Definitely not JavaScript (see FirefoxOS).

The founders of a start-up can continue to do whatever they want in the name
of success. Instagram at their beginnings was basically a different company
from today's Instagram, no one would have used them as a technical role-model.
Now they're at best a role model for large Python code bases, but many here
seem to be drawing the wrong conclusion, namely that it's a good idea to do
large-scale Python in the first place.

~~~
moreless
> This tool is the proof that Python has significant problems at scale, which
> is something the Python community has denied for a long time. They're still
> doing it in this thread, but the lesson looks pretty clear to me: if you
> plan on building large scale, don't use Python. Or PHP while we're at it
> (see Facebook).

Yeah, absolutely, don't do this! These are two examples of successful
companies that did it and look at them now! </s>

Being able to move fast and produce a winning product _on time_ is much more
important for startups. What does it matter if you used
<your_cool_scalable_thingie> for a project, when it never went past 10 users
because you were concentrating on the wrong aspect of your business? PHP is
fine. Python is great. Use the tools that fit your problem and you know how to
use, not the latest toy.

~~~
blub
There are two things that are critically incorrect about your argument:

1) That market success implies having quality software. Average seems to be
enough in my experience.

2) That start-ups are a good example to follow if one wants to achieve good
quality. In fact they should be ignored, because they will absolutely murder
quality in order to stay alive. Sometimes the product doesn't even work and is
held together with duct tape in order to get past that important demo... It's
quite pointless to discuss quality and start-ups.

The lesson I mentioned should be heeded by mature companies that are able to
do some project planning, complexity estimation, etc.

------
JosephRedfern
This sounds similar to Dropbox's PyAnnotate -- Guido van Rossum writes about
it here: [http://mypy-lang.blogspot.co.uk/2017/11/dropbox-releases-pya...](http://mypy-lang.blogspot.co.uk/2017/11/dropbox-releases-pyannotate-auto.html)

Would be interesting to see how MonkeyType and PyAnnotate compare.

~~~
sbuccini
I'd also like to hear more about this -- both the feature set and the
development process. It's interesting that two large engineering organizations
responsible for some of the most popular applications on the planet spent
hundreds of engineering hours building almost the same exact tool at around
the exact same time.

I'd also like to know how much quicker or better it could have been completed
if it had been done out in the open.

~~~
ambivalence
The idea to gather types at runtime is as old as PEP 484. The Dropbox and
Facebook teams working on Python type checking know each other. We both worked
on our implementations independently since we wanted to first test internally
whether the idea holds water. For example, I personally thought it wouldn't be
as useful in practice as it turned out to be!

We knew we were both going to open-source our implementations, especially since
Instagram's focuses solely on Python 3, which isn't useful for Dropbox at
the moment. It just took a while to get through the process of open sourcing
what we had (cleaning up the early implementation with limited documentation,
decoupling from internal data stores, etc.).

Would it be cheaper if this started out in the open? Probably, but I don't
think by quite the margin you expect.

~~~
Traveler42
Even though MonkeyType is focused on Python 3, can the .pyi files be used with
Python 2?

------
troels
Interestingly, I made a similar tool for php some time ago. It was recently
revived and is now in active development to bring it up to speed with recent
developments in the language.

[https://github.com/troelskn/phpweaver](https://github.com/troelskn/phpweaver)

~~~
muglug
That is fascinating. I’m the author of a static analysis tool for PHP that can
generate types, but clearly not at the same level as runtime analysis. I’ll
use it on my company’s codebase and report back.

~~~
troels
It would be great if you can give any feedback. There are quite a few rough
edges currently.

------
didibus
It's interesting, this is a similar lesson Clojure came around to, that the
types weren't really useful unless everything is typed. Though Typed Racket's
solution was to promote types to runtime validation at those borders between
things with types and things without.

I do find it intriguing though, that adding back types manually is so hard and
slow. Is it slower when done retroactively? Or is it just as slow when done at
the same time, but we don't realize its overhead?

~~~
lmm
It's a lot slower to do retroactively. You basically have to tell the computer
why you believe something is correct - e.g. if you're moving to having a
distinct type for non-empty lists because some functions are only valid for
non-empty lists, you have to explain why you believe a list you're passing to
such a function is non-empty. That's a lot easier to do at the same time
you're doing it (even in Python you'd probably still ask yourself whether you
knew the list was non-empty as you were writing it) than to come back months
or years later and remember why.
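
A minimal Python sketch of the non-empty-list idea, assuming a hypothetical `NonEmptyInts` type built with `typing.NewType`. The names `as_non_empty` and `first` are made up for illustration. The single checked constructor is where you "explain" to the type checker why the list is non-empty:

```python
from typing import List, NewType, Optional

# Hypothetical distinct type for lists that are known to be non-empty.
NonEmptyInts = NewType("NonEmptyInts", List[int])


def as_non_empty(xs: List[int]) -> Optional[NonEmptyInts]:
    """Return xs tagged as non-empty, or None if it is empty."""
    return NonEmptyInts(xs) if xs else None


def first(xs: NonEmptyInts) -> int:
    # Safe by convention: callers are expected to go through as_non_empty().
    return xs[0]


checked = as_non_empty([3, 1, 2])
if checked is not None:
    print(first(checked))
```

Retrofitting this means visiting every call site of `first` and working out where the check belongs, which is the slow part described above.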

------
Cieplak
Given that people have asked why not use a statically typed language, seems
appropriate to mention that it's possible to write pythonic-looking C++:

[http://preshing.com/20141202/cpp-has-become-more-pythonic/](http://preshing.com/20141202/cpp-has-become-more-pythonic/)

I have been using C++ a lot lately but really wish there were more tools for
reflection at compile time, e.g., ability to iterate over all the members of a
class. Other than that, I'm really loving C++17's auto template parameters and
type deduction capabilities, plus code that's 200x faster at runtime than most
interpreted languages. I've found autocompletion in CLion to be slightly
better than autocompletion in PyCharm, but not quite as good as IPython or
IntelliJ with Java.

~~~
aub3bhat
Please show me C++ equivalent of

    
    
        sorted([(k.weight, k.name) for k in somelist], reverse=True)

~~~
deathanatos
Note: my C++ is extremely rusty.

Right now, I think that's approximately,

    
    
        vector<tuple<int, string>> output;
        transform(
            somelist.begin(), somelist.end(),
            back_inserter(output),
            [](const auto &f) { return make_tuple(f.weight, f.name); }
        );
        sort(output.rbegin(), output.rend());
    

Ranges, I believe, would reduce this _a lot_ , possibly even to a single line.
_If_ I am reading the docs on it correctly, something like,

    
    
        vector<auto>(somelist | view::transform([](const auto &f) { return make_tuple(f.weight, f.name); })) | action::sort;
    

I _think_.

Two notes, however:

1\. I feel like most of the desire for a static language is to know what type
something is. Is C++ exactly as brief as Python? No, as I think you've
demonstrated. But I think you're a lot more likely to know the _type_ of
something. I rarely find that Python code has annotations, and
annotations can be _wrong_.

2\. C++ is, in general, I feel, much more explicit about where copies occur. I
elided one of the copies in your example, opting instead for an in-place sort
(but this is trivial to fix in the Python).
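
A sketch of the in-place variant in Python, assuming `somelist` holds objects with `weight` and `name` attributes. The `Item` namedtuple and the sample data are made up for illustration:

```python
from collections import namedtuple

# Hypothetical element type; the thread only says items have .weight and .name.
Item = namedtuple("Item", ["weight", "name"])
somelist = [Item(2, "b"), Item(5, "a"), Item(2, "c")]

# Build the list of tuples once, then sort it in place instead of
# letting sorted() allocate a second list.
pairs = [(k.weight, k.name) for k in somelist]
pairs.sort(reverse=True)
```

`list.sort` reorders the existing list, so the only allocation left is the list comprehension itself.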

~~~
mehrdadn
> vector<auto>

Really? O.o How could this possibly work?

~~~
deathanatos
Oh, that was probably a typo (it was late); replace that with a concrete type.

------
BucketSort
I come from a statically typed background (C++), but have been doing a lot of
analytics in python in the past two years. It is frustrating not to have
compile time guarantees when dealing with mathematical programs, because some
things have to be a particular type (i.e. matrices of compatible dimensions).
The result is a copious use of asserts, but it feels bad when you know that if
you did this in, let's say, a functional language, you could prove
implementations are correct by the nature of the type system. In short, I'd
love to see more strong type support in python.
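
A minimal sketch of the assert-heavy style described here, using plain nested lists rather than any particular numerics library. The `Matrix` alias and `matmul` helper are hypothetical. The annotations say nothing about dimensions, so the shape check can only happen at runtime:

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # The type hints cannot express shapes, so compatibility is asserted at runtime.
    assert a and b, "matrices must be non-empty"
    assert all(len(row) == len(b) for row in a), "inner dimensions must match"
    cols = len(b[0])
    assert all(len(row) == cols for row in b), "rows of b must have equal length"
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for i in range(len(a))
    ]
```

A dimension mismatch here surfaces as an `AssertionError` when the call runs, not when the code is checked.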

~~~
joshuamorton
>(i.e. matrices of compatible dimensions)

What language do you use where you can get these kinds of guarantees? As far
as I know very few languages provide those kinds of dependent types
statically.

~~~
saagarjha
C++ makes this possible via templates. Generally the size is moved to a
template argument, which allows the compiler to check this at compile time (of
course, this restricts you to statically sized matrices).

~~~
joshuamorton
Good to know. I was apparently unaware how powerful templates were.

~~~
laverick
[https://stackoverflow.com/a/22645853](https://stackoverflow.com/a/22645853)

~~~
joshuamorton
I think I knew that templates were Turing complete, but so are java generics,
it's just that to get dependent types in generics you have to reinvent the
integers within the generic system. Not so for templates, which I didn't
realize. That's pretty nifty!

------
ValleyOfTheMtns
FYI, it requires Python 3.6+. It mentions it in the article towards the end,
but if you're like me and prefer to jump straight into trying something out
you may not have seen it. I wasted a bit of time trying to figure out what the
ContextManager is in the Python typing module and why it couldn't find it.

------
mkolodny
This looks wonderful :) I'd love to try it out at some point.

One thing that I think could really improve the documentation is a few
examples! One of my favorite things about the Python docs and the community is
the wealth of examples. From looking at the docs, I couldn't find the main
thing I wanted to see - what would MonkeyType's annotations look like if I used
it?

------
jimnotgym
Reading all of the comments from engineers who seem to either possess a time
machine to send current tech back in time, or are criticising the technical
choices that made the founders $squillions, is making me a bit mad. As a
diversion perhaps some of them could list a few billion dollar startups that
made perfect choices at the start and never had any cause to refactor or
reimplement code as they grew?

~~~
mbid
It's not obvious (and IMO somewhat doubtful) that it was the technical choices
of the founders that made them successful. Their choices could well have been
bad, just not bad enough to make their business fail.

------
dilap
> At Instagram we have hundreds of engineers working on well over a million
> lines of Python 3.

Man, that's crazy. At the time they were acquired by Facebook, they had 13
employees.

~~~
kilpikaarna
This was the part that stood out to me also. They must have a ton of new stuff
in the pipeline, or their "display photos, insert some ads in between" loop is
way more complex than it seems.

Or that's the total number of engineers and way fewer actually twiddle the
Python...

------
Lxr
Is their goal to annotate everything or just the non-obvious things? Also how
would a tool like this handle cases where the “correct” type is a generic base
class but at runtime it only sees a certain subclass? To be pythonic, a
function that accepts a tuple should usually also accept a list for example,
but at runtime that may never happen.

~~~
ambivalence
Since this is how gradual typing works, the goal is to annotate every last
function.

Good question about abstract base classes! Paraphrasing a well known cliché:
types in functions should be forgiving in arguments (what the function
accepts) and strict in return values (what the function emits). In our case,
the human reviewer needs to decide if the argument types collected by
MonkeyType should be generalized. In fact, the collected types might not even
work in all cases and the type checker might complain. It's because
annotations describe "what should be" whereas MonkeyType finds "what is". This
is why a system like MonkeyType shouldn't even attempt to use abstract base
classes in place of concrete types that it collected.
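
A small sketch of that review step, with made-up function names. The first signature is the kind of concrete annotation a runtime collector could record if the function only ever saw lists. The second is what a human reviewer might generalize it to:

```python
from typing import List, Sequence


# "What is": every observed call passed a list, so the collected type is List[int].
def top_two_observed(scores: List[int]) -> List[int]:
    return sorted(scores, reverse=True)[:2]


# "What should be": forgiving in the argument, still strict in the return value.
def top_two(scores: Sequence[int]) -> List[int]:
    return sorted(scores, reverse=True)[:2]
```

Both behave identically at runtime. Only the second lets a type checker accept `top_two((1, 5, 3))` with a tuple argument.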

------
muizelaar
I wonder how this compares to PyAnnotate: [http://mypy-lang.blogspot.ca/2017/11/dropbox-releases-pyanno...](http://mypy-lang.blogspot.ca/2017/11/dropbox-releases-pyannotate-auto.html)

------
maltalex
Reading this as someone who writes mostly in statically typed languages, the
whole exercise seems odd.

Having so much dynamically typed code to maintain that you need to run
production code using a separate tool just to figure out the types sounds just
wrong. Why not use a statically typed language for such a large code-base? Is
this done on purpose, or did they end up with a million lines of Python code
and are looking for ways to make the maintenance easier?

And before I get down-voted to hell - I completely understand using Python for
many things. It's a good technical choice for many different problems, but
navigating a million lines of Python seems just daunting to me (although maybe
I'm just not experienced enough with Python).

~~~
nawgszy
>Is this done on purpose, or did they end up with a million lines of Python
code and are looking for ways to make the maintenance easier?

Definitely the latter. I've seen this discussion a few times before, and it's
always the same. Your initial developers are not looking down the road to the
million lines of code milestone, they're just trying to make a product that
might actually make some money here and now.

I'm sure Instagram was exactly that. They needed to handle images and some guy
knew how to do it in Python. They wrote Python code, and then people liked
Instagram. They eventually became a billion dollar company with millions of
lines of code and nowhere along the road was there time to say "hey we need
to refactor this whole thing". Or if that was said, management laughed and
said "we need this feature".

So here is where you end up. The developers need to clean things up but they
don't have time to clean it up by using a language that, realistically, they
probably don't know as well as the Python they wrote the millions of lines of
code in.

Re: your last comment, navigating a million lines of any codebase is daunting,
and especially more so if you aren't a developer in that language. I'm not
sure what exactly "Python" has to do with that, besides that you're not a
Python dev.

~~~
imiric
To add to this, note that type hinting is quite a new feature in Python
(introduced in v3.5, released in 2015), and this functionality simply wasn't
available before. So any company heavily invested in Python today obviously
wants to improve their runtime reliability, without having to rewrite parts of
their stack.

Stricter typing goes a long way to achieve this, and gradual typing allows you
to upgrade the code base at your own pace, which is great.

Consider this study[0] about TypeScript and Flow, which use the same approach
for JavaScript, which found both able to detect ~15% of runtime bugs. So no
wonder companies with large Python code bases would be the first to invest in
this space.

Personally I feel this is a great addition to the language, and hope type
checking becomes a first class citizen too, instead of being delegated to
external tools like mypy[1] or pytype[2].

[0]:
[http://ttendency.cs.ucl.ac.uk/projects/type_study/](http://ttendency.cs.ucl.ac.uk/projects/type_study/)

[1]: [http://mypy-lang.org/](http://mypy-lang.org/)

[2]: [https://github.com/google/pytype](https://github.com/google/pytype)

~~~
miohtama
Dropbox is very heavily invested in Python. I am under impression they hired
Guido van Rossum to do exactly this, among other things. First 100% statically
type the old codebase, then port it to Python 3.

You can statically type Python 2 codebases, but the language does not offer
native support for it. Thus, all needs to go to docstrings or comments.
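
For illustration, this is the comment-based syntax PEP 484 defines for code that cannot use annotation syntax. The `repeat` function and `names` variable are made up:

```python
from typing import List  # on Python 2 this comes from the `typing` backport package


def repeat(text, times):
    # type: (str, int) -> str
    return text * times


names = []  # type: List[str]
names.append(repeat("ab", 2))
```

The interpreter ignores these comments, so the same file runs unchanged on Python 2 and 3 while a checker such as mypy reads the types from them.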

------
domenukk
Similarly, I'm amazed by Pycharm (IDE), which supports pretty good inferred
type hints from debugging and static code analysis, btw. Makes writing code a
lot easier. Looking forward to trying MonkeyType, it seems awesome for larger
projects.

~~~
pvg
Pycharm relies on pre-generated annotations as well, they're just built-in -
it's not as magically trace-y and infer-y as it might seem at a casual glance.

------
tandav
Why not use cython? It has types

------
true_religion
How is this different from Dropbox's PyAnnotate[1] which was released a few
weeks ago?

[1]
[https://github.com/dropbox/pyannotate](https://github.com/dropbox/pyannotate)

------
CGamesPlay
Why didn't you make the Python 3 type checking advisory instead? Like what
Facebook did with Flow and Hack, why not write a product that lets you
statically analyze the types and will never itself cause runtime errors, and
transition to types that way? What advantage does building this tool have? I
understand that the thing I'm describing involves modifying the way Python 3
handles type annotations, but it doesn't seem like more work than building out
all of the instrumentation you've done here.

~~~
ambivalence
I'm not quite sure what you mean. Python has an external static checker for
types, it's called mypy.

Python's type annotations are in fact very similar to Flow and Hack in the
sense that they provide gradual typing. The specification (see: PEP 484)
describes that only annotated functions are type checked. Calls to non-
annotated code are treated as accepting any type in arguments and returning
the Any type (a special type which effectively silences the type checker).

This generates a chicken and egg problem: if you don't have enough functions
annotated, the type checker won't be able to provide meaningful output to you.
So convincing people to annotate their code is harder: they don't see the
benefit right away. Worse yet, you already have tens of thousands of functions
in your code that you know work in production but were written before type
annotations were introduced. It's not really feasible to come back and fill
this information manually.

MonkeyType is a tool that gathers types at runtime and enables putting them
back in your code as annotations. The goal is for mypy to have more
information to work with, making it way more useful.
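
A minimal sketch of the general idea of collecting argument types at runtime, using the standard `sys.setprofile` hook. It illustrates the technique only and is not MonkeyType's actual implementation. `observed`, `_record_call` and `area` are hypothetical names:

```python
import sys
from collections import defaultdict

# Maps function name -> set of argument-type tuples seen at runtime.
observed = defaultdict(set)


def _record_call(frame, event, arg):
    # Only Python-level function entries are of interest here.
    if event != "call":
        return
    code = frame.f_code
    arg_names = code.co_varnames[:code.co_argcount]
    types = tuple(type(frame.f_locals[name]).__name__ for name in arg_names)
    observed[code.co_name].add(types)


def area(width, height):
    return width * height


sys.setprofile(_record_call)
try:
    area(2, 3)
    area(2.5, 4)
finally:
    sys.setprofile(None)  # always uninstall the hook
```

After this runs, `observed["area"]` holds the two signatures seen, `("int", "int")` and `("float", "int")`. A real tool would also need return types, qualified names, and somewhere to persist the traces.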

~~~
CGamesPlay
Sorry, I guess I jumped to the conclusion that you profiled in production
because you had a mandatory type checker which would cause runtime errors if
invalid types were passed. Let me try another question. Do you think that a
project like MonkeyType would substantially aid adding flow annotations to a
JavaScript project? If so, why is the value to the Python ecosystem higher
than the value to the Flow/JavaScript ecosystem (i.e. why doesn't it exist for
JS)?

~~~
jw-
It's already been flirted with in JS I believe [1], though not specifically
for Flow. The problem is that it falls down in the higher order case, which
happens quite a lot in JS. Also, I don't think JS has a mechanism like
sys.setprofile that deals with a lot of the pain points.

[1] [https://medium.com/fhinkel/runtime-type-information-for-java...](https://medium.com/fhinkel/runtime-type-information-for-javascript-b134faac3c0a)

------
sitkack
As both a dynamic and static type enthusiast, back typing dynamic code is
extremely problematic. Fluent use of a dynamic language will use and create
constructs that are nearly un-typeable. If you want to make typed code, start
typed. If you code with implicit types, use a good type inferred language (ML,
F#, etc). If you want to use type checking in Python, use the annotations and
MyPy from the beginning.

That said, I am not saying this tool is bad. It could very well help a lot of
codebases, but I would warn against using it as part of the operational
workflow.

~~~
viraptor
> Fluent use of a dynamic language will use and create constructs that are
> nearly un-typeable.

Do you mean cases where you accept "anything iterable", an issue with
callbacks / template types, or something else?

------
z3t4
> With MonkeyType’s help, we’ve already annotated over a third of the
> functions in our codebase, and we’re already seeing type-checking catch many
> bugs that would have otherwise likely shipped to production

Is most of their code not yet in production? Or why do they produce more
bugs now with static types than before with dynamic types!? This sounds a
lot like homeopathy, that can both detect and cure diseases with placebo.

------
hacker_9
_" At Instagram we have hundreds of engineers working on well over a million
lines of Python 3."_

It always amazes me that some of the most popular products around are built
with the worst technology choices. And now they had to build their own static
type checker, which slows down random samples of real users, just to shore up
the language's weaknesses? Outstanding.

~~~
grandmczeb
If lots of successful projects use what you consider the “worst” technology,
perhaps the problem is with your perception rather than the technology?

~~~
alsadi
Those people believe that the right choice is assembly unless you can write
directly in hex /s

~~~
thechao
I was taught assembly at the job by an old assembly guru. His first statement
to me was something along the lines of: "we'll use an assembler to start with,
but it doesn't generate very good code; so, once you're comfortable, we'll
hand assemble our machine code".

~~~
AlexCoventry
Was he just messing with you? What's a case where assembly is not isomorphic
to machine code?

