
Our journey to type checking 4M lines of Python - signa11
https://blogs.dropbox.com/tech/2019/09/our-journey-to-type-checking-4-million-lines-of-python/
======
jdm2212
The worst, buggiest, least maintainable code I've ever dealt with was a 500ish
line (heavily tested!) untyped Python file at my last job. Caused more trouble
than all the Java and C++ and Go combined.

My personal experience with compile time type checking is that the benefits
kick in for a single developer project within a week, and for a multi-
developer project immediately.

~~~
dbcurtis
I am not going to dispute the value of compile-time type checking. And I admit
to only having had time to skim the link at this point but it looks like a
great read so I bookmarked it for later. That said...

Python can be as strongly typed as you like it to be. But of course, with
Python, instead of insisting on an incoming parameter being of a particular
type, the thing to do is Pythonically ask nicely if it can be the type that
you want. If you need parameter x to be a float, call float(x), which will
either give you a float or raise TypeError (or ValueError for a string it
cannot parse). And of course, it gives the implementor of the class the option
to implement __float__().

With a small amount of effort, you can design all of your __init__()
constructors to do that style of validation. Yes, it requires some elbow
grease. But you will not be able to pass bad stuff to my __init__ constructors
if I don't want you to. And if you want some parameter y to be an instance of
Foo, just call Foo(y) and if it is already a Foo, you get the Foo, and if it
can be made a Foo, you get a Foo, and if it can't, you get TypeError.
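
A minimal sketch of that coerce-in-the-constructor style, assuming a made-up `Celsius` class (the name and field are hypothetical, only `float()` and `__float__` are real):

```python
class Celsius:
    """Hypothetical value type; the name is made up for illustration."""

    def __init__(self, degrees):
        # Ask nicely: anything float() accepts is fine, including another
        # Celsius (through __float__). float() raises TypeError for None or
        # a list, and ValueError for an unparseable string such as "warm".
        self.degrees = float(degrees)

    def __float__(self):
        return self.degrees


print(Celsius("21.5").degrees)      # a numeric string is converted
print(Celsius(Celsius(3)).degrees)  # another Celsius is accepted too
```

Note that the check still only happens when the constructor actually runs.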

So of course you actually need to exercise the code also, which is where the
hypothesis package comes in.

Anyway, I think one of the reasons Python is popular is that it is very
consistent about types. Haskell and Rust are also very consistent about types
-- most other languages, not so much. Give me one end of the spectrum or the
other. The murky middle is suboptimal.

~~~
joshuamorton
This doesn't work as well as you'd hope.

It requires, among other things, a unique and toilsome coding style,
constructor overloading (which isn't a first class citizen of python), and you
only get late runtime validation (which means, for example, you can easily
miss type validation on only one side of a conditional).

Something like pytype detects such errors without any code modifications.
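
To illustrate the "one side of a conditional" problem, here is a small sketch with a hypothetical `describe` function. The annotations are my addition; the point is only that the bad branch goes unnoticed at runtime until something actually takes it, whereas a static checker can look at both branches without running anything.

```python
def describe(count: int, verbose: bool) -> str:
    if verbose:
        # Bug: str + int. At runtime this only fails when verbose is True.
        return "count is " + count
    return "count is %d" % count


print(describe(3, False))  # works, so a test covering only this path passes
try:
    describe(3, True)
except TypeError as exc:
    print("only found once the other branch runs:", exc)
```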

~~~
dbcurtis
Most of what you say I agree with. But it is actually very easy to construct an
__init__ that handles a very flexible parameter list. That style is well
represented in the standard library. And I find that writing a beefy
constructor simplifies parameter validation in methods.

But certainly, it doesn’t remove the need for good test coverage. Now, if you
want to argue for less testing, that is a different pitch.

~~~
joshuamorton
Possible, yes. Easy and straightforward, no. Casework and duck-type checks
over *args and **kwargs are not easy to maintain going forward, especially for
a non-expert in Python.

Factory methods are usually better, and I believe they are what the more modern
parts of the standard library have tried to use, but they don't work as well for
this specific case.

~~~
musingsole
> Casework and duck-type checks over *args and **kwargs

Once you start doing this in python -- beyond any low complexity usage --
you're better off capturing your arguments in their own object and passing
this to the function as a single argument. This object can then enforce
whatever you want, and your function can now be a class method.

And then you can build a factory for this object that accepts a few args,
maybe some keyword parameters. But, once that gets reasonably complex, just
rinse and repeat. It's objects, methods and factories all the way down. And
unlimited billable hours.

------
hannofcart
On a near daily basis, I thank you guys for writing mypy. I have a personal
project spanning 2 years or so that I used mypy on 3 months back, and it has
been a pleasure to work with ever since.

Just a few days back, I had to refactor one of the core classes used
everywhere and mypy told me exactly what to fix. At that point I remember
wondering what I'd have done without mypy.

Thanks a ton.

------
linux2647
Genuinely curious: how does a project like Dropbox grow to four million lines?
In my (albeit naïve/inexperienced) mind, I could see how a file syncing app
could grow to several thousand lines (even tens of thousands), but what all is
in those four million lines? Is it a lot of UI code? Backwards compatibility
for older data models? Building components themselves instead of just `pip
install`ing?

~~~
ampersandy
Let's just dig into one aspect of a file syncing app: storing the files at
Dropbox scale. Without spending too long thinking about it, I'd say we need:

    
    
      * An API to request files/upload files
      * Authentication/encryption (how do we ensure DB employees aren't arbitrarily reading data on servers?)
      * Service that shards data and handles multi-region replication
      * That entails multiple datacenters (how much code does it take to keep a DC running?)
      * Automated backups to cold storage
      * Automated restoration + testing of cold storage volumes
      * Soft + hard deletion (deleting hot/warm/cold storage volumes reliably)
      * Error handling (retries, host errors, network failures, filesystem corruption, finding data when a host dies)
      * Fail-safes like blocking requests when hosts can't handle them, shedding load, etc.
    

I'd also add that at this scale, very few libraries start handling the types
of problems you have. For example, there might be a memcache library that
makes writing queries easy. However, when you have thousands of memcache
servers globally you can't just hardcode IP addresses anymore. You now have
to write your own custom 'service-service' that lets you look up what host you
should talk to for something.

~~~
abernard1
I could imagine a list 10X that size that still does not need even 400K lines
of Python code, much less 4M.

I agree that people generally underestimate the lines of code needed to
operate systems at scale, but having used dropbox, I'd still say that's
excessive. Note, this is just the Python code, not the UI stuff or the golang
or Rust. I've been in multiple 1M-10M line codebases at scale and I just
cannot fathom how, with their product simplicity (not engineering
simplicity) they could be at that size with a language as expressive as
Python.

My guess is this is generously counting a lot of forked libraries.
Ungenerously, it makes me think there's a lot of NIH syndrome.

------
ledauphin
MyPy has worked very well for our small team of backend developers.

It's already production-ready in my opinion, but it still lacks some modern
type-system features that you may be used to from other languages. Ironically,
for instance, before Protocols were introduced (and they're still not part of
Python core), there was no way to express structural subtyping in MyPy (for a
duck-typed language!). And I'd argue they've made the same mistake (and a
worse one in some ways) with TypedDict, which is a strict/nominal set rather
than a structural one, making it impossible to use on many real-world
collections that have key-value pairs that you want to forgo typing for
whatever reason.
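
For anyone who hasn't seen Protocols, a rough sketch of structural subtyping, assuming Python 3.8 or later where `Protocol` can be imported from `typing` (before that it came from `typing_extensions`). The `SupportsClose`, `Resource` and `shutdown` names are made up for the example.

```python
from typing import Protocol


class SupportsClose(Protocol):
    def close(self) -> None: ...


class Resource:
    # Note: no inheritance from SupportsClose. It matches by shape alone.
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def shutdown(item: SupportsClose) -> None:
    item.close()


r = Resource()
shutdown(r)      # accepted by the type checker because Resource has close()
print(r.closed)
```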

Still, Python with MyPy is a huge step up from Python without, and mypyc looks
interesting as well.

------
j88439h84
Using mypy with [http://attrs.org](http://attrs.org) for _all_ my classes has
been a revolution in my Python codebases. I can't emphasize enough how big a
difference it makes in readability and correctness.

    
    
        import attr

        @attr.dataclass
        class Person:
            name: str
            age: int

~~~
meowface
What advantage is there to using attrs instead of the new dataclasses module
in the standard library?

~~~
j88439h84
From "Why attrs"[1]:

PEP 557 added Data Classes to Python 3.7 that resemble attrs in many ways.

They are the result of the Python community’s wish to have an easier way to
write classes in the standard library that doesn’t carry the problems of
namedtuples. To that end, attrs and its developers were involved in the PEP
process and while we may disagree with some minor decisions that have been
made, it’s a fine library and if it stops you from abusing namedtuples, they
are a huge win.

Nevertheless, there are still reasons to prefer attrs over Data Classes whose
relevancy depends on your circumstances: attrs supports all mainstream Python
versions, including CPython 2.7 and PyPy.

Data Classes are intentionally less powerful than attrs. There is a long list
of features that were sacrificed for the sake of simplicity and while the most
obvious ones are validators, converters, and __slots__, it permeates
throughout all APIs.

On the other hand, Data Classes currently do not offer any significant feature
that attrs doesn’t already have.

attrs can and will move faster. We are not bound to any release schedules and
we have a clear deprecation policy.

One of the reasons to not vendor attrs in the standard library was to not
impede attrs’s future development.

[1] [http://www.attrs.org/en/stable/why.html#data-classes](http://www.attrs.org/en/stable/why.html#data-classes)
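
For comparison, a sketch of the same `Person` using the standard-library `dataclasses` module (Python 3.7+). It has no converter or validator arguments, so the equivalent checks here are written by hand in `__post_init__`; the particular checks are only illustrative assumptions, not something from the attrs docs.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self):
        # Hand-written stand-ins for a converter and a validator.
        self.age = int(self.age)
        if not isinstance(self.name, str):
            raise TypeError("name must be a str")


print(Person("Ada", "36"))
```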

~~~
pmoriarty
_"They are the result of the Python community's wish to have an easier way to
write classes in the standard library that doesn't carry the problems of
namedtuples."_

What are the problems of namedtuples?

~~~
hermitdev
IIRC, one of the biggest complaints is about performance. namedtuples are not
known to be fast, but I think performance has gotten better in more recent
Python versions.

This is just what I recall from reading PEPs, the email lists and release
notes. My recollection may not be entirely accurate.

------
j88439h84
The article mentions PyPy wasn't fast enough on their code, so they wrote a
whole "new language". I wonder if they tried contributing some relevant
performance improvements to PyPy. I ask because in my experience PyPy can make
a massive difference if I'm using it on CPU-intensive code, and it'd benefit
everyone to have a faster PyPy.

~~~
zawerf
Dropbox's implementation of python was pyston:
[https://github.com/dropbox/pyston](https://github.com/dropbox/pyston)

But I think it's dead now:
[https://blog.pyston.org/](https://blog.pyston.org/)

------
j88439h84
> We thought about running it automatically on every test build and/or
> collecting types from a small fraction of live network requests, but decided
> against it as either approach is too risky.

I wonder what's too risky about it? Did you try it and something broke?
Instagram's MonkeyType seems to think it works ok in production.
[https://monkeytype.readthedocs.io/en/stable/stores.html](https://monkeytype.readthedocs.io/en/stable/stores.html)

~~~
boulos
Treating “all the types we happened to see at runtime” as “all the types
allowed” _automatically_ is I think what they meant as “risky”.

MonkeyType seems to record the answers and then let you decide (via apply) if
you want to agree. I’m a little surprised more people didn’t use this
approach, but it probably makes for fairly clumsy type sets that the human is
better off saying “Eh, I’ll just document it correctly”.

------
Waterluvian
How is python's added typing compared to TypeScript? Is it expressive enough
to completely supplant traditional Python or is it just something you add in
sometimes to help out a little?

~~~
justwalt
If I'm not mistaken, all of the types are akin to comments: completely ignored
by the interpreter.
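
A quick way to see this for yourself (the `double` function is just a made-up example): the interpreter does not enforce the annotation, although it does keep it around on the function object.

```python
def double(x: int) -> int:
    return x * 2


print(double("ab"))            # runs fine; the annotation is not enforced
print(double.__annotations__)  # but it is stored on the function object
```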

~~~
jdm2212
So there's the same situation as with TS where you can't do runtime reflection
to get type annotations? That seems frustrating. The one really rough edge
I've seen in TS is type guards. Not the end of the world, obviously, but still
kinda gross coming from Java.

~~~
j88439h84
You can get annotations for functions and classes with
`typing.get_type_hints()`.
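
A small sketch of what that looks like, with a hypothetical `Person` class and `greet` function:

```python
from typing import get_type_hints


class Person:
    name: str
    age: int


def greet(person: Person, times: int = 1) -> str:
    return ", ".join(["hello " + person.name] * times)


print(get_type_hints(Person))  # maps attribute names to their types
print(get_type_hints(greet))   # parameters plus a "return" entry
```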

------
azhenley
This is an incredible story! Love hearing how academic projects can evolve
into success stories like this. I'm looking forward to an update a few years
from now.

------
ajxs
The only shocking thing here is that no one stepped in earlier. It's not like
Java/C# are radical new technologies in the online space. Obviously statically
typed languages are not a new concept either. I'm just aghast at this culture.
It's actually quite shocking that this debate is a real one taking up a large
amount of airtime in the industry.

~~~
mbar84

      > I was trying to find ways to make it possible to use the
      > same programming language for projects ranging from tiny 
      > scripts to multi-million line sprawling codebases, 
      > without compromising too much at any point in the 
      > continuum. An important part of this was the idea of 
      > gradual growth from an untyped prototype to a battle-
      > tested, statically typed product.

------
gitgud
The article shows several graphs of _Lines of Code_ over Time. Isn't this an
outdated way to measure anything in software?

 _"Measuring software productivity by lines of code is like measuring
progress on an airplane by how much it weighs."_ - Bill Gates

~~~
stagger87
But they aren't measuring software productivity by lines of code...

------
siempreb
About 4 million lines of Python code is just a bad decision from the start.
Now they are trying to fix it with types, but IMHO the fault is not in the
language. They just should have started out with a statically typed language
that scales better.

~~~
ledauphin
Yeah, I'm sure Dropbox's founders remember and regret that fateful day when
they said to each other, let's write 4 million lines of Python to get this
fledgling business idea off the ground!

------
epage
I feel mypy has been a big help with the correctness and approachability of
the code I write. My big issue is that it is too easy for `Any` to (implicitly)
show up, making it so parts of my code are (silently) not checked. I wish
there was a good way to spot this; maybe I need to dig more into the docs.
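
A sketch of the kind of thing I mean, using a hypothetical `load_port` helper. The type stubs declare `json.loads` as returning `Any`, so everything derived from it is unchecked and the wrong type sails straight through the `-> int` annotation.

```python
import json


def load_port(raw: str) -> int:
    config = json.loads(raw)  # the stubs declare json.loads as returning Any
    return config["port"]     # so nothing checks this against -> int


print(repr(load_port('{"port": "8080"}')))  # a str comes back despite -> int
```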

~~~
j88439h84

        mypy --help 
    
    
        Disallow the use of the dynamic 'Any' type under certain conditions.
      
        --disallow-any-unimported
                                  Disallow Any types resulting from unfollowed
                                  imports
        --disallow-subclassing-any
                                  Disallow subclassing values of type 'Any' when
                                  defining classes (inverse: --allow-subclassing-
                                  any)
        --disallow-any-expr       Disallow all expressions that have type Any
        --disallow-any-decorated  Disallow functions that have Any in their
                                  signature after decorator transformation
        --disallow-any-explicit   Disallow explicit Any in type positions
        --disallow-any-generics   Disallow usage of generic types that do not
                                  specify explicit type parameters (inverse:
                                  --allow-any-generics)

------
Barrin92
the article mentions that the desktop client app also uses python, is dropbox
still using qt?

the reason I'm asking is I remember a blog post from earlier this summer
announcing the new desktop app which looked very webtech-ish.

------
stmw
This is impressive. But IMHO, anyone starting today should just use Rust from
the beginning, and avoid having to endure all of the problems for years and
then taking on a big HN-worthy post about 4M lines.

~~~
vonseel
Maybe if you're Dropbox, Rust is an option, but Rust is much, much more
complicated than Python and not nearly as approachable - especially for
beginners.

~~~
ajxs
I'd argue that how complicated a language is shouldn't be a factor for a
company at that scale. If it's the right tool for the job, and I'm not arguing
that it is, then they have the ability to hire engineers capable of using it.

~~~
oscargrouch
I don't know the experience of others with this, but in reality it is not that
easy to have people with enough brain power or compromise for languages like
C++ and Rust.

Sure, there are "enough for the world", if you sum everybody, but i bet that even
a company of the size of Dropbox would have a hard time hiring everyone they
need if the complexity of the language is the same as Rust or C++.

For instance, if you read the Go manifesto to why the language was created,
you can read between the lines that C++ was getting in the way where engineers
were fighting hard to deal with language complexity.

So the same way as Java before them, they created a language for the "average
programmer". And you can see how the language is very pragmatic and how they
fight to not put more complexity on the language.

By the way, i know it's a personal opinion, but i think Rust will not be a
great fit for the cloud backend exactly because of this kind of thing. It
will be hard to create codebase as complex as the ones in Java, because there
will be much less man-hours available.

Maybe if generations are getting smarter, IDK, but needing to rely on a big
head-count of C++ or Rust engineers for any company, i bet they will have a
hard time to fill all the positions they need.

There are many more C++ engineers than Rust engineers right now, and companies
have a hard time hiring them (and i bet that for Rust it will be no different
given its complexity).

~~~
speedplane
> if you read the Go manifesto to why the language was created, you can read
> between the lines that C++ was getting in the way where engineers were fighting
> hard to deal with language complexity.

My understanding was that the push from C++ to Go was largely motivated by
long C++ compile/linking times. Go is designed to compile large code-bases
much faster than C++. I've experienced this myself, it's pretty frustrating to
wait 2-5 minutes for a C++ app to compile / link just so you can debug it for
30 seconds and start all over again.

~~~
hermitdev
> I've experienced this myself, it's pretty frustrating to wait 2-5 minutes
> for a C++ app to compile / link just so you can debug it for 30 seconds and
> start all over again.

That's quaint. When I first started working in C++ professionally, we were at
+16 hours for a clean build and link of just one of our systems (circa 2004).
Over the years, we reduced the amount of code (without sacrificing
functionality), shared common code (object model) between client and server
(went from about 2M LOC to about 250k LOC), reduced the number of external
dependencies, introduced precompiled headers on both Windows and Linux, and a
clean build was down to about 30 minutes. An incremental build and link could
be less than 30 seconds, depending on the scope of the change. But, yeah, C++
leaves a lot to be desired in terms of edit, build, test, repeat cycles.

