
Types for Python HTTP APIs - YoavShapira
https://instagram-engineering.com/types-for-python-http-apis-an-instagram-story-d3c3a207fdb7
======
dragonsh
At our startup we use marshmallow [1] to validate REST API traffic and make
it type-aware. The marshmallow models validate the data types of each
request/response against the validation rules.

Where the client needs to be able to define the validation itself, the
request/response consists of the data plus a JSON schema for validating it.
In that case we use the jsonschema validator [2] [3].

So every JSON request payload is passed through a marshmallow model, which
validates the JSON. In some cases we use the jsonschema validator instead,
when we have dynamic JSON data that carries its own schema definition.
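
A minimal sketch of the validate-on-the-way-in idea described above, using only the standard library rather than marshmallow itself. The `UserRequest` model and the `validate_payload` helper are hypothetical names for illustration; a real marshmallow schema does considerably more (nested fields, custom validators, serialization).

```python
import json
from dataclasses import dataclass, fields


@dataclass
class UserRequest:
    # Hypothetical request model: field names and types are assumptions.
    name: str
    age: int


def validate_payload(model, raw_json):
    """Parse a JSON string and check each field against the model's declared type."""
    data = json.loads(raw_json)
    errors = {}
    for f in fields(model):
        if f.name not in data:
            errors[f.name] = "missing field"
        elif not isinstance(data[f.name], f.type):
            errors[f.name] = f"expected {f.type.__name__}"
    if errors:
        # Collect every problem so the client sees all of them at once.
        raise ValueError(errors)
    return model(**{f.name: data[f.name] for f in fields(model)})
```

Calling `validate_payload(UserRequest, '{"name": "ann", "age": "thirty"}')` raises a `ValueError` carrying a dict of per-field errors, which a view could turn into a 400 response.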

For responses, the function's return value passes through a marshmallow
model. We are moving towards adding type annotations to all internal
functions/methods so we can generate documentation using a Sphinx plugin [4].

We are not Instagram, but we are very happy with this setup, and we can
replace the Flask library with Bottle or another WSGI-compatible framework
and it will still work.

[1]
[https://marshmallow.readthedocs.io/en/stable/](https://marshmallow.readthedocs.io/en/stable/)

[2] [https://python-jsonschema.readthedocs.io/](https://python-jsonschema.readthedocs.io/)

[3] [https://pydantic-docs.helpmanual.io/](https://pydantic-docs.helpmanual.io/)

[4] [https://pypi.org/project/sphinx-autodoc-typehints/#data](https://pypi.org/project/sphinx-autodoc-typehints/#data)

~~~
henryfjordan
Marshmallow is good but it can be slow. We also use it over a Flask API for
input/output serialization, and in the worst cases it can take a significant
amount of time if the objects are large enough (maybe a hundred milliseconds).
We also use the Marshmallow models in conjunction with a project called
`flask-apispec` by the same authors to generate Swagger docs.

I've wanted to explore using the Typing module to replace Marshmallow since it
started making the rounds to see if it results in better performance, but
haven't had a chance. I would have liked to see Instagram release a library to
go with this blog post so I don't have to do as much legwork.

~~~
j88439h84
> Toasted Marshmallow implements a JIT for marshmallow that speeds up dumping
> objects 10-25X (depending on your schema).

[https://github.com/lyft/toasted-marshmallow](https://github.com/lyft/toasted-marshmallow)

------
timothycrosley
Been doing this before the typing module was released, using hug:
[http://www.hug.rest/](http://www.hug.rest/)

FastAPI has a lot of advantages at this point for someone looking to do the
same:
[https://github.com/tiangolo/fastapi](https://github.com/tiangolo/fastapi)

hug has some catching up to do, some of it just because it got in so early (it
needs to be updated to be compatible with mypy and the types defined in
typing.py!). In any case, using typing on API endpoints, independent of how
you feel about dynamic vs. static typing in general, just makes a lot of sense
IMHO.

~~~
tiangolo
Thanks for the mention @timothycrosley! I feel honored :)

------
hyzyla
Looks like the FastAPI framework with handler type annotations

~~~
Sytten
Yeah I feel FastAPI will become the new Flask in a few years. The growth this
year has been amazing. I wish I had started my project with it, but I am still
using Flask. I found this nice extension though called Flask-Rebar that does a
similar task with Marshmallow.

~~~
Scarbutt
_Yeah I feel FastAPI will become the new Flask in a few years._

What else does it offer besides performance (switching to async is no free
lunch)?

~~~
acemarke
I ran across FastAPI earlier this year and did a tiny prototype to play with
it. Selling points for me:

\- Integrates nicely with some existing libraries (Starlette, Pydantic)

\- Well documented

\- Auto-validation of endpoints from data models

\- Auto-generation of OpenAPI schemas from those models

\- Auto-serves live API docs from that schema

\- Easy definition of sync and async endpoints

Again, it was just a tiny proof-of-concept prototype, but I'm sold on using it
on future projects.
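
To make the "OpenAPI schemas from data models" point concrete, here is a rough standard-library sketch of deriving a JSON-Schema-style description from a typed model. It is not how FastAPI or Pydantic implement it; the `Item` model, the `JSON_TYPES` mapping, and `schema_for` are all hypothetical names, and only a few primitive types are handled.

```python
from dataclasses import MISSING, dataclass, fields

# Assumed mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class Item:
    # Hypothetical model, for illustration only.
    name: str
    price: float
    in_stock: bool = True


def schema_for(model):
    """Derive a JSON-Schema-style object description from dataclass fields."""
    return {
        "title": model.__name__,
        "type": "object",
        "properties": {f.name: {"type": JSON_TYPES[f.type]} for f in fields(model)},
        # A field is required when it has neither a default nor a default factory.
        "required": [
            f.name
            for f in fields(model)
            if f.default is MISSING and f.default_factory is MISSING
        ],
    }
```

The same model can then drive both validation and documentation, which is the main appeal of the approach.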

~~~
ivalm
> \- Auto-generation of OpenAPI schemas from those models

Is there a package that can do the reverse? Can I give it an OpenAPI spec and
get a code stub going?

~~~
BerislavLopac
OpenAPI-generator [0], written in Java, can produce Python stubs, but I never
liked that approach.

Connexion [1] is built on top of Flask and does routing and validation based
on an OpenAPI spec.

I've recently started developing Pyotr [2], which does the same only based on
ASGI and Starlette. It also includes a client module.

[0] [https://openapi-generator.tech/](https://openapi-generator.tech/)

[1] [https://connexion.readthedocs.io/](https://connexion.readthedocs.io/)

[2]
[https://github.com/berislavlopac/pyotr](https://github.com/berislavlopac/pyotr)

------
mattbillenstein
Nice writeup - I've been doing this type of thing for 10 years before type
annotations by passing the types to the decorator and then using kwargs to
supply defaults.

    
    
      @annotate(foo=int, bar=str)
      def view_func(foo=0, bar='hi there'):
        ...
    

The types could also be arbitrary callables to parse things like datetime and
what not. I'd parse the params from either the get params or post data (json,
urlencoded, etc).
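
A minimal sketch of what such a decorator could look like, assuming the parsed GET/POST parameters arrive as a dict of strings. This is a guess at the shape of the idea, not the commenter's actual code; the `raw_params` argument and the extra `when` parameter are hypothetical.

```python
import functools
from datetime import datetime


def annotate(**converters):
    """Coerce raw string parameters with the given callables before calling the view."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(raw_params):
            # raw_params stands in for parsed GET/POST data (a dict of strings).
            kwargs = {
                name: convert(raw_params[name])
                for name, convert in converters.items()
                if name in raw_params
            }
            # Anything not supplied falls back to the function's keyword defaults.
            return func(**kwargs)
        return wrapper
    return decorator


@annotate(foo=int, bar=str, when=datetime.fromisoformat)
def view_func(foo=0, bar="hi there", when=None):
    return foo, bar, when
```

Because the converters are arbitrary callables, `datetime.fromisoformat` works the same way as `int` or `str`; a converter that raises `ValueError` is a natural place to hook in a 400 response.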

But now I'm using graphene and graphql to handle all this -- it's a better way
to do all of it imho. Of course this all came along after instagram, so you
didn't have that choice back then.

------
theptip
Rather than reimplementing this stuff, and assuming you are using Django, you
could just use Django Rest Framework to get OpenAPI, typed serializers, etc.

In that framework your serializers are by default auto-generated from your
model classes; this is convenient to get started, just like Django itself.

------
BerislavLopac
I'm always amazed that, in the majority of cases, we use formal
specifications -- like OpenAPI -- for documentation instead of to guide our
implementations. The common pattern is to build an API first, and then extract
routing and validation information into an OpenAPI spec, which is then used to
set up a test server for clients to develop on top of.

But if you have a relatively clear idea of what your API should look like,
there are great benefits to be gained by providing the specification first.
This way, you don't need for the API to be implemented to start developing the
clients, even by completely unrelated teams. Second, your spec will already
include your routing and validation rules, and there is no need to manually
specify e.g. Pydantic models.

I recently wrote a PoC of a framework [0] that uses the OpenAPI spec to easily
implement a REST(ful) API; in a nutshell, you need to implement endpoint
functions that correspond to the spec's `operationId` names, and it will
automatically route a request to the right endpoint. It is fully ASGI
compliant and, as a bonus, has a client module, which allows you to make the
requests.

[0]
[https://github.com/berislavlopac/pyotr](https://github.com/berislavlopac/pyotr)
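
The `operationId` routing idea can be sketched in a few lines. This is an illustration under assumptions, not Pyotr's implementation: the `SPEC` fragment, the endpoint functions, and `build_routes` are hypothetical, and path-parameter matching and request validation are left out.

```python
# Hypothetical spec fragment; a real OpenAPI document would be loaded from YAML or JSON.
SPEC = {
    "paths": {
        "/pets": {"get": {"operationId": "list_pets"}},
        "/pets/{pet_id}": {"get": {"operationId": "get_pet"}},
    }
}


def list_pets():
    return ["rex", "tom"]


def get_pet(pet_id):
    return {"id": pet_id}


# Endpoint functions, keyed by the name the spec refers to them by.
ENDPOINTS = {"list_pets": list_pets, "get_pet": get_pet}


def build_routes(spec, endpoints):
    """Map (method, path template) to the function named by operationId."""
    routes = {}
    for path, operations in spec["paths"].items():
        for method, operation in operations.items():
            routes[(method.upper(), path)] = endpoints[operation["operationId"]]
    return routes


ROUTES = build_routes(SPEC, ENDPOINTS)
```

A spec entry whose `operationId` has no matching function fails with a `KeyError` at startup, so the spec and the implementation cannot silently drift apart.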

------
acemarke
I'd love to see some more details on the tooling they created to generate
OpenAPI schemas by extracting types from their code.

Related: are there any good Python libs for doing request/response validation
based on OpenAPI v3 schemas?

~~~
Sytten
I have used Connexion
([https://github.com/zalando/connexion](https://github.com/zalando/connexion))
in the past. The only thing I didn't really like is that the OpenAPI file
grows rather unmaintainable after some time if you have a big service.

------
dmitriid
No one is mentioning that the Instagram app, which doesn’t have that many
features and is quite poorly designed, somehow requires over 2000 API calls to
function.

I would dare anyone to come up with even 100 for the Instagram app.

~~~
armatav
endpoints

------
move-on-by
I love types and I’m stoked about them coming back into style. For a while, it
was a major trend to avoid static types. I think most people’s problem ended
up being not with types themselves but with poor type systems. For myself, I
am particularly impressed with what TypeScript has done with its types; it's
really amazing how expressive, flexible, and introspective it is. Anyways,
just happy with the trend of embracing types and hope it continues.

------
ben509
This is a bit lower level as it's only concerned with serializing, but a
package I wrote called json_syntax[1] will take @dataclass or @attr.s classes
decorated with type annotations and build encoders and decoders for them.

It also handles Union types reasonably well and lets you put in hooks to
handle ugly cases. It's used in production on a system with a big complicated
payload, and I designed it to be easily extensible if the standard rules don't
work for you.
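
As a rough illustration of building decoders from annotations (not json_syntax's actual API), the sketch below walks a dataclass's type hints to rebuild nested objects from plain JSON-style data. The `Point` and `Path` classes and the `decode` helper are hypothetical, and only dataclasses, lists, and plain scalar types are handled.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import List, get_args, get_origin, get_type_hints


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Path:
    name: str
    points: List[Point]


def decode(cls, data):
    """Build an instance of cls from plain JSON-style data, guided by annotations."""
    if is_dataclass(cls):
        hints = get_type_hints(cls)
        return cls(**{f.name: decode(hints[f.name], data[f.name]) for f in fields(cls)})
    if get_origin(cls) is list:
        (item_type,) = get_args(cls)
        return [decode(item_type, item) for item in data]
    return cls(data)  # Plain types such as int or str.
```

Encoding in the other direction is already covered by `dataclasses.asdict`, so `asdict(decode(Path, payload))` round-trips a payload of this shape.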

[1]: [https://pypi.org/project/json-syntax/](https://pypi.org/project/json-syntax/)

------
iddan
I’ve been working on a similar solution for Flask in K Health. Currently we
have great serialisation / deserialisation from types to JSON, next step will
be creating OpenAPI documents

~~~
Sytten
You should check out Flask-Rebar, I am using it a lot in my projects and it's
very nice. Does all of this automagically.

------
theSage
I've been using type hints with my bottle projects for some time now. Makes it
a lot easier to write JSON based apis.

[https://github.com/theSage21/bottle-tools](https://github.com/theSage21/bottle-tools)

------
mikeurbach
When I read the headline and saw the source, I assumed this would be about
GraphQL. I know Instagram utilizes GraphQL, for example on the web client, so
now I'm wondering how that fits in.

------
vkaku
Flask Swagger already has a way to annotate it quite well.

------
solarkraft
I find it really impressive that such a large organization runs anything on
Python. Isn't the speed optimization potential immense? Does the amount of
Python involved just not matter compared to image data?

Secondly, I find it really impressive that such a large company with so many
smart people can produce an application so mediocre, and then make the
experience extra terrible (and make me wonder what the absolute fuck is up
with that company) by trying to block desktop browsers from the web app, which
is perfectly usable on desktop (the one in which you can upload things).

Especially for a photo centric application (that has since begun to be used
for original video production, of course hampered by the insane lack of any
options, starting with the aspect ratio) one could expect a normal work flow
to include transferring photos from a camera to a desktop computer. Making
your browser pretend to be a mobile phone seems like a step that could maybe,
if you really tried (to take out the arbitrary restriction that you must have
explicitly added in the first place), be made unnecessary.

So that's my (condensed) rant about Instagram as a whole.

~~~
symlinkk
Who cares if it's 10% slower running on Python if the whole thing is going to
be distributed across 1000s of containers that are behind layers of caching.
At that point developer productivity is more important than runtime
performance.

~~~
sorenjan
What exactly are those developers doing all day then? Because every time I
open an Instagram.com link I'm struck by how bad it is. The images are small,
the comment list is the same height as the image, so wide but short images
have a short comment list, the comments are scrolled to the bottom so the
poster's description isn't visible without scrolling up, videos don't have
any kind of playback controls so you can't even see how long they are, and so
on. It's a terrible web site and I don't understand where all the developer
time and productivity is going.

And wouldn't a 10% hardware cost decrease be worth a lot at a company with
such a high load, or is it all rendered so seldom because of the caching?

~~~
gigatexal
Really? You’re dogging the product because you don’t like how it looks on
mobile? The article is about how they use python and types to improve their
APIs.

------
dna_polymerase
The whole type thing in Python and all the workarounds to make Python
type-safe-ish look and feel absolutely pathetic. I have no idea why a company like
Facebook takes so much time to apologize for the inadequacy of Python as a
programming language in the broader sense. Especially since they are utilizing
a service based architecture they easily could make the switch to a language
that actually supports types. Also the obvious performance gains...

~~~
joshuamorton
Python does support types though. Its type system is, imo, more pleasant and
powerful than Java's.

~~~
rytill
Can you elaborate on this? I’d love to actually use types in Python.

~~~
acemarke
Python 3 has optional static typing:

[https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html)

[https://realpython.com/python-type-checking/](https://realpython.com/python-type-checking/)

~~~
nurettin
Static typing is exactly what python doesn't have. It will remain dynamically
typed. It has type hints which can be used with external tools to enforce type
correctness to some degree, but python code will run even if you assign:

    
    
        def f(x: int):
            x = "str"
        f(42)
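
A slightly expanded, runnable version of the point above: the interpreter ignores the annotations, while tools can still read them as metadata. The return annotation and the `get_type_hints` call are additions for illustration.

```python
from typing import get_type_hints


def f(x: int) -> str:
    # Rebinding x to a str is fine at runtime; only a checker such as mypy objects.
    x = "str"
    return x


result = f(42)

# The annotations are still available as metadata for external tools.
hints = get_type_hints(f)
```

Here `result` ends up as the string `"str"` and no exception is raised; any enforcement has to come from a separate checking step.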

~~~
joshuamorton
This is true of java and c as well.

You can cast things or subvert the type system and the code will still run.

At least where I work, my build process prevents me from running Python code
with invalid type annotations, so it's exactly the same as for Java or cpp or
any other statically typed language.

~~~
nurettin
In my opinion, the main difference between the two from a developer's
perspective is this:

You can't grep for dynamic type errors. You can grep for a cast to see where
you've made a mistake and find a way to do it without a cast. This of course
goes for the named casts in C++. It is harder to grep for casts in C.

~~~
joshuamorton
Right, but if you run mypy over your code, you're in the same situation as in
java. It's a one liner to run mypy before you invoke your tests. You have to
cast to subvert the type system.

In c you have void ptr everywhere anyway.
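
For reference, a small sketch of what casting to subvert the checker looks like in Python. `typing.cast` is a real standard-library function that does nothing at runtime; the `double` function and the values are made up for illustration.

```python
from typing import cast


def double(n: int) -> int:
    return n * 2


# cast() only informs the type checker; at runtime it returns its argument unchanged.
value = cast(int, "ab")

# A checker accepts this call because it believes value is an int,
# but the str is what actually gets doubled.
result = double(value)
```

The call to `cast` is an explicit token in the source, so it can be searched for in the same way as a named cast in C++.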

