
Python at Netflix - luord
https://medium.com/netflix-techblog/python-at-netflix-bba45dae649e
======
kgraves
> "Python is the industry standard for all of the major applications we use to
> create Animated and VFX content, so it goes without saying that we are using
> it very heavily..."

Interesting... very curious to see Netflix using tools from the VFX industry,
Shotgun & Nuke to name a few. I wish they would expand more on this.

~~~
erlangNewb
Netflix produces content too

~~~
mixmastamyk
A lot of it, which needs compositing, post-production, encoding, etc. Python
is great for pipeline code to script the compiled libraries that do the number
crunching.

Some of it is licensed/farmed out to places like DreamWorks however (new
Voltron/She-ra), which is mostly a Java/Spring shop, to my knowledge.

------
sametmax
Do you know what the Netflix culture is regarding remote work?

I've been looking for a job for the first time in 10 years of Python, and
Netflix is one of the rare big companies that I still have respect for.

[https://jobs.netflix.com/](https://jobs.netflix.com/) has several offers I
would be a good fit for, but I'm not ready to relocate to the US.

~~~
jimmaswell
As someone who has worked on fully remote game dev volunteer projects for
nearly 10 years, it's unbelievable how much the cultural inertia of physical
presence has prevented companies from making the rational move of eliminating
all possible office space in favor of remote work and Skype meetings. So much
money wasted on rent, security, and upkeep and so much time (and even lives)
lost to commuting when there isn't a single compelling reason programmers need
to cohabitate a physical office to get their job done.

~~~
theshrike79
Communication between team members is a huge issue in remote work.

It works if everyone is on site. It also works if everyone is remote.

But when some are at the office and some aren't, communication and information
sharing become a lot harder. Mostly it's a process and tool issue, but still,
people would rather just turn around and ask the team than spend a minute
writing their issue on Slack/whatever.

~~~
heleninboodler
It's true. I worked at a company that was 100% remote when it started and kept
the remote culture as it grew from 4 to 25 people, but one city in particular
ended up having enough people that they decided to rent space in a big shared
office, coworking-style. Ultimately, people who worked in that space ended up
being more connected to what's going on, due to physical proximity. You wander
past a hallway conversation and you end up joining, or at least knowing about
it. You go to lunch with your coworkers and happen to get a little work talk
in. You ask questions more easily from someone who's right across the desk
from you because you can read social cues about how "interruptible" they are.
Jokes come across better in person than via messaging. There are lots of
subtle ways in which in-person interaction is just inherently different than
remote, and the office people end up tighter.

~~~
Macha
It's not even remote vs non-remote. My employer had many medium sized offices
of 100-200 people, and we merged with a company that had one single 5000+
person campus. That was a major culture clash for sure in terms of
documentation and processes.

------
glckr
Interesting. I had it in my head that Netflix was a pure Java shop. Was I
wrong in the first place, or has something changed?

~~~
2StepsOutOfLine
From what I understand from their talk on how they (don't) do devops, they
adopt a polyglot architecture and teams use different languages.

~~~
linuxdude314
Different teams using different languages doesn't mean they don't do DevOps.

~~~
monsieurbanana
I read that as "in a talk on how they don't do devops they also happened to
talk about how their teams are polyglots".

------
prdonahue
At Cloudflare we use their Python-based Lemur
([https://github.com/Netflix/lemur](https://github.com/Netflix/lemur))
application to issue (on some days) 1M SSL certificates.

------
xtreak29
I am wondering if PyPy is used somewhere for workloads that need performance.

~~~
mixmastamyk
Probably not, more likely to call out to their numerous Java projects and
C/C++ video tools.

------
Tycho
Does anyone have more info about this or know of similar projects?

 _We lean on many of the statistical and mathematical libraries (numpy,
scipy, ruptures, pandas) to help automate the analysis of 1000s of related
signals when our alerting systems indicate problems. We’ve developed a time
series correlation system..._

~~~
aaronblohowiak
What would you like to know?

~~~
Tycho
The correlations part specifically, what does it do, what’s the modelling
approach, and is there more info about it anywhere?

~~~
relix42
Initially we wrote the library to help us answer the question of what
change may be causing impact to our customers' experiences. Out of the
billions of time series metrics we have, we knew there were about 10,000 or so
likely candidates (for this initial use case), and we wanted to
reduce that set as far as possible and either (1) find /the/ candidate for
the problem or (2) produce a short list of things for the humans to look into.
In order to get to the point where we could ensure a high likelihood of apropos
correlations, we needed to do some work on the signals first.

First we detect if any of the possible candidate signals were born or died in
the interesting time period. We use these time points to reduce the window
we'll use to pass to the correlation functions. We can also detect any
changepoints in the time series and apply similar logic. Once we've determined
the best window bounds for each candidate signal, we use Pearson and Spearman
correlation functions to get a score for the pair of signals -- the initial
signal that started the inquiry and the candidate signal using the determined
time window.

The code is about 98% data preparation, signal analysis, and window
determination and about 2% correlation work.

(I've tried to summarize quite a bit, let me know if you'd like clarifications
or have other questions.)
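
A minimal sketch of the steps described above, for readers who want something concrete. This is not Netflix's code: the function names (`live_window`, `score`), the use of `None` to mark samples where a signal did not exist, and the pure-stdlib Pearson/Spearman implementations are all assumptions made for illustration. Changepoint detection is left out entirely.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def live_window(signal):
    """(start, end) index bounds between the signal's 'birth' and 'death'."""
    present = [i for i, value in enumerate(signal) if value is not None]
    if not present:
        return None
    return present[0], present[-1] + 1


def score(reference, candidate):
    """Correlate the reference with a candidate inside the candidate's window."""
    window = live_window(candidate)
    if window is None:
        return None
    start, end = window
    pairs = [
        (r, c)
        for r, c in zip(reference[start:end], candidate[start:end])
        if r is not None and c is not None
    ]
    if len(pairs) < 3:
        return None  # too few points to say anything useful
    xs, ys = zip(*pairs)
    return pearson(xs, ys), spearman(xs, ys)
```

Calling `score(reference, candidate)` for each candidate and sorting by the absolute value of the scores would give the kind of short list mentioned above. As in the description, most of the code is preparation and only a few lines are the correlation itself.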

------
gvd
I fail to understand the use of python in a distributed environment while the
language has such poor concurrency support (on top of the lack of a type
system). You can make your application HA, but they are obviously not trying
to squeeze out every CPU cycle.

~~~
gshulegaard
> I fail to understand the use of python in a distributed environment while
> the language has such poor concurrency support

Because it's a distributed environment probably is exactly why. Python has
(arguably) great concurrency support apart from Multi-threading.

[https://www.youtube.com/watch?v=MCs5OvhV9S4](https://www.youtube.com/watch?v=MCs5OvhV9S4)

So if you need concurrency in the context of a single thread, then Python's
GIL is a non-starter. But a distributed environment is not likely one of
those.

Edit: I should amend concurrency in a single thread to: concurrency in a
single thread that is compute gated... since coroutines can give you pseudo
concurrency in a single thread provided your workload has blocking steps
like IO or TCP calls.
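
A small sketch of that last point, assuming `asyncio.sleep` as a stand-in for a real IO or network wait (the names `fake_io` and `run_demo` are made up for the example). Three waits of 0.2 seconds overlap on one thread, so the total is close to 0.2 seconds rather than 0.6.

```python
import asyncio
import time


async def fake_io(name, delay):
    # Stand-in for a blocking step such as a network call; while this
    # coroutine waits, the event loop is free to run the others.
    await asyncio.sleep(delay)
    return name


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed


def run_demo():
    """Run the three waits concurrently on a single thread."""
    return asyncio.run(main())
```

If `fake_io` did CPU-bound work instead of waiting, the three calls would run one after another and nothing would be gained, which is the compute-gated case mentioned above.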

~~~
weberc2
Even if your app is IO bound, Python's concurrency is painful. Because it's
not statically typed, it's too easy to forget an `await` (causing your program
to get a Promise[Foo] when you meant to get a Foo) or to overburden your event
loop, and such things are difficult to debug (we've had several production
outages because of this class of bugs). Never mind the papercuts that come
about from dealing with the sync/async dichotomy.
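
To make the forgotten `await` concrete, here is a minimal sketch (the names `fetch_foo` and `main` are hypothetical). In Python the thing you get back is a coroutine object rather than a Promise, and nothing complains at the point where you forgot to await it.

```python
import asyncio
import inspect


async def fetch_foo():
    await asyncio.sleep(0)
    return 42


async def main():
    forgotten = fetch_foo()  # missing `await`: this is a coroutine object, not 42
    is_coroutine = inspect.iscoroutine(forgotten)
    value = await forgotten  # awaiting it is what actually produces the value
    return is_coroutine, value
```

A static checker such as mypy can often flag the mismatch if the code is annotated, but the interpreter itself will happily pass the coroutine object along.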

~~~
jnwatson
Both problems have built-in debug solutions in recent versions of python. The
event loop will literally print out all the un-awaited coroutines when it
exits, and you can enable debug on the event loop and have it print out every
time a coroutine takes longer than a configurable amount of time.
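
A sketch of the second feature, assuming a deliberately blocking `time.sleep` as the offending code. `asyncio.run(..., debug=True)` and `loop.slow_callback_duration` are real asyncio features; the helper names and the list-collecting log handler are made up so the warnings can be inspected.

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects log messages in a list so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocks_the_loop():
    time.sleep(0.2)  # blocking call inside a coroutine: stalls the event loop


def run_with_debug():
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        async def main():
            loop = asyncio.get_running_loop()
            # Report any single step that holds the loop for more than 50 ms.
            loop.slow_callback_duration = 0.05
            await asyncio.create_task(blocks_the_loop())

        asyncio.run(main(), debug=True)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

The returned messages should include a warning of the form "Executing <Task ...> took 0.2xx seconds", which names the task that held the loop.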

~~~
weberc2
> The event loop will literally print out all the un-awaited coroutines when
> it exits

IIRC, I've only ever seen "unawaited coroutine found" (or similar) errors;
I've never seen anything that points to a specific unawaited coroutine. In
either case, a bug in prod is still many times worse than a compile-time type
error.

> you can enable debug on the event loop and have it print out every time a
> coroutine takes longer than a configurable amount of time

I don't run my production servers in debug mode, and even when I do manage to
find the problem, I have limited options for solving it. Usually it amounts to
refactoring out the offending code into a separate process or service.

An extreme counterpoint is a language like Go which

1) Is roughly 100X faster in single-threaded, CPU-bound execution anyway

2) Allows for additional optimizations that simply aren't possible in Python
(mostly involving reduced allocations and improved cache coherence)

3) Has a runtime that balances CPU load across all available cores

This isn't a "shit on Python" post; only that concurrency really isn't
Python's strong suit (yet).

~~~
mixmastamyk
These are not really an issue in vfx production and other things Python is
used for.

~~~
weberc2
It’s a problem for lots of things Python is used for, but maybe not vfx
(whatever that is).

~~~
mixmastamyk
They are using it for things it’s good at, for others they use Java. So this
subthread is largely a waste of time.

~~~
weberc2
Who is “they”? What is your point?

~~~
mixmastamyk
They is Netflix, and other post-production oriented users. You know, what this
article and discussion is about?

~~~
weberc2
It's not at all obvious that "they" refers to "netflix and other post-
production oriented users", and your argument is a tautology "Python is good
at the things that Python is good at". Obviously. The rest of us are debating
what those things are or are not.

~~~
mixmastamyk
The subject is well-trodden, there's not much to debate. Python is not good at
threading, but works well in multiprocessing situations. Netflix is using it
in the latter situation, and not the former. Async is unlikely to be a use case
either.

~~~
weberc2
> The subject is well-trodden, there's not much to debate

And yet we see the same incorrect information trotted out over and over again.

------
bredren
Interesting to see a touch of Flask used for some internal APIs.

~~~
alexpotato
People talk a lot about complicated stacks but Python Flask + some basic
HTML/CSS/Vanilla JS can solve a LOT of problems.

This is especially true inside big orgs that have lots of silos and need
"Rosetta Stones" that translate between the silos.

------
bouncing
I had actually looked at Flask-RESTPlus for a relatively recent project before
deciding to use FastAPI.

~~~
randomsearch
Surprised they use Flask.

I'm working on my first larger, industrial-strength REST API in Python and
I've found the Django Rest Framework to be more suited once you get to that
level of complexity.

~~~
bouncing
Probably whoever made it just preferred Flask.

django-rest-framework is nice because it's maintained well and it's consistent
with the rest of Django, but plenty of people just prefer the Flask way of
doing things, even if it's a scrappier set of tools in some ways.

~~~
swah
I'm using DRF in a project and don't really like the code/magic I end up with.
IIRC I was happier with Flask and SQL... (But of course, for the bits that
map properly onto its model, DRF is awesome)

~~~
bouncing
Yeah. You can always get into what you do/don't like about a framework, but
the same can be said of Flask.

FWIW, I don't like DRF's reliance on serializers that do more than
serialization.

------
whoevercares
Metaflow looks like a superhero solution for ML/DL. Hope they open source it
one day

------
sandGorgon
> _The ability to drop into a bpython shell and improvise has saved the day
> more than once._

what do you really do here - connect to the flask instance and route requests
manually ?

~~~
aaronblohowiak
Our python code does _not_ handle user requests directly. Our team uses python
to control the control plane of traffic — we flip dns records, control cross-
region proxying, re-steer cdn-based reverse proxying and scaling of the
hundreds of micro services that power the Netflix experience. We’ve improvised
various custom traffic distribution patterns, operated outside of our normal
traffic-shifting workflows, and written/run quick scripts to modify scaling
fleetwide.

~~~
sandGorgon
Hi - so I'm actually curious about the infrastructure that allows you to
connect bpython to it. Is it an rq process that you connect to, etc.?

I would love to set up an architecture like this that lets me connect to stuff
through a CLI. Also, I'm assuming you are on Kubernetes (or something). How
does this bpython business work through all those layers?

~~~
aaronblohowiak
We ssh into a box in production, run bpython and import the libraries (some
environmental stuff, boto3, our own code, etc). At that point, there isn’t
much difference from a bpython session and the regular application code...
Ssh’ing into a production box does involve connecting through a bastion, and
knowing which box you want to connect to, but that isn’t too hard with
spinnaker.

~~~
sandGorgon
pretty cool. We have struggled with doing this in Flask, because the whole
codebase starts getting peppered with flask_context everywhere. So importing
some libraries and code starts spawning flask.

Not sure if you use different frameworks that don't have this issue.

------
z3t4
Wasn't Netflix a Node.JS shop two years ago when Node was popular and now when
Python is the most popular they are a Python shop ? :P (sarcasm) It's good to
use many languages as it favors a micro-service architecture.

~~~
tombert
Is Python the most popular language now? I'm not trying to argue, I'm trying
to see how out-of-the-loop I am nowadays.

~~~
passthejoe
I've been resisting Python for years, but it just keeps getting more popular
(and support and more/better libraries come along with that), so I might just
stop fighting using it for local scripts and small GUI projects.

~~~
felipellrocha
Why would you resist it?

~~~
zornado
It would be nice if the language was typed. Easier for overall maintenance,
and refactoring for big projects.

~~~
stuxnet79
Python 3 is typed. Not strongly typed but the point remains. Python is used in
a lot of domains and I feel the strides the language has made over the past
decade have made it very competitive and a worthy addition to any programmer's
toolbox.

~~~
d0mine
Python is strongly typed. It is not statically typed.
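
A short illustration of the distinction (function names are made up for the example). Strong typing means the interpreter refuses to mix incompatible types implicitly; not being statically typed means that refusal only happens when the line actually runs.

```python
def add(a, b):
    return a + b


def try_mixed_add():
    try:
        return add("1", 1)  # no implicit str/int coercion in Python
    except TypeError as exc:
        # The error is raised at runtime, not at compile time.
        return f"TypeError: {exc}"
```

Both `add(1, 2)` and `add("1", "1")` work, since nothing about `add` is checked until it is called with concrete values.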

------
ptah
now to get the apple tv app to use more than the bottom third of screen for
letterbox sized navigation by making the massive "thumbnail"/trailer at the
top a third of the screen, so i can actually find something good to watch...

------
butterpeanut
I'd like to contribute to the Netflix code base by offering what is obviously a
missing but much needed piece of code:

    def enableAutoPlay( flag ):
        annoyingAutoPlay = flag
        return

~~~
pletnes
Afaik you can disable autoplay in the options?

~~~
bhauer
At least with the web UI, you can mute the auto-playing video and subsequent
auto-playing videos will remain muted.

------
prolepunk
"You've read 11 stories this month, let's make things official".

Medium is trying to look like NY Times, minus any kind of effort to actually
write the stories.

~~~
lma21
Why can't I read their articles through Pocket?

------
peterwwillis
Reason #8,201 not to model your org after FAANG: writing custom software just
to operate your site is like building your own tools to till your farm. I
suppose if you hire world-class toolmakers you'll have some very good tools,
but that's not quite the point of the farm, is it?

~~~
fossuser
If you hire world class tool makers to make the best tools for your farm then
your farm has a strategic advantage.

Really good tools can make a big difference.

You really do need world class though - the trade off of custom means your
tools really do need to be a lot better than the standard.

------
1ba9115454
Most of these use cases would be better served with a type safe language.

~~~
inglor
You think you're being downvoted because people like dynamic typing very much.

You're being downvoted because:

A) You left a comment with a strong opinion without explaining it, while
dismissing other people's experience.

B) Python has in fact been a type safe language for a while now
[https://docs.python.org/3/library/typing.html](https://docs.python.org/3/library/typing.html)
- you have pluggable "a la bracha" types.

~~~
philwelch
Python’s type annotation mechanism is somewhere between afterthought and joke.
Nobody uses it and nothing is built with it in mind; it can be anywhere from
needlessly complicated to utterly impossible for code that interacts with
third-party libraries to successfully discover and import class names for type
annotations. Making matters worse, you need to use even more extra libraries
for the type annotations to actually do anything.
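
For context on that last sentence, a minimal sketch (the function `double` is made up for the example). Annotations are stored as metadata on the function, and the interpreter does not enforce them; a separate checker such as mypy is what reports the mismatch.

```python
from typing import get_type_hints


def double(n: int) -> int:
    return n * 2


# The annotations are available for tools to inspect...
hints = get_type_hints(double)

# ...but they are not checked at runtime: this call succeeds and
# repeats the string, even though `n` is annotated as int.
result = double("ab")
```

Here `hints` maps `n` and `return` to `int`, and `result` is the string `"abab"`.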

~~~
inglor
I beg to differ, I've seen it adopted in quite a few places by large
companies. I've rarely seen it used in startups though. My evidence is
anecdotal (so is yours though).

I definitely agree that it's not seeing nearly as much adoption as let's
say... TypeScript in the JavaScript ecosystem though.

Facebook even wrote a package to generate types automatically for existing
code:
[https://github.com/Instagram/MonkeyType](https://github.com/Instagram/MonkeyType).

~~~
mixmastamyk
Re: startups, it's more useful in a large, mature code-base rather than
smaller projects still in the design-churn phase.

~~~
vtomole
FWIW, we use mypy in Cirq
([https://github.com/quantumlib/Cirq](https://github.com/quantumlib/Cirq)) and
we haven't reached version 1.0 yet.

