
Profiling CPython at Instagram (2017) - mcenedella
https://instagram-engineering.com/profiling-cpython-at-instagram-89d4cbeeb898
======
sandGorgon
this is a very interesting talk on the same topic:
[https://lwn.net/Articles/754163/](https://lwn.net/Articles/754163/)

> _But Shapiro may not be aware that the Python core developers have often
> preferred simpler, more understandable code that is easier to read and
> follow, over more complex algorithms and data structures in the interpreter.
> Some performance may well have been sacrificed for readability._

> _Mark Shannon said that Python 3.7 has added a feature that should provide a
> similar boost as the method-lookup caching used in the experiment. Shapiro
> said he had looked at those changes but still believed his proposed
> mechanism would provide more benefit. Attribute lookup still requires five
> lookups in CPython, while it is only one lookup in the experimental version.
> Shannon did not sound entirely convinced of that, however._

------
munificent
> _Roughly 85% of LOAD_ATTRs occurred at monomorphic sites;_

This observation and the accompanying graph are, I think, the static-typing
equivalent of the "generational hypothesis".

The generational hypothesis says most objects are either very short-lived or
long-lived, and modern generational GCs optimize very effectively for that fact.

A particularly pragmatic way to look at static types is that they optimize for
monomorphic calls at the expense of polymorphic calls. For the latter, they
require you to manually define interfaces, virtual methods, or some other
explicit dynamism. In return, monomorphic calls are much faster.

The 85% number is an indication that that's a good optimization, as long as
the other 15% aren't too hard to handle in your statically-typed language.
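The quoted "85% of LOAD_ATTRs occurred at monomorphic sites" reads most naturally as a dynamic count: each site is weighted by how often it executes. The sketch below computes that share from a hypothetical trace of `(site_id, receiver_type)` observations. The trace format and the `monomorphic_share` name are illustrative assumptions, not how the Instagram team collected their data.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, Tuple


def monomorphic_share(trace: Iterable[Tuple[Hashable, type]]) -> float:
    """Fraction of executions at sites that only ever saw a single receiver type."""
    executions = Counter()             # site -> how many times it ran
    receiver_types = defaultdict(set)  # site -> distinct receiver types observed
    for site, receiver_type in trace:
        executions[site] += 1
        receiver_types[site].add(receiver_type)

    total = sum(executions.values())
    if total == 0:
        return 0.0
    mono = sum(n for site, n in executions.items() if len(receiver_types[site]) == 1)
    return mono / total


# site_a always sees int (3 runs); site_b sees int and str (2 runs).
trace = [("site_a", int)] * 3 + [("site_b", int), ("site_b", str)]
print(monomorphic_share(trace))  # 0.6
```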

~~~
xapata
Or you can just do what Python 3.7 implemented (and Instagram seems to have
separately implemented) -- cache attribute lookups and invalidate the cache if
necessary. [0]
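The general idea of "cache the lookup at the site, invalidate when the class changes" can be sketched with a toy model. A hypothetical `Klass` carries a version counter that is bumped on every mutation. A hypothetical `AttrSiteCache` remembers the (class, version) pair it last resolved and reuses the value only while both still match. This illustrates the technique under those assumptions; it is not CPython 3.7's or Instagram's actual implementation.

```python
class Klass:
    """Toy class object: an attribute table plus a version tag bumped on every change."""

    def __init__(self, name, attrs):
        self.name = name
        self._attrs = dict(attrs)
        self.version = 0

    def lookup(self, attr):
        # Stand-in for the slow path (MRO walk, descriptor checks, ...).
        return self._attrs[attr]

    def set_attr(self, attr, value):
        self._attrs[attr] = value
        self.version += 1  # invalidates every cache entry keyed on the old version


class Instance:
    def __init__(self, klass):
        self.klass = klass


class AttrSiteCache:
    """Monomorphic inline cache for a single attribute-load site."""

    def __init__(self, attr):
        self.attr = attr
        self.cached_klass = None
        self.cached_version = -1
        self.cached_value = None
        self.hits = 0
        self.misses = 0

    def load(self, obj):
        klass = obj.klass
        if klass is self.cached_klass and klass.version == self.cached_version:
            self.hits += 1
            return self.cached_value
        # Miss: do the full lookup and record which class/version it is valid for.
        self.misses += 1
        self.cached_value = klass.lookup(self.attr)
        self.cached_klass = klass
        self.cached_version = klass.version
        return self.cached_value


point = Klass("Point", {"dims": 2})
site = AttrSiteCache("dims")
values = [site.load(Instance(point)) for _ in range(3)]  # 1 miss, then 2 hits
point.set_attr("dims", 3)                                # bumps the version
after = site.load(Instance(point))                       # miss, refills with the new value
print(values, after, site.hits, site.misses)  # [2, 2, 2] 3 2 2
```

A site that alternates between two classes would miss on every load in this single-entry design. That is why the share of monomorphic sites matters so much for this kind of cache.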

The GC optimization, well, that'll probably be left up to other interpreters.

[0] "Mark Shannon said that Python 3.7 has added a feature that should provide
a similar boost as the method-lookup caching used in the experiment. Shapiro
said he had looked at those changes but still believed his proposed mechanism
would provide more benefit. Attribute lookup still requires five lookups in
CPython, while it is only one lookup in the experimental version. Shannon did
not sound entirely convinced of that, however."
[https://lwn.net/Articles/754163/](https://lwn.net/Articles/754163/)

------
dmoreno
> This suggested techniques to eliminate loads and stores (e.g. switching to a
> register based bytecode) would be an effective optimization.

Has anybody tried anything along these lines?

~~~
tyingq
The Perl 6 VMs (Parrot, MoarVM) are register-based.

~~~
mschaef
I believe Lua is as well.
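To make the quoted load/store point concrete, here are two toy interpreters for the same statement, `d = (a + b) * c`. One is stack-based and one is register-based. Both instruction sets are hypothetical and far simpler than CPython bytecode, Parrot, MoarVM or Lua. The only point is the difference in how many instructions get dispatched.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def run_stack(code, env):
    """Stack machine: operands move through an explicit value stack via loads/stores."""
    stack = []
    dispatched = 0
    for op, *args in code:
        dispatched += 1
        if op == "load":
            stack.append(env[args[0]])
        elif op == "store":
            env[args[0]] = stack.pop()
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[op](left, right))
    return env, dispatched


def run_register(code, env):
    """Register machine: each instruction names its destination and source slots."""
    dispatched = 0
    for op, dest, left, right in code:
        dispatched += 1
        env[dest] = OPS[op](env[left], env[right])
    return env, dispatched


# d = (a + b) * c
stack_code = [
    ("load", "a"), ("load", "b"), ("add",),
    ("load", "c"), ("mul",), ("store", "d"),
]
register_code = [
    ("add", "t0", "a", "b"),  # t0 = a + b
    ("mul", "d", "t0", "c"),  # d  = t0 * c
]

env_s, n_stack = run_stack(stack_code, {"a": 2, "b": 3, "c": 4})
env_r, n_reg = run_register(register_code, {"a": 2, "b": 3, "c": 4})
print(env_s["d"], n_stack, env_r["d"], n_reg)  # 20 6 20 2
```

The usual trade-off is that each register instruction is wider and has to decode its operand slots, in exchange for fewer dispatches.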

------
sciurus
This gives Instagram ideas for how to improve the Python interpreter for their
workload. If you're instead interested in profiling to find ways to improve
your own Python code, check out
[https://github.com/uber/pyflame](https://github.com/uber/pyflame) or
[https://github.com/benfred/py-spy](https://github.com/benfred/py-spy)

~~~
gnufx
To repeat
[https://news.ycombinator.com/item?id=17929621](https://news.ycombinator.com/item?id=17929621),
there is Python support in several comprehensive performance tools with better
visualization and analysis functions than flame graphs. I don't know how well
their Python-specific instrumentation compares with the above, but given what
the tools provide in general, it would be worth working on if necessary.
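pyflame and py-spy sample a running interpreter's call stack from outside the process. They aggregate the samples into "folded" stacks, which is what a flame graph is drawn from. The sketch below is only an in-process illustration of that idea, not how either tool is implemented. A background thread polls `sys._current_frames()` (a CPython-specific, underscore-prefixed function) and counts semicolon-joined stacks. The helper names, the 5 ms interval and the toy workload are all assumptions.

```python
import collections
import sys
import threading
import time


def _sample(target_thread_id, interval, stop, counts):
    """Background loop: snapshot the target thread's stack every `interval` seconds."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_thread_id)
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            # Folded-stack format used by flame-graph tools: root first, ';'-separated.
            counts[";".join(reversed(names))] += 1
        time.sleep(interval)


def profile(func, interval=0.005):
    """Run func() in the calling thread while a daemon thread samples its stack."""
    counts = collections.Counter()
    stop = threading.Event()
    sampler = threading.Thread(
        target=_sample,
        args=(threading.get_ident(), interval, stop, counts),
        daemon=True,
    )
    sampler.start()
    try:
        func()
    finally:
        stop.set()
        sampler.join()
    return counts


def leaf(n):
    return sum(i * i for i in range(n))


def workload(seconds=0.3):
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        leaf(10_000)


if __name__ == "__main__":
    for stack, samples in profile(workload).most_common(5):
        print(stack, samples)
```

Each output line is a stack plus a sample count, which is the input format flame-graph renderers typically expect. An external sampler like py-spy avoids running extra Python code inside the process being measured.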

------
Alex3917
Is there any Python library for profiling or optimizing regex? E.g. for a
given set of fixture data, see which order things inside each group should be
in to get the fastest test suite times while everything still passes.
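No particular library is assumed here. For a single, small alternation group, a brute-force search is easy to sketch: try every ordering and keep only those whose matches on the fixture data are identical to the original. Then time the survivors with `timeit`. The `best_alternation_order` name and the `findall`-based equivalence check are illustrative assumptions. The search is factorial in the number of alternatives, so it only scales to a handful of branches.

```python
import itertools
import re
import timeit


def _results(pattern, fixtures):
    # "Everything still passes" here means identical findall() output on every fixture.
    return [pattern.findall(text) for text in fixtures]


def best_alternation_order(alternatives, fixtures, number=200):
    """Try every ordering of one alternation group and keep the fastest equivalent one.

    `alternatives` are regex fragments. The search is O(n!), so keep groups small.
    """
    def build(order):
        return re.compile("(?:" + "|".join(order) + ")")

    expected = _results(build(alternatives), fixtures)
    best_order, best_time = None, float("inf")
    for order in itertools.permutations(alternatives):
        pattern = build(order)
        # Python's re tries alternatives left to right, so reordering can change
        # which branch wins; drop orderings that change the results on the fixtures.
        if _results(pattern, fixtures) != expected:
            continue
        elapsed = timeit.timeit(lambda: _results(pattern, fixtures), number=number)
        if elapsed < best_time:
            best_order, best_time = order, elapsed
    return best_order, best_time


fixtures = ["cherry pie, banana bread and apple tart"] * 50
print(best_alternation_order(["apple", "banana", "cherry"], fixtures))
```

The equivalence check matters. With alternatives `["a", "ab"]` and the fixture `"ab"`, the ordering `ab|a` matches `"ab"` instead of `"a"`, so it is rejected even if it happens to be faster.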

~~~
valarauca1
Finding the optimal layout of an NFA regex is (I assume) a hard academic
problem, which means a library likely won't be able to just "do it for you".

The CS fundamentals that underlie Python, Perl, PCRE, and Java regexes are the
same, so this longer article may help:
[https://www.javaworld.com/article/2077757/core-java/optimizi...](https://www.javaworld.com/article/2077757/core-java/optimizing-regular-expressions-in-java.html)
While the regex syntax may change, the side effects and performance
implications of the individual operators will not.
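A classic example of an operator-level performance implication is nested quantifiers in backtracking engines. Python's `re` is one such engine. With `(a+)+`, the engine can split a run of `a`s in exponentially many ways before it is allowed to report a failed match. The sketch below times that pattern against the equivalent `a+b`. The input sizes are arbitrary choices, and absolute timings will vary by machine and Python version.

```python
import re
import time

nested = re.compile(r"(a+)+b")  # nested quantifier: exponential backtracking on failure
flat = re.compile(r"a+b")       # matches exactly the same strings


def time_match(pattern, text):
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


for n in (10, 14, 18):
    text = "a" * n + "c"  # no 'b', so every attempt must eventually fail
    _, t_nested = time_match(nested, text)
    _, t_flat = time_match(flat, text)
    print(f"n={n:2d}  nested={t_nested:.6f}s  flat={t_flat:.6f}s")
```

Expect the nested timings to grow roughly geometrically with `n` while the flat ones stay near zero.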

------
datavirtue
So stop using functions...got it.

~~~
nomel
The Python Performance Tips [1] basically says this, along with many other
helpful hints when trying to squeeze performance. They all boil down to: In a
tight loop, don't do anything that triggers a lookup or function call.

[1]
[https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Dat...](https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation)
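Here is a small sketch of the "avoid dots" style of advice on that page. The method and the builtin are bound to local names once, so the loop body does local loads instead of repeated attribute and global/builtin lookups. The function names are made up. The size of the speedup depends heavily on the CPython version; newer interpreters specialize these lookups, so the gap may be small.

```python
import timeit


def with_lookups(n):
    result = []
    for i in range(n):
        result.append(str(i))  # attribute lookup + builtin lookup on every iteration
    return result


def hoisted(n):
    result = []
    append = result.append  # bound method looked up once, outside the loop
    to_str = str            # builtin bound to a local name
    for i in range(n):
        append(to_str(i))
    return result


if __name__ == "__main__":
    assert with_lookups(1000) == hoisted(1000)
    for fn in (with_lookups, hoisted):
        seconds = timeit.timeit(lambda: fn(10_000), number=200)
        print(f"{fn.__name__}: {seconds:.3f}s")
```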

------
yedpodtrzitko
published in November 2017

~~~
medecau
Thought-bleach for yesterday's post?

~~~
mcenedella
What’s a thought-bleach?

------
posharma
Why not just use Java/C++?

~~~
xapata
"Our most common success story starts with a Java or C++ project slated to
take a team of 3-5 developers somewhere between 2-6 months, and ends with a
single motivated developer completing the project in 2-6 weeks (or hours, for
that matter)."

[https://www.paypal-engineering.com/2014/12/10/10-myths-of-en...](https://www.paypal-engineering.com/2014/12/10/10-myths-of-enterprise-python/)

