
CPython internals: A ten-hour codewalk through the Python interpreter (2015) - melqdusy
https://www.youtube.com/playlist?list=PLzV58Zm8FuBL6OAv1Yu6AwXZrnsFbbR0S
======
c4obi
I put together an ebook on the internals of the Python interpreter. Get it for
free at
[https://leanpub.com/insidethepythonvirtualmachine](https://leanpub.com/insidethepythonvirtualmachine)

------
makmanalp
This is awesome - I wish every large software project had something like this
that was a prep-course to be able to start contributing meaningfully!

~~~
saurabhjha
Not a prep course, but Redis, for example, has a very good source code overview
here: [https://github.com/antirez/redis#redis-internals](https://github.com/antirez/redis#redis-internals)

More remarkable is the fact that antirez updated the documentation in response
to a post on Reddit.
[https://www.reddit.com/r/redis/comments/3re0aw/any_pointers_...](https://www.reddit.com/r/redis/comments/3re0aw/any_pointers_towards_learning_internals_of_redis/)
Thank you antirez! :-)

~~~
gens
Also check out libpng's source.

[https://github.com/glennrp/libpng](https://github.com/glennrp/libpng)

~~~
Twirrim
Oh wow. That's beautifully done. Simple comments that explain clearly what the
code is doing, pretty clear choice of variable names so little head-scratching
going on.

------
chubot
I feel like I should understand this but I don't: What names are looked up by
name vs. by number in CPython?

That is, I think local variables and constants are looked up by a small
integer which the CPython compiler produces by stack analysis.

But any globals must be looked up by name: functions, classes, modules, global
variables. And methods on classes, attributes on classes.

I'd be interested to get clarity on that, and any pointers to relevant
code/docs. Is this addressed in the videos? I have looked through the CPython
source a lot, and even patched it, but the lookups are a little hard to
follow. I've played with the "dis" module and code objects.

EDIT: Answering my own question, it seems like I was confused about the index
into co_names, which is a small integer into a list of strings, and then the
lookup of that string. So it's a 2-step process?
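
A minimal sketch of that two-step process, assuming a recent CPython 3. The function `scaled_length`, the global `GLOBAL_SCALE`, and the helper `lookup_global` are made up for illustration. The real lookup happens in C inside the interpreter loop, and newer CPython versions pack extra flag bits into the raw `LOAD_GLOBAL` argument, so this only mirrors the idea:

```python
import builtins

GLOBAL_SCALE = 3  # hypothetical module-level global


def scaled_length(items):
    count = len(items)           # 'items' and 'count' are locals
    return count * GLOBAL_SCALE  # 'len' and 'GLOBAL_SCALE' are not


code = scaled_length.__code__

# Locals live in numbered slots; their names are kept in co_varnames.
local_slots = code.co_varnames

# Non-local names are stored once, as strings, in co_names.
name_table = code.co_names


def lookup_global(func, name_index):
    """Illustrative helper (not a CPython API): mimic a global lookup."""
    name = func.__code__.co_names[name_index]  # step 1: small integer -> string
    module_globals = func.__globals__
    if name in module_globals:                 # step 2: string -> dict lookup
        return module_globals[name]
    return getattr(builtins, name)             # fall back to the builtins


print(local_slots)
print(name_table)
print(lookup_global(scaled_length, name_table.index("len")) is len)
```

So the bytecode carries a small integer, but that integer only selects a string, and the string is then hashed and looked up in a dict at run time.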

~~~
xapata
You can find out by using dis and checking which names show up under LOAD_FAST,
LOAD_CONST or LOAD_GLOBAL. Attribute lookups are always just that as far as I
know. The dot operator has a bunch of paths.

~~~
chubot
Yes thanks, I think that is right... LOAD_FAST and LOAD_CONST are by number,
and used for local variables and constants.

LOAD_NAME, LOAD_ATTR, and LOAD_GLOBAL are all lookups by name, and are used
for everything else: globals, object attributes and methods, modules, etc.

It seems that if Python had a static module system, all the lookups by name
could be compiled down into lookups by number.

[https://docs.python.org/3/library/dis.html#python-bytecode-i...](https://docs.python.org/3/library/dis.html#python-bytecode-instructions)
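
A small sketch of how to check this with `dis`, assuming CPython 3. The function `describe` is hypothetical, and exact opcode names vary between versions (newer releases add variants such as fused or specialized `LOAD_FAST` forms), so the code groups by whatever names the running interpreter reports:

```python
import dis


def describe(point):
    total = point.x + 1   # local load, attribute lookup, constant
    return abs(total)     # builtin lookup by name, local load


# Group the argument of every LOAD_* instruction by opcode name.
by_opcode = {}
for instr in dis.get_instructions(describe):
    if instr.opname.startswith("LOAD_"):
        by_opcode.setdefault(instr.opname, []).append(instr.argval)

for opname, args in sorted(by_opcode.items()):
    print(opname, args)
```

On the versions I know of, `point` and `total` appear under a `LOAD_FAST` variant, while `x` and `abs` appear under `LOAD_ATTR` and `LOAD_GLOBAL` with their names as arguments.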

~~~
xapata
I'm not clear on the status of PEP 509, but it could/should make LOAD_NAME and
LOAD_GLOBAL approach the speed of LOAD_FAST. It adds a private version tag to
every dict, and the tag changes whenever the dict is mutated. As long as the tag
is unchanged, a cached result of an earlier lookup can be reused.

[https://www.python.org/dev/peps/pep-0509/](https://www.python.org/dev/peps/pep-0509/)

Dictionaries got some sweet upgrades for v3.6.
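
A rough pure-Python sketch of the idea behind PEP 509, not its actual implementation (the PEP does this in C with a private field on the dict object). `VersionedDict` and `CachedLookup` are hypothetical names, and only two mutating methods are covered to keep it short:

```python
class VersionedDict(dict):
    """Hypothetical dict that bumps a version counter on mutation.

    Only __setitem__ and __delitem__ are tracked in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.version = 0

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.version += 1

    def __delitem__(self, key):
        super().__delitem__(key)
        self.version += 1


class CachedLookup:
    """Caches one name's value and revalidates with a cheap integer compare."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.cached_version = None
        self.cached_value = None
        self.misses = 0

    def get(self):
        if self.cached_version != self.namespace.version:
            # The dict changed (or this is the first call): do the real lookup.
            self.cached_value = self.namespace[self.name]
            self.cached_version = self.namespace.version
            self.misses += 1
        return self.cached_value


module_globals = VersionedDict(answer=41)
lookup = CachedLookup(module_globals, "answer")

first = lookup.get()            # miss: first real lookup
second = lookup.get()           # hit: version unchanged
module_globals["answer"] = 42   # mutation bumps the version
third = lookup.get()            # miss: cache is revalidated

print(first, second, third, lookup.misses)  # prints: 41 41 42 2
```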

------
giis
Here's the main page: [http://pgbovine.net/cpython-internals.htm](http://pgbovine.net/cpython-internals.htm)

------
saurabhjha
Do you need an understanding of compilers to go through this? What are the
prerequisites?

~~~
CalChris
No. They skip the Python-to-bytecode compiler and go straight to the
interpreter and runtime, more or less. You should know C.

------
hermitdev
Curious: At any point, is it explained why the Global Interpreter Lock is
necessary? If so, I'll spend the time to watch.

~~~
m_mueller
It's not necessary; it was a design choice that made sense back in the '90s.
In a multithreaded environment you can lock at a fine-grained level or at a
coarse-grained level (or you can crash, but let's ignore that as an option).
Python chose coarse-grained, giving up parallel execution of interpreter code
but avoiding a lot of thread synchronization overhead. All attempts so far to
remove the GIL have resulted in a (usually much) slower interpreter, but the
latest attempt shows some promise, and it's conceivable (but not guaranteed)
that in a few years there will be an official GIL-less CPython.
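
To make the coarse versus fine-grained distinction concrete, here is a toy sketch using ordinary `threading.Lock` objects. It is only an analogy for the design choice, not how the GIL is implemented, and the class and function names are invented for the example:

```python
import threading


class CoarseCounters:
    """One lock guards every counter (the GIL-like choice)."""

    def __init__(self, names):
        self.lock = threading.Lock()
        self.values = {name: 0 for name in names}

    def increment(self, name):
        with self.lock:
            self.values[name] += 1


class FineCounters:
    """One lock per counter: more can run in parallel, more locks to manage."""

    def __init__(self, names):
        self.locks = {name: threading.Lock() for name in names}
        self.values = {name: 0 for name in names}

    def increment(self, name):
        with self.locks[name]:
            self.values[name] += 1


def hammer(counters, names, per_thread=10_000):
    """Run two threads per counter name and return the final counts."""

    def worker(name):
        for _ in range(per_thread):
            counters.increment(name)

    threads = [
        threading.Thread(target=worker, args=(name,))
        for name in names
        for _ in range(2)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counters.values


names = ["a", "b"]
coarse_result = hammer(CoarseCounters(names), names)
fine_result = hammer(FineCounters(names), names)
print(coarse_result, fine_result)
```

Both versions give correct counts. The coarse one takes a single uncontended-to-reason-about lock, while the fine one pays for a lock per object, which is the kind of overhead the earlier GIL removal attempts ran into.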

~~~
hermitdev
Do you happen to have any papers about the current efforts to remove the GIL?

I love Python, and use it a lot for ETL type work, but if threading worked
well, I could/would possibly use it for far more purposes.

~~~
brianwawok
Can you give an example where the GIL is really holding you back?

Because with multiprocessing and greenlets, 99.99% of concurrency problems are
trivially solved by current CPython.

~~~
m_mueller
I'm actually the GP, but it has held me back in the past.

I'm writing a transpiler that uses global information from codebases, and so
it transpiles potentially hundreds of files at once and creates rather complex
data structures. Compute bound for quite a while, so I tried speeding it up
with multiprocessing (since multithreading would be useless). But with
multiprocessing it took longer to serialize/deserialize the complex
data structures for each process, so I had to give up. Next time I have time
for this I'd probably try to use Jython as a drop-in replacement and see
whether I can get it to run with GIL-less multithreading.
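
A quick way to gauge that serialization cost is to time a pickle round trip, since `multiprocessing` pickles objects passed between processes. The nested dict below is a made-up stand-in, not the commenter's actual data, and the sizes are arbitrary:

```python
import pickle
import time


def build_symbol_table(n_files=200, symbols_per_file=50):
    """Hypothetical stand-in for a cross-file data structure."""
    return {
        f"file_{i}.src": {
            f"sym_{j}": {
                "line": j,
                "refs": [f"file_{(i + k) % n_files}.src" for k in range(3)],
            }
            for j in range(symbols_per_file)
        }
        for i in range(n_files)
    }


table = build_symbol_table()

# Roughly what one hand-off to a worker process and back would cost.
start = time.perf_counter()
payload = pickle.dumps(table, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
elapsed = time.perf_counter() - start

print(f"{len(payload)} bytes, round trip took {elapsed:.4f}s")
```

If that round trip, multiplied by the number of hand-offs, is comparable to the compute time per task, the extra processes will not pay for themselves.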

~~~
orf
It sounds like you have a couple of hot paths and are not optimizing them. I
can't tell for sure without seeing any code but nothing in your post screams
out "this will be slow" or "I need parallelism/concurrency". Perhaps it's the
data structures you are using?

~~~
m_mueller
I already did extensive profiling and performance improvements, at this point
I'm quite sure that if I could do multithreading on my lab's 24 core Xeon
Haswell machines I'd be getting a nice speedup.

------
callesgg
In the first video he states that every language has a compiler.

An interpreted language does not need to be compiled into bytecode. Some
languages are compiled to bytecode; some are interpreted as is.

------
ciupicri
It seems to be about Python 2. Too bad it's not about 3.

~~~
giis
I watched this series more than once; it has so much detail. I believe
Python 3 is not a complete rewrite of Python 2, so there must be a lot of common
code between them. So it's useful regardless of whether it's a Python 2 series.

~~~
masklinn
> I believe python-3 is not complete rewrite of python-2.

Python 3 is not even remotely close to a Python 2 rewrite. Much changed UI-
wise, but the core is very similar if not identical.

------
ipnon
Between Python 2 and Python 3, what are the differences in CPython?

~~~
pgbovine
that's a great question! i never did a diff of the source, but a good place to
start is to diff ceval.c, which contains the main interpreter loop.

------
anocendi
Dr. PG has a Youtube channel? I never knew.

This looks awesome!

------
canada_dry
Kinda painful to watch... thank god for the playback speed X 1.5

