Hi! This is part 6 of my Python behind the scenes series. Each part of the series covers how some aspect of the language is implemented in the interpreter, CPython. This time we'll focus on the Python object system. You'll learn:
- What Python objects and types are and how they are implemented.
- What slots are and how they determine the behavior of objects.
- How slots are related to special methods.
I'd be glad to get your feedback and answer your questions. Thanks!
I have a C background from graduate school and a Python background from my professional work, but I had never explored how Python is implemented in C until I came across one of the earlier parts of this series.
I don't have any questions. I just wanted to let you know I think you do a fantastic job writing these up. The thinking process is very clear, and the resource links are well placed for follow-up research. High quality stuff, for sure.
Thanks for making these. I look forward to future installments.
Off topic, but do the programmers you know who can't read English struggle with programming languages and libraries where the keywords, class, variable, and function names are primarily in English? I'm curious how beginning or ESL programmers manage. Are there ecosystems where they can get by using only their first languages?
There is a vast difference between people who don't fluidly read/speak English and people who can't be arsed to learn what the ~20 English keywords a programming language might use actually mean.
Here in Europe it's quite common to be faced with programs where everything written in-house uses the local language, including comments; only the third-party libraries use English.
I already had a couple of consulting gigs where what got me the deal was my knowledge of the specific (human) languages being used.
I've seen several coding conventions that prohibit the use of bare past-participle forms in boolean query methods (e.g. `checked`) in favor of an explicit auxiliary verb (e.g. `didCheck`). Also, sophisticated concepts frequently get transliterated. 20 might be a stretch, but fewer than 1000 is not impossible.
How does knowing that a "string" is a cord, or that a "thread" is a strand for sewing, tell you anything about what it actually is?
Both of your examples bring very little meaning from their English definition. You just know a second definition in the context of programming, which is why you can deduce what it does.
The same applies to the people that don't fluidly speak the language.
I'm not a native English speaker, but most of what I read on a day-to-day basis is in English. And yet, whenever I see or hear the term "string", I associate it with "sequence of characters", not with "cord". Same goes for "thread", but even more strongly [1]
So I agree, you don't have to speak fluent English to be able to read and write programs. Knowledge of English grammar won't help (except for reading comments – if they exist...), and the vocabulary is small, specialized, and has little to do with colloquial language.
[1] Or should it be "...even stronger"? I don't think so, because it refers to the verb "goes", so it should be an adverb, right?
This is pretty common in Brazil: there are a lot of people who are able to master the syntax but cannot fully understand an article in English. That's why I always recommend that they invest at least a little in technical English, as the resources in Portuguese are usually scarce and outdated.
Are there people who would be interested in how Python works at the C level but can't read English easily?
I've encountered programmers who are not fluent in English (e.g., some of the audience of non-English StackOverflow), but they don't seem to be the curious type (for whatever reason).
> Are there people who would be interested in how Python works at the C level but can't read English easily?
They said that there might be because they know people who are interested in the subject but don’t speak English. So I don’t know why you are asking the question.
I’d say anyone curious would have learned English already.
(Before I get bashed for this, I speak 3 languages well and 4 more poorly, English is my third. It really isn’t a difficult language to learn when all American media and most of computing uses it.)
I work with Japanese developers who I would struggle to hold a basic conversation with in English. They're about as curious as anyone else you'd find here and do read similar articles - but in their own language. There's a whole world of articles and open source out there not available in English.
The link between language learning and curiosity sounds tenuous at best. I would also add that having access to media in a particular language doesn't mean a lot unless you're prepared to put some effort in. I know expats who've lived in Japan for 20 years and can only speak a few words of Japanese despite being surrounded by it everyday.
But enough about you. There are a lot of places where people aren't exposed to English-language media even if they consume American media (dubbed, in particular).
If it's easy, good for you! But from what I've noticed, English, with its complicated & chaotic grammar, is not that easy for a lot of people around the world who have different 1st & 2nd languages.
Perfect English is difficult, but that's the case for nearly all native speakers too. English is nothing if not fluid and fault-tolerant, though; useful communication still happens even with quite broken English.
It is OK that my comment above is downvoted, but it would be nice to hear any counterarguments (it is hard for me to imagine a passionate programmer who can't learn English).
Language learning is difficult and your success with one language has a lot to do with what you speak natively. I've heard that Italian speakers for instance can pick up English with relative ease compared with someone who grew up speaking Japanese.
People also have different priorities in life. Past high school and university you've got a career and possibly a family to attend to. Finding enough time to learn another language would be a big ask for many.
This is great! There aren't enough references on Python internals, I find.
Not sure you're at all interested in going in this direction, but I (and likely many people) would be really interested in a similar explanation of NumPy internals.
There are some great videos on YouTube: a little outdated, but gold if you can find them. It's this guy who literally sits at his desk for an hour and a half straight and walks a class line by line through various parts of CPython 2.7. There are multiple parts.
Please open a documentation issue at https://github.com/numpy/numpy/issues describing exactly which internals you would like to see explained: indexing, ufuncs, linkage, random, dtypes, ... and what format you would like to see the documentation in: text, video, lessons with questions/answers, ...
As for Python internals, https://snarky.ca/ has some great short high-level "how does this construct work" text explanations.
More people are looking for passive articles to consume than filing a bug report to request information on specific systems. This reply is unnecessary.
Python's descriptor protocol is very elegant. Given:
class C:
    def __init__(self):
        self.a = 1

    def b(self):
        print(f'b({self})')

    @property
    def c(self):
        print(f'c({self})')

o = C()
The descriptor protocol is a generic way to enable `o.a` to access an attribute of `o`, `o.b` to access an attribute of `C` and bind `self` to `o`, and `o.c` to access an attribute of `C`, bind `self` to `o`, and call it.
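That genericity is easiest to see with a hand-rolled descriptor. The sketch below is an illustrative toy, not CPython's actual implementation; the names `method_like` and `Bound` are made up for this example:

```python
class Bound:
    """Toy stand-in for a bound-method object: stores func + instance."""
    def __init__(self, func, instance):
        self.func = func
        self.instance = instance

    def __call__(self, *args, **kwargs):
        # Calling the bound object prepends the instance as `self`.
        return self.func(self.instance, *args, **kwargs)


class method_like:
    """Toy non-data descriptor: defines __get__ but not __set__."""
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            return self.func                # accessed on the class itself
        return Bound(self.func, instance)   # bind self, like o.b


class C:
    @method_like
    def double(self, x):
        return 2 * x


o = C()
print(o.double(21))  # attribute access binds o, then we call the result
```

Real methods work the same way: plain functions are descriptors whose `__get__` returns a bound method, and `property` is a data descriptor whose `__get__` calls the wrapped function immediately.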
Do any other dynamic languages use a similar protocol for attribute access?
Ruby, for example, doesn't need to - it can use a simpler approach because `o.x` never refers to an attribute of `o`.
JavaScript's solution is less elegant - the way it sets the value of `this` is infamous, and AFAIK properties are treated as a special case rather than built on top of a generic protocol.
I don't think Python's approach is particularly more elegant than JavaScript, as they both distinguish property-as-value and property-as-function. It's just a matter of where to put property-as-function (and unlike Python, JavaScript can have non-configurable properties which can be optimized). As property-as-function can subsume property-as-value, no built-in property-as-value would be truly elegant (Ruby is close, but not exact).
I agree. Python's descriptor pattern was an endless source of confusion for me. But, it's much easier to do mixins in Python than in JS, even if the mro algorithm is a bit complicated...
Very interesting and quite ... instructive. The lack of optimizations is interesting. One would naively imagine that treating, e.g., ints, strings, dicts, and lists specially would gain some performance.
Yeah. The Python C implementation's simplicity is a double-edged sword. It's delightfully easy to extend despite being straight C, but it has also made some frustrating performance trade-offs. Last time I checked, something as simple as
myObj.MyMethod()
would be implemented as (this is from memory and I'm rusty):
myObj is fetched from the local scope array by array-index.
Dictionary lookup on myObj, failing over to dictionary lookup on myObj.__class__, to find the method MyMethod(). Or was it one merged dict? Whichever. All __slots__ does is mean that myObj doesn't need its own dictionary; it still has to go to __class__ for a dictionary hit, which can even be overridden if they've changed __getattr__ or __getattribute__.
Originally cpython even used deliberately-high-collision hashtables for lookups for reasons I no longer remember (faster sort?).
MyMethod() instantiates a new Method object that stores the underlying class-function and the "self" parameter, but the object is pooled, so it's quick. Still, it's creating a reference that will have to be collected in a moment.
Then we actually invoke the damned function.
Then we decrement the refcount on the Method object, which drops it to zero and it is destroyed (returning it to object-pool).
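That per-call creation of a method object is observable from Python: every attribute access builds a fresh bound method wrapping the same underlying function (a small sketch of the behavior described above):

```python
class C:
    def m(self):
        return 42

o = C()

# Each lookup of o.m creates a new bound-method object...
print(o.m is o.m)            # False: two distinct method objects
# ...but both wrap the same underlying function, with self bound:
print(o.m.__func__ is C.m)   # True
print(o.m.__self__ is o)     # True
print(o.m())                 # 42
```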
Obviously, this is from my memory and it's been over a decade, so I might be getting details wrong, but I was surprised that there were so many unoptimized layers involved in resolving a method: that even with __slots__, dictionary hits were unavoidable, and that method invocation involved instantiating an object (from a pool, but still).
> myObj is fetched from the local scope array by array-index
That's true if myObj is a known local variable (assuming you're inside a function--in global scope there is no "local scope array" to begin with). But if it's not, myObj has to be looked up in the dictionary of global variables, which is slower than the fast local array indexing. (And there is also the nonlocal keyword, which further complicates the lookup since enclosing non-global scopes also have to be included.)
> Or was it one merged dict?
No, it's separate. And it's further complicated by descriptors; first the lookup needs to check if MyMethod is a descriptor (such as a property) on the class, and if it is and the descriptor is a data descriptor (i.e., has a setter method), it overrides the lookup in the instance dictionary.
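That data-descriptor precedence can be demonstrated with a `property` (which always defines `__set__`, so it counts as a data descriptor) versus a plain instance-dict entry -- a small sketch:

```python
class C:
    @property
    def x(self):          # data descriptor on the class
        return "from the property"

o = C()
# Plant a conflicting entry directly in the instance dictionary,
# bypassing the (missing) setter:
o.__dict__["x"] = "from the instance dict"

# The data descriptor on the class wins over the instance dictionary:
print(o.x)  # "from the property"


class D:
    pass

d = D()
d.__dict__["x"] = "instance"
# With no descriptor on the class, the instance dictionary is used:
print(d.x)  # "instance"
```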
> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit
Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.
> surprised that there were so many unoptimized layers involved in resolving a method
They can't be optimized in the general case without sacrificing the dynamic attributes of the language, which would defeat the purpose.
Optimizers like PyPy focus on optimizing these layers in the special cases where particular dynamic attributes aren't being used in particular parts of the code. Cython, which does as much static analysis at compile time as possible to enable eliminating the extra lookups when they're not going to end up changing anything, is another example of the same idea.
> Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.
It can increase speed though in practice. Less memory means less management and better cache usage. I've nearly doubled the speed of stream reconstruction from packet capture with a lot of slots usage. (Huge amount of tiny objects)
For cases where you don't need to access any non-slot attributes (packet capture would generally be one of those cases), yes, __slots__ can speed things up as well.
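The memory saving comes from the missing per-instance `__dict__`, which is easy to check directly (class names here are just for illustration):

```python
class Plain:
    def __init__(self):
        self.x = 1
        self.y = 2

class Slotted:
    __slots__ = ("x", "y")
    def __init__(self):
        self.x = 1
        self.y = 2

p, s = Plain(), Slotted()
print(hasattr(p, "__dict__"))  # True: every instance carries a dict
print(hasattr(s, "__dict__"))  # False: no per-instance dict at all

# Attributes not listed in __slots__ are rejected outright:
try:
    s.z = 3
except AttributeError as e:
    print("rejected:", e)
```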
I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array? Or maybe that's something I did by accident when messing with the embedded interpreter.
> I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array?
No, it doesn't. There are two separate opcodes involved: LOAD_FAST loads a local variable from the local array based on its index; LOAD_GLOBAL loads a global variable based on the global dictionary lookup.
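Both opcodes are visible with the `dis` module. In this sketch, `g` is a module-level global and `local` is a function local:

```python
import dis

g = 10  # module-level: compiled to LOAD_GLOBAL inside f

def f():
    local = 32      # function-level: stored/loaded by array index
    return g + local

opnames = [i.opname for i in dis.get_instructions(f)]
print(opnames)  # includes 'LOAD_GLOBAL' for g and 'LOAD_FAST' for local
print(f())      # 42
```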
> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit
In view of a follow-up comment about slots, this should be clarified: a class with __slots__ does not need to do a dictionary lookup for slot attributes; those are looked up by index into an array, just as local variables in a function are. The dictionary lookup is only done for attributes that aren't listed in __slots__.
They aren't looked up by index into an array, they are looked up by name, even if stored in an array. Roughly, an object with __slots__ is a mutable counterpart of namedtuple. Both need to map a name into a field/slot index. Of course, there're ways to optimize that lookup (which primitive Python implementations, like CPython, may not do, but other implementations definitely do).
> They aren't looked up by index into an array, they are looked up by name...[which is mapped] into a field/slot index
Yes, this is a fair point. Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.
> there're ways to optimize that lookup
The way CPython does this, if you can call it an "optimization", is to implement the lookup as a data descriptor, which directly accesses the slot array location by index. (The namedtuple implementation correspondingly implements accessing the attribute as a read-only, non-data descriptor that directly accesses the appropriate tuple location by index.) Quite possibly the fact that the descriptor lookup comes before anything else in the attribute access code is considered "optimization" enough for this case.
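That per-slot data descriptor is visible from Python: each name in `__slots__` becomes a `member_descriptor` in the class dictionary, and its `__get__`/`__set__` reach straight into the slot storage:

```python
class C:
    __slots__ = ("x",)

slot_desc = C.__dict__["x"]
print(type(slot_desc).__name__)   # 'member_descriptor'
# It's a data descriptor: it implements both __get__ and __set__.
print(hasattr(slot_desc, "__get__"), hasattr(slot_desc, "__set__"))

o = C()
slot_desc.__set__(o, 99)          # equivalent to: o.x = 99
print(slot_desc.__get__(o))       # 99, equivalent to: o.x
```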
> Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.
I'm sorry but mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup. It's just this dictionary is located in the class, not in each instance.
> The way CPython does this, if you can call it an "optimization"
The usual way to optimize lookups-by-name in dynamic languages is using (inline) caches. AFAIK, CPython now does that too.
> mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup
Yes, you're right, I wasn't clear enough. What I meant to say was that accessing the value of the slot attribute (to either get or set it) on the instance does not require a dictionary lookup, just an array access. But of course finding out that the string (attribute name) is the name of a slot and getting the slot index does require a dictionary lookup (on the class, as you say).
Well, it's because I was trying to implement something that leveraged them heavily, so I got pretty intimately involved with it. Obviously if I'd just been reading up I wouldn't remember any of this.
It isn't, but IIRC refcounting is cooked into the python interpreter in so many places that everything that can be referenced must also be refcounted. If you get a reference to something, you increment its refcount. If you lose the reference, you decrement it. No ifs, no branches.
Thank you so much for this article, I remember seeing some talk on this topic many years ago but could never find a comprehensive summary so far (or even a video of the talk).
I'd be really curious about this in other languages as well, such as Ruby and JavaScript.
I'm actually tempted to start a series of blog posts in this vein for Ruby. The book Ruby Under a Microscope [0] covers these topics (and more), at least from a cursory glance of the blog series, but it would be nice to refresh that material in my own head for Ruby 3.
When do the objects get destroyed or garbage collected?
This part was always magical. It seems to free up the local memory usage of the program, but it doesn’t seem to release the memory back to the operating system, until some type of triggered event.
> This part was always magical. It seems to free up the local memory usage of the program, but it doesn’t seem to release the memory back to the operating system, until some type of triggered event.
That's up to the C runtime's memory allocator. Modern memory allocators don't typically request a new chunk of memory for every malloc() call -- instead, they allocate a single large region of memory at a time and carve that up as needed. This is massively more efficient (system calls are expensive), but also means that those regions can't be released to the OS until all allocations in them are gone.
There's an interesting talk [0] from Bobby Powers on how to make a malloc() and free() that perform compaction, to be able to release memory back to the operating system more frequently (and improve cache hit rates, etc.). But this isn't standard at all yet.
I plan to cover CPython's memory management in the future posts. In a nutshell, an object gets destroyed when its reference count hits 0. In this case, CPython calls `tp_dealloc` [1] of the object's type. The `tp_dealloc` slot releases all the resources the object owns and frees the memory. The implementation of `tp_dealloc` differs for different types. Eventually, the `free` function of the memory allocator is called to free the memory. A memory allocator is a set of functions to manage memory. The default memory allocator for objects is pymalloc. It allocates small objects (<= 512 bytes) using the arena allocator [2] and falls back to the raw memory allocator otherwise. The latter calls the `free()` library function to free the memory.
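In CPython specifically, that destruction is deterministic and immediate, which can be observed with `weakref.finalize` (the `Resource` class here is just for illustration):

```python
import weakref

class Resource:
    pass

events = []
r = Resource()
# Register a callback to run when the object is destroyed;
# finalize holds only a weak reference to r.
weakref.finalize(r, events.append, "deallocated")

print(events)   # []: the object is still alive
del r           # last refcount drops to 0 -> tp_dealloc runs right away
print(events)   # ['deallocated']
```

Note this immediacy is a CPython implementation detail; implementations without reference counting (e.g. PyPy) only run finalizers when their garbage collector gets around to the object.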
The Python/C Reference Manual has a great section on memory management [3].
I quickly skimmed over OP's previous posts and I don't think they mention it. According to [0], CPython's GC is run every X instructions (not sure how up to date the source is).
From this, I guess CPython opted for simplicity instead of implementing something like a memory-usage monitor.
I haven't read the article, but as I understand it, Python will hold on to acquired memory and pre-allocate "blocks" of memory ready for items of various sizes. When you use more memory and it's later freed, IIRC it holds on to most of these blocks.
This is exactly why Python is so slow. The everything-is-an-object approach has no real benefit, as the code is still full of special cases. Faster VMs, of course, special-case int and float, ideally at compile time.
Special-casing (only) ints and floats in a VM is too trivial, and bloat for quite minor benefit. To start with, faster VMs are not VMs but JITs, and they are able to specialize for arbitrary types, not just two random ones. I thought there were some "lessons learned" from ParrotVM... ;-).
This has nothing to do with Parrot. Every self-respecting VM treats value types differently from reference types, and especially objects. Ruby, Perl, JavaScript, every Lisp, Lua... That's why they are all so much faster than Python. Python made the weird/insane decision to overload + for strings, so treating numbers as classes follows from there. It just cannot be fast then. But even most Common Lisps treat every value as a class, with an atom being a struct of class and value. So it can be made fast if you do it properly.
Lessons learned: Don't ever look at the python vm, if you want to look for a well designed VM. Even Ruby is miles better there.