Very interesting and quite ... instructive. The lack of optimizations is interesting. One would naively imagine that treating, e.g., ints, strings, dicts, and lists special would gain some performance.
Yeah. The Python C implementation's simplicity is a double-edged sword. It's delightfully easy to extend despite being straight C, but it has also made some frustrating performance trade-offs. Last time I checked, something as simple as
myObj.MyMethod()
would be implemented as (this is from memory and I'm rusty):
myObj is fetched from the local scope array by array-index.
Dictionary lookup on myObj, falling back to a dictionary lookup on myObj.__class__ to find the method MyMethod(). Or was it one merged dict? Whichever. All __slots__ does is mean that myObj doesn't need its own dictionary; it still has to go to __class__ for a dictionary hit, which can even be overridden if they've changed __getattr__ or __getattribute__.
Originally cpython even used deliberately-high-collision hashtables for lookups for reasons I no longer remember (faster sort?).
MyMethod() instantiates a new Method object that stores the underlying class-function and the "self" parameter, but the object is pooled, so it's quick. Still, it's creating a reference that will have to be collected in a moment.
Then we actually invoke the damned function.
Then we decrement the refcount on the Method object, which drops it to zero and it is destroyed (returning it to object-pool).
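A rough sketch of the bound-method behavior described above, observable from pure Python (class and method names here are just illustrative). In CPython, each attribute access that resolves to a method builds a fresh bound-method object wrapping the underlying function and "self":

```python
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()

# Each access creates a new bound-method object (CPython recycles
# them via a free list, but they are still distinct objects).
m1 = g.hello
m2 = g.hello
assert m1 is not m2        # two distinct method objects...
assert m1 == m2            # ...that compare equal
assert m1.__func__ is Greeter.__dict__["hello"]  # shared underlying function
assert m1.__self__ is g    # the stored "self" parameter
assert m1() == "hi"        # invoking it calls the function with self bound
```

(Newer CPython bytecode avoids materializing the method object for the common `obj.method()` call pattern, but an explicit attribute access like `m = g.hello` still creates one.)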
Obviously, this is from memory and it's been over a decade, so I might be getting details wrong, but I was surprised that there were so many unoptimized layers involved in resolving a method: even with __slots__, dictionary hits were unavoidable, and method invocation involved instantiating an object (from a pool, but still).
> myObj is fetched from the local scope array by array-index
That's true if myObj is a known local variable (assuming you're inside a function or class block; in global scope there is no "local scope array" to begin with). But if it's not, myObj has to be looked up in the dictionary of global variables, which is slower than the fast local array indexing. (And there is also the nonlocal keyword, which further complicates the lookup, since enclosing non-global scopes also have to be included.)
> Or was it one merged dict?
No, it's separate. And it's further complicated by descriptors; first the lookup needs to check if MyMethod is a descriptor (such as a property) on the class, and if it is and the descriptor is a data descriptor (i.e., has a setter method), it overrides the lookup in the instance dictionary.
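The data-descriptor precedence can be demonstrated directly (the class and attribute names below are just an example). A descriptor with a `__set__` method on the class overrides the instance dictionary; one without it is shadowed by the instance dictionary:

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return "from data descriptor"
    def __set__(self, obj, value):
        pass  # defining __set__ is what makes this a *data* descriptor

class NonData:
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class C:
    d = DataDesc()
    n = NonData()

c = C()
# Plant entries directly in the instance dictionary for both names.
c.__dict__["d"] = "from instance dict"
c.__dict__["n"] = "from instance dict"

assert c.d == "from data descriptor"   # data descriptor beats instance dict
assert c.n == "from instance dict"     # instance dict beats non-data descriptor
```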
> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit
Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.
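The memory-side effect is easy to see: a slotted instance has no per-instance `__dict__` at all, and attribute names outside `__slots__` are rejected (class names here are illustrative):

```python
class Plain:
    def __init__(self):
        self.x, self.y = 1, 2

class Slotted:
    __slots__ = ("x", "y")
    def __init__(self):
        self.x, self.y = 1, 2

p, s = Plain(), Slotted()
assert hasattr(p, "__dict__")       # ordinary instance carries a dict
assert not hasattr(s, "__dict__")   # slotted instance does not

# No dict means no arbitrary attributes:
try:
    s.z = 3
except AttributeError:
    pass
else:
    raise AssertionError("slotted instance accepted an undeclared attribute")
```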
> surprised that there were so many unoptimized layers involved in resolving a method
They can't be optimized in the general case without sacrificing the dynamic attributes of the language, which would defeat the purpose.
Optimizers like PyPy focus on optimizing these layers in the special cases where particular dynamic attributes aren't being used in particular parts of the code. Cython, which does as much static analysis at compile time as possible to enable eliminating the extra lookups when they're not going to end up changing anything, is another example of the same idea.
> Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.
It can increase speed in practice, though. Less memory means less management and better cache usage. I've nearly doubled the speed of stream reconstruction from packet capture with heavy slots usage (a huge number of tiny objects).
For cases where you don't need to access any non-slot attributes (packet capture would generally be one of those cases), yes, __slots__ can speed things up as well.
I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array? Or maybe that's something I did by accident when messing with the embedded interpreter.
> I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array?
No, it doesn't. There are two separate opcodes involved: LOAD_FAST loads a local variable from the local array based on its index; LOAD_GLOBAL loads a global variable based on the global dictionary lookup.
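This also answers the question about global scope: module-level code has no fast-locals array at all, so names there resolve through the dictionary-based LOAD_NAME/STORE_NAME opcodes rather than LOAD_FAST. A quick check:

```python
import dis

# Compile a statement as module-level ("exec") code.
code = compile("y = x + 1", "<module>", "exec")
ops = {ins.opname for ins in dis.get_instructions(code)}

assert "LOAD_NAME" in ops                               # dict-based name lookup
assert not any(op.startswith("LOAD_FAST") for op in ops)  # no fast-locals array
```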
> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit
In view of a follow-up comment about slots, this should be clarified: a class with __slots__ does not need to do a dictionary lookup for slot attributes; those are looked up by index into an array, just as local variables in a function are. The dictionary lookup is only done for attributes that aren't listed in __slots__.
They aren't looked up by index into an array, they are looked up by name, even if stored in an array. Roughly, an object with __slots__ is a mutable counterpart of namedtuple. Both need to map a name into a field/slot index. Of course, there're ways to optimize that lookup (which primitive Python implementations, like CPython, may not do, but other implementations definitely do).
> They aren't looked up by index into an array, they are looked up by name...[which is mapped] into a field/slot index
Yes, this is a fair point. Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.
> there're ways to optimize that lookup
The way CPython does this, if you can call it an "optimization", is to implement the lookup as a data descriptor, which directly accesses the slot array location by index. (The namedtuple implementation correspondingly implements accessing the attribute as a read-only, non-data descriptor that directly accesses the appropriate tuple location by index.) Quite possibly the fact that the descriptor lookup comes before anything else in the attribute access code is considered "optimization" enough for this case.
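The slot descriptor is inspectable from Python (class name illustrative; the `member_descriptor` type name is CPython-specific):

```python
class S:
    __slots__ = ("x",)

desc = S.__dict__["x"]            # the slot lives as a descriptor on the class
assert hasattr(desc, "__get__") and hasattr(desc, "__set__")  # data descriptor
assert type(desc).__name__ == "member_descriptor"  # CPython's name for it

s = S()
s.x = 99
assert desc.__get__(s, S) == 99   # reading the slot through the descriptor
```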
> Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.
I'm sorry but mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup. It's just this dictionary is located in the class, not in each instance.
> The way CPython does this, if you can call it an "optimization"
The usual way to optimize lookups-by-name in dynamic languages is using (inline) caches. AFAIK, CPython now does that too.
> mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup
Yes, you're right, I wasn't clear enough. What I meant to say was that accessing the value of the slot attribute (to either get or set it) on the instance does not require a dictionary lookup, just an array access. But of course finding out that the string (attribute name) is the name of a slot and getting the slot index does require a dictionary lookup (on the class, as you say).
Well, it's because I was trying to implement something that leveraged them heavily, so I got pretty intimately involved with it. Obviously if I'd just been reading up I wouldn't remember any of this.
It isn't, but IIRC refcounting is baked into the Python interpreter in so many places that everything that can be referenced must also be refcounted. If you get a reference to something, you increment its refcount. If you lose the reference, you decrement it. No ifs, no branches.
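The bookkeeping is observable via the CPython-specific sys.getrefcount (the reported number includes a temporary reference held by the call itself):

```python
import sys

x = object()
before = sys.getrefcount(x)

y = x                                   # a new name is a new reference
assert sys.getrefcount(x) == before + 1

del y                                   # losing the reference decrements it
assert sys.getrefcount(x) == before
```

(This is an implementation detail of CPython; implementations like PyPy use tracing GC and don't expose meaningful refcounts.)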