
Very interesting and quite ... instructive. The lack of optimizations is interesting. One would naively imagine that treating, e.g., ints, strings, dicts, and lists specially would gain some performance.



Yeah. The Python C implementation's simplicity is a double-edged sword. It's delightfully easy to extend despite being straight C, but it has also made some frustrating performance trade-offs. Last time I checked, something as simple as

myObj.MyMethod()

would be implemented as (this is from memory and I'm rusty):

myObj is fetched from the local scope array by array-index.

Dictionary lookup on myObj failing over to dictionary lookup on myObj.__class__ to find the method MyMethod(). Or was it one merged dict? Whichever. All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit, which can even be overridden if they've changed __getattr__ or __getattribute__.
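You can watch that instance-dict-then-class-dict fallback from plain Python (class and attribute names here are just for illustration):

```python
class Greeter:
    def hello(self):
        return "hi"

g = Greeter()

# The method lives in the class dict, not the instance dict:
print("hello" in g.__dict__)        # False
print("hello" in type(g).__dict__)  # True

# An attribute set on the instance lands in the instance dict,
# and lookup checks there before falling back to the class:
g.hello_count = 0
print("hello_count" in g.__dict__)  # True
```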

Originally cpython even used deliberately-high-collision hashtables for lookups for reasons I no longer remember (faster sort?).

MyMethod() instantiates a new Method object that stores the underlying class-function and the "self" parameter, but the object is pooled, so it's quick. Still, it's creating a reference that will have to be collected in a moment.

Then we actually invoke the damned function.

Then we decrement the refcount on the Method object, which drops it to zero and it is destroyed (returning it to object-pool).

Obviously, this is from my memory and it's been over a decade, so I might be getting details wrong, but I was surprised that there were so many unoptimized layers involved in resolving a method: that even with __slots__, dictionary hits were unavoidable, and that method invocation involved instantiating an object (from a pool, but still).
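The bound-method-object step is easy to see from Python itself (names here are just illustrative):

```python
class C:
    def my_method(self):
        return 42

obj = C()

# Each attribute access builds a fresh bound-method object
# wrapping the underlying function and "self":
m1 = obj.my_method
m2 = obj.my_method
print(m1 == m2)   # True: equal (same self, same function)...
print(m1 is m2)   # False: ...but distinct objects
print(m1.__self__ is obj)                       # True
print(m1.__func__ is C.__dict__["my_method"])   # True
```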


You left out a couple of extra steps. :-)

> myObj is fetched from the local scope array by array-index

That's true if myObj is a known local variable (assuming you're inside a function or class block--in global scope there is no "local scope array" to begin with). But if it's not, myObj has to be looked up in the dictionary of global variables, which is slower than the fast local array indexing. (And there is also the nonlocal keyword, which further complicates the lookup since enclosing non-global scopes also have to be included.)

> Or was it one merged dict?

No, it's separate. And it's further complicated by descriptors; first the lookup needs to check if MyMethod is a descriptor (such as a property) on the class, and if it is and the descriptor is a data descriptor (i.e., has a setter method), it overrides the lookup in the instance dictionary.
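A minimal demonstration of that precedence rule (hypothetical names; the descriptor here is deliberately trivial):

```python
class Desc:
    """A data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "from descriptor"
    def __set__(self, obj, value):
        pass  # ignore writes

class C:
    x = Desc()

c = C()
# Sneak a value into the instance dict directly, bypassing __set__:
c.__dict__["x"] = "from instance dict"
# The data descriptor on the class still wins over the instance dict:
print(c.x)  # "from descriptor"
```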

> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit

Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.
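The memory effect is easy to demonstrate (illustrative class names; exact sizes vary by CPython version and platform):

```python
import sys

class WithDict:
    def __init__(self):
        self.a, self.b = 1, 2

class WithSlots:
    __slots__ = ("a", "b")
    def __init__(self):
        self.a, self.b = 1, 2

d, s = WithDict(), WithSlots()
print(hasattr(d, "__dict__"))  # True: carries a per-instance dict
print(hasattr(s, "__dict__"))  # False: slots only, no dict
# The per-instance dict alone typically costs more than the
# entire slotted instance:
print(sys.getsizeof(d.__dict__), sys.getsizeof(s))
```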

> surprised that there were so many unoptimized layers involved in resolving a method

They can't be optimized in the general case without sacrificing the dynamic attributes of the language, which would defeat the purpose.

Optimizers like PyPy focus on optimizing these layers in the special cases where particular dynamic attributes aren't being used in particular parts of the code. Cython, which does as much static analysis at compile time as possible to enable eliminating the extra lookups when they're not going to end up changing anything, is another example of the same idea.


> Yes, __slots__ is an optimization to reduce memory consumption, not to increase speed.

It can increase speed in practice, though. Less memory means less management and better cache usage. I've nearly doubled the speed of stream reconstruction from packet capture with heavy __slots__ usage. (A huge number of tiny objects.)


For cases where you don't need to access any non-slot attributes (packet capture would generally be one of those cases), yes, __slots__ can speed things up as well.


I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array? Or maybe that's something I did by accident when messing with the embedded interpreter.


> I thought that if you were at global scope it loaded the whole global scope dictionary as the current-scope-array?

No, it doesn't. There are two separate opcodes involved: LOAD_FAST loads a local variable from the local array based on its index; LOAD_GLOBAL loads a global variable based on the global dictionary lookup.
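You can see both opcodes side by side with the dis module (variable names are just for illustration):

```python
import dis

GLOBAL_VALUE = 1

def f():
    local_value = 2
    return local_value + GLOBAL_VALUE

# local_value comes via LOAD_FAST (indexed array access);
# GLOBAL_VALUE comes via LOAD_GLOBAL (dictionary lookup):
opnames = [i.opname for i in dis.get_instructions(f)]
print(opnames)
```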


The Method objects are no longer created in most cases - https://docs.python.org/3/library/dis.html#opcode-LOAD_METHO...
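A quick way to check this on whatever CPython you're running (note the opcode names have moved around between versions, so this is hedged accordingly):

```python
import dis

class C:
    def m(self):
        return 1

def call(obj):
    return obj.m()

opnames = {i.opname for i in dis.get_instructions(call)}
# CPython 3.7-3.11 emit LOAD_METHOD/CALL_METHOD here; 3.12+
# fold the same trick into LOAD_ATTR/CALL. Either way, a
# plain obj.m() call skips the temporary bound-method object.
print(sorted(opnames))
```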


> All __slots__ does is mean that myObj doesn't need its own dictionary, it still has to go to __class__ for a dictionary hit

In view of a follow-up comment about slots, this should be clarified: a class with __slots__ does not need to do a dictionary lookup for slot attributes; those are looked up by index into an array, just as local variables in a function are. The dictionary lookup is only done for attributes that aren't listed in __slots__.


They aren't looked up by index into an array, they are looked up by name, even if stored in an array. Roughly, an object with __slots__ is a mutable counterpart of namedtuple. Both need to map a name into a field/slot index. Of course, there're ways to optimize that lookup (which primitive Python implementations, like CPython, may not do, but other implementations definitely do).


> They aren't looked up by index into an array, they are looked up by name...[which is mapped] into a field/slot index

Yes, this is a fair point. Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.

> there're ways to optimize that lookup

The way CPython does this, if you can call it an "optimization", is to implement the lookup as a data descriptor, which directly accesses the slot array location by index. (The namedtuple implementation correspondingly implements accessing the attribute as a read-only, non-data descriptor that directly accesses the appropriate tuple location by index.) Quite possibly the fact that the descriptor lookup comes before anything else in the attribute access code is considered "optimization" enough for this case.
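You can poke at these slot descriptors directly; this is a CPython implementation detail (the type name 'member_descriptor' is CPython-specific):

```python
class Point:
    __slots__ = ("x", "y")

# CPython implements each slot as a data descriptor on the class:
slot = Point.__dict__["x"]
print(type(slot).__name__)  # 'member_descriptor'
print(hasattr(slot, "__get__"), hasattr(slot, "__set__"))  # True True

p = Point()
p.x = 3
# The descriptor reads the slot storage directly by index:
print(slot.__get__(p))  # 3
```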


> Still, accessing the slot does not require a dictionary lookup, as it would for an ordinary instance attribute, which was the main point I was trying to make.

I'm sorry but mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup. It's just this dictionary is located in the class, not in each instance.

> The way CPython does this, if you can call it an "optimization"

The usual way to optimize lookups-by-name in dynamic languages is using (inline) caches. AFAIK, CPython now does that too.


> mapping a string (slot name) to an index [in an overdynamic language like Python] does require a dictionary lookup

Yes, you're right, I wasn't clear enough. What I meant to say was that accessing the value of the slot attribute (to either get or set it) on the instance does not require a dictionary lookup, just an array access. But of course finding out that the string (attribute name) is the name of a slot and getting the slot index does require a dictionary lookup (on the class, as you say).


Ooh, that's good. As I said, I haven't looked in over a decade but I was surprised to see that before.


Your comment just reminds me of how lucky some people are to be born with ...IMO... the super-power of good memory.

> Obviously, this is from my memory and it's been over a decade

I would not remember those details if I read them last week let alone a decade ago.


Well, it's because I was trying to implement something that leveraged them heavily, so I got pretty intimately involved with it. Obviously if I'd just been reading up I wouldn't remember any of this.


Not if you read them.


There are fast-paths for these cases, like this one: https://github.com/python/cpython/blob/4b8cdfcb22fbeaab9d954... (and note the comment above it about integer addition!)

But the cpython folk make understandability an explicit goal of the codebase which tends to preclude things like fancy optimisations or JITs


AFAIK the implementation of cpython is purposefully kept relatively simple such that it remains approachable.


Small integers do get special treatment as the same values share the same object.
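The sharing is visible with "is" (a CPython implementation detail, currently covering roughly -5 to 256; never rely on it for equality checks):

```python
# Small ints are pre-allocated singletons shared everywhere:
a, b = 7, 7
print(a is b)  # True: both names point at the cached 7

# Larger ints built at runtime are fresh objects each time
# (int("1000") avoids compile-time constant folding):
x = int("1000")
y = int("1000")
print(x is y)  # False: two distinct objects, equal values
```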


Okay, that's dodging the object instantiation/destruction thing, but it's still doing all the reference counting on them, isn't it?


Why would reference counting be necessary for an immutable, immortal global?


It isn't, but IIRC refcounting is cooked into the python interpreter in so many places that everything that can be referenced must also be refcounted. If you get a reference to something, you increment its refcount. If you lose the reference, you decrement it. No ifs, no branches.



