
Yeah; I'm pretty sure this isn't anything you can take advantage of. But I think the point of the post is an "isn't this neat?" rather than a "because of an optimization, one approach isn't as bad as it could be, and you should take advantage of that".



> Yeah; I'm pretty sure this isn't anything you can take advantage of.

You probably can but should not rely on it:

* it was a very official part of the Python 2.4 release notes

* it's unlikely the devs would remove it, as they know how implementation details tend to leak into software that relies on them (see: dict ordering)

* but it is an optimisation, so there's no guarantee; a simple trace function could break it (a minimal way to observe it is sketched after this list)

* and obviously there's no guarantee (and likely no way) that other implementations can implement it
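
For what it's worth, here's a minimal way to observe the behaviour (a sketch, assuming a recent CPython; the check() helper and the 1000-character length are invented for illustration, and none of this is guaranteed, per the caveats above):

    def check():
        s = "x" * 1000          # a fresh string with a single reference
        old_id = id(s)
        s += "y"                # CPython may resize the buffer in place here
        return id(s) == old_id  # True if the object was not moved

    print(check())              # usually True on CPython, may be False elsewhere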


The optimization got moved to unicode objects during the great Python 3 upheaval and AFAIK it still remains unimplemented for what are now bytes objects. The paramiko SSH library relied on it heavily; as a result, its runtime is (AFAIK still) horrific on Python 3.
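
A rough way to see the gap described here, assuming CPython 3.x (the function names and loop count are made up for the example; only the str version benefits from the in-place optimisation):

    import timeit

    def concat_str(n):
        s = ""
        for _ in range(n):
            s += "x"            # may be resized in place: roughly linear
        return s

    def concat_bytes(n):
        b = b""
        for _ in range(n):
            b += b"x"           # a new bytes object every time: quadratic
        return b

    n = 50_000
    print("str  +=", timeit.timeit(lambda: concat_str(n), number=1))
    print("bytes +=", timeit.timeit(lambda: concat_bytes(n), number=1))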


This one is less painful for the devs to break, if they need to, because it only works 99% of the time. If you rely on this behaviour for correctness, rather than performance, it will set you right pretty quickly.

If you do want to take advantage of this, maybe you can go a level deeper: dig into Python's behaviour when allocating memory for strings, and figure out in what circumstances you can be guaranteed to get the same id back. E.g. maybe if you create a 28-character string, there will always be room to append 4 more.
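
Something like this could be a starting point for that digging (purely a sketch; survives() and the length range are invented, and the answer depends on the allocator rather than on any documented guarantee):

    def survives(start_len, extra=4):
        # Build a fresh string, append a few characters, and see whether
        # the resize kept the same address.
        s = "x" * start_len
        old_id = id(s)
        s += "y" * extra
        return id(s) == old_id

    for n in range(1, 64):
        print(n, survives(n))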


Right, but my thought is "at what point would it make sense to join rather than append, yet this optimization would be faster than joining?" I'm guessing not very many.

Because, yeah, the fact that str += "?" is faster isn't a big deal; that's the natural, ergonomic way to deal with it, because you're appending all of one string. Likewise foo + " " + bar + "?" is probably easier to write that way than to drop them into a list and join (but even if not, I'd be curious if it's actually any faster; this article doesn't measure it). By the time you get to joining large amounts together, concatenating a CSV or something, you're going to use join, naturally, and join is going to be more performant.
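
For the curious, it's easy enough to measure with timeit (foo and bar are placeholder values; numbers will vary by interpreter and version):

    import timeit

    setup = 'foo = "hello"; bar = "world"'
    print(timeit.timeit('foo + " " + bar + "?"', setup=setup))
    print(timeit.timeit('"".join([foo, " ", bar, "?"])', setup=setup))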

So to my mind it's kind of an open question: at what point is the non-ergonomic thing going to be the faster thing? That's what "taking advantage of (an optimization)" feels like; otherwise you're just writing code and letting the performance fall where it may.


> Right, but my thought is "at what point would it make sense to join rather than append, yet this optimization would be faster than joining?" I'm guessing not very many.

I’m guessing all of them. Because what it does is what join will do internally anyway, but without the overhead of the list itself, and with lower per-operation overhead than the list's.
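
A rough comparison of the two idioms for many small pieces, assuming CPython (with_concat, with_join, and the counts are made up for the example):

    import timeit

    n = 10_000

    def with_concat():
        s = ""
        for _ in range(n):
            s += "x"            # relies on the in-place optimisation
        return s

    def with_join():
        parts = []
        for _ in range(n):
            parts.append("x")
        return "".join(parts)

    print("+=  ", timeit.timeit(with_concat, number=100))
    print("join", timeit.timeit(with_join, number=100))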


"I'm guessing"

Yes, you are. I'm saying this is a more interesting question, and one that isn't answered. Because the reality is that if you're appending just a handful of things together, the idiomatic approach is to concatenate them, and this optimization comes into play. If you're concatenating a lot of items together... you probably already have a list, and the idiomatic approach of joining them is probably faster than a for loop concatenating them over and over. So the question, then, is: is there a point where idiomatically it makes more sense to join things (putting them in a list if they aren't already, but they might be; it depends on the example), but repeated concatenation with this optimization would be more efficient?



