As the article points out, the size progression, detailed in a comment in the CPython source, is relatively modest. A "classic" dynamic array simply doubles its capacity as needed; this progression grows more slowly: 16, 24, 32, 40, and so on. But the underlying implementation uses the standard C library function realloc(), which can often resize the buffer in place. That contributes a lot to the performance of a Python list, albeit in a way that's difficult to pin down, since it depends on the overall state of the C library's heap.
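You can actually watch the over-allocation from Python itself. A quick sketch (the exact numbers are an implementation detail and vary by CPython version and platform):

    import sys

    # Print the list's byte size every time an append changes it, to
    # expose the capacity steps on this particular CPython build.
    lst = []
    last = sys.getsizeof(lst)
    print(0, last)
    for _ in range(64):
        lst.append(None)
        size = sys.getsizeof(lst)
        if size != last:
            print(len(lst), size)
            last = size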
For comparison, java.util.ArrayList is implemented in Java itself, and its source is much shorter: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/...
AFAIK, this unfortunately wouldn't be possible to implement in pure Python, because I don't think there would be any way to know how much space to allocate for each item in the list.
If you're going pure Python, then space is a better reason than speed; I'm not sure the stdlib array is even as fast as a list.
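As a rough illustration of the space side (the exact figures depend on the build), comparing the stdlib array module against a list:

    import array
    import sys

    n = 100_000
    as_list = list(range(n))
    as_array = array.array('q', as_list)  # 'q': signed 64-bit C ints

    # The array packs raw machine integers; the list's buffer holds
    # pointers, and each int is also a separate heap object that
    # getsizeof doesn't even count here.
    print("list buffer: ", sys.getsizeof(as_list))
    print("array buffer:", sys.getsizeof(as_array))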
Why Python gets away with it, I don't know. Back when I still did Python, integers from -5 up through 256 were cached, so those values carried no allocation overhead; in 3.6 that was still the case. And CPython does indeed lack tagged pointers: every int, however small, is a full heap-allocated object.
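For the record, the cached range on CPython is -5 through 256 inclusive. A quick illustration; int identity is an implementation detail, so this is for poking at the interpreter, not for real code:

    x = 256
    y = int("256")   # built at runtime, still served from the cache
    print(x is y)    # True on CPython: -5..256 are preallocated

    a = 257
    b = int("257")   # built at runtime, outside the cache
    print(a is b)    # False on CPython: a fresh heap object is allocated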
Yes, because then the list object doesn't have to do any extra work to check the types of objects. If your application needs that extra checking, you can add it in your application's code; but there's no reason to burden the basic list object with extra code that not everyone needs.
> Especially since they are going to be way slower
Way slower than what? They're slower than Python arrays, of course, but Python arrays can only store numeric values (since the amount of space each value takes up is constant and known when the array is created).
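For anyone who hasn't used the stdlib array module, the homogeneity is enforced at the API level; a small demonstration:

    import array

    a = array.array('d', [1.0, 2.0, 3.0])  # 'd': one C double per slot
    a.append(4.0)           # fine: stored as a raw 8-byte double
    try:
        a.append("hello")   # every slot must hold the declared C type
    except TypeError as exc:
        print(exc)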
For a container that needs to be able to store objects of any type (including instances of user-defined classes), a heterogeneous list is faster than a homogeneous one, because, as above, it doesn't have to do extra work to check the types of objects. All the other extra work exists because the in-memory size of the objects isn't known in advance, and that's true even if all the objects are of the same type, since instances still won't necessarily have the same in-memory size (for example, consider instances of a user-defined class with different attributes).
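To make that last point concrete, here's a sketch with a made-up Point class; two instances of the same type end up with different-sized attribute storage (exact byte counts vary by CPython version):

    import sys

    class Point:
        pass

    a = Point()
    a.x = 1

    b = Point()
    b.x, b.y, b.z = 1, 2, 3
    b.label = "annotated"

    # Same class, different per-instance attribute storage.
    print(sys.getsizeof(a.__dict__), sys.getsizeof(b.__dict__))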
My question is about that. Is storing objects of any type a common use case, compared to storing objects of one type?
I'm not sure it's possible to know a general answer to that, since Python is used in such a wide variety of contexts.
In my own programming, I use lists that can contain objects of different types often enough for that use case to be significant for me.
One other thing to keep in mind is that Python has duck typing, so, for example, if you are iterating over a list and retrieving the same attribute or calling the same method on each object, that doesn't require each object to be of the same type, just to have an attribute with the required name, or a method with the required name and calling signature.
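A minimal sketch of that, with two made-up classes that share nothing but a method name:

    class Dog:
        def speak(self):
            return "woof"

    class Robot:
        def speak(self):
            return "beep"

    # No common base class; the loop only requires that each item
    # have a speak() method with this calling signature.
    for thing in [Dog(), Robot()]:
        print(thing.speak())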
At that point, you need pointers anyway, so mixed types don't add much overhead. Especially since you need to handle mixed concrete types even if you have a single "typechecking" type, because of inheritance (including via declared ABCs and union types).
For heterogeneous collections with a fixed length, I tend to use a tuple. That makes more sense to me, conceptually (coming from C++).
Everything's passed by reference, so the list has to contain pointers even if you know all its elements have the same type. And some objects can change their type by assigning to __class__, which can happen from a distance (because everything is by reference).
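A sketch of the __class__ trick with two made-up classes (this is only allowed when the classes have compatible memory layouts):

    class Plain:
        def describe(self):
            return "plain"

    class Fancy:
        def describe(self):
            return "fancy"

    obj = Plain()
    alias = obj               # a second reference to the same object

    alias.__class__ = Fancy   # retype it through the other reference
    print(obj.describe())     # "fancy": the change is visible here too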
A careful implementation can solve this for some built-in types. PyPy does it for ints: if a list contains only ints, it's represented as a proper numeric array. But as soon as you append a value of some other type, the whole list is converted to the more expensive, more traditional representation.
PyPy's ints still have to pretend to be heap-allocated objects at all times. In particular, PyPy has to do something tricky with the id() function, which on CPython just returns an object's memory address; the return value has to be unique among live objects and stable for as long as the object lives.
Arrays are widely used in Python for serious work (mostly NumPy arrays, either directly or via higher-level libraries, not the stdlib ones).
Could you kindly explain this part in detail?
It's curious that for Python this is blog-post material, while for Go it's intrinsic to understanding the performance and usage of slices and maps, and in Java (and other languages) it would be conveyed merely by the class name ArrayList. The fact that so much can be done effectively in Python without always having to think about such details is an achievement of the language and its ecosystem of libraries and packages.