Python lists are actually arrays — contiguous chunks of memory.
Just wondering here, is this guaranteed to always be the case? Practically it probably is, but does the Python spec (as in: the laguage, not one of it's implementations) say a list must be implemented using contiguous memory of slots with Python objects? That seems so low-level and C-ish. Or does the OP actually mean CPython here for instance?
Python doesn't really have a spec in the way that, say, C# does. The docs are missing things like whether the argument is positional-only, exact exceptions raised in all cases, and time complexity of common operations.
That said, I think every implementation uses contiguous memory. PyPy has list "strategies" int[], double[] and PyObject[] and probably more, switching transparently between them.
With numba or other jit tools in CPython, you have things equivalent to pypy list strategies as well. If you add jitted helper functions or Cython, you have even more control.
I'm not aware of what the specification for a list would be, this is the closest I'm aware of: https://docs.python.org/3/reference/datamodel.html#objects-v..., 3.2 > Sequences > Mutable Sequences > Lists, "The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets"
If I'm talking about arrays with someone else, I would expect us to be thinking of arrays in the sense of C. Both C++ and Java use the same definition. Other languages that I'm aware of (e.g. Javascript, Go) have their own notions of arrays that you can't conflate with the C notion of an array.
From the C11 spec, §6.2.5.20: "An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type."
Lists are very often used in the way arrays are, though.
The heterogeneous use-case (that in C, etc. is usually a struct) is often filled by either tuples or (ordered) dictionaries in Python.
If you really want something a bit closer to a C array, you can fairly easily create a UserList that enforces the non-empty requirement and only allows items of a particular type.
So... Because lists are dynamically typed and heterogeneous, does that mean the underlying C is basically a contiguous segment of memory of python object references? Then in cpython the actual list contents are Python objects sitting on the heap in a non-contiguous manner?
And when the list needs to be resized and doubles itself, it's just doubling the space for object references?
> Because lists are dynamically typed and heterogeneous, does that mean the underlying C is basically a contiguous segment of memory of python object references?
The growth pattern is still exponential, instead of going x -> 2 * x, you go x -> 9/8 * x + 6. I think it should still be O(1) amortized, you just need to adjust your proof for doubling a bit.
> So... Because lists are dynamically typed and heterogeneous, does that mean the underlying C is basically a contiguous segment of memory of python object references?
Yes. Every Python object is held by the interpreter as a pointer to a PyObject struct on the (C) heap.
Not something I would put into production, but had a fun application of this on a side project. I had the `id` of a function (but not the function object itself) and needed to recover the function. In CPython, `id` of a function corresponds to it's memory address (not sure if that can be overridden). By casting the id to a PyObject, I was able to recover the original.
To me it is kind of self-growing array that can hold objects of different types. The C++-ish equivalent would be something like std::vector<void*> / std::vector<MagicObjectBoxType> (there is no MagicObjectBoxType of course).
To me, the difference between a list and an array is that an array is a fixed length allocation of memory storing multiple items of the same size. This way, to jump to the third item in the list, you just need to know the data type (or rather, how many registers a single item of that type takes up in memory) and the index. That way you can jump right to the correct memory address by just doing ```index * cell size```.
You compare this to, say, another fundamental data structure, the stack. In a stack you don't know how much memory each item takes up, so you can't just jump right to that item. Thus, the limitation is that you can only pop or push the last item on that stack. Once upon a time, stack-based languages were quite common.
A list, in contrast, is an ordered, polymorphic collection of objects which can have variable size. You can create resizable arrays by using a linked list. A resizable list would typically be implemented as an array with the last item being a pointer to the next array. I believe (but don't quote me on this), that Python uses links to increasingly large arrays so that there aren't too many links. That is, it dynamically increases how much memory to allocate to that array depending on how much data you're actually storing in that list.
A list is also polymorphic, meaning it can house multiple types of data. I sometimes hear a non-polymorphic, but resizable array called a "vector," but that's super confusing terminology. You can implement this by creating a resizable array of pointers, where each pointer points to a location in memory where the value, along with the data type, are stored. This way, all the pointers are the same size and you can jump right to that location by calculating it from the index, but you still get to have multiple value types.
The distinction is not universal, though. But in my humble opinion everyone who disagrees with me is wrong.
No, they are not.
At "best" they are implemented "like" an array of pointers to the objects (scattered somewhere else in the memory) in the list.
If you want an array (in the "c-like" sense of it) you can look at the array module https://docs.python.org/3/library/array.html
It's not implemented like that and couldn't be. They aren't arrays unless you consider them arrays of pointers but this isn't a useful definition in Python.
An array contains a number of elements of the same size. That means you can find the memory address of any element by simple arithmetic: array_loc + index * element_size.
Python lists can contain objects of any type and the objects can't be found using arithmetic like that.
Just wondering here, is this guaranteed to always be the case? Practically it probably is, but does the Python spec (as in: the laguage, not one of it's implementations) say a list must be implemented using contiguous memory of slots with Python objects? That seems so low-level and C-ish. Or does the OP actually mean CPython here for instance?