
Time Complexity of Python Data Structures - iamelgringo
http://wiki.python.org/moin/TimeComplexity
======
djacobs
This article is a good example of what's so great about the Python ecosystem.
I don't think I've ever seen something like this for the Ruby data structures
(except in the Hamster data structure project [1], which has solid
performance data).

[1] <https://github.com/harukizaemon/hamster>

~~~
ComputerGuru
You should program in C++ then. Every single data structure in the standard
library, Boost, and third-party libraries has been dissected, analyzed,
benchmarked, extended, rewritten, and optimized :)

But there's a reason you don't. I chose C, others chose Ruby, and you chose
Python :)

~~~
kingkilr
Isn't the computational complexity of STL data structures defined by the spec?

~~~
beoba
STL data structures are explicitly defined as to their internals, which in
turn implies their performance. For example, when you're using an STL map, you
know it's internally a tree and has the upsides, downsides, and performance
characteristics of that structure. (For a hash map you'd be using
unordered_map.)

Or at least that's how I picture it.

~~~
ComputerGuru
Actually, it's the exact opposite. The STL specification specifies the
complexity but _not_ the underlying data structure.

For instance, if you refer to section 23.1.2 of the C++ 2003 standard
("Associative Containers"), you'll find that nowhere does it mention "tree,"
though it does explicitly lay out the complexity requirements for associative
containers such as map and multimap, requirements that, at the end of the day,
can really only be met with one type of tree or another. Table 69 goes through
all the functions and operations and the complexities allotted for each. Some
are even explicitly specified as "compile time" operations. No C++
implementation is allowed to deviate from this, even to provide better
performance.

~~~
beoba
I think of it in these terms: If you're implementing a key for a std::map, you
have to implement operator<, which in turn implies that std::map is
tree-based. Similarly for unordered_map, which would want operator== and a
hash function.

Each of these interfaces effectively describes what the internals are going to
look like, even if the official standard decides to be coy about using the
terms "tree" or "hash map" when describing them.

In the case of Python, the fact that {}.keys() has no guaranteed order (for
example) already implies that dictionaries are hash-based and would have
similar characteristics to any other hash map.
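
To make the Python side concrete, here is a minimal sketch (the Point class is
made up for illustration, it is not from the wiki page). A dict key only has
to supply __hash__ and __eq__, never an ordering, which is the interface you
would expect from a hash table rather than a tree.

```python
class Point:
    """Hypothetical key type: hashable and equality-comparable, but not orderable."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal objects must hash equally, so hash the same fields __eq__ compares.
        return hash((self.x, self.y))


table = {Point(1, 2): "a", Point(3, 4): "b"}

# Lookup goes through __hash__ and __eq__; an equal but distinct object finds the entry.
found = table[Point(1, 2)]

# No ordering is defined, so anything that needs "<" (like sorting the keys) fails.
try:
    sorted(table)
    orderable = True
except TypeError:
    orderable = False
```

A std::map-style container would be the other way around: it could not store
Point at all until a less-than comparison was defined.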

~~~
ot
> I think of it in these terms: If you're implementing a key for a std::map,
> you have to implement operator<, which in turn implies that std::map is
> tree-based.

Not necessarily. For example, a skip list can satisfy the same requirements,
but it is not a tree.

~~~
beoba
True, but you still end up with similar performance to a tree.

------
mahmud
The great majority of them should be intuitive to any competent programmer,
imo, at least within a ballpark. Some of it is implementation detail (list
append isn't required to be constant, and could be O(N) in a naive
implementation, for example) but generally, you should have an intuitive feel
for this stuff across languages.
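
The append case can be sketched with a toy model (this is an assumption-laden
illustration, not CPython's actual growth policy; copies_needed and its grow
parameter are invented names). It counts how many element copies a growable
array makes when it reallocates one slot at a time versus doubling.

```python
def copies_needed(n, grow):
    """Count element copies made while appending n items to a growable array.

    `grow` maps the current capacity to the new capacity when the array is full.
    This is a toy model, not CPython's real list implementation.
    """
    capacity, size, copies = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            capacity = grow(capacity)
            copies += size  # reallocating moves every existing element
        size += 1
    return copies


n = 1024
naive = copies_needed(n, lambda c: c + 1)     # grow by one slot: O(n) work per append
doubling = copies_needed(n, lambda c: c * 2)  # double when full: amortized O(1) per append
```

With n = 1024 the naive policy makes n(n-1)/2 copies while doubling makes
fewer than 2n, which is the gap between O(N) and amortized constant appends.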

Sometimes you can almost guess how a language is implemented just from
profiling data.

~~~
ohmygodel
I went specifically looking for this information a week ago when I was
learning Python. It wasn't clear to me how a list was implemented, and in
particular whether random access and deletes were constant or linear time.
~~~
jemfinch
Which is exactly why it should never have been called "list". If it were
"vector" or "array" you'd never have even wondered.

------
rflrob
I haven't thought deeply about big-O runtimes since freshman year: are set
operations usually harder to evaluate? Half of them don't have "worst-case"
times, and Symmetric Difference just has a "?" for the average case.

------
praptak
It is interesting that get-slice on list does not use copy-on-write, so it is
in fact Theta(k) even if you don't modify the slice.
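
A quick sketch of what that means in practice (names are illustrative): a list
slice is a fresh list, so the original never sees writes to it. The memoryview
part is a contrast I am adding, it only applies to bytes-like objects, not to
lists.

```python
big = list(range(1_000_000))

chunk = big[10:20]  # allocates a new 10-element list: Theta(k) for a slice of length k
chunk[0] = -1       # the copy is independent, so the original is untouched

# For bytes-like data, memoryview gives a slice that shares the buffer instead of copying.
raw = bytearray(b"hello world")
view = memoryview(raw)[0:5]
raw[0] = ord("j")   # the change is visible through the view
```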

~~~
ot
Copy-on-write can be very dangerous if not used properly. Besides concurrency
issues (not specifically for Python, because of the GIL), in the case of
slices, taking a small slice from a big list would keep a reference to the
whole list even if the rest of it is not needed anymore. In pathological cases
this can be as bad as a memory leak.
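
Here is a rough sketch of that failure mode. SliceView and BigList are
hypothetical classes written for this example (Python lists do not work this
way), and the weak reference is only there so we can observe when the big
list is actually freed.

```python
import gc
import weakref


class BigList(list):
    """Plain list subclass; only needed because built-in lists can't be weakly referenced."""


class SliceView:
    """Hypothetical no-copy slice: stores the base list plus a range into it."""

    def __init__(self, base, start, stop):
        self.base = base  # strong reference to the *whole* list
        self.start, self.stop = start, stop

    def __getitem__(self, i):
        if not 0 <= i < self.stop - self.start:
            raise IndexError(i)
        return self.base[self.start + i]


big = BigList(range(1_000_000))
alive = weakref.ref(big)      # lets us observe when the big list is freed

small = SliceView(big, 0, 3)  # a three-element view
del big
gc.collect()
kept_by_view = alive() is not None  # the view still pins all million elements

del small
gc.collect()
freed_after_view = alive() is None  # memory is released only once the view is gone
```

A copying slice costs Theta(k) up front but would have let the million-element
list go as soon as `del big` ran.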

------
kmfrk
Thanks for this. It's a bloody goldmine.

