
Complexity of Python Operations (2013) - SethMurphy
https://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt
======
mdxn
People reading this write-up should keep in mind that the author uses the term
"complexity class" incorrectly. When the author says this, they actually mean
"worst-case runtime". The write-up also never touches on space complexity,
which is another measure for evaluating the performance of these operations.
In some spots the author outright abuses big-O notation, making it do things
it was never meant to do. For example:

"O(==) is the complexity class for checking whether two values in the list are
=="

This stuff needs to be fixed since it is only going to lead to later confusion
(and embarrassment).

A complexity class is a set of problems that tend to be similar in resource
requirements. O(f(n)), more or less, means asymptotically bounded from above
by c*f(n) for some constant, c, as n -> inf. It is a general way to represent
the asymptotic growth rate of functions and is NOT married to the concept of
complexity classes.
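
For reference, the standard textbook definition (not specific to the linked
notes) is:

    f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ \text{such that}\
        0 \le f(n) \le c \cdot g(n) \ \text{for all } n \ge n_0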

~~~
_droptable_
> When the author says this, they actually mean "worst case runtime".

Actually, the author says that all dict operations take O(1) time, which is
the best (or average) case and not the worst case. I think it should at least
mention the worst case! No hash table exists where you get worst case O(1)
(non-amortized) for all three of insertion, deletion, and lookup. You either
have to resize the table at some point or you have to deal with hash
collisions.

(BTW, I've been asked this question numerous times in job interviews and
answering O(1) without qualifying the statement is definitely wrong.)

See
[https://wiki.python.org/moin/TimeComplexity](https://wiki.python.org/moin/TimeComplexity)

and [http://cs.stackexchange.com/questions/249/when-is-hash-table...](http://cs.stackexchange.com/questions/249/when-is-hash-table-lookup-o1)
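
To make the collision point concrete, here is a minimal sketch (my own
illustration, not from either link; the BadHash name is made up): a key type
whose __hash__ is constant forces every insert into the same bucket, so dict
operations degrade from roughly O(1) to roughly O(n) each:

    import time
    
    class BadHash:
        """Every instance hashes to the same bucket, forcing collisions."""
        def __init__(self, x):
            self.x = x
        def __hash__(self):
            return 42                  # constant hash: all keys collide
        def __eq__(self, other):
            return self.x == other.x
    
    d = {}
    start = time.perf_counter()
    for i in range(2000):
        d[BadHash(i)] = i              # each insert probes every existing key: ~O(n)
    print(f"2,000 colliding inserts took {time.perf_counter() - start:.3f}s")
    
    d2 = {}
    start = time.perf_counter()
    for i in range(2000):
        d2[i] = i                      # well-spread int hashes: effectively O(1) each
    print(f"2,000 normal inserts took    {time.perf_counter() - start:.3f}s")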

~~~
kevinwang
This is a good point. The reason for this error in the notes, I believe, is
that they are for an introductory programming class at this university and
introducing this nuance may put off some students with its complexity.

------
deathanatos
Some of these don't seem quite right. For example, set difference:

    
    
      Difference    | s - t        | O(len(t))     |
    

Worst case, we do something like `set(range(1000)) - {1}`; the bulk of this is
going to be making a copy of s from which to subtract t, which is O(|s|). (If
this were `s -= t`, I'd agree, O(|t|).) (Same logic applies to the other set
operations.)
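
A quick sanity check (rough and machine-dependent, not from the notes): keep
t tiny while len(s) grows. If s - t really were O(len(t)) these timings would
stay flat; in practice they grow with len(s), because the result is a fresh
copy of s:

    import timeit
    
    t = {1}
    for n in (10_000, 100_000, 1_000_000):
        s = set(range(n))
        secs = timeit.timeit(lambda: s - t, number=10)   # time ten subtractions
        print(f"len(s)={n:>9}: (s - t) x10 took {secs:.4f}s")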

    
    
      Clear         | d.clear()    | O(1)	     | similar to s = {} or = dict()
    

I would actually guess d.clear() is O(|d|) _in CPython_: you're going to need
to deref each item in the dictionary[1]. Similarly, earlier it states,

    
    
      Binding a value to any name is O(1).
    

If there's already a value stored in the variable, it gets unref'd, leading
to more derefs… (though I don't know how you'd express that in O(…); perhaps
we might just say it's a larger constant time than you'd expect!)
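
To see the d.clear() point above in practice, here's a rough sketch
(CPython-specific and obviously machine-dependent; str values are used so
real deallocations happen, since small ints are cached):

    import time
    
    for n in (100_000, 1_000_000, 2_000_000):
        d = {i: str(i) for i in range(n)}
        start = time.perf_counter()
        d.clear()                      # decrefs every key and value it holds
        print(f"len(d) was {n:>9,}: clear() took {time.perf_counter() - start:.4f}s")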

    
    
      Note that for i in range(...) is O(len(...)); so for i in range(1,10) is O(1).
    

I feel like O(…) notation's purpose in life is to show how an algorithm's
complexity in time or space responds to changes in its inputs. Saying a
constant expression such as `for i in range(1,10)` is "O(1)" somewhat misses
the point, I feel, as there aren't any real inputs to vary. (As opposed to
`for i in range(n)`, where the `n` is an unknown input, and the run time of
the loop is going to be O(n).)

    
    
      check ==, !=  | s != t       | O(min(len(s),len(t)))
    

(this is for sets again) This one isn't wrong, but there's an interesting
opportunity for the notation here, I feel. CPython keeps track of the set's
length along with the data internally[2], and checks that prior to iteration
over the set[3]; so you _could_ notate this `O(|s|) == O(|t|)` as the worst
case. This is because in the worst case, |s| and |t| must be equal; if they're
not, the internal size comparison fails, and you're O(1) (how long to check
the sizes). (A lot of languages' set implementations do this, I think; in
fact, I'd say most programmers would expect it, because they'd expect
len(a_set) to be O(1).)
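
In pure Python, the short-circuit looks roughly like this (a sketch of the
idea only, not the actual C code linked in [2] and [3]):

    def sets_equal(s, t):
        # Compare the stored lengths first: O(1). Only when they match do
        # we pay for iterating over s and probing t (average O(len(s))).
        if len(s) != len(t):
            return False
        return all(elem in t for elem in s)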

Also, I think some of these are great examples of just how easy CPython's
implementation is to read in many cases.

[1]: see this, for example:
[https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2...](https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2e8a7f5a056c3006965d/Objects/dictobject.c#L1355-L1359)

[2]:
[https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2...](https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2e8a7f5a056c3006965d/Include/setobject.h#L46)

[3]:
[https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2...](https://github.com/python/cpython/blob/0eb5f5996feed30ade8c2e8a7f5a056c3006965d/Objects/setobject.c#L1835-L1836)

~~~
viraptor
How do you deal with GC in reality, though? I mean, if you include it in the
calculations, then any operation which derefs an object is at least
`O(number_of_heap_objects)`. Or, with a compacting GC, anything that creates
a new object is potentially at least `O(heap_size)`.

I think it makes perfect sense to ignore this.

Edit: actually it's higher than `O(number_of_heap_objects)` for refcounting,
because you can trigger a cycle collection, which is unlikely to be `O(N)`.

~~~
deathanatos
I thought about trying to generalize the previous post to a generic GC, but it
proved more difficult, so I left it out.

In my first post, I was mostly focused specifically on CPython, which does
ref-counting with some GC to catch cycles. I was ignoring the real GC and
cycles in my post; my understanding of d.clear() is that the ref counting
happens and happens immediately, so _at best_ it's still O(|d|). My main point
was that in the implementation you're likely to be using, it happens at the
time you run the code, and can thus change the order notation of the code (and
perhaps how you expect it to perform, though you did have to construct d,
which was O(|d|) at best to begin with…).

In a generic GC, I was thinking it _might_ be fair game to ignore the GC. If
GC can be done completely concurrently with the remainder of the app, and
doesn't end up effectively stealing resources (mostly, CPU, I guess) from the
app, then I think I might be persuaded to call it "free", or at least close to
it. One might cynically call a "never free; just alloc and let the OS free on
process death" scheme a very naive form of garbage collection, and there, it
is definitely close to free.

------
datwhitehairdoe
pattis ftw

~~~
dang
It's pretty cool that you guys like your professor so much and that probably
means he's an awesome prof, but comments like these don't belong here, so the
community is likely to treat them pretty harshly.

We'd be happy if you stuck around and learned the ways of HN and started
making solid contributions, so please do! Best to read these links to get an
idea of what's good on the site:

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

[https://news.ycombinator.com/newswelcome.html](https://news.ycombinator.com/newswelcome.html)

------
pattis_is_god
Based Pattis

------
TwiSparklePony
This is my professor

