

Ask YC: Any resources on the complexity of built in type methods in Python? - ivankirigin

I'd like to know the complexity of functions on built in types in Python. O() notation is perfect.<p>Empirical tests would also be good.<p>This came up recently in a Twitter conversation (I'm @tipjoy) about eliminating duplicates in a list. I like using dictionaries:<p><pre><code>  def dupe(aList):
    d={}
    for x in aList:
      d[x] = True
      # or to count: d[x] = d.get(x,0) + 1
    return d.keys()
</code></pre>
Compare this to an alternative using the list count method:<p><pre><code>  def dupe(aList):
    f=[]
    for x in aList: 
      if not f.count(x):
        f.append(x)
    return f
</code></pre>
I think N hash lookups are probably faster than N calls to count, but I don't actually know. The difference could be small enough that doing what you think looks better is all that matters.<p>Anyone know a good resource?
What about for other languages?
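
For reference, here's a rough timeit sketch of the two approaches (the data and repetition counts are just placeholders, not a real benchmark):

```python
import timeit

def dupe_dict(a_list):
    # One hash lookup/insert per element: roughly O(n) overall.
    d = {}
    for x in a_list:
        d[x] = True
    return list(d.keys())

def dupe_count(a_list):
    # list.count scans the whole list on each call: roughly O(n^2) overall.
    f = []
    for x in a_list:
        if not f.count(x):
            f.append(x)
    return f

data = list(range(1000)) * 2  # 2000 items, each value appearing twice
print(timeit.timeit(lambda: dupe_dict(data), number=100))
print(timeit.timeit(lambda: dupe_count(data), number=100))
```

On my understanding, the gap should grow quadratically with the input size, so for small lists either version is fine.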
======
gms
How about list(set(aList)) ? Anyway, to answer your question, look in the
Python FAQ at <http://www.python.org/doc/faq/general/>

But really, you shouldn't care about this sort of thing, until it actually
matters.

~~~
ivankirigin
You're right, don't do this till it matters. But even in profiling and
optimizing, the first steps are algorithmic. If you can't change the "outer
loop" algorithm, you'll probably look into things like this.

------
earle
that's what sets are for.

>>> list(set(['1', '2', '3', '1']))

['1', '3', '2']

~~~
ivankirigin
Cool

More often than not, I do want the count though.

I would be surprised if a set didn't use the exact same hashing subroutines
that a dict uses. Am I wrong?
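
For the counting case, the same hash-based lookups do double duty (a rough sketch):

```python
def counts(a_list):
    # The same hashing a set would do, but keeping a tally per key.
    d = {}
    for x in a_list:
        d[x] = d.get(x, 0) + 1
    return d

print(counts(['a', 'b', 'a', 'a']))  # {'a': 3, 'b': 1}
```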

~~~
MarkTwinia
Yes, but as gp said, don't bother yourself with these details.

------
earle
If you don't have set (< python 2.4):

    from operator import setitem

    def distinct(l):
        d = {}
        # Python 2's map pads the shorter sequences with None,
        # so this sets d[x] = None for every x in l
        map(setitem, (d,)*len(l), l, [])
        return d.keys()
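
On later Pythons (3.7+, where dicts preserve insertion order) the same trick is something like a one-liner with dict.fromkeys:

```python
def distinct(l):
    # dict.fromkeys builds a dict with each element as a key (value None),
    # dropping duplicates while keeping first-seen order on Python 3.7+.
    return list(dict.fromkeys(l))

print(distinct(['1', '2', '3', '1']))  # ['1', '2', '3']
```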

