There are also some Python operations that carry a large constant cost: operations that are still O(1), but much slower in Python than in lower-level languages. In those cases, choosing an algorithm that reduces the number of operations of that type can be a bigger win than a simple big-O analysis would suggest, and can let pure Python code compete better with code written in lower-level languages.
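As a contrived but concrete illustration of that constant-factor point: both loops below are O(n), but one re-resolves the out.append attribute on every iteration, which is exactly the kind of still-O(1)-but-slow operation in question. A minimal timeit sketch (sizes and names are mine):

    import timeit

    def with_lookup(items):
        out = []
        for x in items:
            out.append(x * 2)    # attribute lookup on every iteration
        return out

    def lookup_hoisted(items):
        out = []
        append = out.append      # resolve the bound method once
        for x in items:
            append(x * 2)
        return out

    data = list(range(100_000))
    print(timeit.timeit(lambda: with_lookup(data), number=100))
    print(timeit.timeit(lambda: lookup_hoisted(data), number=100))

Same big-O in both cases, but the hoisted version is measurably faster in CPython; a list comprehension, which pushes the whole loop into C, is faster still.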
Better algorithms are the right answer for fast programs more often than faster code is. There is room for both, but the claim that "if the code isn't fast you may as well not bother with a good algorithm" is generally false.
Sure, but why not do both??
Take the most computationally intensive parts, move them to a different language, and use an efficient data structure in that language. That might seem like a lot of work, but fortunately there are handy-dandy libraries readily available that let you do this with even less effort than a simple language extension: just toss your data into SQLite or Kyoto Cabinet and let them deal with it. As is standard practice in, uh, many, many applications?
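For instance, the whole "toss it into SQLite" route is a few lines with nothing but the standard-library sqlite3 module (the table and data here are made up for illustration):

    import sqlite3

    conn = sqlite3.connect(":memory:")   # or a file path for persistence
    conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")
    conn.executemany("INSERT INTO kv VALUES (?, ?)",
                     [("alpha", "1"), ("beta", "2"), ("gamma", "3")])
    conn.commit()

    # The index behind PRIMARY KEY gives logarithmic lookups without any
    # hand-rolled data structure.
    row = conn.execute("SELECT value FROM kv WHERE key = ?", ("beta",)).fetchone()
    print(row)   # ('2',)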
Your point about speeding up the program by improving the algorithm is valid, but only when the initial algorithm is doing something more or less unique. If the program is doing what a database does, you should use a database like SQLite, or a data store like Redis, Kyoto Cabinet, etc.
(The above is a serious simplification, but contains enough truth to be worth stating. No doubt others more knowledgeable than I can expand and enhance.)
So can balanced binary trees, and a number of other structures. I'd demand benchmarks, because Python is not C (and I don't mean that in a trivial sense; it is deeply not C). Just as one example: are you hitting a pessimal case in the GC with this algorithm? You never know.
I'm not actually demanding them, since that didn't seem to be the point of this exercise. My point is that you can't just assume.
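For what it's worth, the kind of benchmark I'd want is cheap to write: time the same operation at a few input sizes and look at how it actually scales. A throwaway sketch (the harness and workloads are made up):

    import timeit

    def bench(make_data, op, sizes=(1_000, 10_000, 100_000)):
        # Time op at several input sizes to see real scaling, not assumed.
        for n in sizes:
            data = make_data(n)
            t = timeit.timeit(lambda: op(data), number=100)
            print(f"n={n:>7,}: {t:.4f}s")

    # Membership test in a list (linear scan) versus a set (hashed);
    # the constants are exactly what big-O analysis won't tell you.
    bench(lambda n: list(range(n)), lambda d: len(d) - 1 in d)
    bench(lambda n: set(range(n)), lambda d: len(d) - 1 in d)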
It's especially interesting to me that the standard large body of work on algorithmic complexity is assumed by many to be getting less and less relevant as other issues come into play. Issues such as memory access time, where it now matters if something is in cache or not, and if so, which cache. Issues such as whether your machine will stop completely at some inconvenient moment to perform a GC when you least expect or desire it. Issues such as whether your language is in fact caching stuff in hash tables, and so what appears in your code to be linear, isn't.
And so on.
More and more, it seems necessary to resort to experiments and tests to see what happens (a sketch follows below), rather than being able to determine things for certain via analysis.
Exciting times, but slightly disappointing.
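In that spirit, one small experiment: the same allocation-heavy loop timed with CPython's cyclic collector on and off (the workload is made up; whether the difference matters varies by workload and version, which is rather the point):

    import gc
    import time

    def churn(n=200_000):
        # Lots of small container objects: these are what the cyclic
        # collector tracks and occasionally pauses to scan.
        return [{"i": i} for i in range(n)]

    for label, enabled in (("gc enabled", True), ("gc disabled", False)):
        gc.collect()                              # start from a clean slate
        (gc.enable if enabled else gc.disable)()
        start = time.perf_counter()
        churn()
        print(f"{label}: {time.perf_counter() - start:.4f}s")
    gc.enable()                                   # restore the collector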
However, if I had no O(log n) structures at all and needed to implement one, I'd have to be a maniac to choose balanced binary trees, with their notoriously delicate rebalancing algorithms, over skip lists, which are very sweet and simple (sketched below, after the list).
1. They're really cool and clever. This is a valid reason.
2. They offer easier concurrency than most alternatives. It's straightforward (though not quite easy, unless you have transactional memory) to make a lock-free concurrent skip list. Try that with a heap or a red-black tree and you'll quickly run into all sorts of memory conflicts and crazy-complicated locking. The fact that a skip list only provides probabilistic logarithmic time bounds is precisely what makes coordination between threads easier: there are no global balance invariants to restore after an insert, so threads rarely contend over the same nodes.
(Note that it's possible to do some similar stuff with modified versions of other data structures. For example, you can make a good concurrent dictionary by taking a red-black tree, relaxing the invariants, and adding a periodic rebalancing thread to run in the background. But that's a topic too long to fit into this post.)
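To give a sense of what "sweet and simple" means here, below is a minimal skip list sketch along the lines of Pugh's original scheme: insert and search only, no delete, with names and constants of my own choosing. A shape sketch, not production code.

    import random

    class SkipNode:
        __slots__ = ("key", "value", "forward")

        def __init__(self, key, value, level):
            self.key = key
            self.value = value
            self.forward = [None] * level      # next node at each level

    class SkipList:
        MAX_LEVEL = 16   # plenty for ~2**16 items with P = 0.5
        P = 0.5          # chance a node is promoted to the next level

        def __init__(self):
            self.level = 1
            self.head = SkipNode(None, None, self.MAX_LEVEL)

        def _random_level(self):
            level = 1
            while random.random() < self.P and level < self.MAX_LEVEL:
                level += 1
            return level

        def search(self, key):
            node = self.head
            for i in range(self.level - 1, -1, -1):    # top level down
                while node.forward[i] is not None and node.forward[i].key < key:
                    node = node.forward[i]
            node = node.forward[0]
            if node is not None and node.key == key:
                return node.value
            raise KeyError(key)

        def insert(self, key, value):
            update = [self.head] * self.MAX_LEVEL      # rightmost node per level
            node = self.head
            for i in range(self.level - 1, -1, -1):
                while node.forward[i] is not None and node.forward[i].key < key:
                    node = node.forward[i]
                update[i] = node
            node = node.forward[0]
            if node is not None and node.key == key:
                node.value = value                     # overwrite existing key
                return
            level = self._random_level()
            self.level = max(self.level, level)
            new = SkipNode(key, value, level)
            for i in range(level):                     # splice in at each level
                new.forward[i] = update[i].forward[i]
                update[i].forward[i] = new

    sl = SkipList()
    for k in (5, 1, 9, 3):
        sl.insert(k, str(k))
    print(sl.search(3))   # '3'

Note that there is no rebalancing step anywhere; the coin flips in _random_level do all the balancing work, probabilistically.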
... if a man held a gun to my head and forced me to implement a logarithmic time algorithm I could implement skip lists in an hour or so ...
Don't you mean asymptotic?
First, sure: CPython is only about two orders of magnitude slower than C, so on big enough data sets the asymptotic performance rules. N log N is 100 times better than N^2 as soon as N is over about 1000. (The obvious-to-me way to implement an ordered map in Python would be to maintain a dict plus a sorted list of keys; inserting N items one at a time into a sorted list is O(N^2).)
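To make that parenthetical concrete, here is a sketch of that obvious-to-me ordered map (the names are mine): a dict for the values plus a sorted key list maintained with bisect.insort. Each insort is an O(log N) search plus an O(N) shift, hence O(N^2) to insert N items:

    import bisect

    class NaiveOrderedMap:
        def __init__(self):
            self._values = {}
            self._keys = []      # kept sorted at all times

        def __setitem__(self, key, value):
            if key not in self._values:
                bisect.insort(self._keys, key)   # O(log N) search, O(N) shift
            self._values[key] = value

        def __getitem__(self, key):
            return self._values[key]

        def items_in_order(self):
            return [(k, self._values[k]) for k in self._keys]

    m = NaiveOrderedMap()
    for k in (3, 1, 2):
        m[k] = k * k
    print(m.items_in_order())   # [(1, 1), (2, 4), (3, 9)]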
Second, this kind of data structure might be fast in PyPy. Like, C-like fast, or worse than C by only a constant factor of 2 or 5. You could certainly imagine JIT-optimizing dynamic-language implementations where that was the case. But I haven't tested.
Third, usually not.