Presumably the developer of the topological sort was operating on small graphs where time complexity was not an issue. It's better to have an inefficient but correct algorithm at hand than no implementation at all. So I don't think the author of the original topological sort made a mistake.
The best method for developing high-performance algorithms is to write the least error-prone implementation first, with no eye on efficiency. Then build a separate high-performance implementation. You test that the second one works by feeding random data into both and checking that the fast one computes the same thing as the simple-to-understand implementation. That's how I built a persistent red-black tree over the space of two weeks.
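For what it's worth, here's a minimal Python sketch of that kind of reference-model testing. The SortedListSet class and the test harness are made up for illustration, with Python's built-in set standing in for the hand-tuned implementation:

    import random

    class SortedListSet:
        """Hypothetical 'simple first' implementation: a sorted Python list
        used as a set. Correctness is easy to reason about; speed is not the goal."""
        def __init__(self):
            self._items = []

        def add(self, x):
            # A linear scan plus re-sort keeps the logic obvious.
            if x not in self._items:
                self._items.append(x)
                self._items.sort()

        def __contains__(self, x):
            return x in self._items

        def items(self):
            return list(self._items)

    def differential_test(trials=200, ops_per_trial=200):
        """Feed the same random operations to the simple model and to the
        'fast' implementation (here: Python's built-in set) and compare."""
        for _ in range(trials):
            simple, fast = SortedListSet(), set()
            for _ in range(ops_per_trial):
                x = random.randint(0, 50)
                if random.random() < 0.7:
                    simple.add(x)
                    fast.add(x)
                else:
                    assert (x in simple) == (x in fast)
            assert sorted(fast) == simple.items()

    if __name__ == "__main__":
        differential_test()
        print("fast implementation agrees with the simple model")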
X having a higher growth rate than Y does not mean that Y is unconditionally faster. It means that there exists some problem size n such that Y is faster than X for all problems larger than n. Below that value, all those terms that big-O notation ignores are still significant performance factors.
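A quick, hypothetical Python illustration of that crossover: a textbook O(n^2) insertion sort against a textbook O(n log n) merge sort. On a typical machine the quadratic one wins for tiny inputs and loses badly once n grows:

    import random
    import timeit

    def insertion_sort(a):
        """O(n^2) comparisons/moves, but very little bookkeeping per element."""
        a = list(a)
        for i in range(1, len(a)):
            x, j = a[i], i - 1
            while j >= 0 and a[j] > x:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = x
        return a

    def merge_sort(a):
        """O(n log n), but with more per-element overhead (slicing, merging)."""
        if len(a) <= 1:
            return list(a)
        mid = len(a) // 2
        left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    for n, reps in ((8, 20_000), (2_000, 20)):
        data = [random.random() for _ in range(n)]
        t_ins = timeit.timeit(lambda: insertion_sort(data), number=reps)
        t_mrg = timeit.timeit(lambda: merge_sort(data), number=reps)
        print(f"n={n:>5}  insertion_sort={t_ins:.3f}s  merge_sort={t_mrg:.3f}s")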
Which was, at least in the C++ community I was active in, pretty famous
It actually helped a lot of beginners choose the right C++ Standard Library containers for their needs.
Are there any similar posts to the original that provide simple explanations & benches of the effects of implementing different data structures/algos?
a) implement other data structures such as doubly linked lists, heaps, red black trees, etc.
b) combine the strengths of two different data structures in a new object
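As a sketch of (b): the classic LRU cache combines a hash map (O(1) lookup by key) with a doubly linked list (O(1) recency updates and eviction). The class below is illustrative only; in practice Python already ships this via functools.lru_cache and collections.OrderedDict.

    class LRUCache:
        """Combines two structures: a dict gives O(1) lookup by key,
        a doubly linked list gives O(1) reordering/eviction by recency."""

        class _Node:
            __slots__ = ("key", "value", "prev", "next")
            def __init__(self, key, value):
                self.key, self.value = key, value
                self.prev = self.next = None

        def __init__(self, capacity):
            self.capacity = capacity
            self._map = {}                        # key -> node
            self._head = self._Node(None, None)   # sentinel: most recent side
            self._tail = self._Node(None, None)   # sentinel: least recent side
            self._head.next, self._tail.prev = self._tail, self._head

        def _unlink(self, node):
            node.prev.next, node.next.prev = node.next, node.prev

        def _push_front(self, node):
            node.next, node.prev = self._head.next, self._head
            self._head.next.prev = node
            self._head.next = node

        def get(self, key):
            node = self._map.get(key)
            if node is None:
                return None
            self._unlink(node)
            self._push_front(node)                # mark as most recently used
            return node.value

        def put(self, key, value):
            if key in self._map:
                node = self._map[key]
                node.value = value
                self._unlink(node)
            else:
                if len(self._map) >= self.capacity:
                    oldest = self._tail.prev      # evict least recently used
                    self._unlink(oldest)
                    del self._map[oldest.key]
                node = self._Node(key, value)
                self._map[key] = node
            self._push_front(node)

    cache = LRUCache(2)
    cache.put("a", 1); cache.put("b", 2)
    cache.get("a")                                # "a" is now most recent
    cache.put("c", 3)                             # evicts "b"
    assert cache.get("b") is None and cache.get("a") == 1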
I think it was Bill Gates who emphasized the importance of picking good data structures up front.
A book I recommend for learning how to implement data structures (and concomitant algorithms) is: Introduction to Algorithms (http://mitpress.mit.edu/books/introduction-algorithms)
It is also useful to have some heuristic for picking data structures, and I have found the recipe proposed by Gayle Laakmann in her book, Cracking the Coding Interview, useful in most non-esoteric cases.
"I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships."
For example "An introduction to data representation synthesis" (http://theory.stanford.edu/~aiken/publications/papers/cacm13...) presents a technique to do the following:
1) specify what the data structure should do using relational algebra (like for databases)
2) automatically generate a data structure that is optimized for your specific hardware and workload.
A big advantage is that you can optimize for different hardware and workloads without changing your code.
Also, the relational specification is much easier to get right than an optimized implementation.
It has also been extended to concurrent data structures:
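The paper generates optimized C++ from the relational spec, so the following is only a toy Python illustration of the decoupling idea (all names are made up): client code asks relational-style questions, and the backing representation can be swapped for a different workload without touching that code.

    from collections import defaultdict

    class DeptIndexUnoptimized:
        """Backs the relation with a plain list of (employee, dept) tuples."""
        def __init__(self, rows):
            self._rows = list(rows)
        def employees_in(self, dept):
            return [e for e, d in self._rows if d == dept]   # O(n) scan

    class DeptIndexHashed:
        """Same relation, backed by a hash index keyed on dept."""
        def __init__(self, rows):
            self._by_dept = defaultdict(list)
            for e, d in rows:
                self._by_dept[d].append(e)
        def employees_in(self, dept):
            return list(self._by_dept[dept])                  # O(1) bucket lookup

    rows = [("ada", "eng"), ("grace", "eng"), ("alan", "research")]
    # The calling code is identical for both representations.
    for backend in (DeptIndexUnoptimized, DeptIndexHashed):
        index = backend(rows)
        assert sorted(index.employees_in("eng")) == ["ada", "grace"]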
> Using the right data structure is usually bad
He argues this because, really, most of the time it won't matter.
I absolutely love his section on how silly it is to call some programmers lazy when all visible evidence shows otherwise.
Maybe an array of key-value pairs?
It's all the more funny given that the structured programming guys talked a lot about this some thirty years ago (at least), and now most of it is forgotten.
Going from the first form of the code to the second was the essence of the structured programming movement. When OO came along it fell out of fashion, but the OO people tend to only teach modelling on problems with really quite obvious decompositions, and as a result one doesn't learn the ability to notice, in a mess of straight-line code, a data structure that could be formalized.
In essence it was exactly the same thing you did, but because it was OO, I have managed to reuse it quite a few times since (having something which cycles between three states on click is a breeze now). The thing here is that the problem of "wrapping" a sequence instead of throwing some kind of out-of-bounds-the-sky-is-falling error is pretty common, and it deserves a generic solution. You - and structured programming, for that matter - trapped a solution to this problem inside a solution for a carousel. That's exactly one of the more important things OOP came to fix.
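Something like this hypothetical Python sketch of the generic "wrapping" abstraction (the class and its names are made up):

    class Ring:
        """Hypothetical generic 'wrapping' sequence: advancing past the end
        cycles back to the start instead of raising an out-of-bounds error."""
        def __init__(self, items):
            self._items = list(items)
            self._pos = 0

        @property
        def current(self):
            return self._items[self._pos]

        def advance(self, steps=1):
            # The modulo is the whole trick: no special-case code at the boundary.
            self._pos = (self._pos + steps) % len(self._items)
            return self.current

    # E.g. something that cycles between three states on click:
    state = Ring(["off", "low", "high"])
    assert state.advance() == "low"
    assert state.advance() == "high"
    assert state.advance() == "off"   # wrapped around instead of failing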
But I guess it's true that "the OO people tend to only teach modelling on problems with really quite obvious decompositions". I think nearly 100% of the examples of OO in tutorials and books are completely useless, and that there should be some major change in the way of teaching and thinking about OO. For me, the fact that I managed to come up with a useful abstraction back then was pure luck. Later, when I saw Smalltalk - and worked in it for some time - I finally understood what OO is about and how beautifully it can simplify and generalize problems that would be one-offs with procedural/structured programming.
I understand what both of you are trying to say. Thanks for the answers. Yes, I'm applying that kind of thinking quite often myself. (I do Haskell at work, where these abstractions are particularly natural. No OOP necessary. )
You might want to look into it.
I've seen this same fundamental blunder in AAA game code bases that will remain nameless...
So it's not just about choosing the right data structure, but about understanding the implementation well enough that you choose the correct implementation of a data structure - which might not be the one in the standard library, unfortunately.
Does anyone know if the membership test in NSSet is as much faster than in NSArray as described in the article?
Lists, being an "asymmetrical" recursive data structure, have O(1) time for adding an element at the head but O(n) time for appending an element at the tail, and a search (for membership) would be O(n) in the worst case. Keeping the list pre-sorted would save some time.
In case one needs a quick (close to O(1)) look-up, there are hash tables. It is even funny to see that this comes as a surprise.
As Brian Harvey explicitly mentioned a few times, constant factors don't matter much, while choosing an appropriate data structure can make a dramatic change, such as going from linear time to near-constant time.
Another topic is the use of non-memoized recursion, as in the classic case of a naive recursive function for computing Fibonacci numbers.
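For example, a minimal Python sketch of that classic case, using functools.lru_cache for the memoized version:

    from functools import lru_cache

    def fib_naive(n):
        """Exponential time: recomputes the same subproblems over and over."""
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        """Same recursion, but each subproblem is computed once: linear time."""
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    assert fib_naive(20) == fib_memo(20) == 6765
    # fib_memo(300) returns instantly; fib_naive(300) would effectively never finish.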
That is another proof that taking a decent introductory CS course (preferably one based on SICP) is a must for any person who, for some reason, has decided to code.
- SICP the book and the original MIT course that is available online do not cover data structures or algorithms very well; that isn't really the focus of the book. They implement a queue in one place, from what I remember, and there is some mention of big-O, but it's all rather cursory. Thomas Cormen published "Algorithms Unlocked" recently, which seems like a good intro book for programmers who are not willing to devote a fraction of their life to studying the big Cormen book or Knuth.
- Constant factors matter more and more. There has been a lot of talk in recent years about how things like big and very fast processor caches make some asymptotically suboptimal algorithms outperform the asymptotically optimal ones for all input sizes we can ever expect to handle. One should of course still understand asymptotic analysis, but you also have to understand caches and locality of reference and do real-world benchmarking. One doesn't substitute for the other.
Actually it takes a long time to understand what they are saying (the big ideas behind the words) and why they made the choices they did (why this or that particular example).
But the great idea is quite simple: "All you need is Lambda and the List structure as a glue (to create layers of DSLs), plus the self-discipline of not violating abstraction barriers". But one has to really understand them. (Well, arrays and hash tables are also necessary, due to some efficient algorithms based on their unique properties.)
Well, a list is an abstract data structure. In Python, it happens to be implemented as an array, so adding an element at the head actually costs O(n) - since all existing elements must be shifted one position - while appending at the tail is O(1) (amortized, since in certain cases it triggers a resize, requiring a full copy of the list).
The language wiki has a page with the time complexity of a few of its data structures: https://wiki.python.org/moin/TimeComplexity
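A small, illustrative timing sketch of those costs (exact numbers will vary by machine); collections.deque is the stock answer when you need cheap insertion at the front:

    import timeit
    from collections import deque

    def grow_list_front(n):
        xs = []
        for i in range(n):
            xs.insert(0, i)      # shifts every existing element: O(n) per insert
        return xs

    def grow_list_back(n):
        xs = []
        for i in range(n):
            xs.append(i)         # amortized O(1) per append
        return xs

    def grow_deque_front(n):
        xs = deque()
        for i in range(n):
            xs.appendleft(i)     # deque gives O(1) at both ends
        return xs

    n = 20_000
    for fn in (grow_list_front, grow_list_back, grow_deque_front):
        print(fn.__name__, timeit.timeit(lambda: fn(n), number=3))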
By the way, you could make appending one element at the back of a Python-style list worst-case O(1), too, if you were willing to waste some space.
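One way to do that (a sketch of the standard deamortization trick, not anything CPython actually does): keep the old backing array around and migrate a couple of its elements on every append, so no single append has to copy everything at once. The class below is illustrative only; note that even allocating the pre-sized buffer is itself O(n) in CPython, so this shows the idea rather than a hard guarantee.

    class DeamortizedArray:
        """Append-only dynamic array whose appends never trigger a bulk copy.

        Wastes space by keeping two backing arrays alive at once: the full old
        one and the half-empty new one that elements are migrated into."""

        def __init__(self):
            self._old = []            # full, already-built backing array (read-only)
            self._new = [None] * 4    # buffer currently being filled
            self._size = 0            # number of logical elements
            self._migrated = 0        # how many of _old's elements are already in _new

        def append(self, x):
            # Place the new element at its final position in the new array.
            self._new[self._size] = x
            self._size += 1
            # Migrate up to two old elements per append; by the time _new is
            # full, everything from _old has been copied over.
            for _ in range(2):
                if self._migrated < len(self._old):
                    self._new[self._migrated] = self._old[self._migrated]
                    self._migrated += 1
            # When _new fills up, it becomes the new "old" array and we start
            # filling a buffer twice its size.
            if self._size == len(self._new):
                self._old = self._new
                self._new = [None] * (2 * len(self._old))
                self._migrated = 0

        def __getitem__(self, i):
            if not 0 <= i < self._size:
                raise IndexError(i)
            # Elements not yet migrated still live in the old array.
            if i >= len(self._old) or i < self._migrated:
                return self._new[i]
            return self._old[i]

        def __len__(self):
            return self._size

    xs = DeamortizedArray()
    for i in range(1000):
        xs.append(i)
    assert [xs[i] for i in range(1000)] == list(range(1000))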