A binary heap also has better locality of reference: you have a better chance of a cache hit, especially near the root. The Fibonacci heap, in contrast, keeps a number of pointers to memory locations that are allocated on the fly and possibly very far apart. If the binary heap is implemented as a B-heap, as described by PHK, it can be made even faster.
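For reference, a toy array-backed binary min-heap (not tuned, just a sketch): the entire structure is one contiguous list, with the children of slot i at 2i+1 and 2i+2, which is what makes it cache-friendly near the root.

```python
# Minimal array-backed binary min-heap: the whole structure lives in one
# contiguous list, so nodes near the root tend to share cache lines.
class BinaryHeap:
    def __init__(self):
        self.a = []

    def push(self, x):
        self.a.append(x)
        i = len(self.a) - 1
        while i > 0 and self.a[(i - 1) // 2] > self.a[i]:   # sift up
            self.a[i], self.a[(i - 1) // 2] = self.a[(i - 1) // 2], self.a[i]
            i = (i - 1) // 2

    def pop(self):
        a = self.a
        a[0], a[-1] = a[-1], a[0]
        smallest = a.pop()
        i = 0
        while True:                # sift down: children of i are 2i+1, 2i+2
            l, r = 2 * i + 1, 2 * i + 2
            m = i
            if l < len(a) and a[l] < a[m]:
                m = l
            if r < len(a) and a[r] < a[m]:
                m = r
            if m == i:
                return smallest
            a[i], a[m] = a[m], a[i]
            i = m
```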
My point is: please benchmark. Always benchmark. "Stupid" algorithms and data structures may work much better for your case than you would have thought. Programmers are notoriously bad at finding the real bottlenecks... so please benchmark and profile. Fibonacci heaps, AVL trees, KMP, Coppersmith-Winograd, these are all little works of art and theoretically great solutions. With big data, they will perform better than more naive solutions. But that big data doesn't happen as often as you might think. Computers are weird, have too many quirks, and their operation will surprise you, no matter how skilled you are. So profile your programs.
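A minimal sketch of what "benchmark and profile" can look like in Python (timeit for micro-benchmarks, cProfile for finding hotspots); the two compared functions are just placeholders:

```python
import timeit, cProfile, random

data = [random.random() for _ in range(100_000)]

def clever(xs):    # stand-in for the "smart" approach
    return sorted(xs)[len(xs) // 2]

def stupid(xs):    # stand-in for the "stupid" approach
    return min(xs, key=lambda x: abs(x - 0.5))

# Micro-benchmark both candidates on the same input.
print("clever:", timeit.timeit(lambda: clever(data), number=10))
print("stupid:", timeit.timeit(lambda: stupid(data), number=10))

# Profile to see where the time actually goes.
cProfile.run("clever(data)")
```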
"The asymptotically best algorithms frequently turn out to be worst on all problems for which they are used." — D. G. CANTOR and H. ZASSENHAUS (1981)
(250 is peanut-sized in cryptography, number theory, or computer algebra, and asymptotically fast matrix multiplication is very useful in practice in those domains.)
Strassen multiplication usually starts to win over classical multiplication for matrices with only a few hundred rows (in special circumstances, it is even superior for 2x2 matrices). The Coppersmith–Winograd algorithm, on the other hand, is probably "strictly of theoretical interest" for the time being...
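A sketch of Strassen with a crossover to classical multiplication below a cutoff (assumes square matrices whose size is a power of two; the CUTOFF value is my own guess and should be tuned by benchmarking):

```python
import numpy as np

CUTOFF = 128  # hypothetical crossover; below this, classical usually wins

def strassen(A, B):
    n = A.shape[0]                 # assumes n x n with n a power of two
    if n <= CUTOFF:
        return A @ B               # classical multiplication as base case
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```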
- expected portfolio mean & variance
- optimizing mean-variance subject to a risk budget (following Markowitz portfolio theory), etc.
would all entail operating on the n×n covariance matrix, which is typically dense. If your universe is the Russell 1000, S&P 500, etc., then n will be significantly larger than 250.
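For concreteness, here is what those dense covariance-matrix operations look like (the weights and return series are made-up placeholders):

```python
import numpy as np

n = 1000                                    # e.g. a Russell 1000-sized universe
returns = np.random.randn(2520, n) * 0.01   # fake daily returns, ~10 years
w = np.full(n, 1.0 / n)                     # hypothetical portfolio weights

mu = returns.mean(axis=0)                   # expected asset returns
sigma = np.cov(returns, rowvar=False)       # dense n x n covariance matrix

port_mean = w @ mu                          # expected portfolio return
port_var = w @ sigma @ w                    # portfolio variance: w' Sigma w
```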
And yes, I was thinking that it was the "sparse" nature of typical large matrices that makes the fast algorithms impractical for them.
See this StackOverflow question for some nice insights: https://stackoverflow.com/questions/504823/has-anyone-actual...
Also, I would point out that as computers get faster, things that were suboptimal before become feasible and begin to outperform the tricks people used earlier. A good example is spellchecking, which used to require a wide array of cleverness but can now be done by throwing tons of computing power at it, with better results.
An explanation of why it's useful in practice is again totally beside the point - and really would be taken as rather insulting by many researchers - the motivations are often largely theoretical.
As a case in point, the Fibonacci heap noted by OP was motivated by reducing the theoretical worst-case asymptotic complexity of Dijkstra's algorithm, and it does exactly that - no benefit for normal programmers was ever intended.
If people do not know what big-O means, that's their own problem. The algorithms people do their research, and their papers are published, in the context of an informed reader - if someone does not know the definition of big-O and makes mistakes based on it, the time they waste is probably their own fault.
In the case of this article, it seems the code originally ran in 200ms. That seems like a perfectly acceptable running time - and one that would be improved more by using an algorithm like A* with decent heuristics than by simply trying to do the same work in less time.
In the average case you run the expensive median-of-medians step on only a small fraction of the data, and so pay only a few percent average penalty. And in the worst case you still get O(n log n) performance.
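A sketch of that idea: ordinary quicksort, but once a recursion-depth budget is exhausted (a pathological input), switch to an exact median-of-medians pivot so the worst case stays O(n log n). The names and the depth budget are my own choices.

```python
import math

def select(xs, k):
    # Deterministic median-of-medians selection: O(n) worst case.
    if len(xs) <= 5:
        return sorted(xs)[k]
    groups = [sorted(xs[i:i + 5]) for i in range(0, len(xs), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    lo = [x for x in xs if x < pivot]
    hi = [x for x in xs if x > pivot]
    n_eq = len(xs) - len(lo) - len(hi)
    if k < len(lo):
        return select(lo, k)
    if k < len(lo) + n_eq:
        return pivot
    return select(hi, k - len(lo) - n_eq)

def quicksort(xs, depth=None):
    # Cheap pivot on the common path; exact-median pivot only once the
    # depth budget runs out, so the fallback touches few elements on average.
    if len(xs) <= 1:
        return list(xs)
    if depth is None:
        depth = 2 * int(math.log2(len(xs))) + 2
    pivot = xs[len(xs) // 2] if depth > 0 else select(xs, len(xs) // 2)
    lo = [x for x in xs if x < pivot]
    eq = [x for x in xs if x == pivot]
    hi = [x for x in xs if x > pivot]
    return quicksort(lo, depth - 1) + eq + quicksort(hi, depth - 1)
```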
Of course, shuffling is still potentially helpful, because ascending/descending order is common in many applications.
What changes is that one very common permutation of data (ascending order) does not have O(n^2) performance.
I'm not sure what you mean by deterministic shuffling. Shuffling is supposed to be random, and therefore non-deterministic. In practice, you'd probably use a pseudorandom function to implement shuffling, but the goal of pseudorandom functions is to be indistinguishable from random functions, so that detail shouldn't matter.
Probably what you meant was that shuffling makes the worst case very unlikely.
On a practical level, if you have to shuffle the data before sorting to make your sort algorithm work better, you've most likely already blown any performance advantage. So you might as well choose a different sort algorithm.
So Quicksort is O(n^2); the worst-case running time is <= c·n^2 for some constant c.
Sorry to bring it up. I debated whether I wanted to be the annoying guy who nitpicks that, but then I thought you might be interested in the difference.
[edit1: was wrong, don't want to mislead]
[edit2: not sure I was wrong anymore, this is confusing]
The best case is something entirely separate, and can itself be bounded from above or below.
- the Θ() and O() notations apply to mathematical functions. For example, 5n^2 + 3n = Θ(n^2) [^1] or 3n^2 = O(n^3).
- algorithms have several complexities: one for the best case, one for the worst case, one for the average case. These correspond to as many distinct mathematical functions.
Each of these functions can be given appropriate Θ() or O() bounds.
[^1]: really meaning λn.5n^2 + 3n ∈ Θ(λn.n^2); it's a relation between functions.
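To pin the definitions down, this is just the standard CLRS-style formulation:

```latex
\Theta(g) = \{\, f \mid \exists\, c_1, c_2 > 0,\ n_0 :
              \forall n \ge n_0,\ 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \,\}
O(g)      = \{\, f \mid \exists\, c > 0,\ n_0 :
              \forall n \ge n_0,\ 0 \le f(n) \le c\, g(n) \,\}
\Omega(g) = \{\, f \mid \exists\, c > 0,\ n_0 :
              \forall n \ge n_0,\ 0 \le c\, g(n) \le f(n) \,\}
% Hence f \in \Theta(g) exactly when f \in O(g) and f \in \Omega(g).
```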
And if a stackoverflow link isn't good enough, then go read CLRS, Dasgupta, or Skiena.
Quicksort has no Theta(x); it is O(n^2) and Omega(n log n), with an expected runtime of Theta(n log n). The reason Quicksort performs so well is that it devolves to its worst case only in very specialized cases.
Edit: Replied to the wrong commenter, so the replied to post isn't wrong, the grandparent post is.
1. Theta(n) means that it's O(n), O(n^2), O(...), but Theta(n) is a proper, tighter version of the upper bound. It doesn't say anything about the lower bound.

2. Theta(n) means that it's tightly bounded above and below by n: the Omega(n) and O(n) bounds coincide, thus it's a tight bound.
That is why I emphasized that the worst case in particular has its bound within a constant factor of n², hence Θ.
Profiling is good, but I don't see how it would have helped her in this case. In order to profile the Fibonacci heap, you'd have to implement it first. She mentioned that it took twice as long as the original algorithm, and timing is a simple form of profiling, so she basically did profile (after implementing it).
What would have helped is intuition about algorithms and the ability to estimate how fast they'll be without having to implement and time them. This was a good exercise for developing that intuition.
It's useful to keep in mind that O(log n) is effectively O(1) in the real world, because on actual computers, integers are bounded by 2^64, which means that any logarithmic factor will be <= 64 (or some constant). When O(n) and O(n log n) algorithms differ in practical performance, the asymptotic difference is not usually the best explanation, IMO. Linear algorithms tend to be simple and make one pass over the data, which is going to make them fast. O(n log n) algorithms tend to be more complicated, make multiple passes over the data, and/or involve doing things that are slow in hardware, which is going to make them slower. However, that is just a general trend, and sometimes things are reversed.
Dijkstra's algorithm factors into two components: (1) time spent selecting best paths and (2) time spent updating the best currently known paths.

In a naive implementation, a best path to every unvisited vertex is kept. Selection of a best path to a vertex v_0 is made by a linear scan of the paths (O(|V|) each time). Locking in a path causes us to possibly update the path to every vertex v_1 that is touched by an edge from v_0 (constant time per out-edge of v_0). Overall, the time complexity is O(|V|^2) for selecting paths and O(|E|) for updating paths. Since |E| is bounded by |V|(|V| - 1), the total time is O(|V|^2) (insensitive to density).

Using a min-heap to store the best known paths, we spend O(log |V|) time selecting each path, but an update also takes O(log |V|). This means the total time complexity is O(|V| log |V| + |E| log |V|). In the case of dense graphs, this is worse: O(|V|^2 log |V|). We're trying to keep the heap organized to allow fast extraction, but there are too many updates and the heap is changing too much from iteration to iteration. OTOH, if the graph is sparse, |E| is in O(|V|), so we reduce the time complexity to O(|V| log |V|).

A Fibonacci heap keeps the same extraction time, but updates are O(1) amortized. Thus the time complexity is O(|V| log |V| + |E|). If the graph is dense, this is O(|V|^2) (as good as naive). If the graph is sparse, this is O(|V| log |V|) (as good as a binary heap).

This is useful if we do not know whether the graph is sparse or dense. However, I am not sure what the constants are on Fibonacci heaps. If you know the density of the graph, I certainly expect using the appropriate one of the first two approaches is superior. Another thought: you could always speculatively execute both algorithms and see which finishes first :P

Anyway, I hope that's not too boring of me.
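To make the binary-heap version concrete, here is a sketch of Dijkstra with Python's heapq, using lazy deletion in place of decrease-key (the adjacency-dict format is an assumption):

```python
import heapq

def dijkstra(graph, source):
    # graph: {u: [(v, weight), ...]} -- assumed adjacency-list format
    dist = {source: 0}
    heap = [(0, source)]                  # (best known distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)        # selection: O(log |V|)
        if d > dist.get(u, float("inf")):
            continue                      # stale entry: lazily deleted
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd              # "update" = push a fresh entry
                heapq.heappush(heap, (nd, v))
    return dist
```

Pushing duplicates instead of decreasing keys leaves stale entries around, so the heap can grow to O(|E|) entries, but the overall bound stays O((|V| + |E|) log |V|).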
One of the early lessons I learned is that designing functional versions of algorithms is a whole level harder than implementing conventional versions. For me, getting to a functional or less imperative version is an iteration on an imperative preliminary design. There's a stage at which it is easier not to get bogged down with data structures as values, and instead to let them be variables, because it allows me to simplify the transition to the next state as (next-state). I don't have to add additional parameters to every function and coordinate them between functions.
I think it's easy to lose sight of the fact that functional programming is supposed to make our lives easier by making it easier to reason about our code. Jumping from imperative pseudo-code to an idiomatic functional implementation is not easier for me to reason about than going from the pseudo-code to its direct implementation. And it's easier to reason my way from a direct implementation to a more functional one after I've got a feel for the direct implementation.
Probably it's an intellectual shortcoming on my part that functional implementations don't come readily to me when looking at imperative pseudo-code. I just have to live with the fact that even money says I am below average and will probably have to be naughty and use mutation sometimes.
But Roughgarden is top drawer, too.
(Side note: the union-find algorithm is another interesting algorithm with the same "problem".)
I will add as a disclaimer that I'm somewhat new to the world of immutable data, and am still experimenting with this graph structure -- seeing how it scales with complicated data and relations.
How many nodes are there? Let's presume there are a lot, otherwise the problem is trivial and speed doesn't matter anyway. So now you have an N-long immutable array that you are copying every time you want to change a node pointer? So you have changed the operation of pointer-changing from O(1) to O(N)? What does that do to the run time of the algorithm?
Also, your garbage velocity just went WAY up. What does this do for the runtime of your program generally?
2. Excessive generation of garbage is a problem for functional languages in general; that is nothing specific to immutable arrays. Good compilers analyze object lifetimes and perform updates in-place when possible (e.g. Mercury does this). Even if your compiler doesn't do this, the garbage generated is constant or logarithmic with respect to the size of the array for most common immutable-array implementations.
So yes, I'm sure.
If you, the user, have "little insight" into the runtime of the data structures you're using, you have bigger issues. Every standard library I've ever seen guarantees the asymptotic behavior of hash tables, sets, linked lists, etc. Immutable arrays are no different.
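For the record, a toy sketch of the kind of immutable-array implementation those guarantees refer to: a balanced tree with path copying, so get/set are O(log n) and a set allocates only one root-to-leaf path instead of copying all N slots (the names are mine, not any particular library's):

```python
class PArray:
    # Persistent array as a balanced binary tree with path copying:
    # `set` returns a new array sharing every untouched subtree.
    def __init__(self, xs, _root=None, _n=None):
        self.n = len(xs) if _n is None else _n
        self.root = self._build(list(xs), 0, self.n) if _root is None else _root

    def _build(self, xs, lo, hi):
        if hi - lo == 1:
            return xs[lo]
        mid = (lo + hi) // 2
        return (self._build(xs, lo, mid), self._build(xs, mid, hi))

    def get(self, i):                     # O(log n)
        node, lo, hi = self.root, 0, self.n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            node, lo, hi = (node[0], lo, mid) if i < mid else (node[1], mid, hi)
        return node

    def set(self, i, x):                  # O(log n); old version stays valid
        def go(node, lo, hi):
            if hi - lo == 1:
                return x
            mid = (lo + hi) // 2
            if i < mid:
                return (go(node[0], lo, mid), node[1])
            return (node[0], go(node[1], mid, hi))
        return PArray([], _root=go(self.root, 0, self.n), _n=self.n)

a = PArray([10, 20, 30, 40])
b = a.set(2, 99)
assert a.get(2) == 30 and b.get(2) == 99   # both versions remain usable
```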
You should note that the log n factor is so small (up to ~40 on the biggest data you will ever see) that it can easily be dominated by other constant factors in the implementation (like cache locality, ...).
Also, both algorithms require identical data structures. After all, Dijkstra's is just A* with a zero heuristic.
I do agree about the constant factor, though: it's likely that a binary heap would be faster on most data sets.
... And it is not easy to create their equally efficient general-purpose immutable counterparts. For purely functional languages, the worst-case slowdown is logarithmic in the number of memory cells used, because mutable memory can be represented by a purely functional data structure with logarithmic access time (such as a balanced tree).
In this case I think you could use (a pair of?) balanced trees to keep a mapping between the graph and the Fibonacci heap, as well as for any "mutable" state you need to keep per node.
I would have the pointers in the Dijkstra graph be pointers to a mutable data structure (we do this to make sure there is only one place we need to update per node in the Fibonacci heap, and to localize our mutability and keep it out of the rest of the code).

I would then update the update code in the Fibonacci heap to take a general function, which it runs on the nodes that are updated and which gets passed pointers to the original and updated nodes. In the Dijkstra case you simply write a function that takes these pointers, has access to the mutable data structure, and modifies it accordingly.

Thus you have kept your generality, confined your mutability, and solved your problem at the expense of one pointer indirection and an added callback function (and the associated documentation).
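A sketch of that wiring in Python (every name here is hypothetical, since it depends on your heap's API): the graph holds one small mutable cell per node, and the callback handed to the heap is the only code that writes to it.

```python
class Cell:
    # The single mutable cell per graph node; everything else stays immutable.
    def __init__(self, heap_node):
        self.heap_node = heap_node

def make_update_callback(cells):
    # Passed to the (hypothetical) heap's update routine: when the heap
    # replaces a node, repoint the owning cell at the new node.
    def on_update(old_node, new_node):
        cells[old_node.value].heap_node = new_node
    return on_update

# Hypothetical usage against some Fibonacci-heap API:
#   cells = {v: Cell(heap.insert(dist[v], value=v)) for v in graph}
#   heap.decrease_key(cells[v].heap_node, new_dist,
#                     on_update=make_update_callback(cells))
```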
Modern architectures don't even provide random access for more than a handful of kilobytes. Anything that doesn't fit in the L1 cache incurs access latency roughly proportional to the logarithm of its size.
This must have implications for implementing a Merkle tree as an immutable data structure. Anyone know how a Bitcoin client in Clojure might deal with this?
Even Haskell, where the State monad itself is actually immutable, provides an escape hatch into mutable state for when it's really, truly required. But if you use it when it's not, great shame shall be visited on you and your kin. By which I mean someone in #haskell will very politely point out how you could have done it purely.
Undo can be achieved just as easily by reversing operations, in many cases. No need to keep two copies of a potentially large structure when you could just keep the diff and reverse-apply it. Oddly, I think typically you would do both: keep a few large snapshots with small diffs between stages. That is digressing, though.
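A tiny sketch of undo-by-inverse-operations (the helper names are made up), as opposed to snapshotting whole copies:

```python
undo_stack = []

def set_item(d, key, value):
    # Record the inverse operation before mutating.
    if key in d:
        old = d[key]
        undo_stack.append(lambda: d.__setitem__(key, old))
    else:
        undo_stack.append(lambda: d.pop(key))
    d[key] = value

def undo():
    undo_stack.pop()()          # apply the most recent inverse

state = {"x": 1}
set_item(state, "x", 2)
set_item(state, "y", 3)
undo()   # state == {"x": 2}
undo()   # state == {"x": 1}
```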
As for memory-efficiency... not sure how keeping many potentially large copies is more efficient than just modifying a single copy.
Now, I can agree that in many cases it is nice that it prevents you from worrying about race conditions across threads. Though, I'm also not convinced that immutable things are any better than traditional locking strategies. Is that race truly won?
Which are cool, to be sure. They are far from new, however. So, I'll be a little surprised if it turns out they are a panacea.
List is L(a) = 1 + a L(a) (≡ Nil or Cons(a, L(a)))
L(a) = 1/(1-a)
L'(a) = 1/(1-a)^2 = L(a)L(a) (≡ Pair of L(a) and L(a))
This is our zipper: one list holding the tail, and one list remembering how we got to where we are.
This one is trivial, but for e.g. trees it will be utterly fascinating.
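A minimal Python rendering of that list zipper, mirroring the L'(a) = L(a)L(a) reading: the focus splits the list into a reversed prefix and a suffix, and moving the focus is O(1).

```python
# A list zipper: (reversed prefix, suffix). The head of `suffix` is the focus.
def from_list(xs):
    return ([], list(xs))

def right(z):
    before, after = z
    return ([after[0]] + before, after[1:])   # push focus onto the "path back"

def left(z):
    before, after = z
    return (before[1:], [before[0]] + after)  # pop the path back

def modify(z, f):
    before, after = z
    return (before, [f(after[0])] + after[1:])

z = from_list([1, 2, 3, 4])
z = right(right(z))                 # focus on 3
z = modify(z, lambda x: x * 10)
before, after = z
print(list(reversed(before)) + after)   # [1, 2, 30, 4]
```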
So many ways to prioritize. So many ways to support the same basic operations, with different tradeoffs.
My semi-outdated site: http://leekillough.com/heaps/
I've recently put an alternative one in C++11 here: https://github.com/beniz/fiboheap