Hacker News new | past | comments | ask | show | jobs | submit login
Dijkstra’s algorithm and the Fibonacci heap (maryrosecook.com)
244 points by tosh on July 6, 2014 | hide | past | web | favorite | 95 comments

Please do note that, depending on the dataset size, it may be faster to just use a plain old binary heap. In this case, the big O cost for Dijkstra's algorithm is O((E + V) log V), as opposed to the better O(E + V log V) for a Fibonacci heap. Note, also, that Fibonacci heap operations are amortized time, so if you're not operating on a large enough dataset, the costs will not amortize enough and you may end up with a slower real time.

A binary heap also has better locality of reference, because you have more chances of a cache hit, especially when you're near the root. The Fibonacci heap, in contrast, keeps a number of pointers to memory locations that are all allocated on-the-go and possibly very far apart. If the binary heap is implemented as a B-heap, as described by PHK [1], it can be made even faster.

My point is: please benchmark. Always benchmark. "Stupid" algorithms and data structures may work much better for your case than you would have thought. Programmers are notoriously bad at finding the real bottlenecks... so please benchmark and profile. Fibonacci heaps, AVL trees, KMP, Coppersmith-Winograd, these are all little works of art and theoretically great solutions. With big data, they will perform better than more naive solutions. But that big data doesn't happen as often as you might think. Computers are weird, have too many quirks, and their operation will surprise you, no matter how skilled you are. So profile your programs.

[1]: http://queue.acm.org/detail.cfm?id=1814327

Among my favorite leading quotes from Knuth's works is this:

"The asymptotically best algorithms frequently turn out to be worst on all problems for which they are used." — D. G. CANTOR and H. ZASSENHAUS (1981)

Fun story: Knuth claims in the 1997 edition of TAOCP vol. 2 that dense matrices of size larger than 250 "rarely occur in practice" and that asymptotically fast matrix multiplication is "strictly of theoretical interest". In 1998, Erich Kaltofen posed "Open Problem 7: Convince Donald Knuth that these asymptotically fast methods are of practical value. If he pays you $2.56 for this technical error, you have solved this problem." As far as I know, this is still an open problem.

(250 is peanut-sized in cryptography, number theory or computer algebra, and asymptotically fast matrix multiplication is very much useful in practice in those domains.)

Are those matrices dense? It's an important qualifier. My corner of the computational world (finite-element related code) very frequently gets to millions of rows and columns but with only O(N) nonzero entries (where N is #rows which is approximately #cols too).

Matrices can be everything between dense and extremely sparse, depending on the application (sometimes you have a structured matrix and can use sparse methods to reduce it to a dense system of much smaller size). But yes, dense matrices with thousands of rows do occur. The applications where Strassen's asymptotically fast multiplication algorithm really helps are certainly rare compared to overall uses of linear algebra, but they do "strictly" exist.

Strassen multiplication usually starts to win over classical multiplication already for matrices with a few hundred rows (in special circumstances, it is even superior for 2x2 matrices). The Coppersmith–Winograd algorithm, on the other hand, is probably "strictly of theoretical interest" for the time being...

Spectral methods [1] are FE-like methods with non-zero basis functions over the whole domain. Indeed they produce full matrices. [1] http://en.wikipedia.org/wiki/Spectral_method

For a simple application in finance, consider a portfolio with n stocks. Then finding various statistics around

- expected portfolio mean & variance

- optimizing mean-variance subject to a risk budget (following Markowitz portfolio theory), etc.

would all entail operating on the n×n covariance matrix, which is typically dense. If your universe is the Russell 1000, S&P 500, etc, then n will be significantly larger than 250.

I actually just recently read that section. Was curious how accurate it was. Was also curious if I could take a stab at implementing one of the algorithms he touches on.

And yes, I was thinking that it was the "sparse" nature of typical large matrices that makes them impractical.

In finance, I've dealt with larger (but not much larger) dense matrices.

I like "Fancy algorithms are slow when n is small, and n is usually small." -Rob Pike

Thanks, this is a pet peeve of mine. Theoretical bounds are all nice and such, and it's cool if you can prove a better running time using some algorithm, but if it doesn't actually perform better in practice most of the time then it's not really worth much. I blame computer scientists who are in a rush to publish the best theoretical solution but never bother to consider whether it has practical impacts. Also textbooks who blithely claim you can do better with such and such a data structure using a certain algorithm.

See this StackOverflow question for some nice insights: https://stackoverflow.com/questions/504823/has-anyone-actual...

Why should everything have an immediate practical impact? "Theoretical" research often becomes useful in future as new developments happen. E.g. as computers get faster it becomes more and more feasible to tackle huge problems, i.e. so large that the asymptotically faster algorithms are faster in practice (that is, the problem size overcomes the large constant factor).

Please don't misunderstand my post as an attack on research that has a more theoretical impact. What I'm saying is that if you claim to have found a faster version of some algorithm, then it should actually be true (preferably with working code). I have no problem with people showing a theoretically better version as long as they explain whether or not it's useful in practice.

Also I would point out that as computers get faster things that were suboptimal before become feasible and begin to outperform the tricks people used before. A good example would be spellchecking, which had to be done using a wide array of cleverness but now we can just throw tons of computing power at it and get better results.

I believe the mistake to be yours. Algorithms is a largely theoretical field - when an algorithms researcher comes up with an algorithm with a lower asymptotic bound, that's all he or she is claiming - they're making no statements about its practical utility or the performance of an implementation. An implementation will almost certainly be provided, but that's besides the point.

An explanation of why it's useful in practice is again totally besides the point - and really would be taken to be rather insulting by many researchers - the motivations are often largely theoretical.

As a case in point, the fibonacci queue noted by OP was motivated by reducing the theoretical worst-case asymptotic complexity of Dijkstra's algorithm, and does that very - no benefit intended at all for normal programmers.

If people do not know what big-O means, that's their own problem. The algorithms people do their research, and their papers are published in the context of an informed reader - if someone does not know the definition of big-O and make mistakes based on it, the time wasted by them is probably their own fault.

In the case of this article, it seems as though the code originally ran in 200ms. This seems like a very acceptable amount - and one that would be improved more by using an algorithm like A* with decent heuristics than by simply trying to do the same work in less time.

It is worth pointing out here that Quicksort has complexity of Ɵ(n²) in the worst case. Yet it has worked out as the best one in practice in many applications.

That's true, but it's also worth pointing out that the worst case can be trivially prevented by shuffling the array before you Quicksort it. Which is maybe a confirmation of your larger point.

The principled way to avoid bad worst-case performance for quicksort is to select the pivot using a median of medians approach. That gives you good asymptotic performance even for adversarial inputs, but in practice it tends to be slower. Which fits well with the theme of this article.

You can trivially achieve bounded worst case performance with minimal overhead by doing median of medians only, say, one time in 5.

In the average case you do the expensive median of medians on a small fraction of the data and so only pay a few percent average penalty. And in the worst case you still get n log(n) performance.

How is the worst case not n^2 on a case you don't use medians of medians?

Every fifth pivot is chosen to be an actual median.

The median-of-medians approach is also somewhat tricky to implement (avoiding array copies and so on), although it admittedly looks neat on the slides of the algorithms lectures.

Strictly speaking, it is not true that shuffling first avoids the worst case. Given a deterministic shuffling function, some permutation of input needs to shuffle to sorted order, which will again trigger the worst case.

Of course, shuffling is still potentially helpful because ascending/descending order is often common in many applications.

Randomization of the pivots produces O(n log n) average running time with a very low probability of any other running time for any value of n where the worst case running time matters.

Even without randomization of pivots expected run time is O(n log n). If the order of the input data is random, I don't believe randomizing pivots changes the distribution of running times.

What changes is that one very common permutation of data (ascending data) does not have O(n * 2) performance.

There's little justification for expecting data to behave randomly unless our algorithm introduces it.

Correct. In general, real data will seldom, if ever, look like random data.

Shuffling with a too-small seed size will render the shuffle susceptible to brute force attacks; but in no way invalidates its use as a dodge for pathological (pre-sorted) arrays submitted to Quicksort in the course of normal usage. There's a better chance of getting eaten by a dinosaur than finding O(N^2) QS performance on shuffled data that comes from anything other than an attacking adversary.

You're right that shuffling doesn't avoid the worst case, but your explanation isn't quite correct. See my other comment for my explanation.

I'm not sure what you mean by deterministic shuffling. Shuffling is supposed to be random, and therefore non-deterministic. In practice, you'd probably use a pseudorandom function to implement shuffling, but the goal of pseudorandom functions is to be indistinguishable from random functions, so that detail shouldn't matter.

Shuffling can never prevent the worst case of any algorithm. "Worst case" means the worst out of all possible cases. Shuffling doesn't change the set of cases, and it doesn't change anything about any individual case.

Probably what you meant was that shuffling makes the worst case very unlikely.

On a practical level, if you have to shuffle the data before sorting to make your sort algorithm work better, you've most likely already blown any performance advantage. So you might as well choose a different sort algorithm.

I believe Big Theta means bounded above and below by `n^2`, where the bounds differ by a constant factor. Saying that Quicksort is Ɵ(n^2) means that is has worst case `n^2` AND best case `n^2`. But the best case of Quicksort is `n`.

So Quicksort is O(n^2); the worst case is <= cn^2, for a constant c.

Sorry to bring it up. I was wondering if I wanted to be the annoying guy who nitpicks on that; but then I thought you might be interested in the difference.

   [edit1: was wrong, don't want to mislead]
   [edit2: not sure I was wrong anymore, this is confusing]

No, it means the worst case is bounded above and below by n^2. That is, the worst-case scales no worse or better than n^2, but approximately proportional, asymptotically. (Whereas, for example, every O(n^2) algorithm also happens to be O(2^n))

The best case is something entirely separate, and can itself be bounded from above or below.

It doesn't make sense to say there's an upper and lower bound on the "worst" case. If that were true, the upper bound would be the new worst case, and the lower bound would be some middle case. A best or worst case is inherently always going to be big-theta.

This talks about our ability to reach or describe the worst case, so if we say it's Ɵ(n^2), we're saying that we can tell that the worst case is at least quadratic, and it's also no worse than quadratic. We might not be so good at the math and have only discovered that the worst case was at least linear and no worse than exponential (which is true of quicksort).

To help you wrt your edit 2, you seem to be confusing two things:

- the Ɵ() and O() notations can be used on mathematical function. For example 5n^2 + 3n = Ɵ(n^2) [^1] or 3n^2 = O(n^3).

- algorithms have different complexities: for best case, for worst case, or average case. It corresponds to as many mathematical functions.

Each of these functions can be given appropriate Ɵ() or O() bounds.

[^1]: really meaning λn.5n^2 + 3n ∈ Ɵ(λn.n^2), it's a relation between functions.

Except you are wrong. O(x) is an upper bound, Omega(x) is a lower bound, and something is Theta(x) iff it is bounded by Omega(x) and O(x) where (x) is the same[1].

And if a stackoverflow link isn't good enough, than go read CLRS, Dasgupta, or Skiena.

Quicksort has no Theta(x), it is O(n^2) and Omega(n log n) with an expected runtime of n log n. The reason that Quicksort performs so well is because it is only in very specialized cases where it devolves to its worst case.

[1] http://stackoverflow.com/questions/10376740/big-theta-notati...

Edit: Replied to the wrong commenter, so the replied to post isn't wrong, the grandparent post is.

After reviewing, I've understood two meanings people give to Theta(n):

    1. Theta(n) means that it's O(n), O(n^2), O(...), but 
       Theta(n) is a proper, tighter definition of the upper
       bound.  It doesn't say anything about the lower bound.

    2. Theta(n) means that it's tightly bounded up and down 
       by n. Both Omega(n) and some O(n) are equal, thus its 
I think 2 is the proper definition (thus my 2nd edit), but I'm not totally sure. I've let my original comment there with the edit chain in hope discussions would bring up the proper definition. Unfortunately I'm getting downvoted for that, which I find odd.

I was confused at first :)

Saying it is bounded by O(n²) is too vague, because it's the same as being bounded by O(n³) and so on. A linear algorithm is also O(n²) and O(n³).

That is why I emphasized that the worst case in particular has its bound within a constant factor of n², hence Ɵ.

I think the general point is that problems just aren't that interesting for small n.

Well the even-more-general point is that these problems usually aren't that interesting in real life, then.

Another good reason to try a binary heap: it would have only taken a few hours to implement. Then you can do some benchmarking, count your gains, and perhaps quit while you're still ahead.

Profiling is good, but I don't see how it would have helped her in this case. In order to profile the Fibonacci heap, you'd have to implement it first. She mentioned that it took twice as long as the original algorithm, and timing is a simple form of profiling, so she basically did profile (after implementing it).

What would have helped is intuition about algorithms and the ability to estimate how fast they'll be without having to implement and time them. This was a good exercise for developing that intuition.

It's useful to keep in mind that O(log n) is effectively O(1) in the real world, because on actual computers, integers are bounded by 2^64, which means that any logarithmic factor will be <= 64 (or some constant). When O(n) and O(n log n) algorithms differ in practical performance, the asymptotic difference is not usually the best explanation, IMO. Linear algorithms tend to be simple and make one pass over the data, which is going to make them fast. O(n log n) algorithms tend to be more complicated, make multiple passes over the data, and/or involve doing things that are slow in hardware, which is going to make them slower. However, that is just a general trend, and sometimes things are reversed.

I think as a practical matter, sure. But it's also really fun to explore "fancy" algorithms, even when you don't have a practical need for them. I certainly felt the tone of the article was less "I need to sort a shitton of data" and more "I'm checking out this cool thing."

I would never put KMP and Coppersmith-Winograd into same place in terms of practicality. KMP gives you very good and fast results on some data (especially data with repeated sequences). Coppersmith-Winograd (and even Strassen) will never be practical on any reasonable data we will see in near future (http://www.ics.uci.edu/~fastmm/FMM-Reference/paoloA.ishp-vi....)

Cool article.

Dijkstra's algorithm factors into two components: (1) time spent selecting best paths and (2) time spent updating the best currently known paths.

In a naive implementation, a best path to every unvisited vertex is kept. Selection of a best path to a vertex v_0 is made by a linear scan of the paths (O(V) each time). Locking in a path causes us to possibly update paths to every vertex v_1 that is touched by an edge from v_0 (constant time per out edge of v_0).

Overall, time complexity is O(|V||V|) selecting paths, and O(|E|) updating paths. Since |E| is bounded by |V|(|V| - 1), the time is O(|V| |V|) (insensitive to density).

Using a min-heap to store best known paths, we spend O(log(|V|)) time selecting each path, but an update takes O(log(|V|)). This means the total time complexity is O(|V|log(|V|) + |E|log(|V|)).

In the case of dense graphs, this is worse: O(|V| |V| log(|V|)). We're trying to keep the heap organized to allow fast extract, but there are too many updates and the heap is changing too much iteration to iteration. OTOH, if the grpah is sparse, |E| is in O(|V|), so we reduce the time complexity to O(|V|log(|V|)).

A fib heap keeps the same extract time, but updates are O(1) amortized. Thus the time complexity is O(|V| log(|V|) + |E|). If the graph is dense, this is O(|V| |V|) (as good as naive). If the graph is sparse, this is O(|V|log(|V|)) (as good as bin heap).

This is useful if we do not know whether the graph is sparse or dense. However, I am not sure what the constants are on Fib heaps. If you know the density of the graph, I certainly expect using the appropriate of the first two ways is superior. Another thought: you could always speculatively execute both algorithms and see which finishes first :P

Anyway, I hope that's not too boring of me.

Edit: Grr. Asterisks.

I've been taking Roughgarden's Algorithms: Design and Analysis on Coursera and implementing the assignments in Racket. So in a sense, I identified the author's plight with my own recent and current experiences.

One of the early lessons I learned is that designing functional versions of algorithms is a whole higher level of hard over and above implementing conventional versions. For me, getting to a functional or less imperative version is an iteration of an imperative preliminary design. There's a stage at which it is easier not to get bogged down with data structures as values and instead let them be variables because it allows me to simplify the transition to the next state as (next-state). I don't have to add additional parameters to every function and coordinate them between functions.

I think that it easy to lose site of the fact that functional programming is supposed to make our lives easier by making it easier to reason about our code. Jumping from imperative pseudo-code to an idiomatic functional implementation is not easier for me to reason about than from the pseudo-code to It's direct implementation. And it's easier to reason from a direct implementation to a more functional one after I've got a feel for the direct implementation.

Probably it's an intellectual shortcoming on my part that functional implementations don't come readily to me when looking at imperative pseudo-code. I just have to live with the fact that even money says I am below average and will probably have to be naughty and use mutation sometimes.

I highly recommend Tim Roughgarden. Best class I've taken on Coursera by far.

I took Discrete Optimization. and it's pretty good, too. But I was way out of my depth The cool thing is though that it gave me anchors for a lot of the algorithms in Roughgarden- I look at the algorithm and see an application for it from that class

But Roughgarden is top drawer, too.

This isn't a problem; you can get around it by indirecting such "pointers" through an (immutable) array. When you want to reference something, make sure it's in the array and reference its location. When you want to "mutate" something, update the array with that thing's old entry pointing to its new value instead.

(Side note: the union-find algorithm is another interesting algorithm with the same "problem".)

I came across a similar solution when creating graphs with immutable data structures. Just store your nodes in an array, and describe the links between them as pairs of indexes. It does require that extra level of indirection, as hinted at in the "Dual Index Problem" article he links to. It is a bit odd, since you're basically reimplementing mutable pointers, where each operation returns a new set of memory.

I will add as a disclaimer that I'm somewhat new to the world of immutable data, and am still experimenting with this graph structure -- seeing how it scales with complicated data and relations.

"This isn't a problem"? You're so sure?

How many nodes are there? Let's presume there are a lot, otherwise the problem is trivial and speed doesn't matter anyway. So now you have an N-long immutable array that you are copying every time you want to change a node pointer? So you have changed the operation of pointer-changing from O(1) to O(N)? What does that do to the run time of the algorithm?

Also, your garbage velocity just went WAY up. What does this do for the runtime of your program generally?

1. Yes, I'm sure. See my reply below.

2. Excessive generation of garbage is a problem for functional languages in general; that is nothing specific to immutable arrays. Good compilers analyze object lifetimes and perform updates in-place when possible (e.g. Mercury does this). Even if your compiler doesn't do this, the garbage generated is constant or logarithmic with respect to the size of the array for most common immutable-array implementations.

So yes, I'm sure.

(The language implementation can of course engage in strategies to avoid a full array copy, that you as the user have little insight into, but these are often questionable as they slow down run time in other cases, and anyway, they can only mitigate this problem, which is not going to go away.)

Uh, most good implementations of languages with immutable arrays perform O(1) or O(log n) array updates. The two I use most, Mercury and Erlang, both do (O(1) and O(log n) respectively).

If you, the user, has "little insight" into the runtime of data structures you're using, you have bigger issues. Every standard library I've ever seen guarantees the asymptotic behavior of hash tables, sets, linked lists, etc. Immutable arrays are no different.

There is no reason to implement fibonacci heap for Dijkstra. Either you have relatively small data (up to 10 mil. nodes and edges) and normal heap is a way faster or you have bigger data and you should start thinking about things like A* where fibonacci heap is again useless.

You should note, that log n factor is so small (up to 40 on the biggest data you will ever see), that it can be easily dominated by other constant factors in implementation (like cache locality, ...).

Why would you not use A* from the beginning? It's a trivial extension to Dijkstras and is often orders of magnitude faster when an informative heuristic is available.

Also, both algorithms require identical data structures. After all, Dijkstras is just A* with a zero heuristic.

I do agree about the constant factor, though: it's likely that a binary heap would be faster on most data sets.

Keep in mind that functional programming is inherently less efficient than a random access machine[1]:

... And it is not easy to create their equally efficient general-purpose immutable counterparts. For purely functional languages, the worst-case slowdown is logarithmic in the number of memory cells used, because mutable memory can be represented by a purely functional data structure with logarithmic access time (such as a balanced tree).

In this case I think you could use (a pair of?) balanced trees to keep a mapping between the graph and the FIbonnaci heap, as well as for any "mutable" state you need to keep per node.

[1] http://en.wikipedia.org/wiki/Functional_programming#Efficien...

Fortunately, most functional languages (including Haskell) provide mutable random access arrays, so the slow-down is theoretical.

But then the code is not "pure" anymore and it even isn't different than the non-functional one.

My solution would be to make two changes. One to the fibbonacci heap and one to the graph that Dijkstra's is being used on.

I would have the pointers in the Dijkstra's graph be pointers to a mutable data structure (we do this to make sure there is only one place that we need to update per node in the fibbonacci heap and to localize our mutability and keep it out of the rest of our code).

I would then need to update the update code in the fibbonacci heap to take a general function which it will run on the nodes that are updated and which gets passed the original and updated nodes pointer. In the Dijkstra case you simply need to write a function which takes these pointers and has the mutable data structure and modifies it accordingly.

Thus you have kept your generality, confined your mutability and solved your problem at the expense of one pointer indirection and an added callback function (and the associated documentation).

This just proves that "pure" in this sense is a useless concept.

How so? All I'm seeing is that some speedups aren't possible. It still seems like a useful concept, particularly since "faster" was never one of its promises (maybe "easier to parallellize", though.).

Interestingly, in some cases it is actually faster. Modern hardware is peculiar in that respect.

But you can still provide a pure interface. If you implement Dijkstra in an imperative fashion, with lot's of mutations etc., as long as you don't have any external side effects (changing global variables or input variables) and return an immutable result, the function is pure for all practical purposes, even though it is implemented in an imperative fashion/style.

The funny thing about that? There's no such thing as a random access machine. Everything's O(log n).

Modern architectures don't even provide random access for more than a handful of kilobytes. Anything that doesn't fit in the L1 cache incurs access latency roughly proportional to the logarithm of its size.

The theoretical slow down only holds for a strict functional language. In a lazy evaluation strategy language the mutation that happens when a thunk is evaluated and replaced with the result makes the asymptotic claims invalid.

Can you show an example to support your claim (of the invalidity of the asymptotic claims with lazy evaluation)? Given that the lazy term is evaluated (and thus the mutation happening) only once, how does that enable one to avoid the log(n) slowdown of updating mappings?

> Immediately, I discovered that tree structures are more complicated in languages like Clojure that have immutable state. This is because changing a node requires rebuilding large parts of the tree.

This must have implications for implementing a Merkle tree as an immutable data structure. Anyone know how a bitcoin client in clojure might deal with this?

You'd use mutable state, either in the form of java collections, or atoms and family.

Even Haskell, where the State monad itself is actually immutable, provides an escape hatch into mutable state for when it's really, truly required. Just if you do use it when it's not, great shame shall be visited on you and your kin. By which I mean someone in #haskell will very politely point out how you could have done it purely.

Right, using a mutable structure is the easy way. But is there a way that would maintain the advantages of immutability? (eg memory-efficient and easy undo)

Are those truly the advantages of immutability?

Easy undo can be achieved just as easily by reversing operations, in many cases. No need to keep two copies of a potentially large structure when you could just keep the diff and reverse apply it. Oddly, I think typically you would do both. Keep a few large snapshots with small diffs between stages. That is digressing, though.

As for memory-efficiency... not sure how keeping many potentially large copies is more efficient than just modifying a single copy.

Now, I can agree that in many cases it is nice that it prevents you from worrying about race conditions across threads. Though, I'm also not convinced that immutable things are any better than traditional locking strategies. Is that race truly won?

One doesn't need to keep large copies, since immutability allows common substructures to be shared. (That said, it still seems rather unlikely that an immutable implementation will use less memory.)

That's not a property of immutability. That's a property of certain specialized data structures, see "Purely Functional Data Structures" for a start, implemented in clojure.

Well, it's a property of basically everything except arrays. So it's more than just "certain specialized data structures".

To be fair, many folks learn immutable data structures with the so called "persistent" data structures.

Which are cool, to be sure. They are far from new, however. So, I'll be a little surprised if it turns out they are a panacea.

At first blush it seems fine to me; updating a purely-functional tree requires work that's O(depth of tree), and updating a Merkle tree also requires work that's O(depth of tree). I think that means that using a purely-functional structure to implement a Merkle tree requires at most a constant factor of extra time/space over using a mutable tree.

In the opposite direction of pragmatically avoiding "fancy" data-structures, you can find a description of a persistent min-heap via Brodal queues in Chris Okasaki's wonderful book _Purely Functional Data Structures_. Otherwise, it doesn't make too much sense to use the Fibonacci heap algorithm without mutable update.

Speaking of, I had Mr. Brodal in my algo class. That man thinks in algorithms. He can make it seem so intuitive and easy to understand, when he's explain it at the blackboard. That lasts until you're trying to implement it and get bogged down by details. I miss the algorithm classes.

That's so cool! Any fun stories or obscure algorithms that came up?

And zippers are just beyond cool. You can formally take a derivative of data structure to find one.

List is L(a) = 1 + a L(a) (≡ Nil or Cons(a, L(a))

L(a) = 1/(1-a)

L'(a) = 1/(1-a)^2 = L(a)L(a) (≡ Pair of L(a) and L(a))

This is our zipper. One list holding for the tail and one list to remember how we get to where we are.

This one is trivial, but for e.g. trees it will be utterly fascinating.

This series of articles goes into more detail: http://chris-taylor.github.io/blog/2013/02/10/the-algebra-of... . He doesn't get to derivatives until the third part, but it's all (IMO) fascinating, and the concepts of the first two are necessary to understand the calculus part.

Why does 1/(1-a)^2 = L(a)L(a) ?

I'm not familiar with this calculus of data structures, but I'd surmise it's because L(a) = 1 + a L(a) implies that (1 - a) L(a) = 1.

Because 1/(1-a) is L(a)

You may be interested in this paper www.cs.ox.ac.uk/ralf.hinze/publications/ICFP01.pdf. It solves Dijkstra's algorithm as an example for the implementation of a priority search queue data structure, which is very similar to a priority search tree. These structures act as a heap on one index and a search tree on a second index. So you get both efficient update and find-min.

I think your link is the perfect answer to the problem highlighted in the post (which is, incidentally, a problem I was facing myself, so thank you!). There's also an Haskell implementation of priority search queues: http://hackage.haskell.org/package/PSQueue-1.1/docs/Data-PSQ...

Heaps are the the most elegant data structure, IMO.

So many ways to prioritize. So many ways to support the same basic operations, with different tradeoffs.

My semi-outdated site: http://leekillough.com/heaps/

I love that page! I link to it from my own semi-outdated page: http://theory.stanford.edu/~amitp/GameProgramming/Implementa...

I in turn love your page -- so useful how you show all the data structures working together to enable a larger task. Bookmarked.

How about adding new elements with lower keys instead of decreasing the existing elements? Then, once we count that more than half of the n elements are duplicates, we could spend O(n log n) operations cleaning up the heap by recreating it.

Fibonacci heaps are very interesting beasts, but implementation is slightly complicated. For C++ developers there's one in Boost that is not so easy to use quickly.

I've recently put an alternative one in C++11 here: https://github.com/beniz/fiboheap

Ah, I went on this journey a decade or so ago when building a social network - "how are we connected" type feature, and tried to go down the same route with a Fibonacci heap - and eventually threw in the towel and implemented A* , then D* a while later when scale started to be an issue.

Why can't the Fibonacci heap node store a pointer to the graph node and when it is updated, use that pointer to store the new pointer to the heap node in the graph node?

Someone else mentioned this below, but it deserves repeating: Okasaki's Purely Functional Data Structures is definitely worth reading.

Why was the title changed? It was originally the same as the post's; "The Fibonacci heap ruins my life".

From the guidelines: "Otherwise please use the original title, unless it is misleading or linkbait." I think the original title qualifies as at least one of those.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact