Ask HN: What are your favorite algorithms?
163 points by EFruit on Jan 29, 2017 | 89 comments
Post any algorithms you think are cool, inventive, or useful. PDF links or blog posts are appreciated.

My current favorite is SWIM. https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf

The Fastest and Shortest Algorithm for All Well-Defined Problems: https://arxiv.org/abs/cs/0206022

Abstract: An algorithm M is described that solves any well-defined problem p as quickly as the fastest algorithm computing a solution to p, save for a factor of 5 and low-order additive terms. M optimally distributes resources between the execution of provably correct p-solving programs and an enumeration of all proofs, including relevant proofs of program correctness and of time bounds on program runtimes. M avoids Blum's speed-up theorem by ignoring programs without correctness proof. M has broader applicability and can be faster than Levin's universal search, the fastest method for inverting functions save for a large multiplicative constant. An extension of Kolmogorov complexity and two novel natural measures of function complexity are used to show that the most efficient program computing some function f is also among the shortest programs provably computing f.

The catch is that the constants in the big-O are enormous.

It is only an existence proof; the algorithm has never been implemented. The closest thing to it is the proof searcher in Isabelle (the blast method).

I vote twice for this one!

Any layman's explanation? I'm not able to grasp the concept.

It does two things concurrently: (1) search for an algorithm along with proofs of its correctness and an upper bound on its running time, and (2) run the fastest algorithm found so far, aborting a slower algorithm whenever a faster one is found.

Everything is scheduled so it's asymptotically optimal. However, there's an added cost to search for algorithms. This cost is exponential in the proof length, but constant for each problem, since it is independent of the input. Of course, this constant is ridiculously huge.

Funny thing is, you can offload most of this to compile time, which is what Isabelle is doing, but then you miss out on guaranteed optimal runtime. Also the proof database may not be complete anyway.

First you vote once, and then you vote again.

@theemathas thanks, @pwdisswordfish - nice one :D

Given the head of a linked list, how do you determine if it loops? E.g., an l-shaped list is easy: you simply process each element in the list until you find one without a subsequent element. But what if it's a 9-shaped linked list? You'll never run out of elements, so the best you could do, it would seem, is to store a reference to each element and check each new element against all stored references to see if you've found a duplicate.

There's a way to do it in O(1) space, though.

If you start off two runners in the list, and each "step" move one twice and the other only once, if there's a loop they will eventually run into each other and be processing the same element. Simple, elegant, and nothing I'd have ever thought of. :)
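A minimal sketch of the two-runner idea in C; `node_t` here is a hypothetical list node type, not from any particular codebase:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int data;
} node_t;

/* Floyd's tortoise and hare: advance one pointer by one step and the
 * other by two. If the list has a cycle, the fast pointer eventually
 * catches the slow one; otherwise it falls off the end. O(1) space. */
bool has_cycle(node_t *head)
{
    node_t *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;        /* tortoise: one step */
        fast = fast->next->next;  /* hare: two steps    */
        if (slow == fast)
            return true;          /* they met inside the loop */
    }
    return false;                 /* hare ran off the end: no cycle */
}
```

With a 9-shaped list the hare laps the tortoise inside the loop, so the two must eventually land on the same node.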

I like my solution better:

The list node contains a pointer and some data, most likely another pointer, or a primitive type, or a struct of primitive types. That means that on all modern systems a list node is aligned to at least 2 bytes, much more likely 4 or 8. So the pointer to a node always has its last bit set to zero, and that bit can be used as a mark. So:

    bool containsCycle(node_t *root)
    {
        node_t *p = root;
        bool ret = false;
        if (!root) return false; // Empty list

        while (p->next && !ret) {
            if ((uintptr_t)p->next & 1) {
                ret = true; // Stepped on a tagged pointer: cycle found
                break;
            }
            node_t *q = p->next;
            p->next = (node_t *)((uintptr_t)p->next | 1); // Tag the pointer
            p = q;
        }
        // Reset all pointers
        p = root;
        while (p->next && ((uintptr_t)p->next & 1)) {
            p->next = (node_t *)((uintptr_t)p->next & ~(uintptr_t)1);
            p = p->next;
        }
        return ret;
    }

Hacky, trashes the CPU cache, the code itself is on the sloppy side and, most importantly, it lacks the elegance of the original solution.

So I beg to differ, it's not really "better" by any metric.

It was tongue-in-cheek (see the smiley face), your critiques are all spot on, but there is one thing where it is objectively better:

Tortoise and hare can only detect that a cycle exists. This hack will tell you exactly which node the cycle starts on. That means that if you want to e.g. repair a list, you can just cut that last link and set it to either null or root. Granted, this wasn't the original problem.

Tortoise and hare can actually tell you which node the cycle starts on.

  1 -> 2 -> 3 -> 4 -> 5 -> 6
            ^              |
            +--------------+

  Step:       0 1 2 3 4
  Tortoise:   1 2 3 4 5 
  Hare:       1 3 5 3 5 <- pointers reference same node, cycle exists
Then create another tortoise pointer at the head, iterate until both tortoises point to the same node.

  Step:       0 1 2
  Tortoise:   5 6 3
  Tortoise2:  1 2 3 <-cycle starts at 3
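A sketch of both phases in C, using a hypothetical `node_t`; the return value is the first node of the cycle, or NULL if the list is l-shaped:

```c
#include <stddef.h>

typedef struct node {
    struct node *next;
} node_t;

/* Phase 1: detect a meeting point with tortoise/hare.
 * Phase 2: restart one tortoise from the head; both now advance one
 * step at a time and provably meet at the first node of the cycle. */
node_t *cycle_start(node_t *head)
{
    node_t *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {             /* cycle detected */
            node_t *slow2 = head;
            while (slow2 != slow) {     /* walk both at unit speed */
                slow2 = slow2->next;
                slow = slow->next;
            }
            return slow;                /* first node of the cycle */
        }
    }
    return NULL;                        /* no cycle */
}
```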

An implementation-specific hack. Neat, but hack nonetheless.

This algorithm is popularly known as "Floyd's tortoise and the hare algorithm".

This can also be used to determine the period of linear congruential PRNGs, i.e. those of the form

    X(n+1) = (A * X(n) + B) mod C
which happens to be what many stdlib versions of rand() are.
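A sketch of Floyd's cycle finding applied to such an iteration; the parameters A=5, B=3, C=16 are made up for illustration (they happen to give a full-period generator), not taken from any stdlib:

```c
#include <stdint.h>

/* Example LCG step (illustrative parameters, not any particular stdlib). */
static uint64_t lcg_step(uint64_t x) {
    return (5 * x + 3) % 16;  /* A=5, B=3, C=16 */
}

/* Floyd's algorithm on the iteration x -> f(x): find the length of
 * the cycle that the sequence eventually falls into. */
uint64_t lcg_period(uint64_t seed)
{
    uint64_t tortoise = lcg_step(seed);
    uint64_t hare = lcg_step(lcg_step(seed));
    while (tortoise != hare) {          /* find a point inside the cycle */
        tortoise = lcg_step(tortoise);
        hare = lcg_step(lcg_step(hare));
    }
    /* walk once around the cycle to measure its length */
    uint64_t period = 1;
    hare = lcg_step(tortoise);
    while (tortoise != hare) {
        hare = lcg_step(hare);
        period++;
    }
    return period;
}
```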


Here's a visual representation of it:


Really cool!

Just ran it through on the office whiteboard, colour me impressed, really novel!

Did have to remind myself to start the double-jumper off before the single-pointer to prevent a short-circuit, but that's a great solution to cyclical lists.

Where did you learn about this?

This one is extremely popular in the interviewing circles thanks to CTCI.

Well... that's rather Ingenious.

1. Lempel-Ziv compression algorithms (both LZ77 and LZ78) [1]. They are practical: even a basic implementation gets a good compression ratio. Moreover, they opened up the whole big chapter of data compression (LZW, LZSS and more).
2. The algorithm behind rsync [2]. Nice one.
3. Jarvis's algorithm for finding the convex hull [3].
4. Speaking of data structures, I like the XOR linked list [4]; it's neat.

[1] https://en.wikipedia.org/wiki/LZ77_and_LZ78 [2] https://en.wikipedia.org/wiki/Rsync#Algorithm [3] https://en.wikipedia.org/wiki/Gift_wrapping_algorithm [4] https://en.wikipedia.org/wiki/XOR_linked_list

I'm a big fan of Huffman coding myself, but Lempel-Ziv definitely gets points for being simple and intuitive.

> 2. The algorithm behind rsync.

Do you mean the rolling checksum part?

All parts; I like how the simple parts were composed to build a pretty complicated algorithm.

Yeah, I assume majewsky is wondering if you mean just the rolling checksum part because that's the most complex part. Rsync, as a whole, is certainly one of my favorites and the algorithm(s) it uses are some of the ones I most often call explicitly.

I feel I have to mention the algorithm for matching with mismatches which I published in my thesis (http://www.daemonology.net/papers/thesis.pdf), simply because it gave me a legitimate opportunity to use the phrase "Fourier Transform of the FreeBSD kernel".

As basic as it is, and completely uninventive, I've always loved Dijkstra's algorithm.

It was probably the algorithm that cemented my love of computer science, it's such an elegant solution to a problem and so simple once you understand it.

I always thought of it as a greedy way of doing a breadth-first search.

Why "uninventive?"

I believe GP meant that picking Dijkstra's is an uninventive answer to the question of a favorite algorithm, since it's very well-known.

Ah that makes sense.

The Burrows-Wheeler transform is pretty interesting: https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf.... It's behind the bzip2 compression format.
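For the curious, the transform itself fits in a few lines if you don't care about speed. This naive sketch sorts all rotations explicitly (real implementations build a suffix array instead) and assumes the input ends with a unique '$' sentinel:

```c
#include <stdlib.h>
#include <string.h>

static const char *bwt_src;   /* input string, terminated by unique '$' */
static int bwt_len;

/* Compare two rotations of bwt_src, identified by their start index. */
static int rot_cmp(const void *a, const void *b) {
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < bwt_len; k++) {
        char ci = bwt_src[(i + k) % bwt_len];
        char cj = bwt_src[(j + k) % bwt_len];
        if (ci != cj) return (unsigned char)ci - (unsigned char)cj;
    }
    return 0;
}

/* Naive O(n^2 log n) BWT: sort all rotations, emit the last column. */
void bwt(const char *s, char *out)
{
    bwt_src = s;
    bwt_len = (int)strlen(s);
    int idx[256];                 /* sketch: assumes short inputs */
    for (int i = 0; i < bwt_len; i++) idx[i] = i;
    qsort(idx, bwt_len, sizeof idx[0], rot_cmp);
    for (int i = 0; i < bwt_len; i++)
        out[i] = s[(idx[i] + bwt_len - 1) % bwt_len];
    out[bwt_len] = '\0';
}
```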

This is related to my current favourite algorithm: Because the BWT is closely related to the suffix tree of the original string, there's an algorithm to search for a substring of length 'm' in a BWT-ed string of length 'n' in O(m log n) time!

The one downside is that you have to pre-process the string, which takes O(n) time and between 5n and 9n space depending on exactly how you do it. But after that, you can do as many searches as you want practically "for free".

There's a paper outlining the various algorithms available here: (PDF) http://www.cosc.canterbury.ac.nz/research/reports/HonsReps/2...

I have never seen anything more elegant than Disjoint Set Datastructure https://en.wikipedia.org/wiki/Disjoint-set_data_structure

+1. I was in love with the union-find data structure as well when I got to know about it, and it always felt good to solve an algo problem with it.

Another fan of union-find here as well. Simple, elegant and magical.

Also best (only?) practical use of the (inverse) Ackermann function!
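A minimal union-find sketch with both classic optimizations; array sizes and names are arbitrary:

```c
#define MAXN 1024

static int parent[MAXN];
static int rnk[MAXN];   /* upper bound on tree depth */

void uf_init(int n) {
    for (int i = 0; i < n; i++) { parent[i] = i; rnk[i] = 0; }
}

/* Find with path compression: point every visited node at the root. */
int uf_find(int x) {
    if (parent[x] != x)
        parent[x] = uf_find(parent[x]);
    return parent[x];
}

/* Union by rank: attach the shallower tree under the deeper one.
 * Combined with path compression, this gives near-constant amortized
 * time per operation (inverse Ackermann). */
void uf_union(int a, int b) {
    int ra = uf_find(a), rb = uf_find(b);
    if (ra == rb) return;
    if (rnk[ra] < rnk[rb]) { int t = ra; ra = rb; rb = t; }
    parent[rb] = ra;
    if (rnk[ra] == rnk[rb]) rnk[ra]++;
}
```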

Agreed. Bloom filters are amazingly useful for checking membership in a set while taking much less space than a database or hash table. They achieve this by being crazily "probabilistic", sacrificing a small amount of accuracy for speed and compactness. Still, this sort of broken, degenerate hash table has found a ton of uses in everything from spell checkers to distributed content networks.
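A toy sketch of the idea; the two FNV-style hash functions and the 1024-bit size are arbitrary choices for illustration, not a tuned design:

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOOM_BITS 1024

static uint8_t bloom[BLOOM_BITS / 8];

/* FNV-1a with two different offset bases, serving as two cheap
 * independent-ish hash functions (illustrative only). */
static uint32_t fnv1a(const char *s, uint32_t h) {
    for (; *s; s++) { h ^= (uint8_t)*s; h *= 16777619u; }
    return h;
}

void bloom_add(const char *key) {
    uint32_t h1 = fnv1a(key, 2166136261u) % BLOOM_BITS;
    uint32_t h2 = fnv1a(key, 0x811c9dc4u) % BLOOM_BITS;
    bloom[h1 / 8] |= 1u << (h1 % 8);
    bloom[h2 / 8] |= 1u << (h2 % 8);
}

/* May return a false positive, but never a false negative. */
bool bloom_maybe_contains(const char *key) {
    uint32_t h1 = fnv1a(key, 2166136261u) % BLOOM_BITS;
    uint32_t h2 = fnv1a(key, 0x811c9dc4u) % BLOOM_BITS;
    return (bloom[h1 / 8] >> (h1 % 8) & 1) &&
           (bloom[h2 / 8] >> (h2 % 8) & 1);
}
```

The "no false negatives" guarantee is the whole trick: membership tests can only err by saying "maybe" for an absent key.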

This is possibly my favorite as well. MCMC in general is pretty magical.

ROAM: Real-time Optimally Adapting Meshes https://graphics.llnl.gov/ROAM/roam.pdf

Whitted's recursive ray tracing algorithm https://en.wikipedia.org/wiki/Ray_tracing_(graphics)#Recursi...


Here's a rather surprising one. There's an algorithm to generate a uniform random integer in the range 1..n with a KNOWN factorization. That means the algorithm is like a regular random number generator, except it will also tell you how to factor its output! Magic? Yes! This always surprises people and I love results like this.

Search for the paper by Adam Kalai on how to generate random factorized integers easily. It uses log^2 n primality tests on average. The result is originally due to Eric Bach, who managed it using only log n primality tests, but with a more complex algorithm.


One of those algorithms where you go "Shit so obvious and also so completely genius that I could never have thought of it"

Also, the internet as we know it would not work without it

Ukkonen's algorithm for linear-time suffix tree construction. (The linear-time suffix tree result is what Knuth dubbed 'the algorithm of the year 1973', for Weiner's original construction; Ukkonen's is the simpler online version.)

Token Bucket. Very lightweight way to implement rate limiting.

PHP implementation: https://github.com/bandwidth-throttle/token-bucket
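The core idea is tiny; here's a sketch in C (the linked repo is PHP, this is just the concept). Timestamps are plain doubles passed in by the caller, so there's no clock dependency:

```c
#include <stdbool.h>

/* A token bucket: at most `capacity` tokens, refilled at `refill_rate`
 * tokens per second. Each allowed request consumes one token; when the
 * bucket is empty, requests are rejected (rate limited). */
typedef struct {
    double tokens;       /* current token count       */
    double capacity;     /* burst size                */
    double refill_rate;  /* tokens added per second   */
    double last_time;    /* timestamp of last request */
} bucket_t;

void bucket_init(bucket_t *b, double capacity, double rate, double now) {
    b->tokens = capacity;
    b->capacity = capacity;
    b->refill_rate = rate;
    b->last_time = now;
}

bool bucket_allow(bucket_t *b, double now) {
    /* lazily add the tokens accumulated since the last call */
    b->tokens += (now - b->last_time) * b->refill_rate;
    if (b->tokens > b->capacity) b->tokens = b->capacity;
    b->last_time = now;
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;
        return true;
    }
    return false;
}
```

The lazy refill is what makes it so lightweight: no timers, just one timestamp and one counter per limited resource.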

Linear Hybrid Cellular Automata: https://pdfs.semanticscholar.org/7835/161f253ab117a3666fa8e7...

Parallelizable and efficient method for producing high quality pseudo random bits.

Designed to achieve maximal period (with 500 bit state the period of repetition is one less than 2^500).

Can be run backwards or forwards. Running it backwards undoes running it forwards and reproduces the pseudo random bits in reverse order.
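A sketch of one step of such a CA over a 16-cell state packed into a machine word, with null boundary conditions. The mask that selects rule 90 vs rule 150 per cell is arbitrary here; the masks that achieve maximal period come from published tables:

```c
#include <stdint.h>

/* One step of a linear hybrid cellular automaton.
 * rulemask bit i = 1 means cell i uses rule 150 (left ^ self ^ right),
 * bit i = 0 means rule 90 (left ^ right). Cells past the edges read 0. */
uint16_t hca_step(uint16_t state, uint16_t rulemask)
{
    uint16_t left  = state << 1;   /* each cell's neighbor on one side  */
    uint16_t right = state >> 1;   /* ... and on the other side         */
    return left ^ right ^ (state & rulemask);
}
```

Because the update is linear over GF(2), a whole step is three word operations, which is what makes these so cheap and so easy to parallelize.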

Tomasulo's algorithm: https://en.wikipedia.org/wiki/Tomasulo_algorithm

A modified version of it is used for out-of-order execution in most modern CPUs.

Great question. Without a doubt, my favourite is the Burrows-Wheeler transform (as used by bzip2)



When I first read of HashLife I had the most extraordinarily vivid dreams of breaking all complexity barriers just by finding the right memoization scheme: http://www.drdobbs.com/jvm/an-algorithm-for-compressing-spac...

R* tree for indexing spatial data [1]. Also, if you enjoy spatial data structures, you can't go wrong with Samet's [2] _The Design and Analysis of Spatial Data Structures_ and _Applications of Spatial Data Structures_.

1: https://en.wikipedia.org/wiki/R*_tree

2: https://www.cs.umd.edu/~hjs/design.html

Randomized algorithms, specifically Monte Carlo (probably because it was the first I explored), have fascinated me since I met them. The idea of having a very simple/efficient method to look for a solution, instead of a very complicated one, and still be able to have a near optimal solution is amazing.


I really like Fleury's algorithm[0] - I used it to generate more efficient paths to draw on a canvas, for the purpose of displaying simple wireframe models. It may not be the most efficient, but I really like the simplicity.

[0] http://www.geeksforgeeks.org/fleurys-algorithm-for-printing-...

There are a lot of great answers here. Let me add another one I really love: the Knuth-Morris-Pratt algorithm for string matching. There are many features of it I especially love. First, it takes a simple idea (when a mismatch occurs, it's often possible to shift the pattern by a lot more than one position) and takes it to its logical conclusion (defining and computing the so-called border lengths of prefixes). Second, the analysis of the precomputation step (the border lengths) is tricky and amortised, but plainly obvious once you see it.

It shows the importance of making a good definition and thinking differently (for the run time analysis I thought for a while and failed to come up with an argument why it runs in linear time, but once you see it differently it's obvious :))
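A compact sketch of both the border-table precomputation and the search; the fixed-size border array is just a shortcut for illustration:

```c
#include <string.h>

/* Border table: border[i] = length of the longest proper prefix of
 * pat[0..i] that is also a suffix of it. Amortized O(m): k only grows
 * by one per iteration and each fallback shrinks it. */
static void borders(const char *pat, int m, int *border)
{
    border[0] = 0;
    int k = 0;                     /* length of current border */
    for (int i = 1; i < m; i++) {
        while (k > 0 && pat[i] != pat[k])
            k = border[k - 1];     /* fall back to a shorter border */
        if (pat[i] == pat[k])
            k++;
        border[i] = k;
    }
}

/* Index of the first occurrence of pat in text, or -1. O(n + m). */
int kmp_search(const char *text, const char *pat)
{
    int n = (int)strlen(text), m = (int)strlen(pat);
    if (m == 0) return 0;
    int border[256];               /* sketch: assumes a short pattern */
    borders(pat, m, border);
    int k = 0;                     /* chars of pat currently matched */
    for (int i = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k])
            k = border[k - 1];     /* shift pattern using the border */
        if (text[i] == pat[k])
            k++;
        if (k == m)
            return i - m + 1;      /* full match ends at i */
    }
    return -1;
}
```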

I only recently had the chance to understand this algorithm. Here's a little (annotated) gist I made after understanding it: https://gist.github.com/quantumelixir/3c410dd4a0afe0f2514e0f...

Number Theoretic Transform, aka the Discrete Fourier Transform over a finite field.

The algorithms for computing it are beautiful and have enabled much of the technology we see around us today.

This. FFT is crazy awesome.

I really like the majority voting algorithm for finding the majority element of an array, i.e. the one that appears more than half of the time. It's the simplest and most elegant algorithm I have studied.


I'm surprised the solution isn't the hashmap implementation, which also runs in O(n) and is much simpler. But the algorithm is interesting!

@sameera_sy that's the point: the hashmap solution takes O(n) space as well. Consider a situation where you need a memory-efficient solution.
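For reference, the O(1)-space algorithm under discussion (the Boyer-Moore majority vote) is only a few lines. Note that when a majority is not guaranteed to exist, a second verification pass over the array is required:

```c
#include <stddef.h>

/* Boyer-Moore majority vote: keep one candidate and a counter. A
 * matching element increments the counter, a differing one decrements
 * it; at zero, adopt the current element as the new candidate. If a
 * true majority (> n/2 occurrences) exists, it survives every
 * cancellation. O(n) time, O(1) space. */
int majority_candidate(const int *a, size_t n)
{
    int candidate = a[0];
    size_t count = 1;
    for (size_t i = 1; i < n; i++) {
        if (count == 0) {
            candidate = a[i];
            count = 1;
        } else if (a[i] == candidate) {
            count++;
        } else {
            count--;
        }
    }
    return candidate;  /* only valid if a majority actually exists */
}
```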

Ford-Fulkerson algorithm for finding the maximum flow across a graph: https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorit...

The algorithm itself is interesting enough, but when applied to bipartite graphs, you can solve some tough problems very efficiently. For instance, one of my favorites is how Santa could possibly match a million kids with a million different gifts so that as many kids as possible get a gift they'd enjoy. It turns out you structure the problem as a bipartite graph with gifts on one side and children on the other, and Ford-Fulkerson guarantees a maximum matching in polynomial time (no small feat, given how many possible combinations there are!)
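On unit-capacity bipartite graphs, Ford-Fulkerson reduces to repeatedly searching for augmenting paths. A toy sketch with small fixed sizes (`likes`, `NL`, `NR` are invented names for the example):

```c
#include <stdbool.h>
#include <string.h>

#define NL 8   /* kids  */
#define NR 8   /* gifts */

static bool likes[NL][NR];   /* edge: kid u would be happy with gift v */
static int match_of[NR];     /* gift v -> matched kid, or -1 */

/* DFS for an augmenting path from kid u: either grab a free gift or
 * recursively re-route the kid currently holding one. This is exactly
 * Ford-Fulkerson with unit capacities. */
static bool try_kid(int u, bool seen[NR])
{
    for (int v = 0; v < NR; v++) {
        if (likes[u][v] && !seen[v]) {
            seen[v] = true;
            if (match_of[v] < 0 || try_kid(match_of[v], seen)) {
                match_of[v] = u;
                return true;
            }
        }
    }
    return false;
}

/* Returns the size of a maximum matching. */
int max_matching(void)
{
    memset(match_of, -1, sizeof match_of);
    int matched = 0;
    for (int u = 0; u < NL; u++) {
        bool seen[NR] = {false};
        if (try_kid(u, seen))
            matched++;
    }
    return matched;
}
```

The re-routing step is the augmenting path: taking a gift away from a kid is fine as long as that kid can be given a different one.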

I've been diving into a bunch of the newer (computational) optimization algorithms, but the old-school ones that they teach in school are still extremely interesting and relevant to me. I've always been a huge fan of combinatorial optimization [1]. I saw someone else mention this, but the simplex algorithm is awesome as well! [2]

[1] https://en.wikipedia.org/wiki/Combinatorial_optimization [2] http://gizmodo.com/5934150/the-algorithm-that-controls-your-...

Also, fft gets an obligatory mention

Cooley-Tukey algorithm.


Cuckoo Hashing: http://www.lkozma.net/cuckoo_hashing_visualization/

"An elegant method for resolving collisions in hash tables... It has the rare distinction of being easy to implement and efficient both in theory and practice."

It does an amazing job of packing values into N hash tables so there's very little dead space but lookups are very fast.
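A toy sketch of the scheme with two tables; the two hash functions are arbitrary made-up mixers, and a real implementation would rehash with fresh hash functions when the eviction loop gives up:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TSIZE 16
#define EMPTY UINT32_MAX   /* sentinel: keys must not equal this */
#define MAX_KICKS 32

static uint32_t t1[TSIZE], t2[TSIZE];

static uint32_t h1(uint32_t k) { return (k * 2654435761u) % TSIZE; }
static uint32_t h2(uint32_t k) { return (k ^ (k >> 7)) * 40503u % TSIZE; }

void cuckoo_init(void) {
    memset(t1, 0xFF, sizeof t1);  /* fill with EMPTY */
    memset(t2, 0xFF, sizeof t2);
}

/* Lookup probes exactly two slots: worst-case O(1). */
bool cuckoo_contains(uint32_t k) {
    return t1[h1(k)] == k || t2[h2(k)] == k;
}

/* Insert: place in table 1; if the slot is occupied, kick the old key
 * out into its alternate slot in table 2, and so on, like a cuckoo
 * chick pushing eggs out of the nest. */
bool cuckoo_insert(uint32_t k) {
    if (cuckoo_contains(k)) return true;
    for (int i = 0; i < MAX_KICKS; i++) {
        uint32_t tmp = t1[h1(k)];
        t1[h1(k)] = k;
        if (tmp == EMPTY) return true;
        k = tmp;                      /* evicted key tries table 2 */
        tmp = t2[h2(k)];
        t2[h2(k)] = k;
        if (tmp == EMPTY) return true;
        k = tmp;                      /* evicted key tries table 1 */
    }
    return false;  /* probable cycle: a real table would rehash here */
}
```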

1. Karatsuba algorithm for fast multiplication. 2. Heap's algorithm for generating permutations.

mostly because they clearly demonstrate the power of the human brain.

I recently made the observation that Karatsuba found his multiplication algorithm roughly ten years ahead of Strassen's matrix multiplication algorithm. And given that the crucial idea (using identities to save one multiplication) is already beautifully captured in Karatsuba's, it takes away a little from the latter :)
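The crucial identity in miniature, applied one level deep to 32-bit operands (a real big-number implementation recurses on digit arrays):

```c
#include <stdint.h>

/* One level of Karatsuba on 32-bit operands split into 16-bit halves:
 *   x = xh*2^16 + xl,  y = yh*2^16 + yl.
 * Schoolbook needs four half-multiplies; Karatsuba's identity
 *   xh*yl + xl*yh = (xh + xl)*(yh + yl) - xh*yh - xl*yl
 * gets away with three. Applied recursively, this gives O(n^1.585). */
uint64_t karatsuba32(uint32_t x, uint32_t y)
{
    uint64_t xh = x >> 16, xl = x & 0xFFFF;
    uint64_t yh = y >> 16, yl = y & 0xFFFF;

    uint64_t hi  = xh * yh;                         /* multiply #1 */
    uint64_t lo  = xl * yl;                         /* multiply #2 */
    uint64_t mid = (xh + xl) * (yh + yl) - hi - lo; /* multiply #3 */

    return (hi << 32) + (mid << 16) + lo;
}
```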

My favorite is the Hindley-Milner type system algorithm: https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...

It is what ML-family languages use to infer types, and apparently Haskell uses it a bit too.

Knuth's DLX algorithm. Uses what he calls dancing links to perform backtracking in basically constant space with a clever trick on doubly linked lists.

He recently updated his implementation. It's very fun to read his notes on it; in particular, he discusses things he had gotten wrong.
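The "dance" itself is just four pointer assignments; a minimal sketch:

```c
typedef struct dlx {
    struct dlx *left, *right;
} dlx_t;

/* The dancing-links trick: unlink a node from a circular doubly
 * linked list, but leave the node's own left/right pointers intact,
 * so the removal can be undone later in O(1). Perfect for
 * backtracking: undo operations in reverse order and the list is
 * restored exactly. */
void dlx_remove(dlx_t *x) {
    x->left->right = x->right;
    x->right->left = x->left;
    /* x->left and x->right deliberately untouched */
}

void dlx_restore(dlx_t *x) {
    x->left->right = x;
    x->right->left = x;
}
```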

If you like poor JavaScript implementations, I can post mine again.

Simplex. If you can transform your NP hard optimization problem into an LP, simplex can often work like magic.

If you can transform your NP hard optimization problem into an LP, you have proven P = NP because LPs can be solved in polynomial time.

That said, many combinatorial optimization problems that look quite similar to NP hard problems have very nice and efficient LP formulations, and for many NP hard problems, integer programming-based methods (which in the end mostly solve LP relaxations) are among the best algorithms. For example, planar TSP can be solved for tens or even hundreds of thousands of nodes using the LP relaxation, branch, and cut tool set.

Modern LP solvers don't just use the Simplex method though. I'm very partial to the Simplex method myself, but in practice, Interior Point Methods are often used.

Adding to what you say, OP might have meant the use of LP solvers in branch and bound type algorithms to solve hard problems after recasting them as integer programs.

Kalman filter!

It looks like SWIM scales better than Raft. Are there well-known production systems that use it?

SWIM is in no way an alternative to Raft, and vice-versa. SWIM is a membership algorithm, Raft is a consensus algorithm. They are often used together. (a SWIM alternative is plain Gossip and a Raft alternative is Paxos)

Cheney's two-finger garbage collector with forwarding pointers. Still beats every other GC out there by a factor of 10, if you can afford 2x the memory.

Contrastive divergence for RBMs. Incredibly simple, but absolutely beautiful how it modifies the energy surface. Also, just in general a cool use of Gibbs sampling.

Convolutional neural nets and backprop. That stuff is magical.

I also was fascinated as shit when I learnt about minimax and alpha beta search.

Game theory and neural nets have always fascinated me.


I also like the moral predecessor of this algorithm (the Morris counting algorithm, I think it's called), which solves the simpler problem of estimating the number of elements in a stream in loglog space. Curiously, it appears as an exercise in CLRS.
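The Morris counter is only a few lines; a sketch (using plain `rand()` for the coin flips, so this is illustrative rather than production-quality):

```c
#include <stdint.h>
#include <stdlib.h>

/* Morris approximate counting: store only an exponent c and increment
 * it with probability 2^-c. After n events, 2^c - 1 is an unbiased
 * estimate of n, but c itself only needs ~log log n bits. */
typedef struct { uint8_t c; } morris_t;

void morris_event(morris_t *m) {
    /* flip c fair coins; bump the exponent only if all come up heads */
    uint32_t r = (uint32_t)rand();
    if ((r & ((1u << m->c) - 1)) == 0)
        m->c++;
}

uint64_t morris_estimate(const morris_t *m) {
    return (1ull << m->c) - 1;
}
```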

I find the DC3/skew algorithm for linear-time suffix array construction mindblowingly clever!



I generally appreciate divide and conquer algorithms of any type

Lamport's bakery algorithm is pretty neat.

Someone already mentioned Tomasulo.



