
Ask HN: What are your favorite algorithms? - EFruit
Post any algorithms you think are cool, inventive, or useful.
PDF links or blog posts are appreciated.

My current favorite is SWIM.
https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf
======
theemathas
The Fastest and Shortest Algorithm for All Well-Defined Problems:
[https://arxiv.org/abs/cs/0206022](https://arxiv.org/abs/cs/0206022)

Abstract: An algorithm M is described that solves any well-defined problem p
as quickly as the fastest algorithm computing a solution to p, save for a
factor of 5 and low-order additive terms. M optimally distributes resources
between the execution of provably correct p-solving programs and an
enumeration of all proofs, including relevant proofs of program correctness
and of time bounds on program runtimes. M avoids Blum's speed-up theorem by
ignoring programs without correctness proof. M has broader applicability and
can be faster than Levin's universal search, the fastest method for inverting
functions save for a large multiplicative constant. An extension of Kolmogorov
complexity and two novel natural measures of function complexity are used to
show that the most efficient program computing some function f is also among
the shortest programs provably computing f.

The catch is that the constants in the big-O are enormous.

~~~
zelah
I vote twice for this one!

~~~
palashkulsh
Any layman's explanation? I'm not able to grasp the concept.

~~~
theemathas
It does two things concurrently: (1) search for an algorithm along with
proofs of its correctness and a time upper bound, and (2) run the fastest
algorithm found so far, possibly aborting a slower algorithm when a faster
one is found.

Everything is scheduled so that it's asymptotically optimal. However, there's
an added cost to search for algorithms. This cost is exponential in the proof
length, but constant for each problem, since it is independent of the inputs.
Of course, this constant is ridiculously huge.

~~~
AstralStorm
Funny thing is, you can offload most of this to compile time, which is what
Isabelle is doing, but then you miss out on guaranteed optimal runtime. Also
the proof database may not be complete anyway.

------
DDR0
Given the head of a linked list, how do you determine if it loops? E.g., an
l-shaped list is easy to check - you simply process each element in the
list until you find one without a subsequent element. But what if it's a
9-shaped linked list? You'll never run out of elements, so the best you could
seem to do would be to store a reference to each element and check against all
stored references to see if you've found a duplicate.

There's a way to do it in O(1) space, though.

If you start off two runners in the list, and each "step" move one twice and
the other only once, if there's a loop they will eventually run into each
other and be processing the same element. Simple, elegant, and nothing I'd
have ever thought of. :)
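
A minimal C sketch of the two-runner idea (the node type and names are mine,
not from the comment):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node { struct node *next; } node;

// Floyd's tortoise-and-hare: O(1) space cycle detection.
bool has_cycle(const node *head) {
    const node *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;              // one step
        fast = fast->next->next;        // two steps
        if (slow == fast) return true;  // runners met inside a loop
    }
    return false;  // fast ran off the end: no cycle
}
```

If there is a loop, the fast runner gains one position on the slow runner per
step, so it must eventually land on it.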

~~~
sherincall
I like my solution better:

The list node contains a pointer and some data, most likely another pointer,
or a primitive type, or a struct of primitive types. Which means that, on all
modern systems, a list node is aligned to at least 2 bytes, much more likely
4 or 8. That means that a pointer to a node always has its last bit set to
zero. So:

    
    
        #include <stdbool.h>
        #include <stdint.h>
    
        typedef struct node node_t;
        struct node { node_t *next; /* plus payload */ };
    
        bool containsCycle(node_t *root)
        {
            node_t *p = root;
            bool ret = false;
            if (!root) return false; // Empty list
    
            while (p->next && !ret)
            {
                // A tagged next pointer means we have been here before
                if ((uintptr_t)p->next & 1)
                {
                    ret = true;
                    break;
                }
                node_t *q = p->next;
                p->next = (node_t *)((uintptr_t)p->next | 1); // Mark visited
                p = q;
            }
        
            // Reset all pointers
            p = root;
            while (p->next && ((uintptr_t)p->next & 1))
            {
                p->next = (node_t *)((uintptr_t)p->next & ~(uintptr_t)1);
                p = p->next;
            }
            return ret;
        }
    

:)

~~~
huhtenberg
Hacky, trashes the CPU cache, the code itself is on the sloppy side and, most
importantly, it lacks the elegance of the original solution.

So I beg to differ, it's not really "better" by any metric.

~~~
sherincall
It was tongue-in-cheek (see the smiley face), your critiques are all spot on,
but there is one thing where it is objectively better:

Tortoise and hare can only detect that a cycle exists. This hack will tell you
exactly which node the cycle starts on. That means that if you want to e.g.
repair a list, you can just cut that last link and set it to either null or
root. Granted, this wasn't the original problem.

~~~
bfors
Tortoise and hare can actually tell you which node the cycle starts on.

    
    
      1 -> 2 -> 3 -> 4 -> 5 -> 6
                ^              |
                |______________|
    
      Step:       0 1 2 3 4
      Tortoise:   1 2 3 4 5 
      Hare:       1 3 5 3 5 <- pointers reference same node, cycle exists
    

Then create another tortoise pointer at the head, iterate until both tortoises
point to the same node.

    
    
      Step:       0 1 2
      Tortoise:   5 6 3
      Tortoise2:  1 2 3 <-cycle starts at 3
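
The two-phase procedure traced above can be sketched in C (node type and
names are mine):

```c
#include <stddef.h>

typedef struct lnode { struct lnode *next; } lnode;

// Phase 1: advance a slow and a fast pointer until they meet (cycle found).
// Phase 2: restart one pointer at the head and advance both one step at a
// time -- they meet exactly at the first node of the cycle.
lnode *cycle_start(lnode *head) {
    lnode *slow = head, *fast = head;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) {            // meeting point inside the cycle
            lnode *p = head;
            while (p != slow) {        // walk both at the same speed
                p = p->next;
                slow = slow->next;
            }
            return p;                  // first node of the cycle
        }
    }
    return NULL;                       // no cycle
}
```

On the 1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 3 list above, this returns node 3.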

------
wmu
1\. Lempel-Ziv compression algorithms (both LZ77 and LZ78). They are
practical: even a basic implementation gets a good compression ratio.
Moreover, they opened up the whole big chapter of data compression (LZW,
LZSS and more).

2\. The algorithm behind rsync. Nice one.

3\. Jarvis's algorithm for finding the convex hull.

4\. Speaking of data structures, I like the XOR linked list; it's neat.

[1]
[https://en.wikipedia.org/wiki/LZ77_and_LZ78](https://en.wikipedia.org/wiki/LZ77_and_LZ78)
[2]
[https://en.wikipedia.org/wiki/Rsync#Algorithm](https://en.wikipedia.org/wiki/Rsync#Algorithm)
[3]
[https://en.wikipedia.org/wiki/Gift_wrapping_algorithm](https://en.wikipedia.org/wiki/Gift_wrapping_algorithm)
[4]
[https://en.wikipedia.org/wiki/XOR_linked_list](https://en.wikipedia.org/wiki/XOR_linked_list)

~~~
majewsky
> 2\. The algorithm behind rsync.

Do you mean the rolling checksum part?

~~~
wmu
All parts. I like how the simple parts were composed to build a pretty
complicated algorithm.

~~~
The_Hoff
Yeah, I assume majewsky is wondering if you mean just the rolling checksum
part because that's the most complex part. Rsync, as a whole, is certainly one
of my favorites and the algorithm(s) it uses are some of the ones I most often
call explicitly.

------
cperciva
I feel I have to mention the algorithm for matching with mismatches which I
published in my thesis
([http://www.daemonology.net/papers/thesis.pdf](http://www.daemonology.net/papers/thesis.pdf)),
simply because it gave me a legitimate opportunity to use the phrase "Fourier
Transform of the FreeBSD kernel".

------
toomanybeersies
As basic as it is, and completely uninventive, I've always loved Dijkstra's
algorithm.

It was probably the algorithm that cemented my love of computer science, it's
such an elegant solution to a problem and so simple once you understand it.

~~~
taeric
Why "uninventive?"

~~~
samcal
I believe GP meant that picking Dijkstra's is an uninventive answer to the
question of a favorite algorithm, since it's very well-known.

~~~
taeric
Ah that makes sense.

------
daurnimator
The burrows-wheeler transform is pretty interesting
[https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...](https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform).
It's behind the bzip compression format.
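
A naive C sketch of the forward transform (sort all rotations, take the last
column; my own code, and quadratic-ish, so a demo rather than anything
bzip2-sized):

```c
#include <stdlib.h>
#include <string.h>

static int cmp(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

// Naive Burrows-Wheeler transform: s must end in a unique sentinel ('$').
// Build every rotation, sort them, and read off the last column.
char *bwt(const char *s) {
    size_t n = strlen(s);
    char **rot = malloc(n * sizeof *rot);
    for (size_t i = 0; i < n; i++) {   // rotation i = s[i..] + s[..i]
        rot[i] = malloc(n + 1);
        for (size_t j = 0; j < n; j++) rot[i][j] = s[(i + j) % n];
        rot[i][n] = '\0';
    }
    qsort(rot, n, sizeof *rot, cmp);
    char *out = malloc(n + 1);
    for (size_t i = 0; i < n; i++) out[i] = rot[i][n - 1];
    out[n] = '\0';
    for (size_t i = 0; i < n; i++) free(rot[i]);
    free(rot);
    return out;
}
```

The textbook example: bwt("banana$") yields "annb$aa", which groups equal
characters together and is what makes the later compression stages work.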

~~~
psi-squared
This is related to my current favourite algorithm: Because the BWT is closely
related to the suffix tree of the original string, there's an algorithm to
search for a substring of length 'm' in a BWT-ed string of length 'n' in O(m
log n) time!

The one downside is that you have to pre-process the string, which takes O(n)
time and between 5n and 9n space depending on exactly how you do it. But after
that, you can do as many searches as you want practically "for free".

There's a paper outlining the various algorithms available here: (PDF)
[http://www.cosc.canterbury.ac.nz/research/reports/HonsReps/2...](http://www.cosc.canterbury.ac.nz/research/reports/HonsReps/2002/hons_0205.pdf)

------
kul_
I have never seen anything more elegant than the disjoint-set data structure:
[https://en.wikipedia.org/wiki/Disjoint-set_data_structure](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)
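
The whole structure fits in a few lines of C (a sketch with my own names,
using the two classic optimizations):

```c
// Disjoint-set forest with path compression and union by rank.
// Amortized cost per operation is near-constant (inverse Ackermann).
#define MAXN 1000

static int parent[MAXN], rank_[MAXN];

void dsu_init(int n) {
    for (int i = 0; i < n; i++) { parent[i] = i; rank_[i] = 0; }
}

int dsu_find(int x) {
    if (parent[x] != x)
        parent[x] = dsu_find(parent[x]);   // path compression
    return parent[x];
}

void dsu_union(int a, int b) {
    a = dsu_find(a); b = dsu_find(b);
    if (a == b) return;
    if (rank_[a] < rank_[b]) { int t = a; a = b; b = t; }
    parent[b] = a;                         // attach shorter tree under taller
    if (rank_[a] == rank_[b]) rank_[a]++;
}
```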

~~~
aryamaan
+1. I fell in love with the union-find data structure as well when I got to
know about it, and it always felt good to solve an algo problem with it.

------
jnpatel
Bloom filters:
[https://en.wikipedia.org/wiki/Bloom_filter](https://en.wikipedia.org/wiki/Bloom_filter)

~~~
jones1618
Agreed. Bloom filters are amazingly useful for checking membership in a set
while taking much less space than a database or hash table. They achieve this
by being crazily "probabilistic", sacrificing a small amount of accuracy for
speed and compactness. Still, this sort of broken, degenerate hash table has
found a ton of uses in everything from spell checkers to distributed content
networks.
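
A toy C sketch of the idea (sizes and hash functions are my own arbitrary
choices): set k bits per key on insert, and report "maybe present" only if
all k bits are set. False positives are possible, false negatives never.

```c
#include <stdbool.h>
#include <stdint.h>

// Toy Bloom filter: one bit array, k=2 hashes derived from FNV-1a with
// different seeds.
#define BLOOM_BITS 1024

static uint8_t bloom[BLOOM_BITS / 8];

static uint32_t fnv1a(const char *s, uint32_t seed) {
    uint32_t h = 2166136261u ^ seed;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

void bloom_add(const char *key) {
    uint32_t a = fnv1a(key, 0) % BLOOM_BITS;
    uint32_t b = fnv1a(key, 0x9e3779b9u) % BLOOM_BITS;
    bloom[a / 8] |= 1u << (a % 8);
    bloom[b / 8] |= 1u << (b % 8);
}

bool bloom_maybe_contains(const char *key) {
    uint32_t a = fnv1a(key, 0) % BLOOM_BITS;
    uint32_t b = fnv1a(key, 0x9e3779b9u) % BLOOM_BITS;
    return (bloom[a / 8] >> (a % 8) & 1) && (bloom[b / 8] >> (b % 8) & 1);
}
```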

------
ewwertee2454
Metropolis-Hastings:

[https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_al...](https://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm)

[http://michaeljflynn.net/2015/06/01/my-favorite-algorithm-
me...](http://michaeljflynn.net/2015/06/01/my-favorite-algorithm-metropolis-
hastings/)

~~~
useanalias
This is possibly my favorite as well. MCMC in general is pretty magical.

------
elihu
ROAM: Real-time Optimally Adapting Meshes
[https://graphics.llnl.gov/ROAM/roam.pdf](https://graphics.llnl.gov/ROAM/roam.pdf)

Whitted's recursive ray tracing algorithm
[https://en.wikipedia.org/wiki/Ray_tracing_(graphics)#Recursi...](https://en.wikipedia.org/wiki/Ray_tracing_\(graphics\)#Recursive_ray_tracing_algorithm)

PageRank

------
110011
Here's a rather surprising one. There's an algorithm to generate a uniform
random integer in the range 1..n with a KNOWN factorization. That means the
algorithm is like a regular random number generator, except it will also tell
you how to factor its output! Magic? Yes! This always surprises people and I
love results like this.

Search for the paper by Adam Kalai on how to generate random factorized
integers easily. It uses log^2 n primality tests on average. The result is
originally due to Eric Bach, who managed it using only log n primality tests,
but with a more complex algorithm.

------
gwicks56
Diffie-Hellman

One of those algorithms where you go "Shit, so obvious and also so completely
genius that I could never have thought of it."

Also, the internet as we know it would not work without it.
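
The whole exchange is just modular exponentiation. A toy C sketch using the
textbook parameters p=23, g=5 (insecure demo numbers, obviously):

```c
#include <stdint.h>

// Square-and-multiply modular exponentiation: base^exp mod m.
uint64_t modpow(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1 % m;
    base %= m;
    while (exp) {
        if (exp & 1) result = result * base % m;
        base = base * base % m;
        exp >>= 1;
    }
    return result;
}
```

Alice picks secret a=4 and sends A = modpow(5, 4, 23) = 4; Bob picks secret
b=3 and sends B = modpow(5, 3, 23) = 10. Both then compute the same shared
secret: modpow(B, 4, 23) == modpow(A, 3, 23) == 18, while an eavesdropper
only ever sees p, g, A and B.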

------
aserafini
Ukkonen's algorithm for linear time suffix tree construction. Knuth called it
'the algorithm of 1973'.

------
rbritton
Token Bucket. Very lightweight way to implement rate limiting.

PHP implementation: [https://github.com/bandwidth-throttle/token-
bucket](https://github.com/bandwidth-throttle/token-bucket)
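
The core is tiny. A generic C sketch (my own names, not the PHP library's
API), using lazy refill so no background timer is needed:

```c
#include <stdbool.h>

// Token bucket: at most `capacity` tokens, refilled at `rate` tokens/second.
// Each request consumes one token; a request with no token is rejected.
typedef struct {
    double tokens;     // current fill level
    double capacity;   // burst size
    double rate;       // refill rate, tokens per second
    double last;       // timestamp of the last refill
} bucket;

void bucket_init(bucket *b, double capacity, double rate, double now) {
    b->tokens = capacity;
    b->capacity = capacity;
    b->rate = rate;
    b->last = now;
}

bool bucket_allow(bucket *b, double now) {
    b->tokens += (now - b->last) * b->rate;      // lazy refill on access
    if (b->tokens > b->capacity) b->tokens = b->capacity;
    b->last = now;
    if (b->tokens >= 1.0) { b->tokens -= 1.0; return true; }
    return false;
}
```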

~~~
Yokohiii
Didn't know it by that name:

[http://nginx.org/en/docs/http/ngx_http_limit_req_module.html](http://nginx.org/en/docs/http/ngx_http_limit_req_module.html)

------
zelah
Linear Hybrid Cellular Automata:
[https://pdfs.semanticscholar.org/7835/161f253ab117a3666fa8e7...](https://pdfs.semanticscholar.org/7835/161f253ab117a3666fa8e74e008782a789cd.pdf)

Parallelizable and efficient method for producing high quality pseudo random
bits.

Designed to achieve maximal period (with 500 bit state the period of
repetition is one less than 2^500).

Can be run backwards or forwards. Running it backwards undoes running it
forwards and reproduces the pseudo random bits in reverse order.

------
NTDF9
Tomasulo's algorithm:
[https://en.wikipedia.org/wiki/Tomasulo_algorithm](https://en.wikipedia.org/wiki/Tomasulo_algorithm)

A modified version of it is used for out-of-order execution in most CPUs.

------
gjvc
Great question. Without a doubt, my favourite is the Burrows-Wheeler transform
(as used by bzip2)

[https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...](https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform)

[https://www.youtube.com/watch?v=4n7NPk5lwbI](https://www.youtube.com/watch?v=4n7NPk5lwbI)

------
akkartik
When I first read of HashLife I had the most extraordinarily vivid dreams of
breaking all complexity barriers just by finding the right memoization scheme:
[http://www.drdobbs.com/jvm/an-algorithm-for-compressing-
spac...](http://www.drdobbs.com/jvm/an-algorithm-for-compressing-space-
and-t/184406478)

------
ryandrake
R* tree for indexing spatial data [1]. Also, if you enjoy spatial data
structures, you can't go wrong with Samet's[2] _The Design and Analysis of
Spatial Data Structures_ and _Applications of Spatial Data Structures_.

1:
[https://en.wikipedia.org/wiki/R*_tree](https://en.wikipedia.org/wiki/R*_tree)

2:
[https://www.cs.umd.edu/~hjs/design.html](https://www.cs.umd.edu/~hjs/design.html)

------
pedrorijo91
Randomized algorithms, specifically Monte Carlo (probably because it was the
first kind I explored), have fascinated me since I met them. The idea of
having a very simple, efficient method to look for a solution, instead of a
very complicated one, and still getting a near-optimal solution is amazing.

([https://en.wikipedia.org/wiki/Monte_Carlo_algorithm](https://en.wikipedia.org/wiki/Monte_Carlo_algorithm))
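
The classic introductory example of the simple-but-good-enough idea is
estimating pi by throwing random points at the unit square (my own sketch):

```c
#include <stdlib.h>

// Monte Carlo estimate of pi: sample points uniformly in the unit square
// and count how many land inside the quarter circle x^2 + y^2 <= 1.
// The hit ratio converges to pi/4.
double estimate_pi(int samples, unsigned seed) {
    srand(seed);
    int inside = 0;
    for (int i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0) inside++;
    }
    return 4.0 * inside / samples;
}
```

The error shrinks like 1/sqrt(samples), regardless of dimension, which is
exactly why the method stays useful where exact approaches blow up.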

------
bananicorn
I really like Fleury's algorithm[0] - I used it to generate more efficient
paths to draw on a canvas, for the purpose of displaying simple wireframe
models. It may not be the most efficient, but I really like the simplicity.

[0] [http://www.geeksforgeeks.org/fleurys-algorithm-for-
printing-...](http://www.geeksforgeeks.org/fleurys-algorithm-for-printing-
eulerian-path/)

------
110011
There are a lot of great answers here. Let me add another one I really love:
the Knuth-Morris-Pratt algorithm for string matching. There are many features
of it I especially love. First of all, it takes a simple idea (when a
mismatch occurs it's often possible to shift the pattern a lot more than just
by one position) and takes it to its logical conclusion (that of defining and
computing the so-called border lengths of prefixes). Second, the analysis of
the precomputation step (the border lengths) is tricky, and amortised, but
plainly obvious once you see it.

It shows the importance of making a good definition and thinking differently
(for the run time analysis I thought for a while and failed to come up with an
argument why it runs in linear time, but once you see it differently it's
obvious :))

I only recently had the chance to understand this algorithm. Here's a little
(annotated) gist I made after understanding it:
[https://gist.github.com/quantumelixir/3c410dd4a0afe0f2514e0f...](https://gist.github.com/quantumelixir/3c410dd4a0afe0f2514e0f554cddab05)
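
For reference, a compact C version of both steps (the border-length table and
the scan; my own code, independent of the gist):

```c
#include <string.h>

// Knuth-Morris-Pratt. border[i] is the length of the longest proper prefix
// of pat[0..i] that is also a suffix of it. Returns the index of the first
// occurrence of pat in text, or -1 if there is none.
int kmp_search(const char *text, const char *pat) {
    int n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    int border[256];                  // assumes pattern shorter than 256
    border[0] = 0;
    for (int i = 1, k = 0; i < m; i++) {
        while (k > 0 && pat[i] != pat[k]) k = border[k - 1];  // fall back
        if (pat[i] == pat[k]) k++;
        border[i] = k;
    }
    for (int i = 0, k = 0; i < n; i++) {
        while (k > 0 && text[i] != pat[k]) k = border[k - 1]; // shift pattern
        if (text[i] == pat[k]) k++;
        if (k == m) return i - m + 1; // full match ends at position i
    }
    return -1;
}
```

The amortised argument mentioned above is visible here: k only grows by one
per character of text, so all the fall-backs combined can't exceed n.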

------
pmalynin
Number Theoretic Transform, i.e. the Discrete Fourier Transform over a finite
field.

The algorithms for computing it are beautiful and have enabled much of the
technology we see around us today.

~~~
pitaj
This. FFT is crazy awesome.

------
haidrali
I really like the majority voting algorithm for finding the majority element
of an array (the one which appears more than half the time). It's the
simplest and most elegant algorithm I have studied.

[http://www.geeksforgeeks.org/majority-
element/](http://www.geeksforgeeks.org/majority-element/)
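
The Boyer-Moore majority vote fits in a few lines of C (a sketch with my own
names):

```c
// Boyer-Moore majority vote: one pass, O(1) space. Returns the candidate;
// if a majority element is not guaranteed to exist, a second pass is needed
// to confirm the candidate really occurs more than n/2 times.
int majority_candidate(const int *a, int n) {
    int candidate = a[0], count = 1;
    for (int i = 1; i < n; i++) {
        if (count == 0) { candidate = a[i]; count = 1; } // adopt new candidate
        else if (a[i] == candidate) count++;             // same: vote up
        else count--;                                    // different: cancel out
    }
    return candidate;
}
```

The intuition: every non-majority element can cancel at most one majority
element, and the majority has votes to spare.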

~~~
sameera_sy
I'm surprised that the solution doesn't include the hashmap implementation,
which also gives O(n) and is much simpler. But the algorithm is interesting
though!

~~~
haidrali
@sameera_sy that's the point: the hash map solution takes O(n) space as well.
Consider a situation where you need a memory-efficient solution.

------
drewmate
Ford-Fulkerson algorithm for finding the maximum flow across a graph:
[https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorit...](https://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm)

The algorithm itself is interesting enough, but when applied to bipartite
graphs, you can solve some tough problems very efficiently. For instance, one
of my favorites is how Santa could possibly match a million kids with a
million different gifts in order to maximize happiness. It turns out you
structure the problem as a bipartite graph with nodes on one side for gifts
and children on the other, and Ford-Fulkerson can guarantee the best matching
in polynomial time (no small feat, given how many possible combinations there
are!)
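
A small C sketch of the BFS variant (Edmonds-Karp) on a capacity matrix; for
the matching problem above you'd wire a source to every kid and every gift to
a sink with unit capacities (my own code, sized for a demo, not for a million
nodes):

```c
#include <string.h>

// Edmonds-Karp (BFS-based Ford-Fulkerson) on a small capacity matrix.
#define NV 16

static int cap[NV][NV];   // cap[u][v]: remaining capacity of edge u -> v

int max_flow(int n, int s, int t) {
    int flow = 0;
    for (;;) {
        int parent[NV], queue[NV], head = 0, tail = 0;
        memset(parent, -1, sizeof parent);
        parent[s] = s;
        queue[tail++] = s;
        while (head < tail && parent[t] == -1) { // BFS: shortest augmenting path
            int u = queue[head++];
            for (int v = 0; v < n; v++)
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    queue[tail++] = v;
                }
        }
        if (parent[t] == -1) return flow;        // no augmenting path left
        int b = 1 << 30;                         // bottleneck along the path
        for (int v = t; v != s; v = parent[v])
            if (cap[parent[v]][v] < b) b = cap[parent[v]][v];
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= b;              // consume forward capacity
            cap[v][parent[v]] += b;              // add residual back-edge
        }
        flow += b;
    }
}
```

The residual back-edges are what let the algorithm "undo" an earlier greedy
assignment, which is exactly why it finds the optimal matching.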

------
syntaxing
I've been diving into a bunch of the newer (computational) optimization
algorithms, but the old school ones that they teach in school are still
extremely interesting and relevant to me. I've always been a huge fan of
"combinatorial optimization" [1]. I saw someone else mention this, but the
simplex algorithm is awesome as well! [2]

[1]
[https://en.wikipedia.org/wiki/Combinatorial_optimization](https://en.wikipedia.org/wiki/Combinatorial_optimization)
[2] [http://gizmodo.com/5934150/the-algorithm-that-controls-
your-...](http://gizmodo.com/5934150/the-algorithm-that-controls-your-life)

------
deepnotderp
Also, fft gets an obligatory mention

~~~
ewwertee2454
Cooley-Tukey algorithm.

~~~
deepnotderp
Split-Radix

------
jones1618
Cuckoo Hashing:
[http://www.lkozma.net/cuckoo_hashing_visualization/](http://www.lkozma.net/cuckoo_hashing_visualization/)

"An elegant method for resolving collisions in hash tables... It has the rare
distinction of being easy to implement and efficient both in theory and
practice."

It does an amazing job of packing values into N hash tables so there's very
little dead space but lookups are very fast.
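
A toy C sketch of the insertion loop (my own arbitrary table sizes and hash
functions; a real implementation rehashes instead of giving up):

```c
#include <stdbool.h>

// Toy cuckoo hashing for nonzero int keys: two tables, two hash functions.
// On collision, evict the resident key into its other table; give up after
// too many displacements (where a real table would grow or rehash).
#define CK_SIZE 64
#define CK_MAX_KICKS 32

static int t1[CK_SIZE], t2[CK_SIZE];   // 0 means empty slot

static unsigned h1(int k) { return (unsigned)k * 2654435761u % CK_SIZE; }
static unsigned h2(int k) { return ((unsigned)k ^ 0x5bd1e995u) * 2246822519u % CK_SIZE; }

bool cuckoo_lookup(int key) {
    return t1[h1(key)] == key || t2[h2(key)] == key;
}

bool cuckoo_insert(int key) {
    if (cuckoo_lookup(key)) return true;
    for (int kick = 0; kick < CK_MAX_KICKS; kick++) {
        // Evicted keys alternate tables: a key kicked out of t1 goes to t2,
        // and vice versa.
        int *slot = (kick % 2 == 0) ? &t1[h1(key)] : &t2[h2(key)];
        int evicted = *slot;
        *slot = key;                   // place key, maybe evicting someone
        if (evicted == 0) return true;
        key = evicted;                 // re-place the evicted key next round
    }
    return false;                      // too full: would need a rehash
}
```

Lookups are the selling point: at most two probes, worst case, ever.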

------
ww520
[https://en.wikipedia.org/wiki/Extendible_hashing](https://en.wikipedia.org/wiki/Extendible_hashing).

------
sAbakumoff
1\. Karatsuba's algorithm for fast multiplication.

2\. Heap's algorithm for generating permutations.

Mostly because they clearly demo the power of the human brain.
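
Karatsuba's identity in a small C sketch, splitting 64-bit integers by
decimal digits (my own demo; real implementations split big-integer limbs in
binary, and for huge inputs the intermediate sums here would overflow):

```c
#include <stdint.h>

// Karatsuba: three recursive multiplies instead of four, using
//   (a*10^m + b)(c*10^m + d)
//     = ac*10^2m + ((a+b)(c+d) - ac - bd)*10^m + bd
static uint64_t ipow10(int m) {
    uint64_t p = 1;
    while (m--) p *= 10;
    return p;
}

uint64_t karatsuba(uint64_t x, uint64_t y) {
    if (x < 10 || y < 10) return x * y;  // base case: single digit
    int digits = 1;
    for (uint64_t t = (x > y ? x : y); t >= 10; t /= 10) digits++;
    int m = digits / 2;
    uint64_t pm = ipow10(m);
    uint64_t a = x / pm, b = x % pm;     // split x = a*10^m + b
    uint64_t c = y / pm, d = y % pm;     // split y = c*10^m + d
    uint64_t ac = karatsuba(a, c);
    uint64_t bd = karatsuba(b, d);
    uint64_t mid = karatsuba(a + b, c + d) - ac - bd;  // the saved multiply
    return ac * pm * pm + mid * pm + bd;
}
```

Saving one of the four multiplications at every level is what drops the cost
from O(n^2) to O(n^1.585).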

~~~
110011
I recently made the observation that Karatsuba found his multiplication
algorithm roughly ten years ahead of Strassen's matrix multiplication alg. And
given that the crucial idea (using identities to save one multiplication) is
already beautifully captured in Karatsuba's it takes away a little from the
latter :)

------
jetti
My favorite is the Hindley-Milner type system algorithm:
[https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...](https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system#Towards_an_algorithm)

It is what is used in ML-based languages to infer types, and apparently
Haskell uses it a bit too.

------
taeric
Knuth's DLX algorithm. Uses what he calls dancing links to perform
backtracking in basically constant space with a clever trick on doubly linked
lists.

He recently updated his implementation. It is very fun to read his notes on
it; in particular, he discusses things he had gotten wrong.

If you like poor JavaScript implementations, I can post mine again.

------
Mr_P
Simplex. If you can transform your NP hard optimization problem into an LP,
simplex can often work like magic.

~~~
nhaehnle
If you can transform your NP hard optimization problem into an LP, you have
proven P = NP because LPs can be solved in polynomial time.

That said, many combinatorial optimization problems that look quite similar to
NP hard problems have very nice and efficient LP formulations, and for many NP
hard problems, _integer_ programming-based methods (which in the end mostly
solve LP relaxations) are among the best algorithms. For example, planar TSP
can be solved for tens or even hundreds of thousands of nodes using the LP
relaxation, branch, and cut tool set.

Modern LP solvers don't just use the Simplex method though. I'm very partial
to the Simplex method myself, but in practice, Interior Point Methods are
often used.

~~~
110011
Adding to what you say, OP might have meant the use of LP solvers in branch
and bound type algorithms to solve hard problems after recasting them as
integer programs.

------
santaclaus
Kalman filter!

------
notforgot
It looks like SWIM scales better than Raft. Are there well-known production
systems that use it?

~~~
cube2222
SWIM is in no way an alternative to Raft, and vice-versa. SWIM is a membership
algorithm, Raft is a consensus algorithm. They are often used together. (a
SWIM alternative is plain Gossip and a Raft alternative is Paxos)

------
rurban
Cheney's two-finger garbage collector with forwarding pointers. Still beats
every other GC out there by a factor of 10, if you can afford 2x memory.

------
useanalias
Contrastive divergence for RBMs. Incredibly simple, but absolutely beautiful
how it modifies the energy surface. Also, just in general a cool use of Gibbs
sampling.

------
nojvek
Convolutional neural nets and backprop. That stuff is magical.

I also was fascinated as shit when I learnt about minimax and alpha beta
search.

Game theory and neural nets have always fascinated me.

------
kovrik
HyperLogLog

~~~
110011
I also like the moral predecessor of this algorithm (Morris counting algo I
think it's called) which solves the simpler problem of estimating the number
of elements in a stream with loglog space. Curiously it appears as an exercise
in CLRS.

------
wfunction
I find the DC3/skew algorithm for linear-time suffix array construction
mindblowingly clever!

------
pizza
AIXI

~~~
AstralStorm
Nonexistent.

------
wwwhatcrack
I generally appreciate divide and conquer algorithms of any type

------
jakeogh
Lamport's bakery algorithm is pretty neat.

------
CalChris
Someone already mentioned Tomasulo.

RSA.

------
deepnotderp
Backpropagation

------
chaotic-good
Sequitur

