Ask HN: What's your favorite elegant/beautiful algorithm?
733 points by algobuddy on Oct 17, 2018 | 491 comments



I always push my devs to really study the Bittorrent protocol. The elements of the protocol are all fairly easy to understand, but it's gorgeous to see how elegantly it solved a social problem rather than a computational problem. The tit-for-tat and fast-finish protocols are incredibly graceful ways to create virtuous cycles in a social/technical hybrid, and replaced very real vicious cycles in previous protocols.

Various TCP backoff algorithms are also a great study in how to approach problems with insufficient knowledge of the system under test, and in building desirable properties (reliable in-order delivery) from undesirable inputs (arbitrary packet loss, and loss in the face of congestion). I make all the kids try to implement their own, then walk through the history of TCP approaches to the modern day.
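The core idea behind most of those backoff schemes can be sketched in a few lines. This is a generic exponential-backoff-with-jitter sketch, not TCP's actual retransmission timer (the base, cap, and jitter strategy here are made up for illustration):

```python
import random

def backoff_delays(base=0.1, cap=60.0, attempts=8):
    # On each failure, double the waiting window (up to a cap), then
    # pick a random delay within it ("full jitter") so that many
    # senders retrying at once don't stay synchronized.
    delays = []
    for attempt in range(attempts):
        window = min(cap, base * (2 ** attempt))
        delays.append(random.uniform(0, window))
    return delays
```

The jitter is the part that breaks the vicious cycle: without it, every sender that backed off at the same moment retries at the same moment too.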


> The elements of the protocol are all fairly easy to understand, but it's gorgeous to see how elegantly it solved a social problem rather than a computational problem.

One more protocol / paper which fits this description, imo, is the Bitcoin paper [0]. It's easy and accessible to all and elegantly explains how to solve a complex problem with existing tools. Not sure it can be called an 'algorithm', though.

[0] - https://bitcoin.org/bitcoin.pdf


BitTorrent is foundational to the rise of decentralized stores and sharing: IPFS, Dat Protocol, WebTorrent, Storj, Holochain, etc.

An Introduction to Kademlia DHT & How It Works

http://gleamly.com/article/introduction-kademlia-dht-how-it-...


DHT algorithms are nice; Chord is really simple and easy to understand.
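Chord's core placement rule really does fit in a few lines. A toy sketch (the identifier-space size and node IDs are made up; real Chord adds finger tables for O(log n) lookups instead of this linear scan):

```python
from bisect import bisect_left

# Hypothetical ring: 6-bit identifier space (0..63) with made-up node IDs.
M = 6
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]  # kept sorted

def successor(key: int) -> int:
    # Chord's placement rule: a key lives on its "successor", the
    # first node whose ID is >= the key, wrapping around the ring.
    i = bisect_left(nodes, key % (2 ** M))
    return nodes[i] if i < len(nodes) else nodes[0]
```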


Any suggestions where I can read about the BitTorrent protocol?


There is an official document on the protocol [0], but it doesn't go much into the details. Theory.org's resource [1] does a much better job.

[0] - http://www.bittorrent.org/beps/bep_0003.html

[1] - https://wiki.theory.org/index.php/BitTorrentSpecification


I implemented a BitTorrent client as a sophomore in college, available here: https://github.com/war1025/Torrent

It worked pretty well. Used it for several years until I started making real money and decided I could buy things rather than pirate them.

I had only been coding for a year or two at that point, so it is probably filled with odd choices, but it also isn't super optimized like I would guess the more well-known clients are, so it might be easier to parse.


I'm the same way in that I haven't used torrents in a while. But one legit thing they're used for is Linux distros, so if you ever feel like helping in that endeavor you can seed out some ISOs.


I would be interested in that too :)


I second this; Kademlia has always struck me as one of the most beautiful and elegant algorithms out there; Bittorrent is wonderful too.


Bencode is such a piece of junk though.


Extremely simple one, but my favorite is an algorithm for determining if two words are anagrams of each other:

The Fundamental Theorem of Arithmetic states: "every integer greater than 1 either is a prime number itself or can be represented as the product of prime numbers and that, moreover, this representation is unique, up to (except for) the order of the factors."[1]

So to determine that two words are anagrams of each other, you can assign each letter to a unique prime number (a = 2, b = 3, c = 5 etc.), then compute the product of those numbers, and if they're equal then those two words are anagrams.

[1] - https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithme...
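A minimal sketch of the idea in Python (the letter-to-prime table is just the first 26 primes as described; it assumes lowercase a-z input, so real inputs would need normalization first):

```python
# 'a' -> 2, 'b' -> 3, ..., 'z' -> 101: the first 26 primes.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def signature(word: str) -> int:
    # Product of each letter's prime; by the fundamental theorem of
    # arithmetic, this is invariant under reordering of the letters.
    product = 1
    for ch in word.lower():
        product *= PRIMES[ord(ch) - ord('a')]
    return product

def is_anagram(a: str, b: str) -> bool:
    return signature(a) == signature(b)
```

Python's bignums make this painless, which (as noted downthread) hides the real cost of the growing product.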


Nice in theory, but in practice you wouldn't implement it like that, especially if the words can be longer than your machine integer allows. Sorting and comparing is more elegant than invoking a BigNum library, imho (and has smaller footprint). This shows that theoretical elegance != implementation elegance.


I was curious how soon overflowing a native integer would come up. The "worst case" would be all "z"s (which map to 101, the 26th prime), so how many characters does floor(log_101(2^n-1)) get you?

    int32  zzzz      (4 characters)
    uint32 zzzz      (4 characters)
    int64  zzzzzzzzz (9 characters)
    uint64 zzzzzzzzz (9 characters)
But that's the worst case, not many real words have several "z"s in them. How often are real words affected? I did some experiments with /usr/share/dict/words:

    1-( 36123/123115) = 70%  overflow int32
    1-( 39774/123115) = 68%  overflow uint32
    1-(117909/123115) = 4.2% overflow int64
    1-(118533/123115) = 3.7% overflow uint64


Hmm, nice result. That makes me want to map letters onto primes in order of frequency. So "e" is 2, "t" is 3, etc., see: http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/...


At this point, you're yak shaving, where the sane thing to do is simply to count the number of times each letter occurs.

Better complexity-wise as well, both in terms of speed and storage: O(n) and O(1), respectively. [1]

The proposed algorithm uses O(n) storage (to store the immense product), and given that integer multiplication complexity is somewhere between O(n) and O(n^2), we end up[2] with at least O(n^2) runtime complexity!

[1] ...assuming the input data is less than 15 million terabytes in size; O(n log n), O(log n) respectively for arbitrary length input. Storing the number of input bytes takes O(log n) space, and integer increment can take O(size).

[2]https://en.wikipedia.org/wiki/Computational_complexity_of_ma...


OK, on my (longer at 235886 words) /usr/share/dict/words, I find that:

  sequential encoding: 21882 words overflow 2**64 (9.3%)
  frequency encoding: 2945 words overflow 2**64 (1.2%)
Some other trivia:

  mean #bits for those words over 64 bits: 
         seq = 72.0; freq = 69.3
  largest #bits: 
         seq = 115.4; freq = 101.0
  word w/ largest #bits: 
         seq = thyroparathyroidectomize [1]
         freq = pathologicopsychological [n/a]
[1] https://www.merriam-webster.com/medical/thyroparathyroidecto...


Now add support for Unicode. :)


Or, for better time complexity (with a bit of extra space) making a map of char -> frequency and comparing the results.
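In Python that's nearly a one-liner with the standard library, since collections.Counter builds exactly that char -> frequency map:

```python
from collections import Counter

def is_anagram(a: str, b: str) -> bool:
    # Build char -> frequency maps for both strings and compare:
    # O(n) time, O(k) space for k distinct characters.
    return Counter(a) == Counter(b)
```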


That's Fourier transform!


Funny. :) But a Fourier transform is reversible.


The title asked for beautiful/elegant algorithms, not efficient ones! :)


In computing, inefficient algorithms are not beautiful.


I wrote up a comparison some time ago: https://github.com/remusao/anagrams-bench/blob/master/README...

Unfortunately, as you suggest, the complexity of arbitrary precision multiplication kicks in and performance suffers.


One might also choose not to bring in a BigNum library if you don't need to find every anagram. Here's an exhaustive list of anagrams I found with sorting and comparing that a quick Fundamental Theorem implementation missed:

   [ 'basiparachromatin', 'Marsipobranchiata' ]
   [ 'configurationism', 'misconfiguration' ]
   [ 'constructionism', 'misconstruction' ]
   [ 'pericardiacophrenic', 'phrenicopericardiac' ]
   [ 'anatomicophysiologic', 'physiologicoanatomic' ]
   [ 'petrographically', 'pterylographical' ]


Note this is out of 20K anagrams.

Here's my code: https://github.com/brlewis/brlewis.github.io/blob/master/201...

On my machine, Fundamental Theorem of Arithmetic finished in 1.703 seconds, sorting letters in 1.954 seconds.


This one came up a lot when I was researching the most efficient way to generate anagrams, but the numbers are growing way too fast. In the end, a frequency map and comparing the frequencies worked best. My use case is a bit special, as I'm creating anagram sentences, so a Trie or similar things wouldn't work. Check out https://anagrams.io if you're interested.


I have a pet implementation of the frequency map that I'm overly fond of, for ascii strings:

  (1) keep an array of length 127 that you re-use and set to 0 between calls
  (2) for each character in the first string, increment the array at the character's index
  (3) for each character in the second string, decrement the array at the character's index
If you end up with all 0s, it's an anagram.
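A direct Python translation of those three steps (using 128 slots to cover the full ASCII range; it assumes ASCII-only input, per the comment):

```python
def is_anagram(a: str, b: str) -> bool:
    # (1) fresh zeroed array, (2) increment per char of the first
    # string, (3) decrement per char of the second; all zeros at the
    # end means the strings are anagrams.
    counts = [0] * 128
    for ch in a:
        counts[ord(ch)] += 1
    for ch in b:
        counts[ord(ch)] -= 1
    return all(c == 0 for c in counts)
```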


Better to use a hash map instead of an array. When iterating the second string, as soon as you see a character without an existing value it's not an anagram. If you finish the second string, check that all values are zero.

This way you only need to check the character values you actually use, and not all 127. It also generalizes trivially to larger character sets.


An array is simply an optimized specialization of a hash map.


His approach is the same as yours. As soon as the algorithm sees a character from second string where the value for that char in the array is zero, the second string is not an anagram of first string and can return false immediately.


Doesn't that mean in the end you have to check all 127 values to see if they are 0?

Or 64, if you store numbers as 32-bit integers and compare them as 64-bit using a union type.


You could also keep a counter of the number of non-zero entries and update on zero/non-zero transitions.


Given the constraints of the problem, it's the sanest thing to do.


Diffie–Hellman. I know that cryptography can get much fancier and more clever, but Diffie–Hellman took a concept my intuition told me was impossible and showed that it's possible in a really simple, elegant way. Learning about it was the first time I realized how beautiful the math behind computer science is.

It's also a great insight into just how fundamental the concept of computational complexity is.
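For anyone who hasn't seen it, the whole exchange fits in a few lines. A toy sketch with deliberately tiny numbers (real deployments use primes of 2048+ bits; nothing here is secure):

```python
# Publicly agreed parameters: a prime modulus and a generator.
p = 23
g = 5

alice_secret = 6    # never transmitted
bob_secret = 15     # never transmitted

# Each side publishes g^secret mod p over the open channel.
A = pow(g, alice_secret, p)
B = pow(g, bob_secret, p)

# Each side raises the other's public value to its own secret:
# both arrive at g^(a*b) mod p, which an eavesdropper seeing only
# (p, g, A, B) can't feasibly compute without taking a discrete log.
alice_key = pow(B, alice_secret, p)
bob_key = pow(A, bob_secret, p)
```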


And here is an elegant explanation of the Diffie-Hellman protocol https://www.youtube.com/watch?v=YEBfamv-_do


Another explanation that is not as technically accurate, but that really made it click for me how it's possible to establish private communication in the open, is the following:

You put your secret message in a box, put a lock on it that only you have the key to, and send it to the other party. They, unable to open it, put on a second lock of their own, and send it back. You remove your lock, leaving theirs, and once again send it to the other party. Finally, they remove their lock too and can open the box without anyone else having had that possibility.

What can also be inferred from this, is how DH is vulnerable to a man-in-the-middle attack. Someone involved in the delivery could pretend to you to be the other party and to them to be you.


Absolutely amazing. This is one of those instances when analogy is exact (mixing colors) and doesn't dilute the original motif (One way function).


> a concept my intuition told me was impossible and showed that it's possible in a really simple, elegant way

I agree but, to be fair, the key ingredient ("discrete logarithms are hard") is not simple at all.


I am so glad that I took Cryptography as an elective in college even though the course brought down my grades :)


So true. Securely exchanging keys is an abstract concept until you see those paint colors mixing!


I love the idea of how essentially we have never met and we're shouting across a crowded room but no one else can understand our conversation.


Link?



There's also this. The paint mixing analogy starts around 3 minutes in.

https://www.youtube.com/watch?v=YEBfamv-_do


That's awesome, thanks


Thank you for this. This probably ranks as the algorithm with the greatest coolness / simplicity ratio I've heard of.


+1 for Diffie-Hellman... and the paint analogy.


I was just blown away when my professor taught me this.


I second this.

Totally blew my mind back then when I was trying to understand how asymmetrical cryptography works.


My first programming project was a tic-tac-toe game with a computer opponent. I painstakingly copy-pasted about a hundred nested `if` statements to check for a winner & decide the computer’s next move.

Several years later I saw a Matlab demo that did this by indexing the grid using values from a 3x3 magic square[1]. In a magic square, every row, column, and diagonal has the same sum. So checking for a winner was just checking if a player’s three moves added up to 15. Finding a space that would make the computer win, or block the human opponent from winning, was subtracting two moves from 15 and checking if that spot was available.

[1]: https://en.wikipedia.org/wiki/Magic_square
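A sketch of that lookup in Python (the particular magic-square layout and the 0-8 cell indexing are my own illustration, not the Matlab demo's code):

```python
from itertools import combinations

# One standard 3x3 magic square; board cells are indexed 0-8.
MAGIC = [2, 7, 6,
         9, 5, 1,
         4, 3, 8]

def winning_move(player_cells, open_cells):
    # For each pair of the player's cells, 15 minus their magic values
    # names the third value that would complete a line. If an open cell
    # carries it, take that cell (or block it, if the pair belongs to
    # the opponent).
    for a, b in combinations(player_cells, 2):
        need = 15 - MAGIC[a] - MAGIC[b]
        for c in open_cells:
            if MAGIC[c] == need:
                return c
    return None
```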


This is cool. I never actually realized that magic squares exhausted all of the sums to 15; in other words, there is no triplet which sums to 15 and is not already a row, column, or diagonal. No false positives, so to speak.

I wonder if this is true of larger magic squares.


It is not. There are indeed exactly 8 ways to pick three elements from the set [1, 9] such that their sum equals 15. But for 4x4, there are 86 (!) ways to pick four elements from [1, 16] so that their sum is 34 (the magic constant for 4x4), whereas only ten of the combinations are used in the construction of the square.

Some Python:

  from collections import Counter
  from itertools import combinations

  def number_of_sums(n, M):
    combs = combinations(range(1,n**2+1), n)
    sums = (sum(c) for c in combs)
    return Counter(sums)[M]

  print(number_of_sums(3, 15))
  print(number_of_sums(4, 34))


In high school I made a tic-tac-toe game that implemented what I thought was a clever trick for its has-winner? routine: starting with a 3x3 grid of ones, I multiplied each of its rows, columns, and diagonals by a unique prime number. This results in a board that internally looks something like

238, 22, 494

21, 10659, 39

665, 55, 1105

so checking for a winner was equivalent to checking if the gcd of a player’s spaces was greater than 1.

Much less efficient! But it looks like I was on the right track in searching for compact numerical alternatives to traversing the board and measuring strides.

Edit: fixed an arithmetic error
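The board from the comment can be checked like this in Python (the numbers are copied from the grid above; testing every 3-cell subset also handles the 4- and 5-move cases raised downthread):

```python
from functools import reduce
from itertools import combinations
from math import gcd

# Each cell is the product of the primes assigned to the row, column,
# and diagonal(s) passing through it (rows 2/3/5, columns 7/11/13,
# main diagonal 17, anti-diagonal 19).
BOARD = [238, 22, 494,
         21, 10659, 39,
         665, 55, 1105]

def has_winner(cells):
    # A completed line means some prime divides all three of its cells,
    # i.e. their gcd exceeds 1. Check every 3-cell subset of the
    # player's cells (at most C(5,3) = 10 of them).
    return any(reduce(gcd, (BOARD[i] for i in trio)) > 1
               for trio in combinations(cells, 3))
```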


Nitpick: this only works if the player played exactly three moves (in the version of Tic Tac Toe I know you can play for up to 5 moves). Generalizing to 4 moves is fairly simple, to 5 moves less so; you can test all '3 from 5' combinations, which is only 10 possibilities, but I doubt the code to do this is easy to read and understand after the fact.


I'd disagree here; I still think it's quite elegant. For example, in python, it would look something like

  from itertools import combinations
  magic_square = [2,7,6,
                  9,5,1,
                  4,3,8]
  def did_user_win(moves):
    # Convert moves to magic square values
    magic_values = [magic_square[move] for move in moves]
    # Check if any sum of three moves equals 15
    for three_values in combinations(magic_values,3):
      if sum(three_values) == 15: return True
    return False


Point taken. I love itertools :D


I thought I had an original and clever solution to TTT... my solution was to define all winning combinations as sets of (x,y) coordinate pairs and each time a player made a move, check to see if any of the winning combinations was a subset of the player's set of coordinate pairs.


Oh wow. My mind is boggled. Boggled I say!


The Gale-Shapley algorithm, which solves the "stable marriage" problem. It earned the 2012 Nobel Prize in Economic Sciences for its wide-ranging use in medicine, education, and resource allocation. It's fairly easy to implement a basic version, it feels intuitively obvious once explained, and it has been applied to everything from organ transplants to student placement in elementary schools. Really, it fits any place you have two groups whose individual members have a preferential ranking of the other side.

https://en.wikipedia.org/wiki/Stable_marriage_problem#Soluti...
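A compact Python sketch of the classic proposer-optimal version (the names and preference lists in the usage example are made up for illustration):

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    # Each dict maps a member to a list of the other side's members,
    # most preferred first. Returns a stable matching as
    # {acceptor: proposer}.
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    free = list(proposer_prefs)              # proposers without a partner
    next_choice = {p: 0 for p in proposer_prefs}
    match = {}                               # acceptor -> proposer
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]  # best one not yet tried
        next_choice[p] += 1
        if a not in match:
            match[a] = p                     # acceptor was free: engage
        elif rank[a][p] < rank[a][match[a]]:
            free.append(match[a])            # acceptor trades up
            match[a] = p
        else:
            free.append(p)                   # rejected; proposes again later
    return match

pairs = gale_shapley({'A': ['X', 'Y'], 'B': ['X', 'Y']},
                     {'X': ['B', 'A'], 'Y': ['A', 'B']})
```

The "intuitively obvious" part is the loop body: every proposal either fills a free slot or strictly improves some acceptor's partner, so it terminates, and no blocking pair can survive it.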


I remember when this "Nobel Prize" was announced I immediately started reading the paper and trying to figure out how it could work in code. It seems like it could have amazing economic potential if put to good use, but until your comment I hadn't known of any real-world uses outside of organ transplantation. Very excited to see it being appreciated by more computer scientists! I need to look into it again for sure!


The organ transplant algorithm is not really that similar to the stable marriage algorithm. It's a different mechanism based on the housing allocation problem. See https://www.nber.org/papers/w10002

I think they are lumped together because they are both "mechanism design" problems that were studied by Roth and part of his prize.


Interesting! I knew that med school applicants were accepted on some sort of ranking and matching process but didn't realize there was a real algorithm behind it. This, or one similar, seems to be what's behind it.


There's no matching process at the beginning of medical school, but there is for assigning med school graduates to residencies. This variant of the problem is NP-hard, so there's no exact solution, but matching still works pretty well. https://web.stanford.edu/~alroth/papers/rothperansonaer.PDF has all the details.


When you are referring to this "variant", are you referring to maximal matching, or bipartite matching when the number of people and schools don't equal?


Not the parent, but I think they refer to the fact that with admissions, matches are permanent (once a student accepts a school's offer, said offer can't be rescinded), while Gale-Shapley requires provisional pairings until all the rounds have been gone through and a final result is available.


The issue is that couples want to be placed together. That makes everything trickier than if people just placed independently.


Kleinberg and Tardos start off their algorithms book with this problem, except they formulate it in terms of college juniors seeking internships and employers.


NYC public schools use a version of this too: https://www.nytimes.com/2014/12/07/nyregion/how-game-theory-...


The real shocker is that it took until 2012 for this pivotal algorithm to win a Nobel Prize. David Gale had passed by then! The Nobel committee must be backlogged to Hell and back.


The "2 watched literal" algorithm used in sat solvers (here is a random blog post I found by googling http://haz-tech.blogspot.com/2010/08/whos-watching-watch-lit... ).

The algorithm has many lovely features. It is very efficient -- it is used in basically every SAT solver with minimal modifications. It's not entirely trivial that it works, particularly the backtracking part. It is also a good example of an algorithm that is extremely hard to do functionally (the whole thing is about moving pointers around).

It's also a good example of practical vs. theoretical efficiency. In the worst case it is no more efficient than the algorithm it replaced; in practice it is hundreds of times more efficient. The "not moving back" feels like it should save at most half the time, but in fact it often speeds things up by 10 times (this is because watches tend to get moved until they reach some "boring" variables, where they lie undisturbed).


Awesome, thanks for mentioning this! I completely forgot about 2-literal watching and I hadn't ever thought about the idea that it'd be difficult to implement functionally. It's absolutely clever. And one other intuition I would like to add to it is that, from the way I see it at least, it is a cheap "detector" for when 3-SAT degenerates into 2-SAT, which finally becomes polynomial-time solvable -- which is exactly the kind of thing you want to have a detector for!


this is a neat way to avoid moving those pointers around: https://www.cs.kent.ac.uk/pubs/2010/2970/content.pdf


While this is very cute I'd be interested to know what kind of code it turns into -- I imagine it will either boil down to the same, or will be much slower (but I am happy to be impressed)


Fast inverse square root, sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates 1/√x, the reciprocal (or multiplicative inverse) of the square root of a 32-bit floating-point number x in IEEE 754 floating-point format.

https://en.m.wikipedia.org/wiki/Fast_inverse_square_root

I especially love the story around it.
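For the curious, the bit trick translates to Python via struct (a sketch of the classic float32 version; the magic constant and the single Newton-Raphson step are the ones from the Quake III source):

```python
import struct

def fast_inv_sqrt(x: float) -> float:
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    # The magic constant plus a right shift gives a rough guess at the
    # bit pattern of 1/sqrt(x) by (approximately) halving and negating
    # the exponent.
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    # One Newton-Raphson iteration refines the estimate.
    return y * (1.5 - 0.5 * x * y * y)
```

After the single Newton step the result is within roughly 0.2% of the true value, which was plenty for lighting calculations.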


It’s nice but it’s deprecated, we now have hardware support for that in CPUs.

PC: https://software.intel.com/sites/landingpage/IntrinsicsGuide...

ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....


This is how I tell people that Computer Science can still have magic to it.


Binary search. Very simple, incredibly powerful; it can search over data or over a mathematical function, and it's the basis for other CS concepts.

JS implementation: https://gist.github.com/netgusto/90c8e0e7019a832cbf95eac58e1...


Regarding "very simple" - as I recall from some book, the first bug-free implementation appeared only several years after the algorithm's initial description. From Wikipedia: "A study published in 1988 shows that accurate code for it is only found in five out of twenty textbooks" (https://en.wikipedia.org/wiki/Binary_search_algorithm).


Someone did a blog about that book, and challenged people to do their own bug-free implementations. https://reprog.wordpress.com/2010/04/19/are-you-one-of-the-1...


It was in Programming Pearls, IIRC.


I like this version:

  low = -1
  high = arr.length

  while(high - low > 1){
    mid = (high + low)/2
    if(arr[mid] <= x) low = mid else high = mid
  }
This eliminates all branches from the body of the loop (gets compiled to conditional move). It's also more general because it can be used to find the right position to insert a new element in a sorted array.


and it's also a broken version.


Actually, I can definitely say that this version is not broken, i.e. I've formally proven it to be mathematically correct assuming that:

1. high and low are bigints (such as when using Python, for example)

2. the input array is sorted (needed for any binary search algorithm)

3. this missing code at the bottom is added:

  if (0 <= low && low < arr.length) {
    if (arr[low] == x) {
      return low;
    }
  }
  return -1;  // (or "raise Not_found" or whatever you use to indicate that 'x' was not found).
You can find the formal proof in WhyML below. This includes proofs that:

1. all array accesses are in bounds [checked automatically]

2. there is no undefined behavior (such as division by zero) [checked automatically]

3. the function always terminates [enforced automatically, proven with the help of the 'variant' clause]

4. there are no overflows [checked automatically, not possible since we're using bigints]

5. if an index is returned, the array has an element with value equal to x at that index [the 'ensures' clause]

6. if instead the function returns "not found", the array does not have any element with a value equal to x [the 'raises' clause]

... all this assuming that the input array is sorted [the 'requires' clause]

Code + proof here (it makes more sense if you can read OCaml syntax): https://clbin.com/jbTk8


Awesome! Interesting that it can prove that given only the loop invariant.

The inner loop could be factored to a subroutine with the following contract: let P be a predicate on the integers such that (i <= j) => (P(i) => P(j)), and let low be such that P(low) = false and high such that P(high) = true. The subroutine returns i such that P(i) = false and P(i+1) = true. The subroutine further promises to only call P(k) on low < k < high.

This may be applied to the predicate P(k) = (x <= arr[k]) on the range (low,high) = (-1,arr.length) to find the point at which the predicate flips from false to true (we're essentially pretending to know that there is a small value at arr[-1] and a large value at arr[arr.length], and the subroutine promises to only call P(k) on values strictly between -1 and arr.length).

Is it possible to do this with WhyML?


Actually, my loop invariant was unnecessarily confusing (I was sleep deprived), here's a better version: https://clbin.com/OhyJ3

Your question is very interesting and I think the answer is 'yes'. I will try to implement it and report back.


Ok, so this proved to be more difficult than I anticipated and it took me a while, but fortunately I really enjoyed the challenge!

It seems that the answer is definitely "yes", it can be done with WhyML.

There are a few minor caveats:

1. I had to change the function with the while loop to return an option instead of raising the Not_found exception, which makes the code a bit uglier. This is because I ran into the following bug in Why3 (it's already fixed in the master branch, but not in the most recent release): https://gitlab.inria.fr/why3/why3/issues/214

2. It turns out that the value of "high" is not statically known at compile time, because it depends on the size of the array. So I changed "high" to be a function of the array rather than a constant (I did the same to "low" for consistency).

3. Similarly and consequently, the predicate P(i) is not only a function of "i". It's also a function of the value being searched for ("x"), i.e. if P(i) is true or not depends on which value you're searching for. This also means that P() is also a function of the specific array being searched for (because some arrays may have "x" but others not).

4. You said that "we're pretending to know that there is a small value at arr[-1] and a large value at arr[arr.length]", but unless I misunderstood something, we can't actually pretend that, this must be written in the code somewhere, otherwise it won't work. For simplicity, I have chosen to implement those rules in the predicate itself. However, this is also enforced by the function with the while loop, i.e., it always makes sure that whatever predicate you have chosen, it has to implement those rules. I am sure that this could be changed, but I think it would complicate the proofs.

The result is that in module "GenericBinarySearch", like the name says, we have a neat generic binary search function which works with any predicate P (called "pred" in the code), any function "low", any function "high" and for any type.

And in module "ArrayBinarySearch" we have an instantiation of "GenericBinarySearch" which works for sorted arrays!

Also note that the axiom in the code is only an axiom in the generic module (it specifies what must be true of P()), but when the generic module is instantiated, Why3 verifies that the axiom is actually true for the specific instance of the predicate (so there is no cheating).

You can find the code in the link below. Enjoy!

https://clbin.com/Ov3NX


Very nice. I don't quite understand why you need the option in GenericBinarySearch. Doesn't the function always return Some?

I had something like this in mind:

    let binary_search (pred: int -> bool) (low: int) (high: int): int
        requires { pred low = False }
        requires { pred high = True }
        requires { forall i j. (i <= j) -> (pred i -> pred j)}
        ensures  { pred result = False /\ pred (result+1) = True }
    =
        let cur_low = ref low in
        let cur_high = ref high in

        while !cur_high - !cur_low > 1 do
            variant { !cur_high - !cur_low }
            invariant { pred !cur_low = False /\ pred !cur_high = True }
            let mid = div (!cur_high + !cur_low) 2 in
            if not (pred mid) then
                cur_low := mid
            else
                cur_high := mid
        done;
        !cur_low
Can something like that work?

  let predicate pred (i: int) (x: t) (k: int)
  = if i <= low x then false else
    if i >= high x then true else
    k <= x.arr[i]
Those tests are only necessary for the proofs, right? At run time the predicate will never be called with i <= low or i >= high, but only on low < i < high, because if high - low > 1 then low < (high + low)/2 < high. That's what I meant by pretending that the predicate returns false/true on -1 or arr.length: we don't actually need the predicate to work on those indices, because we never call the predicate at those indices.


> Can something like that work?

Yes, indeed! Your version is not only a lot more simple and elegant than mine, but your proof code was enough for Z3 to automatically figure out that everything works.

Therefore, I hereby declare that you are ready to use and learn more about Why3/WhyML.

Here's a working complete implementation of your version: https://clbin.com/cwvKY

There's only a minor caveat with your version: there's no formal guarantee that `binary_search` won't call `pred` below `low` or above `high`, because (as far as I know) we can't encode those requirements if we pass the function as an argument. Maybe there's a way to do that, but I don't know WhyML that deeply.

> Those tests are only necessary for the proofs, right?

I think so. We can probably separate `pred` into a logical version and a run-time version, making sure that the result of the logical `pred` matches the result of the run-time `pred` when `-1 < idx < length arr`.

I think this should be very easy to do with WhyML, I will try and figure out if I can do it.


That would be nice, then the run time predicate can be fast while the logical version remains elegant.

What happens if you remove requires { forall i j. (i <= j) -> (pred i -> pred j)}? I think it should still check, actually. That property ensures that the answer will be unique, but the algorithm will find a point pred i = False /\ pred (i+1) = True even if the predicate does not satisfy that property.

If you add the uniqueness to the ensures, does Z3 still automatically prove it correct?

  ensures { forall j. pred j = False /\ pred (j+1) = True -> j = result }


> What happens if you remove requires { forall i j. (i <= j) -> (pred i -> pred j)}? I think it should still check, actually.

If I simply remove that 'requires', then Why3 cannot prove the postcondition of `binary_search` automatically anymore (using the Z3, CVC4 and Eprover automatic provers that I have installed).

Specifically, Why3 tries to split the postcondition into the 2 parts: `pred result = False`, which gets verified correctly, and `pred (result+1) = True`, which cannot be verified automatically.

I think this is because nothing stops `high` from actually being lower than `low`. If someone passes those 2 arguments reversed (high as low and low as high), then probably the algorithm wouldn't work, right? At least, Why3 is not convinced that it would work.

However, if I add this loop invariant: `!cur_low < !cur_high` and this function precondition: `requires { low < high }`, then it all works fine! (both are required).

Diff here: https://clbin.com/wVkhO And full code here: https://clbin.com/2gBfJ

I tried separating `pred` into logical and run-time versions (one with the ifs and the other only with the comparison), and it all seems to work fine, except for one problem:

The run-time version of `pred` is a partial function (it only works for valid array indices), so it needs a precondition. However, when I pass `pred` as an argument to `binary_search`, I can't / don't know how to specify that the argument needs the precondition.

Therefore, Why3 complains that the precondition of `pred` may not be respected (all other verification conditions are proven to be valid).

I could do what I did before (making `low` and `high` functions, etc) but that greatly complicates the code...

Maybe there is some way to do that, but currently I don't know how.


> I think this is because nothing stops `high` from actually being lower than `low`.

Ahh, right. I guess that's exactly the type of oversight that a checker is for :)

We could return the pair (!cur_low, !cur_high) and have the postcondition that pred (fst result) = False and pred (snd result) = True and abs (fst result - snd result) = 1. Then it would work also if low > high, but I'm not sure this is useful in practice...

> The run-time version of `pred` is a partial function (it only works for valid array indices), so it needs a precondition. However, when I pass `pred` as an argument to `binary_search`, I can't / don't know how to specify that the argument needs the precondition.

If I'm understanding this correctly, you want to do something like this:

  let binary_search (pred: (i:int) -> bool requires { low < i < high }) (low: int) (high: int): int
But Why3 does not support this?

If you add a precondition like that to pred, wouldn't that also prevent requires/ensures/invariant from calling pred on arguments that don't satisfy the precondition? In the precondition we do want pred low = False /\ pred high = True, but the run time predicate only allows pred k for low < k < high?


> If I'm understanding this correctly, you want to do something like this:

Exactly!

> But Why3 does not support this?

As far as I can tell, it doesn't. I get a syntax error if I either try to name the argument to pred or if I try to add a 'requires {}'.

Maybe they will add this functionality to a future version, or maybe there is already a different but simple way to do this (but I don't know how).

> If you add a precondition like that to pred, wouldn't that also prevent requires/ensures/invariant from calling pred on arguments that don't satisfy the precondition?

No, logical/proof functions are always total, they cannot be partial.

One option is to always define what the function should return for the entire domain of its arguments.

The other main option AFAIK is to define the results only for a restricted domain that interests us and leave the function undefined outside this restricted domain (but in this latter case, we won't be able to extract conclusions about what the function returns outside this restricted domain).

However, as far as I know, the latter option needs to be implemented differently in Why3, specifically as an abstract predicate/function, and then you separately define axioms about things you know about the predicate/function. The disadvantage is that if you make a mistake in one of the axioms (say, you accidentally define that the function returns both True and False for the same input), then you are introducing an inconsistency which allows you to prove anything you want (i.e. you would be able to trivially prove that 2 = 3). This is undesirable, of course.

I think I saw somewhere that if you define a predicate `P` in Why3 that works both for runtime and for proofs, and then you add a precondition `A` to this predicate, then Why3 will automatically add an implication to the predicate for proofs, i.e. if use this predicate in a proof, the predicate will become `A -> P(x)` instead of just `P(x)`. But I'm not entirely certain about this, I could be wrong.

Unfortunately Why3 is not very well documented, the vast majority of what I've learned so far has been through reading the examples, looking through the git history, and trial-and-error.

> In the precondition we do want pred low = False /\ pred high = True, but the run time predicate only allows pred k for low < k < high?

Exactly, this is why I was trying to add a new predicate (for proofs only), which returns False for i <= low and True for i >= high, but calls the other predicate otherwise.

However, I still run into the same problem: I cannot specify that a function argument needs a precondition, and therefore Why3 cannot tell that the precondition doesn't get violated...


thanks, this is a great comment (and the rest of the thread). Yes, I meant that it doesn't work without assuming #1, and it was missing #3.


How so? Are you talking about (high + low)/2? Whether that is correct depends on the language. With 64 bit arithmetic it's correct in practice, and with arbitrary size arithmetic it's correct in practice and in theory. It only really causes problems if you're using 32 bit arithmetic on a 64 bit machine.


Just keep telling yourself that.

Or, look up the right answer.


Perhaps instead of being mean, you could be helpful and post a link to said answer.


With 64-bit variables it probably works. But for smaller widths, here is the correct implementation: https://ai.googleblog.com/2006/06/extra-extra-read-all-about...


This implementation:

    mid = ((unsigned int)low + (unsigned int)high) >> 1;
essentially extends the bit width from 31 bit to 32 bit by going from signed to unsigned, and is thus correct up to array lengths of order 2^31 rather than 2^30. Using an even larger bit width should classify as at least as correct as this.

In summary,

(low + high)/2 or (low + high) >> 1 is correct up to 2^30 for 32 bit signed and up to 2^62 for 64 bit signed, up to 2^31 for 32 bit unsigned, and up to 2^63 for 64 bit unsigned.
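A quick sketch of the failure mode in Python, simulating 32-bit two's-complement addition (Python's own ints never overflow, so the wrap-around is emulated with a hypothetical `add_i32` helper):

```python
def add_i32(a, b):
    """Add two ints the way a wrap-around 32-bit signed machine would."""
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s >= (1 << 31) else s

low, high = 2, 2**31 - 2                       # valid 32-bit signed indices
naive = add_i32(low, high) // 2                # (low + high) / 2: sum wraps negative
safe = low + (high - low) // 2                 # the standard fix
unsigned = ((low + high) & 0xFFFFFFFF) >> 1    # the unsigned-shift trick from above

print(naive, safe, unsigned)
```

Here `low + high` is exactly 2^31, which wraps to -2^31, so the naive midpoint comes out negative while both fixes agree on the correct index.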


A cool thing about binary search is you can also use it in real life. It's great for troubleshooting.


`git bisect` is an indispensable git trick. You give it a commit where you know the bug exists, and one where you know it doesn't, and it'll use binary search to help you find where the bug came from. It's absolutely wonderful.


I always struggle to see why people bang on about git bisect. You need tests to make it work. If you have tests, why aren't you running them continuously? If you're running them continuously, why do you need git bisect?


Sure; if you never write code that contains bugs that your unit tests don't catch, then "git bisect" is probably useless to you.

As for the rest of us mere mortals, we sometimes write code that has bugs without discovering those bugs right away. In a complex system, it might not be easy to observe an incorrect behavior and immediately deduce the root cause. "git bisect" allows you to retroactively track down the commit that introduced a behavior change, even if it was something you didn't originally think to test for.


If the bug is a regression, you write the test after finding it, and run it at each step of git-bisect. The first commit where the test fails is the one that introduced the regression.


If I can write a failing test, I can likely fix it. If I am interested in seeing who broke it and when, I can run git blame on the thing I'm fixing.

Given that my test suite is generally in the same repo as the code, I'd need to write something that patched my new test into the codebase at every git bisect commit, recompile and run it.

I can see that this may occasionally be useful if you have an extremely hard to find bug, but for me it's pretty rare. In fact I've done this once, ever.

Hence my skepticism when people describe this as "git's killer feature" or whatever other hyperbole.


You don't really need automated tests. You can use a reproducible bug report as test input, then use git-bisect to determine the first point in time it appears.


Because your tests didn't have 100% coverage.


It's useful for projects where maintaining tests is impractical such as the linux kernel. When someone tries to find a regression in the kernel they are probably going to write a test which grep's dmesg for a certain output, or sometimes the regression is a kernel panic and you are basically checking if your system crashes or not on each bisect step.


Yep, I use this all the time to figure out which SVN commit introduced a bug (run the last deployed version and verify it didn't have the bug, then binary search the commits in-between that and HEAD until I figure out which one caused it).


I recently learned about git bisect. Very useful!


git bisect is cool, but this also works with like electronic circuits, AV signal chains, etc...


Have a look at interpolation search.

The complexity is O(log log n), under the assumption of a uniform distribution of the data.

https://en.wikipedia.org/wiki/Interpolation_search
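A minimal Python sketch of the idea (illustrative only; the probe formula assumes roughly uniformly spaced integer keys):

```python
def interpolation_search(a, x):
    """Return the index of x in sorted list a, or -1. Instead of probing the
    middle, probe where x 'should' be under linear interpolation, which gives
    O(log log n) probes on uniformly distributed data."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= x <= a[hi]:
        if a[hi] == a[lo]:                 # all remaining keys equal; avoid /0
            break
        # Estimate x's position proportionally between a[lo] and a[hi].
        mid = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[mid] == x:
            return mid
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return lo if lo <= hi and a[lo] == x else -1
```

On evenly spaced data it often lands on the answer in a single probe.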


I've used it as a very simple equation solver in a pinch. Yes, it's a very naive approach for most equations, but for quickly getting a satisfactory solution it's very welcome.


...wait, how? I’ve never heard of a use like this.


If f is monotonic and continuous, you can solve f(x) = y for x by successively narrowing an interval [a, b] where f(a) <= y <= f(b) (or vice versa for decreasing functions). In the case that the range of f spans the entire real numbers, the initial range can be determined as the smallest range [-2^k, 2^k] where k is an integer (a strategy often used for open-ended binary search).
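A sketch of this in Python (assuming a strictly increasing, continuous `f` whose range covers `y`; the names are made up for illustration):

```python
def solve_increasing(f, y, eps=1e-12):
    """Solve f(x) = y for a continuous, strictly increasing f by bisection.
    First grow the bracket [-k, k] by doubling until it contains the root
    (the open-ended phase), then bisect as usual."""
    k = 1.0
    while not (f(-k) <= y <= f(k)):
        k *= 2
    lo, hi = -k, k
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

cbrt2 = solve_increasing(lambda x: x**3, 2.0)   # cube root of 2
```

For decreasing functions you'd just flip the comparison, as the comment above notes.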


Interestingly, with floats you can also binary search on the binary representation. That narrows down the answer to the most precise answer representable in floating point in at most 64 steps for 64 bit floats. If you bisect the interval in the middle at each step, then it could take more steps (possibly more than 1000).

It turns out that doing this is pretty much equivalent to what you suggest, if you narrow down the initial range [-2^k, 2^k] by binary searching on k, because that's just binary searching on the exponent in the binary representation.
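A sketch of the bit-pattern trick in Python, restricted to non-negative doubles (whose raw IEEE-754 bit patterns happen to be ordered the same way as the values); `struct` is used to reinterpret the bits:

```python
import struct

def f2b(x):
    """Non-negative double -> its 64-bit pattern (monotone for x >= 0)."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]

def b2f(b):
    """64-bit pattern -> double."""
    return struct.unpack('<d', struct.pack('<Q', b))[0]

def sqrt_bits(y):
    """Find (essentially) the most precise double x with x*x <= y by binary
    search on the integer bit representation; converges in at most 64 steps."""
    lo, hi = f2b(0.0), f2b(y if y > 1 else 1.0)   # bracket [0, max(y, 1)]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if b2f(mid) ** 2 <= y:
            lo = mid
        else:
            hi = mid
    return b2f(lo)
```

The loop variable is an integer, so each step halves the count of representable doubles in the interval rather than the interval's width.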


Interesting! I didn't really think about this but this is how I implemented sqrt() and division in an interview. Didn't think of how it generalizes!


Isn't this bit:

const mid = Math.floor((right + left) / 2);

susceptible to overflow?

EDIT: Hm, perhaps not (in JS). Number.MAX_SAFE_INTEGER is much greater than I expected.


Indeed, Java's binary search had this bug for a while! It was found through automatic verification, by the way.

https://ai.googleblog.com/2006/06/extra-extra-read-all-about...


also some CS books


Incidentally, as outlined in this Julia FAQ, native machine arithmetic with overflow will deal with this correctly.

    mid = (lo + hi) >>> 1
https://docs.julialang.org/en/v1/manual/faq/index.html#Why-d...

EDIT: legibility


Thank you, fixed it!

const mid = Math.floor(left + (right - left) / 2);

https://stackoverflow.com/q/27167943


Not in Javascript, where everything is a double precision float. You would lose precision at about 2^52, but that’s not a limit that will meaningfully affect us for a good while.


Oh really? And what if you aren't storing the data? You just want to find such n, that f(n) >= m and f(n+1) < m. Now perhaps it might become an issue.


The FFT algorithm is perhaps the most elegant and useful work of the 20th century. Modern communications would probably not be possible without it, but its utility isn't limited to electronics: it is used across the list of scientific disciplines and even in finance and economics. Next time you make a call on your smart phone, hoist a beer to Cooley, Tukey, and Carl Gauss
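For reference, the classic radix-2 Cooley-Tukey recursion fits in a few lines of Python (a teaching sketch, not an optimized FFT; the input length must be a power of two):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. Splits the input into even and odd
    samples, transforms each half, then combines with twiddle factors."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The O(n log n) cost, versus O(n^2) for the naive DFT, is the whole magic.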


I have heard it called the most important algorithm of the 20th century. Not sure I buy that, I would say it's the most important algorithm in EE though.


It's also critical for factoring prime numbers and SETI signal intelligence processing. Truly a gem.


Factoring prime numbers?


https://en.wikipedia.org/wiki/Prime-factor_FFT_algorithm

https://math.stackexchange.com/questions/977955/is-there-a-w...

My experience with this is from using Prime95 two decades ago as part of a distributed computing project to factor Mersenne prime numbers

https://en.wikipedia.org/wiki/Prime95 | https://www.mersenne.org/


Failing to factor them can be pretty important.


Even after working with it for 15 years I still find new and interesting things about it.


I like Dijkstra's algorithm for finding the shortest path between two nodes in a graph.

It's not so much that it is "beautiful", but it is a remarkably simple algorithm that a human can follow manually. The reason it is efficient is easily understood (it's easy to see how we are able to "finalize" nodes because it's obvious there is no shorter path to that node), and it takes what would otherwise be a complicated task and makes it manageable.


What I like about Dijkstra's algorithm is not that you can calculate the shortest path between two nodes, but that you can calculate the shortest path from one node to every other node, and do it in one pass in O(n) time. I'm not sure it's at all obvious that it should be possible to do that.


>do it one pass in O(n) time

Mind you, the “n” here is not the number of nodes!

More precisely, Dijkstra’s algorithm runs in O(E + V * log V) time (if the priority queue is implemented with a heap), where E is the number of edges and V is the number of nodes.

In the general case, E = O(V^2) (i.e. fully connected graph), so Dijkstra’s algorithm can calculate the shortest path to V nodes in O(V^2) time. It sounds less staggering this way.

And in a general graph where negative edge-weights are permitted, it’s actually impossible to find the shortest path between two nodes without finding the shortest path between the source node and every other node! It’s pretty easy to see why: you can’t be sure you’ve actually found the shortest path until you’ve covered all the edges, because there’s always the possibility of a very large negative edge to be visited.

Dijkstra’s algorithm improves on this general case by exploiting the fact that the edges must have positive weight.
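A minimal Python version along the lines of the O(E + V log V) analysis above, using the standard library heap as the priority queue (a sketch; `graph` is assumed to be an adjacency list of `(neighbor, weight)` pairs):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to every reachable node, assuming
    non-negative edge weights. graph: {node: [(neighbor, weight), ...]}."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):   # stale queue entry; skip it
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2), ('D', 6)],
     'C': [('D', 3)], 'D': []}
print(dijkstra(g, 'A'))   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```

Lazy deletion (skipping stale entries) avoids needing a decrease-key operation on the heap.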


> And in a general graph where negative edge-weights are permitted, it’s actually impossible to find the shortest path between two nodes without finding the shortest path between the source node and every other node!

If there are negative edge-weights and cycles may occur, there is no shortest path between some nodes. You can keep going through a cycle whose total weight is negative getting "shorter" and "shorter".

It's like a race where one shortcut takes you back in time and leaves you where you started. You can use it to finish the race as far back in time before you started as you want.


Sure, but between "all edges have positive weights" and "there are cycles with negative total weight" there are still graphs that have negative edge weights, yet there's no negative-weight cycle.

Example:

V = { A, B, C }

E = { A -> B, B -> C, C -> A }

weight(A -> B) = 1

weight(B -> C) = 1

weight(C -> A) = -1


Right, yes.


You are, of course, correct. That was sloppy writing on my part.


I've always suspected that Dijkstra's shortest path algorithm was invented many times before he wrote it down. The algorithm itself is relatively straightforward and I can imagine that a competent programmer not aware of it could come up with it independently.

I mean, even Dijkstra himself came up with it because he needed it for a telecom gig. He can't have been the first one in the world who needed to find the shortest path in a graph. He just also happened to be a scientist, experienced in the whole "how to publish a paper" thing.


In one of the documentaries about him, he remarked that having written it up, he had no idea where to publish it.


For similar reasons, I really like the A* pathfinding algorithm. If you implement it on a grid with obstacles, draw the grid on screen, and update it as the algorithm runs, it's both beautiful and very intuitive to watch.


K-means clustering, which I learned about recently.

For non-machine-learning folks, this algorithm helps in grouping related items together. As a practical example, using this algorithm you can segment your customers based on their purchase history and interests.

The actual algorithm is very simple. Imagine a lot of points on the 2D plane, each representing a customer you want to segment/cluster. Now, choose random cluster points and assign each customer to the nearest one. At the end of the iteration, you have lots of customers assigned to each cluster point. Now, for each cluster, take the mean of its customers and move the cluster point to this mean value. Repeat, and slowly the cluster points settle at the centers of the groups of customers belonging to them.

read more - https://en.wikipedia.org/wiki/K-means_clustering
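The two steps described above (assign each point to its nearest center, then move each center to the mean of its points) can be sketched in a few lines of Python (illustrative only; no handling of empty clusters beyond keeping the old center):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means on 2D points: alternate assignment and update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # k random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assignment step
            i = min(range(k),
                    key=lambda i: (p[0] - centers[i][0])**2 +
                                  (p[1] - centers[i][1])**2)
            clusters[i].append(p)
        for i, c in enumerate(clusters):       # update step: move to the mean
            if c:
                centers[i] = (sum(p[0] for p in c) / len(c),
                              sum(p[1] for p in c) / len(c))
    return centers, clusters
```

On two well-separated blobs it recovers the blobs regardless of where the random initial centers land.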


The scikit-learn page on clustering [1] is a really great way to see K-means and other algorithms side-by-side. The pictures really make it easy to see how all the algorithms perform on various data distributions.

[1] http://scikit-learn.org/stable/modules/clustering.html


The interesting part is solving the "closest pair problem" which is a part of the clustering algorithm. Hurts my head just thinking about it, god knows how someone came up with a solution like this. https://www.geeksforgeeks.org/closest-pair-of-points-using-d...


The closest-pair algorithm is something you might use in single-linkage clustering, not K-means clustering:

https://en.wikipedia.org/wiki/Single-linkage_clustering


Lesson two of this course does a great job explaining K-means and its connection to Expectation Maximization. https://www.udacity.com/course/machine-learning-unsupervised...


I prefer mean-shift for its simplicity, as it even creates its own clusters.


Could you explain what you mean by that a little more? I looked at the Clustering section of the Wikipedia page for Mean-Shift:

https://en.wikipedia.org/wiki/Mean_shift#Clustering

but I don't get how C and r are chosen, nor how the method is supposed to be "non-parametric" given that you need such parameters to begin with.

Also, how are you supposed to assign all points to separate clusters, and how would you determine how many clusters to use in the first place?

Having looked at that page I also definitely don't think it's simpler than K-means!

Edit: This article gives a much more accessible descriptions than wikipedia:

https://spin.atomicobject.com/2015/05/26/mean-shift-clusteri...

To answer my own questions, the number of clusters is determined by the number of max points of the kernel density function, but the kernel bandwidth does have to be specified.


Thanks for not just looking for an answer to your own question but also sharing it!

> So how does mean shift come into the picture? Mean shift exploits this KDE idea by imagining what the points would do if they all climbed up hill to the nearest peak on the KDE surface. It does so by iteratively shifting each point uphill until it reaches a peak.

I wonder if this would still work with high-dimensional data. Distance and space start to act weird in higher dimensions, right?


Without looking much further into it, I can't imagine it going very wrong. My intuition says that the density function will be bounded, since you're working with a finite number of points, therefore you'll always be able to climb to a local maximum.


The problem I see is the word "near" in "climb to nearest peak". To quote the curse of dimensionality wiki page:

> The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance.

https://en.wikipedia.org/wiki/Curse_of_dimensionality


Are you concerned that in a very high dimensional space you'd need an absurd number of points in order to produce a meaningful KDE, since otherwise the points will be too sparse? If so I think I'm with you... you'll probably just end up with a very large number of small, weak clusters.

Although I wonder if you can't account for that by simply increasing the kernel bandwidth. Perhaps not. Perhaps this is why mean-shift seems to be mostly used for computer vision and image processing, where (I guess?) the number of dimensions is low.


K-means is not an algorithm; it's a heuristic for an NP-hard problem.


It is absolutely an algorithm in the sense of "a set of rules to be followed". I think you mean that it doesn't guarantee an optimal solution. That just means it's a heuristic algorithm, same as simulated annealing is a heuristic algorithm for solving optimisation problems.


Nope. An algorithm has to be effective. You can find pathological cases for k-means such that it will never converge on anything useful. So if you set your termination case to be convergence it will never terminate and if you don't then it will never be effective.


I think you might be in the minority in this opinion. Many algorithms have pathological cases but are still considered algorithms


Minority? This is directly from Knuth.


Knuth defines effectiveness as: "... all of the operations to be performed in the algorithm must be sufficiently basic that they can in principle be done exactly and in a finite length of time by a man using paper and pencil."

K-means and other heuristic algorithms fit that description.


BogoSort is an algorithm. Not a very good algorithm, but an algorithm nevertheless.


No it absolutely is not.

The lack of fundamental computer science knowledge in this thread is alarming.


Negative 2 points on a post saying that a computational method that possibly never terminates is not an algorithm... Oh dear...


It never terminates with probability 0.


As far as I can tell you're only arguing against poor implementations of K-means. If you demand that the score strictly improves at each iteration then the algorithm must terminate.


And how do you "demand that the score strictly improves"? It's an NP-hard problem.


K-means implementations generally terminate once there's an iteration where the score doesn't improve. This happens when there is convergence to a local minimum or - less likely - the state hops between two nearby local minima with the same score. But it will terminate on something, and most of the time that something will be pretty good.

I saw your mention of Knuth elsewhere, I looked it up and he demanded that

> An algorithm must always terminate after a finite number of steps ... a very finite number, a reasonable number

This is a pretty niche characterization and almost certainly not what the original post was asking for. However, I concur that there is no guarantee on how quickly K-means terminates or on how good the output will be. But... if you're going to be that strict about it you would even have to rule out the Simplex Algorithm, which everyone I've ever spoken to thinks of as an algorithm.


In that sense kmeans may be better referred to as a 'computational method' rather than an algorithm.


Indeed.


Isn't a method that gives an approximate or best-fit estimate to a problem still an algorithm, if it terminates?


No. You can't prove that k-means does anything useful.


Is the definition of "algorithm" that you're using here useful?


It's one of the most fundamental concepts in computer science and underpins decades of research. You can decide if it's useful.


This isn't a classroom, and your pedantry isn't adding anything useful to the conversation. We all understand these pedantic quibbles you're arguing about... and what the community is more or less collectively saying is "in this context, we don't care about the distinction between an 'algorithm' in the textbook sense, and a 'heuristic' in the textbook sense".


Nah. Most of them don't understand the difference. If you did you wouldn't call it pedantry.

I personally don't find heuristics beautiful. That's why I commented.


To be fair, you haven't explained at all clearly why you don't think k-means adheres to Knuth's notion of an algorithm.

Your objection seems to be

> You can find pathological cases for k-means such that it will never converge on anything useful

As has been pointed out more than once, a good implementation of k-means is guaranteed to terminate in a finite time. And whatever you mean by "useful" doesn't seem to appear in Knuth's definition of an algorithm.



I was gonna mention this one as well! So simple and clever.

I like the explanation in the case study in Sedgewick's Algorithms [1].

Compare its simplicity to connected-components [2], another elegant algorithm but with a perhaps a bit more involved implementation.

1: https://algs4.cs.princeton.edu/15uf/

2: https://algs4.cs.princeton.edu/41graph/CC.java.html


Yeah, this is one of my absolute favorites as well. Such a neat idea. The analysis is also fun, where the complexity of operations is O(alpha(n)), where alpha(n) is the Inverse Ackermann Function. That function is just fun to contemplate.

You wanna flex your muscles with this data structure, this is a fun Project Euler problem: https://projecteuler.net/problem=186


I was recently in a coding interview where I had heard of the original problem (a version of "transform one word to another one letter at a time, making sure that each step is a real word") so instead of just skipping the question, we went straight to the "extra credit" question of "can you tell that there is a path from one word to another faster than just calculating the answer?"

After hearing the word "memoization" go through my head (part of "Cracking the Coding Interview", which I recommend at least for the wealth of example questions for practice), I basically walked my way through making a disjoint-set data structure using the words as keys and the set number as values. I'm pretty sure that making that up on the spot is what got me the job.


I love this algorithm (I like implementing type systems) but I always feel a bit naughty when I implement it, as AFAIK it can't be implemented with immutable data structures (not efficiently, at least).


> I like implementing type systems

Could you expand on this? This is something I've just started reading about. I'd be interested in good resources to use to get started. At the moment I've just started reading TAPL.


Check out https://github.com/tomprimozic/type-systems there's been a few HN threads about it as well. I can also answer any specific questions you have, or if you want further resources I can try and find them... (there's a good online book I have in mind, but I've no idea how to find it right now!)


Some great stuff in this repo thanks. I'm particularly interested in resources that build a type system up step by step, from very simple and working towards Hindley-Milner


Hm... I'm not sure it works that way. Type systems are quite fragile beasts, if you change one thing you can easily break the rest. Especially when it comes to type inference! Although I'd say that HM is quite simple, especially if you don't consider polymorphism (i.e. if you require that every function parameter has a concrete type like int or bool).

In fact, that might be a good starting point - first implement something with the above 2 types, and where every variable and function parameter has a type annotation. From then, you could (1) add more complex types, like function types, tuples or lists, (2) implement type propagation where each variable has the type of the value it's assigned (like auto in modern C++), and then (3) go full HM type inference.

TAPL definitely sounds like a good resource. The next book might be this one: Advanced Topics in Types and Programming Languages https://www.cis.upenn.edu/~bcpierce/attapl/frontmatter.pdf
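The engine underneath HM inference is first-order unification, which is small enough to sketch in Python (illustrative encoding of my own: plain strings for concrete types, strings starting with a quote for type variables, tuples for function types; no occurs check):

```python
def unify(t1, t2, subst=None):
    """Return a substitution (dict from type variables to types) making t1
    and t2 equal, or raise TypeError if they cannot be unified."""
    subst = dict(subst or {})

    def resolve(t):
        # Follow variable bindings until we hit a concrete type or free var.
        while isinstance(t, str) and t.startswith("'") and t in subst:
            t = subst[t]
        return t

    t1, t2 = resolve(t1), resolve(t2)
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("'"):   # bind free variable
        subst[t1] = t2
        return subst
    if isinstance(t2, str) and t2.startswith("'"):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and t1[0] == t2[0]:
        for a, b in zip(t1[1:], t2[1:]):             # unify componentwise
            subst = unify(a, b, subst)
        return subst
    raise TypeError(f"cannot unify {t1} and {t2}")
```

For example, unifying `'a -> int` with `bool -> 'b` binds `'a = bool` and `'b = int`.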


Have you read TAPL? I'm curious how you think that stacks up to readily-available online resources.


No, I've just skimmed over some of its chapters to clarify some concepts. I've mostly learned by reading code and then trying to implement my own type systems. Algorithm W is really quite simple and basic.

Another good resource could be Programming Language Zoo http://plzoo.andrej.com/ that covers different evaluation techniques as well.

In general, I've been "involved" in PL design for quite some time, so I've no idea where I've gained all the knowledge I have... but in recent years, there have been quite a few modern resources, even a few on HN IIRC!

e.g.

http://craftinginterpreters.com/

http://createyourproglang.com/

(disclaimer: I haven't read them)


Exponential backoff: I don't know if this is a published algorithm.

Basically, background processes need to keep retrying an operation until it succeeds. (Make an API call to a server, upload a file, etc., etc.) If the retry interval is too small, you can DOS the remote server. (Server returns a 5xx error because there's a corner case that hits a defect.) But, if the retry interval is too large, your background process can run too slowly when it encounters transient errors. (Temporary network glitch, database under high load because of a cache flush.)

So, we pick an ideal small retry interval, and the maximum tolerable interval. (In my case, it's usually 30 seconds and 15 minutes.) Then, when errors happen, the retry interval keeps doubling until it hits the largest interval. So, the operation is retried at 30 seconds, 1 minute, 2 minutes, 4 minutes, 8 minutes, and then every 15 minutes until it succeeds.

The result is that transient errors don't cause large delays. Situations where a defect prevents a request from submitting don't DOS, because the retry interval throttles itself longer and longer.
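That schedule can be sketched in a few lines of Python (a hypothetical `backoff_delays` generator with the 30-second/15-minute bounds used above):

```python
def backoff_delays(base=30.0, cap=900.0):
    """Yield retry delays: double from `base` until hitting `cap`, then stay
    there. (Jitter would sample uniformly below each delay before sleeping.)"""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

delays = backoff_delays()
schedule = [next(delays) for _ in range(7)]
print(schedule)   # [30.0, 60.0, 120.0, 240.0, 480.0, 900.0, 900.0]
```

The cap is what keeps a permanently failing request from backing off forever.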


Ideally, you also want to implement "jitter" in the retries. AWS has a good article about it: https://aws.amazon.com/blogs/architecture/exponential-backof...

The summary (from the article) is that:

    sleep_time = random_between(0, min(2 ** retries, max_sleep_time))
so that retries are "spread across" the delay window.


Reminds me of the original Ethernet algorithm to deal with collisions on coax.



This is also used in TCP with timeouts and ACKs. The idea is you have an expected timeout (ETO) and a retransmission time out (RTO). Initially, you set RTO to ETO and then you send a packet. If you don't get an ACK back, and the timer for the RTO expires, you then set RTO = 2 * RTO. Otherwise, reset RTO = ETO. I think they referred to this as 'exponential averaging' however.


I tend to use a min/max equation that acts similar to a fibonacci backoff, allowing for a couple rapid retries before the exponential backoff kicks in. Provides a little extra speed for the transient problems.


Can this really be called an algorithm? I implemented it as a loop with a limit duration (an hour) and a max interval of a minute (starting from 5 seconds) in 5 lines of code


A short algorithm doesn't make it any less of an algorithm :)


The Euclidean algorithm to compute the greatest common divisor of two numbers.

In Javascript, if you have two numbers x and y that are both greater than zero, you compute the greatest common divisor xyGcd in one beautiful line of code:

  var xyGcd = function gcd(a,b){ return b ? gcd(b, a%b) : a; }(x, y);
In my opinion this is the greatest one-liner ever.


> In my opinion this is the greatest one-liner ever.

I dunno, I really like Haskell's one-line infinite Fibonacci sequence:

    > let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
    > take 7 fibs
    [1,1,2,3,5,8,13]
    > fibs !! 1000
    70330367711422815821835254877183549770181269836358732742604905087154537118196933579742249494562611733487750449241765991088186363265450223647106012053374121273867339111198139373125598767690091902245245323403501
I don't think it's the most efficient solution, but it really showcases lazy evaluation with a simple example.


Isn't it a-b instead of a%b?


a % b is a way to do it in fewer iterations than a - b.
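To see the difference, here's a quick Python comparison counting iterations of the modulo form versus the subtraction form (the original formulation in Euclid):

```python
def gcd_mod(a, b):
    """Euclid's algorithm with remainders; returns (gcd, iteration count)."""
    steps = 0
    while b:
        a, b = b, a % b
        steps += 1
    return a, steps

def gcd_sub(a, b):
    """Euclid's algorithm with repeated subtraction; far more iterations."""
    steps = 0
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
        steps += 1
    return a, steps
```

For gcd(1071, 462) the modulo version takes 3 steps versus 11 for subtraction, and the gap grows dramatically for inputs like gcd(10^9, 1).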


I loved this article on the elegance of deflate: http://www.codersnotes.com/notes/elegance-of-deflate/ It finishes with this:

It's not one algorithm but two, that dance and weave together in harmony. You can pick dictionary matching, or entropy coding, and you can select between them on a per-byte basis with no overhead. That's what I find so clever about it - not that it can do one thing or the other, but that it can choose either, and yet represent them using the same language. No extra markers, nothing to say "oh, now we're switching to entropy mode", nothing to get in the way.


There's something fundamentally appealing about Genetic Algorithms to me. Even though they don't apply everywhere, and even though "hill climbing with random restart" can be just as effective in many cases, there's just something awe inspiring about the idea of mimicking evolution.

I also really like other related "nature inspired algorithms"[1] like Ant Colony Optimization, Particle Swarm Optimization, etc.

[1]: http://www.cleveralgorithms.com/nature-inspired/index.html
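A toy genetic algorithm really is only a few lines. Here's a sketch for the classic OneMax problem (maximize the number of 1-bits in a string), with tournament selection, one-point crossover, and per-bit mutation; all parameter choices are just illustrative:

```python
import random

def one_max_ga(n_bits=20, pop_size=30, generations=60, seed=1):
    """Evolve bitstrings toward all-ones; returns the best final individual."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fitness = sum                                  # fitness = count of 1-bits
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)   # tournament of 3
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)              # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_bits):                     # mutation
                if rng.random() < 1.0 / n_bits:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)
```

Swap in a different fitness function and genome encoding and the same skeleton solves a different problem, which is exactly the appeal.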


I have to agree. The fact that evolution is the only process we know have spawned consciousness just resonates with me.

While our algorithms might mimick evolution poorly, there's something raw about it. And it's not totally forgotten in research: https://blog.openai.com/evolution-strategies/

I stumbled upon Differential Evolution, which I implemented here: https://github.com/peheje/nim_genetic/tree/master/Differenti...

It's fun to see how easy it is to define a new problem for it to solve.

Would like to apply it to create NN or Trees for classifications or the like.


Agreed. Ever since I first discovered them, I've decided that I eventually want to specialize in these types of algorithms. They may not be the most efficient algorithms to solve a problem, but they're incredibly fascinating, and their potential is endless.

I was actually considering implementing one during development of the game I'm working on to generate a soundtrack (since I'm not a musician), but the problem is the fitness function would require a user rating of each generated sound sample, and to get any kind of decent results, that would require me to personally listen to and rate thousands of songs. Either that or outsource it through SoundCloud or something.


I may be stretching the definition of algorithm here, but I've always been fascinated by signed distance functions / signed distance fields for 2D / 3D shapes.

Being able to do the full set of boolean operations on shapes with just `min` and `max` operations is pretty cool. Doubly so because boolean operations on standard geometry are such a difficult, messy problem.
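For illustration, a tiny Python sketch (hypothetical helper names) of how min/max implement the booleans:

```python
import math

# Signed distance convention: negative inside the shape, positive outside.
def sd_circle(x, y, cx, cy, r):
    return math.hypot(x - cx, y - cy) - r

# Boolean operations are just min/max on the distance values:
def union(d1, d2):        return min(d1, d2)
def intersection(d1, d2): return max(d1, d2)
def subtract(d1, d2):     return max(d1, -d2)   # d1 with d2 cut out

# Sample two fields at the origin: inside circle A, outside circle B.
a = sd_circle(0, 0, 0, 0, 1.0)   # -1.0 (inside A)
b = sd_circle(0, 0, 3, 0, 1.0)   #  2.0 (outside B)
```

Negative means inside, so union(a, b) puts the origin inside the combined shape, while intersection(a, b) puts it outside the overlap.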


Bloom filters are interesting. A bunch of different hashes act as a kind of fingerprint that makes membership tests fast, with a caveat: a negative answer is definitely negative, but a positive answer might actually be a false positive.
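A minimal sketch in Python (illustrative sizes, nothing tuned):

```python
import hashlib

class BloomFilter:
    """Minimal sketch: k hash functions over an m-bit array (sizes illustrative)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k bit positions by salting one cryptographic hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False = definitely absent; True = probably present.
        return all(self.bits >> p & 1 for p in self._positions(item))
```

might_contain can return True for an item never added (a false positive), but never False for one that was.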


HyperLogLogs are a great sketch data structure of which a bloom filter is a special case. The cool thing about them is that you can perform approximate set operation cardinalities in runtime proportional to sketch size rather than the cardinalities of the sets you’re comparing.


+1 for bloom filters


I've always appreciated Bresenham's line algorithm. It's a very simple algorithm and has been "superseded" by Xiaolin Wu's line algorithm, which can also do antialiasing, but ever since I learnt about it, I've been very fond of it. The fact that it's an algorithm for computer graphics also helps, because for me it's very easy to get excited about visual stuff, as opposed to the more abstract CS stuff, which I have deep appreciation for but can't understand most of or kick myself enough to try.


> It's a very simple algorithm and has been "superseded" by Xiaolin Wu's line which can also do antialiasing

Actually, not really! Bresenham's Line Algorithm secretly has a very useful generic low-level algorithm at its core. It just happened to be created for drawing lines first.

The thing most people miss is that it describes a way to do error-free repeated addition of a fraction using only integers and addition. In fact, you can add many different fractions, as long as they all share a denominator.

That is hugely powerful in the right context: imagine an embedded context where you have to add fractions to a counter, but the counter should be an integer value at all times. Maybe you must add different fractions in different situations. For example, an embedded device with two non-matching frequencies to keep track of over longer periods of time. With this method you're guaranteed that no matter how long you keep adding, the sum will always be correct.
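A sketch of that idea in Python (a hypothetical accumulator, not Bresenham's line routine itself):

```python
def fraction_steps(num, den, steps):
    """Accumulate num/den repeatedly using only integers: `whole` is always
    the exact floor of the running sum, `err` the exact remainder (times den)."""
    whole, err, out = 0, 0, []
    for _ in range(steps):
        err += num
        while err >= den:          # carry whole units out of the error term
            err -= den
            whole += 1
        out.append(whole)
    return out

print(fraction_steps(3, 8, 10))  # floors of k*3/8: [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]
```

No matter how many steps you run, the integer state never drifts from the true sum, which is exactly the guarantee floating point can't give you.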


Yep; it's a one-dimensional variant of Error Diffusion (which itself is a notably beautiful dithering algorithm), where the "input" is the slope of the line and the output is quantized to dy=0 or dy=1.

Dithering also appears in audio. The old PC Speaker had merely 1/0 states. It's supposed to be limited to square waves, limited and unpleasant, right? But at some point people figured out if you flip 1/0 fast enough you can approximate continuous values: http://bespin.org/~qz/pc-gpe/speaker.txt

I once played with using "error diffusion"-like dithering to play wav files. In theory that should produce better audio than the simple fixed-table and PWM dithering suggested in the above article, but I did this on a much newer computer (Pentium?), by which time the PC Speaker was irrelevant (only for hack value), and only reached 30-40 bits per input sample at 100% CPU. Plus, as that article explains, simple PWM might(?) actually produce less noise from timing irregularity.

But the technique is not merely an obsolete hack! Every CD player that visibly brags it has "1-bit DAC" uses a closely related technique, though in fast hardware. See: https://en.wikipedia.org/wiki/Delta_modulation https://en.wikipedia.org/wiki/Delta-sigma_modulation Take me with a grain of salt, I always get confused trying to grok these... Full explanations of the noise shaping properties that make Delta-Sigma highly useful involve quite a bit of signal-processing, but the core accumulate-compare loop is a very simple algorithm...


The Bresenham Circle Algorithm is cool too. You can draw a circle with it in just a few lines of code: https://en.wikipedia.org/wiki/Midpoint_circle_algorithm


Boyer-Moore (grep) : https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-sea... because it's cool.

LR parser https://en.wikipedia.org/wiki/LR_parser because Aho & Ullman

https://en.wikipedia.org/wiki/Church_encoding 'cos it shows how fundamental lambda calculus is in a simple way (and how functional programming is so superior (flame wars intended :-) )


I'm surprised no one mentioned PageRank (or maybe I missed it).

Another favorite is RAFT, simply because of how elegant and simple it is and how easy it is to understand.

Also - I used to work in a company that did P2P video streaming (pre WebRTC) and the way the protocol leveraged Reed Solomon Coding was just awesome. https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_cor...


Probably the most commercially successful algorithm ever invented.


Seconding this. It's super easy to understand and explain to others (though I've found the implementation somewhat more challenging when you dig into the details).


Yeah, once you start considering log rewrites I remember it got a bit more challenging. But the main idea is super elegant and beautiful.


Duff's device. Because you will first look at it in disgust, but then you realize how clever it really is.

See: https://en.wikipedia.org/wiki/Duff%27s_device


Quite a few EA Sports games have Duff's Device in locations of computational bottlenecks. I know because I put them there. Wonderful little mechanism.


That's really interesting to hear. I tried Duff's Device myself, once upon a time, and found it made no measurable difference, so I pulled it back out. This was a long time ago, so I had just assumed that a similar optimization was just built into most C compilers these days. Does it vary by toolchain? I believe I was using clang at the time.

(I was still quite wet behind the ears at the time, so it's also more than possible that I was just doing something wrong.)

All that said, agreed, wonderful little mechanism. It's one of those things that every C programmer should take the time to really understand.


Yes, modern compilers do loop unrolling nowadays, so Duff's device doesn't really have a use. But it's certainly a wonderful bit of history. I found Duff's device when I was researching coroutines in C.


Is there ever a case nowadays where the compiler doesn't do this for you automatically?


No, this is just a wonderful bit of obsolete hacker history. Unless you mess with retro architectures for fun (like old PDP machines, be they emulated or real); then it's somewhat relevant.


The first time I saw that thing I just thought "That just can't be valid C syntax". It's quite an amazing hack that seems ugly at first but seems more and more elegant the more you think about it.

These days compilers usually do loop unrolling on their own, so the device has mostly lost its purpose, but it can apparently also be used to implement something similar to coroutines in C, which is nice.


Disgusting indeed: A way to write irreducible loops that doesn't use a goto. Good idea to measure the resulting code, given how poorly most optimizers deal with such loops.

Even sadder because the same effect can be achieved cleanly (and "optimizably") by using a "switch" followed by a "while".


You should read Tom Duff's original email about this:

https://www.lysator.liu.se/c/duffs-device.html

A quote of the man himself about this: "I feel a combination of pride and revulsion at this discovery."


It's interesting how this seems to be not quite an algorithm - I suppose "device" is an appropriate name. The "See also" section on Wikipedia leads to some good stuff.


Karger's randomized contraction algorithm for finding a min-cut. It's a common algorithm to introduce students into the world of randomized algorithms.

Also a shameless plug. My friend and I came up with this pseudo-polynomial time algorithm for subset sum that can be taught in a single session. It is faster than the standard dynamic programming algorithm. https://arxiv.org/abs/1807.08248


I actually read some of this paper! I liked your FFT trick.


Thanks! Do you use that algorithm for anything in your research?


Not directly. I briefly thought that a certain computational number theory problem might involve subset sums, but alas, it turned out to be the wrong approach.


By far my favorite is Depth First Search (DFS). It's almost a bread-and-butter algorithm for a lot of problems (at least for quick tests), and only a couple of lines long in its recursive form. Some example uses:

- combinations

- permutations

- generating sequences

- exploring tree structures

- finding paths

- solving pretty much any search-space problem

and much more. It's not the most efficient for the large majority of these cases, but it lets you throw together 10-15 lines of code to quickly test an idea before actually working on it.
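For example, a bare-bones recursive DFS in Python:

```python
def dfs(graph, start, visited=None):
    """Recursive depth-first search; returns nodes in visit order."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph.get(start, []):
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
    return visited

# Toy adjacency-list graph with a cycle d -> a.
g = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"]}
print(dfs(g, "a"))  # → ['a', 'b', 'd', 'c']
```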


1. The "Skew Algorithm", aka "DC3", aka Kärkkäinen-Sanders. It uses radix sort in an extremely brilliant way to construct suffix arrays in linear time. I found these explanations helpful (though it still took me some time to digest): http://www.mi.fu-berlin.de/wiki/pub/ABI/SS13Lecture3Material...

2. Fast Fourier transform (FFT). It's another quite brilliant algorithm used for decomposing a function into its frequency components in linearithmic time (among other things).


I would also vote for FFT. It's amazing how widely it's used and what sort of tricks you can do in frequency domain. Multiplying polynomials? Simple! Multiple 2D projections of a 3D object? Just FFT them, do a couple of simple steps and you will get a 3D model.


Do you have some details for these applications ?


The multiplication is concisely described here http://numbers.computation.free.fr/Constants/Algorithms/fft....

For 2D -> 3D, see https://en.wikipedia.org/wiki/Projection-slice_theorem for a simple overview


I have always liked DBSCAN: https://en.wikipedia.org/wiki/DBSCAN

DBSCAN is used for clustering in applications, much like k-means clustering, but in k-means clustering, the number of clusters must be known in advance (hence the k)

DBSCAN solves this problem by checking, for every point, whether a certain threshold of points lies within a certain radius. If so, it adds those points to the new cluster and repeats the process for them. It is very simple, so much so that you can easily explain it to non-technical people.

In my previous job I used it for detecting whether for one word in a signal there only existed one representation or multiple (used for toggle bit detection)


If you like DBSCAN, I would recommend checking out OPTICS. It is not a clustering algorithm per se, but generates information from which clusterings for multiple settings of DBSCAN parameters can be "queried" cheaply. The DBSCAN and OPTICS papers share an author. One my favorite algorithms. This is the original paper - [1], and this looks like a helpful presentation on the topic - [2]. Since you mention k-means, I would point out that unlike k-means which finds convex-shaped clusters only, DBSCAN/OPTICS identify clusters of any shape.

[1] http://www.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

[2] https://www.cse.buffalo.edu/faculty/azhang/cse601/density-ba...


That definitely looks very interesting, I'll look into that.

Clustering on non linearly separable clusters is also one of the reasons we chose DBSCAN back then :)


The rsync algorithm.

There is a theorem that it is impossible in general to remotely compare two files with less network traffic than it requires to simply send one file over the wire. But rsync does it. How? By accepting a theoretical but unlikely possibility of coming up with the wrong answer.

What rsync does is compare hashes of ranges of stuff. If the hash comes out the same, the two are assumed to be identical with no further investigation. It is possible that different files will be thought identical. But it is unlikely that any two hashes ever generated by rsync have accidentally been the same when the underlying files were different.

(I do have to say "accidentally" because rsync uses MD5 as a hashing algorithm, and people have deliberately created collisions in MD5.)


rsync was indeed a dramatic "this should not be possible" feat. But the way it uses rolling hashes is not the simplest possible. One side sends hashes of fixed-size chunks, the other scans input for rolling hash matches, which only works for 2 sides and requires non-cachable CPU work on at least one side.

There is an even simpler idea at the core of many modern backup programs (eg. bup, attic, borg)! Possibly pioneered in `gzip --rsyncable`, not sure. https://en.wikipedia.org/wiki/Rolling_hash#Content-based_sli... Each side slices files into chunks at points where rolling hash % N == 0, giving chunks of variable size averaging N. These points usually self-synchronize after insertions and deletions. This allows not just comparing 2 files for common parts, but also deduplicated storage of ANY number of files as lists of chunk hashes, pointing to just one copy of each chunk!
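A toy Python sketch of content-defined slicing (illustrative base/mask/window constants, not what bup or borg actually use):

```python
import random

def chunk(data, mask=0x3F, window=4, base=257, mod=1 << 32):
    """Content-defined slicing sketch: cut wherever a Rabin-Karp-style rolling
    hash of the last `window` bytes has its low bits zero (avg chunk ~mask+1)."""
    chunks, start, h = [], 0, 0
    shift = pow(base, window - 1, mod)
    for i, byte in enumerate(data):
        if i >= window:                            # slide the oldest byte out
            h = (h - data[i - window] * shift) % mod
        h = (h * base + byte) % mod
        if i >= window and (h & mask) == 0:        # boundary found
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(0)                                     # deterministic demo data
data = bytes(random.randrange(256) for _ in range(4096))
chunks = chunk(data)
```

Because boundaries depend only on local content, inserting a byte near the front shifts the first boundary but later ones re-synchronize, so most chunks (and their hashes) survive the edit.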


rsync also checks modification dates. You would have to be extremely unlucky for both the checksums and the modification dates to match but for the files to be different.


The Gilbert-Johnson-Keerthi convex shape intersection algorithm.

It can take any two convex sets and tell you if they overlap, while converging on a separating axis or a penetration point, if it exists. All you need is a support function that returns a point in the set that is furthest along a given direction vector. The algorithm works in an arbitrary number of dimensions.

Basically, it operates on a new shape which is the Minkowski difference of the two intersected sets. If the difference-shape contains the origin, then the two shapes overlap. The algorithm iteratively constructs simplexes inside the Minkowski difference, trying to build one that encompasses the origin. This allows it to have essentially constant memory for a given dimension of problem.

It's a very elegant formulation of a very general problem. It's so pleasing that you can intersect cylinders with cones, or oriented cubes with frustums, or polytopes with superellipses all with the exact same single algorithm.

The Wikipedia writeup is terrible, so I may do my own with nice illustrations at some point.


Neat. I googled, and have been reading this writeup: https://www.medien.ifi.lmu.de/lehre/ss10/ps/Ausarbeitung_Bei... —which seems quite nice so far.


Wow. I'm surprised to see this on here. I used to work Johnson, he's is a super nice guy and always made time to have long conversations with me about optimization.

I believe this algorithm is widely used in the video game industry?


Video games and anywhere you need a rigid body collision simulation. So TV & film visual effects is another common use.


Also researchers who need fast approximate physical dynamics. For example the Mujoco simulator or Drake.


I like the A* pathfinding algorithm. It's not particularly elegant or beautiful on itself, and it's very dependant on the heuristics applied & the topology of the terrain used. Yet the results tend to be quite pleasing to watch.

Also, it has a great website dedicated to it, with lots of interactive demos:

http://theory.stanford.edu/~amitp/GameProgramming/


Actually, it's pretty elegant by itself. Consider this: all traversals are the same,

    val someDS;
    while (!someDS.isEmpty()) {
        addChildren(someDS.pop());
    }

Now replace someDS with:

Queue -> BFS

Stack -> DFS

Priority Queue, with priority of distance so far -> Dijkstra.

Priority Queue, with distance + heuristic -> A*

It's beautiful.


The development of "Shakey the robot" led to the discovery of A* along with a few other algorithms.

https://en.wikipedia.org/wiki/Shakey_the_robot


Minimax search https://www.cs.cornell.edu/courses/cs312/2002sp/lectures/rec... Who knew that a couple of lines of code could cause so many interesting behaviors like cooperation, betrayal, etc.


Yes and alpha beta pruning. Your link mentions that.


Dancing Links (Knuth's paper: https://arxiv.org/pdf/cs/0011047.pdf )

> In computer science, dancing links is the technique suggested by Donald Knuth to efficiently implement his Algorithm X. Algorithm X is a recursive, nondeterministic, depth-first, backtracking algorithm that finds all solutions to the exact cover problem. Some of the better-known exact cover problems include tiling, the n queens problem, and Sudoku.

~ https://en.wikipedia.org/wiki/Dancing_Links

Also SEQUITUR http://www.sequitur.info/

> Sequitur (or Nevill-Manning algorithm) is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997[1] that infers a hierarchical structure (context-free grammar) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression software applications.

~ https://en.wikipedia.org/wiki/Sequitur_algorithm

Part of the reason I like these both is that they each require sharp pointer manipulation to work efficiently.

I'm a functional programming fan, these beautiful and useful algorithms remind me that purity isn't everything, eh? Also, it's fun to think about how you might get FP code to emit correct pointer-munging code for them.


+1 Dancing Link. Donald Knuth rules !


Shor's Algorithm blew my mind after getting the slightest grasp of it thanks to Scott Aaronson: https://www.scottaaronson.com/blog/?p=208

The LLL lattice basis reduction algorithm and the Coppersmith method are also jaw-dropping


Levenshtein distance, a dynamic programming algorithm for determining the edit distance between two sequences - the number of insertions, deletions or substitutions required to convert one sequence to the other. https://en.wikipedia.org/wiki/Levenshtein_distance
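A compact version of the DP in Python, keeping only two rows:

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming, storing only the previous row."""
    prev = list(range(len(b) + 1))        # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]                         # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution / match
        prev = cur
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # → 3
```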


Interestingly enough Levenshtein can also be implemented as an A* variation iirc.


Gosling used a variant of it to optimize screen updates in emacs.


Quickselect. Very elegant, fast and solves the problem. I'm surprised many people don't know about it.

https://en.m.wikipedia.org/wiki/Quickselect
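A short Python sketch of the idea (the list-comprehension version trades the in-place partition for clarity):

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-indexed) in expected O(n):
    partition around a random pivot, then recurse into one side only."""
    pivot = random.choice(items)
    lo = [x for x in items if x < pivot]
    eq = [x for x in items if x == pivot]
    hi = [x for x in items if x > pivot]
    if k < len(lo):
        return quickselect(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return quickselect(hi, k - len(lo) - len(eq))

print(quickselect([7, 1, 5, 3, 9], 2))  # → 5 (the median)
```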


In the tradition of naming things terribly this is called std::nth_element in C++


It would be hard to beat the Floyd–Warshall algorithm. Think about the problem of finding shortest paths between all pairs of nodes in a graph. Sounds trivial? Would you believe someone who told you it can be done in literally 5 lines of code? When I saw that for the first time I was in disbelief. The elegance comes from these facts:

- Very non-trivial problem

- Just 5 lines of code

- Probably the most language agnostic algorithm

- Outrageously fast if graph fits in cache

- You can explain it to even non-programmers

- You can wear it on a T-shirt

- Doesn't require any fancy sub-algorithms, libraries or whatever
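For reference, the whole algorithm really is just the triple loop (a Python sketch; the function wrapper adds a few lines, but the loop is everything):

```python
INF = float("inf")

def floyd_warshall(dist):
    """All-pairs shortest paths, in place. `dist` is an n x n matrix with
    edge weights, 0 on the diagonal, and INF where there is no edge."""
    n = len(dist)
    for k in range(n):                 # allow node k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```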


> Probably the most language agnostic algorithm

Are you sure about this point? I'd like to see a purely functional implementation of this, e.g. in Haskell without monoids. I believe it's significantly harder than a C implementation (which is indeed a few lines)


Not really, if you wanted to do a version using only immutable arrays:

  import Data.Array
  
  fw :: Int -> Array (Int, Int) Int -> Array (Int, Int) Int
  fw 0 graph0 = graph0
  fw k graph0 =
    let fw' = fw (k - 1) graph0 in array (bounds graph0)
    [ ((i, j), min (fw' ! (i, j)) ((fw' ! (i, k)) + (fw' ! (k, j))))
    | (i, j) <- range (bounds graph0)
    ]
The only difference with a mutable implementation is that you use a new array for every k, whereas in a mutable version you would reuse these.


IIRC you can express it with matrix multiplication, but there might be some big caveats on that.


Tarjan's strongly connected components algorithm.

It finds strongly connected components in a graph (read: cyclic dependencies), while doing a topological sort (read: you could use it for a package manager to determine which packages to install first).

It is proof of a very deep understanding of the nature of graphs.

Edit: https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_...


At my work, we have a simulation system that uses Tarjan's algorithm to determine the order in which updates need to be applied to properly propagate to all downstream nodes. I thought it was pretty neat when I realized our connected-components code could do that for us.


Gradient descent is quite amazing in its power and simplicity. If we can figure out a good loss function and a model architecture with enough parameters, calculus can basically take care of the rest.
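A toy one-dimensional sketch (hypothetical example function) of the whole idea:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Follow the negative gradient with a fixed learning rate (toy sketch)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2; its gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Here x_min converges to 3, the minimizer: each step contracts the distance to it by a constant factor (0.8 with this learning rate).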


This is pretty much my pick as well. I'm completely blown away by the possibilities of a combination of gradient descent and clever losses/architectures. After learning how output shape and loss definition can result in wildly different tasks, such as landmark detection, classification, or single-shot multibox detection (bounding boxes AND classification, wuuut?!), it really gets the imagination running wild.


Two phase commit, for example to guarantee once and only once message delivery in MQTT. I like how far you can get with such a simple protocol:

    A to B: send message (repeat  until ack)
    B to A: ack (resend if a repeat of message is received)
    A to B: commit message (repeat until commit ack)
    B to A: commit ack (resend if a repeat is received, but commit only once)


My favorite elegant and trivial algorithm has always been merge sort as it looks in Lisp/Scheme. My favorite messy algorithm is simulated annealing for its intuitive sledge-hammer approach. In graphics I like some off-screen rendering methods which are elegantly simple in applying brute-force to use the whole frame buffer as a lookup table.

For object picking (determining what the user clicked on) in a complex visualization, it is often easiest to draw all objects with a simple color-mapping renderer which follows the same occlusion rules as your visualization but renders each object as a solid blob in a distinct color. You draw the scene, look at the pixel color under the mouse, and use the color as an index into the table of objects.

For ray-cast volume rendering, you have a problem somewhat like object picking but you have to solve it simultaneously for all pixels in the rendered scene. You have to determine the ray intersections of each pixel's perspective through your volumetric data grid, so you can run a sampling loop to integrate the 3D scalar field values along that ray. When your grid has a simple cube/box shape, you can render a polygonized box with the viewing perspective and trivially color-map it so each surface of the box has an RGB value encoding its XYZ volume coordinates. Your volumetric pixel shader, running on the GPU, can then independently lookup these XYZ positions out of screen-sized buffers to determine the start and end positions for each screen pixel's ray integration loops as 3D texture coordinates.


The graph isomorphism protocol for Zero Knowledge Proofs. I think it's really elegant and easily understandable. Especially if you haven't seen a ZKP protocol before, it seems like it should be impossible, and then once you see the example, it seems so natural.

An example discussion from google: https://jeremykun.com/2016/07/05/zero-knowledge-proofs-a-pri...


I personally enjoy the Fenwick Tree: https://en.m.wikipedia.org/wiki/Fenwick_tree

What is it used for? It allows one to both query and update prefix sums in O(log n), for a normal array this would be O(n)/O(1) respectively.

The cool thing is that it can be emulated with an array and as such it takes just the same amount of space as an array.

The best part is that they can be implemented (simple version) in about 10 lines of C++ code.
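The simple version translates just as compactly to Python (sketch):

```python
class Fenwick:
    """1-indexed binary indexed tree: point update and prefix sum in O(log n)."""
    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def update(self, i, delta):          # add delta at position i
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i                  # jump to the next responsible node

    def prefix_sum(self, i):             # sum of positions 1..i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i                  # strip the lowest set bit
        return s
```

Range sums fall out for free: sum of (l, r] is prefix_sum(r) - prefix_sum(l).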


The backpropagation algorithm is at once simple and incredible. It enables automatic parameter optimization for millions of parameters for extremely complex, nonlinear functions, all with just a basic knowledge of calculus.


Burrows-Wheeler transform https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transf...

Has something of Cantor's diagonal argument about it.


Can you elaborate on how this is related to diagonalization (other than the surface resemblance)?


I always thought there was something almost magical about this one.


Long division, because it shows that you don't need computers to run algorithms and that, in fact, any kid can do it.


I find it pretty cool that it also works with polynomials in what's often called 'algebraic long division'


And it was amazing the feeling of power that gave him. https://urbigenous.net/library/power.html


This is like choosing your favorite friend; I love a lot of algorithms, each in its own special way. I could also answer with families of algorithms, but that would be cheating. Some of my favorites have already been mentioned, too.

So, I'll mention Monte Carlo integration. It's very simple to implement, it was one of the first tasks my first CS professor gave freshman students, and he did it for the same reason I love it; it gives such profound insights about how computers can solve complex mathematical problems humans can't. I shiver every time I solve a problem with a Monte Carlo method.
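A minimal sketch of the method in Python (illustrative integrand):

```python
import random

def mc_integrate(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b]: (b - a) times the mean of f
    at uniformly random sample points."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

random.seed(0)  # deterministic demo
est = mc_integrate(lambda x: x * x, 0.0, 1.0)  # true value is 1/3
```

The error shrinks like 1/sqrt(n) regardless of dimension, which is exactly why the method shines on problems grid-based quadrature can't touch.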


Gaussian integration.

You might have heard of "2nd order" or "4th order" methods to calculate an integral. This means that the error drops off with the number of sampling points like N^-2 or N^-4, respectively. But Gaussian quadrature has spectral accuracy, which transcends this measure: the error goes down like an exponential of N.

Another neat feature is polynomials of degree 2N-1 or less are integrated exactly with this method. So if you have just 10 sampling points, you can calculate exactly the integral of a 19th order polynomial (times, perhaps, some known weighting function).


This is a bit misleading. If the integrand is analytic, then the Gauss quadrature converges geometrically.

For example, with f(x) = sqrt(1 + sin(x)), if you approximate the integral of f(x) dx from -1 to 1 with n Gauss points, then the convergence is geometric:

    n | approx
    1 | 2.0------------- 
    2 | 1.9172----------
    3 | 1.917703--------
    4 | 1.917702153-----
    5 | 1.917702154417--
    6 | 1.91770215441681
   
But with a function that is not differentiable such as g(x) = |x|, there is definitely not geometric convergence.

      n | approx
      1 | 0.0-----------    
     10 | 1.007---------
    100 | 1.00008-------
   1000 | 1.0000008-----



what?? the solution set is infinite, how does it choose?


Right!? Such is the magic of orthogonal polynomials. If one chooses the integration grid to lie at zeros of the appropriate orthogonal polynomial, information is gained.

https://en.wikipedia.org/wiki/Gaussian_quadrature


I really like Floyd's algorithm for sampling n elements without duplicates in linear time.

https://fermatslibrary.com/s/a-sample-of-brilliance
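A Python sketch of the algorithm from the article, sampling k distinct values from range(n):

```python
import random

def floyd_sample(n, k):
    """Floyd's algorithm: k distinct values from range(n), in O(k) time and space."""
    chosen = set()
    for j in range(n - k, n):
        t = random.randint(0, j)              # uniform over 0..j inclusive
        chosen.add(t if t not in chosen else j)
    return chosen
```

Each iteration adds exactly one new element (if t is already chosen, j itself cannot be, since j wasn't a candidate before), so no retries are ever needed.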


That's a neat algorithm. You could also do part of a Fisher-Yates shuffle, if you're ok allocating the array.


Surprised nobody's mentioned Hyper Log Log (or its more recent variants). A probabilistic constant-space unique cardinality estimator (i.e. if I see a billion events, how many are unique?) that also supports set operations.


The underlying reasoning behind PCSA/HLL (that if you squint, you can use the zeros and ones from a hash as analogues to coin-flipping, and can use the probability of the rarest event seen to estimate the total number of events seen) is excellent.
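To illustrate the coin-flip idea, here's a single-register Flajolet-Martin-style sketch in Python (HLL keeps many such registers and averages them; this toy version has huge variance):

```python
import hashlib

def trailing_zeros(x):
    # Index of the lowest set bit = number of trailing zero bits.
    return (x & -x).bit_length() - 1 if x else 0

def fm_estimate(items):
    """Single-register sketch: the longest run of trailing zero bits seen in
    any hash is the 'rarest coin flip', estimating log2 of the distinct count."""
    r = 0
    for item in items:
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16)
        r = max(r, trailing_zeros(h))
    return 2 ** (r + 1)
```

Duplicates hash identically and so never move the register, which is why the estimate depends only on the number of distinct items, in constant space.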

Count-min-sketch is also beautifully simple.


Really nice list of algorithms here...

Math is definitely foundational for Computer Science and for that reason one of the most interesting algorithms I've personally encountered is the one for computing Principal Component Analysis (PCA)[1] using Singular Value Decomposition (SVD)[2].

It's just amazing how you can easily represent N dimensional objects in 2D. It's used a lot in Machine Learning. One of the interesting applications of the algorithm is Face recognition (eigenfaces method).

[1] https://medium.com/100-days-of-algorithms/day-92-pca-bdb6684...

[2] https://web.stanford.edu/class/cme335/lecture6.pdf


The "FRAIG" algorithm is pretty cool. To find equivalent nodes in an And/Inverter graph (e.g., a hardware circuit), it operates by... (1) simulating the graph on random numbers to identify nodes that might be equivalent (this can be done with huge parallelism by using wide, bitwise and/not operations) (2) using SAT to determine if these candidates really are equivalent (3) building a new network that canonicalizes all of the nodes you've found to be equivalent.

It can be a really effective pre-processing step for many kinds of circuit analysis.

https://people.eecs.berkeley.edu/~alanmi/courses/2005_290A/p...


more like a reduction to SAT than a standalone algorithm though.


I really like Viterbi. It has applications all across computer vision and filtering and it takes full advantage of the magic of dynamic programming.


Same here! I blogged about it over 5 years ago (time flies) [1]!

Amusingly, despite having done a ton of Computer Vision (I worked for an image recognition company for 4 years) I never used it for that purpose. The main practical use case I had was related to the use of Trellis modulation [2] for band-limited communication channels.

[1] https://blog.separateconcerns.com/2013-03-03-viterbi-algorit...

[2] https://en.wikipedia.org/wiki/Trellis_modulation


R-Trees for spatial indexing.

What?: Quite powerful data structure for geographic data.

Why?: I dabbled with geodata for quite a bit before discovering PostGIS and the R*-tree. Operations that took me several seconds before (geojson+ruby) could be computed in well under 100ms directly on the Database.

[0]: https://en.wikipedia.org/wiki/R-tree



Agreed. +1 for "Floyd's Tortoise and Hare".
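A minimal Python sketch of the idea, for anyone who hasn't seen it (a successor function standing in for a linked list):

```python
def has_cycle(next_of, start):
    """Floyd's tortoise and hare: detect a cycle in a successor chain
    using two pointers and O(1) memory."""
    slow = fast = start
    while fast is not None and next_of(fast) is not None:
        slow = next_of(slow)               # tortoise: one step
        fast = next_of(next_of(fast))      # hare: two steps
        if slow == fast:                   # hare lapped the tortoise
            return True
    return False

# Successor table with a cycle 2 -> 3 -> 4 -> 2:
print(has_cycle({1: 2, 2: 3, 3: 4, 4: 2}.get, 1))  # → True
print(has_cycle({1: 2, 2: 3}.get, 1))              # → False
```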

-ss


Myers Diff. It hits the trifecta:

1. An irrefutable improvement over the state of the art.

2. A short paper which can be understood after only a dozen readings or so. I mean really understood, with visualizations and everything.

3. A practical algorithm which can be implemented by nearly anyone (even me).


I'm by no means a Haskell evangelist, but I believe it's a great educational tool for devs.

You've made a Fibonacci number algorithm. Recursive, it's slow; iterative, it's nasty. There's another way:

> let fib = 0 : 1 : zipWith (+) fib (tail fib)

Then to get the 10kth Fibonacci number you can

> fib !! 10000

It's fast. It's tiny. It has no risk of stack overflow because it's not a recursive function. It illustrates how and why lazy evaluation is important and even better than eager evaluation in many cases. I use this principle in my C#/Java work a lot.

It's my favorite.


Or the even more succinct

    fibs = 0 : scanl (+) 1 fibs
scanl is similar to a fold, but returns a list of successive reduced values from the left.


That's fantastic!


My favorite is the FLAC algorithm. It's fairly simple to explain: take a signal input, pattern-match a generative signal that closely matches the original (or part of it), and the output is a combination of that generative code and the difference between it and the input.

Second favorite is the LISP REPL. Specifically the Eval function, thanks to this classic video of Sussman https://www.youtube.com/watch?v=0m6hoOelZH8


The actual algorithm you are describing is called https://en.wikipedia.org/wiki/Linear_prediction .


That video, appropriately enough, also contains a definition and explanation of what the Y Combinator is.


Shamir's Secret Sharing. I don't have a smart description, it's just wonderful. Split the secret into N parts; any K of them are enough to reconstruct it, while K - 1 reveal nothing. Use cases are numerous.


SSS is one of my favorites because it takes concepts I learned from basic algebra, transliterates them to finite fields, and turns them directly into industrial-grade crypto.


For me, that would be Eller's algorithm: http://www.neocomputer.org/projects/eller.html

It generates perfect mazes of arbitrary size M*N, using only O(max(M, N)) memory.


Dijkstra's shunting-yard algorithm, also called the twin-stack algorithm, is a work of art in its simplicity and function. It can convert an algebraic expression to an RPN sequence that can be executed on a stack machine. I like to use it to teach students the value of using stacks.
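The core loop really is tiny. A rough Python sketch for space-separated tokens (the operator table and function names are just for illustration, not from the comment):

```python
# Left-associative binary operators and their precedence
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(tokens):
    output, ops = [], []  # the two stacks that give it the "twin stack" name
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher-or-equal precedence before pushing
            while ops and ops[-1] in PRECEDENCE and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                output.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the "("
        else:
            output.append(tok)  # operands go straight to the output
    while ops:
        output.append(ops.pop())
    return output

print(to_rpn("3 + 4 * ( 2 - 1 )".split()))
# ['3', '4', '2', '1', '-', '*', '+']
```

The resulting RPN sequence can then be evaluated with a single value stack, which is exactly the point of teaching it alongside stacks.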


Been learning about Hashlife[0] the last few days. It's pretty cool.

[0] http://www.drdobbs.com/jvm/an-algorithm-for-compressing-spac...


I agree. If any algorithm demonstrates the huge gap between a clever algorithm and a naive one, it's this one.



Fortune's algorithm for generating Voronoi diagrams from a set of points on a plane is a beautifully elegant algorithm with a unique visual component.

Check the external links on the wikipedia page for an excellent implementation that you can run in your browser.

https://en.wikipedia.org/wiki/Fortune%27s_algorithm


Perturbation theory[0] is an algorithmic way to find approximate solutions to problems you don't know how to solve. The idea is to perturb a known solution to a closely related problem. It is pivotal in quantum mechanics.

[0] http://www.cmls.polytechnique.fr/perso/paul/6paul2.pdf


Any dynamic programming algorithm.

Heap sort using an array impressed me as an undergrad.

FFT continues to amaze me as a scientific programmer.


Learning Huffman Coding in SICP was an amazing thrill--and all these years later it's still up there in bang-for-the-buck neatoness.


I remember having my mind blown studying Ukkonen's algorithm for building suffix trees for string search. Especially the fact that it was built up from the naïve O(N^3) implementation and brought down to O(N).


Sieve of Eratosthenes, visualized here:

http://xrd.github.io/angular-algorithms/web/index.html#/algo...

Spelling Eratosthenes is harder than this elegant little algorithm.
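It really does fit in a few lines. A minimal Python version, just to illustrate:

```python
def sieve(n):
    """Return all primes <= n via the Sieve of Eratosthenes."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross out multiples starting at p*p; smaller multiples
            # were already crossed out by smaller primes
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
    return [i for i, prime in enumerate(is_prime) if prime]

print(sieve(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```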


I would never have made it through Project Euler 1-100 without that thing


I'm into computer graphics, so for me it's raytracing. The principle is simple, so one can quickly code some "shiny spheres" example, but then there's a lot of interesting connected stuff to learn: linear algebra, spatial data structures, Monte Carlo extensions, ...


Constructive Solid Geometry using BSP trees.

There is a javascript implementation [0] which is where I saw it the first time, it is about 500 lines of fully commented code. The algorithm is so simple, I absolutely love it.

Basically, when performing a union on two meshes, we determine the 3d BSP trees from both meshes, then simply clip each mesh using the other's tree. The output is the union of the two leftover meshes.

And since both intersection and difference can be written as a combination of inversion and union, they are simply composed from those.

I can't find a good paper on the algorithm though, not sure where exactly it originated.

[0] https://github.com/evanw/csg.js/blob/master/csg.js


Nice! I studied the library by porting it to TypeScript.

https://github.com/birkir/csg.js/blob/patch-1/csg.ts


Nice work...

I've ported this algorithm to Python, but mine is nowhere as succinct as yours (or the original). For my needs, I require the meshes to remain watertight (if the inputs are watertight), and this specific implementation doesn't preserve it. But I still love the algorithm due to its conceptual simplicity.


Any implementation of the Poisson Disc Sampling algorithm. Really nice for evenly spacing out player starting locations on a world map :)

https://www.jasondavies.com/poisson-disc/


Did anyone mention Quicksort - one of the top 10 algorithms of the last century? source: https://algs4.cs.princeton.edu/lectures/23Quicksort.pdf


Not exactly an algorithm, but I like recursion as a pattern to be just insane to even think about.

You can build an entire working structure from base operations, exit conditions and logic that scale to higher states of the operations.

Extending the same, TCO is another very elegant concept in CS.


Same feeling here about recursion. I find it really cool that some algorithms can be described quite concisely and often simply, using recursion, sometimes more so than iterative or non-recursive algorithms for the same problem.

The same goes for some data structures, a simple example being a list, as often defined in functional programming languages. E.g.:

A list is either:

- empty, i.e. []

- a head (an element) followed by a tail (which is also a list), i.e. h | t

Lists of any size fit that definition because of the recursive definition.

Similar for binary tree, being either just a node, or a node with left and right child, each of which is a binary tree.


Check out a book called "Gödel, Escher, Bach". It's an incredible book about recursion in life, and in some sense it makes the argument that consciousness arises fundamentally from recursive loops (a system examining itself).


The first time I implemented the backtracking algorithm it felt really special. I was struggling to solve 8 queen problems for a couple of nights, banging my head around it. And then when it worked, I knew my love for programming was not going anywhere any time soon.

It was ugly, I had used 8 for loops, each running for 8 iterations, but, I still felt awesome when it worked. Before this, I had written only lab programs. Next, I applied the backtracking algorithm to solve Sudoku recursively, and that's how it all began. I entered the world of algorithms.

I know it's just one form of brute force algorithm, maybe doesn't even count as one, and is not that elegant, but, it will always remain my favorite.
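For anyone who hasn't seen it, the recursive version replaces those eight nested loops with one function. A small sketch (my own illustration of the standard approach, not the commenter's code):

```python
def solve_queens(n, cols=()):
    """Yield solutions; cols[r] is the column of the queen placed in row r."""
    if len(cols) == n:
        yield cols
        return
    r = len(cols)
    for c in range(n):
        # Safe if no earlier queen shares this column or a diagonal
        if all(c != pc and abs(c - pc) != r - pr for pr, pc in enumerate(cols)):
            # Place the queen and recurse; returning from the call
            # is the "backtrack" step
            yield from solve_queens(n, cols + (c,))

print(sum(1 for _ in solve_queens(8)))  # 92
```

The 8-queens problem has 92 solutions, and the same skeleton (choose, check constraints, recurse, undo) is exactly what a recursive Sudoku solver looks like too.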


> I know it's just one form of brute force algorithm, maybe doesn't even count as one, and is not that elegant, but, it will always remain my favorite.

Backtracking is really a depth-first search over the search tree, so it's definitely an algorithm and I would say it's "elegant" in some sense (like simplicity).


Kalman Filter - Smoothed out all the outliers on a location tracking project I worked on - https://en.m.wikipedia.org/wiki/Kalman_filter


This. Kalman filters have proved to be very good at finding a signal in noise. A-GPS is one of the most common use cases, but I've used it in forecasting and HVAC control among others.


I think Krylov subspace methods are particularly sophisticated and elegant, and they come with the added advantage of being best-in-class.


Lempel–Ziv–Welch is pretty cute. I love how the decompressor/receiver builds up the dictionary in sync with the compressor/sender at the same point in the datasream without ever needing to store/send the dictionary.
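That synchrony is easy to see in code. A minimal byte-oriented LZW sketch (fixed 256-symbol alphabet, no variable-width codes; just to illustrate the idea):

```python
def lzw_compress(data):
    table = {bytes([i]): i for i in range(256)}
    w, out = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            out.append(table[w])
            table[wc] = len(table)  # sender extends its dictionary...
            w = bytes([b])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes):
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        # ...and the receiver rebuilds the same entries one step behind.
        # Special case: the code may refer to the entry being built right now.
        entry = table[code] if code in table else w + w[:1]
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)

msg = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(msg)
assert lzw_decompress(codes) == msg and len(codes) < len(msg)
```

Neither side ever transmits the dictionary; both derive it from the data stream itself.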


I loved that algorithm because of its elegance and simplicity and also as you pointed out about the data stream. It would be fun to implement this as hardware block and that was what it was intended for i.e. on the fly compression and decompression between interfaces.


Oh, man. So many answers. The blockchain of bitcoin is pretty gorgeous, if that counts.

AES is beautiful as well. And I love Huffman encoding. hm, what else. Bloom filters are awesome.

I suspect FFT is beautiful, too, but it surpassed my ability to understand.


Out of curiosity, what do you find beautiful about AES?

I personally think that Speck is pretty cool because of its extreme simplicity, but I can't see anything special about AES.


I'm sorry, I meant to say RSA. My mistake. I find the mathematics of it quite elegant.


> Eller's algorithm creates 'perfect' mazes, having only a single path between any two cells, one row at a time. The algorithm itself is incredibly fast, and far more memory efficient than other popular algorithms (such as Prim's and Kruskal's) requiring storage proportional to only a single row. This makes it possible to create mazes of indefinite length on systems with limited memory.

http://www.neocomputer.org/projects/eller.html


I like the method of building a binary heap "from the bottom". The elegant part is to see, that it is done in linear time :)

I also like "greedy" (Matroid) algorithms and Huffman trees :)


Gosh, I have a few!

Firstly, wavelet trees! Wavelet trees with something like RRR encoding for the bit vectors let you work with massive datasets in positively tiny space and constant time. They're not even expensive to construct! My favorite introduction to the whole space of rank/select-friendly structures is Alex Bowe's tutorial: https://alexbowe.com/wavelet-trees/

I constantly agitate for people to realize that we now have an algorithm for O(n) generalized sorting. It's called discrimination sort, and while it's a complex subject, it's quite elegant once you internalize the algorithm. There's a talk by Prof. Henglein here: https://www.youtube.com/watch?v=sz9ZlZIRDAg

2017 saw the publication of a truly phenomenal approach to indexing data, using simple learned structures. These indexes are still poorly explored and there's a ton of headroom here, but the paper is an incredibly cool read and very approachable: https://arxiv.org/abs/1712.01208

Another really cool family of algorithms and associated structures in distributed systems is CRDTs. They're poorly understood, but in research they have largely superseded operational transforms. The Wikipedia page branches off into lots of good research: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...

If we slightly broadened the lens of what counts as an algorithm, my favorite category-theoretic approaches to solving problems include Recursion Schemes, a technique that totally separates the specification of recursive algorithms over data structures from both the structures and the code the algorithm runs to compute a result (Patrick from GitHub has a great series on them here: https://blog.sumtypeofway.com/). I also love the modern and rapidly growing "Ghosts of Departed Proofs" as a way to let the compiler assert you've properly built and/or terminated values: https://github.com/matt-noonan/gdp-paper/releases/download/j...


Discrimination sort seems interesting, though couched in such complex terms that it's hard to follow (admittedly, I read the paper[1] instead of watching the video).

As I understand it, it's a generalized MSD radix sort, and the difference in complexity analysis from traditional pairwise sorts is what the N stands for.

For strings, we know that a fixed-size prefix (say a character) that can take on a known finite set of values is capable of discriminating; that is, being a partial order on strings. From this we can construct a radix sort on strings that never has to revisit a character. So the net running time is approximately O(total number of characters in all strings). In contrast, a naive comparison sort may operate in O(nlogn) comparisons, but each comparison has a cost proportional to the length of the string, since we have to re-compare from scratch, such that this works out closer to O(log n * total number of characters in all strings).

My recollection is that MSD radix sorts are somewhat space-intensive, especially if we want them to be stable. I think O(n) space is required. Nonetheless, this is very interesting.

[1] https://pdfs.semanticscholar.org/7e49/0023f84845c750f4a04b7d...


Do you have any real-world examples where wavelet trees are used? They look really interesting, I'm just not sure where I'd ever use them!


In the hashing department:

- Zobrist hashing. It's completely incredible that such a trivial scheme is a near-perfect hash for sets. That's before you even consider the useful ability to compute hashes of related sets with added/removed-elements by simple XORing. (Caveat: I'm repeatedly tempted to use this magic for large sets; but it stops working well when number of elements > number of bits, because the matrix is over-determined, so there are easy-to-find subsets that contribute 0)

- Cuckoo hashing. Before going into the specific way it pushes around elements to usually fit exactly 1 per slot, there is a basic idea to grok that makes 2 possible places for an element work much better than 1: "the power of 2 choices". This is also the reason even a little load-balancing is effective. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.8... is a great overview.

- Merkle DAGs. Immutable content hashes are about the only non-painful form of distributed pointers humanity has found. OK, that's an idea not exactly an algorithm, but for example the easy ability to recursively diff 2 git trees is a neat algorithm.
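Zobrist hashing in particular is almost disappointingly simple once written down. A tiny sketch (the element universe of 1000 and the function names are my own assumptions for illustration):

```python
import random

random.seed(0)
# One fixed random 64-bit code per possible element
CODES = [random.getrandbits(64) for _ in range(1000)]

def zobrist(elements):
    h = 0
    for e in elements:
        h ^= CODES[e]  # XOR is order-independent, so this hashes the *set*
    return h

h = zobrist({3, 17, 42})
h_with = h ^ CODES[99]  # incremental: hash of {3, 17, 42, 99} in O(1)
assert h_with == zobrist({3, 17, 42, 99})
assert h_with ^ CODES[99] == h  # XORing again removes the element
```

This incremental add/remove property is why it shows up in game engines: updating the hash of a chess position after a move is two XORs, not a rehash of the board.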


The Sauvola adaptive thresholding algorithm did a great job as I tried to clean up scanned text images before OCR processing. It's quick, quite easy to implement, and produces great results. Here is a sample: http://scikit-image.org/docs/dev/auto_examples/segmentation/...


Not an algorithm per se, but the proof of the Halting Problem's undecidability blew my mind with its simplicity.


SimHash is beautiful once you understand how it nudges apparent randomness into useful order.


A better voting algorithm for a better democracy: allow more than one vote per person. This eliminates numerous problems with today's one-vote-per-person. It's simple (no voter confusion) and results in better outcomes.

https://80000hours.org/podcast/episodes/aaron-hamlin-voting-...


No you should get just one vote, but dole out any number of fractions of it to whomever you wish as long as the total is 1.

If only the general population understood fractions :-)


The whole point of the alternative I link to is that it's simple. If you like the Democrats _and_ the Green Party, you can now vote for both. This prevents the Green Party from "stealing" Democratic votes.

Because of the current poorly-designed system, we have an artificially low number of votes for non-Democrat and non-Republican parties.



The Apriori algorithm for finding frequently occurring subsets in a set of sets of objects. I find the algorithm beautiful because:

1) It takes advantage of a simple insight: all frequent subsets must be unions of other subsets that are at least as frequent.

2) It uses a Trie data structure in a very beautiful way, traversing it in-order while searching for elements from a lexically sorted list of lexically sorted lists.


I find Reed–Solomon error correction and forward error coding mind blowing.


I've always been a fan of Geohashing: https://www.movable-type.co.uk/scripts/geohash.html

Really simple, elegant solution to a hard problem. There are some alternatives that are arguably better in ways, but none are nearly as simple to comprehend


The Boyer–Moore majority algorithm [1] which allows you to find the majority element (if one exists) in an unsorted array in linear time and constant space.

[1] https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_majority_v...
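The whole trick fits in a few lines. A Python sketch of the voting pass (plus the verification pass the algorithm needs, since a candidate is only guaranteed correct if a majority actually exists):

```python
def majority_candidate(items):
    """One pass, O(1) space: the majority element, if one exists."""
    candidate, count = None, 0
    for x in items:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1  # each non-candidate cancels one candidate "vote"
    return candidate

votes = [3, 1, 3, 3, 2, 3, 3]
c = majority_candidate(votes)
# Second pass: confirm the candidate really is a majority
assert c == 3 and votes.count(c) > len(votes) // 2
```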


..and its generalization [2], which finds the (at most) k elements whose frequency is more than 1/(k+1) in the unsorted stream (k=1 being the majority algorithm in [1].)

[2] https://www.cs.bgu.ac.il/~dinitz/Course/SS-12/Karp-frequent-...


Fenwick Tree (Binary Indexed Tree)


cool data structure indeed.


BOGO sort! /s

I really like backtracking algorithms used to search huge graphs without actually visiting all the nodes. Pretty elegant IMO.


I don't know if it can be considered an algorithm, but I found Bézier curve generation super useful and simple.


Well, on the one hand, it's "just" nested linear interpolation with some control points, and traditionally, interpolation falls under mathematics. On the other, the wiki page mentions computer contexts first[0].

I think most of us would agree that there is a fuzzy boundary between mathematics and algorithms, or even that the latter is a subset of the former. Bresenham's Line Algorithm[1] is an algorithm, but it can also be interpreted as a kind of mathematics to determine points on a grid based on a path through that grid.

More importantly though: yes, it is a very useful and simple tool in many contexts.

[0] https://en.wikipedia.org/wiki/B%C3%A9zier_curve

[1] https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm


Floyd–Steinberg dithering of images:

https://en.wikipedia.org/wiki/Floyd%E2%80%93Steinberg_dither...

It can simulate color transitions with fewer colours by using a simple error diffusion algorithm.


I love this algorithm. Had a lot of fun implementing this in Java and Javascript. It gets a bit tricky though if colors have an alpha channel.


Euler's proof of infinitude of prime numbers.

https://automorph.wordpress.com/2012/06/20/eulers-analytic-p...


not an algorithm


I like Fortune's algorithm for Voronoi regions. Especially if it is being visualized - watching the beach line sweep through and the polygon regions form is kind of hypnotic.

I also have spent way too much time trying to get an implementation of it right, so there's a little stockholm syndrome.


The Algorithm or the concepts behind Asymetric key encryption https://en.m.wikipedia.org/wiki/Public-key_cryptography

This makes so much of online trust and verification possible.


Weighted randomization.

Objective: You have a list of things with a popularity score and want to get a random item from the list, but you don't want just ANY random item. You want it to be a generally popular item.

How it works: You take a key for the weight, we'll say `popularity` (0-100). And you have maybe 1,000 rows of content, each with said weight.

You sum up all the popularities to variable X. You set a random number between 0 and X; call it R. You iterate through your 1,000 rows (Y), and each iteration you subtract Y's `popularity` from your random number, R, and when R <= 0... you use that row.

---

Quick Example: https://jsfiddle.net/rL6a01ku/

Press "RUN" over and over to see the numbers change, but basically still stay the same.


The "binary heap" https://en.wikipedia.org/wiki/Binary_heap is a very useful and extremely simple algorithm; it is in my toolbox of fundamental algorithms.


B Trees. You can call them a set of algorithms or a data structure if you prefer, but its core idea can be expressed as an algorithm: making the root of the tree emerge from leaves. It's kind of democratic!

Edit: it's also a massively practical idea that powers most databases.


And many filesystems, too, NTFS, Ext4, etc.


Xiaolin Wu's algorithm for antialiasing

https://en.m.wikipedia.org/wiki/Xiaolin_Wu%27s_line_algorith...

Very simple in concept, and it applies to all continuous functions...


https://en.wikipedia.org/wiki/Paxos_(computer_science)

It took me days/weeks of processing to understand it (and appreciate the problem(s) it solves).


It's very concise but very tricky at the same time, and it solves a fundamental problem. Even though I've spent some time studying it, I'm not sure I'd be able to figure it out by myself.


Hashcash, a proof-of-work system adapted by Bitcoin:

https://bitcoinmagazine.com/articles/genesis-files-hashcash-...

1. It solves an extremely practical problem (spam) using a very simple idea (provide a value that when appended to a message gives a hash value within an acceptable target range).

2. It can be understood by non-experts who simply understand (or accept) the one-way nature of cryptographic hash functions.

3. It can be implemented manually, given a working hash function.

4. It sat around for a decade in obscurity until someone dusted it off to build a system most thought was impossible.


I love many of those posted already (in particular FFT and DH) I just wanted to add DHT, for example Kademlia. I also like RC4 for its simplicity, unfortunately it's no longer considered secure.

PID controllers are also a great invention, does that count as an algorithm?


Reservoir sampling or Algorithm R. It allows you to randomly choose a sample of k items (with equal probability) from an "infinite" stream of items. At first it sounds impossible, but the solution and proof are extremely simple and elegant.
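Algorithm R really is just a few lines. A Python sketch, to show how little machinery "uniform sample from a stream of unknown length" actually needs:

```python
import random

def reservoir_sample(stream, k):
    """Algorithm R: uniform sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = random.randrange(i + 1)  # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item      # evict a random current resident
    return reservoir

random.seed(42)
sample = reservoir_sample(range(1_000_000), 5)
print(sample)  # 5 items, each chosen with equal probability
```

The proof that every item ends up in the sample with probability exactly k/n is a short induction, which is part of the charm.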



Minimax. I love games. It blew my mind that a single algorithm could play so many games to perfection.

Granted, chess was a bit too computationally intensive, but pair minimax with a neural-net evaluation function and you get close to what AlphaZero is doing.


High order algorithms for stochastic differential equations.

https://epubs.siam.org/doi/abs/10.1137/09076636X

Somehow, even when things are almost surely not differentiable anywhere, you can develop algorithms which to a higher order (matching some idea of non-differentiable Taylor series) approximate the function. This is a beautiful idea since when you first do deterministic numerical analysis, all of the derivations require differentiability, but now this really expands your idea of what differentiable means.


I like various, older compression algorithms: Huffman, LZ77/LZSS, LZ78/LZW.


In graphics.... I'm a fan of - Sphere tracing distance fields (Ray marching distance fields)

and

- Marching Cubes


I really like sleepsort and sometimes ask interviewees to implement it during interviews. In JavaScript, it's simple to implement and reinforces the asynchronous nature of the language. It's fun and usually gets a good laugh.


HN is great. There are so many fun algorithms listed here. My favorites are already mentioned but a few important ones are missing:

Boyer-Moore fast substring searching. Simple to understand and has great performance. [1]

Alpha-Beta game tree search. I really like Knuth’s paper on it. [2]

[1] https://en.m.wikipedia.org/wiki/Boyer–Moore_string-search_al...

[2] Artificial Intelligence Volume 6, Issue 4, Winter 1975, Pages 293-326


Google's hash map [1], which is a fresh look at a heavily studied problem.

[1] https://www.youtube.com/watch?v=ncHmEUmJZf4


You may also find the YouTube channel PapersWeLove interesting [0]. I particularly recommend Casey Muratori's episode, where he talks about marching cubes, quaternions, and a fast way of computing the distance between complex objects in 3D [1].

[0]https://www.youtube.com/user/PapersWeLove/videos

[1]https://youtu.be/SDS5gLSiLg0


LZW. I implemented this ages ago to decompress TIFF files. I thought it was amazingly elegant that the decompression side built its dictionary on the fly as the compressed data came in.


I like skip lists (technically a data structure, but with associated algorithms). I love the simplicity vs. other balanced tree algorithms. Also I'm a fan because it's a relatively recent development that shows that progress is still possible in fundamental areas.

https://en.wikipedia.org/wiki/Skip_list

Some runners up: Gosling's dynamic programming algorithm for optimizing text editor screen updates and gap buffers.


I had an assignment where I needed to implement generic Skip lists in Java, I had a great time doing it, I really enjoy them as a data structure


My absolute favorite is probably radix-sort. It's simple enough to explain to a child, easy to implement correctly relative to comparison sorts, and occasionally actually useful.


Euclid's method to calculate the GCD, with a nice visualization [1]

Kosaraju's two-pass algorithm. This one blew me away when I first read it, and I am still impressed by the ingenuity of this algorithm [2]

[1] - https://www.youtube.com/watch?v=kiFfp-HAu64&t=326

[2] - https://en.wikipedia.org/wiki/Kosaraju%27s_algorithm
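Euclid's algorithm is probably the shortest "real" algorithm there is. In Python:

```python
def gcd(a, b):
    # Each step replaces (a, b) with (b, a mod b); the GCD never changes,
    # and b shrinks until it hits 0, at which point a *is* the GCD.
    while b:
        a, b = b, a % b
    return a

print(gcd(1071, 462))  # 21
```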


I want to point out that this actually makes a great interview question.

I've had really good results asking this. I find it's a great question, because it allows the candidate to show off their knowledge, and use some knowledge that they likely prepared for ahead of time. I also like it because sometimes I get to learn something really interesting from the candidate.

It's also an easy way to filter people when they can't name even one algorithm, or their favorite is "that sorting algorithm"


The Rabin-Karp string search algorithm.

Wikipedia entry:

https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm

Implementation example (2 GB/s scan on modern 3GHz CPU):

https://github.com/faragon/libsrt/blob/master/src/saux/ssear...


K-means clustering, dead simple unsupervised clustering algorithm.


I’ll mention not a CS algorithm but a machine learning algorithm: random forest.

I love the idea of adding not _structure_ but _randomness_, in order to better find a solution to some problem.


Xor Swap.... https://en.wikipedia.org/wiki/XOR_swap_algorithm

not because it's super practical ( and there are other variations using other operators )

I like it because it is super simple and when I first encountered it early on in my learning, it was really not immediately obvious why it worked. It's probably been the simplest piece of code that's surprised me.


I was asked this in an interview. I was a bit pissed


Doesn't sound too bad; a basic property of XOR is that two of them cancel out, so y ⊕ x ⊕ x is just y.
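Spelled out step by step (a toy in Python, where you'd just write `x, y = y, x`, but it shows the bit logic; note the classic caveat that it breaks if both operands are the same memory location):

```python
def xor_swap(x, y):
    x = x ^ y  # x now holds x0 ^ y0
    y = x ^ y  # y = (x0 ^ y0) ^ y0 = x0
    x = x ^ y  # x = (x0 ^ y0) ^ x0 = y0
    return x, y

print(xor_swap(13, 7))  # (7, 13)
```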



From a pure algorithm perspective: Buzhash. It's extremely simple and elegant (it mainly builds on the reversibility of XOR) and extremely fast. Used e.g. in Attic/BorgBackup for data deduplication.

For data structures: Either Hash Array Mapped Tries (HAMT; used in most immutable data structures) or Log-structured Merge Trees (LSM-Trees; an important concept for databases, mainly very useful because random access on "spinning rust" is very slow).


My favorite algorithm is of course one of my own: a dynamic error correction code which is capable of on-the-fly fixing of errors occurring due to data races in a highly parallel unsynchronized computation. I demonstrated it in the domain of cellular automata, which has non-trivial concurrency topology (dependency DAG) and is Turing complete. Yet, I see little possibility of using it for anything practical. A barren beauty. Sigh.


BPE (byte pair encoding) as used in state of the art neural translation models , maximize the information per token by greedy iterative merging of the most common (lowest information) tokens, until a predefined max number of tokens is reached https://en.m.wikipedia.org/wiki/Byte_pair_encoding


Raft Consensus Algorithm¹ - This algorithm plays an important part of many modern database systems. It has a wide variety of implementations in many languages which makes it easy to study and solid academic backing. As well as a cool visual representations of how the consensus actually works as seen in the link:

¹ https://raft.github.io


Cantor's proof [1] of impossibility of a surjection into the powerset.

Although, calling this an algorithm may offend applied mathematicians.

I would call it an algorithm since explicitly defined sets are constructed to show the contradiction.

[1] https://en.wikipedia.org/wiki/Cantor's_theorem#Proof


Interpolation search [1], which allows you to search a sorted array in O(log(log(n))) time, assuming its elements are uniformly distributed (for instance, you are searching a sorted array of hashes.)

[1] https://en.wikipedia.org/wiki/Interpolation_search


icgrep (http://www.icgrep.com) has a clever approach to searching text for arbitrary sets of characters in a parallel manner. If 128-bit registers are available and we're processing 8-bit characters (for example), the next 128 characters from a file are first loaded into 8 SIMD registers. Then a bitwise transposition is performed so that r0 contains bit 0 of each input character, r1 contains bit 1, etc., up to r7. Then with a series of logic operations across those registers (imagine the sum-of-product result of a k-map of characters of interest), searches for any arbitrary subset of characters can be performed in parallel, leaving 1s in the corresponding position of the result register for each match and 0s elsewhere. icgrep supports full regex patterns and Unicode (using 21 registers instead of 8), but what interests me are the elegant transpose and logic steps.


The one I’ve discovered most recently is Fibonacci hashing.

https://probablydance.com/2018/06/16/fibonacci-hashing-the-o...


'Smart' Bingo cards generator https://groups.google.com/forum/m/#!msg/comp.lang.php/ECOu7o... from R. Rajesh Jeba Anbiah

Disclosure: Met Rajesh couple of times


Nagle's algorithm: I re-apply it whenever I need very simple throttling of events that could occur in rapid succession.


Nothing Earth shattering but the algorithm for finding anagrams: just sort by letters and compare resulting strings. It was probably the first elegant algorithm I saw before even seeing a computer. I still have 70-ish chars long Perl oneliner I wrote nearly 20 years ago, which lists anagrams from the provided wordlist.
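The Perl one-liner translates to a few lines of Python (my own sketch of the same idea):

```python
from collections import defaultdict

def anagram_groups(words):
    groups = defaultdict(list)
    for w in words:
        # Sorted letters are a canonical key: all anagrams share it
        groups["".join(sorted(w))].append(w)
    return [g for g in groups.values() if len(g) > 1]

print(anagram_groups(["listen", "silent", "enlist", "google", "banana"]))
# [['listen', 'silent', 'enlist']]
```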


Leslie Lamport's Paxos protocol has to be in the top 5 algorithms/protocols (https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf). This enabled so much progress in computing it's unreal.


I think the beauty comes out when referring to the problems the algos solve:

a. Solving the nut/bolt matching programming challenge with Merge sort.

b. Backtracking, e.g. finding balanced pairs of braces

c. Not to mention the various tree/ graph walk problems

d. I screwed up my Google interview last round, not knowing the problem could easily be solved with topological sort


By far Dekker's algorithm to ensure mutual-exclusion in concurrent programming: (https://en.wikipedia.org/wiki/Dekker%27s_algorithm)

Less than 10 lines of code to create a piece of art.


In terms of physical beauty, my favorite algorithm is the Voronoi diagram. The 2D variant is rather boring, but the 3D version looks great and is capable of representing many natural phenomena from cellular division to material fragmentation (a common use is procedural generation of rocks).



The Earley parser, for parsing all kinds of context-free grammars.

One reason I am amazed is that it's a dynamic programming solution to a highly formalized, abstract problem. I consider it kryptonite to my inner formalism-enthusiast, functional-programming-fanboy persona.


Counting sort. It's extremely elegant and has odd complexity properties that make it interesting.


The Sieve of Eratosthenes is one of my favorites, and surprisingly has yet to be mentioned.

I imagine I would spend many days researching prime numbers if I hadn't read "The Mystery of the Aleph" in time. I'm keen to keep what sanity I have left.


Priority encoder based on bit-twiddling: A&(-A) isolates the least significant set bit of A. It lets you make fast round-robin arbiters in FPGAs by taking advantage of the dedicated carry chain.

Also the compress and extract algorithms from Hacker's Delight.
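The A&(-A) trick works in any two's-complement setting; a quick Python demonstration:

```python
def lowest_set_bit(a):
    # In two's complement, -a == ~a + 1, so all bits below the lowest set
    # bit flip to 0 and the lowest set bit itself survives the AND.
    return a & -a

print(bin(lowest_set_bit(0b10110100)))  # 0b100
print(lowest_set_bit(12))               # 4
```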


A good path finding algorithm makes me happy. One that balances performance and efficiency.


Viterbi algorithm used in convolutional codes

It is used in DVB-T combined with Reed-Solomon code for example.

https://en.wikipedia.org/wiki/Viterbi_algorithm


http://mathworld.wolfram.com/PSLQAlgorithm.html

With some clever tricks you can use it to "recover" a surd from a decimal

ie 1.4142135624 -> sqrt(2)



Problem: are two points in the same polygon? Solution: a line drawn between those two points has an odd number of intersections with the polygon if they are both in it (or outside of it), and even number if they are not.


That's under-informative, we only know that both points are in the same polygon or outside of it. The underlying elegant algorithm is that of knowing whether a point is inside a polygon: a ray from that point out to infinity intersects the polygon an odd number of times if the point is in it, even otherwise.
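That ray-casting test is only a dozen lines. A Python sketch (convention choices like "strictly greater" crossings are mine; edge/vertex cases need more care in production):

```python
def point_in_polygon(pt, poly):
    """Ray casting: shoot a ray toward +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        # Does this edge cross the horizontal line at y, to the right of x?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # odd crossings => inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```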


Rock, Scissors, Paper in a circular array.

All you have to do to test for win condition is check if the other player has the next item in the array.

Equality is a tie.

Base case is a loss.

It's also my favorite case of structuring data in a way that makes the algorithm self-evident.
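
A sketch of that structure (names illustrative; the array is ordered so that each item beats the one after it, wrapping around):

```python
moves = ['rock', 'scissors', 'paper']  # rock beats scissors, scissors beats paper, paper beats rock

def outcome(mine, theirs):
    i, j = moves.index(mine), moves.index(theirs)
    if i == j:
        return 'tie'
    # Win if the other player holds the next item in the circular array.
    return 'win' if j == (i + 1) % len(moves) else 'loss'
```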


I love the Graham scan algorithm for computing convex hulls of sets of points. That and the fast exponentiation algorithm, which can be used to compute linear recurrences (e.g. fibonacci) in O(log n) time.


One of my very favorite basic ones: Point in polygon.

https://en.wikipedia.org/wiki/Point_in_polygon


Hungarian Algorithm for cost optimization:

https://en.wikipedia.org/wiki/Hungarian_algorithm


Metropolis-Hastings (aka Markov chain Monte Carlo). Without this very simple (in principle) algorithm and its many descendants, much of modern applied Bayesian statistics would be impossible.
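
A toy random-walk Metropolis sampler shows how little machinery the core idea needs (all parameters here are illustrative):

```python
import math
import random

def metropolis_hastings(log_density, x0, steps, step_size=1.0):
    # Propose x' = x + Normal(0, step_size); accept with probability
    # min(1, p(x') / p(x)), computed in log space for numerical safety.
    samples, x = [], x0
    lp = log_density(x)
    for _ in range(steps):
        proposal = x + random.gauss(0.0, step_size)
        lp_new = log_density(proposal)
        if lp_new >= lp or random.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        samples.append(x)
    return samples

# e.g. sampling a standard normal from its unnormalized log density:
# samples = metropolis_hastings(lambda x: -x * x / 2, 0.0, 5000)
```

Note that only an *unnormalized* density is needed, which is exactly why it is so useful for Bayesian posteriors.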


Hash functions are so powerful in so many algorithms and optimizations


cps transformation (continuation passing style). Transform any control flow patterns (loops, calls, non-tail recursion, exceptions, coroutines) into nothing but closures and tail calls!


Also a step into non determinism and some logic programming.


My favorite is the sleep sort.

Previous discussion: https://news.ycombinator.com/item?id=2657277



Euclid's GCD algorithm, and Russian peasant multiplication.
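
Both fit in a few lines (a sketch):

```python
def gcd(a, b):
    # Euclid: the gcd is unchanged by replacing the larger number
    # with its remainder modulo the smaller.
    while b:
        a, b = b, a % b
    return a

def peasant_mul(a, b):
    # Russian peasant multiplication: halve a, double b,
    # accumulating b whenever a is odd.
    total = 0
    while a:
        if a & 1:
            total += b
        a >>= 1
        b <<= 1
    return total
```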


I never heard about "Russian peasant multiplication" before. When I read it, I thought you were referring to Karatsuba multiplication. As it turns out, you didn't.

http://mathforum.org/dr.math/faq/faq.peasant.html

https://en.wikipedia.org/wiki/Karatsuba_algorithm


Yup. Applying Russian multiplication to matrices is surprisingly effective for evaluating recursive relations (it can evaluate the nth Fibonacci number in O(log n) time). Karatsuba is also a nice 'proof-of-concept' algorithm (it shows that it is possible to do better than O(n^2) at multiplication in the simplest way possible) that's also very practical - and it has a nice anecdote behind it :)
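
For example, square-and-multiply on the 2x2 Fibonacci matrix (a sketch):

```python
def fib(n):
    # [[1,1],[1,0]]^n = [[F(n+1), F(n)], [F(n), F(n-1)]], so raising the
    # matrix to the nth power by repeated squaring gives F(n) in O(log n) steps.
    def mat_mul(X, Y):
        (a, b), (c, d) = X
        (e, f), (g, h) = Y
        return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

    result, base = ((1, 0), (0, 1)), ((1, 1), (1, 0))  # identity, Fibonacci matrix
    while n:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]
```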


I like this version of the Euclid algorithm:

Make a rectangle with side lengths equal to two positive real numbers. Place the biggest square that fits inside it (i.e. against a short side), adding as many as fit. If there's still a gap, put the biggest square that fits in that, repeating until filled. If you never fill the gap, the ratio is irrational. The number of squares of each size used gives the ratio's continued fraction.
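
The square-packing procedure is exactly the Euclidean algorithm, and counting the squares of each size reads off the continued fraction (a sketch; integer inputs shown for exactness, though Python's `%` also works on floats):

```python
def continued_fraction(a, b, max_terms=10):
    # Each term is how many b-by-b squares fit across the current rectangle;
    # the leftover gap becomes the next, smaller rectangle.
    terms = []
    while b and len(terms) < max_terms:
        terms.append(int(a // b))
        a, b = b, a % b
    return terms
```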


in visual form (hold shift and move mouse): https://shaunlebron.github.io/ratios/


Though it's a learning model, not an algorithm: Support Vector Machine. I love the geometric intuition behind max-margin hyperplanes leading to good generalization.


I love the Sieve of Eratosthenes. Even if it's not the most optimal, I find it a very clever and beautiful way to find prime numbers up to a given point.


So the typical sieve of Eratosthenes is O(N log log N). Perhaps surprisingly, you can actually write a sieve that's O(N).
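
A sketch of the linear sieve (the key invariant: each composite is struck out exactly once, by its smallest prime factor):

```python
def linear_sieve(limit):
    spf = [0] * (limit + 1)  # smallest prime factor of each number
    primes = []
    for i in range(2, limit + 1):
        if spf[i] == 0:      # i was never struck out, so it is prime
            spf[i] = i
            primes.append(i)
        for p in primes:
            # Strike out i*p only while p <= spf[i], so every composite
            # gets struck out by its smallest prime factor exactly once.
            if p > spf[i] or i * p > limit:
                break
            spf[i * p] = p
    return primes
```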


Naïve Bayes. A prediction algorithm that can be developed from simply creating hash tables. Very neat and elegant and results are pretty darn good.


The median of medians algorithm still occupies a special place as one of the first algorithms that seemed to be magical.


The (fairly simple) nimber algorithm for optimal nim play which can then be applied to other games e.g. dots and boxes


I learned this while hanging out in the library (Martin Gardner’s column in Scientific American). I put it to good use years later in college: I decided to implement a Nim playing circuit (out of RTL logic gates, yuk!) for my digital design lab course final project in 1975. It was a complex sequential circuit, but the result was quite remarkable. It played perfect Nim with up to four piles of 1 to 16 tokens.


I'm reading Dots and Boxes by Elwyn Berlekamp. It describes how to relate the two games. That might interest you.

I like your project idea. I'm not hard-core enough to build something from transistors, but I'd love to give Nim a go in assembler on an Arduino. Something to add to my backlog!


Fisher-Yates shuffle. An elegant and surprisingly intuitive O(n) algorithm that doesn't take up extra memory.
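
The whole algorithm is a single loop (sketch):

```python
import random

def fisher_yates(a):
    # Walk backwards, swapping each position with a uniformly random index
    # at or before it; every permutation comes out equally likely.
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)  # inclusive of i, so "stay put" is allowed
        a[i], a[j] = a[j], a[i]
    return a
```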


Check out Sattolo's algorithm as well. Good writeup here: https://danluu.com/sattolo/


When learning to analyse algorithms, I found Kruskal's and Dijkstra's algorithms very interesting.


Fast Fourier Transform. I remember learning about using roots of unity and thinking it was black magic.


x = x + y; y = x - y; x = x - y;


Exponential smoothing

It's such a simple formula, but manages to smooth and filter signals nicely.
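
The entire filter is one line per sample (the smoothing factor here is illustrative):

```python
def exponential_smoothing(xs, alpha=0.3):
    # s[t] = alpha * x[t] + (1 - alpha) * s[t-1]; small alpha = heavier smoothing.
    s = xs[0]
    out = []
    for x in xs:
        s = alpha * x + (1 - alpha) * s
        out.append(s)
    return out
```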


Boyer-Moore substring matching


I really like how simple but powerful MapReduce is for solving some problems.


Quicksort


TimSort


The KMP algorithm, for finding a pattern of length M in a text in time O(M + len(Text)).


Binary Space Partitioning


Raft, because it's designed to be easy to understand.


I'm a fan of the three-parameter sine fit: https://ieeexplore.ieee.org/document/469117. Sorry for the paywall. Essentially, if you want to fit a sinusoid to some data and know the frequency, you can transform it into a linear least-squares problem. If anyone is interested, I can type up the math when I'm not on mobile; it's pretty straightforward.

Honorable mention: Kahan summation algorithm https://en.m.wikipedia.org/wiki/Kahan_summation_algorithm
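
Kahan summation in full (the compensation variable carries the rounding error of each addition forward into the next one):

```python
def kahan_sum(xs):
    total = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in xs:
        y = x - c            # apply the correction from the previous step
        t = total + y        # big + small: low-order bits of y may be lost
        c = (t - total) - y  # algebraically zero; in floats, the lost part
        total = t
    return total
```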


HyperLogLog (more of a data structure, but still)


Projected Gauss-Seidel for solving (mixed) LCPs.


Bogosort. It doesn't work (until it does).


Oblivious transfer [1], especially as it pertains to secure multi-party computation (MPC). It's not as easily visualized as Diffie–Hellman or Shamir's Secret Sharing (both of which have already been mentioned), but a good chunk of what MPC allows for [2] seems like magic.

[1] https://en.wikipedia.org/wiki/Oblivious_transfer

[2] https://en.wikipedia.org/wiki/Socialist_millionaires


Huffman Encoding.
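
A compact sketch using a heap of subtrees (the greedy merge of the two lightest nodes is the whole algorithm):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Each heap entry is [total_freq, [symbol, code], [symbol, code], ...].
    # Repeatedly merge the two least-frequent subtrees, prepending a bit
    # to every code inside each merged subtree.
    heap = [[freq, [sym, ""]] for sym, freq in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return {sym: code for sym, code in heap[0][1:]} if heap else {}
```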


Breadth first search is simple but effective


Memoizing the fibonacci function.
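
In Python it's essentially one decorator (sketch):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Naive recursion is exponential; the cache collapses it to O(n) calls.
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```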


Yeah that’s a great example of memoization. Also, the surprising:

    from math import sqrt

    phi = (1 + sqrt(5)) / 2.0
    def fib(n):
        return round(phi**n / sqrt(5))


ICP for 3d point cloud alignment


Floyd steinberg dithering


Burrows-Wheeler transform


Linear search.

No really. Do you see why?


Quicksort in Haskell.


Knuth's DLX


no love for simplex ? how strange... :)


i like the algorithm of nutrition mainly oriented on non-liquids. It's so elegant in its simplicity. namely: step 1: eat step 2: poop


Binary search
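
The classic half-open formulation, which sidesteps most of the famous off-by-one traps (sketch):

```python
def binary_search(a, target):
    # Invariant: if target is present, it lies within a[lo:hi].
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo if lo < len(a) and a[lo] == target else -1
```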


The Galerkin method, which is the underlying concept that finite element analysis uses. Suppose you want to approximate the solution to a PDE with a finite sum of basis functions.

The solution usually lives in an infinite dimensional Hilbert space, whereas our space of possible approximate solutions is a finite-dimensional subspace.

To find the "best" solution that lives in the finite-dimensional basis-function (or trial function) space, we simply choose the coefficients of the basis functions such that the residual is orthogonal to the approximate solution.


I like this, but I'll also contend that it's part of a broader question in applied math: What does it mean to be zero?

One way to define this is to say that a vector is zero when it's orthogonal to every other vector in some space. This gives rise to Galerkin and Petrov-Galerkin methods to solve differential equations. However, it also gives rise to linear system solvers such as conjugate gradient (CG).

Another way to define zero is to say that a vector is zero when its norm is zero. This gives rise to least-squares finite element methods for solving differential equations. It also gives rise to linear system solvers such as GMRES.

Anyway, I agree with you, but wanted to add that it's part of a greater strategy that gives rise to a huge number of good algorithms.


There is an algorithm to construct a polynomial over GF(2) of several variables in algebraic normal form that generates an arbitrary sequence when the variables are incremented like a radix-2 binary number.

In other words: these are polynomials in n variables, where every variable is either zero or one and addition is modulo 2. Each variable corresponds to a single bit in a binary number. Can we construct a polynomial that produces any given length-2^n sequence?

One way to do this is by induction on the number of variables. I find this proof extremely compelling; I derived it myself once and it has stuck with me.

The faster way to do this is through a Walsh transform. This proof converts the problem to a matrix equation by simply manipulating a sum. The Walsh matrix itself is an interesting fractal because it's a Hadamard matrix.

These two simple proofs give different perspectives.


Quicksort (especially implemented in a functional language like Haskell). It's a perfect mix of an elegant algorithm and a beautiful implementation. Simple, elegant, beautiful and powerful. What more can you want in an algorithm?
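
The same formulation transliterates into a few lines of Python (not in-place; purely illustrative):

```python
def quicksort(xs):
    # Mirrors the classic two-line Haskell version: pick a pivot,
    # recurse on the smaller and larger elements, concatenate.
    if len(xs) <= 1:
        return xs
    pivot, rest = xs[0], xs[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)
```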


Came into the comments to find this, and I'm going to expand on your description: I feel like quicksort isn't [uniquely] beautiful except when used alongside bubble sort to explain computational complexity (both big-O "worst case" and big-Ω "best case"). These were the examples that really made the concepts "click" for me.


Mildly amusing anecdote:

I was on the panel for interviewing software engineer candidates at a large software company where I worked earlier. I asked one junior candidate (having a few years of experience) to tell me how she would solve the set cover problem [1].

I illustrated the problem with a concrete example: a project needing to fill roles with different tech skills; there is a pool of candidates, each of whom had one or more of those skills; and the goal is to find the minimum set (i.e. number) of candidates, the union of whose skills matches the total set of skills needed for the project. (Contrived problem, of course, since just having multiple skills might not mean a candidate would have the bandwidth to use all of them in the project.)

She started out by making up an example set of skills, an example pool of candidate names and their skills, thought for a bit, and then started describing how she would solve the problem.

I asked her to stop, and then, using an OOP analogy, said something like: You are giving a solution for an instance of the problem. Can you give me a solution for the class of the problem? :) That is, give a solution in generic terms, without using a specific input data set. Don't remember whether she could do that or not. But she was quite good at all the other areas tested on, and got the job.


Had forgotten to give the link for the set cover problem (the [1] above). Here it is:

https://en.wikipedia.org/wiki/Set_cover_problem

I didn't know the following about it before (from the Wikipedia page):

[ The set cover problem is a classical question in combinatorics, computer science and complexity theory. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972.

It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.[1] ]


Sieve of Eratosthenes, an ancient algorithm that stood the test of time.


B-trees are really beautiful. I also like the Aho-Corasick algorithm as used by fgrep. I actually started to reinvent this algorithm myself before finding out it was already done. It's essentially a way to add links to a trie such that you can find all occurrences of multiple substrings within a larger string with one pass through the larger string.


A properly implemented queue that utilizes the CPU's compare-and-swap instructions on the pointers, for nearly overhead-free, mutex-free concurrency.


Here are some algorithms that I find elegant / beautiful:

- many recursive algorithms (I said why I think so, in another comment in this thread, agreeing with kamaal's comment about recursion - https://news.ycombinator.com/item?id=18236708 ), and recursive data structures too; to repeat: elegance and simplicity, although you have to think for a while to grok some of them - then it suddenly becomes clear how they work.

- Huffman encoding and decoding (mentioned by someone else here too; I had also commented about this on HN earlier (I think in an HN thread about old BYTE magazine issues being available on the Internet Archive). I had seen an elegant Huffman algorithm in an old BYTE issue; the author was Jonathan Amsterdam; IIRC, a tree was used to both build the codes and decode the encoded data;

- Depth First Search is cool; others in this thread said it too; "The Go Programming Language" book (code at gopl.io) has a nice example of it, which they use to implement a topological sort, to find a valid ordering of all (e.g. computer science) courses, given the prerequisite courses for each course. The code for it is pretty short and clear, which makes it more cool. Topological sorting has many uses. Scheduling (somewhat similar to the course ordering above) is one such use. Another interesting one is the tsort Unix command, which I used to use in C program compiler / linker commands in my early Unix C programming days. A typical usage (IIRC) is to pipe the lorder command to tsort as part of the compiling / linking process, and use the output in the surrounding compiler or linker command (using shell command substitution). I forget the exact details now (it probably involved a pipeline using the commands cc, ar, lorder and tsort), but it can be looked up.

Someone else mentioned XOR (exclusive OR). I once wrote two C programs for encrypting and decrypting files using this property of XOR, which I read about somewhere:

- if A XOR B gives C,

then C XOR B gives A,

for any bit patterns A, B and C.

Putting it in other words, if you XOR a byte A (from your input) with a bit pattern B (of byte length), then XORing the output (C) with the same bit pattern B, gives you A back as the output. So you can use it to encrypt and decrypt bytes, although the algorithm is easily decipherable, if you know about it. So caveat lector: it is not strong encryption, at all.

Demo of that in Python:

    In [133]: for c in 'abcdefghij':
         ...:     oc1 = ord(c)
         ...:     oc2 = oc1 ^ 255
         ...:     oc3 = oc2 ^ 255
         ...:     print oc1, oc2, oc3
         ...:
    97 158 97
    98 157 98
    99 156 99
    100 155 100
    101 154 101
    102 153 102
    103 152 103
    104 151 104
    105 150 105
    106 149 106

So that rule can be used to encrypt the bytes of a file, by XORing each byte with some specific byte and writing the XORed results to an output file. To decrypt the file, just XOR each byte from the output (now input) file with the same specific byte as before. I had written a C program that accepted a string of characters and an input filename on the command line, and cyclically XORed the bytes of that string against the bytes of the input file, for both encrypting and decrypting. It worked, but I was surprised to see (IIRC, it was done quite a while ago) that the same string sometimes occurred repeatedly, as plaintext, in the encrypted file. Don't know the mathematical / cryptographic reason for it, if any.


>Don't know the mathematical / cryptographical reason for it, if any.

I wonder if tptacek or any other crypto expert here can explain it. Maybe there is some mathematical formula or principle behind the phenomenon I observed.


Bash fork bomb


  :(){ :|:& };: #Don't run this at home


The right tool for the job


Hello World! in python



