Hacker News new | past | comments | ask | show | jobs | submit login
Ask HN: What's your favorite elegant/beautiful algorithm?
733 points by algobuddy on Oct 17, 2018 | hide | past | favorite | 491 comments

I always push my devs to really study the Bittorrent protocol. The elements of the protocol are all fairly easy to understand, but it's gorgeous to see how elegantly it solved a social problem rather than a computational problem. The tit-for-tat and fast-finish protocols are incredibly graceful ways to create virtuous cycles in a social/technical hybrid, and replaced very real vicious cycles in previous protocols.

Various TCP backoff algorithms are also a great study in how to approach problems with insufficient knowledge of the system under test, and building desirable properties (reliable in-order delivery) from a undesirable input (arbitrary packet loss, and loss in the face of congestion). I make all the kids try to implement their own, then walk through the history of TCP approaches to the modern day.

> The elements of the protocol are all fairly easy to understand, but it's gorgeous to see how elegantly it solved a social problem rather than a computational problem.

One more protocol / paper which fits this description, imo, is Bitcoin paper [0]. It's easy and accessible to all and elegently explains how to solve a complex problem with existing tools. Not sure it can be called as an 'algorithm'

[0] - https://bitcoin.org/bitcoin.pdf

BitTorrent is foundational to the rise of decentralized stores and sharing: IPFS, Dat Protocol, WebTorrent, Storj, Holochain, etc.

An Introduction to Kademlia DHT & How It Works


DHT Algorithms are nice, the Chord is really simple and easy to understand.

Any suggestions where I can read about the BitTorrent protocol?

There is an official document on the protocol [0], but it doesn't go much into the details. Theory.org's resource [1] does a much better job.

[0] - http://www.bittorrent.org/beps/bep_0003.html

[1] - https://wiki.theory.org/index.php/BitTorrentSpecification

I implemented a BitTorrent client as a sophomore in college, available here: https://github.com/war1025/Torrent

It worked pretty well. Used it for several years until I started making real money and decided I could buy things rather than pirate them.

I had only been coding for a year or two at that point, so it is probably filled with lots of odd choices, but it also isn't super optimized like I would guess the more well known clients might be, and so might be easier to parse.

I'm the same way in that I haven't been using torrents in awhile. But a few legit things they're used for us Linux distros so if you ever feel like helping in that endeavor you can seed out some ISOs.

I would be interested in that too :)

I second this; Kademlia has always struck me as one of the most beautiful and elegant algorithms out there; Bittorrent is wonderful too.

Bencode is such a piece of junk though.

Extremely simple one, but my favorite is an algorithm for determining if two words are anagrams of each other:

The Fundamental Theorem of Arithmetic states: "every integer greater than 1 either is a prime number itself or can be represented as the product of prime numbers and that, moreover, this representation is unique, up to (except for) the order of the factors."[1]

So to determine that two words are anagrams of each other, you can assign each letter to a unique prime number (a = 2, b = 3, c = 5 etc.), then compute the product of those numbers, and if they're equal then those two words are anagrams.

[1] - https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithme...

Nice in theory, but in practice you wouldn't implement it like that, especially if the words can be longer than your machine integer allows. Sorting and comparing is more elegant than invoking a BigNum library, imho (and has smaller footprint). This shows that theoretical elegance != implementation elegance.

I was curious how soon overflowing a native integer would come up. The "worst case" would be all "z"s (which map to 103), so how many characters does floor(log_103(2^n-1)) get you?

    int32  zzzz      (4 characters)
    uint32 zzzz      (4 characters)
    int64  zzzzzzzzz (9 characters)
    uint64 zzzzzzzzz (9 characters)
But that's the worst case, not many real words have several "z"s in them. How often are real words affected? I did some experiments with /usr/share/dict/words:

    1-( 36123/123115) = 70%  overflow int32
    1-( 39774/123115) = 68%  overflow uint32
    1-(117909/123115) = 4.2% overflow int64
    1-(118533/123115) = 3.7% overflow uint64

Hmm, nice result. That makes me want to map letters onto primes in order of frequency. So "e" is 2, "t" is 3, etc., see: http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/...

At this point, you're yak shaving, where the sane thing to do is simply to count the number of times each letter occurs.

Better complexity-wise as well, both in terms of speed and storage: O(n) and O(1), respectively. [1]

The proposed algorithm uses O(n) storage (to store the immense product), and given that integer multiplication complexity is somewhere between O(n) and O(n^2), we end up[2] with at least O(n^2) runtime complexity!

[1] ...assuming the input data is less than 15 million terabytes in size; O(n log n), O(log n) respectively for arbitrary length input. Storing the number of input bytes takes O(log n) space, and integer increment can take O(size).


OK, on my (longer at 235886 words) /usr/share/dict/words, I find that:

  sequential encoding: 21882 words overflow 2**64 (9.3%)
  frequency encoding: 2945 words overflow 2**64 (1.2%)
Some other trivia:

  mean #bits for those words over 64 bits: 
         seq = 72.0; freq = 69.3
  largest #bits: 
         seq = 115.4; freq = 101.0
  word w/ largest #bits: 
         seq = thyroparathyroidectomize [1]
         freq = pathologicopsychological [n/a]
[1] https://www.merriam-webster.com/medical/thyroparathyroidecto...

Now add support for Unicode. :)

Or, for better time complexity (with a bit of extra space) making a map of char -> frequency and comparing the results.

That's Fourier transform!

Funny. :) But a Fourier transform is reversible.

The title asked for beautiful/elegant algorithms, not efficient ones! :)

In computing, inefficient algorithms are not beautiful.

I wrote some comparison some time ago https://github.com/remusao/anagrams-bench/blob/master/README...

Unfortunately, as you suggest, the complexity of arbitrary precision multiplication kicks in and performance suffers.

One might also choose not to bring in a BigNum library if you don't need to find every anagram. Here's an exhaustive list of anagrams I found with sorting and comparing that a quick Fundamental Theorem implementation missed:

   [ 'basiparachromatin', 'Marsipobranchiata' ]
   [ 'configurationism', 'misconfiguration' ]
   [ 'constructionism', 'misconstruction' ]
   [ 'pericardiacophrenic', 'phrenicopericardiac' ]
   [ 'anatomicophysiologic', 'physiologicoanatomic' ]
   [ 'petrographically', 'pterylographical' ]

Note this is out of 20K anagrams.

Here's my code: https://github.com/brlewis/brlewis.github.io/blob/master/201...

On my machine, Fundamental Theorem of Arithmetic finished in 1.703 seconds, sorting letters in 1.954 seconds.

This one came up a lot when I was researching the most efficient way to generate anagrams, but the numbers are growing way too fast. In the end, a frequency map and comparing the frequencies worked best. My use case is a bit special, as I'm creating anagram sentences, so a Trie or similar things wouldn't work. Check out https://anagrams.io if you're interested.

I have a pet implementation of the frequency map that I'm overly fond of, for ascii strings:

  (1) keep an array of length 127 that you re-use and set to 0 between calls
  (2) for each character in the first string, increment the array at the character's index
  (3) for each character in the second string, decrement the array at the character's index
If you end up with all 0s, it's an anagram.

Better to use a hash map instead of an array. When iterating the second string, as soon as you see a character without an existing value it's not an anagram. If you finish the second string, check that all values are zero.

This way you only need to check the character values you actually use, and not all 127. It also generalizes trivially to larger character sets.

An array is simply an optimized specialization of a hash map.

His approach is the same as yours. As soon as the algorithm sees a character from second string where the value for that char in the array is zero, the second string is not an anagram of first string and can return false immediately.

Doesn't that mean in the end you have to check 127 values for if they are 0?

Or 64, if you store numbers as 32-bit integers and compare them as 64-bit using a union type.

You could also keep a counter of the number of non-zero entries and update on zero/non-zero transitions.

Given the constraints of the problem, it's the sanest thing to do.

Diffie–Hellman. I know that cryptography can get much fancier and more clever, but Diffie–Hellman took a concept my intuition told me was impossible and showed that it's possible in a really simple, elegant way. Learning about it was the first time I realized how beautiful the math behind computer science is.

It's also a great insight into just how fundamental the concept of computational complexity is.

And here is an elegant explanation of the Diffie-Hellman protocol https://www.youtube.com/watch?v=YEBfamv-_do

Another explanation that is not as technically accurate, but really clicked on me to understand how it's possible to establish private communication in the open, is the following:

You put your secret message in a box, put a lock on it that only you have the key to, and send it to the other party. They, unable to open it, put on a second lock of their own, and send it back. You remove your lock, leaving theirs, and once again send it to the other party. Finally, they remove their lock too and can open the box without anyone else having had that possibility.

What can also be inferred from this, is how DH is vulnerable to a man-in-the-middle attack. Someone involved in the delivery could pretend to you to be the other party and to them to be you.

Absolutely amazing. This is one of those instances when analogy is exact (mixing colors) and doesn't dilute the original motif (One way function).

> a concept my intuition told me was impossible and showed that it's possible in a really simple, elegant way

I agree but, to be fair, the key ingredient ("discrete logarithms are hard") is not simple at all.

I am so glad that I took Cryptography as an elective in college even though the course brought down my grades :)

So true. Securely exchanging keys is an abstract concept until you see those paint colors mixing!

I love the idea of how essentially we have never met and we're shouting across a crowded room but no one else can understand our conversation.


There's also this. The paint mixing analogy starts around 3 minutes in.


That's awesome, thanks

Thank you for this. This probably ranks as the algorithm with the greatest coolness / simplicity ratio I've heard of.

+1 for Diffie-Hellman... and the paint analogy.

I was just blown away when my professor taught me this.

Second to this.

Totally blew my mind back then when I was trying to understand how asymmetrical cryptography works.

My first programming project was a tic-tac-toe game with a computer opponent. I painstakingly copy-pasted about a hundred nested `if` statements to check for a winner & decide the computer’s next move.

Several years later I saw a Matlab demo that did this by indexing the grid using values from a 3x3 magic square[1]. In a magic square, every row, column, and diagonal has the same sum. So checking for a winner was just checking if a player’s three moves added up to 15. Finding a space that would make the computer win, or block the human opponent from winning, was subtracting two moves from 15 and checking if that spot was available.

[1]: https://en.wikipedia.org/wiki/Magic_square

This is cool. I never actually realized that magic squares exhausted all of the sums to 15; in other words, there is no triplet which sums to 15 and is not already a row, column, or diagonal. No false positives, so to speak.

I wonder if this is true of larger magic squares.

It is not. There are indeed exactly 8 ways to pick three elements from the set [1,9] such that their sum equals 15. But for 4x4, there are 86 (!) ways to pick four elements from [1, 15] so that their sum is 34 (the magic constant for 4x4), whereas only ten of the combinations are used in construction of the square.

Some Python:

  from collections import Counter
  from itertools import combinations

  def number_of_sums(n, M):
    combs = combinations(range(1,n**2+1), n)
    sums = (sum(c) for c in combs)
    return Counter(sums)[M]

  print(number_of_sums(3, 15))
  print(number_of_sums(4, 34))

In high school I made a tic-tac-toe game that implemented what I thought was a clever trick for its has-winner? routine: starting with a 3x3 grid of ones, I multiplied each of its rows, columns, and diagonals by a unique prime number. This results in a board that internally looks something like

238, 22, 494

21, 10659, 39

665, 55, 1105

so checking for a winner was equivalent to checking if the gcd of a player’s spaces was greater than 1.

Much less efficient! But it looks like I was on the right track in searching for compact numerical alternatives to traversing the board and measuring strides.

Edit: fixed an arithmetic error

Nitpick: this only works if the player played exactly three moves (in the version of Tic Tac Toe I know you can play for up to 5 moves). Generalizing to 4 moves is fairly simple, to 5 moves less so; you can test all '3 from 5' combinations, which is only 10 possibilities, but I doubt the code to do this is easy to read and understand after the fact.

I'd disagree here; I still think it's quite elegant. For example, in python, it would look something like

  from itertools import combinations
  magic_square = [2,7,6,
  def did_user_win(moves):
    # Convert moves to magic square values
    magic_values = [magic_square[move] for move in moves]
    # Check if any sum of three moves equals 15
    for three_values in combinations(magic_values,3):
      if sum(three_values) == 15: return True
    return False

Point taken. I love itertools :D

I thought I had an original and clever solution to TTT... my solution was to define all winning combinations as sets of (x,y) coordinate pairs and each time a player made a move, check to see if any of the winning combinations was a subset of the player's set of coordinate pairs.

Oh wow. My mind is boggled. Boggled I say!

The Gale-Shapley Algorithm to solve the "Stable Marriage" problem. 2012 Nobel Prize in Economic Sciences for its wide-ranging use in medicine, education, and resource allocation. It's fairly easy to implement a basic version of it, feels intuitively obvious once explained, and has been applied to everything from organ transplants to student placement in elementary schools. Really, any place you have two groups where individual members have a preferential ranking of the other.


I remember when this "Nobel Prize" was announced I immediately started reading the paper and trying to figure out how it could work in code. It seems like it could have amazing economic potential if put to a good use, but until your comment I hand't known of any real-world uses outside of organ transplantation. Very excited to see it being appreciated by more computer scientists! I need to look into it again for sure!

The organ transplant algorithm is not really that similar to the stable marriage algorithm. It's a different mechanism based on the housing allocation problem. See https://www.nber.org/papers/w10002

I think they are lumped together because they are both "mechanism design" problems that were studied by Roth and part of his prize.

Interesting! I knew that med school applicants were accepted on some sort of ranking and matching process but didn't realize there was a real algorithm behind it. This, or one similar, seems to be what's behind it.

There's no matching process at the beginning of medical school, but there is for assigning med school graduates to residencies. This variant of the problem is NP-hard, so there's no exact solution, but matching still works pretty well. https://web.stanford.edu/~alroth/papers/rothperansonaer.PDF has all the details.

When you are referring to this "variant", are you referring to maximal matching, or bipartite matching when the number of people and schools don't equal?

Not the parent, but I think they refer to the fact that with admissions, matches are permanent (once a student accepts a school's offer, said offer can't be rescinded), while Gale-Shapely requires provisional pairings until all the rounds have been gone through and a final result is available.

The issue is that couples want to be placed together. That makes everything trickier than if people just placed independently.

Kleinberg and Tardos start off their algorithms book with this problem, except they formulate it in terms of college juniors seeking internships and employers.

NYC public schools use a version of this too: https://www.nytimes.com/2014/12/07/nyregion/how-game-theory-...

The real shocker is that it took until 2012 for this pivotal algorithm to win a Nobel Prize. David Gale had passed by then! The Nobel committee must be backlogged to Hell and back.

The "2 watched literal" algorithm used in sat solvers (here is a random blog post I found by googling http://haz-tech.blogspot.com/2010/08/whos-watching-watch-lit... ).

The algorithm has many lovely features. It is very efficient -- it is used in basically every SAT solver with minimal modifications. It's not entirely trivial it works, particularly the backtracking part. It is a good example of how there are algorithms which are extremely hard to do functionally (the whole algorithm is about moving pointers around).

It's also a good example of practical Vs theoretical efficiency. In the worst case it is no more efficient than the algorithm it replaced, in practice it is hundreds of times more efficient. The "not moving back" feels like it should save at most half the time, it on fact speeds things up by often 10 times (this is because watches are often moved around until they reach some "boring" variables, where they lay undisturbed).

Awesome, thanks for mentioning this! I completely forgot about 2-literal watching and I hadn't ever thought about the idea that it'd be difficult to implement functionally. It's absolutely clever. And one other intuition I would like to add to it is that, from the way I see it at least, it is a cheap "detector" for when 3-SAT degenerates into 2-SAT, which finally becomes polynomial-time solvable -- which is exactly the kind of thing you want to have a detector for!

this is a neat way to avoid moving those pointers around: https://www.cs.kent.ac.uk/pubs/2010/2970/content.pdf

While this is very cute I'd be interested to know what kind of code it turns into -- I imagine it will either boil down to the same, or will be much slower (but I am happy to be impressed)

Fast inverse square root, sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5F3759DF, is an algorithm that estimates 1/√x, the reciprocal (or multiplicative inverse) of the square root of a 32-bit floating-point number x in IEEE 754 floating-point format.


I especially love the story around it.

It’s nice but it’s deprecated, we now have hardware support for that in CPUs.

PC: https://software.intel.com/sites/landingpage/IntrinsicsGuide...

ARM: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc....

This is how I tell people that Computer Science can still have magic to it.

Binary search. Very simple, incredibly powerful; can search on data or math function. It's the basis for other CS concepts.

JS implementation: https://gist.github.com/netgusto/90c8e0e7019a832cbf95eac58e1...

Regarding "very simple" - as I recall from some book, first bug-free implementation appeared only after several years after invention / initial description of the algorithm. From wikipedia: "A study published in 1988 shows that accurate code for it is only found in five out of twenty textbooks" (https://en.wikipedia.org/wiki/Binary_search_algorithm).

Someone did a blog about that book, and challenged people to do their own bug-free implementations. https://reprog.wordpress.com/2010/04/19/are-you-one-of-the-1...

It was in Programming Pearls, IIRC.

I like this version:

  low = -1
  high = arr.length

  while(high - low > 1){
    mid = (high + low)/2
    if(arr[mid] <= x) low = mid else high = mid
This eliminates all branches from the body of the loop (gets compiled to conditional move). It's also more general because it can be used to find the right position to insert a new element in a sorted array.

and it's also a broken version.

Actually, I can definitely say that this version is not broken, i.e. I've formally proven it to be mathematically correct assuming that:

1. high and low are bigints (such as when using Python, for example)

2. the input array is sorted (needed for any binary search algorithm)

3. this missing code at the bottom is added:

  if (0 <= low && low < arr.length) {
    if (arr[low] == x) {
      return low;
  return -1;  // (or "raise Not_found" or whatever you use to indicate that 'x' was not found).
You can find the formal proof in WhyML below. This includes proofs that:

1. all array accesses are in bounds [checked automatically]

2. there is no undefined behavior (such as division by zero) [checked automatically]

3. the function always terminates [enforced automatically, proven with the help of the 'variant' clause]

4. there are no overflows [checked automatically, not possible since we're using bigints]

5. if an index is returned, the array has an element with value equal to x at that index [the 'ensures' clause]

6. if instead the function returns "not found", the array does not have any element with a value equal to x [the 'raises' clause]

... all this assuming that the input array is sorted [the 'requires' clause]

Code + proof here (it makes more sense if you can read OCaml syntax): https://clbin.com/jbTk8

Awesome! Interesting that it can prove that given only the loop invariant.

The inner loop could be factored to a subroutine with the following contract: let P be a predicate on the integers such that (i <= j) => (P(i) => P(j)), and let low be such that P(low) = false and high such that P(high) = true. The subroutine returns i such that P(i) = false and P(i+1) = true. The subroutine further promises to only call P(k) on low < k < high.

This may be applied to the predicate P(k) = (x <= arr[k]) on the range (low,high) = (-1,arr.length) to find the point at which the predicate flips from false to true (we're essentially pretending to know that there is a small value at arr[-1] and a large value at arr[arr.length], and the subroutine promises to only call P(k) on values strictly between -1 and arr.length).

Is it possible to do this with WhyML?

Actually, my loop invariant was unnecessarily confusing (I was sleep deprived), here's a better version: https://clbin.com/OhyJ3

Your question is very interesting and I think the answer is 'yes'. I will try to implement it and report back.

Ok, so this proved to be more difficult than I anticipated and it took me a while, but fortunately I really enjoyed the challenge!

It seems that the answer is definitely "yes", it can be done with WhyML.

There are a few minor caveats:

1. I had to change the function with the while loop to return an option instead of raising the Not_found exception, which makes the code a bit uglier. This is because I ran into the following bug in Why3 (it's already fixed in the master branch, but not in the most recent release): https://gitlab.inria.fr/why3/why3/issues/214

2. It turns out that the value of "high" is not statically known at compile time, because it depends on the size of the array. So I changed "high" to be a function of the array rather than a constant (I did the same to "low" for consistency).

3. Similarly and consequently, the predicate P(i) is not only a function of "i". It's also a function of the value being searched for ("x"), i.e. if P(i) is true or not depends on which value you're searching for. This also means that P() is also a function of the specific array being searched for (because some arrays may have "x" but others not).

4. You said that "we're pretending to know that there is a small value at arr[-1] and a large value at arr[arr.length]", but unless I misunderstood something, we can't actually pretend that, this must be written in the code somewhere, otherwise it won't work. For simplicity, I have chosen to implement those rules in the predicate itself. However, this is also enforced by the function with the while loop, i.e., it always makes sure that whatever predicate you have chosen, it has to implement those rules. I am sure that this could be changed, but I think it would complicate the proofs.

The result is that in module "GenericBinarySearch", like the name says, we have a neat generic binary search function which works with any predicate P (called "pred" in the code), any function "low", any function "high" and for any type.

And in module "ArrayBinarySearch" we have an instantiation of "GenericBinarySearch" which works for sorted arrays!

Also note that the axiom in the code is only an axiom in the generic module (it specifies what must be true of P()), but when the generic module is instantiated, Why3 verifies that the axiom is actually true for the specific instance of the predicate (so there is no cheating).

You can find the code in the link below. Enjoy!


Very nice. I don't quite understand why you need the option in GenericBinarySearch. Doesn't the function always return Some?

I had something like this in mind:

    let binary_search (pred: int -> bool) (low: int) (high: int): int
        requires { pred low = False }
        requires { pred high = True }
        requires { forall i j. (i <= j) -> (pred i -> pred j)}
        ensures  { pred result = False /\ pred (result+1) = True }
        let cur_low = ref low in
        let cur_high = ref high in

        while !cur_high - !cur_low > 1 do
            variant { !cur_high - !cur_low }
            invariant { pred !cur_low = False /\ pred !cur_high = True }
            let mid = div (!cur_high + !cur_low) 2 in
            if not (pred mid) then
                cur_low := mid
                cur_high := mid
Can something like that work?

  let predicate pred (i: int) (x: t) (k: int)
  = if i <= low x then false else
    if i >= high x then true else
    k <= x.arr[i]
Those tests are only necessary for the proofs, right? At run time the predicate will never be called with i <= low or i >= high, but only on low < i < high, because if high - low > 1 then low < (high + low)/2 < high. That's what I meant by pretending that the predicate returns false/true on -1 or arr.length: we don't actually need the predicate to work on those indices, because we never call the predicate at those indices.

> Can something like that work?

Yes, indeed! Your version is not only a lot more simple and elegant than mine, but your proof code was enough for Z3 to automatically figure out that everything works.

Therefore, I hereby declare that you are ready to use and learn more about Why3/WhyML.

Here's a working complete implementation of your version: https://clbin.com/cwvKY

There's only a minor caveat with your version: there's no formal guarantee that `binary_search` won't call `pred` below `low` or above `high`, because (as far as I know) we can't encode those requirements if we pass the function as an argument. Maybe there's a way to do that, but I don't know WhyML that deeply.

> Those tests are only necessary for the proofs, right?

I think so. We can probably separate `pred` into a logical version and a run-time version, making sure that the result of the logical `pred` matches the result of the run-time `pred` when `-1 < idx < length arr`.

I think this should be very easy to do with WhyML, I will try and figure out if I can do it.

That would be nice, then the run time predicate can be fast while the logical version remains elegant.

What happens if you remove requires { forall i j. (i <= j) -> (pred i -> pred j)}? I think it should still check, actually. That property ensures that the answer will be unique, but the algorithm will find a point pred i = False /\ pred (i+1) = True even if the predicate does not satisfy that property.

If you add the uniqueness to the ensures, does Z3 still automatically prove it correct?

  ensures { forall j. pred j = False /\ pred (j+1) = True -> j == result) }

> What happens if you remove requires { forall i j. (i <= j) -> (pred i -> pred j)}? I think it should still check, actually.

If I simply remove that 'requires', then Why3 cannot prove the postcondition of `binary_search` automatically anymore (using the Z3, CVC4 and Eprover automatic provers that I have installed).

Specifically, Why3 tries to split the postcondition into the 2 parts: `pred result = False`, which gets verified correctly, and `pred (result+1) = True`, which cannot be verified automatically.

I think this is because nothing stops `high` from actually being lower than `low`. If someone passes those 2 arguments reversed (high as low and low as high), then probably the algorithm wouldn't work, right? At least, Why3 is not convinced that it would work.

However, if I add this loop invariant: `!cur_low < !cur_high` and this function precondition: `requires { low < high }`, then it all works fine! (both are required).

Diff here: https://clbin.com/wVkhO And full code here: https://clbin.com/2gBfJ

I tried separating `pred` into logical and run-time versions (one with the ifs and the other only with the comparison), and it all seems to work fine, except for one problem:

The run-time version of `pred` is a partial function (it only works for valid array indices), so it needs a precondition. However, when I pass `pred` as an argument to `binary_search`, I can't / don't know how to specify that the argument needs the precondition.

Therefore, Why3 complains that the precondition of `pred` may not be respected (all other verification conditions are proven to be valid).

I could do what I did before (making `low` and `high` functions, etc) but that greatly complicates the code...

Maybe there is some way to do that, but currently I don't know how.

> I think this is because nothing stops `high` from actually being lower than `low`.

Ahh, right. I guess that's exactly the type of oversight that a checker is for :)

We could return the pair (!cur_low, !cur_high) and have the postcondition that pred (fst result) = False and pred (snd result) = True and abs (first result - snd result) = 1. Then it would work also if low > high, but I'm not sure this is useful in practice...

> The run-time version of `pred` is a partial function (it only works for valid array indices), so it needs a precondition. However, when I pass `pred` as an argument to `binary_search`, I can't / don't know how to specify that the argument needs the precondition.

If I'm understanding this correctly, you want to do something like this:

  let binary_search (pred: (i:int) -> bool requires { low < i < high }) (low: int) (high: int): int
But Why3 does not support this?

If you add a precondition like that to pred, wouldn't that also prevent requires/ensures/invariant from calling pred on arguments that don't satisfy the precondition? In the precondition we do want pred low = False /\ pred high = True, but the run time predicate only allows pred k for low < k < high?

> If I'm understanding this correctly, you want to do something like this:


> But Why3 does not support this?

As far as I can tell, it doesn't. I get a syntax error if I either try to name the argument to pred or if I try to add a 'requires {}'.

Maybe they will add this functionality to a future version, or maybe there is already a different but simple way to do this (but I don't know how).

> If you add a precondition like that to pred, wouldn't that also prevent requires/ensures/invariant from calling pred on arguments that don't satisfy the precondition?

No, logical/proof functions are always total, they cannot be partial.

One option is to always define what the function should return for the entire domain of its arguments.

The other main option AFAIK is to define the results only for a restricted domain that interests us and leave the function undefined outside this restricted domain (but in this latter case, we won't be able to extract conclusions about what the function returns outside this restricted domain).

However, as far as I know, the latter option needs to be implemented differently in Why3, specifically as an abstract predicate/function, and then you separately define axioms about things you know about the predicate/function. The disadvantage is that if you make a mistake in one of the axioms (say, you accidentally define that the function returns both True and False for the same input), then you are introducing an inconsistency which allows you to prove anything you want (i.e. you would be able to trivially prove that 2 = 3). This is undesirable, of course.

I think I saw somewhere that if you define a predicate `P` in Why3 that works both for runtime and for proofs, and then you add a precondition `A` to this predicate, then Why3 will automatically add an implication to the predicate for proofs, i.e. if use this predicate in a proof, the predicate will become `A -> P(x)` instead of just `P(x)`. But I'm not entirely certain about this, I could be wrong.

Unfortunately Why3 is not very well documented, the vast majority of what I've learned so far has been through reading the examples, looking through the git history, and trial-and-error.

> In the precondition we do want pred low = False /\ pred high = True, but the run time predicate only allows pred k for low < k < high?

Exactly, this is why I was trying to add a new predicate (for proofs only), which returns False for i <= low and True for i >= high, but calls the other predicate otherwise.

However, I still run into the same problem: I cannot specify that a function argument needs a precondition, and therefore Why3 cannot tell that the precondition doesn't get violated...

thanks, this is a great comment(and the rest of the thread). Yes, I meant that it doesn't work without assuming #1, and it had missing #3.

How so? Are you talking about (high + low)/2? Whether that is correct depends on the language. With 64 bit arithmetic it's correct in practice, and with arbitrary size arithmetic it's correct in practice and in theory. It only really causes problems if you're using 32 bit arithmetic on a 64 bit machine.

Just keep telling yourself that.

Or, look up the right answer.

Perhaps instead of being mean, you could be helpful and post a link to said answer.

With 64 bit variables it probably works. But for less here is the correct implementation: https://ai.googleblog.com/2006/06/extra-extra-read-all-about...

This implementation:

    mid = ((unsigned int)low + (unsigned int)high)) >> 1;
essentially extends the bit width from 31 bit to 32 bit by going from signed to unsigned, and is thus correct up to array lengths of order 2^31 rather than 2^30. Using an even larger bit width should classify as at least as correct as this.

In summary,

(low + high)/2 or (low + high) >> 1 is correct up to 2^30 for 32 bit signed and up to 2^62 for 64 bit signed, up to 2^31 for 32 bit unsigned, and up to 2^63 for 64 bit unsigned.

A cool thing about binary search is you can also use it in real life. It's great for troubleshooting.

`git bisect` is an indispensible git trick. You give it commit where you know the bug exists, and one where you know it doesn't and it'll use binary search to help you find where the bug came from. It's absolutely wonderful.

I always struggle to see why people bang on about git bisect. You need tests to make it work. If you have tests, why aren't you running them continuously? If you're running them continuously, why do you need git bisect?

Sure; if you never write code that contains bugs that your unit tests don't catch, then "git bisect" is probably useless to you.

As for the rest of us mere mortals, we sometimes write code that has bugs without discovering those bugs right away. In a complex system, it might not be easy to observe an incorrect behavior and immediately deduce the root cause. "git bisect" allows you to retroactively track down the commit that introduced a behavior change, even if it was something you didn't originally think to test for.

If the bug is a regression, you write the test after finding it, and execute it with git-bisect. If the test starts to pass, you have the commit where it will fail.

If I can write a failing test, I can likely fix it. If I am interested in seeing who broke it and when, I can run git blame on the thing I'm fixing.

Given that my test suite is generally in the same repo as the code, I'd need to write something that patched my new test into the codebase at every git bisect commit, recompile and run it.

I can see that this may occasionally be useful if you have an extremely hard to find bug, but for me it's pretty rare. In fact I've done this once, ever.

Hence my skepticism when people describe this as "git's killer feature" or whatever other hyperbole.

You don't really need automated tests. You can use a reproducible bug report as test input, then use git-diff to determine the first point it time it appears.

Because your tests didn't have 100% coverage.

It's useful for projects where maintaining tests is impractical such as the linux kernel. When someone tries to find a regression in the kernel they are probably going to write a test which grep's dmesg for a certain output, or sometimes the regression is a kernel panic and you are basically checking if your system crashes or not on each bisect step.

Yep, I use this all the time to figure out which SVN commit introduced a bug (run the last deployed version and verify it didn't have the bug, then binary search the commits in-between that and HEAD until I figure out which one caused it).

I recently learned about git bisect. Very useful!

git bisect is cool, but this also works with like electronic circuits, AV signal chains, etc...

Have a look at interpolation search.

The complexity is O(log log n), under the assumption of a uniform distribution of the data.


I've used it as a very simple equation solver in a pinch. Yes, it's a very naive approach for most equations, but getting started solving it satisfactory with a solution given is very welcome.

...wait, how? I’ve never heard of a use like this.

If f is monotonic and continuous, you can solve f(x) = y for x by successively narrowing an interval [a, b] where f(a) <= y <= f(b) (or vice versa for decreasing functions). In the case that the range of f spans the entire real numbers, the initial range can be determined as the smallest range [-2^k, 2^k] where k is an integer (a strategy often used for open-ended binary search).

Interestingly, with floats you can also binary search on the binary representation. That narrows down the answer to the most precise answer representable in floating point in at most 64 steps for 64 bit floats. If you bisect the interval in the middle at each step, then it could take more steps (possibly more than 1000).

It turns out that doing this is pretty much equivalent to what you suggest, if you narrow down the initial range [-2^k, 2^k] by binary searching on k, because that's just binary searching on the exponent in the binary representation.

Interesting! I didn't really think about this but this is how I implemented sqrt() and division in an interview. Didn't think of how it generalizes!

Isn't this bit:

const mid = Math.floor((right + left) / 2);

susceptible to overflow?

EDIT: Hm, perhaps not (in JS). Number.MAX_SAFE_INTEGER is much greater than I expected.

Indeed, Java's binary search had this bug for a while! It was found through automatic verification, by the way.


also some CS books

Incidentally, as outlined in this Julia FAQ, native machine arithmetic with overflow will deal with this correctly.

    mid = (lo + hi) >>> 1

EDIT: legibility

Thank you, fixed it!

const mid = Math.floor(left + (right - left) / 2);


Not in Javascript, where everything is a double precision float. You would lose precision at about 2^51, but that’s not a limit that will meaningfully affect us for good while.

Oh really? And what if you aren't storing the data? You just want to find such n, that f(n) >= m and f(n+1) < m. Now perhaps it might become an issue.

The FFT algorithm is perhaps the most elegant and useful work of the 20th century. Modern communications would probably not be possible with out it but it's utility isn't limited to electronics It is used across the list of scientific disciplines and even in finance and economics. Next time you make a call on your smart phone, hoist a beer to Cooley, Tukey, and Carl Gauss

I have heard it called the most important algorithm of the 20th century. Not sure I buy that, I would say it's the most important algorithm in EE though.

It's also critical for factoring prime numbers and SETI signal intelligence processing. Truly a gem.

Factoring prime numbers?



My experience with this is from using Prime95 two decades ago as part of a distributed computing project to factor Mersenne prime numbers

https://en.wikipedia.org/wiki/Prime95 | https://www.mersenne.org/

Failing to factor them can be pretty important.

Even after working with it for 15 years I still find new and interesting things about it.

I like Dijkstra's algorithm for finding the shortest path between two nodes in a graph.

It's not so much that it is "beautiful", but it is a remarkably simple algorithm that a human can follow manually. The reason it is efficient is easily understood (it's easy to see how we are able to "finalize" nodes because it's obvious there is no shorter path to that node), and it takes what would otherwise be a complicated task and makes it manageable.

What I like about Dijkstra's algorithm is not that you can calculate the shortest path between two nodes, but that you can calculate the shortest path from one node to every other node, and do it one pass in O(n) time. I'm not sure it's at all obvious that it should be possible to do that.

>do it one pass in O(n) time

Mind you, the “n” here is not the number of nodes!

More precisely, Dijkstra’s algorithm runs in O(E + V * log V) time (if the priority queue is implemented with a heap), where E is the number of edges and V is the number of nodes.

In the general case, E = O(V^2) (i.e. fully connected graph), so Dijkstra’s algorithm can calculate the shortest path to V nodes in O(V^2) time. It sounds less staggering this way.

And in a general graph where negative edge-weights are permitted, it’s actually impossible to find the shortest path between two nodes without finding the shortest path between the source node and every other node! It’s pretty easy to see why: you can’t be sure you’ve actually found the shortest path until you’ve covered all the edges, because there’s always the possibility of a very large negative edge to be visited.

Dijkstra’s algorithm improves on this general case by exploiting the fact that the edges must have positive weight.

> And in a general graph where negative edge-weights are permitted, it’s actually impossible to find the shortest path between two nodes without finding the shortest path between the source node and every other node!

If there are negative edge-weights and cycles may occur, there is no shortest path between some nodes. You can keep going through a cycle whose total weight is negative getting "shorter" and "shorter".

It's like a race where one shortcut takes you back in time and leaves you where you started. You can use it to finish the race as far back in time before you started as you want.

Sure, but between "all edges have positive weights" and "there are cycles with negative total weight" there are still graphs that have negative edge weights, yet there's no negative-weight cycle.


V = { A, B, C }

E = { A -> B, B -> C, C -> A }

weight(A -> B) = 1

weight(B -> C) = 1

weight(C -> A) = -1

Right, yes.

You are, of course, correct. That was sloppy writing on my part.

I've always suspected that Dijkstra's shortest path algorithm was invented many times before he wrote it down. The algorithm itself is relatively straight forward and I can imagine that a competent programmer not aware of it could come up with it independently.

I mean, even Dijkstra himself came up with it because he needed it for a telecom gig. He can't have been the first one in the world who needed to find the shortest path in a graph. He just also happened to be a scientist, experienced in the whole "how to publish a paper" thing.

In one of the documentaries about him, he remarked that having written it up, he had no idea where to publish it.

For similar reasons, I really like the A* pathfinding algorithm. If you implement it on a grid with obstacles, draw the grid on screen, and update it as the algorithm runs, it's both beautiful and very intuitive to watch.

K - means clustering, which I learned about recently.

For non-machine learning folks, this algorithms helps in grouping relevant items together. As a practical, using this algorithm you can segment your customers based on their purchase history and interests.

The actual algorithm is very simple, imagine a lot of points on the 2D plane and each of them represent customers, whom you want to segment/cluster. Now, chose random cluster points and assign those customers to these points, by their distance. That is, whichever customer is near to a point, it belongs to that. At the end of iteration, you have lots of customers assigned to each cluster. Now, for each cluster, take the mean of those customers and the move the cluster to this mean value. Slowly, the cluster points move to the center with all the customers surrounded, belonging to that cluster.

read more - https://en.wikipedia.org/wiki/K-means_clustering

The scikit-learn page on clustering [1] is a really great way to see K-means and other algorithms side-by-side. The pictures really make it easy to see how all the algorithms perform on various data distributions.

[1] http://scikit-learn.org/stable/modules/clustering.html

The interesting part is solving the "closest pair problem" which is a part of the clustering algorithm. Hurts my head just thinking about it, god knows how someone came up with a solution like this. https://www.geeksforgeeks.org/closest-pair-of-points-using-d...

The closest-pair algorithm is something you might use in single-linkage clustering, not K-means clustering:


Lesson two of this course does a great job explaining K-means and its connection to Expectation Maximization. https://www.udacity.com/course/machine-learning-unsupervised...

I prefer mean-shift for it's simplicity, as it even creates it's own clusters.

Could you explain what you mean by that a little more? I looked at the Clustering section of the Wikipedia page for Mean-Shift:


but I don't get how C and r are chosen, nor how the method is supposed to be "non-parametric" given that you need such parameters to begin with.

Also, how are you supposed to assign all points to separate clusters, and how would you determine how many clusters to use in the first place?

Having looked at the that page I also definitely don't think it's simpler than K-means!

Edit: This article gives a much more accessible descriptions than wikipedia:


To answer my own questions, the number of clusters is determined by the number of max points of the kernel density function, but the kernel bandwidth does have to be specified.

Thanks for not just looking for an answer to your own question but also sharing it!

> So how does mean shift come into the picture? Mean shift exploits this KDE idea by imagining what the points would do if they all climbed up hill to the nearest peak on the KDE surface. It does so by iteratively shifting each point uphill until it reaches a peak.

I wonder if this would still work with high-dimensional data. Distance and space start to act weird in higher dimensions, right?

Without looking much further into it, I can't imagine it going very wrong. My intuition says that the density function will be bounded, since you're working with a finite number of points, therefore you'll always be able to climb to a local maximum.

The problem I see is the word "near" in "climb to nearest peak". To quote the curse of dimensionality wiki page:

> The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problematic for any method that requires statistical significance.


Are you concerned that in a very high dimensional space you'd need an absurd number of points in order to produce a meaninful KDE, since otherwise the points will be too sparse? If so I think I'm with you... you'll probably just end up with a very large number of small, weak clusters.

Although I wonder if you can't account for that by simply increasing the kernel bandwidth. Perhaps not. Perhaps this is why mean-shift seems to be mostly used for computer vision and image processing, where (I guess?) the number of dimensions is low.

K-means is not an algorithm, it's a heuristic for an Np-hard problem.

It is absolutely an algorithm in the sense of "a set of rules to be followed". I think you mean that it doesn't guarantee an optimal solution. That just means it's a heuristic algorithm, same as simulated annealing is a heuristic algorithm for solving optimisation problems.

Nope. An algorithm has to be effective. You can find pathological cases for k-means such that it will never converge on anything useful. So if you set your termination case to be convergence it will never terminate and if you don't then it will never be effective.

I think you might be in the minority in this opinion. Many algorithms have pathological cases but are still considered algorithms

Minority? This is directly from Knuth.

Knuth defines effectiveness as: "... all of the operations to be performed in the algorithm must be sufficiently basic that they can in principle be done exactly and in a finite length of time by a man using paper and pencil."

K-means and other heuristic algorithms fit that description.

BogoSort is an algorithm. Not a very good algorithm, but an algorithm nevertheless.

No it absolutely is not.

The lack of fundamental computer science knowledge in this thread is alarming.

Negative 2 points on a post saying that a computational method that possibly never terminates is not an algorithm... Oh dear...

It never terminates with probability 0.

As far as I can tell you're only arguing against poor implementations of K-means. If you demand that the score strictly improves at each iteration then the algorithm must terminate.

And how do you "demand that the score strictly improves"? It's an NP-hard problem.

K-means implementations generally terminate once there's an iteration where the score doesn't improve. This happens when there is convergence to a local minimum or - less likely - the state hops between two nearby local minima with the same score. But it will terminate on something, and most of the time that something will be pretty good.

I saw your mention of Knuth elsewhere, I looked it up and he demanded that

> An algorithm must always terminate after a finite number of steps ... a very finite number, a reasonable number

This is a pretty niche characterization and almost certainly not what the original post was asking for. However, I concur that there is no guarantee on how quickly K-means terminates or on how good the output will be,. But... if you're going to be that strict about it you would even have to rule out the Simplex Algorithm, which everyone I've ever spoken to thinks of as an algorithm.

In that sense kmeans may be better referred to as a 'computational method' rather than an algorithm.


Isn't a method that gives an approximate or best-fit estimate to a problem still an algorithm, if it terminates?

No. You can't prove that k-means does anything useful.

Is the definition of "algorithm" that you're using here useful?

It's one of the most fundamental concepts in computer science and underpins decades of research. You can decide if it's useful.

This isn't a classroom, and your pedantry isn't adding anything useful to the conversation. We all understand these pedantic quibbles you're arguing about... and what the community is more or less collectively saying is "in this context, we don't care about the distinction between an 'algorithm' in the textbook sense, and a 'heuristic' in the textbook sense".

Nah. Most of them don't understand the difference. If you did you wouldn't can it pedantry.

I personally don't find heuristics beautiful. That's why I commented.

To be fair, you haven't explained at all clearly why you don't think k-means adheres to Knuth's notion of an algorithm.

Your objection seems to be

> You can find pathological cases for k-means such that it will never converge on anything useful

As has been pointed out more than once, a good implementation of k-means is guaranteed to terminate in a finite time. And whatever you mean by "useful" doesn't seem to appear in Knuth's definition of an algorithm.

I was gonna mention this one as well! So simple and clever.

I like the explanation in the case study in Sedgewick's Algorithms [1].

Compare its simplicity to connected-components [2], another elegant algorithm but with a perhaps a bit more involved implementation.

1: https://algs4.cs.princeton.edu/15uf/

2: https://algs4.cs.princeton.edu/41graph/CC.java.html

Yeah, this is one of my absolute favorites as well. Such a neat idea. The analysis is also fun, where the complexity of operations is O(alpha(n)), where alpha(n) is the Inverse Ackermann Function. That function is just fun to contemplate.

You wanna flex your muscles with this data structure, this is a fun Project Euler problem: https://projecteuler.net/problem=186

I was recently in a coding interview where I had heard of the original problem (a version of "transform one word to another one letter at a time, making sure that each step is a real word") so instead of just skipping the question, we went straight to the "extra credit" question of "can you tell that there is a path from one word to another faster than just calculating the answer?"

After hearing the word "memoization" go through my head (part of "Hacking the Coding Interview." which I recommend at least for the wealth of example questions for practice), I basically walked my way through making a disjoint-set data structure using the words as keys and the set number as values. I'm prettyy sure that making that up on the spot is what got me the job.

I love this algorithm (I like implementing type systems) but I always feel a bit naughty when I implement it, as AFAIK it can't be implemented with immutable data structures (not efficiently, at least).

I like implementing type systems

Could you expand on this? This is something I've just started reading about. I'd be interested in good resources to use to get started. At the moment I've just started reading TAPL.

Check out https://github.com/tomprimozic/type-systems there's been a few HN threads about it as well. I can also answer any specific questions you have, or if you want further resources I can try and find them... (there's a good online book I have in mind, but I've no idea how to find it right now!)

Some great stuff in this repo thanks. I'm particularly interested in resources that build a type system up step by step, from very simple and working towards Hindley-Milner

Hm... I'm not sure it works that way. Type systems are quite fragile beasts, if you change one thing you can easily break the rest. Especially when it comes to type inference! Although I'd say that HM is quite simple, especially if you don't consider polymorphism (i.e. if you require that every function parameter has a concrete type like int or bool).

In fact, that might be a good starting point - first implement something with the above 2 types, and where every variable and function parameter has a type annotation. From then, you could (1) add more complex types, like function types, tuples or lists, (2) implement type propagation where each variable has the type of the value it's assigned (like auto in modern C++), and then (3) go full HM type inference.

TAPL definitely sounds like a good resource. The next book might be this one: Advanced Topics in Types and Programming Languages https://www.cis.upenn.edu/~bcpierce/attapl/frontmatter.pdf

Have you read TAPL? I'm curious how you think that stacks up to readily-available online resources.

No, I've just skimmed over some of its chapters to clarify some concepts. I've mostly learned by reading code and then trying to implement my own type systems. Algorithm W is really quite simple and basic.

Another good resource could be Programming Language Zoo http://plzoo.andrej.com/ that covers different evaluation techniques as well.

In general, I've been "involved" in PL design for quite some time, so I've no idea where I've gained all the knowledge I have... but in recent years, there have been quite a few modern resources, even a few on HN IIRC!




(disclaimer: I haven't read them)

Exponential backoff: I don't know if this is a published algorithm.

Basically, background processes need to keep retrying an operation until it succeeds. (Make an API call to a server, upload a file, ect, ect.) If the retry interval is too small, you can DOS the remote server. (Server returns a 5xx error because there's a corner case that hits a defect.) But, if the retry interval is too large, your background process can run too slowly when it encounters transient errors. (Temporary network glitch, database under high load because of a cache flush.)

So, we pick an ideal small retry interval, and the maximum tolerable interval. (In my case, it's usually 30 seconds and 15 minutes.) Then, when errors happen, the retry interval keeps doubling until it hits the largest interval. So, the operation is retried at 30 seconds, 1 minute, 2 minutes, 4 minutes, 8 minutes, and then every 15 minutes until it succeeds.

The result is that transient errors don't cause large delays. Situations where a defect prevents a request from submitting don't DOS, because the retry interval throttles itself longer and longer.

Ideally, you also want to implement "jitter" in the retries. AWS has a good article about it: https://aws.amazon.com/blogs/architecture/exponential-backof...

The summary (from the article) is that:

    sleep_time = random_between(0, min(2 ** retries, max_sleep_time))
so that retries are "spread across" the delay window.

Reminds me of the original Ethernet algorithm to deal with collisions on coax.

This is also used in TCP with timeouts and ACKs. The idea is you have an expected timeout (ETO) and a retransmission time out (RTO). Initially, you set RTO to ETO and then you send a packet. If you don't get an ACK back, and the timer for the RTO expires, you then set RTO = 2 * RTO. Otherwise, reset RTO = ETO. I think they referred to this as 'exponential averaging' however.

I tend to use a min/max equation that acts similar to a fibonacci backoff, allowing for a couple rapid retries before the exponential backoff kicks in. Provides a little extra speed for the transient problems.

This can't be an algorithm? I implemented it on a loop and a limit duration ( an hour) and a Max interval of a minute ( starting from 5 seconds) in 5 lines of code

A short algorithm doesn't make it any less of an algorithm :)

The Euclidean algorithm to compute the greatest common divisor of two numbers.

In Javascript, if you have two numbers x and y that are both greater than zero, you compute the greatest common divisor xyGcd in one beautiful line of code:

  var xyGcd = function gcd(a,b){ return b ? gcd(b, a%b) : a; }(x, y);
In my opinion this is the greatest one-liner ever.

> In my opinion this is the greatest one-liner ever.

I dunno, I really like Haskell's one-line infinite Fibonacci sequence:

    > let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
    > take 7 fibs
    > fibs !! 1000
I don't think it's the most efficient solution, but it really showcases lazy evaluation with a simple example.

Isn't it a-b instead of a%b?

a % b is a way to do it in fewer iterations than a - b.

I loved this article on the elegance of deflate http://www.codersnotes.com/notes/elegance-of-deflate/ finishes with this:

It's not one algorithm but two, that dance and weave together in harmony. You can pick dictionary matching, or entropy coding, and you can select between them on a per-byte basis with no overhead. That's what I find so clever about it - not that it can do one thing or the other, but that it can choose either, and yet represent them using the same language. No extra markers, nothing to say "oh, now we're switching to entropy mode", nothing to get in the way.

There's something fundamentally appealing about Genetic Algorithms to me. Even though they don't apply everywhere, and even though "hill climbing with random restart" can be just as effective in many cases, there's just something awe inspiring about the idea of mimicking evolution.

I also really like other related "nature inspired algorithms"[1] like Ant Colony Optimization, Particle Swarm Optimization, etc.

[1]: http://www.cleveralgorithms.com/nature-inspired/index.html

I have to agree. The fact that evolution is the only process we know have spawned consciousness just resonates with me.

While our algorithms might mimick evolution poorly, there's something raw about it. And it's not totally forgotten in research: https://blog.openai.com/evolution-strategies/

I stumped upon Differential Evolution which I implemented here https://github.com/peheje/nim_genetic/tree/master/Differenti...

It's fun to see how easy it is to define a new problem for it to solve.

Would like to apply it to create NN or Trees for classifications or the like.

Agreed. Ever since I first discovered them, I've decided that I eventually want to specialize in these types of algorithms. They may not be the most efficient algorithms to solve a problem, but they're incredibly fascinating, and their potential is endless.

I was actually considering implementing one during development of the game I'm working on to generate a soundtrack (since I'm not a musician), but the problem is the fitness function would require a user rating of each generated sound sample, and to get any kind of decent results, that would require me to personally listen to and rate thousands of songs. Either that or outsource it through SoundCloud or something.

I may be stretching the definition of algorithm here, but I've always been fascinated by signed distance functions / signed distance fields for 2D / 3D shapes.

Being able to do the full set of boolean operations on shapes with just `min` and `max` operations is pretty cool. Doubly so because boolean operations on standard geometry is such a difficult, messy problem.

Bloom filters are interesting. A bunch of different hashes as kind of fingerprint that make search fast but with caveats: negative answer is negative, positive answer might be negative.

HyperLogLogs are a great sketch data structure of which a bloom filter is a special case. The cool thing about them is that you can perform approximate set operation cardinalities in runtime proportional to sketch size rather than the cardinalities of the sets you’re comparing.

+1 for bloom filters

I've always appreciated the Bresenham's line algorithm. It's a very simple algorithm and has been "superseded" by Xiaolin Wu's line which can also do antialiasing, but ever since I learnt about it, I've been very fond of it. Also the fact that it's an algorithm for use in computer graphics helps, because for me it's very easy to get excited about visual stuff as opposed to the more abstract CS stuff, which I have deep appreciation for but can't understand most of or kick myself enough to try.

> It's a very simple algorithm and has been "superseded" by Xiaolin Wu's line which can also do antialiasing

Actually, not really! Bresenham's Line Algorithm secretly has a very useful generic low-level algorithm at its core. It just happened to be created for drawing lines first.

The thing most people miss is that it describes a way to do error-free repeated addition of a fraction using only integers and addition. In fact, you can add many different fractions, as long as they all share a denominator.

That is hugely powerful in the right context: imagine an embedded context where you have to add fractions to a counter, but the counter should be an integer value at all times. Maybe you must add different fractions in different situations. For example, an embedded device with two non-matching frequencies to keep track of over longer periods of time. With this method you're guaranteed that no matter how long you keep adding, the sum will always be correct.

Yep; it's one-dimentional variant of Error Diffusion (which itself is a notably beautiful dithering algorithm), where "input" = slope of line and output is quantized dy=0 or dy=1.

Dithering also appears in audio. The old PC Speaker had merely 1/0 states. It's supposed to be limited to square waves, limited and unpleasant, right? But at some point people figured out if you flip 1/0 fast enough you can approximate continuous values: http://bespin.org/~qz/pc-gpe/speaker.txt

I once played with using "error diffusion" like dithering to play wav files. In theory that should produce better audio than the simple fixed-table and PWM dithering suggesting in above article, but I did this on a much newer computer (Pentium?) by which time PC Speaker was irrelevant (only for hack value) and only reached 30-40 bits per input sample at 100% CPU. Plus as that article explains, simple PWM might(?) actually produce less noise from timing irregularity.

But the technique is not merely an obsolete hack! Every CD player that visibly brags it has "1-bit DAC" uses a closely related technique, though in fast hardware. See: https://en.wikipedia.org/wiki/Delta_modulation https://en.wikipedia.org/wiki/Delta-sigma_modulation Take me with a grain of salt, I always get confused trying to grok these... Full explanations of the noise shaping properties that make Delta-Sigma highly useful involve quite a bit of signal-processing, but the core accumulate-compare loop is a very simple algorithm...

The Bresenham Circle Algorithm is cool too. You can draw a circle with it in just a few lines of code: https://en.wikipedia.org/wiki/Midpoint_circle_algorithm

Boyer-Moore (grep) : https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-sea... because it's cool.

LR parser https://en.wikipedia.org/wiki/LR_parser because Aho & Ullman

https://en.wikipedia.org/wiki/Church_encoding 'cos it shows how fundamental lambda calculus is in a simple way (and how functional programming is so superior (flame wars intended :-) )

I'm surprised no one mentioned PageRank (or maybe I missed it).

Another favorite is RAFT, simply because of how elegant and simple it is and how easy it is to understand.

Also - I used to work in a company that did P2P video streaming (pre WebRTC) and the way the protocol leveraged Reed Solomon Coding was just awesome. https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_cor...

Probably the most commercially successful algorithm ever invented.

Seconding this. Its super easy to understand and explain to others (albeit implementation I have found to be somewhat more challenging when you dig into the details)

Yeah, once you start considering log rewrites I remember it got a bit more challenging. But the main idea is super elegant and beautiful.

Duff's device. Because you will first look at it in disgust, but then you realize how clever it really is.

See: https://en.wikipedia.org/wiki/Duff%27s_device

Quite a few EA Sports games have Duff's Device in locations of computational bottlenecks. I know because I put them there. Wonderful little mechanism.

That's really interesting to hear. I tried Duff's Device myself, once upon a time, and found it made no measurable difference, so I pulled it back out. This was a long time ago, so I had just assumed that a similar optimization was just built into most C compilers these days. Does it vary by toolchain? I believe I was using clang at the time.

(I was still quite wet behind the ears at the time, so it's also more than possible that I was just doing something wrong.)

All that said, agreed, wonderful little mechanism. It's one of those things that every C programmer should take the time to really understand.

Yes, modern compilers do loop unrolling nowadays, so Duff's device doesn't really have a use. But it's certainly a wonderful bit of history. I found Duff's device when I was researching coroutines in C.

Is there ever a case nowadays where the compiler doesn't do this for you automatically?

No, this is just a wonderful bit of obsolete hacker history. Unless if you mess with retro-architecture for fun (like old PDP machines, be they emulated or real), then it's somewhat relevant.

The first time I saw that thing I just thought "That just can't be valid C syntax". It's quite an amazing hack that seems ugly at first but seems more and more elegant the more you think about it.

These days compilers usually do loop unrolling on their own, so the device has mostly lost its purpose, but it can apparently also be used to implement something similar to coroutines in C, which is nice.

Disgusting indeed: A way to write irreducible loops that doesn't use a goto. Good idea to measure the resulting code, given how poorly most optimizers deal with such loops.

Even sadder because the same effect can be achieved cleanly (and "optimizably")by using as "switch" followed by a "while".

You should read Tom Duff's original email about this:


A quote of the man himself about this: "I feel a combination of pride and revulsion at this discovery."

It's interesting how this seems to be not quite an algorithm - I suppose "device" is an appropriate name. The "See also" section on Wikipedia leads to some good stuff.

Karger's randomized contraction algorithm for finding a min-cut. It's a common algorithm to introduce students into the world of randomized algorithms.

Also a shameless plug. My friend and I came up with this pseudo-polynomial time algorithm for subset sum that can be taught in a single session. It is faster than the standard dynamic programming algorithm. https://arxiv.org/abs/1807.08248

I actually read some of this paper! I liked your FFT trick.

Thanks! Do you use that algorithm for anything in your research?

Not directly. I briefly thought that a certain computational number theory problem might involve subset sums, but alas, it turned out to be the wrong approach.

By far my favorite is Depth First Search (DFS). It's almost a bread and butter algorithm for a lot of problems (at least for quick tests) with only a couple lines long in its recursive form. Some example usage: - combinations - permutations - generating sequences - exploring tree structures - finding paths - solving pretty much any search space problem and much more. It's not the most efficient for the large majority of these cases but it allows you to throw 10-15 lines of code to quickly test an idea before actually working on it.

1. The "Skew Algorithm", aka "DC3", aka Kärkkäinen-Sanders. It uses radix sort in an extremely brilliant way to construct suffix arrays in linear time. I found these explanations helpful (though it still took me some time to digest): http://www.mi.fu-berlin.de/wiki/pub/ABI/SS13Lecture3Material...

2. Fast Fourier transform (FFT). It's another quite brilliant algorithm used for decomposing a function into its frequency components in linearithmic time (among other things).

I would also vote for FFT. It's amazing how widely it's used and what sort of tricks you can do in frequency domain. Multiplying polynomials? Simple! Multiple 2D projections of a 3D object? Just FFT them, do a couple of simple steps and you will get a 3D model.

Do you have some details for these applications ?

The multiplication is concisely described here http://numbers.computation.free.fr/Constants/Algorithms/fft....

For 2D -> 3D, see https://en.wikipedia.org/wiki/Projection-slice_theorem for a simple overview

I have always liked DBSCAN: https://en.wikipedia.org/wiki/DBSCAN

DBSCAN is used for clustering in applications, much like k-means clustering, but in k-means clustering, the number of clusters must be known in advance (hence the k)

DBSCAN solves this problem by for every point in the graph checking if a certain threshold of vertices is within a certain radius. If this is the case, it will add these vertices to the new cluster and repeat the process for these nodes. It is very simple, so much that you can easily explain this to non-technical people.

In my previous job I used it for detecting whether for one word in a signal there only existed one representation or multiple (used for toggle bit detection)

If you like DBSCAN, I would recommend checking out OPTICS. It is not a clustering algorithm per se, but generates information from which clusterings for multiple settings of DBSCAN parameters can be "queried" cheaply. The DBSCAN and OPTICS papers share an author. One my favorite algorithms. This is the original paper - [1], and this looks like a helpful presentation on the topic - [2]. Since you mention k-means, I would point out that unlike k-means which finds convex-shaped clusters only, DBSCAN/OPTICS identify clusters of any shape.

[1] http://www.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

[2] https://www.cse.buffalo.edu/faculty/azhang/cse601/density-ba...

That definitely looks very interesting, I'll look into that.

Clustering on non linearly separable clusters is also one of the reasons we chose DBSCAN back then :)

The rsync algorithm.

There is a theorem that it is impossible in general to remotely compare two files with less network traffic than it requires to simply send one file over the wire. But rsync does it. How? By accepting a theoretical but unlikely possibility of coming up with the wrong answer.

What rsync does is compare hashes of ranges of stuff. If the hash comes out the same, the two are assumed to be identical with no further investigation. It is possible that different files will be thought identical. But it is unlikely that any two hashes ever generated by rsync have accidentally been the same when the underlying files were different.

(I do have to say "accidentally" because rsync uses MD5 as a hashing algorithm, and people have deliberately created collisions in MD5.)

rsync was indeed a dramatic "this should not be possible" feat. But the way it uses rolling hashes is not the simplest possible. One side sends hashes of fixed-size chunks, the other scans input for rolling hash matches, which only works for 2 sides and requires non-cachable CPU work on at least one side.

There is an even simpler idea at the core of many modern backup programs (eg. bup, attic, borg)! Possibly pioneered in `gzip --rsyncable`, not sure. https://en.wikipedia.org/wiki/Rolling_hash#Content-based_sli... Each side slices files into chunks at points where rolling hash % N == 0, giving chunks of variable size averaging N. These points usually self-synchronize after insertions and deletions. This allows not just comparing 2 files for common parts, but also deduplicated storage of ANY number of files as lists of chunk hashes, pointing to just one copy of each chunk!

rsync also checks modification dates. you would have to be extremely unlucky for both the checksums and the modification dates to be ok but for the files to be different

The Gilbert-Johnson-Keerthi convex shape intersection algorithm.

It can take any two convex sets and tell you if they overlap, while converging on a separating axis or a penetration point, if it exists. All you need is a support function that returns a point in the set that is furthest along a given direction vector. The algorithm works in an arbitrary number of dimensions.

Basically, it operates on a new shape which is the Minkowski difference of the two intersected sets. If the difference-shape contains the origin, then the two shapes overlap. The algorithm iteratively constructs simplexes inside the Minkowski difference, trying to build one that encompasses the origin. This allows it to have essentially constant memory for a given dimension of problem.

It's a very elegant formulation of a very general problem. It's so pleasing that you can intersect cylinders with cones, or oriented cubes with frustums, or polytopes with superellipses all with the exact same single algorithm.

The Wikipedia writeup is terrible, so I may do my own with nice illustrations at some point.

Neat. I googled, and have been reading this writeup: https://www.medien.ifi.lmu.de/lehre/ss10/ps/Ausarbeitung_Bei... —which seems quite nice so far.

Wow. I'm surprised to see this on here. I used to work Johnson, he's is a super nice guy and always made time to have long conversations with me about optimization.

I believe this algorithm is widely used in the video game industry?

Video games and anywhere you need a rigid body collision simulation. So TV & film visual effects is another common use.

Also researchers who need fast approximate physical dynamics. For example the Mujoco simulator or Drake.

I like the A* pathfinding algorithm. It's not particularly elegant or beautiful on itself, and it's very dependant on the heuristics applied & the topology of the terrain used. Yet the results tend to be quite pleasing to watch.

Also, it has a great website dedicated to it, with lots of interactive demos:


Actually it's pretty elegant by itself, Consider this all traversals are same,

val someDS;



Now replace, someDs with

Queue -> BFS

Stack -> DFS

Priority Queue, with priority of distance so far -> Dijkstra.

Priority Queue, with distance + heuristic -> A*

Its beautiful.

The development of "Shakey the robot" led to the discovery of A* along with a few other algorithms.


Minimax search https://www.cs.cornell.edu/courses/cs312/2002sp/lectures/rec... How knew that a couple of line of code could be used cause so many interesting behaviors like cooperation, betrayal etc.

Yes and alpha beta pruning. Your link mentions that.

Dancing Links (Knuth's paper: https://arxiv.org/pdf/cs/0011047.pdf )

> In computer science, dancing links is the technique suggested by Donald Knuth to efficiently implement his Algorithm X. Algorithm X is a recursive, nondeterministic, depth-first, backtracking algorithm that finds all solutions to the exact cover problem. Some of the better-known exact cover problems include tiling, the n queens problem, and Sudoku.

~ https://en.wikipedia.org/wiki/Dancing_Links

Also SEQUITUR http://www.sequitur.info/

> Sequitur (or Nevill-Manning algorithm) is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997[1] that infers a hierarchical structure (context-free grammar) from a sequence of discrete symbols. The algorithm operates in linear space and time. It can be used in data compression software applications.

~ https://en.wikipedia.org/wiki/Sequitur_algorithm

Part of the reason I like these both is that they each require sharp pointer manipulation to work efficiently.

I'm a functional programming fan, these beautiful and useful algorithms remind me that purity isn't everything, eh? Also, it's fun to think about how you might get FP code to emit correct pointer-munging code for them.

+1 Dancing Link. Donald Knuth rules !

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact