
The Median-of-Medians Algorithm - MidsizeBlowfish
http://www.austinrochford.com/posts/2013-10-28-median-of-medians.html
======
susi22
Note that you shouldn't use this algorithm in practice (unless you need its
guarantees), since the average performance isn't that great: the constant on
the number of comparisons is quite high. This is just like Mergesort vs.
Quicksort, where Mergesort guarantees O(N log N) yet Quicksort is usually much
faster. Compare this to the _average_ complexity of Quickselect with these
pivot strategies:

- Random pivot (i.e. median of 1): about 3.39N comparisons

- Median of 3: about 2.75N comparisons
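
For concreteness, Quickselect with the median-of-3 pivot rule can be sketched
like this (illustrative Python, not from the thread; average-case O(n),
worst-case O(n^2)):

```python
def quickselect(xs, k):
    """Return the k-th smallest element (0-indexed) of xs.

    Sketch of Quickselect with a median-of-3 pivot; the list copies
    keep the code readable at the cost of some constant-factor work.
    """
    xs = list(xs)
    while len(xs) > 1:
        # Median-of-3 pivot: median of the first, middle, and last elements.
        pivot = sorted([xs[0], xs[len(xs) // 2], xs[-1]])[1]
        less = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        if k < len(less):
            xs = less
        elif k < len(less) + len(equal):
            return pivot
        else:
            k -= len(less) + len(equal)
            xs = [x for x in xs if x > pivot]
    return xs[0]
```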

If you want the fastest possible algorithm you should follow

[http://dx.doi.org/10.1109/TSP.2012.2197394](http://dx.doi.org/10.1109/TSP.2012.2197394)

which proposes choosing the median of N^{2/3} / (4*pi)^{1/3} elements as
the pivot, and then also uses a different strategy to choose the second pivot
(close to the median yet "safe").

Another fast algorithm is Floyd & Rivest's SELECT (yes the same crypto Rivest)
which is also asymptotically optimal and implementations exist. (Paper title:
"Expected time bounds for selection")

~~~
wfunction
> "Note that you shouldn't use the algorithm in practice"

You certainly can:

1. Run the randomized algorithm.

2. If the run takes too long (more than O(n) for whatever constant
factor makes sense to you), just stop doing more work and switch to the
deterministic algorithm.

Now you get the best of both speeds, within a small constant factor (like 2).
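
A minimal sketch of that hybrid (sometimes called introselect, which is
roughly what std::nth_element implementations do; the names, the budget
constant, and the use of sorted() as a stand-in for the deterministic
fallback are all illustrative assumptions here):

```python
import random

def introselect(xs, k, budget_factor=4):
    """Try randomized Quickselect under a work budget; if it blows past
    budget_factor * n partitioning steps, fall back to a guaranteed
    method. budget_factor is an arbitrary illustrative constant."""
    xs = list(xs)
    budget = budget_factor * len(xs)  # total partitioning work allowed
    work = 0
    while len(xs) > 1 and work <= budget:
        pivot = random.choice(xs)
        work += len(xs)
        less = [x for x in xs if x < pivot]
        equal = [x for x in xs if x == pivot]
        if k < len(less):
            xs = less
        elif k < len(less) + len(equal):
            return pivot
        else:
            k -= len(less) + len(equal)
            xs = [x for x in xs if x > pivot]
    if len(xs) == 1:
        return xs[0]
    # Budget exhausted: switch to a worst-case-safe method. sorted()
    # stands in for deterministic median-of-medians to keep the sketch short.
    return sorted(xs)[k]
```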

~~~
anonymous
Or, since we have enough cores, run both in two parallel threads, keeping the
original list read-only.

~~~
crb002
Or distribute out the median_5() work to many cores at once since each call is
independent of the others.
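
A toy illustration of farming the group medians out to a pool (the chunking
scheme and ThreadPoolExecutor are my assumptions; for CPU-bound work at C
speed you'd want a process pool or native threads):

```python
from concurrent.futures import ThreadPoolExecutor

def medians_of_fives(data):
    """Compute the median of each group of 5 in parallel. Each call
    touches only its own chunk, so the calls are fully independent."""
    chunks = [data[i:i + 5] for i in range(0, len(data), 5)]
    with ThreadPoolExecutor() as pool:
        # sorted() on a <=5-element chunk is constant work per group.
        return list(pool.map(lambda c: sorted(c)[len(c) // 2], chunks))
```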

------
vowelless
This is one of my favorite algorithms. I use it frequently in my code
(example: filtering noise in a real time system) and get giddy when I talk to
others about it. One of the contributors to this algorithm was Manuel Blum
[0], whose contributions (and his students') touch pretty much every corner of
Computer Science.

His lecture notes from his course at Carnegie Mellon describe this algorithm
quite clearly [1].

[0]
[http://en.wikipedia.org/wiki/Manuel_Blum](http://en.wikipedia.org/wiki/Manuel_Blum)

[1]
[http://www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/le...](http://www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0125.pdf)

~~~
shriphani
Minsky -> Blum -> Adleman, Minsky -> Blum -> Goldwasser are 2 hat-tricks of
Turing awards involving Blum.

------
awoolf27
According to the man himself, this algorithm was found by Manuel Blum when
trying to prove that it was impossible to do k-selection in linear time
deterministically.

~~~
svantana
Do you know if that's been proven, or is it still an open question?

~~~
awoolf27
The algorithm described in the article is deterministic and linear-time --
exactly what Blum had set out to prove impossible.

------
thesz
This is so trivial!

Actually, if you use lazy data structures, then sorting the list lazily and
taking i-th element will give you the best possible complexity:
[http://en.wikipedia.org/wiki/Selection_algorithm#Language_su...](http://en.wikipedia.org/wiki/Selection_algorithm#Language_support)

In Haskell, it is nothing more than "ithSmallest xs i = sort xs !! i".

~~~
jackpirate
Pretty sure this is wrong. For example, what about the case where k=n? Then
the lazy evaluation will have to fully sort the list before it can determine
the kth element, resulting in an O(n log n) algorithm.

In the standard idiom,

    smallest = head . sort

we're only looking for the case of k=1, so we can do it in O(n) time.

Note that if you used a lazy BST, then you could probably still get the last
element in O(n) run time, but your example code doesn't do that. AFAIK, no
standard library routine exists for this.

~~~
sgk284
If your sort is lazy enough, the `(sort xs) !! k` approach will work fine.
Haskell's default sort isn't lazy enough, but you could certainly make it so.

You don't have to fully sort the list before determining the kth element
(again assuming a sufficiently lazy implementation). Once you know the element
is in the second half of the list, the sort simply will never sort the first
half.
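
The point is easy to demonstrate with Python generators, which give exactly
this kind of laziness (a sketch; the pivot choice and list copies are kept
naive):

```python
from itertools import islice

def lazy_sorted(xs):
    """Generator-based 'lazy quicksort': each partition is expanded only
    when its elements are demanded, so taking the i-th element never
    sorts the parts of the list that the walk to i skips over."""
    if len(xs) <= 1:
        yield from xs
        return
    pivot = xs[0]  # naive pivot choice, fine for a sketch
    yield from lazy_sorted([x for x in xs[1:] if x < pivot])
    yield pivot
    # This branch is only ever partitioned if the consumer asks for
    # elements past the pivot.
    yield from lazy_sorted([x for x in xs[1:] if x >= pivot])

def ith_smallest(xs, i):
    # Pull only i+1 elements from the lazy sort.
    return next(islice(lazy_sorted(xs), i, None))
```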

~~~
bdonlan
It should be noted that the additional overhead of performing lazy quicksort
(all those additional thunks) makes it an unlikely choice for the standard
sort function; you'd suffer a large constant-factor overhead in the typical
case of processing the entire list in sorted order.

~~~
sgk284
Yep, definitely should have noted that. It's an interesting mental exercise
though.

------
wfunction
If you like this, go look into the suffix array construction algorithm, called
DC3 ("difference cover, modulo 3"). It's even more mindblowing.

~~~
crb002
Pang Ko's is cooler IMHO but then we are old lab mates.

------
jmlundberg
The C++ standard library function std::nth_element states its complexity:
linear time on average. (
[http://en.cppreference.com/w/cpp/algorithm/nth_element](http://en.cppreference.com/w/cpp/algorithm/nth_element)
)

I wonder if there are other possible implementations?

------
crb002
I'm coding it in Ruby. Check back tomorrow and I should have it finished.
[https://gist.github.com/chadbrewbaker/7202412#file-median-rb](https://gist.github.com/chadbrewbaker/7202412#file-median-rb)

~~~
nskelsey
Here is a python implementation.
[https://github.com/NSkelsey/algorithms/blob/master/select.py](https://github.com/NSkelsey/algorithms/blob/master/select.py)

~~~
ratpik
Sorting the sub-lists to find the medians of 5 is by itself at best
O(n log n) as per this example. That would make the algorithm non-O(n). Or am
I missing something?

~~~
virgi1
Yes, you are. It's O(n log n) where n is 5, i.e. constant time (you can
easily find the median with 4 comparisons to find the smallest of the 5,
followed by 3 comparisons for the 2nd smallest, followed by 2 comparisons for
the median: a total of 9 comparisons).

~~~
ihateloggingin
Put otherwise: sorting all the groups costs O((n/5) * (5 log 5)) = O(n).
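
Putting the pieces together, the whole deterministic select can be sketched
as follows (illustrative Python; sorted() on each 5-element group stands in
for the 9-comparison method described above):

```python
def select(xs, k):
    """Deterministic linear-time selection via median of medians; a
    readable sketch, not a tuned implementation."""
    if len(xs) <= 25:
        return sorted(xs)[k]
    # Median of each group of 5: constant work per group.
    groups = [xs[i:i + 5] for i in range(0, len(xs), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # The median of the medians is a guaranteed-good pivot: roughly
    # 30% of the elements fall on each side of it.
    pivot = select(medians, len(medians) // 2)
    less = [x for x in xs if x < pivot]
    equal = [x for x in xs if x == pivot]
    if k < len(less):
        return select(less, k)
    if k < len(less) + len(equal):
        return pivot
    return select([x for x in xs if x > pivot], k - len(less) - len(equal))
```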

------
NotSoNormal
Nice writeup. There's a good source already though:
[http://www.cs.cmu.edu/~avrim/451f11/lectures/lect0908.pdf](http://www.cs.cmu.edu/~avrim/451f11/lectures/lect0908.pdf)

------
myle
You can use the same idea in quicksort when choosing the pivot (making it
worst-case O(n log n)), but in practice you don't, because it is expensive.

There is a very nice and simple probabilistic algorithm instead.

------
newobj
So embarrassing when a candidate busted this out on me in an interview - but
couldn't quite explain it - so I was convinced he was wrong / out of his mind.
Later I looked it up in the CLR and sure enough...

------
curiousDog
Well written. This was the first algo we were taught in our algorithms design
course from CLRS. Blew my mind at that time.

------
felipecruz
very cool algo!! a C impl:
[https://github.com/felipecruz/paa/blob/master/t1/knth_n_mom....](https://github.com/felipecruz/paa/blob/master/t1/knth_n_mom.c)

