

Median of Medians Algorithm for finding the median of an array of numbers - dhruvbird
http://www.ics.uci.edu/~eppstein/161/960130.html

======
davidtgoldblatt
I take issue with this: "Line up elements in groups of five (this number 5 is
not important, it could be e.g. 7 without changing the algorithm much)."

The correct statement is that you can replace 5 with any _larger_ number and
the algorithm still works, but 5 is the least number you can pick here. That's
the most interesting part of the algorithm to me - that "5" is connected to
median-finding in the same way that "2" is connected to divide in conquer
searching in a sorted array. But 2 is more obviously interesting than 5 (you
can say "dividing something in 2 makes it smaller", but there aren't very many
similar things you can say about 5).

~~~
onan_barbarian
I may be having a durpy Saturday afternoon, but why is 3 not a valid choice
for the group size?

I am not saying 3 is a good number in a pragmatic sense, just that eyeballing
the informal proof doesn't seem to yield any obvious problems with using
groups of 3 for the desired result.

~~~
dhruvbird
I haven't worked out the recurrence myself, but considering 3 as the partition
size, we get the recurrence [1] (below).

T(n) = T(n/3) + 4n/6 + O(n) __[1]

It seems as if this doesn't workout to O(n) whereas if the middle term is
7n/10 (parition size = 5), it does work out.

------
cschmidt
Here's a good public domain implementation of some fast median algorithms, in
C.

<http://ndevilla.free.fr/median/median/index.html>

I've used these codes on some very large datasets, and they work very well.

------
gorset
I feel that radix sort would be a better choice, and maybe faster as well. MSB
in-place radix sort can sort an arbitrary range, which can be used to find the
median very quickly. You simply count the elements for each buckets, and then
continue with the bucket containing the median value.

Of course, this requires 0 comparisons (but you might want to to a insertion
sort for the final round), and a maximum of 4 rounds for 32 bits numbers using
8 bits buckets.

~~~
xyzzyz
The thing is, you do not always want to find a median of a set of integers --
not even most of the time. Frequently you want to find a median of floats, or
use a custom ordering. Real life is not so simple.

~~~
gorset
Floats doesn't complicate radix sort that much. To quote McIlroy et al. (1993)
"the troubles with radix sort are in implementation, not in conception".

Finding the median strikes me as more of a academic problem than a real life
problem. However, partial sort (for which finding the median is a sub problem)
is an easy optimization relevant for search engines, where you sometimes only
need to extract the hits from position 1000-2000.

------
kens
The title is wrong - this algorithm (as the article explains) finds a value
near the median, not the actual median. This value is useful as a qucksort
pivot but is not the median. Edit: oops, I'm wrong. I missed the recursion
step that uses this to get the median.

The median of medians is not necessarily the median. E.g. take 1 2 5, 3 4 6, 7
8 9: medians are 2, 4, 8. Median of medians is 4 but overall median is 5.

~~~
dhruvbird
Actually, the median of the medians is used to find the real median

------
ximeng
How do you get the median of 5 items in 6 comparisons? I can see how to do it
in 9, but not 6.

Edit:

Found an answer:

[http://stackoverflow.com/questions/4761405/compute-median-
of...](http://stackoverflow.com/questions/4761405/compute-median-of-up-
to-5-in-scala)

------
reso
Love that algorithm. Wikipedia has a good visualization for those interested:
[http://en.wikipedia.org/wiki/Selection_algorithm#Linear_gene...](http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-
_Median_of_Medians_algorithm)

------
zheng
Is there something about the page that I'm missing? Is it just about median of
median pivot selection for the selection problem? If so, isn't this fairly
common knowledge? I know my entry-level algorithms class covered it in some
depth.

~~~
kragen
I learned about Quicksort in about 1993, and I think I wrote the first
versions of some of the Wikipedia pages on sorting algorithms back in 2001,
but I don't remember learning about the median-of-medians approach for
Quickselect until last year.

