Hacker News new | comments | show | ask | jobs | submit login
Median of Medians Algorithm for finding the median of an array of numbers (uci.edu)
28 points by dhruvbird 2136 days ago | hide | past | web | 13 comments | favorite

I take issue with this: "Line up elements in groups of five (this number 5 is not important, it could be e.g. 7 without changing the algorithm much)."

The correct statement is that you can replace 5 with any larger number and the algorithm still works, but 5 is the least number you can pick here. That's the most interesting part of the algorithm to me - that "5" is connected to median-finding in the same way that "2" is connected to divide in conquer searching in a sorted array. But 2 is more obviously interesting than 5 (you can say "dividing something in 2 makes it smaller", but there aren't very many similar things you can say about 5).

I may be having a durpy Saturday afternoon, but why is 3 not a valid choice for the group size?

I am not saying 3 is a good number in a pragmatic sense, just that eyeballing the informal proof doesn't seem to yield any obvious problems with using groups of 3 for the desired result.

I haven't worked out the recurrence myself, but considering 3 as the partition size, we get the recurrence [1] (below).

T(n) = T(n/3) + 4n/6 + O(n) __[1]

It seems as if this doesn't workout to O(n) whereas if the middle term is 7n/10 (parition size = 5), it does work out.

Here's a good public domain implementation of some fast median algorithms, in C.


I've used these codes on some very large datasets, and they work very well.

I feel that radix sort would be a better choice, and maybe faster as well. MSB in-place radix sort can sort an arbitrary range, which can be used to find the median very quickly. You simply count the elements for each buckets, and then continue with the bucket containing the median value.

Of course, this requires 0 comparisons (but you might want to to a insertion sort for the final round), and a maximum of 4 rounds for 32 bits numbers using 8 bits buckets.

The thing is, you do not always want to find a median of a set of integers -- not even most of the time. Frequently you want to find a median of floats, or use a custom ordering. Real life is not so simple.

Floats doesn't complicate radix sort that much. To quote McIlroy et al. (1993) "the troubles with radix sort are in implementation, not in conception".

Finding the median strikes me as more of a academic problem than a real life problem. However, partial sort (for which finding the median is a sub problem) is an easy optimization relevant for search engines, where you sometimes only need to extract the hits from position 1000-2000.

The title is wrong - this algorithm (as the article explains) finds a value near the median, not the actual median. This value is useful as a qucksort pivot but is not the median. Edit: oops, I'm wrong. I missed the recursion step that uses this to get the median.

The median of medians is not necessarily the median. E.g. take 1 2 5, 3 4 6, 7 8 9: medians are 2, 4, 8. Median of medians is 4 but overall median is 5.

Actually, the median of the medians is used to find the real median

Love that algorithm. Wikipedia has a good visualization for those interested: http://en.wikipedia.org/wiki/Selection_algorithm#Linear_gene...

Is there something about the page that I'm missing? Is it just about median of median pivot selection for the selection problem? If so, isn't this fairly common knowledge? I know my entry-level algorithms class covered it in some depth.

I learned about Quicksort in about 1993, and I think I wrote the first versions of some of the Wikipedia pages on sorting algorithms back in 2001, but I don't remember learning about the median-of-medians approach for Quickselect until last year.

How do you get the median of 5 items in 6 comparisons? I can see how to do it in 9, but not 6.


Found an answer:


Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact