
What is the fastest algorithm to find the largest number in an unsorted array? - bemmu
https://www.quora.com/What-is-the-fastest-algorithm-to-find-the-largest-number-in-an-unsorted-array/answer/Thomas-A-Limoncelli?share=1
======
graycat
Without more information, it is necessary to look at least once at each number
in the array. If the array is of length n, the performance is O(n). But
looking at each number in the array just once, also O(n), is sufficient.
So, looking at each number in the array just once is the fastest and is O(n).
Done.

~~~
dalke
Yes. And the answer wasn't about that. Quoting the end of the essay:

> So if this question was a homework assignment... the answer is O(N): do a
> linear search through the list and examine every single item.

> If this question is an honest request... put down the keyboard. Get off your
> ass and walk down the hall. Talk to all the people that are involved and
> figure out what real need is and re-examine whether the "largest value" is
> actually needed, how it is needed, and whether the entire process can be
> improved by looking at it end-to-end.

> Get out of your silo. Talk to people. You'll always get better results.

~~~
graycat
I saw the question on Quora within the last few days and thought of my answer
then. Seeing the question here, though only in the title, I typed in my answer
from the title alone, without reading the essay. Sorry 'bout that!

This question had to be the easiest question on the fastest algorithm ever!

Some of the other questions at Quora were more challenging.

In my answer, I refined the question by assuming that, beyond the array being
"unsorted", we knew nothing else about its order. E.g., an array might be
unsorted but still be in, say, _bit reversed_ order, as happens with some
versions of the fast Fourier transform. Then there is a faster answer!

The essay has a good point but is not fully correct.

BTW, I prefer _sequential_ search to _linear_ search.

That point aside, actually the essay overstates its case. E.g., in my startup
I need to look at, say, 100 million numbers one at a time, where I know
nothing about the order, and end up with the, say, 50 largest.

So, how to do that? My solution: allocate an array x of, say, 50 components.
So assume we have array components x(i) for i = 1, 2, ..., 50.

Regard the array as a _descending heap_, that is, a heap as in heap sort and
descending in the sense that x(1) is the smallest component of the array x. Of
course the heap property means that x(i/2) <= x(i), i = 2, 3, ..., 50. Here,
of course, the i/2 is the usual result of integer division, that is, the
largest integer <= the exact value of i/2.

Sure, the 50 was just for definiteness; say the number of components in the
array is positive integer m.

And, let y be an array with y(j) for j = 1, 2, ..., p where p is, say, 100
million or so.

Initially, just put y(j) in the heap in x for j = 1, 2, ..., m.

Then in a loop for j = m + 1, m + 2, ..., p, compare y(j) with x(1). If y(j)
<= x(1), then y(j) is not among the m largest, and we forget about y(j).
Otherwise we assign y(j) to x(1) and do a _sift_ operation to restore the heap
property in array x.

So, when m is more than a few but not in the billions, that looks like a
reasonably fast algorithm.

Note that when the elements of y are just any _random_ permutation, fairly
soon the heap in array x will have relatively large components so that inserts
into the heap will become relatively rare and, really, the comparison with
x(1) will be all the effort needed.

So, we are using the heap algorithm to build, say, a _priority queue_. There
are other important uses in reality!
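The procedure above can be sketched in Python with the standard `heapq`
module, whose min-heap plays the role of the _descending heap_ (the root
`heap[0]` is the smallest candidate, i.e., x(1)). The function name and the
example data are my own for illustration.

```python
import heapq
import random

def top_k(values, k):
    """Return the k largest values, largest first, using a min-heap of size k.

    The heap root heap[0] is the smallest current candidate; any new value
    that beats it replaces it, followed by a sift-down (heapreplace).
    """
    it = iter(values)
    heap = [v for _, v in zip(range(k), it)]  # the first k values seed the heap
    heapq.heapify(heap)                       # O(k) heap construction
    for v in it:
        if v > heap[0]:                       # beats the smallest candidate?
            heapq.heapreplace(heap, v)        # replace the root and sift down
    return sorted(heap, reverse=True)

# Example: the 50 largest of a million random numbers.
data = [random.random() for _ in range(1_000_000)]
largest = top_k(data, 50)
```

As noted above, once the heap fills with relatively large values, most
elements fail the single comparison with `heap[0]`, so the common-case cost
per element is one comparison.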

~~~
dalke
> The essay has a good point but is not fully correct.

The essay's point is that "fully correct" is a meaningless concept without
knowing the goal.

For example, you modified the goal to find the top _k_ elements, and described
the standard heap selection method.

Even there, you are not "fully correct". The last time I implemented it was in
a situation where I knew that the values were 0.0<=x<=1.0 and where there
could be many 1.0 values. If the minimum value in the heap were ever 1.0 then
the search could stop, giving sublinear time for some cases.
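That early-exit variant can be sketched as follows; this is my own
illustration of the idea, assuming the values are known to be bounded above
by some `max_value` (1.0 in the situation described).

```python
import heapq

def top_k_early_stop(values, k, max_value=1.0):
    """Top-k selection with early exit for values bounded above by max_value.

    If the heap's minimum ever reaches max_value, every kept candidate is
    already maximal and no remaining value can displace one, so the scan
    stops, which is sublinear when maximal values appear early.
    """
    it = iter(values)
    heap = [v for _, v in zip(range(k), it)]
    heapq.heapify(heap)
    for v in it:
        if heap[0] >= max_value:      # heap already full of maximal values
            break
        if v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```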

The essay's point was, find out more of what's needed before answering the
question, because in reality few people really need that specific question
answered.

~~~
graycat
We're in 99.9999% full agreement.

The problem I solved with a heap really is the real problem I needed to solve.

Long ago, when I was relatively knowledgeable in computing, two profs trying
to teach a course in computing (I'd been teaching such at Georgetown) asked me
what lessons I would emphasize. I said "A programmer stands between the
machine and the real problem, understands the real problem, for a solution
finds some appropriate structure, say, divides the work into relatively
independent pieces each of which is relatively easy to understand, tries to
make the structure relatively robust to likely changes in the real problem,
and then programs that." That view was many years ago. I was and I am a big
fan of looking at the real problem.

