

On “order by” optimization - 719Ben
http://dom.as/2015/07/30/on-order-by-optimization/

======
acqq
In short, if I understood, the author used

SELECT c,d FROM t WHERE b=X ORDER BY a DESC LIMIT 1

on MySQL and was surprised to find that MySQL was using index over a and in
his opinion fully ignored the existence of the index over (b,c) to discover
where b is equal to X? I admit I didn't understand his explanation why,
especially not why it would maybe do the right thing with the LIMIT 2. I can't
imagine that any serious SQL engine would ignore the possibility to use the
index in such a case.

~~~
goldenkey
Its usually better to be explicit and use aggregates like MAX or MIN instead
of a LIMIT of 1.

~~~
crishoj
As in SELECT MAX(a), c, d FROM t WHERE b=x?

MySQL will return a random c and d, not necessarily the c and d from the row
with maximum a...

~~~
acqq
Goldenkey actually posted

SELECT c,d FROM t WHERE a=(SELECT MAX(a) WHERE b=X from t)

But that post got flagged because he also gave his opinion on the level of
understanding the article writer has. The query you posted is actually the one
by the article writer.

~~~
crishoj
I see. Thanks.

------
Animats
Right. MySQL's optimizer knows how many records each table has, but not much
else. It doesn't do statistical tests on the tables. Depending on table
contents, "WHERE b=X" can select any number of records, from zero to the whole
database. Whether the optimal strategy is to do the "WHERE b=X" first,
possibly getting a large number of hits, or simply making a sequential pass,
is not decidable from the database size only.

An alternative would be for the optimizer to do an indexed SELECT on b for
"WHERE b=X", and if the number of hits exceeded some threshold, abandon that
effort as inefficient and start over using a sequential pass. But MySQL
doesn't abandon strategies once selected, or keep such statistics as a hint
for future queries. As someone pointed out, Oracle's flagship database product
does do that. Anyone know about Postgres?

------
th3iedkid
>>Unfortunately, optimizer doesn’t understand, that there’re humans who are
building these systems, and humans have their own rationale and thinking. One
of the ideas that a human would have is that if you have 100GB table, you
better understand things like data locality and other sorts of efficiencies.

I have not used MySql and cannot comment for that but with my experience with
Oracle, i believe its histogram generation process is actually trying to solve
the same problem.It has a binning strategy for histogram generation that tries
to help with data patterns.

[http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm](http://docs.oracle.com/database/121/TGSQL/tgsql_histo.htm)

------
zapt02
Great read!

