
The 3-minute SQL indexing quiz: Can you spot the five most common mistakes? - MarkusWinand
https://use-the-index-luke.com/3-minute-test
======
rockostrich
Hmm, I got 5 out of 5 for the Postgres, but the questions seem pretty easy for
anyone who has dealt with trying to create useful indexes in different
databases and debugged slow running queries on an index.

`EXPLAIN` is probably the most useful word in all of SQL. I didn't know how to
read `EXPLAIN` results for the first few months of first starting to work with
raw SQL for the first time. It made life so much easier when I could just take
a query, throw an `EXPLAIN` in front of it, and have an answer as to what was
being used to fetch my data.

~~~
aetherson
I got 5/5 as well. My only hesitations were, "Is this a trick question? Like,
is there some little-know feature where they've special cased this so that
something that doesn't seem like it ought to work actually does?"

~~~
3pt14159
I was on the fence for the "not enough information" question because I thought
that for outlandish examples I thought there might be a chance the second one
was faster, but I thought that the 99.5% case was covered.

I feel kinda robbed.

~~~
rockostrich
Yea, I was a little unsure about that one, but realized that the columns in
the `SELECT` were the indexed column and a `COUNT` making it index-only scan.
Even if the `SELECT` included `b` it still would've slowed down things a tiny
bit, but I think I would've chosen "Doesn't really change performance" in that
case.

~~~
endymi0n
Playing devil's advocate here, especially for Postgres this isn't a clear cut
case, as because of Postgres' MVCC implementation, depending on whether the
rows were already frozen or not (caused by lots of mutations ongoing and not
enough VACUUMs), it might have _either_ used an index-only scan (only from 9.2
on and only if most pages are visible-for-all, so the query scanner allows the
optimization). In the other cases, it would have had to fetch all rows from
disk anyway, in which case "not enough information" would have been true...

~~~
anarazel
Nit-Pick: Freezing of rows isn't relevant here, just whether they're all-
visible.

------
aNoob7000
I would guess that at least half of the programmers where I work would have
failed. I wonder how competent developers are at other companies with their
SQL databases?

I can tell you that as a production DBA when I'm brought in regarding a
performance issue, the stuff that I see makes me wonder how some people remain
employed as programmers. Full table scans on 50M row table, no problem. Query
the database 10,000 times because you don't want to run one query and parse
the results, no problem.

I just want to scream sometimes!!!

------
acconrad
If you didn't get a 4 or 5 out of 5, you should go through this website, it's
pretty awesome. It explains SQL performance really well, and provides
equivalent examples for whatever database you use (including Oracle and DB2 in
addition to MySQL and Postgres).

------
Gys
I got 4 out of 5 for mysql. Should have given it more thought :-)

A suggestion: also mention that the correct results with an explanation are
shown at the end. It was not clear to me and therefore I decided to answer
just quickly. But maybe that is intentional ;-)

------
wongarsu
How to get 5/5: Visualize a B-Tree that indexes the stated columns in the
stated orders. Is the query faster to answer by traversing that B-Tree (where
faster means looking up fewer records, or not needing to look them up at all)?
Is there a different B-Tree we could build that makes this even faster? If you
have problems visualizing a b-tree: any somewhat balanced ordered binary tree
has basically the same properties.

Indexes are not complicated (at this level of detail). Maybe the bad results
(only 40% got >50% right) are mostly a testament to the state of computer
science education around the world.

------
xuanyue
I'm also getting 5 out of 5 for the Postgres, but I think these questions are
helpful. I'm not a DBA, but just guess with my knowledge when I take the
database implementation course.

------
captaincrowbar
I got 4/5 (background: C++/C# programmer who sometimes works with SQL
queries). The one I got wrong was the first one; I thought the query optimizer
would be smart enough to spot that the date function could be turned into a
range query, but apparently not.

It's interesting to compare this with the 4th question, which more or less
does exactly the same thing only with text instead of dates. They both have a
WHERE clause that superficially tests the result of a function call but can be
easily transformed into a simple range check on an indexed column. I don't
know why the optimizer seems to be able to handle one case but not the other.

------
felixge
I got 4/5 on PostgreSQL. I selected "Not enough information" on question 5,
which is supposed to be the wrong answer. But I think it's the correct answer.

Depending on the distribution of the data in the table (e.g. all rows have a =
38, but no rows have b = 1), PostgreSQL will chose a Sequential-Scan instead
of an Index-Only scan for both queries. This of course leads to the 2nd query
being faster due to not having to aggregate anything.

[https://gist.github.com/felixge/af97c844cb1f24ac278e0357741c...](https://gist.github.com/felixge/af97c844cb1f24ac278e0357741c4ea9)

~~~
kasparsklavins
I don't agree. The same could be said for the other questions as well.

In quizzes like these its usually assumed the worst possible scenario/dataset.

~~~
felixge
Good point. But the other questions didn't have a "Not enough information"
choice. So I think the question should be improved to avoid false negatives in
the results.

------
IgorPartola
I was under the impression that unlike MySQL, Postgres can use two indecies at
once when doing a query. So in MySQL you could do a query like `SELECT * FROM
tbl WHERE a = 1 ORDER BY b` and you'd want an index on (a, b). But in Postgres
you could have a separate index on a and b and you'd get the performance gain.
In fact I thought in Postgres land you'd be better off creating them as
separate indecies.

Is this info now outdated?

~~~
MarkusWinand
Not's not just outdated, its wrong (in context of question 2) which is about
indexing both, `where` and `order by` clauses. In that case you must provide a
single index to benefit from the index order (so that the database doesn't
actually need to perform a sort operation).

------
andr3w321
I failed <100%. _shrugs_ The problem I have with tests like this is it uses
syntax I've never used before and I'm not sure how it works. Also the real
world involves testing out and seeing what works and what doesn't. For
example,

#1 CREATE INDEX tbl_idx ON tbl (date_column) SELECT COUNT( _) FROM tbl WHERE
EXTRACT(YEAR FROM date_column) = 2017

I've never used EXTRACT() in my life, so I don't know if it's index aware. I
do know in real life I would write "WHERE date_column >= '2017-1-1' AND
date_column <= '2017-12-31' or if I were querying a school_year or something
that spanned between two years I would add another column and probably an
index on that column and query by that not by the date_column.

#2 CREATE INDEX tbl_idx ON tbl (a, date_column)

SELECT _ FROM tbl WHERE a = 12 ORDER BY date_column DESC FETCH FIRST 1 ROW
ONLY

Who uses FETCH FIRST 1 ROW ONLY instead of LIMIT 1? Also indexes can be
ordered by ASC or DESC so there's a possibility of a small optimization there.

#4 CREATE INDEX tbl_idx ON tbl (text varchar_pattern_ops)

SELECT * FROM tbl WHERE text LIKE 'TJ%'

I thought the general philosophy regarding indexes was create the ones you
know you need like on foreign keys and very common ones like last_name, SSN
etc and then if you notice queries running slow add more and test. This seems
like one of those examples. Do you really need an index here? How often are
you making this query and what's the speedup gained if you add an index?

#5 CREATE INDEX tbl_idx ON tbl (a, date_column)

SELECT date_column, count( _) FROM tbl WHERE a = 38 GROUP BY date_column

Let's say this query returns at least a few rows.

To implement a new functional requirement, another condition (b = 1) is added
to the where clause:

SELECT date_column, count(_) FROM tbl WHERE a = 38 AND b = 1 GROUP BY
date_column

This seems like another case in the real world where you might test what the
slowdown is, and how often you're running this query. If needed you can add
another index or run a subquery first, but the answer to this question can be
found out in < 10seconds in the real world more quickly by testing it than
even spending the time thinking about it.

~~~
eighthnate
> If needed you can add another index or run a subquery first, but the answer
> to this question can be found out in < 10seconds in the real world more
> quickly by testing it than even spending the time thinking about it.

This is true for some situations. But understanding indexing and databases can
save you from headaches that come from structural problems. Not every query
performance issue can be solved by throwing more indexes on a table and using
it.

It's the idea of prophylaxis. An ounce of prevention is worth a pound of cure.

And if you are using production as a testing environment and waiting for
problems in production to tweak your queries, then you probably won't last
long in software development.

Indexes are the basics. Everyone should have grasp of it at the basic level.

------
yellowapple
5/5 for Postgres. I'd chalk that up less to being any kind of Postgres expert
and more to having already read through most of that site (and having taken
previous versions of the test).

~~~
MarkusWinand
In that case, everything below 5/5 would be very bad ;)

------
stuff4ben
2 out of 5 for Oracle. But I haven't touched SQL in years, so not surprising I
don't remember much. I do have to say I would have gotten 2 more correct if I
wasn't so impatient :)

------
scenequeue
Working without JavaScript is cool and I got 2/5

------
aurelijus
Got 4/5\. Assumed that text field is text type, which doesn't let you have
index, so it was an obvious bad fit..

------
tzakrajs
5/5 but only because I have ran into these problems on my own in the past.

edit: MySQL

------
firefoxd
EXPLAIN driven SQL programming is my motto.

