

How to assert that your SQL does not do full table scans - kristiandupont
http://www.bestbrains.dk/Blog/2010/03/25/HowToAssertThatYourSQLDoesNotDoFullTableScans.aspx

======
rosser
Table scans aren't the enemy; sometimes, they're actually cheaper than index
scans. Random IO is expensive, especially with spinning platters. It's likely
that, if you're pulling enough rows off disk, you'll end up in a situation
where you're hopping back and forth across the table, seeking left and right,
and "head-thrashing". Sometimes it really is cheaper to simply read the entire
table and deal with finding the rows you're interested in once it's buffered
-- particularly if your disk controller is smart about readahead.

Sure, for simple cases where you're pulling a row or two from the heap, an
index lookup will be vastly cheaper, but just because indexes are sometimes
better doesn't make them _always_ better.

~~~
mullr
But for those cases where you are just pulling a row or two, this is a nice
technique. It hints at something more interesting, I think.

Fundamentally it wants to be an assertion against the schema. It would be
better to be able to ask the database about the worst-case complexity of a
particular query, then write assertions against that. (I don't think this is
the same as 'explain', it's more like 'explain as if' with some structure
behind it)

Something like:

    assertEqual(O(log(n)), schema.worstCaseComplexityOf(
        'SELECT text FROM response WHERE questionId = 27 AND participantId = 38'))

Even better would be a schema language that lets you build in these assertions
directly. This is really something you should be thinking about while
designing the schema, after all.
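Short of a full complexity oracle, you can get partway there today by
asserting directly on the query plan. A minimal sketch against SQLite's
EXPLAIN QUERY PLAN (the `assert_no_table_scan` helper is hypothetical, and
the exact plan-detail strings vary between SQLite versions):

```python
import sqlite3

def assert_no_table_scan(conn, query):
    """Fail if SQLite's plan for `query` includes a full table scan.

    Hypothetical helper sketching the kind of schema-level assertion
    described above; not a standard API.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    for row in plan:
        # The last column of each plan row is a detail string such as
        # "SCAN response" (full scan) or "SEARCH response USING INDEX ...".
        detail = row[-1]
        if detail.startswith("SCAN") and "USING" not in detail:
            raise AssertionError("full table scan: " + detail)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE response (text TEXT, questionId INTEGER, participantId INTEGER)")
conn.execute("CREATE INDEX i1 ON response (questionId, participantId)")

# Passes: the composite index covers both predicates.
assert_no_table_scan(
    conn, "SELECT text FROM response WHERE questionId = 27 AND participantId = 38")
```

Asserting on the plan rather than on server-side scan counters also sidesteps
the noise from unrelated queries running against the same database.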

------
ck2
Does MySQL also have a full scan counter?

Ah here it is:

    log_queries_not_using_indexes = on

http://dev.mysql.com/doc/refman/4.1/en/server-options.html#option_mysqld_log-queries-not-using-indexes

------
tbrownaw
> SELECT text FROM response WHERE questionId = 27 AND participantId = 38

> If there is no index on response(questionId, participantId), the database
> needs to do a full table scan

With only an index on one of the columns, Oracle at least will still use that
index.

~~~
kevingadd
I can't imagine any halfway decent database engine failing to utilize at least
one index. Maybe the author is thinking of cases where you have multi-column
indexes that are ordered incorrectly and as a result can't be used? I'm more
inclined to suspect that they just don't know what they're talking about,
though. It's pretty trivial to verify that any decent database engine will use
one of your indices to optimize that query if possible.

The hack for detecting table scans seems like it would be super useful if
using MSSQL but I'm inclined to be wary of such a solution to what appears to
be a basic education problem.

It takes about 2 minutes to test this out in SQLite:

    CREATE TABLE response (
        text TEXT,
        questionId INTEGER,
        participantId INTEGER
    );

    CREATE INDEX i1 ON response (questionId);
    CREATE INDEX i2 ON response (participantId);

    EXPLAIN SELECT text FROM response
        WHERE questionId = 27 AND participantId = 38;

The resulting bytecode clearly makes use of an index instead of performing a
table scan. If you drop both indexes and explain the query again, the bytecode
changes to indicate that it is performing a table scan.

~~~
ergo98
> The resulting bytecode clearly makes use of an index instead of performing
> a table scan. If you drop both indexes and explain the query again, the
> bytecode changes to indicate that it is performing a table scan.

You have no data. It knows that. It's showing you a query plan based upon no
stats.

http://www.sql-server-performance.com/articles/per/index_not_equal_p1.aspx

Fairly true across different RDBMS products.

~~~
kevingadd
Yes, but the original post makes no mention of things like statistics. It
states a general principle that is, in very many cases, not true.

Of course if the database engine has statistics on the table it can choose
_not_ to use the index, the point is that I don't know of any database engine
that _isn't able_ to use the index.

------
bpyne
What SQL Server provides is interesting: I need to read more about it.

From the Oracle side, you'd normally run the query through "explain plan".
This utility generates the optimizer's plan for query execution. Reading the
plan takes some practice but you clearly see where full table scans, indexes,
etc. are used. All the Oracle databases I've developed against for the past 10
years use cost-based optimization: the plan utility shows you the cost at each
step, the overall cost, the cardinality, and the space used.

If you want to get more into the system figures, you can set trace on and run
the query. After execution, you'll have a file on the db server that shows
disk I/O stats and many other system level measurements.
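For anyone unfamiliar with the Oracle side, the usual incantation looks
roughly like this (table and column names borrowed from the article's
example):

```sql
-- Generate the optimizer's plan for the query (written to PLAN_TABLE)
EXPLAIN PLAN FOR
  SELECT text FROM response WHERE questionId = 27 AND participantId = 38;

-- Pretty-print the plan, including cost and cardinality per step
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- For system-level figures, enable tracing and re-run the query;
-- the trace file lands on the db server (format it with tkprof)
ALTER SESSION SET SQL_TRACE = TRUE;
```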

My own workflow is that I write the query first - typically nowadays in Toad -
and run explain plan against it. I make sure the data returned is what is
required and then move on to plan tuning. When I'm satisfied with the plan and
overall cost, I copy/paste the query into my code.

It sounds like the overall workflow is different when developing against
SQL Server.

~~~
vyrotek
My workflow is very different when using SQL Server. Usually it involves
writing a query the only way I know how, sending it to a DBA to review it,
getting laughed at and then having the DBA rewrite the whole thing.

~~~
bpyne
I feel for you. Eight years ago I worked in a department encouraging that
workflow, except I was on the DBA team. We could be a little (let's be nice
and say) menacing at times.

------
yread
Well, in my experience full table scans are quite rare. What happens often
though is an index scan, but those happen so often that you would get a ton of
false positives.

Plus, at least in my company, the unit tests run on a different database which
means different statistics, different number of rows and different decisions
in the query optimizer.

------
rbanffy
Looks really painful to do. I had fonder memories of my relationship with SQL
Server than this.

Also, if your table is sufficiently small, a clever database will prefer a
table scan over using the index, rendering the check useless.

The _only_ sane way is to check with a representative amount of data.

------
hackermom
This is slightly beside the topic, but since it can be of much help it's
worth mentioning in case someone learns something: primary keys are indexes
by their very nature. (In MySQL, the KEY keyword is in fact an alias for
INDEX.) So, if you can't index an already large database for whatever
reason, and happen to have a primary key column that could be used in a
similar way, you're suddenly better off. It's also usually the fastest index
available.
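This is easy to see in SQLite: a lookup on an INTEGER PRIMARY KEY is served
by the table's built-in rowid index, with no CREATE INDEX needed (table and
column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY makes `id` an alias for the rowid, so lookups
# on it go through the table's implicit index.
conn.execute("CREATE TABLE participant (id INTEGER PRIMARY KEY, name TEXT)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM participant WHERE id = 38").fetchall()
print(plan[0][-1])  # a SEARCH using the primary key, not a full SCAN
```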

