

MySQL and partitioning tables with millions of rows - chrismoos
http://chrismoos.com/2010/01/31/mysql-partitioning-tables-with-millions-of-rows/

======
kogir
While the write up was good, the author makes assumptions about what's
happening seemingly without checking. He didn't post the actual query or the
original and final query plans.

Partitioning is a rather complex solution to his problem. See the limitations
at [http://dev.mysql.com/doc/refman/5.1/en/partitioning-
limitati...](http://dev.mysql.com/doc/refman/5.1/en/partitioning-
limitations.html)

I can't know for sure since there's not enough information there, but I'd
suspect a simple index would have done the trick.

    
    
      CREATE INDEX my_index ON Location (device_id, date_added desc)
    
      SELECT *
      FROM Locations USE INDEX (my_index)
      WHERE device_id = ...
      ORDER BY date_added DESC
      LIMIT 6
    
      SELECT *
      FROM Locations USE INDEX (my_index)
      WHERE device_id = ...
      AND date_added BETWEEN ... AND ...
      ORDER BY date_added DESC
    

I'm assuming that he had the index and it just wasn't getting used, so the
index hints are there in case the query planner was somehow missing them.

That said, table partitioning is an awesome feature. I've seen it be most
useful when you spread the partitions over tiered storage (latest on SSD,
archive on spindles), or when you want to drop a whole range of data quickly
on a regular basis (like last month's logs).

* Edit: Missed something:

You may need to specify more hints than I originally thought if you want to
force index usage for the order by clause as well. See
<http://dev.mysql.com/doc/refman/5.1/en/index-hints.html>

~~~
lsb
That's entirely right; to explain a bit more, the reason that index is good is
because it's a _covering_ index. You have all the data you need to pull from
the table in the index (which you've found via binary search through your
B+tree), so you don't need to run out again to the table, and that saves an
extra disk seek.

To make a covering index takes longer, so insertion speed drops, but that
shouldn't be the hotspot.

~~~
kogir
Actually, since he was effectively doing SELECT * ... it still has to go pull
the rows from the table after seeking the index. This likely isn't a huge
problem though.

To prevent going back to the table in this case, you'd either need the index I
mentioned to be the primary key, or a construct like INCLUDE (which mysql
appears to lack).

See <http://msdn.microsoft.com/en-us/library/ms188783.aspx>

------
prodigal_erik
Be advised that partitioning a MySQL table makes seemingly unrelated but
important features (like _foreign keys_ and _subqueries_ ) stop working.

------
chrismoos
Honestly, the first thing I did was add indexes. On device_id, user_id, and
date_added.

I'm using Linode and I kept getting these I/O notifications...and the site
would load extremely slow at times.

After looking at the log of my Rails app, I saw that some requests were taking
> 20 seconds. I determined that this was some kind of blocking at the
database.

I connected to MySQL and ran something simple...

select count(*) from location;

It took a long time...I turned profiling on and saw that it was taking a
really long time in "table lock".

I assumed (which probably wasn't a good idea) that it was going through and
counting all the rows.

I didn't really know what to make of this..because I didn't think that getting
the count would take so long.

I embarked on reading about partitioning, which may have been a solution for a
problem that didn't actually exist (based on the feedback here). I attempted
to partition (on what I put in the article), and everything seemed much
snappier after that. If the indexes should have solved the problem (given that
they were correct), I don't know why the location queries were taking so long.

Anyway, I still have lots to learn on the database front, and maybe the fact
that my VM had 7MB free of memory was causing weird things to happen, I'm not
sure.

Thanks for all the feedback and I have definitely learned a lot in this
thread.

kogir:

I'm using MyISAM (which, in retrospect, seems stupid), so I don't even have
foreign key constraints (InnoDB only, I believe).

~~~
rimantas
MySQL does not scan any rows for SELECT COUNT(*) FROM table — the number of
records is stored in table metadata. (Mind you, it only works for this kind of
query, as soon as you add WHERE the story changes). On the other hand, MyISAM
only has table level locks, not row level. I guess it is time for you to
analyze what your app does with DB and how it can affect locking…

~~~
jfarmer
Sort of true -- for the InnoDB engine SELECT COUNT(*) FROM table is more
complicated because each transaction has its own view of the table. So unlike
MyISAM (which uses one big table-level lock to guarantee consistency), it
can't store a row count on a per-table basis.

------
zepolen
I wish he would do an explain on the original slow query. 12 million rows is
not that much, this looks a lot like he was just missing an index on
device_id.

------
dr_strangelove
had this guy an index on "device_id" and "date_added", there is no way such a
query would take 3-4 seconds.

~~~
chrismoos
I had indexes on id, user_id, device_id, and date_added, but maybe I was doing
something else wrong. I'm not a database expert :(

~~~
jbellis
What SQL was running, and what was the query plan?

ActiveRecord is notorious for generating terrible, terrible SQL.

Edit: What I would expect to see would be index scan on device id, then sort +
limit. So the important factor would be rows per device, not total rows.

~~~
chrismoos
ActiveRecord...

The SQL was pretty simple I believe..select * from location where device_id =
? order by date_added desc limit 6...something like that.

Edit: Also, I don't know how much it matters, but MySQL probably only has
~256mb memory available to it (its hosted on a Xen box).

~~~
tlack
Queries like that should be very, very fast even with low amounts of ram. Good
article, but consider using EXPLAIN a few times first before embarking on such
adventures. :)

~~~
tom_pinckney
Definitely want to second the advice to use EXPLAIN.

MySQL has a lot of very specific limitations about when it will and will not
use the available indices. It also matters how you created the indices (one
index on multiple columns vs multiple indices on individual columns).

For example, if you created a single index with the columns (date_added,
device_id) MySQL would not be able to use the index since device_id is the
second part of the index, not the first and thus not available for use in the
WHERE clause.

See <http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html> for more
limitations.

