
MySQL (InnoDB) clustered and non-clustered indexes - skipass
http://n3n.in/w
======
falcolas
I saw a presentation on an interesting hack done with InnoDB primary keys used
to cluster similar data.

A hotel booking company created a primary key which would put geographically
similar entries into primary keys which were close together. i.e. there
wouldn't be an entry for a hotel in Saigon ordered between two entries for San
Francisco.

I don't recall the specifics of how they did it, but due to how InnoDB orders
pages on disk according to their primary key, it meant that all of the results
for a particular location could be accessed by fetching one to two pages of
data from disk.

Resulted in some huge performance gains for them, both by reducing the number
of disk IOs required for your average query, and by reducing the on-the-fly
geographical calculations they had to do in the DB. They could do a range
query on the primary key in the database and then do fine-grained geolocation
on a less contended resource against a small number of entries.

~~~
misframer
A Z-order curve maps from two (or more) dimensions to one while preserving
locality:
[http://en.wikipedia.org/wiki/Z-order_curve](http://en.wikipedia.org/wiki/Z-order_curve)

~~~
fredophile
Depending on your application you might want to pick a different space filling
curve ([http://en.wikipedia.org/wiki/Space-
filling_curve](http://en.wikipedia.org/wiki/Space-filling_curve)). Different
curves lead to different characteristics for nearness. You can also use
multiple types of curves or two of the same curve oriented differently and
then use the minimum distance of the curves as your distance estimate for some
applications.

If you just want to generate keys you should look at Locality-Sensitive
Hashing ([http://en.wikipedia.org/wiki/Locality-
sensitive_hashing](http://en.wikipedia.org/wiki/Locality-sensitive_hashing)).

------
snake_plissken
A nice little article. I've always felt that database indexes are one of those
fringe topics that's just esoteric enough that people acquire a basic
knowledge of them and know they need them, yet understanding indexes and using
them correctly leads amazing performance.

On clustered indexes in particular, I try to follow these principles:

Narrow - in terms of a data type's byte length, so that more keys can be
packed into each level of the B-tree. You can end up having to traverse fewer
intermediate levels to reach the leaf level, where the data resides.

Unique - This ties into the point above, there will be no need to add a
"uniquifier" to the key, helping to save space and the overhead of managing
that extra tid-bit of data

Static - ideally, never changing. By it's nature, a clustered index is
ordered, and updating/changing the keys will lead to data that's in the wrong
place. You can kind of get around this by rebuilding the index after every
update, but that just adds another task you'll need to manage more often.

Increasing - this can lead to faster inserts; in a sense, the DB is just
filling up the last page of the index and then adding another when it needs
to. (I think after a certain amount of inserts, eventually you'll have to re-
build the index (to add more intermediate level nodes) but I can't recall the
specifics, and you're going to have to do rebuild indexes anyway with all
indexing strategies.)

------
gearhart
If, like me, you found the idea interesting but the article a bit confused
there's another good discussion on it here: [http://use-the-index-
luke.com/blog/2014-01/unreasonable-defa...](http://use-the-index-
luke.com/blog/2014-01/unreasonable-defaults-primary-key-clustering-key)

------
misframer
I think it would have been easier for me to understand if they put everything
in terms of a B+ tree, where the leaf nodes contain keys and values.

~~~
meritt
InnoDB does indeed use a B+ tree. A more thorough article on the topic is
[http://blog.jcole.us/2013/01/10/btree-index-structures-in-
in...](http://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/) or
the book High Performance MySQL.

~~~
misframer
I know it uses a B+ tree. I wanted to mention that the article doesn't mention
them at all.

I'm familiar with the High Performance MySQL book -- I work with one of the
authors :).

~~~
meritt
Cool. I was more taking the opportunity in replying to you to point out that
this article isn't very good :)

