
Scaling Django for a Russian search engine - iamelgringo
http://softwaremaniacs.org/blog/2008/02/24/why-offline-crashed-en/
======
xirium
Use of InnoDB and an MD5 primary key is problematic and not an isolated
situation.

InnoDB works in 16KB pages with 13 bytes of transactional overhead per row and
a default page occupancy ratio of 15/16. InnoDB attempts to store rows in
primary key order.

Using a composite primary key of (host_id,session_id) would be preferable
using an MD5 hash as your primary key for two reasons. Firstly, the composite
primary key is smaller. I'm assuming that they used an ASCII representation of
the MD5 hash. That would be a 32 byte primary key. Whereas, a composite
primary key could six bytes or less.

Having a smaller key means that B-tree index fan-out is improved. Essentially,
the database has to access less index blocks to find a row of data. This
occurs partly because the overall size of the index is smaller. More
importantly, smaller keys allow more pointers and therefore the index tree is
broader. Smaller size allows an increased proportion of the index will be
cached and the broader tree ensures less skipping through uncached blocks.

Secondly, the output of the hash is essentially random. This conflicts with
InnoDB's preference to store rows in primary key order. It also creates a huge
amount of cache churning as branches are added anywhere and everywhere in the
index tree. Whereas, a composite primary key with an ascending component
allows very good insertion and lookup rates with minimal cache or disk seek
operations.

I suspect that the situation was compounded by use of InnoDB's autoextend.
When pages are consumed and the InnoDB storage is grown, data is effectively
appended rather than being stored in preferential order. Multiple autoextend
operations can make performance degrade. However, given the description of a
performance wall, it is likely that another known circumstance was also
encountered.

~~~
bayareaguy
_Secondly, the output of the hash is essentially random. This conflicts with
InnoDB's preference to store rows in primary key order._

I don't know what kind of lock granularity InnoDB supports but if there are
times when you actually want this behavior in order to reduce lock contention.

