
How b-tree database indexes work and how to tell if they are efficient - kirubakaran
http://mattfleming.com/node/192
======
mustpax
This might sound dumb, but I always thought database tables used some sort of
hash table like implementation with O(1) retrieve times on indexed columns.
Don't b-tree indexes imply that we're looking at O(log(n)) retrieve times
instead?

If so, why would database implementations make this tradeoff?

Two advantages I can think of: a b-tree offers very quick sorting (because
data is already sorted). It is probably more efficient to grow a b-tree in
place on disk without having to move too much data around too often.

While these two things matter, O(log(n)) vs. O(1) read performance is a big
difference. And if I'm not mistaken, most databases ought to be read optimized
to begin with.

Can someone enlighten me on the implementation considerations here?

~~~
silentbicycle
For starters, algorithms with O(1) performance can have large constant
factors, and resizing the hash now and then can make the occasional operation
disproportionately expensive.

Mainly, though, each b+-tree node is the same size as a disk I/O block. The
disk is often the slowest part of the database, so making as few reads as
possible will go a long way. If you can't work entirely in memory, that's the
next best thing.

~~~
neilc
_Mainly, though, each b+-tree node is the same size as a disk I/O block. The
disk is often the slowest part of the database, so making as few reads as
possible will go a long way. If you can't work entirely in memory, that's the
next best thing._

That makes no sense. An external hash index is also designed to do as few I/Os
per read as well (typically just a single I/O to read the page that holds the
target bucket). Just like b-trees are designed differently than in-memory
trees, external hash indexes are designed differently than in-memory hash
tables.

------
neilc
_10% selectivity is the minimum selectivity necessary for a b-tree index to be
helpful._

That's probably on the high side. Suppose that records are stored in 8KB pages
(which is probably too small for a modern system, but nevermind), the
selectivity of the scan is 10%, the records that satisfy the scan are randomly
distributed over the pages, each page holds, say, 40 records (which is
conservative, even for small 8KB pages), and we're considering scanning a
secondary index[1].

That means that a typical page in the heap has 4 matching records, which means
we are very likely to have to read every page in the heap anyway. Worst still,
the order in which the index scan returns results is unlikely to correlate
with the physical storage order of the heap pages, which means that the 4 IOs
per page are going to be random, not sequential. Add to that, in a scan of a
secondary index, you need to read both the index page and the pointed-to heap
page, which is an additional I/O (compared to a sequential scan of the heap).
So it's easy to construct a realistic scenario when you wouldn't want to use a
secondary index when the scan selectivity is "only" 10%.

[1] And that an index-only plan can't be used: i.e. we need to fetch the
pointed-to heap page.

------
tt
A couple other things to consider:

1\. Create a multi-column index that satisfies the query entirely on its own
without the need to go to the table to read additional data. This technique
solves a large percentage of slow query problems.

2\. A bitmap index in Oracle kills concurrency. Do not use bitmap indexes for
table with multiple concurrent transactions.

~~~
fendale
Also consider that if you have a 100M row table and the column has only 4
values, but those values are highly skewed (ie 3 values are 99% of the rows,
4th is 1%), then a function index on the skewed column can be very useful:

    
    
        create fbi on table(case
                              when skewed_col == 'small value'
                                skewed_col
                              else
                                null
                            end)
    

The null values are never indexed, so now you have a nice small index that is
good only for finding you 'needles in the haystack', so long as you put that
case statement in your queries.

