
The Joy of Indexing - DanielRibeiro
http://kylebanker.com/blog/2010/09/21/the-joy-of-mongodb-indexes/
======
rix0r
_We commonly encounter users who believe that searching on two fields can be
facilitated by creating two separate indexes on those fields. This example
should give you some intuition about why this just isn’t possible._

Not really, as the example only allows a search on ingredient, since it
explicitly says the _name is not known at all_ , and hence _could never be
used_ for search.

Now let's say we know two things about the unknown recipe: it contains chicken
and it's from the Italian kitchen.

Given these two indexes:

    
    
      Chicken
        1, 15, 582, 1032
    
      Italian
        3, 15, 16, 82, 1032, 6821
    

I could definitely use both to speed up my search by looking only at pages
that occur in both indexes (in this case, 15 and 1032). This can be done in
O(# of indexes * index length) time if both indexes are in sorted order.

~~~
Saavedro
You don't even need to look at the entire index. See the point where chicken
jumps from 15, to 582? We know we can then skip to at least 582 in all the
other indexes we're looking at. (Which is doable as the indexes are going to
be stored as some sort of tree we can retrieve a particular range out of) This
is pretty much a merge join <http://en.wikipedia.org/wiki/Sort-merge_join>

Incidentally you can make indexes on Ingredient->Page/Cuisine->Page rather
than maintaining Page indexes for specific ingredients/cuisines. So with just
two indexes you can filter out any Cuisine/Ingredient combo (You can even do
multiple ingredients! As it's an intersection though, multiple cuisines
wouldn't be useful as few dishes are likely to fall under multiple cuisines,
you could still do it though.)

------
johkra
This was a very graphic and easy-to-understand explanation of indexes in
general. Although I thought I understood indexes, it helped me understand
secondary indexes better. Thanks for linking!

Did they ever write the promised second part where they wanted to present
B-trees in a similar form? I think I understand B-trees, but it wouldn't be
the first time I realise I didn't really understand something after all...

------
VladRussian
>It’s knowing how indexes work and having the intuition to create the best
indexes for your queries and your data set. Lacking this intuition, your
production database will eventually slow to a crawl, you’ll upgrade your
hardware in vain, and when all else fails, you’ll blame both gods and men.

it isn't an intuition. It is knowledge about access paths your system uses in
query plans and how to manage them. "Intuition" and "use [always] indexes" -
these 2 perpetrated legends is what frequently the root cause of the pitiful
performance of reasonably large DB systems (ie. ones that many times larger
than the RAM of the DB [cluster]). People get charmed by the "magic" of B-tree
and miss the iron-platter reality of random vs stream IO. You see a DBA or DB
developer expressing unconditional disgust pointing toward the red line of
"Full-Table Scan" in query plan he managed to display in the GUI tool and you
immediately understand what the situation at this client site is :)

Again, don't get me wrong, i'n not saying that indexes shouldn't be used -
that would be the same mistake as to say they should always be used. What i'm
saying is that you need to know (not intuit) when and how and why to use or
not to use specific access paths.

~~~
Estragon
Can you give an example of what an access path is and how to manage it? This
concept is new to me, and I think I understand what you're saying, but would
like to make sure.

~~~
VladRussian
i'm not RTFM-ing you, it is just they explain it better :

[http://download.oracle.com/docs/cd/B14117_01/server.101/b107...](http://download.oracle.com/docs/cd/B14117_01/server.101/b10752/optimops.htm#35891)

------
stuffihavemade
Any discussion about indexing needs a link to <http://use-the-index-luke.com/>

It's the best resource that I've seen with respect to databases and indexing.

------
SNK
Answer to the first quiz is clearly wrong; scanning an index - any index - is
faster than scanning all 5k pages.

