

Ask HN: need your help optimizing substring MySQL query - rorrr

Imagine we have this table with a million unique rows.<p><pre><code>    | id |        url        |
    +----+-------------------+
    | 1  | google.com        |
    | 2  | yahoo.com         |
    | 3  | reddit.com        |
    | 4  | digg.com          |
    | ...| ...               |
</code></pre>
I'm running queries like<p><pre><code>    SELECT id,url 
    FROM table
    WHERE url LIKE '%hoo%'
</code></pre>
When you have a million rows, this query is slow (around 2-5 sec depending on what you search for). I need to make it fast.<p>So far the fastest solution I found is to concatenate everything into one long string, and do regular sub-string searches in the app (not the DB). It's a horrible and ugly solution, and I have re-build that string every time we insert a new row.<p>So the string looks like<p><pre><code>    1|google.com#2|yahoo.com#3|reddit.com|4#digg.com ...
</code></pre>
It's pretty long, but substring searches on it are pretty fast. I get the same results in around 70-200ms.<p>I would prefer to find a non-ugly solution.<p>thanks!
======
elliottcarlson
The main issue is that since you are using %searchterm% you aren't getting any
real use out of indexes as you would when using something like searchterm%.

You could have a second table that has permutations of the data allowing
indexing of the content, but your records would increase dramatically in that
second table;

i.e.

    
    
        | id | link_id | url_permuted |
        +----+---------+--------------+
        | 1  | 1       | oogle.com    |
        | 2  | 1       | ogle.com     |
        | 3  | 1       | gle.com      |
        | 4  | 1       | le.com       |
    

This would allow you to search by searchterm% while gaining use of indexing...

Not sure if this is the most elegant solution to your current problem though -
and it could result in multiple hits to the same record...

~~~
rorrr
Thanks for the idea.

Many of the URLs are long. The average length is 43.9 characters. That would
make the auxiliary table huge.

------
smoody
Just curious... are you using innodb or myisam? also stackoverflow.com is a
perfect place for this sort of question, too.

