
A fast, fuzzy, full-text index using Redis - adamcharnock
http://playnice.ly/blog/2010/05/05/a-fast-fuzzy-full-text-index-using-redis/
======
amix
I think this is a terrible idea, even though I like Redis and use it in my
projects. Why is it a bad idea? Because Redis keeps everything in memory and
scaling it up will be quite expensive on very large datasets. Even when Redis
VM comes out (which will eliminate the all-data-in-memory requirement),
it's a bad idea as Redis isn't really optimized for full-text search...

What's better? Use a tool that's optimized for the job. I would recommend
looking at Sphinx, which is quite amazing and can handle indexing billions of
words on a single server without using much CPU or memory. Plus Sphinx has
tons of features, such as geo-based search, fuzzy search, boolean search, full
support for Unicode etc. etc.

~~~
donw
If you're going to look at Sphinx, you'd do yourself a disservice by not also
looking at Solr.

Solr has effectively the same feature set, including geospatial and
multilingual searching, but it has better relevance, and generally is faster
at returning query results (although Sphinx is faster at doing full
reindexes).

Also, unlike Sphinx, Solr doesn't glue you to MySQL. Or to any SQL database;
use it with Cassandra, Voldemort, text files, quantum storage in the galactic
hive-mind, whatever.

(edit: They've added PostgreSQL support, and raw XML support, since I last
installed and tested Sphinx about a year ago)

Plus, unlike Sphinx, you don't need to reindex when you add new records,
because Solr can seamlessly merge indexes on-the-fly.

I know I sound like a fanboy here, but I spent a lot of time evaluating the
two of them for our product, and Sphinx just didn't fit the bill for a large
number of reasons. It's a good solution if you just need fulltext indexing in
a MySQL database, but if you want to move beyond that, have a look at Solr.

http://beerpla.net/2009/09/03/comparison-between-solr-and-sphinx-search-servers-solr-vs-sphinx-fight/

~~~
petercooper
Or, if you don't want yet another daemon, Xapian. It's almost the SQLite of
full text searching. Has its limits, but is crazy fast and scales well.

------
alexro
Redis rocks, but using it as a search engine does not rock at all IMO. For one
thing - if the item changes then you have to update all the keys that hold
references to that item.

EDIT: it's fine if your items are very small or do not change often

~~~
adamcharnock
Yes, there is a caveat that if some words are removed from the database then
you will need to reindex for that to be taken into account. However, in the
case of bug tracking (which is largely what we are doing), data tends to be
added rather than removed. Plus re-indexing happens on a per-project basis,
not a global basis.

Also, for our use case it is not necessarily a bad thing that removed words
may cause an item to be shown in the results. After all, just because a
paragraph was removed from a bug description doesn't necessarily mean it is
invalid search fodder (bearing in mind that, in our case, full-text search is
just one way of filtering).
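
To make the trade-off concrete, here is a minimal sketch of the general
technique, not the actual playnice.ly code. It assumes a plain Python dict of
sets standing in for Redis sets (one set of item ids per word); the names
`add_item`, `search` and `reindex_item` are hypothetical.

```python
from collections import defaultdict

# Assumption: a dict of sets models one Redis set per word (word -> item ids).
index = defaultdict(set)

def tokenize(text):
    """Split text into a set of lowercase words."""
    return set(text.lower().split())

def add_item(item_id, text):
    """Add the item id to the set of every word in the text."""
    for word in tokenize(text):
        index[word].add(item_id)

def search(word):
    """Return a copy of the ids of all items indexed under the word."""
    return set(index.get(word.lower(), set()))

def reindex_item(item_id, old_text, new_text):
    """Drop the item from words it no longer contains, then index the new text."""
    for word in tokenize(old_text) - tokenize(new_text):
        index[word].discard(item_id)
    add_item(item_id, new_text)

add_item(1, "crash on login page")
add_item(1, "crash on startup")   # edited text indexed without a reindex
print(search("login"))            # {1}: the removed word still matches
reindex_item(1, "crash on login page", "crash on startup")
print(search("login"))            # set(): the stale entry is gone
```

Adding words only touches the sets for the new words, while removing words
means visiting every set that still references the item, which is the cost
being discussed above.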

------
binaryten
Wow! Lots of ifs, maybes, projections and conjectures. It would be nice to see
someone give a concrete example when they give an opinion. Seriously, if you
disagree, for example, and it is based on provable data, please share. Lloyd
Moore.

~~~
mks86
Agreed. And all the people who keep harping on with recommendations of
existing FTS engines: how about providing some benchmarks regarding real-time
search speeds? That's one area where Lucene (and Solr, which uses it) comes a
bit unstuck: if you have a fast-changing dataset, it's hard to keep up with
storing and indexing it, and that's where the Redis approach with its
high-speed writes wins hands-down. Of course you could use Lucene's
RAMDirectory and then write code to persist it back to disk, but then you're
just re-inventing the Redis wheel.

And from what I have read of Sphinx, it likely has the same issues. Plus it
seems to be aimed only at the LAMP crowd: not very friendly if you are not
already using PHP+MySQL.

