

Mongoid_fulltext: full-text n-gram search for your MongoDB models - carterac
http://code.dblock.org/ShowPost.aspx?id=195

======
xpaulbettsx
It seems as if for every database / NoSQL solution that comes out, someone
writes a full-text search provider for it; why aren't people using something
full-featured and abstracted away from the data model, like Lucene, instead
of reinventing the wheel all the time?

~~~
dblock
Try to get that running on something like Heroku. Lucene is like renting a
private jet to bring your kid to school every day.

------
ryanfitz
Does this scale? Just recently I was using mongoid_search and it performed
very slowly on just 5000 documents; autocomplete was not usable. I switched to
solr and it's easily handling 100K documents as of right now. I don't know if
mongo is really a good fit right now for search functionality. I'd love to see
benchmarks with mongoid_fulltext.

~~~
dblock
Scale depends on what you're trying to do. I definitely think that
mongoid_fulltext will work for 100K documents if you're trying to do
relatively simple autocomplete. But this technology is not meant to compete
with solr or any other dedicated search engine.

~~~
ryanfitz
In my particular case, I noticed a continual slowdown for every 1000 or so
documents I created when using mongoid_search, even though I was only doing a
simple search, indexing just a title field in my documents.

~~~
mathias_10gen
Based on my quick reading* of the mongoid_search code, it looks like it is
using a regex query against the keywords array, which requires a full scan
over the index. In fact, the way it is being done is probably far less
efficient than just doing a normal regex table scan without the keyword array.
If you were to modify the search method to do a normal string query (maybe
using $in for OR and $all for AND), you should get _much_ better performance.

*Ruby isn't my native language so I may be missing something.
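To make the distinction concrete, here is a rough sketch of the query shapes being compared, written as plain Ruby hashes in MongoDB's query syntax (the `_keywords` field name and the sample terms are assumptions, not taken from either gem's actual code):

```ruby
# An unanchored regex forces MongoDB to scan every entry in the
# index (or collection) -- cost grows with data size, which would
# explain the slowdown described above.
slow_query = { :_keywords => /ruby/ }

# Exact-match queries against the same keywords array can use the
# index directly:
fast_or  = { :_keywords => { '$in'  => %w[ruby rails] } }  # OR semantics
fast_and = { :_keywords => { '$all' => %w[ruby rails] } }  # AND semantics
```

The exact-match forms only work if the keywords array already contains fully normalized tokens, since there is no partial matching without the regex.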

~~~
dblock
Yes. But don't confuse it with mongoid_fulltext, which doesn't do any regexes;
it uses map/reduce, where each n-gram is indexed separately.
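As a rough illustration of the n-gram approach, here is a minimal sketch of trigram extraction in plain Ruby (the function name, normalization, and n = 3 default are assumptions for illustration, not mongoid_fulltext's actual implementation):

```ruby
# Split a string into overlapping trigrams after stripping it down to
# lowercase alphanumerics. Indexing each trigram separately is what lets
# prefix/substring-style autocomplete use index lookups instead of scans.
def ngrams(text, n = 3)
  s = text.downcase.gsub(/[^a-z0-9]/, '')
  return [] if s.length < n
  (0..s.length - n).map { |i| s[i, n] }
end

ngrams("MongoDB")  # => ["mon", "ong", "ngo", "god", "odb"]
```

A query string is broken into the same trigrams, and candidate documents are scored by how many of their indexed trigrams match.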

