
Full Text Search in Mongo - DanielRibeiro
http://www.mongodb.org/display/DOCS/Full%20Text%20Search%20in%20Mongo
======
zefhous
As others have mentioned, this isn't really true full-text search support; it's
an old page with an example of how you can get that kind of functionality.

Full-text search should be coming, but it's still a ways off... The most
recent word from a 10gen employee is that it "Seems likely to be in 2.2."

You can keep up with the status of true full-text search here:

<http://jira.mongodb.org/browse/SERVER-380>

~~~
ChrisFulstow
Thanks, great to know it's being considered as part of the product roadmap.

------
ChrisFulstow
This isn't really supposed to be a _proper_ full-text index feature, instead
it's building a rudimentary inverted index using a string array property. It's
possible to create indexes over array properties in MongoDB, which is very
cool, and increases performance to an extent. But even with an index, this
approach to full-text was much slower for me than an equivalent search against
the same data in Lucene.
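Here is a minimal sketch of the array-based approach described above. It is written in plain Python so it runs without a server, and the field name `_keywords` and the sample titles are assumptions for illustration. Each document carries a keyword array, and a dict from keyword to document ids stands in for the index MongoDB can build over an array field. In MongoDB itself, the lookup would be a query along the lines of `{"_keywords": {"$all": ["text", "search"]}}` against an indexed array field.

```python
from collections import defaultdict

def make_doc(doc_id, title):
    # Hypothetical document shape: the title plus a `_keywords` array holding
    # its lowercased, whitespace-split tokens.
    return {"_id": doc_id, "title": title, "_keywords": title.lower().split()}

docs = [
    make_doc(1, "Full text search in Mongo"),
    make_doc(2, "Lucene text analysis"),
    make_doc(3, "Mongo sharding"),
]

def build_inverted_index(documents):
    """Map each keyword to the set of _ids whose `_keywords` array contains it,
    roughly what an index over an array field lets the database look up."""
    index = defaultdict(set)
    for doc in documents:
        for keyword in doc["_keywords"]:
            index[keyword].add(doc["_id"])
    return index

def search_all(index, terms):
    """Return the _ids of documents containing every term (like an $all match)."""
    if not terms:
        return set()
    # .get() on a defaultdict does not insert missing keys.
    return set.intersection(*(index.get(t, set()) for t in terms))

index = build_inverted_index(docs)
print(sorted(search_all(index, ["text"])))           # [1, 2]
print(sorted(search_all(index, ["text", "mongo"])))  # [1]
```

This only does exact-token AND matching with no ranking. Ranking is part of what a dedicated engine such as Lucene adds on top of the same basic structure.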

I'd love to see a MongoDB component that replicates data from the oplog to a
dedicated full-text store like Lucene or Solr.
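Here is a rough sketch of the oplog-to-search-store idea. It assumes oplog entries shaped like MongoDB's replication log:

- an `op` code of `"i"`, `"u"` or `"d"`
- a namespace `ns`
- the document or change in `o`
- for updates, the target's `_id` in `o2`

The exact entry format varies across MongoDB versions, so treat it as an assumption. `ToySearchStore` is a hypothetical in-memory stand-in for Lucene or Solr, and the namespace `blog.posts` and the `title`/`body` fields are made up for the example. A real component would tail the `local.oplog.rs` collection with a tailable cursor; here the entries are just a list.

```python
class ToySearchStore:
    """Hypothetical stand-in for an external full-text engine."""

    def __init__(self):
        self.docs = {}              # _id -> text that would be sent for indexing
        self.needs_refetch = set()  # _ids whose full document must be re-read

def text_of(doc, fields=("title", "body")):
    # Concatenate the (hypothetical) searchable fields of a document.
    return " ".join(str(doc[f]) for f in fields if f in doc)

def apply_oplog_entry(store, entry, namespace="blog.posts"):
    """Mirror one oplog entry for `namespace` into the search store."""
    if entry.get("ns") != namespace:
        return  # only mirror the one collection being indexed
    op = entry.get("op")
    if op == "i":  # insert: the full document is in "o"
        doc = entry["o"]
        store.docs[doc["_id"]] = text_of(doc)
        store.needs_refetch.discard(doc["_id"])
    elif op == "u":  # update: target selector in "o2", change in "o"
        doc_id = entry["o2"]["_id"]
        change = entry["o"]
        if any(key.startswith("$") for key in change):
            # Modifier update such as {"$set": ...}: the entry does not carry
            # the whole document, so flag it to be re-read from MongoDB.
            store.needs_refetch.add(doc_id)
        else:
            # Full-document replacement: reindex directly.
            store.docs[doc_id] = text_of(change)
            store.needs_refetch.discard(doc_id)
    elif op == "d":  # delete: the removed _id is in "o"
        doc_id = entry["o"]["_id"]
        store.docs.pop(doc_id, None)
        store.needs_refetch.discard(doc_id)

entries = [
    {"op": "i", "ns": "blog.posts", "o": {"_id": 1, "title": "Hello", "body": "first post"}},
    {"op": "i", "ns": "blog.posts", "o": {"_id": 2, "title": "Draft"}},
    {"op": "u", "ns": "blog.posts", "o2": {"_id": 1}, "o": {"$set": {"title": "Hi"}}},
    {"op": "u", "ns": "blog.posts", "o2": {"_id": 2}, "o": {"_id": 2, "title": "Final", "body": "done"}},
    {"op": "i", "ns": "blog.users", "o": {"_id": 9, "title": "ignored"}},
    {"op": "d", "ns": "blog.posts", "o": {"_id": 1}},
]

store = ToySearchStore()
for e in entries:
    apply_oplog_entry(store, e)
print(store.docs)           # {2: 'Final done'}
print(store.needs_refetch)  # set()
```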

~~~
rabidsnail
How are you doing your tokenization and stemming? I find it hard to believe
that the actual token lookup is slower.

~~~
ChrisFulstow
Tokenization was a simple string split on whitespace, with no stemming. It was
quite a large Mongo dataset, so only a fraction of the index and data would've
been in memory; it could easily have been quicker for a smaller dataset living
in memory. For me, one of the benefits of Lucene is its powerful built-in
query parsing, tokenization, analyzers, etc.
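This small sketch makes the gap between whitespace splitting and an analyzer concrete. It compares plain whitespace splitting with a toy "analyzer" that lowercases, drops punctuation and strips a few English suffixes. The suffix rules are a made-up crude stemmer for illustration only; they are not the Porter stemmer or any Lucene analyzer.

```python
import re

def whitespace_tokens(text):
    # Plain split on whitespace: case and punctuation are kept as-is.
    return text.split()

def crude_stem(word):
    # Toy suffix stripping (hypothetical rules, not a real stemmer):
    # only strip when a reasonably long stem remains.
    for suffix in ("ing", "ed", "es", "s"):
        if len(word) > len(suffix) + 2 and word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def analyzed_tokens(text):
    # Lowercase, keep runs of ASCII letters/digits, then stem each token.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words]

text = "Searching indexes, indexed!"
print(whitespace_tokens(text))  # ['Searching', 'indexes,', 'indexed!']
print(analyzed_tokens(text))    # ['search', 'index', 'index']
```

With whitespace splitting, a query for `index` matches none of those three tokens. After analysis it matches two of them, and that normalization is the kind of work Lucene's analyzers take care of.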

~~~
rabidsnail
The performance probably would have been better if the dataset (or at least
the portion of the dataset that gets touched frequently) was smaller. But why
wouldn't lookup in Lucene be at least as slow?

------
andrewstuart
Yes, do you have a question or point to make about MongoDB fts?

~~~
DanielRibeiro
More like throwing some wood on the fire. I was prompted by the talk "Solr
Power FTW" [1] and by "Building a recommendation engine, foursquare style" [6]
(where Justin Moore admits: _We are dumping the data from Mongo and loading it
into Hadoop over S3 files. Map reduce is in this system, not in our mongo
databases._ ).

I was wondering how much this has evolved over the last 6 months, how well
machine learning algorithms can be applied and clustered over NoSQL databases,
and which ones can do this without dumping the entire database.

Some links I had on the topic:

 _Full text search with MongoDB[2]_

 _Ask HN: What's the best way to handle full-text search with MongoDB?[3]_

 _AskHN: NoSQL with full text search - which is better CouchDB or MongoDB?[4]_

 _Is MongoDB a valid alternative to relational db + lucene?[5]_

[1] <http://schedule.sxsw.com/events/event_IAP7455>

[2] <http://hmarr.com/2010/mar/18/full-text-search-with-mongodb/>

[3] <http://news.ycombinator.com/item?id=2069271>

[4] <http://news.ycombinator.com/item?id=1984666>

[5] <http://stackoverflow.com/questions/2546494/is-mongodb-a-valid-alternative-to-relational-db-lucene>

[6] <http://engineering.foursquare.com/2011/03/22/building-a-recommendation-engine-foursquare-style/>

~~~
kunjaan
The question regarding the difference between CouchDB and Mongo wasn't fully
explored. Can someone here comment on which route makes more sense right now?

Riak also seems to have full-text search. Has anyone used it?

~~~
rb2k_
Riak's full-text search is fun to work with and as simple to set up as the rest
of Riak. They offer a Solr interface, but it doesn't support all of the
operations yet. For example, "Facet querying through the Solr interface is not
yet supported"
(<http://wiki.basho.com/Riak-Search---Querying.html#Faceted-Queries-via-the-Solr-Interfae>).

Riak isn't the fastest single-node system, but if you're going big and need
several servers anyway, it will save you some time.

CouchDB could use elasticsearch and its streaming indexing (the CouchDB "river":
<http://www.elasticsearch.org/docs/elasticsearch/river/couchdb/>) to get
scalable full-text search.

An interesting project for full-text search with MongoDB and Solr is
photovoltaic (<https://github.com/mikejs/photovoltaic>), which pipes MongoDB
changes to the Solr XML interface. Sadly, I haven't had time to use it yet, but
it looks interesting.
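photovoltaic's own code isn't reproduced here, but the Solr side of such a pipe is simple. Solr accepts XML update messages of the form `<add><doc><field name="...">...</field></doc></add>` and `<delete><id>...</id></delete>`. What follows is a minimal sketch of turning MongoDB-style documents into those messages. It assumes the Mongo `_id` maps to a Solr `id` field and that every other key is a field defined in the Solr schema; both are assumptions, not photovoltaic's actual mapping.

```python
import xml.etree.ElementTree as ET

def solr_add_xml(docs):
    """Build a Solr XML <add> message from a list of dict documents."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # Assumed mapping: Mongo's _id becomes Solr's uniqueKey field "id".
            field_name = "id" if name == "_id" else name
            # Arrays become a repeated (multi-valued) field.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=field_name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")

def solr_delete_xml(ids):
    """Build a Solr XML <delete> message for the given ids."""
    delete = ET.Element("delete")
    for doc_id in ids:
        ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")

print(solr_add_xml([{"_id": 1, "title": "Hello"}]))
# <add><doc><field name="id">1</field><field name="title">Hello</field></doc></add>
print(solr_delete_xml([1, 2]))
# <delete><id>1</id><id>2</id></delete>
```

The resulting string would then be POSTed to the Solr core's update handler, followed by a commit. That HTTP step is left out of the sketch.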

------
mark_l_watson
I think that is an old documentation page. I know because for a long time they
have had a link at the bottom of the page to one of my blog entries, where I
show a simple way to do indexing and search in MongoDB.

------
vain
I was very excited by this technology. I tried to implement it on a large
database, and it bombed. The hype around NoSQL kind of whitewashes the fact that
in the end it is indexed in a B-tree, exactly how MySQL would do it. I am not
trashing MongoDB; I am a big fan of it. I am just making a point about being
objective.

~~~
FooBarWidget
You're right, that is what many NoSQL databases are. However this simplicity
also allows NoSQL databases to more easily implement features like sharding,
which is why they're useful. I've yet to see an SQL database that supports
sharding and doesn't cost a lot of money.
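Here is a rough illustration of why sharding is easier on this model. MongoDB splits a sharded collection into chunks covering ranges of a shard key, and a router sends each query to the shard(s) owning the matching ranges. The sketch below is a toy version of that routing with made-up split points and shard names. It ignores chunk migration, balancing and config servers, and is not how `mongos` is actually implemented.

```python
import bisect

# Hypothetical split points on a string shard key (e.g. a username).
# shard0 owns keys below "g", shard1 owns ["g", "n"), shard2 owns ["n", "t"),
# and shard3 owns everything from "t" up.
SPLIT_POINTS = ["g", "n", "t"]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]

def shard_for(key):
    """Return the single shard owning `key` (a point lookup)."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]

def shards_for_range(low, high):
    """Return the shards that may hold keys in the inclusive range [low, high]."""
    first = bisect.bisect_right(SPLIT_POINTS, low)
    last = bisect.bisect_right(SPLIT_POINTS, high)
    return SHARDS[first : last + 1]

print(shard_for("alice"))          # shard0
print(shard_for("mike"))           # shard1
print(shard_for("zoe"))            # shard3
print(shards_for_range("h", "p"))  # ['shard1', 'shard2']
```

Queries on the shard key touch only the shards that own the relevant ranges. A query that doesn't include the shard key would have to be sent to every shard.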

