Full Text Search in Mongo (mongodb.org)
40 points by DanielRibeiro on March 27, 2011 | 14 comments


As others have mentioned, this isn't really true full-text search support, but instead an old page with an example of how you can get that kind of functionality.

Full-text search should be coming, but it's still a ways off... The most recent word from a 10gen employee is that it "Seems likely to be in 2.2."

You can keep up with the status of true full-text search here:

http://jira.mongodb.org/browse/SERVER-380


Thanks, great to know it's being considered as part of the product roadmap.


This isn't meant to be a proper full-text index feature; instead, it builds a rudimentary inverted index using a string array property. MongoDB can create indexes over array properties, which is very cool and improves performance to an extent. But even with an index, this approach to full-text search was much slower for me than an equivalent search against the same data in Lucene.
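A rough way to picture the array-property approach: an index on an array field gets one entry per array element (MongoDB calls this a multikey index), so looking up a word is just a key lookup, and matching several words is a set intersection, similar in spirit to querying with the `$all` operator. The sketch below is a pure-Python stand-in under that assumption; the `_keywords` field name and the sample documents are hypothetical, and it does not reproduce MongoDB's actual index structure.

```python
from collections import defaultdict

# Hypothetical documents: each stores pre-tokenized words in a "_keywords"
# array alongside the original text.
docs = {
    1: {"title": "Full text search in Mongo", "_keywords": ["full", "text", "search", "mongo"]},
    2: {"title": "Lucene in action", "_keywords": ["lucene", "in", "action"]},
    3: {"title": "Text search with Lucene", "_keywords": ["text", "search", "with", "lucene"]},
}


def build_multikey_index(documents, field):
    """Map each array element to the ids of the documents that contain it.

    An index over an array field holds one entry per element, so a
    dict of token -> set(doc ids) is a rough stand-in for it.
    """
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for token in doc[field]:
            index[token].add(doc_id)
    return index


def find_all(index, terms):
    """Return ids of documents whose array contains every term ($all-style)."""
    postings = [index.get(t, set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


index = build_multikey_index(docs, "_keywords")
print(sorted(find_all(index, ["text", "search"])))  # [1, 3]
print(sorted(find_all(index, ["lucene", "mongo"])))  # []
```

Note what is missing compared with Lucene: there is no relevance scoring, no phrase matching, and no analysis beyond whatever went into the array when the document was saved.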

I'd love to see a MongoDB component that replicates data from the oplog to a dedicated full-text store like Lucene or Solr.
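The idea behind such a component can be sketched as replaying oplog entries into a separate search store. Oplog entries carry an `op` code (`"i"` insert, `"u"` update, `"d"` delete) plus the document or its `_id` in `o`/`o2`. In this minimal sketch, the `InMemorySearchStore` class, the `body` field, and the sample entries are hypothetical stand-ins for Lucene/Solr and real data. It also assumes that updates carry a full replacement document in `o`; real oplog updates can contain modifiers such as `$set`, which an actual connector would need to handle (for example, by re-reading the document).

```python
from collections import defaultdict


class InMemorySearchStore:
    """Hypothetical stand-in for an external full-text store (Lucene/Solr)."""

    def __init__(self):
        self.docs = {}                    # _id -> indexed text
        self.postings = defaultdict(set)  # token -> set of _id

    def upsert(self, doc_id, text):
        self.delete(doc_id)               # drop stale postings first
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is None:
            return
        for token in old.lower().split():
            self.postings[token].discard(doc_id)

    def search(self, token):
        return sorted(self.postings.get(token.lower(), set()))


def apply_oplog_entry(store, entry, field="body"):
    """Replay one oplog-style entry into the search store.

    ASSUMPTION: "u" entries carry a full replacement document in "o".
    """
    op = entry["op"]
    if op == "i":
        doc = entry["o"]
        store.upsert(doc["_id"], doc.get(field, ""))
    elif op == "u":
        store.upsert(entry["o2"]["_id"], entry["o"].get(field, ""))
    elif op == "d":
        store.delete(entry["o"]["_id"])
    # Other ops (e.g. "n" no-op, "c" command) are ignored in this sketch.


store = InMemorySearchStore()
oplog = [
    {"op": "i", "ns": "blog.posts", "o": {"_id": 1, "body": "Mongo full text search"}},
    {"op": "i", "ns": "blog.posts", "o": {"_id": 2, "body": "Lucene is fast"}},
    {"op": "u", "ns": "blog.posts", "o2": {"_id": 1}, "o": {"body": "Mongo plus Solr"}},
    {"op": "d", "ns": "blog.posts", "o": {"_id": 2}},
]
for entry in oplog:
    apply_oplog_entry(store, entry)
print(store.search("solr"))    # [1]
print(store.search("search"))  # []
```

A real connector would tail the oplog continuously and remember the last timestamp it applied, so it can resume after a restart.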


> I'd love to see a MongoDB component that replicates data from the oplog to a dedicated full-text store like Lucene or Solr.

Photovoltaic does this: https://github.com/mikejs/photovoltaic

I haven't had time to play around with it yet, though :(


How are you doing your tokenization and stemming? I find it hard to believe that the actual token lookup is slower.


Tokenization was a simple string split on whitespace, with no stemming. It was quite a large Mongo dataset, so only a fraction of the index and data would have been in memory; it could easily have been quicker for a smaller dataset that fits in memory. For me, one of the benefits of Lucene is its powerful built-in query parsing, tokenization, analyzers, etc.
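To show why a bare whitespace split can hurt matching, here is a small comparison with a slightly normalized tokenizer. The normalizing version only lowercases and keeps runs of ASCII letters and digits; it is a hypothetical illustration, not a Lucene analyzer, and it does no stemming or stop-word removal.

```python
import re


def naive_tokens(text):
    # Split on whitespace only: punctuation and case stay attached to words.
    return text.split()


def normalized_tokens(text):
    # Lowercase, then keep runs of ASCII letters/digits, so "Mongo," -> "mongo".
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_array(text):
    # De-duplicated, sorted tokens, as one might store in a keywords array field.
    return sorted(set(normalized_tokens(text)))


text = "Mongo, Lucene and mongo's indexes"
print(naive_tokens(text))       # ['Mongo,', 'Lucene', 'and', "mongo's", 'indexes']
print(normalized_tokens(text))  # ['mongo', 'lucene', 'and', 'mongo', 's', 'indexes']
print(keyword_array(text))      # ['and', 'indexes', 'lucene', 'mongo', 's']
```

With the naive split, a query for `mongo` would miss both `Mongo,` and `mongo's`; Lucene's analyzers handle this (and much more) out of the box.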


The performance probably would have been better if the dataset (or at least the frequently touched portion of it) were smaller. But wouldn't a lookup in Lucene be at least as slow under the same conditions?


Yes, do you have a question or a point to make about MongoDB full-text search?


More like throwing some wood on the fire. I was prompted by the talk "Solr Power FTW" [1] and by "Building a recommendation engine, foursquare style" [6], where Justin Moore admits: "We are dumping the data from Mongo and loading it into Hadoop over S3 files. Map reduce is in this system, not in our mongo databases."

I was wondering how much this has evolved over the last 6 months, how viable it is to apply and cluster machine learning algorithms over NoSQL databases, and which databases can do this without dumping the entire dataset.

Here are some links that discuss it:

Full text search with MongoDB[2]

Ask HN: What's the best way to handle full-text search with MongoDB?[3]

AskHN: NoSQL with full text search - which is better CouchDB or MongoDB?[4]

Is MongoDB a valid alternative to relational db + lucene?[5]

[1] http://schedule.sxsw.com/events/event_IAP7455

[2] http://hmarr.com/2010/mar/18/full-text-search-with-mongodb/

[3] http://news.ycombinator.com/item?id=2069271

[4] http://news.ycombinator.com/item?id=1984666

[5] http://stackoverflow.com/questions/2546494/is-mongodb-a-vali...

[6] http://engineering.foursquare.com/2011/03/22/building-a-reco...


The question about the difference between CouchDB and Mongo wasn't fully explored. Can someone here comment on which route makes more sense right now?

Riak also seems to have full-text search. Has anyone used it?


Riak's full-text search is fun to work with and as simple to set up as the rest of Riak. They offer a Solr interface, but it doesn't support all operations yet (e.g., http://wiki.basho.com/Riak-Search---Querying.html#Faceted-Qu... states: "Facet querying through the Solr interface is not yet supported.").

Riak isn't the fastest single node system, but if you're going big and need several servers anyway it will save you some time.

CouchDB could use elasticsearch and its streaming indexing ("river", see http://www.elasticsearch.org/docs/elasticsearch/river/couchd... ) to get scalable full-text search.

An interesting project for full-text search with MongoDB and Solr is photovoltaic ( https://github.com/mikejs/photovoltaic ); it pipes MongoDB changes to the Solr XML interface. Sadly, I haven't had time to use it yet, but it looks interesting.
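For a sense of what "piping changes to the Solr XML interface" involves, here is a sketch that builds Solr XML update messages: an `<add>` for inserted or updated documents (Solr replaces a document with the same unique key) and a `<delete>` for removed ones. This is not photovoltaic's code. The `body` field name is an assumption that must match your Solr schema, and in practice these messages would typically be POSTed to Solr's update handler, followed by a `<commit/>`.

```python
import xml.etree.ElementTree as ET


def solr_add_xml(doc, fields=("body",)):
    """Build a Solr XML <add> message for one MongoDB-style document.

    The Mongo _id becomes Solr's "id"; other field names are assumptions.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    ET.SubElement(doc_el, "field", name="id").text = str(doc["_id"])
    for name in fields:
        if name in doc:
            ET.SubElement(doc_el, "field", name=name).text = str(doc[name])
    return ET.tostring(add, encoding="unicode")


def solr_delete_xml(doc_id):
    """Build a Solr XML <delete> message for one document id."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


print(solr_add_xml({"_id": 42, "body": "full text & more"}))
# <add><doc><field name="id">42</field><field name="body">full text &amp; more</field></doc></add>
print(solr_delete_xml(42))
# <delete><id>42</id></delete>
```

ElementTree takes care of escaping (note the `&amp;`), which is easy to get wrong when building the XML by string concatenation.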


I think that is an old documentation page. I know because, for a long time, it has had a link at the bottom to one of my blog entries, where I show a simple way to do indexing and search in MongoDB.


I was very excited by this technology. I tried to implement it on a large database, and it bombed. The hype around NoSQL kind of whitewashes the fact that, in the end, it is indexed in a B-tree, exactly as MySQL would do it. I am not trashing MongoDB; I am a big fan of it. I am just making a point about being objective.
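To make the B-tree point concrete: an ordered index answers exact-key and prefix lookups with a binary search followed by a short range scan, and that is essentially all the array-keyword approach gets from it. The toy below models only the sorted ordering with `bisect`. It does not model pages, node splits, or MongoDB's actual index format, and the sample entries are hypothetical.

```python
import bisect


class SortedKeyIndex:
    """Toy ordered index: a sorted list of (key, doc_id) entries.

    A B-tree keeps keys in this same sorted order; this sketch models only
    the ordering (binary search + range scan), not the tree structure.
    """

    def __init__(self, entries):
        self.entries = sorted(entries)

    def _scan(self, start_key, matches):
        # (start_key,) sorts before every (start_key, doc_id) entry.
        i = bisect.bisect_left(self.entries, (start_key,))
        out = []
        while i < len(self.entries) and matches(self.entries[i][0]):
            out.append(self.entries[i][1])
            i += 1
        return out

    def exact(self, key):
        return self._scan(key, lambda k: k == key)

    def prefix(self, p):
        # Keys sharing a prefix form one contiguous run in sorted order.
        return self._scan(p, lambda k: k.startswith(p))


idx = SortedKeyIndex([("mongo", 1), ("lucene", 2), ("mongodb", 3), ("solr", 2), ("mongo", 4)])
print(idx.exact("mongo"))   # [1, 4]
print(idx.prefix("mongo"))  # [1, 4, 3]
# An infix match (e.g. "ongo") gets no help from the ordering: it needs a full scan.
```

Relevance ranking, stemming, and phrase queries are not something this structure provides; that is where a dedicated engine like Lucene earns its keep.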


You're right, that is what many NoSQL databases are. However this simplicity also allows NoSQL databases to more easily implement features like sharding, which is why they're useful. I've yet to see an SQL database that supports sharding and doesn't cost a lot of money.
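As a rough illustration of why sharding is simpler when data is organized around a key: MongoDB splits a collection into ranges of a shard key (chunks) and routes each query to the shard(s) owning the relevant ranges. The sketch below only models that routing step; the split points and shard names are hypothetical, and chunk splitting, balancing, and the `mongos` router itself are not modeled.

```python
import bisect

# Hypothetical chunk layout on a string shard key. Chunk i covers
# [SPLITS[i-1], SPLITS[i]); chunk 0 is unbounded below, the last unbounded above.
SPLITS = ["g", "n", "t"]
OWNERS = ["shard0", "shard1", "shard2", "shard0"]  # owner of each chunk


def route(shard_key):
    """Return the shard owning the chunk that contains shard_key."""
    return OWNERS[bisect.bisect_right(SPLITS, shard_key)]


def shards_for_range(lo, hi):
    """Shards a range query over [lo, hi) must touch (assumes lo < hi)."""
    first = bisect.bisect_right(SPLITS, lo)  # chunk containing lo
    last = bisect.bisect_left(SPLITS, hi)    # chunk containing keys just below hi
    return sorted(set(OWNERS[first:last + 1]))


print(route("mongo"))                # shard1
print(route("zebra"))                # shard0
print(shards_for_range("a", "h"))    # ['shard0', 'shard1']
```

Queries that include the shard key can be sent to a single shard; queries without it have to be broadcast to every shard.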



