

AskHN: NoSQL with full text search - which is better CouchDB or MongoDB? - andrewstuart

Starting a new project soon and wanting to use a JSON data store. Full text search is going to be very important. We can do without it, or with lesser full text search functionality, in the short term, but full text search will be critical later on.

So which is better given such a requirement - MongoDB or CouchDB? Which is likely to deliver the better integrated full text solution, and when?

Can anyone share their experiences with FTS and either of these data stores?
======
datd00d
Check out Riak; they are continuing to develop a full text search engine
that integrates with their data store. I have yet to play with it, but it looks
rather promising and you won't have to deal with trying to bolt
Lucene/Solr onto Couch/Mongo.

[http://blog.basho.com/2010/10/20/why-i-am-excited-about-riak...](http://blog.basho.com/2010/10/20/why-i-am-excited-about-riak-search/)

And the direct link to Riak Search info: <http://www.basho.com/riaksearch.html>

~~~
andrewstuart
I do wonder why data store projects such as MongoDB and CouchDB don't start
with FTS as the very foundation of everything they do - task one should have
been FTS.

~~~
datd00d
I have seen several emails/blogs about MongoDB and 10gen musing over FTS.
There are a few projects, if I am not mistaken, that have "bolted on" FTS to
Mongo, but I do not think they have had the same development/LOE as Riak
Search. But without playing/testing with all those options I cannot speak
authoritatively on the topic. FTS will no doubt be an emerging factor in the
NoSQL landscape in the future, IMHO.

------
ammmir
although i'm not using it yet, i'm looking at adding
<http://www.elasticsearch.com/> next to my MongoDB store for indexing and
search. i particularly like its support for "rivers" and for push vs. pull
methods of indexing remote sources.

if you're already using CouchDB, check out
[http://www.elasticsearch.com/blog/2010/09/28/the_river_searc...](http://www.elasticsearch.com/blog/2010/09/28/the_river_searchable_couchdb.html)

------
abiczo
I wouldn't rule out using a separate search engine. Yes, you'll probably have
to keep the index in sync with the db manually, but I found that it also gives
you much more flexibility.

I personally use Solr with MongoDB and it works pretty well. Keeping the index
in sync is not that big of a deal for me (depends on the application, of
course), and in exchange I get a full-featured FTS engine that is probably more
mature and has more features than an integrated one would have.

~~~
datd00d
How are you going about syncing? And what is your average document size
(bytes, as well as attributes/elements)?

~~~
abiczo
The syncing is done on the app level: when a new document that needs to be
indexed is inserted into the db, a background job is launched that inserts the
document in Solr. This shouldn't be too hard to manage if you can write
post-save handlers in your model code.

The average document is about 1KB and has about 30 attributes (but only 2-3 of
those attributes need to be indexed).
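
For anyone who wants to see the shape of that app-level sync, here is a rough
Python sketch. It is not my actual code: `DocumentStore` and `FakeSearchIndex`
are in-memory stand-ins for the db and for Solr, and the `INDEXED_FIELDS` names
are made up for illustration. The idea is only that the post-save handler
enqueues a job, and a background worker copies the few indexed attributes into
the search engine.

```python
import queue
import threading

# Hypothetical: only these attributes are sent to the search index.
INDEXED_FIELDS = ("title", "body")


class FakeSearchIndex:
    """In-memory stand-in for a search engine such as Solr (not a real client)."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, fields):
        self.docs[doc_id] = fields


class DocumentStore:
    """In-memory stand-in for the database, with post-save handlers."""

    def __init__(self):
        self.docs = {}
        self.post_save_handlers = []

    def save(self, doc_id, doc):
        self.docs[doc_id] = doc
        for handler in self.post_save_handlers:
            handler(doc_id, doc)


def start_indexer(index, jobs):
    """Run a background worker that drains the job queue into the index."""

    def worker():
        while True:
            doc_id, doc = jobs.get()
            try:
                # Copy only the attributes that need to be searchable.
                fields = {k: doc[k] for k in INDEXED_FIELDS if k in doc}
                index.add(doc_id, fields)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


index = FakeSearchIndex()
jobs = queue.Queue()
store = DocumentStore()
# The post-save handler only enqueues; indexing happens off the save path.
store.post_save_handlers.append(lambda doc_id, doc: jobs.put((doc_id, doc)))
start_indexer(index, jobs)

store.save("1", {"title": "Hello", "body": "Full text here", "views": 0})
jobs.join()  # wait for the background job before looking at the index
print(index.docs["1"])
```

In a real setup the worker would call a Solr client instead of
`FakeSearchIndex.add`, and you would want retries for failed jobs so the index
does not silently drift from the db.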

