

Cloudant (YC S08) Releases In-Database, Distributed Search - timanglade
http://blog.cloudant.com/announcing-cloudant-search/

======
turbodog
Nice! I really appreciate the tone of Tim's post in that it acknowledges both
the strengths and weaknesses of NOSQL in general and of Cloudant Search in
particular in an honest manner.

------
dcaylor
How does this compare to the various existing Lucene-based search options for
CouchDB? The post says the new Cloudant Search "is a way that would not
require you to set up a third-party, financially or operationally expensive
solution." Adding basic Lucene searches to a CouchDB setup isn't all that
hard. What about elasticsearch and Solr? Aside from cost and hosting, are
there other differentiators between Cloudant's Search and these third-party
options?

~~~
timanglade
I already responded to this partly in
<http://news.ycombinator.com/item?id=2764736> — We think the integration into
a single deployment is in itself a big gain. Maintaining several of those
infrastructures can be very painful, especially as your clusters grow large.
Also, as mentioned, we’ve already added several features on top of
Lucene, and we’ll be adding more in the future.

~~~
dcaylor
Yes, thank you. Just after I posted my question here I realized that much of
what I was wondering about was also answered in another post on the Cloudant
blog: <http://blog.cloudant.com/technical-look-at-cloudant-search/>

------
gniquil
I have a very noob question. For most databases, suffix search is always super
slow. However, can't someone just build an index on the reversed string, then
treat suffix search exactly like prefix search? This doubles your index
storage requirement, but index storage is generally not a problem. Finally,
this could perhaps be extended to cover any wildcard search (hell*world).
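To make the reversed-string idea concrete, here is a minimal in-memory sketch.
It assumes the terms live in plain sorted Python lists standing in for a real
term dictionary or B-tree, handles only a single `*`, and uses a hypothetical
`AffixIndex` class. It illustrates the proposal above, not how any particular
database implements it.

```python
import bisect


class AffixIndex:
    """Hypothetical index: sorted terms plus a sorted copy of reversed terms."""

    def __init__(self, terms):
        self.forward = sorted(terms)
        # The reversed copy is what roughly doubles index storage.
        self.backward = sorted(t[::-1] for t in terms)

    @staticmethod
    def _prefix_run(sorted_terms, prefix):
        # Terms sharing a prefix form one contiguous run in sorted order:
        # one binary search finds its start, then matches are read in order.
        lo = bisect.bisect_left(sorted_terms, prefix)
        hi = lo
        while hi < len(sorted_terms) and sorted_terms[hi].startswith(prefix):
            hi += 1
        return sorted_terms[lo:hi]

    def prefix(self, head):
        """Answer "head*" with the forward index."""
        return self._prefix_run(self.forward, head)

    def suffix(self, tail):
        """Answer "*tail" as a prefix search for reversed(tail) on reversed terms."""
        run = self._prefix_run(self.backward, tail[::-1])
        return sorted(t[::-1] for t in run)

    def wildcard(self, head, tail):
        """Answer "head*tail" by intersecting the prefix and suffix matches."""
        candidates = set(self.prefix(head)) & set(self.suffix(tail))
        # Head and tail must not overlap inside a term that is too short.
        return sorted(t for t in candidates if len(t) >= len(head) + len(tail))
```

Note that `wildcard` intersects two candidate sets, so a short, common prefix
or suffix can still yield a large candidate set before filtering.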

~~~
davisp
The issue there is that you're still anchoring your index to one end of the
string which means you're not solving the general problem, only a specific
manifestation of it.

A general example: given the string "foo bar baz", your solution could find
"foo%" or "%baz" efficiently, but not "%bar%". It's not out of the question if
what you really want is a suffix search, but the general problem of finding an
internal substring is still less than optimal.
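One common way to handle the unanchored "%bar%" case is to index every suffix
of every string, so that any internal substring becomes a prefix of some
suffix. The sketch below is a deliberately naive version of that idea (a
swapped-in technique, not something proposed in this thread or shipped by
Cloudant); `build_suffix_index` and `contains` are hypothetical names. Storing
whole suffix strings costs O(L²) characters for a string of length L, which is
why real suffix arrays store offsets instead.

```python
import bisect


def build_suffix_index(docs):
    """Return a sorted list of (suffix, doc_id) pairs for every suffix of every doc."""
    entries = []
    for doc_id, text in enumerate(docs):
        for start in range(len(text)):
            entries.append((text[start:], doc_id))
    entries.sort()
    return entries


def contains(entries, docs, sub):
    """Answer a "%sub%" query: return the docs that contain `sub` anywhere."""
    # (sub,) sorts before any (suffix, doc_id) pair whose suffix is >= sub,
    # so bisect lands on the first suffix that could start with `sub`.
    i = bisect.bisect_left(entries, (sub,))
    hits = set()
    # Every occurrence of `sub` begins some suffix, and all suffixes that
    # start with `sub` are contiguous in sorted order.
    while i < len(entries) and entries[i][0].startswith(sub):
        hits.add(entries[i][1])
        i += 1
    return [docs[d] for d in sorted(hits)]
```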


------
brendoncrawford
Maybe slightly off-topic here, but are there plans to eventually merge Big
Couch upstream into Couch core?

~~~
davisp
It's hard to say. There are quite a few ways in which this could play out.

First, there are two important points to consider. Currently, BigCouch is
more or less a superset of Apache CouchDB. The only patches we have to the
CouchDB sources can and should be back-ported, but that requires us to first
address a couple of possible bugs in non-clustered deployments. Second, Erlang
is a language that makes it easy to mix and match code, so once we have
back-ported these patches there's no real requirement for a merge at all.

There are also a few things we're discussing in the CouchDB community that
could very well make merging the projects unnecessary: specifically,
rearranging our source tree into a more conventional Erlang layout, as well as
tools like a couch-config script that could allow plugin-type extensions to
CouchDB.

In the end, it's hard to tell how things will shape up. It could be a full
back-port, or it could just be a general improvement to CouchDB's source tree
and build system so that BigCouch is strictly "CouchDB + Other Erlang Apps", if
that makes sense. And with my CouchDB committer hat on, it really depends on
what the community wants. It's easy to fall into the trap of thinking "it's
obvious", but we also have to consider that others are taking CouchDB and
porting it to mobile phones. What ends up in "core" CouchDB has to accommodate
a lot of use cases.

~~~
brendoncrawford
Thanks for this response, and thanks for the great work on both Couch and
BigCouch.

------
rb2k_
An "open" alternative could be the CouchDB integration that elasticsearch
provides:

<http://www.elasticsearch.org/guide/reference/river/couchdb.html>

------
owenmarshall
Will this make its way upstream into the open-source BigCouch?

~~~
ahoff
For now, this feature will stay a part of the closed-source hosted and
licensed products. But remember, you can always try it out for free with our
Oxygen plan at cloudant.com.

------
paulasmuth
Wouldn't it be possible to do the same thing with Apache Lucene/Solr?

~~~
timanglade
Not entirely. The big difference here is that you only have one, integrated
architecture to maintain. So that means less operational complexity, smoother
“scalability” of your infrastructure, tighter integration between DB & Search.
Also, we added stuff like queries across several indices, typed queries, etc.

~~~
paulasmuth
Hm, as far as I understand, multi-index (multi-core) search was already
implemented in Solr 1.3?

~~~
hardtke
Sharded search is not new. Solr, elastic-search, and Riak do it as well. The
difference here is that we've built the Search on top of the BigCouch map-
reduce view model. Views are calculated post commit so there are no data
insertion locks. Multiple copies of each shard exist for fault-tolerance.
Also, multiple map-reduce analytics passes can be used as input to the search.
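The scatter-gather pattern behind sharded, replicated search can be sketched
as a toy, single-process model. Everything here is an assumption made for
illustration: hash-based placement of documents on shards, whitespace
tokenization, term frequency as the score, and append-only documents (no
updates or deletes). `Shard`, `Cluster`, and the `down` parameter are
hypothetical names; this is not BigCouch's or Cloudant's implementation, which
builds on map-reduce views.

```python
import hashlib
from collections import Counter


class Shard:
    """One copy of a shard: a log of committed writes plus an index built later."""

    def __init__(self):
        self.log = []           # committed (doc_id, text) writes, in commit order
        self.indexed_upto = 0   # how much of the log the index has consumed
        self.postings = {}      # term -> Counter mapping doc_id -> term frequency

    def write(self, doc_id, text):
        # Writes only append to the log; they never wait on the index.
        self.log.append((doc_id, text))

    def refresh(self):
        # Post-commit indexing: consume only log entries not yet indexed.
        for doc_id, text in self.log[self.indexed_upto:]:
            for term in text.lower().split():
                self.postings.setdefault(term, Counter())[doc_id] += 1
        self.indexed_upto = len(self.log)

    def search(self, term):
        self.refresh()
        return self.postings.get(term.lower(), Counter())


class Cluster:
    """Documents hashed onto shards; every shard has several identical copies."""

    def __init__(self, num_shards=4, copies=3):
        self.shards = [[Shard() for _ in range(copies)] for _ in range(num_shards)]

    def _copies_for(self, doc_id):
        digest = hashlib.md5(doc_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:4], "big") % len(self.shards)]

    def write(self, doc_id, text):
        for copy in self._copies_for(doc_id):
            copy.write(doc_id, text)

    def search(self, term, k=10, down=frozenset()):
        # Scatter to one live copy of every shard, then gather the best k hits.
        # `down` holds (shard_index, copy_index) pairs simulating failed copies.
        hits = []
        for s, copies in enumerate(self.shards):
            live = [c for i, c in enumerate(copies) if (s, i) not in down]
            if not live:
                raise RuntimeError(f"all copies of shard {s} are unavailable")
            hits.extend(live[0].search(term).items())
        # Rank by term frequency (highest first), breaking ties by doc_id.
        return sorted(hits, key=lambda hit: (-hit[1], hit[0]))[:k]
```

Because `write` never touches `postings`, inserts are not blocked by indexing.
The trade-off is that a copy's index can lag the latest commits until
`refresh` catches it up, which in this sketch happens just before a search.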

------
mlmilleratmit
Come and kick the tires!

