Hacker News


"Some people actually advocate using Elasticsearch as a primary data store; I think this is somewhat less than advisable at present."

"The good news is that Elasticsearch is a search engine, and you can often afford the loss of search results for a while."

My personal favorite solution would reliably channel data from a Riak cluster to an ES cluster. Does anyone know if something like that exists?

Funny, the link for "using Elasticsearch" points to an interview I gave ;) He raises some good points about ES's current problems with master election. I raised it with the ES team during a meetup with them, and we have a workaround for the issue (we discovered the bug during our testing). It's important to know the weak spots of the system you're using and how to work around them. We feel like we have a good workaround, and I think the point of his series has been to expose the flaws in common tools so that you are ready to work around them. But he found a single flaw and attacked it hard, so I am not sure that throwing away the whole thing over a single flaw is a great recommendation.

Would you mind sharing your workaround?

Before you do that, check out http://aphyr.com/posts/285-call-me-maybe-riak

TL;DR: Aphyr talks at length about how Riak is not a CP system, even though nobody claims that it is. Riak is AP. Aphyr demonstrates how this plays out in practice and gives an intro to CRDTs, which let you get eventually correct results for certain kinds of state transformations.
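To make the CRDT idea concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is an illustration in Python, not how Riak implements its data types, and the class and method names are hypothetical. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas that accepted writes independently converge to the same value without losing any increment.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT (illustrative): each replica bumps only its own slot."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept writes independently, e.g. during a partition.
a = GCounter("a")
b = GCounter("b")
a.increment(3)
b.increment(2)
b.increment()

# After the partition heals, merging in either order gives the same total.
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 6 6
```

Because the merge is commutative, associative and idempotent, it does not matter in which order or how many times replicas exchange state. The trade-off is that a G-Counter can only grow; decrements need a richer type such as a PN-Counter.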

Every conclusion he comes to about using last-write-wins is also described on Basho's site. At the end of the post, notice that CRDTs preserve 100% of writes, and the same can be had by allowing siblings. Punting on high availability is different from the company being upfront about the tradeoffs.
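A toy model of the difference between last-write-wins and keeping siblings, assuming two clients read the same set and write back concurrently. The `Write` type and the three functions are hypothetical, not Riak's API, and concurrency is simply assumed here, whereas Riak detects it from causal metadata such as vector clocks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    value: frozenset  # the client's full set after its update
    timestamp: float  # wall-clock time attached by the client


def last_write_wins(writes: List[Write]) -> List[Write]:
    # Keep only the write with the highest timestamp; the others are discarded.
    return [max(writes, key=lambda w: w.timestamp)]


def keep_siblings(writes: List[Write]) -> List[Write]:
    # Keep every concurrent write as a sibling for the application to resolve.
    return list(writes)


def resolve_by_union(siblings: List[Write]) -> frozenset:
    # Application-level merge: set union keeps every element any client added.
    result = frozenset()
    for w in siblings:
        result |= w.value
    return result


# Two clients read the same set {1} and concurrently add different elements.
concurrent = [Write(frozenset({1, 2}), 10.0), Write(frozenset({1, 3}), 10.5)]

print(sorted(resolve_by_union(last_write_wins(concurrent))))  # [1, 3]
print(sorted(resolve_by_union(keep_siblings(concurrent))))    # [1, 2, 3]
```

With last-write-wins, the element added by the earlier client is silently dropped; with siblings, the application sees both versions and can merge them. Union works here only because the clients add elements; supporting removals is where a proper set CRDT (for example an OR-Set) comes in.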

What do you mean? Are you saying that because Riak with sibling preservation is actually one of the few databases (if not the only one) that didn't drop writes?

So, what do you do with siblings in a searchable text index (the case that ElasticSearch is designed for)?

Riak Search will do this for you. The Riak key/value store also has post-commit hooks that you can use: http://docs.basho.com/riak/1.2.1/references/appendices/conce...

Basho is implementing a much better search option based on distributed Solr (instead of relying on something in-house), due out this year with the 2.0 release. It's available for testing today.

(Full disclaimer: I work for Basho.)

"Based on distributed Solr" might be an oversimplification. It uses Solr as its indexing engine, but really, that engine could be any single-node indexer... including single ES nodes. Basically, Yokozuna adds real grown-up distributed systems computer science to the OSS distributed search space. http://docs.basho.com/riak/2.0.0beta1/dev/advanced/search/

To me, the Yokozuna "mediator" seems highly specific to Solr. It's not like you could swap out Solr for ES with just a little glue code, I think?

I'm building an internal geospatial product on the beta, anticipating the release. Great results so far.

I have briefly checked out the Riak Yokozuna project, and it really looks great. Nevertheless, I'd prefer to keep the data storage cluster and the search cluster separate (and I really like ElasticSearch too).

Edit: I should add that the most interesting search problems are the ones where you need to join separate data sources, and in such cases the question is not really what kind of search solution you are using, but rather what kind of async queue and data-update mechanism you have. So the separate cluster is really about having a distributed queue between the 'data-master' and the 'search-master'.
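A minimal sketch of the "distributed queue between the data-master and the search-master" idea, assuming an in-process `queue.Queue` stands in for a durable distributed queue and a tiny inverted index stands in for the search cluster. `DataMaster`, `SearchMaster` and `run_indexer` are hypothetical names, not part of Riak or Elasticsearch.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# A change event: (key, document text), with text None when the key was deleted.
Change = Tuple[str, Optional[str]]


class DataMaster:
    """Hypothetical primary store that publishes every committed change."""

    def __init__(self, changes: "queue.Queue[Optional[Change]]") -> None:
        self.docs: Dict[str, str] = {}
        self.changes = changes

    def put(self, key: str, text: str) -> None:
        self.docs[key] = text
        self.changes.put((key, text))  # publish only after the local write

    def delete(self, key: str) -> None:
        self.docs.pop(key, None)
        self.changes.put((key, None))


class SearchMaster:
    """Hypothetical search side: a tiny inverted index fed from the queue."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        self.indexed: Dict[str, Set[str]] = {}  # key -> terms currently indexed

    def apply(self, change: Change) -> None:
        key, text = change
        for term in self.indexed.pop(key, set()):
            self.postings[term].discard(key)  # drop the stale version first
        if text is not None:
            terms = set(text.lower().split())
            self.indexed[key] = terms
            for term in terms:
                self.postings[term].add(key)

    def search(self, term: str) -> Set[str]:
        return set(self.postings.get(term.lower(), set()))


def run_indexer(changes: "queue.Queue[Optional[Change]]", search: SearchMaster) -> None:
    # Drain changes until the None sentinel; the index lags writes but catches up.
    while True:
        change = changes.get()
        if change is None:
            break
        search.apply(change)


changes: "queue.Queue[Optional[Change]]" = queue.Queue()
data, search = DataMaster(changes), SearchMaster()
worker = threading.Thread(target=run_indexer, args=(changes, search))
worker.start()

data.put("doc1", "riak stores the data")
data.put("doc2", "elasticsearch indexes the data")
data.put("doc1", "riak replicates values")
data.delete("doc2")
changes.put(None)  # stop signal for this demo
worker.join()

print(search.search("data"), search.search("riak"))  # set() {'doc1'}
```

Two design points carry over to a real deployment. First, publishing after the local write is not atomic: a crash between the two steps loses the event, which is why a durable queue (or a hook that writes to one) matters. Second, once more than one indexer worker is involved, changes to the same key must still be applied in order, or guarded by version checks.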

How about Riak CS for storage, separate from the search cluster?

I am not sure what difference it makes to have Riak or Riak CS for data storage, in the context of having a separate search cluster. Would you elaborate?

Yokozuna and CS (or plain Riak) use different backend configurations[0,1]. So beyond having physically distinct clusters, you'd get purpose-optimized storage, but with a common management regime.

[0] http://docs.basho.com/riak/latest/ops/building/planning/back...

[1] http://docs.basho.com/riakcs/latest/cookbooks/configuration/...

DataStax Enterprise integrates Solr with Cassandra. The Solr docs are stored in Cassandra, and Solr and Cassandra run in the same JVM. As a result, all docs inserted into C* are indexed using Solr, and Solr gets the high availability and linear scalability of Cassandra. I successfully deployed it as a customer, and I know of several clusters with many hundreds of nodes.
