When it comes to consensus, real consensus servers are the only way, and ZooKeeper is the only production-grade consensus server available outside of Google (sorry etcd, you aren't quite there yet).
Seems to be compatible with ES 1.4 :D
In December 2010 we found out about Elasticsearch and were truly amazed by its simplicity, speed, and elegance. We built our service and consultancy business around it.
In 2011 we built some of the largest ES applications of that time (6 TB, 120-node cluster, http://2012.berlinbuzzwords.de/sessions/you-know-search-quer...) and started to develop a set of plugins, such as the in-out plugin to allow distributed dump/restore.
With this background - and the mission to build a datastore that is as easy to use and administer as Elasticsearch - we founded Crate in 2013 and raised some seed money. Since then we've been working hard to make this vision come true.
We're often confronted with the results of the so-called Jepsen test (https://aphyr.com/posts/317-call-me-maybe-elasticsearch) that Aphyr published in 2014. Don't forget: Lucene, Netty, Elasticsearch, Crate - all are Open Source products (APL) and rely on all kinds of contributions - such as this analysis! No matter whether it brings bad news or good news, we can only improve based on hard testing and feedback.
However, this caused a lot of rumblings in the Elasticsearch ecosystem and the reaction of Elasticsearch was exemplary:
1) explain the reasoning and make an official statement: https://www.elastic.co/blog/resiliency-elasticsearch/
2) list all the issues and hunt them down, one by one: http://www.elastic.co/guide/en/elasticsearch/resiliency/curr... (and add new ones as they occur).
3) stay in contact with the community that reported the issues: https://twitter.com/aphyr/status/525712547911974913
All that being said, we see many people using Crate as a primary store (and of course backing up their data), but we also see people who don't put that much trust in a younger database and keep all their primary data in another location, syncing/indexing it to Crate.
ALWAYS make backups (COPY TO / COPY FROM), make sure you have replicas, and most importantly, configure minimum_master_nodes correctly to avoid split brain.
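The split-brain rule of thumb is a quorum: more than half of the master-eligible nodes. A tiny sketch of the arithmetic (the helper name is mine; the actual setting lives in crate.yml as discovery.zen.minimum_master_nodes):

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum rule: more than half of the master-eligible nodes.

    Setting this value prevents the two halves of a partitioned
    cluster from each electing their own master (split brain).
    """
    return master_eligible_nodes // 2 + 1

# A 3-node cluster needs 2, a 5-node cluster needs 3:
print(minimum_master_nodes(3))  # → 2
print(minimum_master_nodes(5))  # → 3
```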
At Crate we stand on the shoulders of these great Open Source projects, try to be as good citizens as possible, and focus mainly on our query engine (analyzer, planner, execution engine).
With the amount of attention and support this has received from both Crate and ES, coupled with the amount of progress that's been made against these issues, I don't think it's fair to advise against using either as a source of truth.
That said, they are checksumming a hell of a lot more than before, so there's a chance.
If it's a database engine, then here are my thoughts: how is this database engine built to replace SQL and NoSQL? If it doesn't support JOINs, why would I replace my SQL database with it? Are transactions ACID? Why would anyone use this if there are no built-in user/group security mechanisms to protect data?
I also agree that our site is not as clear as it needs to be and we're working on it already.
If you'd like we're happy to answer any questions in IRC or our Google Group:
IRC Freenode #crate: irc://irc.freenode.net/crate @mention anyone with Voice
This is one thing that bothers me with Elasticsearch: that I cannot define e.g. "type": "cart", "index": "realtime", "not_analyzed", so that if an item gets added to a cart, a subsequent count directly returns the correct number of items in the cart.
The beauty of the Elasticsearch query syntax is that you can dynamically create complex JSON dsl objects as you drill down just using push and other methods.
With the Crate SQL syntax it looks like dynamic query generation would be messier, relying on string functions?
So my question is: Is it possible to query Crate with Elasticsearch syntax?
Is there specific "Crate" query syntax for selects that is not supported by the Elasticsearch DSL?
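To illustrate the difference the question is about: the ES DSL composes as nested dicts you push onto, while SQL clients usually compose clause lists plus bound parameters rather than raw string splicing. A rough sketch (table and field names are made up):

```python
import json

# Elasticsearch style: build the query DSL as a nested dict,
# pushing filters onto a list as you drill down.
filters = [
    {"term": {"color": "red"}},
    {"range": {"price": {"lte": 100}}},
]
es_query = {"query": {"bool": {"must": filters}}}

# SQL style: collect WHERE clauses and bind values separately,
# so values are never spliced into the statement string itself.
clauses = [("color = ?", "red"), ("price <= ?", 100)]
where = " AND ".join(cond for cond, _ in clauses)
params = [value for _, value in clauses]
sql = "SELECT * FROM products WHERE " + where

print(json.dumps(es_query))
print(sql, params)
```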
As for crates.io, I guess they're entitled to choose their own name & domain, but I admit it is confusing.
I also confused the name with this: https://crates.io/
So you can run in a sort-of-shared-nothing configuration, but it's not recommended.
When you map a volume into a container as suggested, the data can persist through a container restart/replacement. When the container is instantiated, the volume is read and the node checksums the shards it finds to make sure they're not stale. If any are stale, they're brought up to date. By tuning the recovery settings you can avoid extraneous shard movement and therefore leverage containers as you would expect.
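The recovery tuning mentioned here usually means the gateway settings; a hedged crate.yml sketch for a four-node cluster (values are illustrative, check the docs for your version):

```yaml
# crate.yml - delay recovery until enough nodes are back, so shards
# found on the mapped volume are re-used instead of re-replicated
gateway:
  recover_after_nodes: 3   # start recovery once 3 nodes have joined
  expected_nodes: 4        # start immediately if all 4 are present
  recover_after_time: 5m   # otherwise wait up to 5 minutes
```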
Are you also planning to move to a single shard per datapath like ES? If that is the case, what are your thoughts on increasing the shard count post single shard per datapath?
There are two options:
1. (Recommended for now) Export the table with COPY TO (https://crate.io/docs/stable/sql/reference/copy_to.html). Drop the table and then import it again using COPY FROM (https://crate.io/docs/stable/sql/reference/copy_from.html).
2. Use insert by query (see https://crate.io/docs/stable/sql/dml.html#inserting-data-by-... ) if it is ok for you to copy the whole data to another table (with more shards).
1) is recommended, since it allows for throttling at import time (see https://crate.io/docs/en/latest/best_practice/data_import.ht...) and also does not require renaming a table, which is currently not implemented but is on our backlog. However, I think once ES 2.0 is out we will have table renames and also throttling in insert by query, so option 2) will be the recommended one then.
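Option 1) might look roughly like this in SQL (table, columns and paths are made up; see the linked COPY docs for the exact options):

```sql
-- 1. export the existing table to disk
COPY mytable TO DIRECTORY '/tmp/mytable_export/';

-- 2. drop it and re-create it with more shards
DROP TABLE mytable;
CREATE TABLE mytable (
    id integer PRIMARY KEY,
    name string
) CLUSTERED INTO 12 SHARDS;

-- 3. import the exported data again
COPY mytable FROM '/tmp/mytable_export/*';
```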
Our general recommendation regarding the fixed-number-of-shards limitation is to choose a higher number of shards upfront (matching the number of expected cores covers most use cases) or to use partitioned tables (https://crate.io/docs/en/latest/sql/partitioned_tables.html) where possible, since those allow changing the number of shards for future partitions.
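A hedged sketch of that recommendation (column names made up): each partition gets its own set of shards, and changing number_of_shards only affects partitions created afterwards:

```sql
-- one partition per day, 6 shards per partition
CREATE TABLE metrics (
    day timestamp,
    value double
) PARTITIONED BY (day) CLUSTERED INTO 6 SHARDS;

-- existing partitions keep their 6 shards;
-- partitions created from now on get 12
ALTER TABLE metrics SET (number_of_shards = 12);
```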