The one drawback ES had in the bad old days was that backup and restore was a nightmare... ESPECIALLY on AWS. The new system they introduced was so simple I was concerned about updating to it because I was SURE something would go south.
But it all just worked.
I still have the Couch to ES replication running because I'm anal like that... but really... yeah... you can do without Couchbase, Mongo et al... ES will probably do everything you need PLUS everything you can't do in the others.
also, i have the exact opposite nitpick. people want to use it to do everything, mail indexers, file system indexers. what's the matter with web developer folks? why is it that when the next database comes around they want to use it for everything?
Because they like a simple web stack. KISS means a faster time to market. Faster time to iterate. Faster time to fix bugs because there are fewer places those bugs can be. All of that doesn't even factor in the productivity benefits gained by not having to switch technologies from project to project.
But to be fair, ES is not some brand new database... ES has been around for a LONG time.
that's a pretty long time.
- Pause indexing
- Issue a flush request
- Rsync data directories somewhere
- Resume indexing
This is technically a very naive approach, since a simple rsync of the data dirs will include replicas too. If you were more diligent you could check the state files in each shard directory and only copy out the primaries.
You can just google "elasticsearch rsync" to get information, and even scripts, that will do this for you. The thing is... you REALLY need to know what you're doing when you go this route.
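For what it's worth, the four steps above can be sketched roughly like this. It only builds the plan and does not run anything. The host, paths and the `backup_plan` helper are made up for illustration, and "pause/resume indexing" is something your own application has to do, not an ES call.

```python
import shlex


def backup_plan(es_url, data_dir, dest):
    """Return the naive rsync backup as ordered (kind, detail) steps.

    Nothing is executed here; this is only the order of operations.
    """
    flush = "POST " + es_url.rstrip("/") + "/_flush"
    # Note: copies everything under data_dir, replicas included.
    rsync = "rsync -a " + shlex.quote(data_dir) + " " + shlex.quote(dest)
    return [
        ("app", "pause indexing"),   # done by your own indexer (assumption)
        ("http", flush),             # flush request against the cluster
        ("shell", rsync),            # copy the data directories somewhere
        ("app", "resume indexing"),
    ]


if __name__ == "__main__":
    # Hypothetical URL and paths, adjust to your setup.
    for kind, detail in backup_plan(
        "http://localhost:9200", "/var/lib/elasticsearch/", "/mnt/backup/es/"
    ):
        print(kind + ": " + detail)
```

Keeping it as a plan rather than a script is deliberate: you really want to read each step before letting anything touch a live data directory.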
Also, you can try the gateway feature. Gateway is actually pretty straightforward. Restore WILL be slow though. And for many scenarios ... it is not ideal. (You don't want to take a day, or even a few, to restore after a failure.)
I think the best advice is...
Update to 1.0.
Just go to 1.0 and do snapshots... you will save yourself A LOT of headaches.
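To give an idea of how little there is to it, here is a rough sketch of the three REST calls involved in a 1.0 snapshot: register a filesystem repository, take a snapshot, restore it. The repository name, snapshot name and location are placeholders I made up, and the helper only builds the requests, it does not send them.

```python
import json


def snapshot_requests(repo, snapshot, location):
    """Build (method, path, body) tuples for a snapshot/restore cycle."""
    # 1. Register a shared-filesystem repository.
    register_body = json.dumps({"type": "fs", "settings": {"location": location}})
    register = ("PUT", "/_snapshot/" + repo, register_body)
    # 2. Take a snapshot and block until it finishes.
    take = (
        "PUT",
        "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot),
        None,
    )
    # 3. Restore that snapshot.
    restore = ("POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot), None)
    return [register, take, restore]


if __name__ == "__main__":
    # Hypothetical names and path.
    for method, path, body in snapshot_requests(
        "my_backup", "snapshot_1", "/mnt/backups/es"
    ):
        print(method, path, body or "")
```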
Also, Lucene at its core is an Index. Changing the query strategy might require reindexing. It is perfectly valid to throw data at it, build the index and throw away the source. You will just never get it back again.
While ES can be used and tuned as a store just fine, it is not necessarily its raison d'etre.
I help run a large ES cluster (with canonical data in MySQL), and I consider this cautious attitude by the ES developers to be a good thing.
For example, Elasticsearch has poor availability characteristics - both because it is master-slave and because it focuses on ensuring consistency - relative to, for example, something like Riak.
There's no holy grail of data storage... ElasticSearch is really nice, and if it fits your needs, more power to you.
Elasticsearch is brilliant as a NoSQL store - and if you are already using Elasticsearch as a search system, you don't need to introduce yet another component into your stack.
Other than that (which is just performance tuning, really), ES matches mongodb feature for feature, and obviously has a lot of extra power from its search heritage such as facets and percolate.
So I can't actually think of any limitations, and it's why I said ES makes a better MongoDB than MongoDB.
You need to have a good understanding of how tokenizers and analyzers work to be able to create good results for your data. I have difficulties matching documents with the exact title being searched for. On MongoDB that just works, on ElasticSearch you need to configure it.
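As an illustration of the kind of configuration meant here, one common approach (a sketch in ES 1.x mapping syntax, not the only way) is to keep the analyzed `title` for full-text search and add a not-analyzed sub-field for exact matches. The sub-field name `raw` and both helper functions are my own choices.

```python
def title_mapping():
    """Mapping with an analyzed title plus an exact-match sub-field."""
    return {
        "properties": {
            "title": {
                "type": "string",  # analyzed: good for full-text search
                "fields": {
                    # Stored as a single untouched token, so only the
                    # exact original string matches.
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


def exact_title_query(title):
    """Term query against the not-analyzed sub-field."""
    return {"query": {"term": {"title.raw": title}}}


if __name__ == "__main__":
    print(title_mapping())
    print(exact_title_query("The Quick Brown Fox"))
```

The term query is not analyzed either, so case and punctuation have to match exactly, which is the "just works" behaviour you get from MongoDB by default.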
ElasticSearch has some advantages and MongoDB others. I think they are great together. One for storage and the other for searching.
Internally it is still reindexing the entire document, but from your application's perspective, the Update API is a lot friendlier.
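A conceptual model of what that means, as I understand it: the partial document is merged into the stored source, and the merged result is indexed again as a whole. The function below is only a local simulation of that merge, not ES code, and the sample document is invented.

```python
def apply_partial_update(stored_source, partial_doc):
    """Merge a partial document into the stored source.

    Nested objects are merged recursively; everything else is replaced.
    The returned full document is what would then be reindexed.
    """
    merged = dict(stored_source)
    for key, value in partial_doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    stored = {"title": "Hello", "views": 1, "author": {"name": "kim", "karma": 10}}
    # The application only sends the fields it wants to change.
    partial = {"views": 2, "author": {"karma": 11}}
    print(apply_partial_update(stored, partial))
```

So the application sends two fields, but the whole document still gets rewritten in the index.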
This is really important. Creating a proper search experience with auto-complete that works "just like you want" can be very painful with ES, especially if you are new to it. It bit me some time ago when I was trying to achieve just that.
Side note: Happy Found customer here...you guys have made it much easier to run our ES index!
The point of that section is exactly that "NoSQL" (or, to make things even more confusing, "NOSQL" (Not Only SQL)) doesn't have a very specific meaning. Some think it rules out ACID, others don't. Thus, you'll need to know what you need.
And database marketing tends to not be very good at pointing out what a product is not good at, or at actually delivering what it promises. See also: http://aphyr.com/tags/jepsen
NoSQL was in large part about precisely what the name implies - giving up relational (SQL) data in exchange for better performance and the ability to have a distributed store. Yes, part of this is also about being willing to trade off consistency for availability. But Elasticsearch is an example of a NoSQL store which does focus on consistency (in this case at the expense of availability and, to some extent, partition tolerance).
> Format in [lon, lat], note, the order of lon/lat here in order to conform with GeoJSON.
.. the data example below is not actually geojson. See the spec:
Elasticsearch can map any kind of JSON, so you can, without problems, write a mapping for proper GeoJSON points (map "type" as an unanalyzed string, map "coordinates" as a geo_point). Arrays of values are generally supported in ES.
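A sketch of such a mapping, in ES 1.x syntax. GeoJSON calls the member "coordinates", so that is the name used here; the field name `location`, the helper function and the sample point are just examples.

```python
def geojson_point_mapping(field):
    """Mapping for an object field holding a GeoJSON Point."""
    return {
        "properties": {
            field: {
                "properties": {
                    # "Point" should be kept as-is, not tokenized.
                    "type": {"type": "string", "index": "not_analyzed"},
                    # geo_point accepts an array in [lon, lat] order.
                    "coordinates": {"type": "geo_point"},
                }
            }
        }
    }


if __name__ == "__main__":
    mapping = geojson_point_mapping("location")
    # A document that is valid GeoJSON and fits the mapping above.
    doc = {"location": {"type": "Point", "coordinates": [13.4, 52.5]}}
    print(mapping)
    print(doc)
```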
The biggest problem is that Elasticsearch probably does not provide all kinds of queries you'd like if you are working with complex shapes. Basically, only distance and simple location queries with polygons are supported.
You create a fixed number of shards for each index (database), and you can't increase it later.
I have my doubts mongodb would scale up that well to 20+ servers without some maintenance as well. So I'm not sure how that's really a limitation anyone should use for choosing mongodb or ES. If you're expecting that kind of data, just make a large number of shards in your index creation as it will work fine on fewer servers too?
larger number of shards=slower searching (unless you distribute the shards to multiple nodes)
The benefit of this is that as your app scales, you'll search only the shards needed. So if just 1 shard holds the data you want, you can tell ElasticSearch to search only that 1 shard.
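The idea behind both points (fixed shard count, searching a single shard) is that a document's shard is derived from a hash of its routing value modulo the number of primary shards. A toy version, with `zlib.crc32` standing in for whatever hash ES really uses (that substitution is mine, as are the key names):

```python
import zlib


def shard_for(routing_key, num_primary_shards):
    """Pick a shard as hash(routing) modulo the number of primary shards.

    crc32 is a stand-in; the real hash function in ES is different.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    keys = ["user-%d" % i for i in range(5)]
    for key in keys:
        # The same key always lands on the same shard, so a search routed
        # by that key only has to hit one shard.
        print(key, shard_for(key, 5))
    # Changing the shard count changes the modulo, so existing documents
    # would no longer be found where they were written.
    moved = [k for k in keys if shard_for(k, 5) != shard_for(k, 6)]
    print("keys that would move with 6 shards:", moved)
```

That modulo is why you cannot simply add shards to an existing index, and also why over-allocating shards up front is the usual advice.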
Its search capabilities and scalability are fantastic - we're throwing GBs of data into it weekly and it just soaks it up.
That said, it's definitely worth looking into both, depending on what your needs are.
(IMHO) Unfortunately, for most people, old habits die hard. Indeed a nice project and a great release.