On the product side, I'm sitting here being amazed at some of the problems they've solved very elegantly. Elasticsearch has a bright future.
I haven't looked at Elasticsearch in a long time, so I really do want to know. Not trying to pick a fight. ;)
* Much more approachable config.
* Its clustering is easier to set up.
* Even though Logstash is a bit heavy for my taste, the whole ELK stack is really nice for aggregating server logs.
As things are, one can always direct an app server's syslog to a logging fleet running logstash (or logstash with embedded Elasticsearch): http://cookbook.logstash.net/recipes/rsyslog-agent/
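For the curious, the rsyslog side of that recipe is just a forwarding rule; this is a minimal sketch, and the hostname and port are placeholder assumptions (use whatever your logstash syslog input listens on):

```
# /etc/rsyslog.conf on the app server (host/port are illustrative)
# "@@" forwards over TCP; a single "@" would use UDP.
*.* @@logstash.example.com:5544
```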
It looks pretty promising, but I have yet to meet anyone who uses it.
That's what logstash-forwarder (formerly lumberjack) is for. It's in Go, not Java.
If you want to know more details, vote for my talk in November, where I'll be digging into a much more technical comparison: http://lucenerevolution.uservoice.com/forums/254257-open-sou...
Elasticsearch has easier config, especially for clustering. It is designed to be schemaless so you can push almost any JSON data into it.
Performance is adequate in both.
It's not a big advantage either way, unless you need clustering.
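To illustrate the "push almost any JSON data into it" point: Elasticsearch's document API is plain HTTP, so indexing a document is just a PUT of a JSON body. A minimal sketch, where the host, index name ("logs"), and type name ("event") are my own assumptions; you'd send the result with curl or any HTTP client:

```python
import json

def index_request(doc, base="http://localhost:9200",
                  index="logs", doc_type="event", doc_id=1):
    """Build the URL and body for indexing `doc`.

    Elasticsearch infers a mapping from whatever fields it sees,
    nested objects included, which is what "schemaless" buys you.
    """
    url = "%s/%s/%s/%s" % (base, index, doc_type, doc_id)
    return url, json.dumps(doc)

# Any shape of JSON works without declaring a schema first:
url, body = index_request({"msg": "hello", "tags": ["a", "b"],
                           "nested": {"ok": True}})
```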
In short, it's debatable whether an investment into npm, Inc. will directly pay off for the investors. What the investment is more likely about is creating the infrastructure for new, billion dollar companies to pop up, giving those investors an inside track to the new companies.
In the case of ElasticSearch, investing in this infrastructure project absolutely makes sense. "Big Data" is becoming huge, but it's still relatively dumb. Up until now, we've primarily been focused on tools and technologies to source, aggregate, and analyze the data. But lots of companies are now popping up who are built on the idea of making intelligent use of all this data, far beyond what humans are naturally capable of. ElasticSearch of course isn't the whole solution, but it's part of it.
(I work for one of those companies, and we use ElasticSearch)
The world is finally figuring out that it's impossible to employ top talent, but that it can be quite lucrative to sponsor it.
Now Marvel is where they've started monetizing, and I'm sure they make a good bit of money from "professional services" teaching companies how to deploy ES at scale. Hopefully they pull it off, as the world needs a good competitor to Splunk. ES has the backend tech, but Kibana has a loooooong way to go before it rivals Splunk's search interface. Here's to hoping!
Their plans are to offer support, and make their bones that way. Elasticsearch is a complex enough product at scale that this would probably be quite lucrative.
After spending a few hours with the documentation, I felt like I had a generally good feel for how you would interact with ES through curl, but then jumping into using the Ruby library there seemed to be a big leap and I felt like I needed to have a much more intimate knowledge of how ES worked to "get" it. A lot of guess and check before I figured out how to query properly. I've also been unsuccessful in figuring out how to implement accent folding. At this point I assume that this has something to do with "mapping", but I couldn't figure out where that was supposed to be set up (again, through the Ruby lib, but I was also unsure where to start to just accomplish that via curl…).
Who knows, maybe I just need to spend more time reading the docs, and maybe the information that I need is in there somewhere, but to me it felt incomplete or disorganized. I'd love to stop using Solr for document search, but making ES do things that I know how to do in Solr ended up being too time consuming.
I ended up feeling like I probably had to go to a training session to really get it, which is a shame…
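For what it's worth, accent folding is indeed a mapping concern. This is a sketch, not the author's solution: in ES 1.x you'd typically define a custom analyzer using the built-in `asciifolding` token filter in the index settings, then point the field's mapping at it. Names like "folded" and "title" are illustrative assumptions; the same JSON body works whether you send it from the Ruby client or curl when creating the index.

```python
# Index-creation body: a custom analyzer that strips accents,
# applied to one field via the mapping.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # asciifolding turns e.g. "é" into "e" at index time
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "title": {"type": "string", "analyzer": "folded"}
            }
        }
    },
}
# e.g.  curl -XPUT localhost:9200/myindex -d '<the JSON above>'
```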
Here are the things I noticed when using it:
- Since it's schemaless by default, it will guess the data type of a field based on the first value it sees, and as far as I'm aware you're not able to change a data type later on. So I found it best to create a schema (aka mapping) that's explicit about the fields' data types up front. I think explicit is better anyway (Zen of Python ;) )
- It's blazingly fast. Like crazy quick.
- Use the geo data type if you're going to be doing radius queries. I've got 50M documents in the index and it queries insanely fast. It's been just as fast as PostGIS (which I also love)
- Use this as the GUI: http://mobz.github.io/elasticsearch-head/
- Do some proper research on filtering before you start, start here: http://www.elasticsearch.org/guide/en/elasticsearch/referenc...
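Tying the first and third points together, here's a sketch (my own, not the commenter's setup) of an explicit mapping with a `geo_point` field plus an ES 1.x-style filtered radius query. The index and field names ("places", "location", etc.) are assumptions:

```python
# Explicit mapping: ES never has to guess these field types.
mapping = {
    "places": {
        "properties": {
            "name": {"type": "string"},
            "opened": {"type": "date"},          # declared, not inferred
            "location": {"type": "geo_point"},   # enables radius queries
        }
    }
}

# Radius query: everything within 5 km of a point, expressed as a
# geo_distance filter inside a filtered query (the ES 1.x idiom).
query = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "geo_distance": {
                    "distance": "5km",
                    "location": {"lat": 52.52, "lon": 13.40},
                }
            },
        }
    }
}
```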
A correctly configured snowball filter will make sure that things like "rückwärts" match "ruckwarts".
My reason for not using it originally was that we had prototyped our solution using Compass (which was the project before Elasticsearch) only to have it abandoned to work on Elasticsearch. So I was concerned about the "one dev" model and losing something that was a key component of what we needed.
Since then I've not been impressed with SolrCloud (would love to hear good experiences, but it seems their distributed model isn't right) and have been giving Elasticsearch serious thought again.
I didn't say nor mean to imply that. Technically, Solr is based on servlet technology whether stand-alone or not; you can choose to deploy it in a web container of your choice or use the Jetty instance it comes with for the "stand-alone" experience. I don't know much about Elasticsearch's architecture personally.
Just use a different transport implementation:
The documentation is also fantastic and the plugin availability pretty good.
The best proof is that Elasticsearch is slowly hitting the same problems Solr hit at some point in the past and has to deal with them (scripting is now disabled by default, the analysis/query language is slowly getting more complex, etc.).
Not trying to put down Elasticsearch; they have done great things and have features Solr hasn't matched (yet). Percolation is one (though check Luwak for a comparison: http://www.berlinbuzzwords.de/session/turning-search-upside-... )
It's typically the rest of the pipeline now that causes most of the latency, whereas search used to account for the bulk of a request's duration.
It has also spurred other entities to improve their search performance :)
Apparently what Amazon provides is called Elastic MapReduce, not Elastic Search.
Admittedly they were pretty poor early on, but they've matured with the product.