Hacker News
Elasticsearch – the definitive guide (elasticsearch.org)
172 points by donretag on March 20, 2014 | 44 comments



Glad to see this. The reference documentation is pretty opaque.

So far, the best resource I've found is "Exploring Elasticsearch". http://exploringelasticsearch.com/


What version of Elasticsearch is this guide targeting? I believe there are some fundamental differences between, e.g., the 0.90 and 1.0 releases.


Hi, that's my book! It's currently a little out of date, still targeting pre-1.0 Elasticsearch. I'm working on bringing the examples up to date soon, however.

The main issue with it at the moment is that it doesn't take into account the new aggregations stuff.


Unsure if you are talking about "The Definitive Guide" from the original link, or "Exploring Elasticsearch" from the OP.

If you are talking about "The Definitive Guide", it is targeted to Elasticsearch 1.0. Most of the APIs and concepts are backwards compatible, but we wanted to target 1.0 because it brings a lot of great new features.


I haven't been able to determine when the content was last updated, but some of the comments on the site are ~8 months old, so it might be a bit out of date.

That said, the basic concepts should be more-or-less unchanged.


Hi, I'm the author. I actually just (soft) relaunched the book using a new backend, so the content's a little stale. Now that I've revamped all the formerly poor quality ebook generation I can focus on content once again.

Aside from not mentioning aggregations it should be fairly accurate, and is still a useful guide for a beginner.

I'm hoping to revamp it to discuss aggregations over the next couple of weeks.

I lost most of my Disqus comments, unfortunately, when I relaunched the site last week, as the URL structure changed. I revamped the book to have more content per page. It's probably a little less SEO-friendly, but it makes for a nicer reading experience.


Andrew, exploringelasticsearch is cool, thanks for the time & effort. Just wanted to let you know that the GitHub audio interview link has been broken (or not publicly available) for a few weeks: https://soundcloud.com/andrewvc-1/github-interview-edited

I would really appreciate it if you could put it back; it's a really good listen.


Ack, SoundCloud took it down when I uploaded another interview and went past their free plan. I just upgraded, so it should now be available.



I'm looking at the guide and I see a lot of explanation but little to nothing in the way of clear, simple instructions that will cover the most typical basic use cases.

Of course Elasticsearch has many features and many different types of interfaces, but most people don't need to use most of those features, and having some example code available for a few common languages/platforms would be very useful.

Elasticsearch has done a great job of streamlining the use of Lucene and of generally making many improvements, but based on the documentation I have seen, including this new book, Elasticsearch must derive most of its income from consulting or support, and providing simple instructions would obviously be a direct conflict of interest.

I believe that the average user is like me: they want to index some documents and then search the full text. They want a straightforward way to connect one or two search boxes on their web application to Elasticsearch and then retrieve some useful results. They do not want to learn the nuances of different engines or search interfaces. They do not want to read a book.
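For the use case described above (index some documents, then search their full text from a search box), the request bodies themselves are small. This is a minimal sketch, not an official client: it assumes documents were indexed with something like `PUT /myindex/doc/1` and have a hypothetical `body` field, and it only builds the JSON you would send to `POST /myindex/_search` and reads hits out of a hand-written sample response, so no cluster is needed to run it.

```python
import json


def build_search_request(user_text, field="body", page=1, page_size=10):
    """Build a full-text search body for text typed into a search box."""
    # A "match" query runs the user's text through the field's analyzer
    # and ranks documents by relevance.
    return {
        "query": {"match": {field: user_text}},
        "from": (page - 1) * page_size,  # offset of the first hit on this page
        "size": page_size,               # number of hits per page
    }


def extract_hits(response):
    """Pull the original documents out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Usage with a hand-written sample response (hypothetical data).
body = build_search_request("quick brown fox", page=2)
print(json.dumps(body))

sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_id": "1", "_score": 0.5,
             "_source": {"title": "Fox", "body": "The quick brown fox"}},
        ],
    }
}
print(extract_hits(sample_response))
```

The web application would only need to send `build_search_request(...)` for whatever the user typed and render the list returned by `extract_hits`.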


Well, you could use Solr. As with ES, the basic examples run out of the box. But frankly, whichever search engine you use, at some point you will have to read a book or dig hard to understand underlying issues such as tokenization. But starting is easy.

And the Solr community gets its income from all sorts of directions, so the user mailing list is quite helpful.


Starting with ES is not easy. The documentation index is confusing. The basic examples that I saw only showed half of the equation: searching. And that example left out a lot of typical use cases without links to any examples for implementing those use cases.

So what you are suggesting, that I should use Solr since the Elasticsearch documentation is difficult to navigate or perhaps incomplete, is almost like saying: "I think it should be difficult to figure out how to use Elasticsearch; since it's so powerful, it's not for kids. Therefore, the documentation is deliberately opaque. If you want something easy, use Solr."


I am a Solr specialist (and author of www.packtpub.com/apache-solr-for-indexing-data/book), but the comments I usually see about Elasticsearch are that it is _easy_ to get started with. Then it gets a lot more complex (and so does Solr). But I haven't done the study myself to have a strong opinion.

Still, if you have trouble, then yes, try Solr. They have a tutorial that walks you through the full example. Perhaps that will get you going faster.

If you are not picking ES because of a particular feature set, pick based on whatever criteria you feel make it more user-friendly. Whether you use Elasticsearch or Solr, you benefit from the Lucene library underneath. I would think twice about any non-Lucene-based search engine at this stage of the search game.


I figured out what I needed to know about ElasticSearch. But that doesn't mean that I shouldn't complain about it. More people need to complain.


My biggest difficulty with the documentation is the lack of examples and of a logical layout. That's why at our company (we do hosted ES), we decided to invest in a new documentation portal à la Stripe: http://www.silota.com/docs/api/

Hope you find that useful. Any feedback is appreciated!


Is Silota based on Lucene (or ES or Solr)? You don't seem to mention it, and if it is not, there is not enough information to make an informed choice. I don't see any information on tokenization, nor does your own site seem to have a search engine to check with (dogfooding?).


Never mind. I seem to have missed the "hosted ES" point. Frankly, I would make that (or at least a mention of Lucene) a selling point rather than a buried one. That way, people who actually know what to look for in a search engine know what's under the covers.


Good feedback. Still iterating on the messaging – at this stage, we are still figuring out how to describe the product.

When you begin incorporating ES into your application, you'd roughly be thinking about:

1. Figuring out the structure of your data and translating it into an ES mapping
2. Learning the query syntax (the ES docs assume you already know Lucene, which is not usually the case)
3. Setting up an ingest workflow and keeping your indexed data in sync
4. Securing your cluster if you want to hit ES directly from the browser/API client
5. Maintaining your cluster
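For step 1, a mapping is just JSON sent when the index is created. A rough sketch, assuming Elasticsearch 1.x core types (`string`, `long`, `double`, `boolean`) and a hypothetical `mapping_from_sample` helper that guesses a type from each field of one sample document. Fields listed in `exact_fields` are marked `not_analyzed` so they keep their exact values for filtering and faceting. The result would be the body of something like `PUT /products`; the index, type, and field names are made up for illustration.

```python
import json

# Assumed mapping from Python value types to Elasticsearch 1.x core types.
ES_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def mapping_from_sample(doc_type, sample, exact_fields=()):
    """Guess a 1.x-style mapping from one sample document (hypothetical helper)."""
    properties = {}
    for field, value in sample.items():
        # type(value) is looked up exactly, so True maps to "boolean", not "long".
        spec = {"type": ES_TYPES[type(value)]}
        if field in exact_fields:
            # not_analyzed indexes the whole value as a single term,
            # which is what filters and terms aggregations usually want.
            spec["index"] = "not_analyzed"
        properties[field] = spec
    return {"mappings": {doc_type: {"properties": properties}}}


sample = {"name": "Red running shoe", "category": "shoes", "price": 49.5, "in_stock": True}
body = mapping_from_sample("product", sample, exact_fields={"category"})
print(json.dumps(body, indent=2))
```

In practice you would review and hand-edit the guessed mapping (dates, for example, are not handled here) rather than trust it blindly.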

Silota attempts to solve 3, 4, 5. Improving documentation helps with 2.

There’s an e-commerce search example here: http://www.silota.com/docs/api/ecommerce-product-search-exam...


I agree. Searching is actually the easy part. Setting up a distributed cluster, allocating shards and replicas, rebalancing, etc. is the hard part.
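Part of the shard planning is plain arithmetic. A small sketch, assuming an index with P primary shards and R replicas per primary (the 1.x defaults are 5 and 1): there are P × (1 + R) shard copies in total, and because Elasticsearch will not place two copies of the same shard on one node, you need at least 1 + R nodes before every replica can be allocated. In 1.x the number of primaries is fixed when the index is created, while the replica count can be changed later. The helper names are hypothetical.

```python
def index_settings(primaries, replicas):
    # Body for creating an index with an explicit shard layout.
    return {"settings": {"number_of_shards": primaries, "number_of_replicas": replicas}}


def total_shard_copies(primaries, replicas):
    # Each primary plus each of its replicas is a separate copy on some node.
    return primaries * (1 + replicas)


def min_nodes_for_full_allocation(replicas):
    # A primary and its replicas must all live on different nodes.
    return 1 + replicas


def max_nodes_with_work(primaries, replicas):
    # With more nodes than shard copies, some nodes hold nothing from this index.
    return total_shard_copies(primaries, replicas)


for primaries, replicas in [(5, 1), (3, 2)]:
    print(index_settings(primaries, replicas),
          total_shard_copies(primaries, replicas),
          min_nodes_for_full_allocation(replicas),
          max_nodes_with_work(primaries, replicas))
```

The arithmetic only bounds the layout; how the copies are actually spread and rebalanced is still up to the cluster's allocation logic.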


And that's supposed to be the hard part. There are no workarounds for the hard parts; you need to understand the complex issues involved. However, you may want to read the Found.no blog, which has a lot of great material: https://www.found.no/foundation/


Clustering Solr is even harder. I would say that, once you know what you're doing, ES makes this much easier.


What I'd really love to see is an Elasticsearch administration guide and a common recipes guide; both would be super helpful for getting started and for more advanced tasks. Having run into issues with Elasticsearch data scaling, I've found that trying to pull answers out of the current guides or the IRC channel is like pulling teeth... from an alligator... with a laser attached to its head.


The easiest way for me to learn Elasticsearch was through the queries generated by Kibana [1] and playing with them inside Sense [2]. Kibana relies on facets, while Elasticsearch 1.0 introduces aggregations; I would suggest using aggregations for your projects.
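If you lift queries out of Kibana, the simple facet sections can usually be rewritten as aggregations. A minimal sketch that handles only the simplest case, a plain `terms` facet with `field` and an optional `size` (anything else raises an error); `terms_facet_to_agg` and `agg_buckets` are hypothetical helper names. The response shape changes too: facet results come back under `facets.<name>.terms` as `term`/`count` pairs, while aggregation results come back under `aggregations.<name>.buckets` as `key`/`doc_count` pairs.

```python
def terms_facet_to_agg(request):
    """Rewrite plain 'terms' facets in a search body as 'terms' aggregations."""
    converted = {key: value for key, value in request.items() if key != "facets"}
    aggs = {}
    for name, facet in request.get("facets", {}).items():
        terms = facet.get("terms")
        # Only handle {"terms": {"field": ..., "size": ...}}; refuse anything fancier.
        if (set(facet) != {"terms"} or "field" not in terms
                or set(terms) - {"field", "size"}):
            raise ValueError(f"facet {name!r} is not a plain terms facet")
        agg_terms = {"field": terms["field"]}
        if "size" in terms:
            agg_terms["size"] = terms["size"]
        aggs[name] = {"terms": agg_terms}
    if aggs:
        converted["aggs"] = aggs
    return converted


def agg_buckets(response, name):
    """Read (value, count) pairs from a terms aggregation response."""
    return [(b["key"], b["doc_count"]) for b in response["aggregations"][name]["buckets"]]


facet_request = {
    "query": {"match_all": {}},
    "facets": {"tags": {"terms": {"field": "tag", "size": 5}}},
}
print(terms_facet_to_agg(facet_request))
```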

[1] https://github.com/elasticsearch/kibana

[2] https://chrome.google.com/webstore/detail/sense/doinijnbnggo...


I recently started with elasticsearch for a hobby project, and my biggest issue with it is really finding things in the documentation.

For example, the documentation brings up relationships, like parent/child, but it isn't clear how to do the simplest case: returning a parent's child documents.
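For what it's worth, the simplest case can be done in Elasticsearch 1.x with a `has_parent` query whose inner query matches the parent by ID. A sketch under these assumptions: hypothetical `blog` (parent) and `comment` (child) types in a hypothetical `blogs` index, a child mapping that declares `_parent`, and children indexed with the `?parent=` URL parameter. The functions only build the JSON and paths; they don't talk to a cluster.

```python
def child_mapping(child_type, parent_type):
    # Declares that documents of child_type belong to a parent of parent_type.
    return {"mappings": {child_type: {"_parent": {"type": parent_type}}}}


def child_index_path(index, child_type, child_id, parent_id):
    # Children are indexed with ?parent=<id>, which also routes them
    # to the same shard as their parent.
    return f"/{index}/{child_type}/{child_id}?parent={parent_id}"


def children_of(parent_type, parent_id):
    """Search body matching every child whose parent has the given ID."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {"ids": {"values": [parent_id]}},
            }
        }
    }


print(child_mapping("comment", "blog"))                  # body for PUT /blogs
print(child_index_path("blogs", "comment", "7", "1"))    # where to PUT one comment
print(children_of("blog", "1"))                          # body for GET /blogs/comment/_search
```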

Or the fact that all the parameters for a request aren't listed in one clear place. They're scattered across a dozen separate examples, so if I want to know which options I can set for a _mapping, I can't find them.


If you want to set up Elasticsearch and have a quick play around with some queries I have written a beginner friendly intro which you may find useful:

http://red-badger.com/blog/2013/11/08/getting-started-with-e...

Looking forward to reading this book! Elasticsearch has been a great tool for us.


There is also a free ebook, Exploring Elasticsearch:

http://exploringelasticsearch.com/


So excited to finally start seeing more books about Elasticsearch!

"Elasticsearch Server" is another really good book for anyone who's interested. http://www.packtpub.com/elasticsearch-server-for-fast-scalab...


Solr Cloud is more mature and more customizable. ES is perhaps easier to start with, but you could get a lot more customization and power with Solr Cloud. The Apache community will ensure Solr always remains at the bleeding edge and stable for large-scale deployments.


I have used and spent time with both products, and I couldn't disagree with this more. In my opinion, Elasticsearch is the more mature product and has been around longer than Solr Cloud (ES 2/2010, Solr Cloud 10/2012)[0][1]; if you count Compass development (ES's precursor), even longer. Whereas ES was built from the ground up with clustering in mind, clustering, while certainly a primary focus for Solr Cloud, was not a key feature for much of Solr's development. I would also contend that there is no basis for the assessment that Solr Cloud provides "more power" than ES.

Elasticsearch has been an open project for some time, and the people behind it have recently formed a for-profit company in order to push the project forward even more. The Apache project certainly has an investment in Solr, and I hope the project will continue even if Lucid Works (which employs 25% of the Solr developers)[2] were to close up shop.

[0]: http://en.wikipedia.org/wiki/Elasticsearch

[1]: http://en.wikipedia.org/wiki/Apache_Solr

[2]: http://www.lucidworks.com/about-us/


Is it odd that there is no search on the elasticsearch site / docs / guide?


ElasticSearch >> Solr


For the uninitiated, is there a decent blog post somewhere that goes into some detail comparing the two?


We have been running Solr in production for 6 years or so, but I have been following the ES development blog with interest, since I think it has a nice API and feature set.

The momentum and community behind Elasticsearch seem to be building rapidly (to get an idea, just follow the "This week in Elasticsearch" posts on http://www.elasticsearch.org/blog).

Having said that, Solr has been rock solid for us, so we have no real pressing need to switch. Also, we don't really have the problem (massive scaling) that Elasticsearch seems to be built to handle; we just have 15 million pageviews or so per month. Our setup is 1 Solr master and 5 Solr slaves (one on each of our 5 webservers), and we do nightly data imports from our SQL server database.

That last part is indeed where I think Solr currently has a nice advantage over Elasticsearch. The Solr DataImportHandler is really nice if your primary datastore is a SQL server, and it allows you to do all sorts of nifty JavaScript and other transforms on the data in flight as you stream it from your SQL server. For Elasticsearch there is a JDBC river thingy that sort of lets you do the same, but it isn't as polished or usable as the Solr DataImportHandler (IMO). And if you want to install it, you have to do it via a plugin link that points to a bit.ly address, which makes me feel uneasy.
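If the JDBC river feels too uncertain, one alternative is to run the nightly export yourself and push rows through the bulk API (`POST /_bulk`), which takes newline-delimited JSON: an action line followed by the document source, with a newline at the very end. A minimal sketch, assuming a hypothetical `articles` table with `id`, `title`, and `body` columns; an in-memory sqlite3 database stands in for the real SQL server, and the HTTP call itself is left out.

```python
import json
import sqlite3


def rows_to_bulk(rows, index, doc_type):
    """Turn (id, title, body) rows into a newline-delimited _bulk request body."""
    lines = []
    for row_id, title, body in rows:
        # Action/metadata line: index (create or overwrite) this ID...
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": str(row_id)}}))
        # ...followed by the document source on the next line.
        lines.append(json.dumps({"title": title, "body": body}))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Stand-in for the real SQL database (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany("INSERT INTO articles (title, body) VALUES (?, ?)",
                 [("First", "Hello"), ("Second", "World")])
rows = conn.execute("SELECT id, title, body FROM articles ORDER BY id").fetchall()

bulk_body = rows_to_bulk(rows, "articles", "article")
print(bulk_body)
```

Using the `index` action with the row's primary key as `_id` means re-running the job overwrites documents instead of duplicating them; rows deleted from the database would still need separate `delete` actions.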

I also like that Solr comes with an admin GUI out of the box. There are some ES equivalents as plugins (mobz/elasticsearch-head), but as with the JDBC river, it's a third-party plugin, and I guess you have to trust that it doesn't screw with your server. With Solr, everything you need comes with the distribution, so you don't need to spend mental energy on whether or not you can trust this or that plugin to run on your server.

Also the .NET client for ES seems very polished and more sexy than the solr equivalent.

Anyway, my non-scientific gut feeling is that with the current momentum behind ES, it will over time be a better choice than Solr. But unless you really need to scale massively, plain old Solr is available and works just fine. It also seems to me to be somewhat easier to get running than ES (but then again, I'm probably biased after having run Solr for a long time).


This is a comprehensive look at the two from an author of 2 ES books and 1 Solr book: http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-pa...

There's a chance some of the ES material is a touch out of date, as Elasticsearch has evolved a lot since this was written.


They could not have chosen a more terrifying creature to put on the cover.


It reflects the brand/product. ElasticSearch is terrifying to use :)


FWIW, we didn't choose the animal. O'Reilly's design department does a voodoo incantation and chooses an animal... the authors/editors have zero input :)

I quite like the snake, though; I think it looks nice.


Congrats @clintongormley!


Great job, guys, looking forward to it!


I hear a lot about this but never really had a chance to incorporate it into my projects, because I didn't understand what it was for and why.


If you need people to navigate through the information you provide, and it is more than a couple of pages of links, you need a search engine.

If your stuff is generic text, then Google will find it for you. However, if you have categories, unusual languages, business logic, geographic locations, and other non-pure-text content, custom search engines like Elasticsearch, Solr, etc. will give your customers much better results than generic Google search.

As an example, do a search at LinkedIn and see all those categories and limits popping up on the left. That's what you can get from the search engines, but not from Google.
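The LinkedIn-style sidebar maps onto a query plus one `terms` aggregation per facet field. A sketch, assuming Elasticsearch 1.x, a hypothetical `body` text field, and hypothetical facet fields mapped as `not_analyzed`. When the user has clicked a value, it goes into `post_filter`, which narrows the returned hits but is applied after the aggregations are computed, so the sidebar keeps showing counts for the other values.

```python
import json


def faceted_search(text, facet_fields, selected=None):
    """Build a search body with one terms aggregation per facet field."""
    body = {
        "query": {"match": {"body": text}},
        "aggs": {field: {"terms": {"field": field}} for field in facet_fields},
    }
    if selected:
        # post_filter narrows the hits but runs after the aggregations,
        # so counts for unselected facet values stay visible.
        body["post_filter"] = {
            "bool": {"must": [{"term": {f: v}} for f, v in sorted(selected.items())]}
        }
    return body


request = faceted_search("engineer", ["industry", "location"], {"location": "Oslo"})
print(json.dumps(request, indent=2))
```

Putting the selection in the main query instead would also work, but then the counts in the sidebar would shrink to the selected value only.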


That's a really nice explanation.

Now if someone could come up with a site with explanations like this (including when it is overkill compared to a basic alternative), that would be really useful. The problem is when you go to the sites homepages, they will tell you how you can do almost everything with the given technology.


What snake is on the cover? I thought it was a tapeworm at first - a rather odd choice for a marketing animal :)


Awesome, just awesome. Finally. Yay.

Thanks dudes and dudettes.



