
Elasticsearch 1.0.0 released - dakrone
http://www.elasticsearch.org/blog/1-0-0-released/
======
RyanZAG
Elasticsearch is really awesome for searching, but what most people don't
realize is that it makes a better MongoDB than MongoDB while giving you that
searching too.

~~~
axefrog
What limitations should one be aware of that would make ElasticSearch _not_ a
viable candidate where something like MongoDB would be a better fit?

~~~
alisson
On ElasticSearch you have to update the whole document; there are no commands to
manipulate parts of it. You don't have commands like $set, $addToSet, $pop, etc.

You need to have a good understanding of how tokenizers and analyzers work to
be able to create good results for your data. I have difficulties matching
documents with the exact title being searched for. On MongoDB that just works,
on ElasticSearch you need to configure it.

ElasticSearch has some advantages and MongoDB others. I think they are great
together. One for storage and the other for searching.
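For the exact-title problem mentioned above, one common approach is a multi-field mapping that indexes the title twice: analyzed for full-text search, and not_analyzed for exact matching. A minimal sketch (the "article" type and field names are made up for illustration):

```python
# Sketch of a multi-field mapping for exact-title matching.
# Names like "article", "title" and "raw" are illustrative only.
mapping = {
    "article": {
        "properties": {
            "title": {
                "type": "string",  # analyzed: normal full-text search
                "fields": {
                    # not_analyzed copy: the whole title is one term
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }
}

# Query "title" for full-text search, "title.raw" for an exact match:
exact_match_query = {"query": {"term": {"title.raw": "Elasticsearch 1.0.0 released"}}}
```

The point is that the not_analyzed sub-field skips tokenization entirely, so a term query against it behaves like MongoDB's exact string match.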

~~~
polyfractal
Regarding updates, you can use the Update API for partial updates, and include
a script to do things like _"counter += 1"_ or _"add value to existing
array"_.

Internally it is still reindexing the entire document, but from your
application's perspective, the Update API is a lot friendlier.

[http://www.elasticsearch.org/guide/en/elasticsearch/referenc...](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html)
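For reference, a partial-update request body along those lines might look like this (a sketch assuming ES 1.x defaults, where the script language is MVEL; the index/type/id and field names are illustrative):

```python
import json

# POST /blog/post/1/_update   (illustrative index/type/id)
# "ctx._source" refers to the document being updated.
increment_counter = {"script": "ctx._source.counter += 1"}

# Appending to an existing array, with the value passed as a parameter:
append_tag = {
    "script": "ctx._source.tags += tag",
    "params": {"tag": "search"},
}

print(json.dumps(increment_counter))
```

As noted above, ES still reindexes the whole document internally; the script just saves the application a read-modify-write round trip.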

~~~
alisson
Thanks for pointing that out, it will be really useful!

------
m0th87
It was two weeks ago, and our startup was on the precipice of a major launch.
We had completely rewritten our online publication site, which drives the bulk
of our traffic. The product had to be shipped on-time - we had press releases,
eager investors and a launch party dependent on it.

A few days before launch, things were not looking good. As admins manipulated
articles in preparation for the launch, the servers kept crashing.

In a time-constrained major launch like this, a lot of nasty little hacks
build up in the codebase. Our search system for admins was a complete mess. It
was a custom solution that worked fine when admins managed a handful of
database records, but now that they were managing thousands of articles, it
was not scaling at all.

At the 11th hour, we dropped elasticsearch into our infrastructure. It worked
like a charm. The servers stopped crapping out, and we launched on time.

Elasticsearch mostly "just works", and we didn't have to worry about complex
schema definitions, working with giant complex XML files (hello Solr), or
build anything on top to interface between the index and the queries
themselves (Lucene). Thanks elasticsearch, you saved us!

~~~
troels
Did you try/consider Sphinx? It's simple and it's quite fast. I'm using that
and I'm pretty happy with it, but I might investigate ES at some point to see
if I can squeeze a bit more speed out of it.

~~~
rch
You might also take a look at the search functionality in Riak. I've run both
Solr and ES, the latter at significant scale, and I'm leaning more towards
Riak going forward. The difference is mainly convenience, so not a reason to
switch off something that's working already.

~~~
troels
Hadn't considered Riak, but I can see that it has some full-text search
capabilities. Any idea about its features and how it compares in performance,
as a raw search index?

~~~
biscarch
Riak 2.x uses Solr to index values from K/V with AAE. If you're interested in
how using it looks, I wrote a post using geospatial data here[1].

[1]: [http://www.christopherbiscardi.com/2014/02/07/geospatial-ind...](http://www.christopherbiscardi.com/2014/02/07/geospatial-indexing-with-riak-search-2-0-yokozunasolr/)

~~~
rch
If it's just Solr underneath, then why is the pseudo-Solr API implementation
not a complete implementation? Something to do with each node being an
isolated Solr instance, maybe?

~~~
biscarch
There is backward compatibility with the old Riak Search (which wasn't Solr-
based), intended not to break old applications, but you can query with any
currently implemented Solr client AFAIK.

------
mavelikara
ES seems to have the ability to run analytic queries. I have read about people
using it as an OLAP solution [1], although I have not yet read anyone describe
their experience. In that respect, how do ES's analytics capabilities compare
against:

1) Dremel clones [2] like Impala & Presto (for near real-time, ad hoc analytic
queries over large datasets)

2) Lambda Architecture [3] systems (where queries are known up-front, but
need to run against a large dataset)

Does anyone here have experience with ES in such use cases, beyond the free-text
searching ES is well-known for?

[1]:
[https://groups.google.com/forum/#!topic/elasticsearch/iTy9IY...](https://groups.google.com/forum/#!topic/elasticsearch/iTy9IYL23as)

[2]:
[http://static.googleusercontent.com/media/research.google.co...](http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf)

[3]: [http://jameskinley.tumblr.com/post/37398560534/the-lambda-ar...](http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting)

~~~
zcrar70
I would also be interested in this.

------
Argorak
Beyond the technology, Elasticsearch has a very mature, active and helpful
community, with user groups all over the world. We're well connected.

Pick your favourite user group here:
[http://elasticsearch.meetup.com/](http://elasticsearch.meetup.com/)

Full disclosure: I started and run the Berlin UG. We set ourselves apart by
always providing a short introduction to ES for those who are completely
new and would have a hard time following the main talk.

~~~
shurane
Intros to ES and other technologies are useful.

I don't see many tutorials covering usage of ES here:
[http://www.elasticsearch.org/tutorials/](http://www.elasticsearch.org/tutorials/)

Could you maybe provide a link to yours?

~~~
Argorak
The introduction is in person, at the user group.

Yep, tutorials are a big gap, but there are people working on that.

------
bryanh
The thing that worried me the most about Elasticsearch was how fragile it got
around the limits of its performance. Run out of memory because of a nasty
query? Boom, data corrupted. I hope you weren't using it as your primary
persistence layer...

Otherwise, we love ES. The other comment about it being a better Mongo than
Mongo rings true. With the backup/restore API and the some of the circuit
breakers, I'm hopeful that my fears will be abated.

~~~
nzadrozny
Ditto open file handles, which are easy to exhaust when aggressively
over-sharding. Not an uncommon mistake for the enthusiastic newbie.

Having supported Solr/ES/Lucene in production for 4+ years now (websolr.com /
bonsai.io) I would be pretty hesitant to trust Lucene in general as a primary
data store. Beautiful for secondary indexing, but otherwise, Why Not
Postgres?™ ;)

~~~
RyanZAG
Complexity. Having two copies of the data means more dev time, more resources
required to shift the data around, etc. Having just 1 data store that can also
handle all your searching is like the holy grail. As you say, not sure if
Solr/ES/Lucene are there yet - but they're definitely very very close. There
is no theoretical barrier either - it just comes down to closing bugs, and the
ES/Lucene team are very good at closing bugs.

EDIT: I don't think MongoDB is there yet either. There are definite benefits
and drawbacks between Postgres and ES, tipping heavily towards Postgres for
structured heavy write data. But for ES and MongoDB? I think MongoDB falls a
bit short there.

~~~
nzadrozny
Sure, that's a fair point. Data consistency and reliability in ES and Lucene
will only get better over time.

But I personally suspect Lucene won't ever get away from the dreaded "just
reindex." And to the larger point, I think recent resurgent interest in data
stores and distributed systems have shown pretty clearly that there is no holy
grail. No single data store can provide all the semantics necessary for all
use cases. Maybe not even for most use cases. There are just too many
tradeoffs to consider.

Believe me, I earn a living hosting Elasticsearch, so I'd love to see it
become a robust primary data store. There are some use cases where it actually
does make sense—just look at the amazing traction ES is experiencing for
storing and indexing time-series data.

But as a general-purpose primary store, I'm not really holding my breath.
Maybe I'm just becoming battle-worn and bitter. I would love to be proven
otherwise over the next few years!

~~~
dclara
I'd like to learn from you about "general-purpose primary store". Do you mean
storing any type of data? Here is what I think regarding the case you
brought up in the previous post:

ES is suitable for full-text document indexing at the enterprise level or for
websites, which means a reasonable amount of data to be indexed in a given
timeframe. A complete re-indexing shouldn't take more than a couple of days.

The basic idea behind a NoSQL database is to dump the data in quickly and
return, so you see very fast responses for insert and delete. It then loads
the data into memory for real-time retrieval, which also gives fast responses
for select. I'm not sure about update.

If the data volume grows, you quickly add shards (or make the number of
pre-allocated shards big enough) to give the queries enough memory, or add
more server nodes and let the OS swap memory.

So if you want to use a NoSQL database, you are bound by its system
requirements and must make your application fit them to get the most out of
it. Otherwise, if you are running a highly structured data store, you're
better off with a relational database.

Another point: if the documents are collected from the web, as with a search
engine, NoSQL will not fit that volume of data, and a relational database is
also used to store the indexed data for fast retrieval. I guess this is what
you meant by "general-purpose primary store".

Correct me if I'm wrong.

------
sandstrom
This gem is from the 'breaking changes' list:

    
    
      “Geo queries used to use miles as the default unit. And we 
      all know what happened at NASA because of that decision. The
      new default unit is meters.”
    

I like this release already.

~~~
roryokane
Link to that page:
[http://www.elasticsearch.org/guide/en/elasticsearch/referenc...](http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/_parameters.html)

------
lflux
> Easy to read, console-based insight into what is happening in your cluster.
> Particularly useful to the sysadmin when the alarm goes off at 3am and JSON
> is too difficult to read.

It's these little details I love, when a project actually cares about
operations and not just "well, here's the API".

I've been using ElasticSearch only for Logstash, but I've been blown away so
far by how easy it is to deal with.

------
axionike
ES has performed very well for us as the backbone for the solution we deployed
for a large government-sector customer. Had some GC issues initially, and were
worried about user concurrency, especially since we were not restricting
queries (i.e. users can do full-scale wildcard searches against the entire
data set of 1BN+ records). But ES continues to shine.

Congrats to the ElasticSearch team, and all the supporters around it. Once I
get back into more of a coding role, I'll definitely be contributing back to
the ES project.

~~~
room271
This may require a lengthier answer than makes sense here, but I'm curious
about what was causing your GC issues and how you fixed them (we have GC
issues at the moment).

~~~
polyfractal
Not the OP, but GC issues in Elasticsearch basically boil down to memory
pressure (obviously), which is usually caused by facets. Facets eat a lot of
memory, especially if you are faceting on high-cardinality fields - think
fields like "tags" or any analyzed field. High-cardinality, analyzed strings
are the easiest way to blow out the heap.

There are other reasons, but that is like 90% of GC issues. To solve it, you
need to make sure your faceted fields are configured well (usually
not_analyzed) and assess how much memory is available. You may be able to
index and even full-text search ten billion docs on a single machine, but
faceting it may just be too much to ask for a single node.

Omitting norms, disabling bloom filters on old indices, and enabling doc
values are other ways to help alleviate field-data pressure.

Other GC culprits can be: too-large bulk requests, unbounded threadpool
queues, or something like parent/child relations, scripts, or filter cache
keys eating all your memory. Also, don't go above ~30GB heaps; the JVM
becomes unhappy :)
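As an illustration of the faceting advice above, a terms facet request might look like this (field names are made up; the "tags" field would be mapped not_analyzed so each tag stays a single term):

```python
# A terms facet over a not_analyzed "tags" field (illustrative names).
# Field data for every unique term in "tags" gets loaded onto the heap,
# which is why high-cardinality, analyzed fields blow out memory.
facet_request = {
    "query": {"match_all": {}},
    "facets": {
        "popular_tags": {
            "terms": {"field": "tags", "size": 10}
        }
    },
}
```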

------
NDizzle
I also took a few days, a few weeks ago, to set up Elasticsearch after my
MySQL full-text search fell apart.

What I'm doing is slamming the full-text output of OCRed PDFs into a MyISAM
table, the entire document in a text field.

What I'm afraid I'm not doing right is the web interface for searching
Elasticsearch. What I'm doing is using filters with the query string
syntax[1] in the search box, pointed directly at that fulltext column. I'm
also using the highlight functionality so I can specify how many highlight
blurbs to return with each result. The query string syntax works great with
the OCR'd text, because most of it is near-garbage (as most OCR is), so you
can search for something like "net sales"~50 to find those two terms within
50 words of each other. The results were something like:

    net sales         15,000 results
    "net sales"          120 results
    "net sales"~50       550 results

Can anyone point me at a good web based search implementation using
elasticsearch that explains how they're doing it?

What I have works pretty well, I just want to... check my work, I guess.

[1]: [http://www.elasticsearch.org/guide/en/elasticsearch/referenc...](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax)
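A search like the one described might look roughly like this (a sketch; the field name "fulltext" and the highlight settings are assumptions, not from the post):

```python
# query_string search with highlighting, as described above.
# '"net sales"~50' finds the two terms within 50 positions of each other.
search_body = {
    "query": {
        "query_string": {
            "query": '"net sales"~50',
            "default_field": "fulltext",  # hypothetical column name
        }
    },
    "highlight": {
        "fields": {
            # control how many highlight blurbs come back per hit
            "fulltext": {"fragment_size": 150, "number_of_fragments": 3}
        }
    },
}
```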

~~~
nzadrozny
I host and support websolr.com and bonsai.io and have seen a lot of search
implementations.

The main thing for good stability and performance is to be very good at
batching your updates. You don't want to sling a ton of highly-parallel
single-document updates at Lucene, lest you thrash the JVM and start garbage
collecting like crazy.
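The batching advice above maps to the _bulk API, whose body is newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. A minimal sketch (index and type names are illustrative):

```python
import json

# Build a _bulk request body: one action line + one source line per doc.
docs = [
    {"id": 1, "title": "first article"},
    {"id": 2, "title": "second article"},
]

lines = []
for doc in docs:
    lines.append(json.dumps(
        {"index": {"_index": "articles", "_type": "article", "_id": doc["id"]}}
    ))
    lines.append(json.dumps({"title": doc["title"]}))

# The bulk body must end with a newline.
bulk_body = "\n".join(lines) + "\n"
```

Batching hundreds or thousands of documents per request like this is what keeps the JVM from thrashing under a flood of single-document updates.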

From there, on the query side, you'll want to get a good working knowledge of
the different tokenization and analysis options. There are a lot of subtle and
interesting combinations to be had in there that influence performance and
relevance of your search results.

~~~
NDizzle
Do you have a demo on either of those sites where I can input terms into a
search box and look at results? What explanation do you give to users as to
the options available when formatting the query?

~~~
nzadrozny
We've got a free Heroku addon that's pretty easy to spin up and play with.
Elasticsearch also has an analyze[1] API that can be helpful to play around
with.

It's also possible to download and install ES locally and run any number of
front-end interfaces, some of which include query builders. ElasticHQ seems
like a decent option for that. The venerable Elasticsearch-head is another.

I think now that ES 1.0 has shipped, more experimental tools will start to
emerge that help people learn and interact with ES itself. (If anyone out
there is a front-end whiz and wants to help me build something like that,
please email nz@bonsai.io!)

1\. [http://www.elasticsearch.org/guide/en/elasticsearch/referenc...](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-analyze.html)

------
xutopia
I love it when something I've been using in production for what seems like
years only now announces that it's reached 1.0.

~~~
brickcap
Well, doesn't it make you feel glad that you took the risk? After all, a
version is just a number :)

------
dabeeeenster
ES is a fantastic project. Thank you thank you thank you for your
contribution; truly standing on the shoulders...

------
jonhmchan
Congrats to the team - absolutely love elasticsearch. Having a lot of fun with
it here at Stack Overflow.

------
pron
What does Elasticsearch add on top of Lucene?

~~~
lobster_johnson
A lot. Lucene is basically the inverted indexes, providing on-disk structures
and a mechanism to query, as well as assorted bits like tokenization.

ES adds distribution (multimaster-replicated cluster of nodes connected via a
gossip protocol), sharding, defines a document model and schema (the mapping
of arbitrary JSON documents to index structures), faceting, aggregation (ie.,
roll-up-type calculations), various types of scoring (eg., geographic
distance), ETL ("rivers"), backup/restore, performance metrics, a plugin
system (eg., for indexing different file formats) and a bunch of other things
-- and of course a REST-based API on top of the whole thing.

------
buckbova
I didn't know what this was, and looking at this link it was tough to tell.

The GitHub repo lays it out well.

[https://github.com/elasticsearch/elasticsearch](https://github.com/elasticsearch/elasticsearch)

------
alecco
Why is it awesome? Why "it just works"? Is it just a mongodb-kind document
store over Hadoop+Lucene?

What makes it so special to have hundreds of votes and tweets all around
within 2 hours?

I don't understand. A DB engine engineer.

~~~
gibrown
There are a lot of features thoughtfully combined that make ES great. Top of
my list would be:

1\. It handles human-written language. Any language. The same technology that
lets it handle strings written in human language also provides a lot of
flexibility in handling strings in other applications, particularly when
handling logs.

2\. It also handles non-string data very fast and cleanly (numbers, dates,
geo).

3\. Lucene has an inverted index that has been optimized over many years. ES
scales that pretty seamlessly across many servers. All decisions in the
project seem to be made around whether a feature can scale to 100s of nodes.

The devs have also been really smart to focus on the "out of box experience".
Very well thought out defaults.

More on our experience with ES at scale:
[http://gibrown.wordpress.com/2014/01/09/scaling-elasticsearc...](http://gibrown.wordpress.com/2014/01/09/scaling-elasticsearch-part-1-overview/)

~~~
buckbova
Does this apply to Elasticsearch, since it is built on Lucene?

[https://lucene.apache.org/core/](https://lucene.apache.org/core/)

"index size roughly 20-30% the size of text indexed"

That seems excessive for an index.

~~~
gibrown
Not sure how that's calculated. I assume it is accurate, but the index size is
going to depend a lot on what kind of text you have and how it is separated
into individual terms (or n-grams or all the other ways you can tokenize and
filter to create individual terms).

Personally, I think of disk space as cheap, and am far more concerned with
having options to improve speed and quality of search results.

------
philfreo
We wrote a tutorial about how we wrote our search for Close.io using
elasticsearch and pyparsing:

"Sales data search: Writing a query parser / AST using pyparsing +
elasticsearch"

Part 1: [http://blog.close.io/sales-data-search-writing-a-query-parse...](http://blog.close.io/sales-data-search-writing-a-query-parser-ast-using-part-1)

Part 2: [http://blog.close.io/sales-data-search-writing-a-query-parse...](http://blog.close.io/sales-data-search-writing-a-query-parser-ast-using)

------
karterk
Elasticsearch mostly "just works". The latest version of Solr has made
clustering easier (though it requires managing ZooKeeper), but before that,
it was either ES or a nightmare.

Lucene is one of those projects which hardly has any real competition. That's
surprising given how many real world software projects have a search
requirement. While Lucene is excellent, it's not without flaws and competition
is always great.

~~~
swah
Hmm, could that be because they have to compete with free?

~~~
malaporte
Lucene does have competition, mostly in the commercial world. I know, since I
work for one of those companies :p

Solr, ElasticSearch, etc. are mostly concerned about the index/search
features, and they do quite a good job there. But this still leaves a huge
amount of space for commercial offerings, as core search is only a part of the
problem. I'm thinking about connectivity with complex enterprise systems,
support for the specific security models of those systems, integration in
other systems, etc. Believe me, those problems are not easy to solve.

So, even if we have an index that can most probably match Lucene feature for
feature, and quite a lot of things besides, we typically won't go after deals
where simple search is the only requirement. Instead we focus on larger deals
with more complex requirements. And we're doing quite well, thank you :)

------
Zilog
Too bad they have yet to address the split brain issue.

~~~
chriscareycode
I haven't had a split brain on my 15-node cluster in over 6 months, even
though the cluster is split among multiple data centers which do drop
connectivity from time to time. When the setting was wrong, it happened
constantly. Tune it properly and it won't happen: n/2+1.
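The "n/2+1" rule refers to the discovery.zen.minimum_master_nodes setting: requiring a strict majority of master-eligible nodes means a partitioned minority can't elect its own master. A quick sketch of the arithmetic:

```python
# Quorum for discovery.zen.minimum_master_nodes: a strict majority
# of master-eligible nodes, i.e. n/2 + 1 with integer division.
def minimum_master_nodes(master_eligible):
    return master_eligible // 2 + 1

# For the 15-node cluster described above:
# elasticsearch.yml -> discovery.zen.minimum_master_nodes: 8
print(minimum_master_nodes(15))
```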

------
hungryblank
At Contentful in Berlin (Germany) we're looking for an elasticsearch/lucene
expert, if you're excited by this tool and want to work full time with it get
in touch.

[https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7I...](https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7IDPuPxQV-IJ)

------
capkutay
I was vetting ES for a business-critical search platform and had some concerns
about write/read performance and how the Lucene indexes are handled on disk. I
read that it doesn't really perform as well as Splunk... Instead of ES, I'm
considering a solution using HBase to shard Lucene indexes on HDFS.

------
gane5h
Really impressed with the pace of innovation in the last few months: cat API,
aggregations, snapshots. The unfortunate side effect is that books and Stack
Overflow posts written before 1.0 are now outdated.

Disclaimer: I’m the founder of a hosted Search As A Service and we use ES in a
few critical parts of our infrastructure.

------
mtrn
Elasticsearch is a really great piece of software because it makes the simple
easy and the complicated possible.

------
vhost-
I'd be curious to see how well Elasticsearch holds up to Endeca. I'm
currently stuck maintaining some Endeca instances and it's a nightmare. I wish
I could go back to ES.

At my last place of work, ES was beautiful and required little work to get a
very fast, workable search in place.

~~~
quicksilver03
FYI, at my shop we use Oracle Commerce (ATG) and we've seen Oracle's
salespeople pushing Endeca to all current and new customers.

For our current project we went with ElasticSearch and we're quite happy. One
of the contributing factors was that one of our most experienced guys was
unable to get the damn thing installed, even with the help of one Endeca
consultant.

------
pyotrgalois
Great news. In every new project that we create (in general, REST JSON APIs
made with Node.js, Erlang or Rails, consumed by iOS and Android clients) we
always end up using PostgreSQL, Redis and Elasticsearch. Great tools.

------
kailuowang
Congratulations to the team. This is a great library that we really
appreciate.

------
willcodeforfoo
Congrats! Elasticsearch is one of my favorite recent pieces of technology.

------
rartichoke
ES is one of the few techs that I seriously love.

The rails support for it is amazing too. The guy creating the rails
integration lib is really talented and active.

------
elchief
Anybody know if elasticsearch does multiword synonyms properly? (Solr
doesn't). Thx

------
skarnik
congrats to the team!

------
dreamdu5t
We recently switched from using MixPanel + Crittercism + Sphinx to using
qbox.io (hosted elasticsearch) and Kibana to do all our analytics, crash
reporting, and search.

I can't recommend qbox.io enough! Point-and-click scaling of managed
elasticsearch clusters + Kibana == bliss.

