
Call me maybe: Elasticsearch - itamarhaber
http://aphyr.com/posts/317-call-me-maybe-elasticsearch
======
keypusher
As someone who is currently building out a distributed storage cluster system,
this series has been amazing. Not only are the results informative, but I have
learnt so much about clustering, consistency, testing methodology and what to
look for when evaluating reliability on these systems. Very nicely done.

~~~
itamarhaber
Yep - Aphyr, the destroyer of databases :)

~~~
syntern
I'd rather say "destroyer of database myths".

------
Radim
Indeed. ElasticSearch is super useful, but its docs used to be of the
frustrating variety: "Explain minute API details, as if a passing note from
the lead dev to himself, assuming all context and concepts are understood and
obvious. Don't bother with why's & what's & high-level nonsense."

It's been improving lately though, "going big" helped ES here.

My personal favourite: an issue from 3 years ago, where ElasticSearch returns
incorrect facet counts (as in, fundamentally BROKEN faceting). Still
unresolved:
[https://github.com/elasticsearch/elasticsearch/issues/1305](https://github.com/elasticsearch/elasticsearch/issues/1305)

~~~
capkutay
> but its docs used to be of the frustrating variety

I hope this doesn't sound like a knock on open source technology, but when a
company's only revenue stream is providing training and support for an open
source framework, doesn't writing laconic documentation seem to be in their
interest?

~~~
stingraycharles
That depends on how you look at it. A short-sighted manager might think that
way, but it will hurt (early) adoption / engagement. The target audience for
big support contracts are enterprises anyway, and they will not buy support
solely for the documentation.

~~~
capkutay
These guys win early engagement by being free and having awesome
functionality.

Want to tune it to actually work for your needs? Trying to figure out how to
use our 'RAMOptimization' library? Better start buying some training!

------
syntern
TL;DR:

"Some people actually advocate using Elasticsearch as a primary data store; I
think this is somewhat less than advisable at present."

"The good news is that Elasticsearch is a search engine, and you can often
afford the loss of search results for a while."

My preferred solution would be something that reliably channels data from a
Riak cluster to an ES cluster. Does anyone know if something like that exists?

~~~
kansface
Before you do that, check out [http://aphyr.com/posts/285-call-me-maybe-riak](http://aphyr.com/posts/285-call-me-maybe-riak)

~~~
rdtsc
What do you mean? Are you saying that because Riak with sibling preservation is
actually one of the few (if not the only) databases that didn't drop writes?

~~~
rspeer
So, what do you do with siblings in a searchable text index (the case that
ElasticSearch is designed for)?

------
lbarrow
Aphyr is truly amazing. This series of blog posts has introduced me to a
rigorous, careful way of thinking about distributed systems. I'm a much, much
better developer for having read his blog. How many people can you really say
that about?

~~~
benjaminwootton
I also feel much more stupid after reading his blog. Great content but bad for
my ego!

------
silenteh
I am currently writing a Golang client for Elasticsearch which uses the native
binary protocol and I have to say the lack of documentation about it is making
the process really painful!

I tried to use the Elasticsearch thrift plugin, but unfortunately it does not
work with versions 1.1 and 1.2.

So basically I have to inspect each and every byte of each and every request
and response in order to be able to send or parse data.

While developing the client I managed several times to crash the Elasticsearch
server by sending malformed packets. This also led me to review the networking
part of the Elasticsearch code, and I think it needs refactoring and a better,
deeper and cleaner use of Netty.

I hope they will soon sort this out, along with the problems mentioned in the
article, since I think that Elasticsearch is a really amazing product!

~~~
diminish
So if Elasticsearch is a stringified (JSON/HTTP) wrapper around Lucene with
simplified setup for the web app crowd, you are making a binary-fied wrapper
around it. Why not skip ES altogether, or use the JSON api?

~~~
lobster_johnson
ElasticSearch is a _lot_ more than just a "stringified wrapper around Lucene".
Lucene is used for the underlying inverted indexes, the item store and
tokenization/analysis, and that's pretty much it. ES adds clustering, a query
DSL, configuration, data mapping system, "river" functionality, HTTP API etc.

------
Torn
From the article comments:

> seems like the ES team is moving in the right direction with testing this
> stuff
> [https://github.com/elasticsearch/elasticsearch/commit/ef7593...](https://github.com/elasticsearch/elasticsearch/commit/ef759322231b21aa3c8b160f86b895483cff1ebf)

------
bzelip
I'm not much of a programmer, but it's great to know about this guy. The
diversity link someone posted here [0] is inspiring.

Off-topic question about Aphyr's website: the stylesheet is linked only as
`<link rel="stylesheet" type="text/css" href="/css" />`. How and why does this
work?

[0][http://aphyr.com/diversity](http://aphyr.com/diversity)

~~~
SethKinast
The /css route is being served with the correct MIME type, "text/css". That's
all that really matters, despite the unconventional name.

This is likely a route that aggregates a collection of CSS automatically and
outputs it.
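To illustrate the point (a guess at the mechanism, not how aphyr.com is actually built): the browser decides a response is a stylesheet from its `Content-Type` header, not from a `.css` extension in the URL. A minimal WSGI sketch, where `CSS_PARTS` is a hypothetical list of stylesheet fragments concatenated on each request:

```python
# Hypothetical stylesheet fragments; a real site might read these from files
# or produce them in a build step.
CSS_PARTS = [
    "body { font-family: serif; }",
    "pre { overflow-x: auto; }",
]


def app(environ, start_response):
    """Serve all CSS fragments, concatenated, at the extensionless path /css."""
    if environ.get("PATH_INFO") == "/css":
        body = "\n".join(CSS_PARTS).encode("utf-8")
        # This header, not the URL, is what tells the browser it is CSS.
        start_response("200 OK", [
            ("Content-Type", "text/css; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve it, and `<link rel="stylesheet" href="/css" />` then works like any other stylesheet link.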

------
nsxwolf
Can someone explain the "Call me maybe" theme/meme? I know it is a song, but
what's the relevance here?

Edit: I looked at the archive and found the original post where it is ...
erm... explained?

~~~
bjt
The name of the "Call Me Maybe" artist is Carly Rae Jepsen.

The name of the test framework is Jepsen.

Nodes in distributed systems call each other. Except under partitions, when
they call each other... maybe.

------
room271
The 'Nic' he quotes in the article is me :) This is seriously the highlight in
my career folks!

Although to be fair, he quotes me generously. Later on in that discussion I
give up my wisdom and get confused again.

~~~
donretag
I'm in there as well.

------
cjbprime
Aphyr is amazing and we're very lucky to have him doing this!

------
sjaaktrekhaak
Shay says they've already "fixed this" [1] and the aim is to get the fix in
1.3. I'm interested in what the fix actually is.

To quote Shay: "FYI, the improved_zen branch already contains a fix for this
issue, we are letting it bake as this is a delicate change, and we are working
on adding more test scenarios (aside from the one detailed in this issue) to
make sure. The plan is to aim at getting this into 1.3. We have not yet ran
Jespen (which simulates the same scenario we already simulate in our test),
but we will do it as well."

[1]:
[https://github.com/elasticsearch/elasticsearch/issues/2488#i...](https://github.com/elasticsearch/elasticsearch/issues/2488#issuecomment-46135721)

------
programminggeek
Elasticsearch is great for um... search. Like, you use it as an index to point
to the real system of record and it is not expected to be perfect (or
shouldn't be).

I thought it worked out really well when used to alleviate pressure on the
database which was being used for search results (which were sometimes ajax
live search style). The big benefit was our database usage went down and
search was better/more reliable.

The other big benefit is that if Elasticsearch goes down, only search stops
working. That is FAR better than the whole database and site going down with
it.
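One way to make search "break gracefully" as described above is to wrap every search call so that a backend failure degrades to an empty result instead of an error page. A minimal sketch, assuming `search_fn` is whatever callable wraps your Elasticsearch client; `SearchUnavailable` is a hypothetical exception name, not part of any real client library:

```python
import logging

log = logging.getLogger("search")


class SearchUnavailable(Exception):
    """Hypothetical error a search client wrapper raises when the cluster is down."""


def search_with_fallback(search_fn, query):
    """Run a query against the search backend; degrade to no results on failure.

    Returns (hits, degraded) so the page can show a "search is temporarily
    unavailable" notice while checkout and the rest of the site keep working.
    """
    try:
        return search_fn(query), False
    except (SearchUnavailable, ConnectionError, TimeoutError) as exc:
        log.warning("search backend unavailable: %s", exc)
        return [], True
```

Note that the fallback deliberately does not retry the query against the primary database, since that would put back exactly the load that moving search to Elasticsearch was meant to remove.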

At a big enough scale, with thousands of dollars in transactions every day,
the database can't go down. Search can break gracefully, but the spice must
flow (so to speak).

~~~
steveklabnik
From the article:

> The good news is that Elasticsearch is a search engine, and you can often
> afford the loss of search results for a while. Consider tolerating data
> loss; Elasticsearch may still be the best fit for your problem.

------
AznHisoka
These problems are a major reason why I decided not to go with the typical
one-cluster, multiple-replica infrastructure.

Instead I have multiple clusters, 0 replicas, and load balance against those
clusters. No split brain problem, but the same data availability benefits as
having replicas. Granted, the logic is all in my app now. Now when a cluster
is down, I have retry logic to keep reindexing that data to that cluster until
it succeeds.

~~~
gibrown
Doesn't that mean that you immediately lose data if a node goes down or a disk
dies?

FWIW the split brain problems (while not impossible) are extremely rare in ES
in my experience.

Seems like you've traded one potentially rare problem for a lot more
complexity and other problems.
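For context on the split-brain point: in the Elasticsearch 1.x releases discussed here, the usual mitigation was setting `discovery.zen.minimum_master_nodes` to a majority of master-eligible nodes, (N / 2) + 1. The arithmetic behind that advice is sketched below; the helper names are illustrative only. A majority requirement means two disjoint sides of a partition can never both elect a master, though this sketch says nothing about whether a given Elasticsearch version actually enforces that correctly.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes: (N // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def both_sides_can_elect(side_a: int, side_b: int, quorum: int) -> bool:
    """True if a partition into sides of these sizes lets both sides elect a master."""
    return side_a >= quorum and side_b >= quorum
```

For a three-node cluster this gives `discovery.zen.minimum_master_nodes: 2` in `elasticsearch.yml`; any split of three nodes leaves at most one side with two or more nodes.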

~~~
AznHisoka
I'm using Sidekiq, a message queue framework, to handle indexing data to my
nodes/clusters. So if one goes down, it will keep retrying the reindexing over
and over until it's successful. Every time there's new data, I need to send
the same data to all of the clusters.
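The fan-out-and-retry scheme described above can be sketched as follows. This is a Python illustration of the idea, not the commenter's Ruby/Sidekiq code: the queue is an in-memory `deque` and each cluster is a hypothetical index function that raises on failure. A real job framework like Sidekiq persists jobs in Redis and spaces retries out with backoff, which this sketch does not do.

```python
from collections import deque


def fan_out(doc, clusters, queue):
    """Enqueue one indexing job per cluster so every cluster gets the same document."""
    for name in clusters:
        queue.append((name, doc, 0))  # (cluster name, document, attempts so far)


def drain(queue, clusters, max_rounds=10):
    """Process queued jobs; a failed job goes back on the queue for a later pass.

    `clusters` maps a cluster name to a hypothetical index function that raises
    on failure. Returns the jobs still pending after `max_rounds` passes.
    """
    for _ in range(max_rounds):
        if not queue:
            break
        for _ in range(len(queue)):
            name, doc, attempts = queue.popleft()
            try:
                clusters[name](doc)
            except Exception:
                # Cluster is down: re-enqueue and try again on a later pass.
                queue.append((name, doc, attempts + 1))
    return list(queue)
```

As gibrown points out, this moves the consistency logic into the application: each cluster has zero replicas, so a document survives a dead node or disk only if another cluster still holds it.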

~~~
gibrown
Interesting. Were you actually seeing failures without running multiple
clusters?

------
linux_devil
Informative. I started looking into ES last week for indexing and really like
its ease of use, but there are definitely a few points to keep in mind.

------
fasteo
I am a happy user of ElasticSearch and always thought that it would pass the
"Call me maybe" test with flying colors... Well, not.

------
johnnymonster
What's up with the OP and Barbie memes?

~~~
SEJeff
aphyr is homosexual and isn't afraid to let the world know[1]. I think it is
just his own humor. I actually find it pretty hilarious how often he changes
his twitter picture and the strange things he switches them to.

It is just a quirk of an incredibly brilliant person, which I personally find
hilarious. That being said, he has a great point of view on non-tech things
and is a super humble guy. As quoted from the movie Van Wilder, "You shouldn't
take life so seriously, you'll never get out alive!"

[http://aphyr.com/diversity](http://aphyr.com/diversity)

~~~
bch
[https://twitter.com/aphyr/status/479726976471539713](https://twitter.com/aphyr/status/479726976471539713)

~~~
peterwwillis
lol lol lol lol lol lol lol lol lol @aphyr #homosexual #barbies #makessense

~~~
dang
What exactly are you trying to say?

~~~
peterwwillis
The correlation between being homosexual and liking Barbie dolls, along with
the Twitter conversation linked, is hilarious.

------
eli
I am probably just an aging fuddy duddy, but the animated GIFs make me much
less likely to share this article with my team.

~~~
lomnakkus
Is this sarcasm or are you actually being serious? If you're being serious,
why are you letting such absurdly superficial issues determine whether to
share a genuinely informative and interesting article?

~~~
aikah
Someone could ask the same question the other way around: why waste a good
article with childish animated GIFs?

~~~
anomaly47
Because they're hilarious, that's why. How about having a sense of humor?

