Call me maybe: Elasticsearch (aphyr.com)
392 points by itamarhaber on June 19, 2014 | 89 comments

As someone who is currently building out a distributed storage cluster system, this series has been amazing. Not only are the results informative, but I have learnt so much about clustering, consistency, testing methodology and what to look for when evaluating reliability on these systems. Very nicely done.

Absolutely agree. The author is a quadruple threat -- knows the theory, communicates clearly, gets their hands dirty, and is incredibly thorough.

Yep - Aphyr, the destroyer of databases :)

I'd rather say "destroyer of database myths".

Aphyr calls it Jepsen ;)


Indeed. ElasticSearch is super useful, but its docs used to be of the frustrating variety: "Explain minute API details, as if a passing note from the lead dev to himself, assuming all context and concepts are understood and obvious. Don't bother with why's & what's & high-level nonsense."

It's been improving lately though, "going big" helped ES here.

My personal favourite: an issue from 3 years ago, where ElasticSearch returns incorrect facet counts (as in, fundamentally BROKEN faceting). Still unresolved: https://github.com/elasticsearch/elasticsearch/issues/1305

> but its docs used to be of the frustrating variety

I hope this doesn't sound like a knock on open source technology, but when a company's only revenue stream is providing training and support for an open source framework, doesn't writing laconic documentation seem to be in their interest?

Equally, if a company's only revenue stream is paid upgrades, doesn't that mean that releasing software in urgent need of upgrading, with bugs and poor forward-compatibility, is in their interests?

And if a company's only revenue stream is monthly or annual subscriptions, doesn't that mean firing all the developers and letting the product stagnate as money keeps flowing in is in their interests?

That depends on how you look at it. A short-sighted manager might think that way, but it will hurt (early) adoption / engagement. The target audience for big support contracts are enterprises anyway, and they will not buy support solely for the documentation.

These guys win early engagement by being free and having awesome functionality.

Want to tune it to actually work for your needs? Trying to figure out how to use our 'RAMOptimization' library? Better start buying some training!


"Some people actually advocate using Elasticsearch as a primary data store; I think this is somewhat less than advisable at present."

"The good news is that Elasticsearch is a search engine, and you can often afford the loss of search results for a while."

My personal favorite solution would reliably channel data from a Riak cluster to an ES cluster. Does anyone know if there is something like that out there?

Funny, the link for "using Elasticsearch" points to an interview I gave ;) He raises some good points on ES's current problems with master election. I raised it with the ES team during a meetup with them, and we have a workaround for the issue (we discovered the bug during our testing). It's important to know the soft points of the system you're using and how to work around them. We feel like we have a good workaround, and I think the point of his series is to point out the flaws in common tools so that you are ready to work around them. But he found a single flaw and attacked it hard, so I am not sure throwing away the whole thing for a single flaw is a great recommendation.

Would you mind sharing your workaround?

Before you do that, check out http://aphyr.com/posts/285-call-me-maybe-riak

TL;DR: Aphyr talks at length about how Riak is not a CP system, even though nobody claims that it is. Riak is AP. Aphyr demonstrates how this plays out in practice and gives an intro into CRDTs, which let you achieve something that is eventually correct for certain kinds of state transformations.

Every conclusion he comes to is also described on Basho's site when it comes to using last-write-wins. At the end of the post, notice that CRDTs preserve 100% of writes, and the same could be had by allowing siblings. Punting on high availability is different from the company being upfront about the tradeoffs.
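To make the last-write-wins versus CRDT distinction concrete, here is a minimal sketch in Python. It is not Riak's implementation: the function names `lww_merge` and `gset_merge` and the `(timestamp, value)` representation are assumptions made up for illustration. It only shows why one merge rule discards a concurrent write while the other keeps both.

```python
def lww_merge(a, b):
    """Last-write-wins register: keep whichever (timestamp, value) is newer.

    The losing concurrent write is silently discarded.
    """
    return a if a[0] >= b[0] else b


def gset_merge(a, b):
    """Grow-only set (a simple CRDT): merging is set union.

    Union is commutative, associative, and idempotent, so replicas converge
    regardless of merge order, and no accepted add is lost.
    """
    return a | b


# Two replicas accept writes independently during a partition.
replica_1_lww = (1, "x")
replica_2_lww = (2, "y")
merged_lww = lww_merge(replica_1_lww, replica_2_lww)  # "x" is dropped

replica_1_set = {"x"}
replica_2_set = {"y"}
merged_set = gset_merge(replica_1_set, replica_2_set)  # both survive
```

Keeping siblings is similar in spirit to the set case: both concurrent values are retained and the application decides how to resolve them.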

What do you mean? Are you saying that because Riak with sibling preservation is actually one of the few databases (if not the only one) that didn't drop writes?

So, what do you do with siblings in a searchable text index (the case that ElasticSearch is designed for)?

Riak Search will do this for you. The Riak key/value store also has post-commit hooks that you can use: http://docs.basho.com/riak/1.2.1/references/appendices/conce...

Basho is implementing a much better search option based on distributed Solr (instead of relying on something in-house), due out this year with the 2.0 release. It's available for testing today.

(Full disclaimer: I work for Basho.)

"Based on distributed Solr" might be an over simplification. It uses Solr as its indexing engine, but really, that engine could be any single-node indexer... including single ES nodes. Basically, Yokozuna adds real grownup distributed systems computer science to the OSS distributed search space. http://docs.basho.com/riak/2.0.0beta1/dev/advanced/search/

To me, the Yokozuna "mediator" seems highly specific to Solr. It's not like you could swap out Solr for ES with just a little glue code, I think?

I'm building an internal geospatial product on the beta, anticipating the release. Great results so far.

I have briefly checked the Riak yokozuna project, and it really looks great. Nevertheless, I'd prefer to have the data storage cluster and the search cluster to be separated (and I really like ElasticSearch too).

Edit: I shall add that the most interesting search problems are the ones where you need to join separate data sources, and in such cases it is not really the question of what kind of search solution you are using, rather what kind of async queue and data update you have. So the separated cluster is really about having a distributed queue between the 'data-master' and the 'search-master'.

How about Riak CS for storage, separate from the search cluster?

I am not sure what difference it makes to have Riak or Riak CS for data storage, in the context of having a separate search cluster. Would you elaborate?

Yokozuna and CS (or plain Riak) use different backend configurations[0,1]. So beyond having physically distinct clusters, you'd get to have purpose optimized storage, but with a common management regime.

[0] http://docs.basho.com/riak/latest/ops/building/planning/back...

[1] http://docs.basho.com/riakcs/latest/cookbooks/configuration/...

DataStax Enterprise integrates Solr with Cassandra. The Solr docs are stored in Cassandra. Solr and Cassandra occupy the same JVM. As a result all docs inserted into C* are indexed using Solr, and Solr gets the high availability/linear scalability of Cassandra. I successfully deployed it as a customer, and I know of several clusters in the many hundreds of nodes.

Aphyr is truly amazing. This series of blog posts has introduced me to a rigorous, careful way of thinking about distributed systems. I'm a much, much better developer for having read his blog. How many people can you really say that about?

I also feel much more stupid after reading his blog. Great content but bad for my ego!

I am currently writing a Golang client for Elasticsearch which uses the native binary protocol and I have to say the lack of documentation about it is making the process really painful!

I tried to use the Elasticsearch thrift plugin but unfortunately it does not work for versions 1.1 and 1.2.

So basically I have to inspect each and every byte of each and every request and response in order to be able to send or parse data.

While developing the client I managed several times to crash the Elasticsearch server by sending malformed packets. This also led me to review the networking part of the Elasticsearch code, and I think it needs refactoring and a better, deeper and cleaner usage of Netty.

I hope they will soon sort out this and the problems mentioned in the article, since I think that Elasticsearch is really an amazing product!

    > I am currently writing a Golang client for Elasticsearch 
    > which uses the native binary protocol 
Why on earth would you do that? Is request serialization and transfer time via the JSON API even approaching 1% of mean request duration?

Why would I not?

This led me to dig deeper into the Elasticsearch code, find out more about its code quality, deal with machine endianness, deal with byte shifting, think about how to structure code in Golang and overall enjoy the feeling of touching the bare metal again...

I guess because theoretically someone is paying you to write applications at a fair clip. Granted, we have no idea what your role is or what the goals of your project are, so we are probably completely wrong about your role! :)

You are right, I should have probably mentioned I am doing it on my free time and no one is paying me. It's just pure curiosity. :)

Writing a distributed system to talk to Lucene directly might be more rewarding? If it is possible?

So if Elasticsearch is a stringified (JSON/HTTP) wrapper around Lucene with simplified setup for the web app crowd, you are making a binary-fied wrapper around it. Why not skip ES altogether, or use the JSON api?

ElasticSearch is a lot more than just a "stringified wrapper around Lucene". Lucene is used for the underlying inverted indexes, the item store and tokenization/analysis, and that's pretty much it. ES adds clustering, a query DSL, configuration, data mapping system, "river" functionality, HTTP API etc.

The client actually acts as a cluster node and therefore has knowledge about the cluster state, its indexes and shards, because it receives notifications from the cluster once it joins.

This allows it to execute operations on a specific shard of a specific index on a specific node of the cluster, resulting in better performance than going through the HTTP interface.

It can be used to efficiently store big quantities of data, for instance logs, which then can be visualized with Kibana.

It's just unfortunate that Elasticsearch presents the problems mentioned in the article and which I also experience in production, because it has a series of plugins which makes it a good solution for specific use cases.

I'd recommend not using the native binary protocol unless you have proof it makes a substantial difference for your application.

If you need to do bulk work, connection pooling, keep-alive, and batching on the client side over HTTP can easily vastly exceed what an ES cluster can handle. Users of my library have confirmed this.

You could use my library as a guide to the abstract data types, even though I don't use the native protocol.
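As a rough illustration of client-side batching over HTTP, here is a sketch in Python that groups documents into newline-delimited JSON bodies of the kind the `_bulk` endpoint accepts. The helper names `chunked` and `bulk_body`, the index name `logs`, and the type name `event` are hypothetical, and the metadata fields follow the 1.x-era bulk format as I understand it, so check them against the docs for your version. Sending the bodies over a pooled keep-alive connection is left out.

```python
import json


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_body(index, doc_type, docs):
    """Build a newline-delimited JSON body for an index-only bulk request.

    `docs` is a list of (doc_id, source_dict) pairs. Each document becomes
    two lines: an action/metadata line and the document source.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk format expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical data: five log documents sent in batches of two.
docs = [(str(i), {"message": "log line %d" % i}) for i in range(5)]
bodies = [bulk_body("logs", "event", batch) for batch in chunked(docs, 2)]
```

Each element of `bodies` would be one HTTP request, so five documents cost three round trips instead of five. Batch size is a tuning knob rather than a fixed rule.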


From the article comments:

> seems like the ES team is moving in the right direction with testing this stuff https://github.com/elasticsearch/elasticsearch/commit/ef7593...

I'm not much of a programmer, but it's great to know about this guy. The diversity link someone posted here [0] is inspiring.

Off topic question about Aphyr's website: the stylesheet is linked to only as `<link rel="stylesheet" type="text/css" href="/css" />` How and why does this work?


The /css route is being served with the correct MIME type-- "text/css". That's all that really matters, despite the unconventional name.

This is likely a route that aggregates a collection of CSS automatically and outputs it.
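A minimal sketch of such a route, using only Python's standard WSGI conventions. This is a guess at the general idea and not how aphyr's site is actually built; the `STYLESHEETS` list and the `app` name are made up for illustration.

```python
# Hypothetical stylesheet fragments that the route aggregates.
STYLESHEETS = [
    "body { margin: 0; }",
    "h1 { font-weight: bold; }",
]


def app(environ, start_response):
    """Tiny WSGI app: serve concatenated CSS at /css with a text/css type."""
    if environ.get("PATH_INFO") == "/css":
        body = "\n".join(STYLESHEETS).encode("utf-8")
        headers = [
            ("Content-Type", "text/css; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000. The browser only looks at the `Content-Type` header, so the missing `.css` extension makes no difference.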

Wow. Not only is this guy good at what he's doing, but he also seems to be a great person.

I'm impressed.

Can someone explain the "Call me maybe" theme/meme? I know it is a song, but what's the relevance here?

Edit: I looked at the archive and found the original post where it is ... erm... explained?

The name of the "Call Me Maybe" artist is Carly Rae Jepsen.

The name of the test framework is Jepsen.

Nodes in distributed systems call each other. Except under partitions, when they call each other... maybe.

The 'Nic' he quotes in the article is me :) This is seriously the highlight in my career folks!

Although to be fair, he quotes me generously. Later on in that discussion I give up my wisdom and get confused again.

I'm in there as well.

Aphyr is amazing and we're very lucky to have him doing this!

Shay says they've already "fixed this" [1] and the aim is to get the fix in 1.3. I'm interested in what the fix actually is.

To quote Shay: "FYI, the improved_zen branch already contains a fix for this issue, we are letting it bake as this is a delicate change, and we are working on adding more test scenarios (aside from the one detailed in this issue) to make sure. The plan is to aim at getting this into 1.3. We have not yet ran Jespen (which simulates the same scenario we already simulate in our test), but we will do it as well."

[1]: https://github.com/elasticsearch/elasticsearch/issues/2488#i...

Elasticsearch is great for um... search. Like, you use it as an index to point to the real system of record and it is not expected to be perfect (or shouldn't be).

I thought it worked out really well when used to alleviate pressure on the database which was being used for search results (which were sometimes ajax live search style). The big benefit was our database usage went down and search was better/more reliable.

The other big benefit is that if Elasticsearch goes down, only search stops working. That is FAR better than search problems taking down the whole database and site.

At a big enough scale, with thousands of dollars in transactions every day, the database can't go down. Search can break gracefully, but the spice must flow (so to speak).

From the article:

    > The good news is that Elasticsearch is a search engine, and you can often
    > afford the loss of search results for a while. Consider tolerating data
    > loss; Elasticsearch may still be the best fit for your problem.

There are a great many businesses whose entire model is centered on search working, and therefore search breaking is a major problem.

These problems are a major reason why I decided to not go with the typical 1 cluster-multiple replica infrastructure.

Instead I have multiple clusters, 0 replicas, and load balance against those clusters. No split brain problem, but the same data availability benefits as having replicas. Granted, the logic is all in my app now. Now when a cluster is down, I have retry logic to keep reindexing that data to that cluster until it succeeds.

Split brain problems are not related to having a replica. They can happen whenever nodes in the cluster cannot talk to each other, which opens the possibility of multiple masters. Having multiple clusters just means you can have multiple things in split brain, unless you have protection within your clustering stack against this outcome.
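The usual form of that protection is a majority quorum: a side of a partition may only elect a master if it can see more than half of the master-eligible nodes. A small Python sketch of the arithmetic follows; the function names are made up for illustration. In Elasticsearch of this era the corresponding knob is `discovery.zen.minimum_master_nodes`, and the article under discussion reports problems even with it set, so treat this as the idea and not as a guarantee.

```python
def majority(master_eligible_nodes):
    """Smallest node count that is a strict majority of the cluster."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """A partition side may elect a master only if it sees a majority."""
    return visible_nodes >= majority(master_eligible_nodes)


# A 5-node cluster split cleanly 3/2: at most one side holds a majority.
sides = [3, 2]
electable = [can_elect_master(side, 5) for side in sides]
```

This reasoning assumes a clean split in which every node sees exactly one side. Partitions where a node can talk to both sides are harder, because two overlapping groups can each count a majority.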

Doesn't that mean that you immediately lose data if a node goes down or a disk dies?

FWIW the split brain problems (while not impossible) are extremely rare in ES in my experience.

Seems like you've traded one potentially rare problem for a lot more complexity and other problems.

I'm using Sidekiq, a messaging queue framework, to handle indexing data to my nodes/clusters. So if one goes down, it will keep retrying the reindexing at a future time, over and over, until it's successful. Every time there's new data, I need to send the same data to all of the clusters.
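The fan-out-and-retry pattern described here can be sketched in a few lines of Python. This is not the poster's code and does not use Sidekiq; `FlakyCluster` and `FanOutIndexer` are hypothetical stand-ins, and a real setup would persist the retry queue and back off between attempts.

```python
from collections import deque


class FlakyCluster:
    """Stand-in for a search cluster; fails while `up` is False."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.docs = {}

    def index(self, doc_id, doc):
        if not self.up:
            raise ConnectionError(self.name + " is unreachable")
        self.docs[doc_id] = doc


class FanOutIndexer:
    """Send every document to every cluster, retrying failures later."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.pending = deque()  # (cluster, doc_id, doc) awaiting retry

    def index(self, doc_id, doc):
        for cluster in self.clusters:
            self._attempt(cluster, doc_id, doc)

    def _attempt(self, cluster, doc_id, doc):
        try:
            cluster.index(doc_id, doc)
        except ConnectionError:
            self.pending.append((cluster, doc_id, doc))

    def retry_pending(self):
        """Retry each queued write once; failures are queued again."""
        for _ in range(len(self.pending)):
            cluster, doc_id, doc = self.pending.popleft()
            self._attempt(cluster, doc_id, doc)


a, b = FlakyCluster("a"), FlakyCluster("b")
indexer = FanOutIndexer([a, b])
b.up = False
indexer.index("1", {"title": "hello"})
pending_while_down = len(indexer.pending)  # one write queued for cluster b
b.up = True
indexer.retry_pending()
```

One caveat with this design: a retried older write can land after a newer write to the same document, so the clusters can diverge unless the application compares versions before overwriting.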

Interesting. Were you actually seeing failures without running multiple clusters?

Informative. I started looking into ES this past week for indexing and really like its ease of use, but there are definitely a few points to keep in mind.

I am a happy user of ElasticSearch. I always thought that it would pass the "Call me maybe" test with flying colors... Well, not.

What's up with the OP and Barbie memes?

aphyr is homosexual and isn't afraid to let the world know[1]. I think it is just his own humor. I actually find it pretty hilarious how often he changes his twitter picture and the strange things he switches them to.

It is just a quirk of an incredibly brilliant person, which I personally find hilarious. That being said, he has a great point of view on non-tech things and is a super humble guy. As quoted from the movie Van Wilder, "You shouldn't take life so seriously, you'll never get out alive!"


lol lol lol lol lol lol lol lol lol @aphyr #homosexual #barbies #makessense

What exactly are you trying to say?

The correlation between being homosexual and liking Barbie dolls, along with the Twitter conversation linked, is hilarious.

As a gay-friendly straight man, I am puzzled by the idea that those gifs are supposed to reflect the author's sexuality. Was that really the author's intent? Serious question. Until I read this answer I did indeed assume that they were "just a quirk of an incredibly brilliant person".

I don't know, I don't judge either way. The way the guy worded that question reminded me of someone I spoke with a few months ago about West Hollywood, so it was more important in my head than it actually is reflected in the article. All of the posts have hilarious memes and are worded in memes. Some people need to realize the internet isn't serious business. This work with Jepsen is truly incredible however. I'm so glad someone is doing it. The end result is that a lot of the fail will be fixed or at least thought about so end users have mitigation strategies. Everyone wins here.


I just figured that it's his way of saying that if you're building distributed systems that lose prodigious amounts of data because you (a) don't implement the right algorithms, and (b) don't appear to have been influenced by the literature, then you might as well be playing with dolls.

IOW, his version of "databases are hard; let's go shopping"

I am probably just an aging fuddy duddy, but the animated GIFs make me much less likely to share this article with my team.

Is this sarcasm or are you actually being serious? If you're being serious, why are you letting such absurdly superficial issues determine whether to share a genuinely informative and interesting article?

Someone could ask the same but backward: why waste a good article with childish animated gifs?

Because they're hilarious, that's why. How about having a sense of humor?

When the CERN scientists presented their Higgs boson findings in Comic Sans, did you dismiss them because, Comic Sans?

The gifs aren't necessary, but I found them refreshingly amusing. This article isn't a professional press release or a scientific journal paper, it's aphyr's blog.

Why so serious?


Someone spends hundreds of hours of their own time working on a blog post that is amazingly informative and you aren't going to share it because they put some fun in it.

If you work in a conservative shop, this is a real concern – there are still IT shops which have formal dress codes, etc. and managers who frown on anything like this as a sign of unproductivity. Yes, that's pointless and wrong but that doesn't mean that there aren't people who have to worry about it.

It doesn't mean that aphyr was wrong to include them – if you do the hard work you can style it however you like – but it wouldn't surprise me to learn that some people do something like run it through instapaper before circulating it to the button-down set.

Personally I thought it was hilarious:

"I lost, like 27 pound."

"Oh my god, what's your secret?"

"... I had my arms ripped off."

"Oh, right."

I read this strictly for the animated gifs.

I'm curious what industry you're in. My impression is that most engineering teams wouldn't have a problem with the irreverence, but that teams in some industries (banking? health care? defense?) have more of a need to look like Serious Grown Ups.

I work in a large, conservative bank. I've already shared this article with the team. If information from this article was to be messaged upwards, that's where it gets placed into a standard branded PowerPoint and all personality/"fun" is removed.

Do you have no comment on the (brilliant, imho) substance of the article, just the presentation?

I don't, except to note that bad presentation is way more annoying when paired with high-quality content. If it was just about anyone else, I'd have closed the tab.

If enough people complain this time, maybe I won't have to waste a couple minutes clicking "Inspect Element -> display: none" on all the animated junk before reading the next article.

Subjective criticism is subjective.

Yes, you are just an aging fuddy duddy.


The gifs are annoying when one is trying to focus on the text and sees movement out of their peripheral vision.

I mean, that seems like overkill. I'd have advised a quick DOM edit (or an Adblock rule if he's planning on perusing that site often)

whining about gifs seems like overkill
