It's been improving lately, though; "going big" helped ES here.
My personal favourite: an issue from 3 years ago, where ElasticSearch returns incorrect facet counts (as in, fundamentally BROKEN faceting). Still unresolved: https://github.com/elasticsearch/elasticsearch/issues/1305
I hope this doesn't sound like a knock on open source technology, but when a company's only revenue stream is providing training and support for an open source framework, doesn't writing laconic documentation seem to be in their interest?
And if a company's only revenue stream is monthly or annual subscriptions, doesn't that mean firing all the developers and letting the product stagnate as money keeps flowing in is in their interests?
Want to tune it to actually work for your needs? Trying to figure out how to use our 'RAMOptimization' library? Better start buying some training!
"Some people actually advocate using Elasticsearch as a primary data store; I think this is somewhat less than advisable at present."
"The good news is that Elasticsearch is a search engine, and you can often afford the loss of search results for a while."
My personal favorite solution would reliably channel data from a Riak cluster to an ES cluster. Does anyone know if there is something like that out there?
(Full disclosure: I work for Basho.)
Edit: I should add that the most interesting search problems are the ones where you need to join separate data sources, and in such cases it is not really a question of which search solution you are using, but rather of what kind of async queue and data update mechanism you have. So the separated cluster is really about having a distributed queue between the 'data-master' and the 'search-master'.
I tried to use the Elasticsearch thrift plugin, but unfortunately it does not work with versions 1.1 and 1.2.
So basically I have to inspect each and every byte of each and every request and response in order to be able to send or parse data.
While developing the client I managed several times to crash the Elasticsearch server by sending malformed packets. This also led me to review the networking part of the Elasticsearch code, and I think it needs refactoring and a better, deeper and cleaner usage of Netty.
I hope they will soon sort out this and the problems mentioned in the article, since I think that Elasticsearch is really an amazing product!
> I am currently writing a Golang client for Elasticsearch
> which uses the native binary protocol
This led me to dig deeper into the Elasticsearch code, find out more about its code quality, deal with machine endianness, deal with byte shifting, think about how to structure code in Golang and overall enjoy the feeling of touching the bare metal again...
This allows executing operations on a specific shard of a specific index on a specific node of the cluster, resulting in better performance than going through the HTTP interface.
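To give a flavour of the endianness and byte-shifting work, here is a minimal Python sketch. The frame layout (8-byte request id, 4-byte length, payload) is made up for illustration and is NOT the actual Elasticsearch wire format; `encode_frame`, `decode_frame` and `read_uint32_be` are hypothetical names.

```python
import struct

# Hypothetical frame layout, for illustration only (not the real ES protocol):
#   8-byte big-endian request id | 4-byte big-endian payload length | payload
HEADER = struct.Struct(">qi")  # ">" = big-endian, standard sizes, no padding


def encode_frame(request_id: int, payload: bytes) -> bytes:
    """Pack the header in network byte order and append the payload."""
    return HEADER.pack(request_id, len(payload)) + payload


def decode_frame(frame: bytes):
    """Unpack a frame, rejecting truncated input instead of trusting it."""
    if len(frame) < HEADER.size:
        raise ValueError("truncated header")
    request_id, length = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    if length < 0 or len(payload) != length:
        raise ValueError("declared length does not match payload")
    return request_id, payload


def read_uint32_be(data: bytes, offset: int = 0) -> int:
    """The same big-endian read done by hand with shifts."""
    b = data[offset:offset + 4]
    if len(b) != 4:
        raise ValueError("need 4 bytes")
    return (b[0] << 24) | (b[1] << 16) | (b[2] << 8) | b[3]
```

Validating the declared length against the bytes actually received is the kind of check that keeps a malformed packet from taking down whatever is parsing it.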
It can be used to efficiently store big quantities of data, for instance logs, which then can be visualized with Kibana.
It's just unfortunate that Elasticsearch has the problems mentioned in the article, which I also experience in production, because it has a range of plugins that make it a good solution for specific use cases.
If you need to do bulk work, connection pooling, keep-alive, and batching on the client side over HTTP can easily vastly exceed what an ES cluster can handle. Users of my library have confirmed this.
You could use my library as a guide to the abstract data types, even though I don't use the native protocol.
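For anyone wondering what client-side batching looks like, here is a rough Python sketch that groups documents into newline-delimited bodies for the `_bulk` endpoint. The helper name `bulk_bodies`, the default batch size and the index name are my own assumptions, and the exact action metadata varies by ES version (older releases also want a `_type`), so treat it as a sketch rather than what any particular client library does.

```python
import json
from typing import Iterable, Iterator, List, Tuple


def bulk_bodies(
    docs: Iterable[Tuple[str, dict]],
    index: str,
    batch_size: int = 500,  # assumed default; tune for your documents
) -> Iterator[str]:
    """Yield one newline-delimited bulk body per batch of (id, source) pairs."""
    lines: List[str] = []
    count = 0
    for doc_id, source in docs:
        # Each document is an action line followed by its source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"  # bulk bodies must end with a newline
            lines, count = [], 0
    if lines:
        yield "\n".join(lines) + "\n"
```

Each yielded string would be POSTed to `_bulk` over a single pooled, keep-alive connection, so the per-request overhead is paid once per batch instead of once per document.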
> seems like the ES team is moving in the right direction with testing this stuff https://github.com/elasticsearch/elasticsearch/commit/ef7593...
Off topic question about Aphyr's website: the stylesheet is linked to only as
`<link rel="stylesheet" type="text/css" href="/css" />`
How and why does this work?
This is likely a route that aggregates a collection of CSS automatically and outputs it.
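As a guess at what such a route does on the server side (I have not seen the site's code, so `aggregate_css` and the directory layout are purely hypothetical), the handler could be as simple as this Python sketch:

```python
from pathlib import Path


def aggregate_css(css_dir: str) -> str:
    """Concatenate every .css file in css_dir, in sorted filename order."""
    parts = []
    for path in sorted(Path(css_dir).glob("*.css")):
        # Label each chunk so the combined output is still debuggable.
        parts.append(f"/* {path.name} */\n{path.read_text(encoding='utf-8')}")
    return "\n".join(parts)
```

The web framework would then serve that string at `/css` with a `text/css` content type, which is all the browser cares about; the URL does not need a `.css` extension.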
Edit: I looked at the archive and found the original post where it is ... erm... explained?
The name of the test framework is Jepsen.
Nodes in distributed systems call each other. Except under partitions, when they call each other... maybe.
Although to be fair, he quotes me generously. Later on in that discussion I give up my wisdom and get confused again.
To quote Shay:
"FYI, the improved_zen branch already contains a fix for this issue, we are letting it bake as this is a delicate change, and we are working on adding more test scenarios (aside from the one detailed in this issue) to make sure. The plan is to aim at getting this into 1.3. We have not yet ran Jespen (which simulates the same scenario we already simulate in our test), but we will do it as well."
I thought it worked out real well when used to alleviate pressure on the database which was being used for search results (which were sometimes ajax live search style). The big benefit was our database usage went down and search was better/more reliable.
The other big benefit is that if Elasticsearch goes down, only search stops working. That is FAR better than the whole database and site going down along with it.
At a big enough scale, with thousands of dollars in transactions every day, the database can't go down. Search can break gracefully, but the spice must flow (so to speak).
> The good news is that Elasticsearch is a search engine, and you can often
> afford the loss of search results for a while. Consider tolerating data
> loss; Elasticsearch may still be the best fit for your problem.
Instead I have multiple clusters, 0 replicas, and load balance against those clusters. No split brain problem, but the same data availability benefits as having replicas. Granted, the logic is all in my app now. Now when a cluster is down, I have retry logic to keep reindexing that data to that cluster until it succeeds.
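Roughly, the app-side logic looks like the Python sketch below. This is a simplification with made-up names (`MultiClusterIndexer`, the `send` callables); the real thing would persist the queues and back off between retries.

```python
from collections import deque
from typing import Callable, Deque, Dict


class MultiClusterIndexer:
    """Write every document to every cluster, retrying per cluster on failure."""

    def __init__(self, clusters: Dict[str, Callable[[dict], None]]):
        # clusters maps a cluster name to a function that indexes one document
        # and raises on failure (hypothetical interface).
        self.clusters = clusters
        self.pending: Dict[str, Deque[dict]] = {name: deque() for name in clusters}

    def index(self, doc: dict) -> None:
        for name in self.clusters:
            self.pending[name].append(doc)
        self.flush()

    def flush(self) -> None:
        """Drain each cluster's queue in order, stopping at the first failure."""
        for name, send in self.clusters.items():
            queue = self.pending[name]
            while queue:
                try:
                    send(queue[0])
                except Exception:
                    break  # cluster looks down; keep the docs queued for later
                queue.popleft()
```

Calling `flush()` on a timer is what keeps reindexing to a downed cluster until it succeeds, while the healthy clusters carry on serving reads.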
FWIW the split brain problems (while not impossible) are extremely rare in ES in my experience.
Seems like you've traded one potentially rare problem for a lot more complexity and other problems.
It is just a quirk of an incredibly brilliant person, which I personally find hilarious. That being said, he has a great point of view on non-tech things and is a super humble guy. As quoted from the movie Van Wilder, "You shouldn't take life so seriously, you'll never get out alive!"
IOW, his version of "databases are hard; let's go shopping"
The gifs aren't necessary, but I found them refreshingly amusing. This article isn't a press release or a scientific journal article, it's aphyr's blog.
Why so serious?
It doesn't mean that aphyr was wrong to include them – if you do the hard work you can style it however you like – but it wouldn't surprise me to learn that some people do something like run it through instapaper before circulating it to the button-down set.
"I lost, like 27 pound."
"Oh my god, what's your secret?"
"... I had my arms ripped off."
If enough people complain this time, maybe I won't have to waste a couple minutes clicking "Inspect Element -> display: none" on all the animated junk before reading the next article.