Hacker News new | past | comments | ask | show | jobs | submit login
Guide to running Elasticsearch in production (facinating.tech)
684 points by thunderbong 38 days ago | hide | past | web | favorite | 158 comments



Mostly good stuff but a few comments:

- article doesn’t clarify if it’s on hardware or VMs

- 140 shards per node is certainly on the low side, one can easily scale to 500+ per node (if most shards are small, typically power law distribution)

- more RAM is better, and there is a ratio of disk:ram that you need to keep in mind (30-40 for hot data, 200-300 for warm data)

- heaps beyond 32g can be beneficial but you’d have to go for 64g+, 32-48g is a dead zone

- not a single line about GC tuning (I find default CMS to be quite horrible even in recommended ~31g sizes)

- CPUs are often a bottleneck when using SSD drives


Is your list not entirely depending on the usecase? I am using ES for years with over 1 million daily users. It provides simple search funtionality. it runs on a single node with 4gb of memory. For more than 5 years with hardly any issues.


If the dataset it fits on one node - good. You might still be missing out on reliability (should that node fail or just plain restart).

Simple catalogs and/or web pages search usually fit in RAM, so the advice of disk:ram becomes less relevant, also the type of drive would be mostly irrelevant.


What is your use case and/or dataset size if I may ask?


And another note on shards - indexing a shard is a single writer process.

If your drive tolerates parallel writes well (=SSD) having multiple primary shards per node helps scale indexing.


Jason from Elastic here.

A shard can handle concurrent writes from clients, there are multiple write threads scaling to the number of logical processors exposed to Elasticsearch, and we have invested in making Lucene accommodate this concurrency[0][1][2].

[0]: http://blog.mikemccandless.com/2011/05/265-indexing-speedup-... [1]: https://issues.apache.org/jira/browse/LUCENE-3023 [2]: http://blog.mikemccandless.com/2017/07/lucene-gets-concurren...

Disclaimer: I am an engineer on the Elasticsearch team; I welcome any and all feedback.


I stand corrected then. I’d certainly need to check my measurements on this one.


I'm kind of surprised this article doesn't mention anything about how many nodes you want in your cluster. Since ES performance starts to degrade once you get past 40 or so nodes .


Certainly a problem with 100-s of nodes. One thing is that GC pauses in big clusters become more problematic as the chance to hit stop-the-world pause on at least one node for every request is getting higher.

Going beyond dozens takes a fair amount of tuning and trouble-shooting.


Serious question: does indexing Logstash/JSON logs really need to take gigabytes of memory + disk and sharding?


No. ELK is a slow and expensive way to store and retrieve logs. The reason people use it is that nothing else exists. (I was blown away when I started using it at my last job. I used the fluentd Kubernetes daemonset to extract logs from k8s and into ES... and the "cat a file and send it over the network" thing uses 300+MB of RAM per node. There is an alternate daemon that can be used now, but wow. 300MB to tail some files and parse JSON.)

I think a better strategy is to store logs in flat files with several replicas. Do your metric generation in realtime, regexing a bunch of logs on a bunch of workers as they come in. (I handled > 250MB/s on less than one Google production machine, though did eventually shard it up for better schedulability and disaster resilience. Also those 10Gb NICs start to feel slow when a bunch of log sources come back after a power outage!)

For simple lookups like "show me all the logs in the last 5 minutes", you can maintain an index of timestamp -> log file in a standard database, and do additional filtering on whatever is retrieving the files. You can also probably afford to index other things, like x-request-id and maybe a trigram index of messages, and actually be able to debug full request cycles in a handful of milliseconds when necessary. For complicated queries, you can just mapreduce it. After using ES, you will also be impressed at how fast grepping a flat file for your search term is.

The problem is, the machinery to do this easily doesn't exist. Everything is designed for average performance at large scale, instead of great performance at medium scale. Someday I plan to fix this, but I just don't see a business case, so it's low priority. Would you fund someone who can make your log queries faster? Nope. "Next time we won't have a production issue that the logs will help debug." And so, there's nothing good.


> The reason people use it is that nothing else exists.

Maybe https://github.com/grafana/loki , but haven't yet tried it.

(Or https://github.com/phaistos-networks/TANK ..?)

> I think a better strategy is to store logs in flat files with several replicas

Agreed. We just used beats + logstash and put the files into Ceph.

> x-request-id and maybe a trigram index of messages, and actually be able to debug full request cycles in a handful of milliseconds when necessary.

Yes, yes, yes. That would be great.


I set up Loki on my Kubernetes cluster last night, as I've been meaning to try it and this was a good excuse.

Basically, it appears that you can do very minimal tagging of log data at ingestion time (done via promtail, requiring a restart for every config change just like fluentd), but not any after-the-fact searching. I didn't play with it at all because I've been down that road before; parsing logs is hard, and you need an interactive editor over the full history to get it right. (Some fun examples... I use zap for logging, which emits JSON structured logs. But... I also interact with the Kubernetes API, which has their own logger. When something bad happens, instead of returning an error, it just prints unstructured logs to the log file. So you'll have to handle that if you're writing a parser. nginx does this too; you can configure it for JSON, but sometimes it prints non-JSON lines. What?)

Loki is good for getting your logs off the rotated-every-10MB "kubect logs" pipeline... but it doesn't really help with after-the-fact debugging unless you really want to read all the logs, in which case you're back to grep.

I am getting more and more motivated to do something. At the very least, Loki's log storage itself seems pretty okay; put logs in, get logs back, so it saves me from having to write that part at least.


Curious why ceph?


Loki and similar solutions are leveraging object storage (not just Ceph) as a way to store chunks of logs relatively cheaply, and scale performance. Loki can work on a single system, storing logs on the local filesystem, but that will eventually become an availability or performance bottleneck. Putting logs in object storage allows multiple systems to store/query/etc.


It was already up and running. Provides the bare minimum I want from a storage thingie.

What would you use/recommend?


Flat files and grep works to a level. For huge datasets it can be _really_ hard to answer some questions using pipes, grep, awk, etc. that something like a structured query makes pretty simple.

We tried using CloudWatch logs insights at work and I was blown away by how fast it was on indexed data (we saw 10+ GB/s searches across a few hours of logs. Only problem is that it was prohibitively expensive so we ended up not going with it.

Biggest thing for me is that I don't want to own a log searching service/software. My customers don't see any benefit from me investing my time in a better log searching platform than what is already available out there IMO. I want to let someone who is an expert on querying logs to solve that problem so that I can solve my specific problem.


Yeah, being able to do structured queries quickly is the key. I don't think Elasticsearch is actually that great at that; it really feels designed for fulltext searches and suffers in terms of speed when dealing with the semi-structured nature of logs. (Overall, the cardinality of log messages is not as high as you'd think, but ES doesn't know this.)

I also agree that operating your own log search is kind of a pain. Often the node sizes are much larger than what you're using for your actual application. I wrote a bunch of go apps and they use 30MB of RAM each, then you read an article about making Elasticsearch work and find that you suddenly need 5 nodes with 64G of RAM each... when you can fit a week of log data in just one of those node's RAM. It's hard to get excited about paying for it when it's your own node, and it's even harder when someone else offers it as a service (because that RAM ain't free).

That is why I like some sort of mapreduce setup; you can allocate a small amount of RAM and a large amount of disk on each of your nodes, and these queries can use excess capacity that you have laying around. When your data gets big enough to need 320GB of RAM... you can still use the same code. You just buy more RAM.

Basically, the design of Elasticsearch confuses me. It's designed to be a globally-consistent replicated datastore with Lucene indexing. How either of those help with logs confuses me. You write your logfile to 3 nodes... if one of them blows up, now you only have two copies. A batch job can get the replication back at its leisure, if you really care. But in all honesty, you were just going to delete the data in a week anyway. (If you're retaining logs for some sort of compliance reason, then you do probably want real consistency. But you can trim down the logs to the data you need in real time, and write them to a real database in an indexable/searchable format.)


If you are going for compliance, don't forget ES is not a database - there are notable edge cases where it drops data.



Jason from Elastic here.

It is indeed good reading, but as you say, old. Since then, we have invested tremendously in the [data replication][0] and [cluster coordination][1] subsystems, to the point that we have closed the issues that Kyle had opened. We have fixed all known issues related to divergence and lost updates of documents, and now have [formal models of our core algorithms][2]. The remaining issue along these lines is that of [dirty reads][3], which we [document][4] and have plans to address (sorry, no timeline, it's a matter of prioritization). Please do check out our [resiliency status page][5] if you're interested in the following these topics more closely.

Thanks for all of the feedback in this entire thread.

[0]: https://github.com/elastic/elasticsearch/issues/10708 [1]: https://github.com/elastic/elasticsearch/issues/32006 [2]: https://github.com/elastic/elasticsearch-formal-models [3]: https://github.com/elastic/elasticsearch/issues/52400 [4]: https://www.elastic.co/guide/en/elasticsearch/reference/7.6/... [5]: https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...

Disclaimer: I am an engineer on the Elasticsearch team; I welcome any and all feedback.


I'm always happier when I see a follow up analysis by the Jepsen team, as a third party verification that the issues have been fixed and no major new ones introduced. Any chance Elastic is going to contract out to them for a follow up?


(Setting aside the fact that Logstash is JRuby/Java app easily eating said gigabytes of heap)

Do JSON logs take gigabytes?

If they do for you then yes, gigabytes of disk and memory are pretty much guaranteed. Also things tend to pile up with time (even on a few weeks horizon).

In all honesty, I believe a finely crafted native code solution for this problem could achieve x3 less ram usage and x2-3 indexing/search performance. Going beyond that is also possible but will take remarkable engineering.

Update: to expand on the last point, c++ solutions are typically closed sourced. Rust and Go both have interesting open-source full text engines: https://github.com/tantivy-search/tantivy https://github.com/blevesearch/bleve

In near future I totally see someone producing a great open-source distributed search project that is at least on par with today’s ES core feature set.


Not sure if you are aware of this. We ran this at Y!

https://vespa.ai/


Looks awesome.

ES is good but it takes a small army to keep on top of the performance and management. And then there's the upgrades which will fix a few bugs and introduce new ones too.


How would you compare Vespa to ES when it comes to CPU/memory requirements and performance for something like log indexing and search use case?


Seen that and starred probably half a year ago. Never had the time to dig around and see what it’s like in perf, operations and scalability.


Saw 4 petabyes in it.

Cant remember the hosts number exactly but it was several hundred bare metal servers of various sizes

Behind Y! Groups


> I believe a finely crafted native code solution for this problem could achieve x3 less ram usage and x2-3 indexing/search performance

Many people believe that about everything written in Java. The story often ends up a lot more complicated though. In the domains where it shines Java is surprisingly hard to beat without very significant effort. And then you must also contemplate what they same significant effort would achieve if directed at the Java solution or cherry picking particular performance hotspots out of it.


I’m basing it on my limited experience of rewriting things from optimized but messy C++/D to JVM. Yes, the end result is much simpler but is memory hungry and is ~2x slower (after optimizations and tuning). Sometimes you can fit Java in less then ~2x of original footprint but at the cost of burning CPU on frequent GC cycles.

Not every application is the same but the moment it involves manipulating a lot of data in Java you fight the platform to get back the control you require for those things. And by the end of day there is a limit at which reasonable people stop and just give up dodging memory allocations, creatively reusing objects, wrangling off-heap memory without the help of type system.


One project worth keeping an eye on is Loki (https://github.com/grafana/loki), which eschews full text search for more basic indexing off of "labels", ie it works a lot like prometheus.

There's a writeup on the differences with the EFK stack here: https://github.com/grafana/loki/blob/master/docs/overview/co...

After working with a client for multiple years continually hitting bottlenecks and complexity with the EFK stack, I'm really looking forward to something different.


Downside is that Loki can't handle high-cardinality labels, such as a `Request-Id`, IP-addresses or user-ids.

I'd be happy to move from ELK to Loki, but no option to filter on IP-address or User-IDs is a big drawback.

It's an open issue though, so may be added in the future: https://github.com/grafana/loki/issues/91


Yep, I can confirm the GC part. We start with that before touching anything else to get the most out of the system. G1GC is pretty tunable.


What kind of performance improvements have you achieved through GC tuning? I'm surprised it's significant enough to start there. Might be something I should look more closely at.

Also, have you looked at Shenandoah or ZGC?


Hi, do you have real use experience running elasticsearch with 64g+ heap?

Is there any articles/benchmark/notes or anything that you would be willing to share?

We have considered trying out 64g+ heaps for our cluster but we are concerned about very long gc pauses impacting the search performance.


Currently we are running ES on big iron in production.

There are both advantages and disadvantages. On plus side:

- easier to manage several big machines than shitload of small instances potentially on top of virtualization solution (debugging overlay network etc. at night is not fun)

- nodes are more resilient to big requests (both big bulk indexing and resource-heavy searches). Your risk of hitting sudden OOM is practically zero (although ES does have circuit-breakers that try to prevent processing requests that would cause OOM)

- while not having compressed oops is sad, not having multiple copies of JIT and compiled code cache etc. is a plus

- using a modern concurrent GC is more sensible on bigger heaps. G1 is actually fine on 64g. Some of us are running Shenandoah on Java 13 in production, I’m looking forward to apply it on my clusters (or get back to testing ZGC).

Disadvantages:

- NUMA. Try to pick more recent Java as it’s tuned to run better on NUMA machines, still not all of JVM is NUMA-aware

- fault domain is larger and getting hot node up to speed (in-sync) after restart can easily take about an hour. Getting it up from empty storage going to take a bloody while

- elastic.co folks seem to have recommended settings for lots of small nodes not big ones, so you are on your own to discover proper limits for every setting


Thanks for this info!

Doing a quick googling on elasticsearch and ZGC I just found this https://github.com/elastic/elasticsearch/issues/44321 where the official response from elastic is that only G1GC and CMS are tested and supported.

I still think that it will be worth trying Shenandoah or ZGC if/when we ever get down to do some experiments with larger heaps.


Last time I ran with ZGC the main problem was that ES OOM circuit-breakers are going nuts since they are not prepared to have memory mmaped 3 times. So their calculations are off by a factor of 3 blocking most of requests for no good reason.

Running with oom circuit-breakers disabled is something I’m currently considering.


Java now has pausless GC.


It’s highly concurrent but not pauseless


Most production systems are barely on JDK 8. G1GC is the most used GC for high-performance production systems at many large companies (1B+ USD revenue) I have worked for.


I highly recommend trying out Java 11. G1 got quite a few improvements since 8.

In particular full GC is parallel since 10: https://openjdk.java.net/jeps/307


Thanks, trying out is not an option for so many things. It is up to vendors to decide which JVM to recommend and I cannot overrule them. As of personal use, I am on 11.


> 140 shards per node is certainly on the low side

That seems high, no?

Unless you were planning on scaling 20x, it seems you could easily have half the number.


The total number of shards per node, across all indices.

As far as number of shards per index goes there are many considerations some good ones are outlined in the article.


> - heaps beyond 32g can be beneficial but you’d have to go for 64g+, 32-48g is a dead zone

I'm curious why that is the case?


If the heap is under 32GB, the JVM can use "compressed oops" with 4 byte "pointers" to 8 byte offsets inside the 32GB heap instead of 8 byte pointers. This makes everything more compact. Since Java doesn't have structs, everything is a pointer and arrays of things are arrays of pointers. So a 48GB heap won't fit a lot more data than a 32GB heap.

See

https://shipilev.net/jvm/anatomy-quarks/23-compressed-refere...

https://wiki.openjdk.java.net/display/HotSpot/CompressedOops


Java compressed object pointers.

Shipilev (JVM performance engineer) explains it in great detail here: https://shipilev.net/jvm/anatomy-quarks/23-compressed-refere...

But the short of it: Java uses 32-bit numbers for object references if it can address whole heap with it. Given default 8-byte alignment of object we have 2^32 * 8 = 32g of addressable heap.

Once we are beyond that number 64-bit references are used and suddenly we are wasting a lot of heap space (Java is very much objects everywhere). Usually up to around 40% of heap is references.


Si it’s better to have 2 nodes with 31gb each than one with 64gb?


See my other reply on big iron vs small instances (as in 31g of heap “small”).

In short there are many variables at play, but without any context generally I’d recommend trying many small instances first. Now depending on your hardware, scale and application things may easily change in favor of big iron.


well, I was thinking more in using more than one ES instances on the same machine listening on different ports, but your points on the other reply still apply.


I actually did that once - splitting one NUMA machine into 2 ES instances each isolated to its own socket (numactl etc).

Sadly at the time there was a big perf problem with indexingdocuments with 20kb+ binary blobs (non-indexed field) and so the change to 2 nodes vs one did have little noticeable effect on the throughput benchmarks I did with our data back then (~2015 ES 1.4) In the end I dropped this split and focused on the other bigger problems.

Would love to revisit this exercise with more of free time and better tools.


> Usually up to around 40% of heap is references.

This blows my mind. I figured it would be non-trivial, but never expected that much.


Thanks, I somehow thought that was at smaller sizes but makes sense of course.


The article isn't that good because it mostly just verbatim repeats (some of) the information in the official documentation but sadly mixes it with a lot of things that are simply not correct/misunderstood. Also it omits a lot of stuff that is actually important.

The hierarchy breakdown in the article is misleading. Lucene indexes fields, not documents. Understanding this is key. More fields == more files. Segments are per field not per ES index. A lucene index is not the same as an Elasticsearch index.

Segments are not immutable but an append only file structure. Lucene creates a new segment every time you create a new writer instance or when the lucene index is committed, which is something that happens every second in ES by default and something you could configure to something higher. So, new segment files are created frequently but not on a per document basis. ES/Lucene indeed constantly merge segment files as an optimization. Force merge is not something you should need to do often and certainly not while it is writing heavily. A good practice with log files is to do this after you roll over your indices. With modern setups, you should be reading up on index life cycle management (ILM) to manage this for you.

The notion that ES crashes at ~140 shards is complete bullshit that is based on a misunderstanding of the above. It depends on what's in those shards (i.e. how many fields). Each field has its own sets of files for storing the reverse index, field data, etc. So, how many shards your cluster can handle depends on how your data is structured and how many of them you have. This also means you needs lots of file handles.

Understanding how heap memory is used in ES is key and this article does not mention the notion of memory mapped files and even goes as far as to recommend the filesystem cache is not important!! This too is very misguided. The reality is that most index files are memory mapped files (i.e. not stored in the heap) and they only fit in memory if you have enough file cache memory available. Heap memory is used for other things (e.g. the query cache, write buffers, small data-structures with metadata about fields, etc.) and there are a lot of settings to control that which you might want to familiarize yourself with if you are experiencing throughput issues. Per index heap overhead is actually comparatively modest. I've had clusters with 1000+ shards with far less memory. This is not a problem if those shards are smallish.

The 32GB memory limit is indeed real if you use compressed pointers (which you need to configure on the JVM, ES does this by default). Far more important is that garbage collect performance tends to suffer with larger heaps because it has more stuff to do. The default heap settings with ES are not great for large heaps. ES recommends having at least (as a minimum) half your RAM available for caching. More is better. Having more disk than filecache means your files don't fit into ram. That can be OK for write heavy setups and might be OK for some querying (depending on which fields you actually query on). But generally having everything fit into memory results in more predictable performance.

GC tuning is a bit of a black art and unless you have mastered that, don't even think about messing with the GC settings in ES. It's one of those things where copy pasting some settings somebody else came up with can have all sorts of negative consequences. Most clusters that lose data do so because of GC pauses cause nodes to drop out of the cluster. Mis-configuring this makes that more likely.

CPU is important because ES uses CPU and threadpools for a lot of things and is very good at e.g. concurrently writing to multiple segments concurrently. Most of these threadpools configure based on the number of available CPUs and can be controlled via settings that have sane defaults. Also, depending on how you set up your mappings (another thing this article does not talk about) your write performance can be CPU intensive. E.g. geospatial fields involve a bit of number crunching and some more advanced text analysis can also suck up CPU.


> The 32GB memory limit is indeed real if you use compressed pointers (which you need to configure on the JVM, ES does this by default).

But the article makes is sound like you should stay below 32G of heap to avoid using oop compression, and this is completely wrong. Setting -Xmx over 32G on a 64 bit architecture disables oop compression by default, and setting it lower enables it. The official ES docs also recommend you keep your heap below 32G, but they say you should do this to ensure oop compression is on: https://www.elastic.co/guide/en/elasticsearch/reference/curr....


>The hierarchy breakdown in the article is misleading. Lucene indexes fields, not documents. Understanding this is key. More fields == more files. Segments are per field not per ES index. A lucene index is not the same as an Elasticsearch index.

Unless I'm mistaken, segments are not per field. A lucene index might be broken into segments depending on your merge policy, but the segment file structure allows it to be an effectively separate lucene index that you can search on and contains data for all the index fields.

And I too find the fact that the term "index" is reused by ES to mean a collection of lucene indexes. Once someone wants to dive deeper it only creates ambiguity.


Does it still have the issue of failing "open to the world" if the x-pack trial expires? See: https://discuss.elastic.co/t/ransom-attack-on-elasticsearch-...


As an alternative, there's an open-source OpenDistro for ElasticSearch [1] that offers X-Pack-like security with some other X-Pack-like features. Although it is not officially supported by elastic.co, but it's a pretty good alternative and is supported by Netflix, Amazon, et all. Worth giving a try.

[1] https://opendistro.github.io/for-elasticsearch/


> ... supported by Netflix, Amazon, et all

So in case I'm stuck I can pay Netflix to help me?


X-Pack Auth is now free and included with Elasticsearch.


According to the docs, it's a 30-Day trial: https://www.elastic.co/guide/en/elasticsearch/reference/curr...

Are these docs out-of-date ? Does anyone know if any features remain after 30 days if you do not subscribe ?


”If you want to try all of the X-Pack features”

It’s quite confusing. There are different tiers of X-Pack: https://www.elastic.co/subscriptions

The Basic tier, which is free, supports simple username and password authentication.


Yeah, I read this link also - before posting above. There is no 'username' or 'password' term on that page - nor any positive checkmark in any security column for the open-source tier.

'Basic' tier does have 'File and Native Authentication'. But it is far from clear what that means.

More importantly, in several different places that _are_ clear about security in the Elastic documentation, it repeatedly says "there is no security", "assume anyone who can reach elastic is a superuser" ... and more ...

So if that is not true, the documentation should probably change ...


(Disclosure, I work for Elastic)

It's definitely complicated, and can be quite confusing due to the number of subscription tiers, the ambiguity around terminology, and historical documentation.

We need need to find a way to bring more clarity to the documentation, and we try, but the subscriptions page, in particular, is very difficult. It's already very long, so we don't want to add detailed explanations to individual points but it's hard to find short sentences that are both accurate and well understood by a variety of audiences.

To be clear:

- There is no security in Open Source elasticsearch.

- There is security in the free "basic" license. It is, at the time of writing, disabled by default.

- Early versions of Elasticsearch had no security at all.

- The security product that we (Elastic) produced was exclusively a paid feature for many years

- The core security features (authentication, users management, role based access control) have been free in the basic license since May of last year. https://www.elastic.co/blog/security-for-elasticsearch-is-no...

- If you download the latest version of the Elastic-licensed distribution of Elasticsearch (which is the default download if you get it from our website or package repositories), you get a version on which you can enable security, free of charge, without needing to register, with no expiry.

The only documentation I found which says "there is no security", etc is from old blog articles (e.g. this one from 2013 https://www.elastic.co/de/blog/found-elasticsearch-security). We don't do a great job of indicating that the information on those articles is out of date.


Thank you for chiming in and clarifying. I was thinking of that 2013 page specifically - good to know it is now out of date.


PS: I work at Elastic.co.

At any point of time, if you feel that docs are not appropriate kindly raise a issue in https://github.com/elastic/docs. You could also consider contributing to docs. Appreciate it!

Other than that, we have a very live discourse forum. You can also put up all sort of questions discuss.elastic.co.

Still want a more real time chat, you can join slack group too ela.st/slack.


I haven't seen anywhere in the documentation that claims "assume anyone who can reach elastic is a superuser".

The Elasticsearch Security documentation appears to be up to date and has notes on if certain features depend on a subscription. https://www.elastic.co/guide/en/elasticsearch/reference/curr...


"Elasticsearch has no concept of a user. Essentially, anyone that can send arbitrary requests to your cluster is a “super user”. "

https://www.elastic.co/de/blog/found-elasticsearch-security

This document says it twice verbatim - once as an emphasizes blurb of its own. It is also re-emphasized it in several other ways.

(same link as also posted by the Elasticsearch employee above).


I'm the lead engineer for Elasticsearch security.

I'm sorry - I have never seen that specific post on our forums before, but the poster was mistaken and by not correcting it at the time, we have allowed incorrect information to perpetuate.

In versions where security was a paid feature, if a trial license expired, security would remain enabled, but certain operations would be rejected for all users (per the warning text "Cluster health, cluster stats and indices stats operations are blocked")

We intentionally did not open the cluster to be world readable/writable. The administrator would be left with a cluster that was secure, but blocked access to some functions that are necessary for running a production cluster. It was up to them to explicitly upgrade to a paid license and re-enable those APIs, or downgrade to a "basic" license which required acknowledgement that security would be disabled.

An example (this is from 6.7.0 because it's the newest version I have installed at the moment, where security was not free. This was true at the time the original forum post was written - I just tested 5.2.0 as well, and the results are the same, with slightly different error messages):

License state:

   license [fc5bee69-f086-4989-a32a-5db329692363] mode [trial] - valid
  license [fc5bee69-f086-4989-a32a-5db329692363] - expired
  recovered [1] indices into cluster_state
  LICENSE [EXPIRED] ON [SATURDAY, SEPTEMBER 28, 2019].

Curl without credentials:

  curl http://localhost:9200/
  {"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":["ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}}],"type":"security_exception","reason":"missing authentication token for REST request [/]","header":{"WWW-Authenticate":["ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}},"status":401}%
Curl with credentials:

  curl -u elastic http://localhost:9200/
  Enter host password for user 'elastic':
  {
    "name" : "node1",
    "cluster_name" : "es-670",
    ...
Blocked cluster health:

  curl -u elastic http://localhost:9200/_cluster/health
  Enter host password for user 'elastic':
  {"error":{"root_cause":[{"type":"security_exception","reason":"current license is non-compliant for [security]","license.expired.feature":"security"}],"type":"security_exception","reason":"current license is non-compliant for [security]","license.expired.feature":"security"},"status":403}
Logs:

  blocking [indices:monitor/stats] operation due to expired license. ...

However, as mentioned by other sibling comments, security has been included in the free license since May last year, so as far as security is concerned, there is no longer a choice to make when a trial expires.

(Disclosure, as mentioned at the top, I work for Elastic)


We were hit by this on Kibana 6.x as I didn't read the x-pack trial properly. I thought at least login would be there. My bad. We added Nginx auth after that.

But x-pack security is actually free from some point release in version 7. Though x-pack is not open source, just free to use. So our nginx Kibana auth is still there.


Xpack's basic features come builtin in latest Elasticsearch - trial is for a separate set of features.


Unsecured Elasticsearch servers have been implicated in multiple breaches in recent months [1][2]. Since this post is an "In depth guide to running Elasticsearch in production,” it should prominently include information related to security and configuration. With tools like these where there is a learning curve for new users, security can end up treated as an afterthought, leading to these kinds of breaches.

1. https://www.pandasecurity.com/mediacenter/news/billion-consu...

2. https://thedefenceworks.com/blog/250-million-microsoft-recor...

Edited for clarity


This guide is clearly intended to focus on the ops-side of ElasticSearch. No one is being irresponsible, you're basically just complaining that the article was written about one topic instead of another.

Notice how it also doesn't talk about system architecture, load balancers, disaster recovery, etc? It's because the author chose to focus the post on cluster configuration. The topic of security could be its own standalone writeup and I highly doubt that its omission is an endorsement for running an ES cluster totally exposed and unsecured.


The argument is that you can't have an in-depth production guide to Elasticsearch without a section on security. "Production" should be "secure". A better title would be "optimizing Elasticsearch performance in production" or something of the sort.


To be honest I think if you're responsible for running production systems, it would be a no-brainer to run everything as closed up as it gets, with only access from servers which actually need it.


Yet we see security breaches caused by trivial misconfigurations and bad (or no) firewall setups. Chances are, people building these systems aren't accustomed to security-first deployment and will use and bookmark a guide like this to properly set up instances, rarely if ever going back to the docs or looking at other guides.


Chances are, people building these systems aren't accustomed to security-first deployment and will use and bookmark a guide like this to properly set up instances

Or they aren't given the time, running on ASAP-brand project management and/or pushing the POC to prod.


> This guide is clearly intended to focus on the ops-side of ElasticSearch.

What's your point? The ops side of anything also covers security. In fact, you cannot have ops without effective security.


I can't answer for cloakandswagger, but GPs comment sounded to me like this blog post is missing something essential because it doesn't talk about security.

This isn't an expensive course on setting up the perfect ES cluster in production.

As someone who is currently planning to set up a substantial ES cluster, I'm very grateful for someone to write up their learnings in such a compact overview.


The antonym for "insecure" is not "perfect."


An ES stack is fairly easy to get up and running in a development environment with docker-compose. But, not so much with a secure production installation. After going down the path of trying to get production up and running with security, I found Open Distro for Elasticsearch [1] to be very helpful. https://opendistro.github.io/for-elasticsearch/


Indeed OpenDistro is a great option to have. Currently using ODFE Security plugin with X-Pack basic (minus X-Pack security module).

Still waiting on these guys to review my perf patch though:

https://github.com/opendistro-for-elasticsearch/security/pul...


Disclosure: I work for AWS, but not directly on Open Distro

I'm sorry about the lack of engagement on that PR, Dmitry. Let me see if I can get some attention on it.


Thanks, I appreciate that.

I tried reaching out on forums and was assured they’d get to it. I’m used to things going slow in OpenSource so not too worried at this point.

Would be lovely to upstream it though, as I’ve patched 2 versions now and not looking forward to continue doing so ;) Plus I’d be glad to contribute more things based on our experience running ODFE, I’m very intrigued by other upcoming plugins in ODFE repos as well.


Hi Dmitry - Thanks for flagging the PR that needs review. I've pinged our engineers to review. Ping me @alolita on the PR whenever you contribute. Greatly appreciate your patience and your contributions to Open Distro :-)


Hey DmitryOlshansky,

I am one of the PMs and we discussed this PR internally. Our engineer will connect with you soon. Apologize for the delay but our team was working on getting the release out.

Thanks


Not to worry, I’ve been on the other side of open source - being one of maintainers for some parts of D language. It takes a lot of time and effort.

Hope to work with you guys on OpenDistro more closely in the future.


I just wanted to point out that Elastic has made some changes in the last year or so that help with security like...

* [Making security bits available with the ("free") basic license](https://www.elastic.co/blog/security-for-elasticsearch-is-no...) * [Releasing Kubernetes Operators with security enabled by default](https://www.elastic.co/guide/en/cloud-on-k8s/current/index.h...)

This has the effect of making most "getting started" guide setups more secure by default, which is good.

Unfortunately this is a new change and those bits are not in the Apache licensed core offering, but it's still a big improvement IMHO.


While we’re at it let’s touch on how vulnerable nginx is because port 80 is open.

/s


There's a huge difference between an HTTP server listening on a port with no obvious "map" of what's available, an ES server that can quickly be explored and its contents exfiltrated with no prior knowledge of the content.


Does anyone have experience running Elasticsearch as a kubernetes deployment? Can you just spin up some big-RAM containers attached to persisted volumes?

Elastic Co. seems to have an offering specialized for k8s: https://www.elastic.co/elastic-cloud-kubernetes but I can't understand what it does exactly.

Our data is not crazy-big and it doesn't need to be super performant, but for operational simplicity I'd like to deploy as part of the production cluster like all our other app containers rather than some "special" type of container.


I found Elastic Cloud on K8s to be the best way to deploy and manage Elastic clusters on Kubernetes so far.

https://www.elastic.co/guide/en/cloud-on-k8s/current/index.h...


Elastic Cloud on K8s is an operator that uses CRDs to define Elastic resources. The operator manages and deploys the appropriate deployments and statefulsets for the resource. It handles upgrades of the Elastic services as well. The operator pattern creates a more declarative way of provisioning Elastic resources.


Yep, I'm running ES in a StatefulSet. It works nicely out of the box using headless Services for node-to-node discovery, and by using a custom preStop hook, to make sure that the cluster wouldn't become RED after the node shuts down.


Does the Prestop hook just leave the cluster instead of dying and waiting for reconciliation?


It's blocking forever if the cluster isn't GREEN.


I was successful in using this guide: https://aws.amazon.com/blogs/opensource/open-distro-for-elas... to setup Amazon's Open Distro version of Elasticsearch/Kibana. I had to modify it to work with Elasticsearch 7.x (which corresponds to Open Distro 1.x). This guide was written for Elasticsearch 6.x (which corresponds to Open Distro 0.x).


I appreciate the systems perspective and find the writeup useful. However, from a production perspective, I think security should be topic one.


> from a production perspective, I think security should be topic one

Two general approaches to security:

- Upgrade to a paid Elastic cluster, and use their own feature-full security suite.

- Put a reverse proxy server in front of Elastic (like nginx), and configure that to handle security.


We used to use ElasticSearch a lot for log aggregations. Its a beast of its own. you still need a dedicated team to handle this. Eventually moved to Splunk, wavefront like solutions. It ll costs a lot lesser and frees up engineering time to build a better product.


Genuine questions as someone with no experience in dealing with large sclae log aggregation: Can you share some details on what kind of issues you ran into in production with Elastic Search that needed a dedicated team to manage ?


related to this: anyone can share practical advice when it comes to running Elasticsearch for application and infrastructure log aggregation?

We started from using Elastic Cloud which is nice enough and saves us the time to initial configuration. However I'm still unsure if the choices I made are the right way to go when designing the indexes.


1) Firewall it before anything else.

The rest is pretty much common sense for anyone used to run Solr and other indexers in production, but I've always been somewhat amazed at finding default setups insecure (a situation that was still largely true when I looked in-depth at the Python clients last year), which is madness considering that the security (authentication, authorization and auditing) docs are pretty good.

So making sure it's only accessible to either localhost or a restricted subnet _when you install it_ seems like the minimum sane thing every sysadmin ought to do by default.

Maybe nobody actually reads the docs...

(edit: typos)


Is there a non-Java alternative to this ES/Logstash stuff? Preferably rust or a native lang, but okay with CLR too. I'm not comfortable running Java in production after previous memory issues...


You can check out https://vector.dev/ to replace Logstash. Not sure about replacing Elasticsearch with something non-Java. Especially for the search use case -- Lucene is fairly dominant. For metrics you have prometheus (Go, not sure if that is better for memory issues with the non-tunable GC). You will probably want/need a clustered storage backend for prometheus. For that you have lots of options: https://prometheus.io/docs/operating/integrations/#remote-en... . Of those, TiKV (Rust), InfluxDB (Go), and TimescaleDB (C - its a Postgresql extension) seem like decent options.


Not only is Go's GC non-tunable there's also no way to set a heap limit, so you have to rely on the oom-killer for memory limits.


This is obviously a dumb question, being in this thread and all but what's the difference between LogStash, Elastic Search and Lucene? I only ever saw one amazing demo once with real time search and I remember hearing all 3 names. My guess is LogStash stores and the other two retrieve?


Lucene is the core search engine, it can be used on its own and has a lot of depth in its many capabilities.

ElasticSearch (and SolrCloud >> LucidWorks Fusion Server) add a distributed architecture that leverages Lucene's capabilities.

LogStash helps ES easily deal with log files, and it has been one of the main marketing drivers used by elastic which is ironic since most of Lucene's capabilities are more useful on human generated text as opposed to machine generated logs.


Logstash accepts events from different types of sources, processes them (there are tons of inputs, filters and outputs available), converts them to Elasticsearch-compatible format and sends them to the cluster.

It is usually used as a central component.

For logs and metrics reading, one can use Elastic Beats,such as Filebeat for logfiles. Beats are lightweight agents written in Golang.

Recent versions of Elasticsearch also allow to perform the processing step directly in Elasticsearch, using ingest nodes and pipelines (my experience with it wasn't great). In that case Logstash isn't needed.


AFAIK, Elasticsearch is the database system. Apache Lucene is the engine that Elasticsearch uses to perform searches, and the Logstash is the system that retrieves (or receives) data, processes it and gets it into ES.


You should give it another try. New versions of Java have made great progress in the memory area.

I feel like a lot of hate Java gets nowadays is largely due to historical reasons.

Though come to think of it, if you're talking about ES itself and not Java, then I have no idea, I never used ES in prod.


> New versions of Java have made great progress in the memory area.

Elastic uses huge amounts of memory (30GB is the norm). That's not a problem by itself, but Java garbage collections at this size can often take 10 seconds or more to complete on modern hardware. During this time, the server is basically down, the only solution is to increase timeouts or use replication servers.


> but Java garbage collections at this size can often take 10 seconds or more to complete on modern hardware.

For heaps of this size you can use Shenandoah GC and you get pauses well below 100ms. Your post is a perfect example of the type of 'historical reasons' (or historic FUD) the GP talks about.


> For heaps of this size you can use Shenandoah GC and you get pauses well below 100ms. Your post is a perfect example of the type of 'historical reasons' (or historic FUD) the GP talks about.

This GC is relatively new, and Elasticsearch doesn't support it (https://discuss.elastic.co/t/support-for-shenandoah-gc/16237...)

It may be a potential solution, but it's hardly a proven one. The concern is not FUD, it's legitimate.


Couchbase could be a reasonable alternative, depending upon your requirements. Couchbase is written in Erlang iirc and Couchbase's indexing is written in Go.


MarkLogic is an alternative, but it isn't free.


Great writeup. I wish there was a search engine built on the top of Riak that has a bit simpler workload distribution.


There's really nothing simple about building apps on top of Riak. It's one of those things that seems simple until you use it in production, and then you realize it's a total nightmare and you can't wait to sunset it.


Without details, I am not really sure what you are talking about. Riak is just a key-value store, probably the simplest abstraction to store data. I have put it in production several times and operated it for years. Reliability and performance is just outstanding. Scalability was capping out around 150 nodes in the same cluster.


Is Riak still supported and used?

I was under the impression that it’s dead.

Is that not the case?


It is community supported so no commercial support. It is used in many services around the globe.

https://github.com/basho/riak_core/commits/develop-3.0


I recently built a webapp and tried to avoid using ES by using Postgres fulltext search and it’s working great so far


If you secure your Elasticsearch cluster without paying and want to test your queries, it seems to me there are three ways: command line, Deja Vu, Elasticsearch Head extension and if you try to use the Elasticsearch Head extension you will run into https://github.com/mobz/elasticsearch-head/issues/431

Overall, security became free only last year https://www.elastic.co/blog/security-for-elasticsearch-is-no... and knowledge and tooling is thin on the ground.


> every document you put into ES will create a segment with only that single document

That's not correct. A new segment is created (or rather, made immutable) after a commit. Creating a new segment for every document is a surefire way to kill performance.


This was a really helpful article for understanding the architecture of Elasticsearch. However, what I really want to know is why Elastic has the reputation of crapping itself for no reason and what can be done about it.


It seems like most articles discussing Elastic search administration read like an instruction manual for something like steam locomotive - describing just how to constantly shovel the coal in, relieving the steam pressure in just the right way, etc.


I have found you need a queue in front of ES for better write reliability, more so than other DBs. The read/aggregation performance is fantastic though.

Also, you should know a little about GC tuning too


Can I ask how you implemented such a queue?


One way for example would be store all events you want to write to ElasticSearch to a Queue (like rabbit, Kafka, etc) then have a set of workers/importers (this can even be logstash with input from queue and output ES) that read the events from that queue and store them in ES.

This way your system becomes a sync, and your system above is not affected if ES becomes slow writing or goes down for a reason. Your events will remain stored in the queue waiting to be processed.


We had our own database as a service thing that used ES and MongoDB. All teams used that service. Basically you would define your schema using the Thrift API and the service provided CRUD operations AND aggregations. This was a Java shop.

Scaling writes just meant changing that service's implimentation. We wrote them to Mongo and synched them to ES. There are much better ways to do that, but ES would be overloaded w/ just 500 writes a second so it didn't matter.


I was considering to use ElasticSearch to replace my CouchDB indexes that are way too slow, memory hungry, and not optimized. But I read somewhere, can't remember the source, that ElasticSearch doesn't offer any guarantee that all your data will (eventually) be saved or returned when queried. Is that the case?


Maybe you are thinking of Aphyr's analysis of elastic search where he showed that Elastic can lose indexed documents during network partitions:

https://aphyr.com/posts/323-call-me-maybe-elasticsearch-1-5-...

That has been done on relatively old version. Elastic documents known (and fixed) issues here https://www.elastic.co/guide/en/elasticsearch/resiliency/cur... but I wouldn't trust this to the letter mainly because of their previous handling of such issues.

Elastic is great as a search index, not as a primary database.


Thanks for your clarification.

Not using it as a primary database is fine and I understand that it's not designed for that. But I need to trust the results when I query it. Is it designed for fuzzy searches on many documents when missing one or two documents has no consequences or can I trust that I will always retrieve all documents that should match my query ?


In theory with correct settings you should get all (or be told that the results are partial, e.g when shard is unavailable or timed out).

In practice there are bugs, which are notoriously difficult to find and reproduce when you go distributed. Jepsen is closest to that I know about.


Thanks again. Distributed systems are difficults. I think I should have gone with a old school PostgreSQL.


The default refresh rate for ES is a minute or so (can’t remember the exact time). This means when you index a document, it won’t be returned when you search for it until a minute later when the index refreshes.

But you can certainly change the configuration so it’s practically real-time. There will be some performance hit though.


fwiw, the default refresh interval is 1 second. You should manually increase it if your application allows it. For bulk-indexing you can think about disabling it outright. Source: https://www.elastic.co/guide/en/elasticsearch/reference/curr...


They have done quite a bit of work to improve resiliency. You can learn more about the status of this work here:

https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...

TL;DR it is still probably not ideal as a primary data store.


Can someone with experience with multiple indices comment on the relative advantages and disadvantages of running each of the major ones in production:

- Elastic

- Lucene

- Solr

- CouchDB

- Any other ones I'm unfamiliar with


For ES and solr gurus here what is the recommended max size for documents if you want to index lot of office documents?


> For ES and solr gurus here what is the recommended max size for documents if you want to index lot of office documents?

I run a large ES index covering 80TB of data, and regularly index documents as large as 20MB. At that size, the bottleneck is often transferring that much data over a network to the ES cluster. You need to make sure your HTTP client can handle it, and the network has enough bandwidth. Elasticsearch itself is not really the bottleneck.


Whatever makes sense to your users for their search results. Do they want to get back the whole document or just the relevant parts?

If there are separate sections in the office documents that you can pull out and index as separate fields then you should do that. For example, if you were indexing patents, you would want to index abstracts and claims into separate fields.


The text you display back to your user doesn't/shouldn't have to depend on what you index in your information retrieval system.

At my last job, we used to index large documents on Solr. The largest chunks of them were indexed in non-stored fields. Which means they were searchable but you couldn't retrieve the actual text. This drastically cut down on the index size and the resources we needed to support it.

Then after scoring you'd return the top hits to the main application as IDs, which could retrieve everything needed (full text included) along with all the other information they'd retrieve for the document and generate the actual output.


Just relevant parts. ES says their max size is 100 mb. We have a real life scenario where we want to index millions of office documents to find PII/PHI

What is the realistic expectation here. Should we say 50 mb. How everybody else do?


Not sure about ES, but Solr removed it's max field limit in release 4.0. Text documents tend to be a lot smaller than people expect, both in terms of word count and file size. I think you will be fine with 50 mb if you are using ES.


Is it ok to have two nodes (mdi) ? Or does 2n+1 means you always want a odd number of master eligible nodes?


Can anyone provide rough numbers about how a million documents with normal queries on it would cost?


No one can answer that for you, you need to test it out your self. Depends on size of documents, number of keys, volume of reads, etc etc. Spin up 1 instance and test it out ;)


Good summary. I recently did this and had to figure this all out by trial and error.


Struggling to read this ... the domain typo triggering OCD


For a long time, I read articles only in Pocket. The web has become unreadable at this stage. It is a bit funny that it was created to make the distribution of knowledge easier, having readability its primary feature.


Great to read from someone who knows it so deeply.


> Great to read from someone who knows it so deeply.

I'd say it's a decent read on Elasticsearch tuning at the intermediate level, but not enough to really get high performance from Elastic.

One of the problems with Elastic tuning is that the tuning parameters depend deeply on the type of data you're indexing. Indexing text is far different than indexing numbers or dates. A mapping with a large number of fields will behave differently than with just a few. Some datasets can easily be split up into different indexes, and others are cannot be.

To really get the most from Elasticsearch, you have to know what it's doing under-the-hood, and how that maps on to your data. Elasticsearch hides so much complexity (generally a good thing), but unfortunately, it can be difficult to know where the bottlenecks are.


Is there a guide for deploying ES on K8S?


For kubes you could use the helm chart they provide and tune it similarly. https://github.com/elastic/helm-charts/tree/master/elasticse...


Elastic just released their operator called Elastic Cloud on Kubernetes (ECK): https://www.elastic.co/blog/elastic-cloud-on-kubernetes-ECK-...

Operators are basically mini-programs that run in your K8S cluster and automatically handle deployment, upgrades and maintenance for their specific software. This operator can setup all of the ELK stack and it's all done through custom resource definitions. Use this instead of the helm charts and manual YAML files.


I prefer solr




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: