I run a few 10TiB ES clusters (which is not much, to be fair) but infrequently find that I have to reindex or reshard a cluster because I can’t just add another node. There’s something to be said for understanding index rotation and access patterns, too.
It’s easy to make an ES cluster; it’s difficult to maintain one; it’s nearly impossible to debug one.
- if you consider that “it’s slow” is what you have to debug.
- Hey, a node died!
- Run Terraform to stand up a whole new cluster and restore it from a snapshot.
- Update the app to point at the new cluster.
- Run Terraform to delete the old cluster.
I'm pretty happy with this arrangement.
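For anyone curious what the restore step above looks like in practice, here is a rough sketch using Elasticsearch's snapshot API. The repository name (`es_snapshots`), snapshot name, and hostnames are hypothetical; it assumes a snapshot repository is already registered on both clusters (e.g. pointing at the same S3 bucket).

```shell
# Take a final snapshot on the old cluster before cutting over
# (repo "es_snapshots" and hosts are placeholders for illustration):
curl -X PUT "https://old-cluster:9200/_snapshot/es_snapshots/pre-migration?wait_for_completion=true"

# Restore it onto the freshly Terraformed cluster; skipping the global
# cluster state keeps the new cluster's own settings intact:
curl -X POST "https://new-cluster:9200/_snapshot/es_snapshots/pre-migration/_restore" \
  -H 'Content-Type: application/json' \
  -d '{"indices": "*", "include_global_state": false}'
```

Once the restore finishes and the app is repointed, the old cluster can be torn down.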
This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With Elasticsearch... it's a frustrating and expensive game of trial and error.
Out of curiosity, what does "10TiB" refer to in this context?
Is it how much RAM ES uses, the size of ES's index, or the total size of the corpus that ES must index? Or corpus size + index?
(Contrived example for the sake of illustrating the point)