
AWS Elasticsearch Service Woes - kiyanwang
http://www.havingatinker.uk/aws-elasticsearch-service-woes.html
======
andrewvc
As the author mentions, AWS Elasticsearch does lag behind in versions. They
still only support 1.5.2[1], which is over a year old at this point and is
missing a lot of performance optimizations and operational improvements.

Elastic (the company behind Elasticsearch) provides an AWS-based SaaS
Elasticsearch product
[https://www.elastic.co/cloud/as-a-service](https://www.elastic.co/cloud/as-a-service)
with a very sane version policy[2] that keeps pace with the latest
Elasticsearch releases.

Disclaimer: I work for Elastic.

1\. [https://aws.amazon.com/elasticsearch-service/faqs/](https://aws.amazon.com/elasticsearch-service/faqs/)

2\. [https://www.elastic.co/guide/en/cloud/current/version-policy...](https://www.elastic.co/guide/en/cloud/current/version-policy.html)

~~~
Tomdarkness
We've looked into using Elastic hosted Elasticsearch ourselves, but what really
put us off is that even if your whole cluster fails, the only support method
you get is a forum. You have to purchase a support package for the whole of
Elasticsearch at some unspecified price just to get proper support for the
hosted service itself. We don't want support for using Elasticsearch, just for
the hosted service we're paying for.

~~~
jdc0589
the "unspecified" price is a lot. I think they reworked their pricing
structure sometime around ElastiCon this year, but prior to that a
subscription (which includes support, and all their products; only way they
sold stuff then) was ~$5k /server/year.

I think it may be cheaper now though.

~~~
padelt
Got a price slightly above that in EUR. A cluster of 1 to 5 nodes should go
for ~24.000€ (customer would've been SMB non-IT enduser company). My gut
reaction was: Insane. They do include infrastructure consulting, support and
the premium plugins as part of that. So at least if you can grab lots of
consulting that way I can see where they are going. Then again, needing lots
of consulting is a problem in itself. And I guess operating Elasticsearch is
not for the faint of heart anyway...

~~~
beachstartup
everyone wants competent 24/7 support, but nobody wants to pay for it.

smart people cost a lot of money, having them on call at any time of the day
or night costs even more money.

~~~
robalfonso
Agreed, however, their model should include a base level of direct support for
straightforward things, like "my cluster is unresponsive". Right now that's a
forum post, which seems like a really poor experience.

~~~
beachstartup
"my cluster is unresponsive" is not straightforward at all. in fact it's the
vaguest, most un-straightforward kind of problem you could have.

one of the interview questions i ask is "a customer calls in and said their
system is unresponsive. what do you do?"

this will weed out a bullshitter 99% of the time, because the potential
problem domain is the entire OSI stack, across the open internet, and also
could be potentially on the customer side, which is most likely also a complex
setup.

if you want someone who can _actually_ fix the problem to help you, it's going
to cost a lot of money. one of these people is six figures. now imagine a
whole team of them, 3 shifts a day, every single day, forever.

~~~
robalfonso
You miss my point; I simply mean a forum post is not ideal for support. In my
case I would want a ticket, something concrete I know can be sent and seen. A
forum does not engender confidence that my issue will be seen, and I would not
want to put private info into a forum.

Also, I gave a poor example. I run an ES cluster, so my thinking is that by
the point my cluster is unavailable, I know it's all red and that the problem
is not between my service and the endpoint.

~~~
beachstartup
anything free or included will be abused by a certain % of the customer base.

if you feel that strongly about it, buy the support.

------
willejs
One of the main issues I had with it was that getting the IAM policies correct
on the cluster was an extremely slow process. Each change requires a rolling
cluster update, replacing the underlying EC2 instances with the correct roles.

Secondly, access control sucks. You can only whitelist by external IP, and you
can't access Elasticsearch from your VPCs directly, as the instances sit
outside your VPC.

I abandoned it, wrote some Chef and Terraform, and had a much more stable and
flexible setup, albeit with increased management overhead. If it works for
you, that's cool, but there are caveats, so beware.

~~~
simonvdv
We ran into the access control limitations as well. They are caused by the
fact that for some reason AWS ES only supports resource-based policies, which
is IMHO the wrong way around to manage your policies.

We did get it to work in a usable manner by having the ES policy apply to a
role (i.e. the principal is a role). If you then apply that role to your
instances, it will work with instance-profile-based auth.
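
For anyone trying the same thing, here is a rough sketch of what such a resource-based policy could look like, built as a Python dict. The account ID, role name and domain name are made-up placeholders, and `es:*` is broader than you would probably want in production; treat it as an illustration of "the principal is a role", not as the exact policy we used.

```python
import json


def role_access_policy(role_arn: str, domain_arn: str) -> dict:
    """Build a resource-based policy that lets a single IAM role call the domain."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal is the role that the EC2 instance profile carries.
                "Principal": {"AWS": role_arn},
                # Assumption: allow every ES action; narrow this for real use.
                "Action": "es:*",
                "Resource": f"{domain_arn}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs, not real resources.
    policy = role_access_policy(
        "arn:aws:iam::123456789012:role/es-client",
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    )
    print(json.dumps(policy, indent=2))
```

Requests from instances carrying that role still have to be signed with the role's credentials for the policy to match.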

~~~
willejs
That's exactly what I was doing with the Elasticsearch plugin for Logstash, but
I still couldn't figure out how to auth vanilla HTTP requests to it from
Kibana or the like... Then I decided I'd wasted way too much time on this, and
would just build it myself. Other services such as Bonsai support basic auth,
which I would have almost preferred :/

~~~
steeef
I was able to use the code mentioned in this AWS forum post to configure a
proxy using node.js:
[https://forums.aws.amazon.com/thread.jspa?threadID=218214](https://forums.aws.amazon.com/thread.jspa?threadID=218214)

Code:
[https://gist.github.com/nakedible-p/ad95dfb1c16e75af1ad5](https://gist.github.com/nakedible-p/ad95dfb1c16e75af1ad5)

Looks like it's been turned into an NPM-installable module too:
[https://github.com/santthosh/aws-es-kibana](https://github.com/santthosh/aws-es-kibana)
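
For context on what a signing proxy like that has to do: IAM-authenticated requests to the domain need an AWS Signature Version 4 `Authorization` header. Below is a minimal standard-library sketch of signing a body-less GET. It is not the code from the gist; the function names are my own, and it assumes a plain path with no query string and long-lived credentials (temporary instance-profile credentials would also need an `X-Amz-Security-Token` header).

```python
import datetime
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def signing_key(secret_key: str, date_stamp: str, region: str, service: str = "es") -> bytes:
    """Derive the SigV4 signing key: date -> region -> service -> 'aws4_request'."""
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")


def sign_get(host: str, path: str, region: str, access_key: str,
             secret_key: str, now: datetime.datetime) -> dict:
    """Return the headers for a signed GET with an empty body and no query string."""
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    # method, URI, (empty) query string, headers, signed header list, payload hash
    canonical_request = "\n".join(
        ["GET", path, "", canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/es/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    key = signing_key(secret_key, date_stamp, region)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    authorization = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return {"Host": host, "X-Amz-Date": amz_date, "Authorization": authorization}


if __name__ == "__main__":
    # Placeholder host and credentials, for illustration only.
    headers = sign_get(
        "search-example.us-east-1.es.amazonaws.com", "/_cluster/health",
        "us-east-1", "AKIDEXAMPLE", "not-a-real-secret",
        datetime.datetime.now(datetime.timezone.utc),
    )
    print(headers["Authorization"])
```

A proxy sitting in front of Kibana would do this per request, including the query string and body hash that this sketch leaves out.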

------
aaronkrolik
Another annoying point: the documentation only lists the API methods you _can_
call, but not the ones you cannot. I was a bit irritated to find out that the
/_update method is not supported on AWS ESS.

------
skywhopper
I hear 2.3 support is coming in the next few weeks.

Being stuck on an old version is painful, but if your needs are simple then
it's very useful. Ultimately, if your needs are not simple then you probably
should be running it yourself, just like anything else. Managed hosting for
any database is never going to be as customizable as you need if you operate
at a huge scale.

~~~
donretag
My sources tell me it is version 2.2 at the end of June (now) or July.

~~~
dcosson
This intel is like gold, where'd you hear it from?

The lack of transparency is my least favorite thing about AWS. I see the value
in keeping brand new services or huge feature additions secret until they're
released. But for small issues like when they'll update elasticsearch version
or when they'll fix an acknowledged bug, I don't get why they can't share that
information with customers (e.g. the way many open source projects track next
version milestones on github).

Is it any different at the Enterprise support level?

~~~
skywhopper
If you pay for top-level support and spend a lot of money on the services you
can get some information ahead of time, but it's all strictly under NDA. The
more money you spend the more open the teams are willing to be, but they are
understandably cagey about committing to timelines until features are
imminent.

------
jdubs
It's a crappy service. When the cluster autoscales to deal with load the
cluster becomes unavailable for a minute or two while it restarts. In my
previous role we had a very large cloud search cluster that was always under
high load and very frequently threw 500s when the environment scaled up.

~~~
AznHisoka
Have you ever managed your own non-trivial Elasticsearch cluster personally?
Elasticsearch is filled with so many subtle gotchas that aren't present in
other non-distributed databases.

I personally feel once you've managed your own ES cluster for awhile, you tell
yourself there's no freaking way any cloud system can abstract all these ugly
(but important) configuration settings and corner cases into a neat, nice
cloud product. No freaking way. I'm gonna manage this myself.

------
freeman478
The main issue with this service is that it is very outdated: it only
supports ES 1.5.2, when the latest is 2.3.3 and ES 5 is just around the
corner.

The recent versions are far more stable and include many improvements in
memory usage.

~~~
vacri
1.5.2 is only 15 months old, which isn't all that long in 'enterprise time'.

~~~
happyslobro
Forget enterprise time, it's about to be two major versions old in major
version time, once v5 is official. Not sure what happened to v3 and v4, I
think Elastic skipped them to get all of their products aligned to the same
version.

Elastic says they don't backport fixes either, so in this case, it might be a
good idea to stay near the leading edge of the release cycle.

------
cheald
Elasticsearch is one of the easiest clustered databases to administrate, and
it's a lot cheaper to just run your own EC2 instances with ES than it is to
use the managed ES service.

A roll-your-own deployment is basically "install Elasticsearch, install
cloud-aws, configure cloud-aws with a security group ID to identify other
machines in the cluster, add that security group to all machines you want to
cluster together, start the service". Totally braindead-easy.
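
As a rough illustration of the "configure cloud-aws" step, the sketch below renders the handful of `elasticsearch.yml` lines involved. The setting names (`cloud.aws.region`, `discovery.type`, `discovery.ec2.groups`) are the ones I'd expect for the cloud-aws plugin on ES 1.x/2.x; the cluster name, region and security group ID are placeholders, and the helper function itself is hypothetical.

```python
def render_ec2_discovery_config(cluster_name: str, region: str, security_group: str) -> str:
    """Render elasticsearch.yml lines for EC2 discovery via the cloud-aws plugin."""
    settings = {
        "cluster.name": cluster_name,
        "cloud.aws.region": region,
        # Discover peers through the EC2 API instead of multicast/unicast lists.
        "discovery.type": "ec2",
        # Only instances in this security group are treated as cluster members.
        "discovery.ec2.groups": security_group,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    # Placeholder values, not a real cluster.
    print(render_ec2_discovery_config("example-cluster", "us-east-1", "sg-0123abcd"), end="")
```

The instances also need permission to call the EC2 describe APIs (for example through their instance role) for discovery to find anything.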

~~~
fizx
> Elasticsearch is one of the easiest clustered databases to administrate

You seem to have installation confused with administration.

Off the top of my head you forgot security, monitoring, logging config,
backups, handling common production issues such as splitbrains, write
multiplication, garbage collection snafus, upgrades between versions with
questionably compatible internal apis.

Running any database in the cloud is non-trivial, and running Elasticsearch
effectively in the cloud is harder than running Zookeeper or Cassandra, for
example.

Source: I founded and sold an ElasticSearch hosting company.

~~~
cheald
Much of that isn't specific to Elasticsearch; security, backups, logging
config, etc is all going to be extremely common across just about every
similar product. I didn't say that Elasticsearch was "trivial", I said that
it's one of the easiest clustered DBs to administrate. Anything distributed
has a whole set of caveats that you have to be aware of and know how to
handle, but using managed ES vs hosting your own doesn't really change any of
that.

My point is that the value-add from AWS' managed ES product is much, much
smaller than it might be for other similar products because of the relative
ease of administrating ES.

~~~
AznHisoka
"easiest clustered DBs to administrate"

easy + clustered should never be in the same sentence.

------
Shizka
Is it a better alternative to set it up on EC2 instances?

~~~
derptacos
Would defeat the purpose, managing it yourself could cost substantially more
in $ + time.

~~~
elssar
In terms of AWS costs, running your own cluster will cost significantly less,
especially if you use reserved instances. The AWS ES instances cost ~1.6x more
than their regular counterparts. Also, you can't reserve ES instances.

~~~
njovin
That's if you have a qualified sysadmin on-staff to manage it. We tried
managing our own ES setup for a little over a year before giving up. Despite
our simple needs and low traffic volume, we regularly had difficult-to-diagnose
issues with memory usage, node interconnectivity, and performance. With 3 EC2
nodes in a cluster, we would occasionally have the cluster health go to yellow
and remain in a 'recovering' state indefinitely, bringing performance to a
halt, and we'd essentially have to rebuild and swap the cluster.
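
To make "stuck recovering" a bit more concrete: one way to spot it is to poll `GET /_cluster/health` and watch whether the initializing and unassigned shard counts ever go down. The sketch below only does the comparison on already-fetched responses; the field names are from the cluster health API, while the function names and the "no progress" rule are my own simplification.

```python
from typing import Dict, List


def pending_shards(health: Dict) -> int:
    """Shards that are not yet started according to a _cluster/health response."""
    return health.get("initializing_shards", 0) + health.get("unassigned_shards", 0)


def recovery_stalled(samples: List[Dict]) -> bool:
    """True if the cluster never went green and made no progress across the samples.

    `samples` are _cluster/health responses in the order they were polled.
    """
    if len(samples) < 2:
        return False  # not enough data to judge
    if any(sample.get("status") == "green" for sample in samples):
        return False
    return pending_shards(samples[-1]) >= pending_shards(samples[0])


if __name__ == "__main__":
    # Made-up responses for illustration, trimmed to the fields used here.
    polled = [
        {"status": "yellow", "initializing_shards": 2, "unassigned_shards": 3},
        {"status": "yellow", "initializing_shards": 2, "unassigned_shards": 3},
    ]
    print("stalled" if recovery_stalled(polled) else "progressing")
```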

Granted, none of us were proper ES admins, but we had a lot of experience
working with system administration and specifically database performance and
clustering. Despite that we were definitely in over our heads with ES.

~~~
toomuchtodo
Qualified sysadmin/devops here: You can run a small Elasticsearch cluster (<9
nodes) without a dedicated ops person if necessary. Run with more than 3
nodes, ensure you have a proper number of index shards/replicas, and ALWAYS
use an odd number of nodes.

Some overprovisioning will be required, but with the extra infra spend you're
delaying the need for a dedicated role to manage it.

~~~
techdragon
Software I have to 3x over provision to keep alive is crappy software.

ElasticSearch is by far one of the most obnoxious software programs I ever
have the misfortune to administer. I now avoid self hosting it at all costs.

~~~
cheald
You don't "3x overprovision" Elasticsearch in particular; you should always
provision a minimum of 3 nodes for any highly-available clustered DB. With 1
node, if it fails, you're down. With 2 nodes, you can develop a partition or
lose a node, and neither node can elect itself master, so you're down. With 3
nodes, you can lose a node or develop a partition, and the other 2 nodes can
reach a quorum on a new master and continue operating.
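
The arithmetic behind that is just majority voting. A quick sketch (function names are mine): a quorum is more than half of the master-eligible nodes, which is also the value usually recommended for `discovery.zen.minimum_master_nodes` on ES 1.x/2.x.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority can still elect a master."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


if __name__ == "__main__":
    for nodes in range(1, 6):
        print(f"{nodes} node(s): quorum={quorum(nodes)}, "
              f"can lose {tolerated_failures(nodes)}")
```

Going from 3 to 4 nodes raises the quorum without letting you lose any more nodes, which is the usual argument for odd cluster sizes.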

~~~
njovin
This is how it should work, in theory. However, in practice, with ES whenever
a single node went down the whole cluster would fail. Trying to add a new
member to the cluster never worked, nor did trying to recover the failed node,
hence the cluster-swap.

It doesn't help that their logging (at least pre v2) is incredibly dense.

~~~
cheald
ES clusters will fail on the loss of a single node if you aren't running any
replicas on your shards, but that's not really ES's fault. I've occasionally
had a node just wig out and need to be restarted, but that's like a
once-a-year thing, and I'm working on a ~TB cluster that processes a
ridiculous number of writes - this isn't an underworked cluster by any means.
As long as your cluster discovery mechanism is set up properly, adding and
removing nodes from the cluster is about as easy as it gets. I'm certainly not
saying that your experience wasn't valid, but my own experience with it has
been that it's remarkably easy to manage.
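
To illustrate the replica point, here is a small sketch of the index settings body involved and a deliberately simplified check of whether every shard still has a copy after one data node dies. `number_of_shards` and `number_of_replicas` are real index settings; the helper names and the shard counts are just for illustration, and the check ignores things like disk watermarks or allocation filtering.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Request body for creating an index with the given shard/replica counts."""
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        }
    }


def survives_single_node_loss(replicas: int, data_nodes: int) -> bool:
    """Simplified: ES does not place a primary and its replica on the same node,
    so one replica on at least two data nodes leaves a copy of every shard."""
    return replicas >= 1 and data_nodes >= 2


if __name__ == "__main__":
    print(json.dumps(index_settings(shards=5, replicas=1)))
    print(survives_single_node_loss(replicas=0, data_nodes=3))
    print(survives_single_node_loss(replicas=1, data_nodes=3))
```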

ES was pretty brittle in the pre-1.x days, but from 1.0 onward it's quite easy
to work with. The logging is dense, but that's because it's thorough - a
feature I really quite appreciate.

------
trungonnews
Why are they still stuck on 1.5?

~~~
notyourwork
Why are you implying they are stuck and cannot move forward? The question
should say "Why have they not upgraded to a newer version?". Stuck implies
something is prohibiting them from doing so, when in reality it could be a
voluntary choice they have made.

