Heaven forbid you make a configuration change that triggers a blue-green deploym...

outworlder · on Oct 12, 2019

> Heaven forbid you make a configuration change that triggers a blue-green deployment

The main usability problem is that they don't tell you when that will be the case. It used to be the case when scaling up instances, which was completely surprising. It's no longer the case for scale-up, so it's improving.

We bitch and moan about Google labeling everything "beta" (or even "alpha"), but the beta moniker would be appropriate here.

bifrost · on Oct 11, 2019

> The workaround AWS support proposed was to reserve 2x capacity so we wouldn't run into this issue on subsequent deploys.

That's pretty lame of them to make you pay them more money instead of functioning properly.

lflux · on Oct 11, 2019

That's exactly what we told our TAM, SA and the AWS ES PM cc'd on that thread.

bifrost · on Oct 11, 2019

Did they ever give you an explanation as to why it happened?

lflux · on Oct 11, 2019

1. We changed a configuration parameter, which we thought would be a no-op but caused a deploy

2. We ran a fairly large cluster with a large number of i3.xlarge instances

3. One AZ in us-west-1 didn't have enough capacity of that SKU. The other AZ did, but this doesn't help if you have a multi-AZ deployment.

4. We switched to EBS-based instances after this

nostrebored · on Oct 11, 2019

Reach out to your TAM again to talk about CR and RI combinations which can help to mitigate this problem.

lflux · on Oct 11, 2019

Migrating to a different hosting platform than AWS for ES also mitigates this problem, which is more likely in our case.

SubuSS · on Oct 11, 2019

I am probably missing something here: If I understand what you're calling a blue/green deploy correctly, you essentially want the ability to run 2x capacity for at least a little time (deploy time). So why wouldn't you reserve 2x?

Or switch to an A/B-like deploy? (Deploy 5% or so, test against 5% from the original deploy, and decide on the future.)

coder543 · on Oct 11, 2019

> you essentially want the ability to run 2x capacity for at least a little time

This isn't the customer's choice. The customer does not want this.

As the article talks about, AWS Elasticsearch isn't actually elastic. On standard Elasticsearch, you can add and remove nodes at will and it will automatically handle rebalancing. AWS Elasticsearch can't do that. It has to spin up a new cluster of the desired size, copy everything over, and then turn off the old cluster. That is a form of blue/green deploy.

> So why wouldn't you reserve 2x?

Why would you want to pay double all the time because AWS can't use Elasticsearch correctly? AWS should foot the bill to ensure that everything works properly within their broken implementation when something requires the cluster to be duplicated and redeployed, not the customers.

> Or switch to an A/B-like deploy?

To reiterate: this isn't their choice. AWS forces this inefficient methodology on users of AWS Elasticsearch, which is why the article strongly recommends against using AWS Elasticsearch.

runamok · on Oct 12, 2019

Right. When instituting a change that would trigger an event like this, it should be trivial to check that the required instances are available.
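
A minimal sketch of such a pre-flight check, assuming a hypothetical inventory of spare capacity per availability zone. The function name, zone names, and numbers are illustrative and not an AWS API. A blue/green deploy needs a full second copy of the cluster in every zone, so a single short zone is enough to block it.

```python
def blue_green_shortfall(nodes_per_az, spare_per_az):
    """Return {az: missing_nodes} for zones that cannot host a second copy.

    Both arguments are hypothetical inputs: nodes_per_az is the current
    cluster layout, spare_per_az is the spare capacity for the same SKU.
    """
    shortfall = {}
    for az, needed in nodes_per_az.items():
        # A zone with no inventory entry is treated as having no spare capacity.
        missing = needed - spare_per_az.get(az, 0)
        if missing > 0:
            shortfall[az] = missing
    return shortfall


# Illustrative numbers only: one zone has room, the other does not.
cluster = {"az-a": 10, "az-b": 10}
spare = {"az-a": 25, "az-b": 4}
print(blue_green_shortfall(cluster, spare))  # {'az-b': 6}
```

An empty result would mean the deploy can proceed; anything else could be surfaced as a warning before the change is applied.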

henryfjordan · on Oct 11, 2019

Reserving 2x costs 2x.

Doing a blue/green deployment setup costs 1.003x, assuming a 5-minute switchover once a day. Being able to provision an extra server instance for a few minutes is kinda the whole point of moving to "the cloud".
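
The arithmetic behind that figure, as a quick sketch. It assumes billing is proportional to time and that capacity is doubled only for the duration of the switchover; the function name is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def cost_multiplier(switchover_minutes, deploys_per_day=1):
    """Daily cost relative to running a single copy of the cluster."""
    # Fraction of the day during which a second copy is also running.
    extra = switchover_minutes * deploys_per_day / MINUTES_PER_DAY
    return 1 + extra


print(round(cost_multiplier(5), 4))  # 1.0035
```

Permanently reserving a second copy corresponds to a multiplier of 2 under the same assumptions.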

SubuSS · on Oct 14, 2019

OK, I mixed up some GCP terms with AWS ones then. At least in GCP you can 'reserve' stuff that you want to use and pay only for what you actually use. I am assuming they cap the reserve; I have yet to run into a situation where the reserved cores were unavailable.

I remember doing capacity planning like that for Dynamo, but Elasticsearch might be different.