(Disclaimer: I work for AWS but opinions are my own. I also do not work with the...

Corrado · on Nov 29, 2020

This outage highlighted our dependency on Cognito. Everything else we are doing can (and probably should) be replicated to another region, which would resolve these types of issues.

However, Cognito is very region specific and there is currently no way to run in active-active or even in standby mode. The problem is user accounts; you can't sync them to another region and you can't back-up/restore them (with passwords). Until AWS comes up with some way to run Cognito in a cross-region fashion, we are pretty much stuck in a single region and vulnerable to this type of outage in the future.

otterley · on Nov 29, 2020

Please bring this to the attention of your account team! They will bring your feedback to the service team. While I can’t speak for the Cognito team, I can assure you they care deeply about customer satisfaction.

Corrado · on Nov 30, 2020

That's a great idea. I'm writing an email right now! :)

roman_sf · on Nov 29, 2020

What do you mean by cross-regional dependencies? Isn't running in multi-region setup is by itself adding dependency?

Speaking about multi-region services. What do you think about Google now offering all three major building pieces as multi-regional?

They have muti-regional buckets, LB with single anycast IP, document db (firebase). Pubsub can route automatically to nearest region. Nothing like this is available in amazon, well only DIY building blocks.

otterley · on Nov 29, 2020

If your workload can run in region B even if there is a serious failure of a service in region A, in which your workload normally runs, then no, you have not created a cross-regional dependency.

When I talk about cross regional dependency, I talk about an architectural decision that can lead to a cascading failure in region B, which is healthy by all accounts, when there is a failure in region A.

AWS has services that allow for regional replication and failover. DynamoDB, RDS, and S3 all offer cross region replication. And Global Accelerator provides an anycast IP that can front regional services and fail over in the event of an incident.

roman_sf · on Nov 29, 2020

I haven't used global accelerator but it doesn't look like the same. On landing page it says: "Your traffic routing is managed manually, or in console with endpoint traffic dials and weights".

otterley · on Nov 29, 2020

“Global Accelerator continuously monitors the health of all endpoints. When it determines that an active endpoint is unhealthy, Global Accelerator instantly begins directing traffic to another available endpoint. This allows you to create a high-availability architecture for your applications on AWS.”

https://docs.aws.amazon.com/global-accelerator/latest/dg/dis...

Alternatively, global load balancing with Route 53 remains a viable, mature option as well. Health checks and failover are fully supported.