
Service Discovery: An Amazon ECS Reference Architecture - kiyanwang
https://aws.amazon.com/blogs/compute/service-discovery-an-amazon-ecs-reference-architecture/
======
josefdlange
This is going to come off as incredibly dense, but if the upstream services'
(goodreads, etc.) DNS names are statically configured in the app, why do we
need all this glue? Can't we just point a DNS record at an LB and add/remove
instances from the LB, making it entirely opaque to the downstream service
(portal)?

I'm not seeing what this buys. Maybe I'm dense. Anyone care to help me
understand the value here?

Looking at the code further, it looks like there are failsafe hardcoded DNS
names, but otherwise the DNS names for an upstream service are delivered via
an ENV variable. I don't get what's happening here that's magical.
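The pattern being described is presumably something like the following (variable and host names here are hypothetical, just to illustrate env-var config with a hardcoded failsafe):

```python
import os

# Upstream service endpoint: taken from the environment when the deploy
# tooling injects it, with a hardcoded DNS name as the failsafe fallback.
GOODREADS_HOST = os.environ.get(
    "GOODREADS_SERVICE_HOST",     # injected by the deploy tooling, if present
    "goodreads.internal.example",  # failsafe hardcoded DNS name
)
```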

~~~
josefdlange
Is it that standing up a new micro service will have its DNS automagically
created? Is that really all this buys? Some predictable automated DNS
management?

------
scg
All you get from this is a pretty CNAME alias for your ELB hostname. Since
you're (manually) creating an ELB for each service, you might as well copy &
paste the ELB hostname to your app's config.

I wish Amazon did more to improve their ELB offering:

1) A single ELB costs $18/mo, regardless of the number of backend hosts. This
might be OK if you're using ELB to front HTTP traffic from users, but it's
crazy expensive for internal service discovery.

If your app has 5 micro-services, that's $90/month just for having a basic
mechanism to access your ECS services.

2) ELB requires all backend hosts to expose the same port for load balancing.
You can't balance to the same EC2 instance on different ports.

3) As a consequence of (2), you can't have multiple containers on the same EC2
host behind a load balancer.

Many popular programming languages have a global interpreter lock; you usually
have to spawn multiple processes to make use of more CPU cores. People do that
with things like gunicorn inside the container or with multiple containers + a
haproxy load balancing container. It would be so much easier if ELB did all
this instead.
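For that GIL workaround, the usual in-container setup is a tiny gunicorn config forking one process per core behind a single port (a sketch; the 2×cores+1 worker count is gunicorn's own documented rule of thumb):

```python
# gunicorn.conf.py -- fork enough worker processes to use every core behind
# one port, so a single container (and a single ELB backend port) can
# saturate the host despite the GIL.
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # gunicorn's suggested default
```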

4) ELB doesn't support URL maps (i.e. different sets of backend hosts for
different URL paths). The Google Cloud Platform load balancer supports this,
and it's tremendously useful.

"""[...] without the need for “sidecar” containers or expensive code
change."""

People use "sidecar" and "ambassador" containers precisely because ELB is
lacking functionality. Improving ELB will go a long way towards making service
discovery easier on AWS.

------
dberg
DNS is not always a great solution if there are multiple A records. We wrote
about a method for doing this with iptables and Consul here:

[https://tech.iheart.com/load-balancing-services-using-consul...](https://tech.iheart.com/load-balancing-services-using-consul-and-iptables-204cde23b072#.map38p2am)

~~~
spydum
I'm confused about why, in your article, you say the setup doesn't use a load
balancer. How is dynamic reconfiguration of iptables not the same as a load
balancer? And what happens when your iptables box dies?

~~~
dberg
Because iptables runs locally on each server. So you access localhost:8000,
and iptables load-balances to whichever nodes Consul currently has. There is
no central iptables server.
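A minimal sketch of the kind of rules such a setup installs (backend addresses below are hypothetical stand-ins for Consul-discovered nodes, and this only builds the command strings): for n backends, rule i is tried in order and matches with probability 1/(n-i), which works out to a uniform 1/n split, and each matching rule DNATs localhost traffic to one backend.

```python
def iptables_dnat_rules(backends, listen_port=8000):
    """Emit iptables NAT rules that spread connections to
    localhost:listen_port evenly across "ip:port" backends.

    Rule i is evaluated in order and matches with probability 1/(n-i),
    so every backend ends up receiving 1/n of the traffic overall."""
    rules = []
    n = len(backends)
    for i, backend in enumerate(backends):
        prob = 1.0 / (n - i)
        rules.append(
            "iptables -t nat -A OUTPUT -p tcp -d 127.0.0.1 "
            f"--dport {listen_port} "
            f"-m statistic --mode random --probability {prob:.6f} "
            f"-j DNAT --to-destination {backend}"
        )
    return rules

# Hypothetical Consul-discovered nodes:
for rule in iptables_dnat_rules(
    ["10.0.1.5:32768", "10.0.2.9:32768", "10.0.3.4:32768"]
):
    print(rule)
```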

~~~
spydum
Ah, somehow that wasn't so clear. Clever idea, but how do you distribute load
across your front-end servers (or whatever your actual public-facing service
is)? Do you still rely on a load balancer or round-robin DNS there? I guess
I'd be concerned about how to scale those instances in response to demand.

------
jonhohle
Amazon had/has an internal ActiveMQ implementation based on this architecture.
When it worked, it worked fine. It was nice to be able to find all active
hosts using any DNS tool of your choosing, and applications didn't need to
have any client config. However, when it failed, it meant debugging DNS, which
few engineers were prepared to think about or do at 2am.

------
jamescun
I am surprised to see them advocating Route53 for service discovery. Route53
has a very low API rate limit (5 req/s), where simply loading the web
dashboard can affect production use.

~~~
koolba
Where do you see such a rate limit?

The only limits I've seen for Route53 are the number of hosted zones (50 by
default?) and I don't see why that would be an issue. You'd only need one zone
per app environment (dev/qa/prod).

~~~
cthalupa
[http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNS...](http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/DNSLimitations.html)

>All requests – Five requests per second per AWS account. If you submit more
than five requests per second, Amazon Route 53 returns an HTTP 400 error (Bad
request). The response header also includes a Code element with a value of
Throttling and a Message element with a value of Rate exceeded.

Though the general AWS recommendation for API limits is implementing
exponential backoff + jitter.

[http://docs.aws.amazon.com/general/latest/gr/api-retries.htm...](http://docs.aws.amazon.com/general/latest/gr/api-retries.html)
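That backoff-with-jitter pattern is easy to sketch (a minimal illustration, not AWS SDK code; this is the "full jitter" variant, which sleeps a random amount up to an exponentially growing cap):

```python
import random
import time

def backoff_with_jitter(attempt, base=0.1, cap=20.0):
    """Sleep time (seconds) before retry number `attempt`, using
    "full jitter": uniform over [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(fn, is_throttled, max_attempts=5):
    """Call fn(), retrying with jittered exponential backoff whenever
    is_throttled(exception) says the failure was a rate-limit error."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            if not is_throttled(exc) or attempt == max_attempts - 1:
                raise
            time.sleep(backoff_with_jitter(attempt))
```

The jitter matters: without it, every throttled client retries on the same schedule and the 400s just come back in lockstep.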

~~~
koolba
Interesting. I didn't know about that limit.

I suppose you could take it a bit further by batching requests (say, up to 100
changes per request), but that would complicate distributed apps where the
requests aren't coming from one coordinator.
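The batching is easy to sketch (a boto3-shaped ChangeResourceRecordSets payload; the zone ID and records below are hypothetical, and nothing is actually sent to AWS here):

```python
def upsert_batch(records, zone_id, ttl=60):
    """Build one Route53 ChangeResourceRecordSets payload that UPSERTs
    many A records in a single API call -- one request against the
    throttled limit, instead of one request per record."""
    changes = [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip} for ip in ips],
            },
        }
        for name, ips in records.items()
    ]
    return {"HostedZoneId": zone_id, "ChangeBatch": {"Changes": changes}}

# Hypothetical service records; hand the result to
# boto3.client("route53").change_resource_record_sets(**payload)
payload = upsert_batch(
    {"goodreads.internal.example.": ["10.0.1.5"],
     "portal.internal.example.": ["10.0.2.9", "10.0.3.4"]},
    zone_id="ZHYPOTHETICAL",
)
```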

------
azinman2
Doesn't this then give a map to all your internal services for the world to
publicly discover?

~~~
luhn
Route53 supports private DNS. [https://aws.amazon.com/about-aws/whats-new/2014/11/05/amazon...](https://aws.amazon.com/about-aws/whats-new/2014/11/05/amazon-route-53-now-supports-private-dns-with-amazon-vpc/)

~~~
azinman2
Awesome, didn't know that.

