
Microservice Discovery with SkyDNS - fastest963
http://blog.getadmiral.com/communication-between-microservices/
======
jolynch
I'm excited about all the buzz around service discovery; it's often one
of the most crucial design decisions in a service-oriented architecture. I'm
really bummed, however, that so many discovery systems rely on clients doing
DNS lookups. I vastly prefer a system like Airbnb/Yelp SmartStack,
consul-template, or really any system that reflects discovery state in real
load balancers such as HAProxy, nginx, or even those newfangled service load
balancers like vulcand.

Honestly just don't do DNS. Why?

1\. Performance

Service discovery should not happen during the life of an HTTP request. The
number of times I've had to debug a performance problem that ultimately
turned out to be DNS requests taking around 100ms (which for most internal
applications is too damn long) is staggering to me. Yes, you can mitigate this,
but it is not easy because even a local DNS cache has to miss sometimes.

2\. DNS propagation sucks.

This article talks about one side of the DNS propagation problem ("it's down"
-> removed from DNS), but it seems to completely ignore the other side
(removed from DNS -> clients stop talking to it). At any kind of reasonable
engineering scale, one of your libraries (watch out for ones that cache DNS
"for performance"), programming languages (Java), or infra tools
(nginx/HAProxy) is not going to re-resolve DNS the way you expect.

3\. More Performance

Client-side load balancing is legitimately hard, and if you're just doing
round robin over DNS records you are likely screwing your high percentiles.
HAProxy, nginx, vulcand, etc. all support better strategies, such as
least-connection load balancing, that really make a difference here. This can
be the difference between a 500ms 99th percentile and a 50ms 99th percentile.
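
To illustrate the point about balancing strategies, here is a minimal Python sketch contrasting round robin with least-connections selection. The backend addresses and the `in_flight` counts are made up for illustration, and this is not how HAProxy or nginx implement it internally; it only shows why a strategy that looks at current load can avoid a backend that is stuck behind slow requests.

```python
from itertools import cycle

def round_robin(backends):
    """Return an iterator that rotates through backends in a fixed order,
    ignoring how busy each one currently is."""
    return cycle(backends)

def least_connections(in_flight):
    """Return the backend with the fewest in-flight requests.

    in_flight maps a backend address to its current open request count.
    """
    return min(in_flight, key=in_flight.get)

# Hypothetical snapshot: the first backend is stuck behind slow requests.
in_flight = {"10.0.0.1:8080": 12, "10.0.0.2:8080": 3, "10.0.0.3:8080": 7}

rr = round_robin(list(in_flight))
rr_choice = next(rr)                      # first in rotation, busy or not
lc_choice = least_connections(in_flight)  # the least-loaded backend
print(rr_choice, lc_choice)
```

Round robin sends the next request to the overloaded backend simply because it is next in the rotation, which is the kind of behavior that tends to show up in the tail latencies.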

~~~
fastest963
These are very good points. I'll be honest, I don't have a lot of experience
with SmartStack, but I'll be reviewing it.

As for caching, ideally you'd have none; however, certain optimizations can
be made on this front, such as coalescing duplicate in-flight queries.
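
Coalescing duplicate in-flight queries could look roughly like the sketch below. This is an illustration only, not code from SkyDNS or its client libraries: `CoalescingResolver` and the injected `lookup` callable are hypothetical names. The first caller for a name performs the lookup; callers that arrive while it is still running wait for that result instead of issuing their own query. Nothing is cached once the lookup completes.

```python
import threading

class CoalescingResolver:
    """Share one in-flight lookup among concurrent callers for the same name."""

    def __init__(self, lookup):
        self._lookup = lookup          # hypothetical callable: name -> result
        self._lock = threading.Lock()
        self._pending = {}             # name -> state of the in-flight lookup

    def resolve(self, name):
        with self._lock:
            entry = self._pending.get(name)
            leader = entry is None
            if leader:
                entry = {"done": threading.Event(), "result": None, "error": None}
                self._pending[name] = entry

        if leader:
            try:
                entry["result"] = self._lookup(name)
            except Exception as exc:   # hand the failure to the waiters too
                entry["error"] = exc
            finally:
                # Forget the entry first so later calls start a fresh lookup,
                # then wake up everyone who was waiting on this one.
                with self._lock:
                    del self._pending[name]
                entry["done"].set()
        else:
            entry["done"].wait()

        if entry["error"] is not None:
            raise entry["error"]
        return entry["result"]
```

Because the entry is dropped as soon as the lookup finishes, a record that changes is picked up by the very next query; the only thing saved is the redundant work during a burst.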

Load balancing could be done via the SRV records, but that would be difficult
since it requires that you republish the SRV record. This could be done every
minute or so based on load but not on a per-request basis.
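
For reference, a client consuming SRV records usually picks a target along these lines. The sketch assumes the RFC 2782 semantics (lower priority values are tried first, weight is a relative share within one priority) but simplifies the RFC's ordering procedure to a single weighted random pick; the `SRV` tuple, `pick_target`, and the host names are illustrative and not part of any SkyDNS library.

```python
import random
from collections import namedtuple

# Illustrative container for the data fields of one SRV record.
SRV = namedtuple("SRV", "priority weight port target")

def pick_target(records, rng=random):
    """Pick one record: lowest priority value first, then weighted at random."""
    if not records:
        raise ValueError("no SRV records to choose from")
    lowest = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == lowest]
    weights = [r.weight for r in candidates]
    if sum(weights) == 0:
        # All weights are zero: treat the candidates as equally likely.
        return rng.choice(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical records: two primaries with a 3:1 weight split and one backup.
records = [
    SRV(10, 75, 8080, "web1.example.com."),
    SRV(10, 25, 8080, "web2.example.com."),
    SRV(20, 100, 8080, "backup.example.com."),
]
choice = pick_target(records)
print(choice.target, choice.port)
```

Shifting load then means republishing the records with different weights, which is why it works on the scale of minutes rather than per request.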

Thanks for your insightful comment!

------
lobster_johnson
SkyDNS is pretty nice, but I'm not sure I see a huge value in layering DNS on
top of etcd for microservices, unless you also standardize the SRV format
being used. Why not just use etcd directly?

After all, the records have a proprietary format/structure that no tool (outside
the SkyDNS sphere) knows about. So you've already invented a new protocol on
top of DNS. You're not able to leverage any existing tool/library out there
that resolves names via DNS, because they can't use your information. (SkyDNS
can of course be used to look up A/CNAME records, which I totally agree is
nice, but as more of a backwards-compatibility layer kind of thing.)

You do get some value from libc's name resolution, and you can institute a
policy across your apps that they resolve subdomains relative to the domains
listed in resolv.conf, so for one app to talk to another, they just look up
"fnord", which resolves to "fnord.dc-east.production.example.com" or whatever.
But I'm still not sure I see a lot of value over a client library that talks
directly to etcd. You're going to want a client library anyway, to deal with
those SRV records.
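
The short-name policy described above can be sketched as follows. It assumes a resolv.conf-style search list and the usual `ndots` default of 1; `candidate_names` is a hypothetical helper that only computes which fully qualified names a stub resolver would try and in what order, without performing any lookups.

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names to try, in order, for a lookup.

    name   -- the name the application asked for, e.g. "fnord"
    search -- search domains, as listed on the "search" line of resolv.conf
    ndots  -- names with at least this many dots are tried as-is first
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list applies.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]

print(candidate_names("fnord", ["dc-east.production.example.com"]))
```

With that search list, an app asking for "fnord" tries "fnord.dc-east.production.example.com." before the bare name.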

~~~
fastest963
You bring up good points. The big advantage I see is the ability to fall back
to standard tools if you don't want to use SRV. If I want to debug something, I
can just type "dig SRV myservice.mydomain" and see the results. You can also
use standard DNS libraries to do regular A lookups if you don't care
about priorities. For anyone who does, we've released a bunch of
libraries so they don't have to do anything extra to start using it.

I also wouldn't say that SRV records are proprietary, they're an official
record-type (that many DNS providers support).

Personally, I just don't see why anyone should be creating their own format
when some other established and compatible one exists. DNS is already widely
used for domain/website discovery, why should that not be extended to inter-
datacenter discovery? That's obviously rhetorical for this conversation, but
that's what we asked ourselves.

~~~
lobster_johnson
Ah, you're right. For some reason I was under the impression that the format
of an SRV record was specific to the protocol, but apparently it isn't. My
thinking was confused by the fact that SkyDNS apparently doesn't use the
standard name format, as specified by the RFC:

    _Service._Proto.Name TTL Class SRV Priority Weight Port Target

[1] [https://tools.ietf.org/html/rfc2782](https://tools.ietf.org/html/rfc2782)
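
As a rough illustration of that layout, the sketch below splits one SRV record in presentation format into its fields. The record line and the `parse_srv` helper are made up for the example; it assumes the owner name follows the `_Service._Proto.Name` convention and does not handle comments, omitted fields, or multi-line records.

```python
def parse_srv(line):
    """Split one SRV record line into its RFC 2782 fields."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    owner, ttl, rclass, rtype, priority, weight, port, target = fields
    if rtype.upper() != "SRV":
        raise ValueError(f"not an SRV record: {rtype}")
    parts = owner.split(".", 2)
    if len(parts) != 3 or not (parts[0].startswith("_") and parts[1].startswith("_")):
        raise ValueError(f"owner name is not _Service._Proto.Name: {owner}")
    service, proto, name = parts
    return {
        "service": service[1:],   # strip the leading underscore
        "proto": proto[1:],
        "name": name,
        "ttl": int(ttl),
        "class": rclass,
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "target": target,
    }

# Hypothetical record, written the way dig would print it.
record = parse_srv("_http._tcp.example.com. 60 IN SRV 10 5 8080 web1.example.com.")
print(record["service"], record["proto"], record["port"], record["target"])
```

A name that skips the underscore labels, such as "myservice.mydomain", is rejected by this parser, which is the mismatch with the RFC naming convention discussed here.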

~~~
bketelsen
SkyDNS supports whatever naming scheme you want to use. You set the names
yourself. That means it can be compliant with RFC 2782 if you choose, or the
names can be semantic, as in Kubernetes, where each position in the name
carries a meaning: "servicename.kubenamespace.domain".

Disclaimer: I invented SkyDNS.

------
errordeveloper
Well, adding a library to support SRV records is not simple... If you use
Weave [1], it gives every service instance a unique IP address and a DNS
record, all with zero configuration. That means you just run any service on
whatever default port it has, and it's all good. You can also have round-robin
load balancing through DNS for free.

[1]: [http://weave.works/net](http://weave.works/net)

~~~
mentat
The performance penalties seem pretty significant at least as of a few months
ago. Has that changed?

------
jedisct1
SkyDNS is a very useful tool, but the name is very confusing.

Every time I read "SkyDNS", I think it's about skydns.ru.

------
siliconc0w
IMO it's better to have etcd drive nginx/HAProxy via confd. This is flexible
and fast. While it could be a cool idea to push load balancing
logic entirely to the client to eliminate a load balancing tier altogether,
DNS isn't really designed for it.

------
kylequest
Definitely a cool idea! And it sort of works, but then if you need to remove
failed services quickly you need to use a non-DNS interface for it :)

~~~
fastest963
Failed services are automatically removed when the associated service "dies"
or closes its connection with skyapi. If you're using holdingpattern then it's
removed whenever the service stops. This is only because we wanted the removal
to be quick and automatic.

------
cjhanks
What's wrong with actual DNS?

~~~
fastest963
I'm not sure what you mean by "actual DNS", but I'm going to assume you mean
A records and not SRV records. A records are available for all services as
well with this setup; we just preferred to use SRV so we can get port
information as well as weighting. If you want to stick to just A records,
that's definitely an option.

~~~
KaiserPro
Well, an actual DNS server. What's wrong with authoritative DNS? Just use DNS
NOTIFY to update slaves on record change.

That way TTL is not a problem, unless you have SSSD or nslcd doing caching on
the client.

It'll remove a massive layer of complication, and will mean that you can have
redundant service discovery even if your etcd/SkyDNS fails.

It'll certainly scale much, much higher than SkyDNS. If you don't want to
run your own DNS, use Dyn. It has a REST interface and a very nice SLA. If you
query the servers directly, you also don't have to worry about TTLs.

~~~
errordeveloper
Right, if you know how to set it up or have infrastructure in place... But
most developers don't have time to read 600+ pages on how to run a BIND server,
or they simply have an ops team that is very conservative about what goes into
DNS land.

~~~
KaiserPro
Just buy it in? Dyn has a REST API.

Failing that, BIND is not difficult to learn. If you have an artificial divide
between your devs and you, then you have a far bigger issue. If you can't
convince them of the merit of using DNS then there really is no hope.

The whole point of DNS is that you can delegate subdomains, so you can neatly
isolate zones from each other.

Plus, saying something _looks_ hard is a terrible justification for not trying
it. I know BIND isn't trendy, but it works and is simple. Failing that,
there are at least 3 companies out there with REST APIs and 100% uptime SLAs.

prototype all the things!

