
Linkerd 1.0 - jfoutz
https://blog.buoyant.io/2017/04/25/announcing-linkerd-1.0/index.html
======
ninkendo
A lot of people are confused about why something like this is necessary, and
IMO it really comes down to one thing: sometimes the stuff you want to talk to
moves around.

As soon as you're not allowed to rely on the service you talk to staying on a
static set of machines (or even on a static set of ports) over time, using
standard HTTP libraries becomes really awkward. (Did this request fail because
an instance shut down and I should talk to another one? How do I convince my
DNS resolving library to ignore its cache and re-lookup the service? Ok I
found 3 instances, which one should I talk to? How do we prevent all the other
copies of me from spamming the same instance? Gee, it'd be really nice to have
something push a notification to _me_ when the services change!)

Not-so-coincidentally, this constraint is really common in "cloud native"
setups like Kubernetes and Mesos, where service instances come and go as part
of normal operation. This is because, in this world, you don't update your
code by copying new files to some servers and SIGHUP'ing some daemon; you update
your code by deploying new containers, waiting for them to be healthy, and
tearing down the old ones. This means "failure" is an expected part of the
normal software lifecycle.

Linkerd is one of many attempts to make "normal software" behave well in this
world, without relying on too much intelligence in the HTTP client libraries
(or other non-HTTP stuff, for that matter.)

In a perfect world, everything would respect and obey DNS SRV records. Oh if
only most client software knew what they were, and how to handle them! Right
there you can see which backends are available for a service, what ports
they're on, what their priorities are, and how long that information should be
considered valid. But alas, nothing really supports SRV records. So we need
something like this until client software becomes more intelligent (if ever.)
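For reference, the SRV selection rule from RFC 2782 (prefer the lowest
priority class, then pick weighted-random within it) is simple enough to
sketch. This is a toy illustration of the rule, not anyone's production
resolver, and the hostnames are made up:

```python
import random
from collections import namedtuple

# An SRV record carries everything a client needs: target host, port,
# priority (lower is preferred) and weight (relative share within a priority).
SRV = namedtuple("SRV", "priority weight target port")

def pick_backend(records):
    """Pick one backend per RFC 2782: the lowest priority class wins,
    then a weighted-random choice is made within that class."""
    best = min(r.priority for r in records)
    candidates = [r for r in records if r.priority == best]
    total = sum(r.weight for r in candidates)
    if total == 0:
        return random.choice(candidates)
    roll = random.uniform(0, total)
    for r in candidates:
        roll -= r.weight
        if roll <= 0:
            return r
    return candidates[-1]

records = [
    SRV(10, 60, "a.example.com", 8080),
    SRV(10, 40, "b.example.com", 8080),
    SRV(20, 0, "backup.example.com", 8080),  # only used if priority 10 is gone
]
backend = pick_backend(records)
```

Which is exactly the point above: priorities, weights, and ports all come for
free with SRV, if only clients bothered to use them.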

~~~
jmspring
Even if they did, the issue of caching DNS info will possibly still bite one.
Java apps are one specific example that comes to mind.

~~~
koolba
I suppose you could get around that with a custom DNS server. Something like
xip.io but resolving to the parent and allowing the prefix as a cache buster.

Ex: foo.some-delim.baz.bam.example.com would resolve to baz.bam.example.com
regardless of the foo prefix. The client could then iterate or randomly
generate values for foo to ensure fresh DNS lookups.
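The client side of that scheme is trivial; a sketch, assuming the hypothetical
wildcard server described above strips the random first label before
answering:

```python
import uuid

def cache_busted_name(base_domain, delim="some-delim"):
    """Build a query name with a random first label, so any cached answer
    for a previous nonce can't be reused by the resolver."""
    nonce = uuid.uuid4().hex[:8]
    return f"{nonce}.{delim}.{base_domain}"

# Each call produces a name the resolver has never seen, forcing a fresh
# lookup; the custom server would answer as if only the suffix were queried.
name = cache_busted_name("baz.bam.example.com")
```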

------
wyldfire
> linker∙d is a transparent proxy that adds service discovery, routing,
> failure handling, and visibility to modern software applications

One or more decades ago, solutions to these problems were proposed (and
solved?) by IETF
working groups. I could believe that linkerd offers some new features not
considered by those RFCs. But it would be nice if it were extensions of
existing protocols instead.

Having a transparent proxy could offer a level of convenient integration
rarely found among disparate internet protocols. Then again, there might be
really good reasons why collapsing too many features into one layer is a bad
idea.

~~~
jdc
Which RFCs did they propose the solutions in?

~~~
wyldfire
DNS-SD: RFC 6763

Service Location Protocol: RFC 2608

RSerPool: RFCs 5351-6

~~~
wyldfire
Also worth mentioning is DDS [1]. It's standardized by OMG (what's not to
love about that name?) not IETF.

It's a really interesting pub/sub style IPC, but really it's more like a
swiss-army knife. CORBA lovers (and haters) will recall IDL, which is used to
describe the interfaces. IMO the commercial implementations are more
feature-complete than the open source ones.

If you looked at CORBA in the 1990s and have a sour taste in your mouth, don't
hold it against DDS. It's a really great approach. It describes interfaces
with a dynamically negotiated Quality-of-Service among participants. The set
of reactions to QoS policy violations effectively create hooks for all of your
interesting cases.

IMO DDS works best when all of your participants share a broadcast domain, but
it's not necessary to set it up that way.

[1]
[https://en.wikipedia.org/wiki/Data_Distribution_Service](https://en.wikipedia.org/wiki/Data_Distribution_Service)

------
Gaelan
This post does a good job of telling me what linkerd _is_ early on. I don't
think a lot of projects realize posts like these are the first thing many
people see of the project. Maybe do a check for first visit and perhaps HN
referrer?

------
MeteorMarc
In dutch "linkerd" means a cunning person with an evil touch.

~~~
emmelaich
Interesting, Google translate gives 'gauche' which means clumsy/awkward. Which
makes sense since 'linker' means left and gauche is French for left.

It sounds like linkerd is closer to 'onhandig' - does it really imply an evil
touch? (sinister - there we go with the left hand thing again)

~~~
MeteorMarc
If you want to use Google translate, first expand with a dutch source and then
translate:
[http://www.encyclo.nl/begrip/linkerd](http://www.encyclo.nl/begrip/linkerd)

1) Fraud 2) Adult Person 3) Most common guy 4) Convenient guy 5) Linkmichel 6)
Nice handy guy 7) Beautiful guy 8) Smooth 9) Smarter 10) Slapped person 11)
False nature

I prefer my own translation

~~~
emmelaich
Thanks, I should have added I didn't mean to say mine was a better
translation.

Just musing out loud.

------
threesixandnine
'Industrial-strength operability for cloud-native applications' -- from the
website front page

What is a cloud native application?

~~~
cschmittiey
"The concept of the service mesh as a separate layer is tied to the rise of
the cloud native application. In the cloud native model, a single application
might consist of hundreds of services; each service might have thousands of
instances; and each of those instances might be in a constantly-changing state
as they are dynamically scheduled by an orchestrator like Kubernetes. Not only is
service communication in this world incredibly complex, it’s a pervasive and
fundamental part of runtime behavior. Managing it is vital to ensuring end-to-
end performance and reliability."

[https://blog.buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/](https://blog.buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/)

~~~
dominotw
The cloud native model combines the microservices approach of many small
services with two additional factors: containers (e.g. Docker), which provide
resource isolation and dependency management, and an orchestration layer (e.g.
Kubernetes), which abstracts away the underlying hardware into a homogenous
pool.

------
pbreit
Could someone explain this in plain english?

~~~
moondev
It's basically a configurable proxy for communication between services.
Kubernetes has the concept of "services" which allow you to declare a dns name
for service discovery, but it's fairly barebones.

By using linkerd, you manage and configure various things like load-balancing
strategies, retry windows, circuit breaking, and so on. Instead of doing these
things with a library inside the application like Hystrix, you move them into
their own layer. This also has the benefit of being language agnostic, so you
can leverage the same service concepts for any type of app. You can also run
linkerd outside of kubernetes.
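The circuit-breaking logic that would otherwise live in an application library
looks roughly like this. A toy sketch with made-up thresholds, not Linkerd's
(or Hystrix's) actual implementation:

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: after `threshold` consecutive failures the
    circuit opens and calls fail fast until `reset_after` seconds pass,
    at which point one probe request is allowed through ("half-open")."""
    def __init__(self, threshold=3, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a request only after the cooldown elapses.
        return time.monotonic() - self.opened_at >= self.reset_after

    def record(self, success):
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

cb = CircuitBreaker(threshold=2, reset_after=30.0)
cb.record(False)
cb.record(False)  # two consecutive failures: the circuit opens
```

Putting this in a sidecar proxy instead of the app means every language in
the fleet gets the same behavior for free.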

~~~
nwmcsween
Ok so why not DNS-sd with service monitoring to balance, etc?

~~~
moondev
That's what this does, and more. What you described are techniques; Linkerd is
an implementation. From
[https://blog.buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/](https://blog.buoyant.io/2017/04/25/whats-a-service-mesh-and-why-do-i-need-one/)

Linkerd applies dynamic routing rules to determine which service the requester
intended. Should the request be routed to a service in production or in
staging? To a service in a local datacenter or one in the cloud? To the most
recent version of a service that’s being tested or to an older one that’s been
vetted in production? All of these routing rules are dynamically configurable,
and can be applied both globally and for arbitrary slices of traffic. Having
found the correct destination, Linkerd retrieves the corresponding pool of
instances from the relevant service discovery endpoint, of which there may be
several. If this information diverges from what Linkerd has observed in
practice, Linkerd makes a decision about which source of information to trust.

Linkerd chooses the instance most likely to return a fast response based on a
variety of factors, including its observed latency for recent requests.
Linkerd attempts to send the request to the instance, recording the latency
and response type of the result.

If the instance is down, unresponsive, or fails to process the request,
Linkerd retries the request on another instance (but only if it knows the
request is idempotent). If an instance is consistently returning errors,
Linkerd evicts it from the load balancing pool, to be periodically retried
later (for example, an instance may be undergoing a transient failure).

If the deadline for the request has elapsed, Linkerd proactively fails the
request rather than adding load with further retries.

Linkerd captures every aspect of the above behavior in the form of metrics and
distributed tracing, which are emitted to a centralized metrics system.
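The "instance most likely to return a fast response" part can be approximated
by keeping an exponentially weighted moving average of each instance's
latency and sampling with power-of-two-choices. A toy sketch of that general
technique, not Linkerd's actual algorithm:

```python
import random

class Instance:
    """Track an exponentially weighted moving average of observed latency."""
    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha
        self.ewma = 0.0

    def observe(self, latency):
        # Blend the newest sample into the running estimate.
        self.ewma = self.alpha * latency + (1 - self.alpha) * self.ewma

def choose(instances):
    """Power-of-two-choices: sample two instances at random and take the
    one with the lower latency estimate. Cheap, and it avoids the herd
    effect of every client always picking the global minimum."""
    a, b = random.sample(instances, 2)
    return a if a.ewma <= b.ewma else b

pool = [Instance("a"), Instance("b"), Instance("c")]
pool[0].observe(0.005)
pool[1].observe(0.250)  # consistently slow instance
pool[2].observe(0.008)
fast = choose(pool)
```

With two candidates per pick, the slow instance only wins when it's sampled
against itself, so traffic drains away from it without any coordination.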

------
zanchey
[https://lwn.net/Articles/719282/](https://lwn.net/Articles/719282/) is a good
writeup on the features of linkerd in a reasonably easy-to-understand way.

------
blibble
could someone explain the purpose behind this software? nearly all of the
features it provides should belong in the HTTP client itself

it seems unlikely that you make your software faster or more reliable by
adding more hops

~~~
vr46
Well, we have a dozen+ services running in a container platform and when (a)
trying to present a uniform API to the outside world with versioning in the
URL, (b) managing internal service discovery and (c) doing A-B testing +
blue/green deployments, this gets tricky. Especially when you want to try and
be event-driven and monitor service lifecycle events.

We could test out Kong, Consul, and Fabio, and forget the event-driven
callbacks; we could use Route 53 and ELBs; or we could just use Linkerd to
drive this and get some pooling and monitoring features as well.

I think of it as both a layer on top of the services and a pool that the
services swim around in. It ensures there's one place to manage traffic, the
services don't have to care how to do service discovery for other services,
and it's all being load balanced at a reasonable price.

~~~
blibble
things like naming, load balancing and service discovery can be done by a well
managed dynamic dns system

monitoring / "circuit breakers" would probably require a proxy I guess

~~~
sagichmal
Speaking as someone who has successfully implemented service discovery and
load balancing using DNS and DNSSRV, let me emphatically state that these are
not the right tools for the job. They are slow, client support is so variable
it can only be called broken, and they're extremely difficult to rationalize
and debug.

~~~
blibble
as a counter example I've implemented it myself for non-trivial systems using
short TTL SRV records, and it's worked very well indeed

you can use existing tools to introspect state (e.g. dig), and with sufficient
logging it's perfectly debuggable

------
mrmrcoleman
Congratulations to the team!

------
nailer
Short version: a service discovery, management and communication platform.

Helps the instances doing X find and talk to the instances doing Y.

------
hestefisk
So these guys just reinvented a Frankenstein of DNS and UDDI. Nice.

------
jp_sc
I thought it was LinkedIn for nerds

------
weavie
Linkerd, namerd, etcd ... what does the d stand for in all these apps?

~~~
rejschaap
It stands for daemon. Which is a type of program that runs as a background
process.

------
sudeepj
Is it conceptually similar to an ESB
([https://en.wikipedia.org/wiki/Enterprise_service_bus](https://en.wikipedia.org/wiki/Enterprise_service_bus))?

------
dkarapetyan
> ... service mesh for cloud native applications

Need I say more? Paging Alan Kay to tell these folks what the origins of an
unnecessarily complicated mess look like.

~~~
moondev
Unnecessary and complicated for whom? If you don't need circuit breaking,
retry windows, metrics, ack/nack and advanced load balancing techniques for
your platform then you are not the target market. Nobody is forcing you to use
it.

~~~
fernandotakai
if you have a bunch of thrift servers, linkerd is a must. it really makes
handling thrift connections much easier.

