I hate to be nit-picky, but I do feel that articles like this do more harm than good: they oversimplify a fairly advanced architectural pattern and downplay the testing, resiliency, and deployment challenges.
* easier to test - this is flat-out wrong. Your application or system logic is spread across multiple process boundaries; the only way to test it is to deploy the dependent services on your dev machine and test the whole set, or to design your services so that all the application logic can run in a single process. Think Spark and Spark workers, which move from a threaded to a distributed model through configuration. Application logic can be tested with this approach, but not necessarily system behaviour (which can only be simulated)
* rapid and flexible deployment models - in a large microservices fleet where code is managed by multiple teams and sits in several repositories, the dependencies are not explicit (I cannot do a "find usages" on an API call and see all the microservices that use it) - so deployments can be decoupled, but there tend to be lots of breakages unless you have sophisticated, well-thought-out testing (see the first point).
* resiliency - I'm not going to expand on this, because the 1st and 2nd points already allude to a brittle system and hence reduced resiliency. There are also data and transactional boundaries across services which need to be addressed too. To be fair, there is a hint at solving this problem through something like Kafka, but it isn't called out.
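The "single process for testing, distributed in production" idea mentioned in the first point can be sketched roughly like this (all names here are hypothetical, not from any real framework): keep the application logic pure, and let a thin transport layer decide whether calls cross a process boundary.

```python
def price_order(items):
    """Pure application logic -- no network involved, trivially unit-testable."""
    return sum(qty * unit_price for qty, unit_price in items)


class LocalTransport:
    """Test configuration: 'service calls' are plain in-process function calls."""
    def call(self, fn, *args):
        return fn(*args)


class RemoteTransport:
    """Production configuration: would serialize args and do a real RPC.
    Stubbed out here; a real version would use gRPC/HTTP."""
    def call(self, fn, *args):
        raise NotImplementedError("network call in production")


def handle_request(transport, items):
    # The caller doesn't know or care whether the call stays in-process.
    return transport.call(price_order, items)


# All application logic can be exercised in one process, Spark-style:
assert handle_request(LocalTransport(), [(2, 3.0), (1, 4.0)]) == 10.0
```

System behaviour (latency, partial failure) still has to be simulated separately, exactly as the point above notes; this only covers the application logic.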
Microservices can be simpler with good tooling; Kubernetes is only a very small part of it. A follow-up article clarifying what tooling can provide 1) better testing, 2) reliable deployments, and 3) greater resiliency might be worthwhile.
I'm not 100% convinced there's inherent technical value until you're running at the kind of scale the big hitters do, but by then you're also looking at the hosting solution as a whole, not just a deployment in a cloud. Docker and Kubernetes offer the illusion of being easy, but as soon as you start getting serious, they are anything but.
What it does do, for smaller businesses, is create a more-or-less 1:1 mapping between a team structure and a deployment pipeline. At the end of it you're distributing functions in a codebase and depending on the network for resiliency, as opposed to the language or the VM.
At the same time, the knowledge of these systems has great value because those skills are in demand now.
The biggest shortcoming right now is that Windows support for both Windows and Linux containers (LCOW) is really immature, and Windows containers in particular aren't ready yet. I'm dealing with this for some testing scenarios where containers are simpler than dedicated VMs.
* enables dev teams to do their own ops (mostly), instead of having a centralized ops team that needs to set things up and be on call when stuff breaks. This lets you scale up your organization: once a couple of teams are set up, you can simply keep adding more teams and replicate the same deployment pipeline for all of them
* trains development teams to do more ops-level work without getting lost in the weeds. They no longer need to worry about DNS, SSL certs, load balancers, etc.; they just use Kubernetes Ingress and Services. Of course they can dig deeper if they want to, but the defaults appear to be sensible enough for most
* promotes a cattle-vs-pets model and allows teams to iterate rapidly without worrying about breaking the system in unknown ways
* Manual Testing
* Automated Unit Testing
* Black Box Testing
* Smoke Testing
* Functional Systems Testing
It also can vary by application design and what tooling you have in place.
In a monolithic architecture where your only service boundaries are frontend-to-backend and backend-to-datastore, you still need some integration testing, but not a ton. If your backend is composed of a network of microservices, you need to integration-test all of those points and, most likely, all of the data access layers that each microservice has.
Structurally you can still use a monorepo for all your deployed functions so that they can even share some test functionality.
An application designed with microservices can be thought of as a monolith with unreliable network calls between the components. That is going to make most things more difficult not easier. Sure microservices might encourage you to design your application in a more modular way, but that's a bit of a strawman argument.
Second - would a mixed architecture solve my problems?
Third - would microservices solve my problem better?
Not every company has to be a Netflix or Zalando or Google...
I think we're finally at the point where people should stop jumping on buzzwords and do the hard work of going through the pros and cons of every solution.
I had to build a SaaS platform for hosting a business webapp, with the particularity that every client had their own separate codebase and database (no multitenancy). It was a simple Django-like stateless app in front of Postgres, and we had a low availability SLA (99.5%), so I thought it was simple: just deploy each app to one of a pool of servers, with single Nginx and Postgres instances on each server, then point the client's (sub)domain to it.
In the end, it worked fine, but since I couldn't find anyone doing the same, it meant I had to write a whole bunch of custom tooling for deployment, management, monitoring, backups, etc.
Were I starting now and decided on Kubernetes, the solution would be more complex and less efficient, but it would mean we would have one-click deployment and pretty dashboards and such in a couple of days instead of many weeks. And if we had to bring someone in, they wouldn't have to learn my custom system.
Buzzwords are a kind of poor-man's standards, they provide some sort of common ground among many people.
However, I would say that it's really early days for K8S and the ecosystem around it. As long as K8S does not try to solve every problem in the world and focuses on the problems it's designed to solve, things will get easier, and then maybe a 60-min video can do it some justice. ;-)
That doesn't mean microservices are bad. It is incredibly valuable to have a well-defined border between services, if you manage to draw the border right. And the extra planning that goes into thinking about where to draw the borders and how to design the interfaces is already a big win. They can really help you keep a feeling of freedom when it comes to changes, because every service is just a box with a manageable number of inputs and outputs. Or at least it should be..
They straight up have hundreds of pre-generated tests depending on what your functions input is, and you can run _every_ test locally.
There are pros and cons to both.
But when "easier testing" (from OP) is said to be an advantage of microservices specifically...I am not sure what to say.
I mean it is not like every test has to test the entire monolith. And it is not like testing each microservice in isolation is always going to be sufficient.
It is entirely orthogonal. Of course a well-designed microservice is easier to test than a monolithic ball of mud, but a well designed monolith is also easier to test than a poorly designed spaghetti of microservices.
It is as if some people think "good code" and "microservices" are synonyms. No. They are orthogonal.
Industry is always going in circles. Fads come and go.
In many ways, Cloud Functions are very similar to a horizontally scalable stateless monolith backend. When you break up services small enough, the "monolith" arises again as simply the sum of what you deploy in an organization.
I think the service boundary ends up being a very good place to inject "fakes", because that boundary is not artificial like it is when you fake out parts of a monolith. The RPC service has two methods and that is the only way _anything_ can interact with the real service, so faking out those two RPC calls lets you write focused tests easily.
Obviously you can have service boundaries in monolithic applications, but they are easy to ignore "just this once". By having an API boundary enforced in production, you avoid these problems (or the workarounds become more creative, but that's easier to say no to).
The average monolithic app sitting in production is quite difficult to test because no thought is given to internal APIs. Things rendering HTML make literal database queries, and so the only way to test things is to just run the whole monolith against a local database. That ends up being slow and flaky.
Basically, microservices forces code to do less. When code does less, it's easier to test. When any HTML page can write to your database, you have a mess on your hands. That is totally orthogonal to microservices, but microservices enforce your API contract in production, which I find valuable.
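The "fake out the two RPC methods" idea above can be sketched in a few lines (the service and its methods here are hypothetical examples, not from any real system): because the RPC contract is the only surface area, a fake implementing that contract is all a focused test needs.

```python
class UserServiceClient:
    """Hypothetical two-method RPC client; in production this would wrap
    real network calls (gRPC, REST, etc.)."""
    def get_user(self, user_id):
        raise NotImplementedError  # real implementation does a network call

    def update_email(self, user_id, email):
        raise NotImplementedError


class FakeUserService(UserServiceClient):
    """In-memory fake implementing the same two methods -- the *only* way
    anything can interact with the real service."""
    def __init__(self):
        self.users = {}

    def get_user(self, user_id):
        return self.users.get(user_id)

    def update_email(self, user_id, email):
        self.users[user_id] = {"id": user_id, "email": email}


def change_email(service, user_id, new_email):
    """Code under test: depends only on the RPC contract, not the wire."""
    service.update_email(user_id, new_email)
    return service.get_user(user_id)["email"]


fake = FakeUserService()
assert change_email(fake, 1, "a@example.com") == "a@example.com"
```

The same discipline is possible inside a monolith, of course; the difference claimed above is only that the production-enforced boundary makes the fake's contract trustworthy.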
Except now you have to cover cases like “the service is down”, “the service is too latent”, “the actual outputs from the service differ from the documentation”, “the service is behind a reverse proxy that mutates headers in a surprising way”, “you are behind a reverse proxy that mutates headers in a surprising way”, etc.
Latency is always going to be a problem and moving things to another computer certainly doesn't decrease it. Everything these days has pretty good support for observability; opentracing to inspect slow requests, prometheus to see how things are doing in general. You can get a handle on it and it doesn't cost much. My team is moving from a PHP monolith that has so much framework code that an empty HTTP response takes 100ms minimum to generate. None of our microservices are that slow, even when 3 or 4 backends and gRPC-web <-> http translation are involved. But it does set an upper bound and that's a reasonable concern to have.
Monolithic apps are not freed from latency; they read from disk, they talk to a database server, etc. So application developers already have this under control (or have filed it away in the "don't care" bucket); for example, every function in Go that does I/O probably takes a context. The context times out and it cancels your local operation just as easily as it cancels a remote operation. So I don't think this is a new concern, or one that people should be too afraid of, other than getting that last bit of performance out of the system.
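The point about one deadline mechanism covering local and remote I/O alike can be illustrated outside Go too. Here is a Python analogue (a sketch, not the Go context API): a single timeout bounds the operation regardless of whether the work behind it is a disk read, a DB query, or a remote RPC.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time


def slow_operation(duration):
    # Stands in for any I/O: reading a disk, querying a database,
    # or calling a remote service.
    time.sleep(duration)
    return "done"


def call_with_deadline(fn, *args, timeout):
    # One deadline mechanism bounds local and remote work alike,
    # roughly analogous to passing a context with a timeout in Go.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args)
        try:
            return future.result(timeout=timeout)
        except TimeoutError:
            return "deadline exceeded"


assert call_with_deadline(slow_operation, 0.01, timeout=1.0) == "done"
assert call_with_deadline(slow_operation, 0.5, timeout=0.05) == "deadline exceeded"
```

The caller never needs to know whether the timed-out operation was local or remote, which is exactly why this isn't a new concern introduced by microservices.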
As for proxies inside your cluster intercepting traffic to other pods and mutating headers in surprising ways... I recommend not running one of those. (Yes, those magical service meshes are some of those. If you don't know why you need one, I recommend living without one until you know you need one. It may be never!)
If you don't control your network, you won't have good luck talking to services on the network. It is orthogonal to microservices; you will be using the network more, so one that's bad will hurt you more. But in general, if you control your infrastructure and the nodes on it, you won't run into a "reverse proxy that mutates headers in a surprising way". If there is one of those, I recommend killing it rather than not splitting up logical services into separate jobs.
We also work hard to make Linkerd incremental. It doesn't "touch every TCP connection" so much as "touch every TCP connection that you explicitly tell it to".
Pods are not free either, at least not on Amazon. You get 18 pods per node (at least on t3.medium, which is what I use), and daemonsets quickly eat into that and make every additional node less useful as you increase cluster capacity. In a world where you're already running aws-node, kube-proxy, jaeger-agent, prometheus-node-exporter, and fluentd, you have to be judicious about the value of additional per-node services. I see the benefit in linkerd, but not all the extra stuff it comes with. Having an envoy cluster do gRPC load-balancing between services is enough; yes, you can't tcpdump the streams, it doesn't transparently add TLS, it doesn't configure itself through Kubernetes objects, and it doesn't quite insert the observability that linkerd does... but it does work well and comes with less tooling and resource cost.
However if the team is undisciplined (or the org is just too big to achieve tight alignment), then having some enforced architectural boundaries (bounded contexts) inside which the complexity is capped will at least limit the scope of poor architecture inside a specific service, and generally puts a floor on unintended coupling.
There's another dimension too though; I think that microservices require a much higher level of devops / CI/CD maturity to do well. So the maximum value might come from poorly aligned but very ops-savvy orgs, whereas the minimal value would come from highly aligned and disciplined orgs that don't have a strong devops/automation skillset.
Not sure which dimension is more important though.
>Change their database schema.
>Release their code to production quickly and often.
>Use development tools like programming languages or data stores of their choice.
>Make their own trade-offs between computing resources and developer productivity.
>Have a preference for maintenance/monitoring of their functionality.
Every single one of those is an organizational problem. Microservices are fantastic for solving the problem of "very large organization trying to manage multiple releases from multiple teams with lots of interconnecting dependencies". But they solve that problem with a giant flaming chainsaw with a greased-up handle. It works, but don't use it unless you have to.
If not, then no, the unintended coupling is not limited to within a service, it will find its way to the API between the services.
It is more about test-driven development than anything else.
You need a policy to do micro-services in the first place. You can also have policies (enforced by peer review) to have well-tested code and internal API boundaries. It doesn't come by itself, but neither does microservices.
At the end of the day you need development processes driven primarily by automated tests and good coders. Bad coders will make a total spaghetti mess of microservices too, in fact the consequences of microservices can be catastrophic and crippling if the developers don't properly understand how to write distributed systems etc. (trust me I've seen it happen).
Difference is, when you have that micro-service spaghetti mess, refactoring it into something with clean, nice boundaries can be harder than with a monolith -- since the positions of the boundaries tend to be less malleable. (This varies depending on how the microservices are tested & deployed; also, how "malleable" and how "hard" a boundary is are really orthogonal.)
If micro-services do anything, it is to raise the bar for the developers who can succeed in working with them. Put those same developers to work on a monolith, though, and I think the results are the same. It's a filter for good developers, more than the things you say, IMO.
Can easily happen, for instance, when architects who are too removed from the actual code choose the boundaries. Then developers have to work around them.
There will still be coupling issues. It just goes over the network, and is a lot harder to refactor.
I think this is something like a No True Scotsman fallacy. Somehow people see a tightly coupled mess distributed over the network "not real microservices". Ok, but the people who made it intended to make real microservices, and that is what is important.
I would also add that a well-designed monolith is inherently easier to test than a functionally equivalent well-designed microservices architecture.
"How do you know it works?" is a common reply to this line of reasoning, obviously we all have an idea of what "working" looks like, it's just whether or not we have gone the extra step of formalizing it in the form of tests.
I disagree very strongly, and it is also part of why I believe monorepos are generally a mistake.
Microservices are a natural extension of things like decoupling and Single Responsibility Principle.
Just because you could superficially achieve similar effects with gargantuan amounts of tooling and imposed conventions in a monolith class or something is absolutely no refutation of the fact that modularity and separated boundaries between encapsulated units of behavior represent a better way to organize and structure the design.
It is no different and there is no leakiness to the same abstraction when you move to discuss services instead of classes or source code units, or polyrepos v monorepos. The abstraction definitely can become leaky if taken too far in other domains, it just happens that the abstraction is depicting precisely the same organizational complexity properties in the case of source code -> service boundaries -> repository boundaries.
They only "naturally decouple" if you draw the lines between units correctly in the first place. And if you are able to do that well, that's the most important step in making any code turn out well regardless of the size of the services and how much of the boundaries are in the same class/repo/process/service. It also correlates with the "tendency to cheat" within a single service.
There ARE real advantages to micro-services, sure. But you trade them against an ability to quickly refactor if it turns out you drew the lines completely wrong initially. Or perhaps you end up with something that becomes very complex that could in fact have been short-circuited and replaced by 5% as much code by looking at the problem from an entirely different angle -- which you never do because of the pattern that has settled in how the micro-services were divided.
(At the end of the day, the code that is simplest to maintain is the one you don't need to run..)
So I maintain that it's a trade-off.
This seems relevant: https://xkcd.com/2044/
Salad is generally better for your health than red meat. However, some people eat so much salad, with so much dressing, that it ends up being worse than red meat. Meanwhile, with great care about meal planning and moderation, some people stay pretty healthy eating red meat.
Therefore red meat is actually healthier than salad.
Things are always perfect when reading a simple blog post presenting the happy path. How do you check which services are down, how do you react to that, how do you recover? Nah. The fact that your µs RAM access just became a ms network access? Don't care. Someone just decided to change the interface of their microservice so two others are not compatible anymore? That's just "microservices done badly". Being able to see the flow of things and add a breakpoint where you need it? Nope.
It is funny to see this kind of problem when you've already experienced it in the embedded world, with software components in cars, or just in distributed computing.
Most applications will never see the kind of scale where adding that kind of code and tooling overhead has any RoI. So you end up with products released too late, or products so brittle you may as well not have launched.
This doesn’t make much sense, unless you’re debugging normal-to-high quality code microservices, and still find the code to be worse than average case monolith services.
It seems from your comment that you assume one can always work with only one service, and not need to consider the whole system of services acting together. That is naive.
It’s like complaining that someone mocked out a complex submodule in a unit test, so your breakpoint descends into a mock instead of the real thing. You’re mistakenly wanting the wrong thing.
Testing that spans service boundaries is a known entity. Most of the time you want to be testing one service in isolation and mocking out any dependency calls it makes to other services.
But in cases when you want to do integration or acceptance testing involving multiple live services, that’s fine too. You could for instance run the suite via something like docker-compose.
But if you want the debugger to step through the internals of some effectively third party dependency, that’s just a poor approach to debugging. You need to mock that away and isolate whether the third party entity (whether it’s an installed package, separate service, whatever) is really to blame before descending to debugging in that entity.
Imagine if someone is debugging a data processing pipeline task. It makes a service call to a remote database. You really think your debugger should follow the service call and step through the database’s code? That’s a terrible way to debug. That example extends perfectly well no matter what the service call is into, whether it’s local or remote, whether it’s in the same language or runtime or not...
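That debugging discipline maps directly onto how the test would be written. A minimal sketch (the pipeline and its client are hypothetical): mock out the remote database call entirely, so neither the test nor the debugger ever descends into third-party internals.

```python
from unittest.mock import Mock


def pipeline_task(db_client, batch_ids):
    """Code under test: fetches rows from a remote service and aggregates."""
    rows = db_client.fetch_rows(batch_ids)   # the remote service call
    return sum(row["value"] for row in rows)


# The remote database is mocked away; if this test fails, the bug is in
# pipeline_task, not in the database -- which is exactly the isolation
# you want before blaming (or stepping into) the dependency.
db = Mock()
db.fetch_rows.return_value = [{"value": 2}, {"value": 3}]

assert pipeline_task(db, [1, 2]) == 5
db.fetch_rows.assert_called_once_with([1, 2])
```

Only once this passes and production still misbehaves is it worth investigating the database side on its own terms.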
Context is everything.
I am mainly advocating that it depends on the kind of coders on the teams, how many teams, how sure you are about the up front design / boundaries between services (I have seen such boundaries drawn VERY wrong, so wrong that nothing else ever mattered), how sure you are about the spec, etc
Start with a monolith and refactor out smaller services gradually as the design solidifies...
I think you're being too charitable accepting this analogy at all - somehow microservices are presented as something obviously and inherently better (salad) versus non-microservice approach (red meat).
If we're going down the route of silly analogies which are terrible way to argue anything, how about this:
Non-microservice architectures are normal diet of meat, fish, vegetables, grains and sugar which you can keep under control if you have any idea of what you're doing. Microservices are gluten-free diet - very popular for no good reason, it makes everything harder and you should only pursue it if you have very good reason to and you understand the cons.
Yes, this is called the Single Responsibility Principle, in this case applied to service architecture. More generally it is a property of modularity and decoupling.
All else equal then satisfying these properties is better than not satisfying them.
The all else equal assumption clearly holds in practice, where people write equally awful code in both cases and so microservices introduces no additional tech debt yet it does introduce SRP and modularity benefits.
Could you find specific examples of monolith services with small enough tech debt that they outperform some specific other example using microservices? Of course.
Does this matter for reasoning more generally about which pattern is better ceteris paribus? Very little, probably not at all.
Sometimes the coupling just jumps into the network/API layer. (Why would it not?) This can happen unless your initial division into services was perfect (and if you indeed have that much foresight, there would be no reason why a monolith would accrue tech debt either; there would be no temptation to add debt).
The main difference is that when you discover that the initial division into "Responsibilities" was wrong, it is easier in a monolith to change it, come up with another set of "Responsibilities", and deploy the refactored service as a unit.
You talk as if you can just initially define the Single Responsibility then things will be fine. But where I have seen real failure is in identifying those initial responsibilities and choosing the wrong way to look at the total system.
My experience is with monoliths having less coupling, and I suspect the cause is that monoliths are easier to refactor as requirements change; refactoring the very structure of the service mesh while keeping things running is such a big task that one is more tempted to start adding hacks in the API layer.
Yes, one is then violating the Single Responsibility Principle. But if an organization needs to change the requirements by some deadline, it is not going to spend 3x the cost and time just because a hack violates some principle -- not when the alternative is merely the wrong service taking on some extra work.
If you want to retort "but then they are doing microservices wrong" then I say No True Scotsman. And one could say exactly the same about monolith tech debt too..
This is generally not true in my experience, because the degree of implementation-sharing and reliance on common leaked abstractions is so high in monolith codebases.
Through great concerted effort, some highly disciplined teams might not fall into that ubiquitous problem of monoliths and for those exceedingly rare teams your way of thinking could work. But this is so rare it is inapplicable when considering which approach to use in general cases.
I’ll also say that I’ve worked on several monolith services and several microservices stored in dozens to hundreds of separate repos. The tooling cost to make either pattern work at scale was the same, but refactoring was so much easier with polyrepos that isolated each service. Just spin up a new repo and redraw the service boundaries.
Finally, many times services become associated with a fixed, versioned API, and must support backward compatibility for long periods of time. In these cases, redrawing service boundaries is usually not desirable regardless of initial mistakes, until you hit a point when you can release a new major version of the services. In the polyservice / polyrepo case, this is very easy, and the repos and separated code for v2 need not have anything to do at all with v1, and can be developed entirely in parallel, with mocked out assumptions of service boundaries or reliance on legacy v1 stuff.
If you saw a coupled mess of microservices with a lot of technical debt you would probably say that it is not "Microservices" because it is violating the Single Responsibility Principle all over the place. They just tried to do microservices -- but didn't manage to -- so do you then count it as a failure of microservices thinking?
If not then propose a new architectural alternative: The Debt-Free SRP Monolith!!
Sadly, organizations cannot simply choose to make either an SRP Microservice system or a Debt-Free SRP Monolith. They can only attempt it. And I am yet to be convinced that attempting Microservices is that correlated with achieving SRP.
Obviously if the baseline levels of tech debt or poor implementation are not equal, all bets are off.
Very true, robotics/avionics/etc. have been using "microservices" for a long time, nothing new here. Robot Operating System (ROS), Data Distribution Service (DDS), Lightweight Communications and Marshalling (LCM), and others all encourage that architecture.
ldap: authentication microservice
syslog: logging microservice
smtp: messaging microservice
smb/nfs: file storage microservice
I do think the distinction between "services" and "microservices" is a bit overblown. Clearly something like IMAP is not a microservice - it contains auth, storage, search and a few other pieces. Calling LDAP a microservice is potentially a stretch for similar reasons. But fundamentally, the concerns and architecture of both are extremely similar.
I just realized that adding micro- or nano- or -oriented is a great way to create a buzzword in the current climate. Say, microsecurity, nanolearning, or anger-oriented user interface.
Microservices are services within an application.
Then the MUA connects to an IMAP proxy which talks to LDAP to authenticate and to determine where the messages for this user are stored (again, possibly different LDAP instances), then connects to an IMAP backend that retrieves the data from a clustered file system or object store. The IMAP proxy, IMAP backend, and MDA are separate systems. The object store is, likewise, a separate system.
Meanwhile some of your users are using a webmail client as their MUA. That talks to an outbound-only MTA and the IMAP proxy, but it may talk directly to LDAP for authentication rather than authenticating to the mail servers first. It can pull user preferences from LDAP. It pulls their contact book from LDAP. These might be three different LDAP instances. It has a calendar app on the same page in another tab, but that talks instead to a separate CalDAV server. The folder pane, which updates with the number of unread messages in each folder, updates through a different backend process on a different web server from the listing of mail in your current folder, which is on a separate web server from the one that just fetched the content of the highlighted message into your preview pane.
Meanwhile, half of these systems actually forward through another MTA which makes no final delivery decision itself but scans for spam scoring. Those messages that make it through the filtering get forwarded to another service which only scans for malware. Then those messages which pass go to the system that forwards the mail for final delivery to the user or to the remote party's MX server.
All of these systems need their timestamps correct, so they all talk to an NTP service running alone on a VM or container that does nothing else.
All of these systems send their logs to a central logging cluster via a defined protocol. The logging servers do nothing else.
Just because your mail server might run qmail, Courier, Amavis, SpamAssassin, procmail, and mutt on one box with local storage and local logging doesn't mean that's how mail is done at scale. It's pretty clear to me that, if you think of "email" as the application, it is composed of microservices.
Microservices are called that because they reside within an app that itself provides a service.
Otherwise, of course you could call everything and anything 'microservice' but then the term would become meaningless.
- Unit test the individual functions and classes using gtest or nose (this is built into catkin, the ROS buildsystem).
- Integration test your ROS APIs bringing up an actual ROS multi-process system and having a designated test node which exits to deliver the pass/fail verdict (this is managed by the "rostest" package).
The much-touted advantage of microservices is supposed to be that you have these obvious service interface boundaries at which to test, but the reality in ROS land is that it's a lot of work to test there. It's work to generate the binaries or playback data, failures are harder to understand, and because you're at the mercy of RPC timing, nothing is deterministic, so your tests end up full of fudge factors and tolerances.
And on top of that, the tests themselves execute way slower, since there's so much more set up and teardown.
Is that a failure in the framework? Not sure, but the end result of it is that you only test at the RPC boundary that functionality which can only be tested there; everything else is stuffed into library functions that can be verified with gtest.
This is not a hypothetical question — I'm actually looking for a better way to solve this in Aether's business infrastructure right now.
Large software companies like Google/Facebook build their own opinionated frameworks for publishing services, which include declaring dependencies via config files. Internal engines then scrape these configs and manage the relationship topology across environments. As far as SWEs are concerned, it's like magic.
I'm working on trying to standardize such a framework, https://docs.architect.io, and would love preliminary feedback.
- Uptime SLAs will probably be very high for such a central service, so plan for HA backing database choices and/or read-only replicas. For authn & authz, caching session tokens / API keys / policy decisions / ... is fairly essential to avoid overloading your DB and to keep latency down to acceptable levels.
- Work out what latency budget you can afford, given that every user action is going to have to go through this service, possibly multiple times. Stop-the-world garbage-collected languages probably aren't ideal here. Judicious use of caching goes a long way.
- Have load shedding / circuit breaking / per-service quota mechanisms in place to prevent issues cascading around your systems. Exponential back-off is a lifesaver.
- Have good integration tests to catch regressions (functionality & performance), roll-out / roll-back mechanisms to catch the ones you miss. Test these with known-bad changes every once in a while.
I've recently been doing a load of development on similar systems, so happy to give my two cents - email is in profile if you'd prefer. The two SRE books are a worthwhile read.
But with microservices coupling is as loose as possible.
Your authentication microservice provides an interface (REST or whatever) independent of the underlying implementation, and it does not matter how it is implemented. As long as the interface 'contract' is fulfilled, you can reimplement it from Java to Rails, or move it from here to there if you wish, and that is completely transparent to all other microservices.
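A tiny sketch of that contract idea (all names hypothetical; a real version would sit behind HTTP rather than a Python class): callers depend only on the interface, so the backing implementation can be swapped without anyone noticing.

```python
class AuthService:
    """The 'contract': any implementation exposing authenticate() works."""
    def authenticate(self, user, password):
        raise NotImplementedError


class LegacyJavaBackedAuth(AuthService):
    # Hypothetical: imagine this wrapping calls to the old Java service.
    def authenticate(self, user, password):
        return user == "alice" and password == "s3cret"


class NewRailsBackedAuth(AuthService):
    # Hypothetical replacement: same contract, different implementation.
    def authenticate(self, user, password):
        return (user, password) == ("alice", "s3cret")


def login(auth, user, password):
    # Callers see only the contract; swapping the backing implementation
    # is transparent to every other microservice.
    return "welcome" if auth.authenticate(user, password) else "denied"


for impl in (LegacyJavaBackedAuth(), NewRailsBackedAuth()):
    assert login(impl, "alice", "s3cret") == "welcome"
    assert login(impl, "alice", "wrong") == "denied"
```

The caveat from elsewhere in this thread still applies: the decoupling is only as good as the contract itself, and if the boundary was drawn wrong, both implementations inherit the mistake.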
Yes, because making a page render dependent on the availability of tens of independent network services is going to greatly improve your resiliency.
I guess it's a matter of perspective.
If you are willing to count a partially working experience as still working, then yes, you might be able to say you're more resilient because only, say, the service to add a product to the basket is down - you can still look at products, so technically, you're still up.
That's not how availability is measured normally though. In my (and definitely in my users) book, a site where some stuff is broken is broken. Period.
And if that's the environment you're working in, you're really not improving matters by adding unnecessary network layers between components of your application.
Yes. In my next job search I'd like to find a way to feel out a team for whether they adopt new patterns reflexively and dogmatically vs. consciously and contextually.
Wow. Does the author have any real-world experience using micro-services? Or is it just word of mouth? Because I have a completely different take on this.