Most people think a micro-service architecture is a panacea because "look at how simple X is," but it's not that simple. It's now a distributed system, and very likely it's the worst of the worst: a distributed monolith. Distributed systems are hard; I know, I do it.
Three signs you have a distributed monolith:
1. You're duplicating the tables (information), without transforming the data into something new (adding information), in another database (e.g. worst cache ever, enjoy the split-brain). [1]
2. Service X does not work without Y or Z, and/or you have no strategy for how to deal with one of them going down.
2.5 Bonus, there is likely no way to meaningfully decouple the services. Service X can be "tolerant" of service Y's failure, but it cannot ever function without service Y.
3. You push all your data over an event-bus to keep your services "in-sync" with each other, taking a hot shit on the idea of a "transaction." The event-bus over time pushes your data further out of sync, making you think you need an even better event bus... You need transactions and (clicks over to the Jepsen series and laughs) good luck rolling that on your own...
I'm not saying service oriented architectures are bad, and I'm not saying services are bad; they're absolutely not. They're a tool for a job, and one that comes with a lot of foot guns and pitfalls, many of which people are not prepared for when they ship that first micro service.
I didn't even touch on the additional infrastructure and testing burden that a fleet of micro-services brings about.
[1] Simple tip: Don't duplicate data without adding value to it. Just don't.
We've moved a lot of services into Kubernetes and broken things up into smaller and smaller micro-services. It definitely eliminates a lot of the complexity for developers ... but you trade it for operational complexity (e.g. routing, security, mis-matched client/server versions, resiliency when a dependency isn't responding). I still believe that overall software quality is higher with micro-services (our Swagger documents serve as living ICDs), but don't kid yourself that you're going to save development time. And don't fall into the trap of shrinking your micro-services too small.
The big trade off is the ability to rewrite a large part of the system if a business pivot is needed. That was the bane of the previous company I worked in: engineering and operations were top notch, but unfortunately it was all done too soon, and it killed the company because it could not adjust to a moving market (i.e. customer and sales feedback was ignored because a lot of new features would require an architecture change that was daunting). It was very optimized for use cases that were becoming irrelevant. In my small startup, where product market fit is still moving, I always thank myself that everything is under engineered in a monolith when signing a big client that asks for adjustments.
A single shared storage for multiple services sounds like a recipe for disaster.
One purpose of splitting services, at least, is to decouple parts of the code at a fundamental level, including storage and overall ownership thereof.
You're probably better served with a modular monolith if you really can't break storage up.
No, only one service is reading/writing; everything else just calls that. Still, things get quite lost when it involves talking to multiple other teams and needing to keep everything in sync.
Ok, but then what's the point of splitting it in the first place?
The way I see it is to split your domain so that a team owns not only the code, but also the model, the data, the interface and the future vision of a small enough area.
If a service owns all the data, then someone who needs to make any change is bottlenecked by it and they would need knowledge beyond their domain.
So the key is defining the right domains (or domain boundaries). Unfortunately most people just split before thinking about the details of this process, so the split will sooner or later hit a wall of dependencies.
We need synchronous workflows and then asynchronous workflows. That was the primary reason. Now that doesn't mean it must be split, but since we're running on multiple hosts anyway it wasn't hard to split off the asynchronous functions to another batch.
> And don't fall into the trap of shrinking your micro-services too small.
^ this
I think the naming decision of the concept has been detrimental to its interpretation. In reality, most of the time what we really want is "one or more reasonably-sized systems with well-enough-defined responsibility boundaries".
Perhaps "Service Right-Sizing" would steer people to better decisions. Alas, "Microservices" objectively sounds sexier.
> It definitely eliminates a lot of the complexity for developers
We're currently translating a 20 year old ~50MLOC codebase into a distributed monolith (using a variety of approaches that all approximate strangler). I have far less motivation to go to work if I know that I will be buried in the old monorepo. I can change, build and get a service changed in less than an hour. Touching the monorepo is easily 1.5 days for a single change.
We seem to be gaining far more in terms of developer productivity than we are losing to operational overhead.
Sorry ... I should have said "don't kid yourself that you'll save time" instead of developer time. We do indeed have a faster change cycle on every service which is a win even if we're still burning (in general) the same number of hours over the whole system.
I also should have mentioned that it's definitely more pleasant for those in purely development roles. Troubleshooting, resiliency and system effects don't impact everyone (and I actually like those types of hard problems). I'd also suggest that integrating tracing, metrics, and logging in a consistent way is imperative. If you're on Kubernetes, using a proxy like Istio (Envoy) or Linkerd is a great way to get retries, backoff, etc. established without changing code.
Finally, implementing a healthcheck end-point on every service and having the impact of any failures properly degrade dependent services is really helpful both in troubleshooting and ultimately in creating a UI with graceful degradation (toasts with messages related to what's not currently available are great). I have great hopes for the healthcheck RFC that's being developed at https://github.com/inadarei/rfc-healthcheck.
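To make the degradation idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `health` function, the probe callables, and the split into critical and optional dependencies are made up, and the `checks` shape is a simplification rather than the draft's format. Only the `pass`/`warn`/`fail` status words are borrowed from my reading of the draft.

```python
from typing import Callable, Dict

# Hypothetical probe type: returns True when the dependency answers.
Probe = Callable[[], bool]


def _safe(probe: Probe) -> bool:
    """Treat any exception from a probe as 'dependency is down'."""
    try:
        return bool(probe())
    except Exception:
        return False


def health(critical: Dict[str, Probe], optional: Dict[str, Probe]) -> dict:
    """Aggregate dependency checks into one health document."""
    checks = {}
    status = "pass"
    # Optional dependencies only degrade the service.
    for name, probe in optional.items():
        ok = _safe(probe)
        checks[name] = "pass" if ok else "fail"
        if not ok:
            status = "warn"
    # Critical dependencies take the whole service down with them.
    for name, probe in critical.items():
        ok = _safe(probe)
        checks[name] = "pass" if ok else "fail"
        if not ok:
            status = "fail"
    return {"status": status, "checks": checks}
```

A UI could then read `checks` to decide which toast to show, while a load balancer only needs to look at `status`.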
That’s an encouraging story to hear. The thing I’ve noticed is that the costs of moving a poorly written monolith to a microservice architecture can be incredibly high. I also think that microservice design really needs to be thought through and scrutinized, because poorly designed microservices start to suck really quickly in terms of maintenance.
And also trading off with how easy it is to understand the system. If you have one monolith in most cases it's a single code base you can navigate through and understand exactly who calls who and why.
I totally agree. Mostly we are doing microservices the wrong way. We are not drawing the boundaries correctly: they are too small, they have too many interdependencies, and they don't really encapsulate the data. There is not enough guidance about sizing them. We are just building distributed monoliths. Which is great for cloud companies because they get to sell many boxes.
Micro-services are just connected things which work together to accomplish something. But where are those connections described? In some tables somewhere. Maybe.
Whereas if you write a single monolithic program its connections are described in code, preferably type-checked by a compiler. I think that gives you at least theoretically a better chance of understanding what are the things that connect, and how they connect.
So if there was a good programming language for describing micro-services then that would probably resolve many management difficulties, and then the question would be simply do we get performance benefits from running on multiple processors.
Literally this. Reading the article I kept thinking to myself "which is why Erlang/Elixir is great because it doesn't make you choose up front". It's wild that with how popular Elixir has gotten that it still isn't seen as a serious contender for many companies
I write Elixir professionally, and Erlang/Elixir and BEAM are no one-stop solution to these problems either. They have tools to help you, but you can very easily end up in the same boat.
I have never tried Erlang, but I have read that it doesn't have static type checking. How does it guarantee that different services are following protocols?
Erlang is dynamically typed in part, I believe, to allow hot-swapping code in a running system. It relies on pattern matching to ensure code contracts, i.e. your types are more like guarantees that a pattern holds true, even if some of the specifics of that protocol have changed. Thus, with zero downtime, you can make updates to the system, while still knowing that your assertions about received data matching your protocol remain true. Any adapters can act like both type update and schema migration simultaneously, as in the case where you wish to support multiple versions of an API simultaneously.
The runtime system has built-in support for concurrency, distribution and fault tolerance. Because of the design goals for Erlang and its runtime, you get services that can all run on one system or be distributed across a network, but the code that you actually write is relatively simple; the entire distributed system acts as a fault-tolerant VM that has functional guarantees.
If your startup node fails, then other nodes are elected. If a node crashes while in the middle of a method, another node will execute the method instead.
The runtime itself has some analogies to functional coding styles. It runs on a register machine rather than a stack machine. The call and return sequence is replaced by direct jumps to the implementation of the next instruction.
I don't want to advocate one way or another (micro vs. monoliths) because tomato tomato. However, here are a few arguments in defense of microservices regarding the three signs you listed:
1. Microservices do not have some inherent property of having to duplicate data. You can have data in single source and deliver that data to anyone who needs it through an API. There are infinitely many caching solutions if you are worried about this becoming a bottleneck
2 and 2.5. There are tools for microservice architectures that counter these problems to a degree, mainstream example being containers and container orchestration (e.g. Docker and Kubernetes). One can even make an argument that microservices force you to build your systems so they are more robust than your monolith would be. If the argument for monoliths is that it's easier to maintain reliability when every egg is in one basket then you are putting all your bets into that basket and it becomes a black hole for developer and operations resources, as well as making operations evolution very slow
3. There are again tools for handling data and "syncing" (although I don't like the idea of having to "sync") the services, for example message queues / streaming processing platforms (e.g. Kafka). If some data or information might be of interest to multiple services, you should then push such data to a message queue and consume it from services that need it. The "syncing" problem sounds like something that arises when you start duplicating your data across services which shouldn't happen (see my argument on 1.)
Again, that's not to say microservices are somehow universally better. I'm just coming to the defense of their core concepts when they get unfairly accused.
The trouble is that this process of streaming events over Kafka or Kinesis means that subscribed microservices will be duplicating the bus data in their own way in their local databases. If one of them falls out of the loop for whatever reason you are in trouble.
Now, there is a pattern called Event Sourcing (ES) which proposes that the source of truth should be the event bus itself and microservice databases are mere projections of this data. This is all good and well except it's very hard to implement in practice. If a microservice needs to replay all business events from months or years in the past it may take hours or days to do this. What about the business in the meantime? If it's a service that significantly reduces the usability of your application you effectively have a long downtime anyway.
Transactional activity becomes incredibly hard in the microservices world with either two-phase commit (only good for very infrequent transactions due to performance) or with so-called Sagas (which are very complex to get right and maintain).
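For readers who haven't met the Saga pattern, a toy sketch in Python of its core idea: each step carries a compensating action, and a failure runs the compensations of the completed steps in reverse. The `run_saga` helper and the step shape are hypothetical, and this deliberately leaves out the parts that make real sagas hard (durable state, retries, idempotent steps, compensations that themselves fail).

```python
from typing import Callable, List, Tuple

# Hypothetical step shape: (action, compensation).
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    compensations: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Roll back what already happened, newest first.
            for undo in reversed(compensations):
                undo()
            return False
        compensations.append(compensate)
    return True
```

Even in this in-memory form, the caller has to write and maintain an undo for every step, which hints at why the pattern gets complicated across real services.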
Any company that isn't delivering its service to billions of users daily will likely suffer far more from microservices than they will benefit.
> This is all good and well except it's very hard to implement in practice. If a microservice needs to replay all business events from months or years in the past it may take hours or days to do this.
In my experience it's not hard to implement, but of course it depends on the problem domain (and probably also on not splitting things up willy-nilly because of the microservices fad). I think the key to event sourcing and immutability in general is to not overdo it. For example you will likely need to redact certain data (e.g. for legal compliance), so zero information loss is out. Systems like Kafka are a poor choice for long term data storage, the default retention is 1 week for a reason.
But the things that are wonderful about event sourcing (the ability to inspect, replay and fix because you haven't lost information) mostly materialize over a 1 week timeframe.
If you need to recover a lot of state from the event log you will need to store aggregated event data at regular intervals to play back from to have acceptable performance. But in practice, in many cases the data granularity you need goes down as the data ages anyway, and you do some lossy aggregation as a natural part of your business process (as opposed to doing it to deal with event sourcing performance problems). I.e. for the short timeframe Kafka is the source of truth, but for the stuff you care about long term it's some database, and this happens kinda naturally. So often you don't need to implement checkpointing.
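For the cases where you do need checkpointing, a minimal Python sketch of the idea: keep an aggregated snapshot plus the offset it covers, and replay only the events after that offset. The event shape `(offset, account, delta)` and the `replay` function are assumptions for illustration, not any particular framework's API.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (offset, account, delta).
Event = Tuple[int, str, int]


def replay(
    snapshot: Dict[str, int],
    events: Iterable[Event],
    after: int = -1,
) -> Tuple[Dict[str, int], int]:
    """Fold events with offset > `after` into a copy of `snapshot`.

    Returns the new state and the last offset applied, which together
    form the next snapshot.
    """
    balances = dict(snapshot)
    last = after
    for offset, account, delta in events:
        if offset <= after:
            continue  # already covered by the snapshot
        balances[account] = balances.get(account, 0) + delta
        last = offset
    return balances, last
```

Replaying from an empty snapshot and replaying from a stored snapshot plus the tail of the log should produce the same state; the snapshot just bounds how much of the log has to be read.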
You're right that micro-services avoid a lot of the pain of micro-services if they have one consolidated "Data service" that sits on top of their data repositories. But a micro-services architecture with a consolidated data service is similar to an airplane on the ground, it's true it can't fall out of the sky, but it's as useful as a car with awful gas mileage.
Once you add in this consolidated data service, every other service is dependent on the "data service" team. This means almost any change you make requires submitting a request. Are their priorities your priorities? I would hate to be reliant on another team to do every development task.
Theoretically you could remove this issue by allowing any team to modify the data service, but then at that point you've just taken an application and added a bunch of http calls between method calls.
This same problem pops up with resiliency. If you have a consolidated data service, what happens if your data service goes down? How useful are your other services if they can't access any data?
I'm not following. How do things work better for the non-microservice approach?
Re. teams: For any project above a certain size, you'll have teams. Whether that's a network boundary, a process boundary, or a library boundary doesn't change that you'll have multiple teams for a large project.
I'm not sure I get the resiliency point. I worked on a project where the dependent data service was offline painfully frequently. We used async tasks and caching to keep things running and were able to let the users do many tasks. For us our tool was still fairly useful when dependencies went down. If we used a monolith then everything would be down, right? That doesn't sound better.
> Re. teams: For any project above a certain size, you'll have teams. Whether that's a network boundary, a process boundary, or a library boundary doesn't change that you'll have multiple teams for a large project.
For sure, and one of the big selling points for micro-services is you can split those teams by micro service, with each team having an independent service they are responsible for. But when a big chunk of everyone's development is done on one giant service everyone shares you don't get the same benefits you would if the services were independent. Or put another way, splitting micro-services vertically can yield a bunch of benefits, but splitting them horizontally introduces a lot of pain with few benefits.
> I'm not sure I get the resiliency point. I worked on a project where the dependent data service was offline painfully frequently. We used async tasks and caching to keep things running and were able to let the users do many tasks. For us our tool was still fairly useful when dependencies went down. If we used a monolith then everything would be down, right? That doesn't sound better.
I'm not saying to never spin off services. If you have a piece of functionality that you just can't get stable for the life of you, splitting it off into its own service, and coding everything up to be resilient to its failure makes a lot of sense. (I am very curious what the cause of the data service crashing was that you couldn't fix.)
But micro-services aren't a free lunch for resiliency. You're increasing the number of systems, servers, configurations, and connections, which by default will decrease up-time until you do a ton of work. Not to mention tracking and debugging cross service failures is much more difficult than a single server.
Ok, then I'll just keep calm and continue with a distributed monolith. In 5 years it will be the mainstream and a new way to go. I can even imagine titles of newsletters: "Distributed monolith - the golden mean of software architecture".
> Don't duplicate data without adding value to it.
What about running multiple versions of a microservice in parallel -- doesn't each need its own separate database that attempts to mirror the others as best it can?
I'm assuming you mean a micro-service running in production; if that's not the case, please elaborate a bit more...
The short answer is "no," as succinctly stated by the, I assume from the name, majestic SideburnsOfDoom. The versions shouldn't _EVER_ be incompatible with each other.
E.g. you need to rename a column.
Do not: rename the column, e.g. `ALTER TABLE RENAME COLUMN...`. Because your instances still running the old version are going to break with the new schema.
Do: Add a new column with the new name and migrate data to the new column. Once it's good, upgrade the rest of your instances, then drop the old column. Because you can use both versions at the same time now without breaking anything. Yes, it can be a little tricky to get the data sync'd into the new column, but that's a lot less tricky than doing it for _every_ table and column.
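A small sketch of that "Do" sequence, using Python's built-in `sqlite3` and an in-memory database so it runs anywhere. The `users` table, the `fullname` and `display_name` columns, and the `insert_user_v2` helper are all made-up names for illustration; a real migration would go through whatever migration tool and database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.execute("INSERT INTO users (fullname) VALUES ('Ada'), ('Grace')")

# 1. Expand: add the new column. Old instances keep using `fullname`.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. Backfill rows that were written before the new column existed.
conn.execute(
    "UPDATE users SET display_name = fullname WHERE display_name IS NULL"
)


# 3. While both versions are live, upgraded instances write both columns.
def insert_user_v2(name: str) -> None:
    conn.execute(
        "INSERT INTO users (fullname, display_name) VALUES (?, ?)",
        (name, name),
    )


insert_user_v2("Linus")

# Both the old and the new reader see consistent data.
old_view = [r[0] for r in conn.execute("SELECT fullname FROM users ORDER BY id")]
new_view = [r[0] for r in conn.execute("SELECT display_name FROM users ORDER BY id")]

# 4. Contract: only after every instance is upgraded would you drop
#    the old column. Deliberately not executed in this sketch.
```

The tricky part mentioned above is visible here: rows written by old instances after step 2 would leave `display_name` empty, so in practice the backfill has to be repeated (or handled by a trigger) until the last old instance is gone.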
Sounds exactly like a project I am in.
Not to mention that we have like three "microservices" that access the exact same database.
Oh, and a single Git repo.
The most basic thing I see people neglecting is that inserting a network protocol is really adding many additional components that weren't there before, often doubling or even tripling the amount of code, config, and documentation required. If there's a single large project that combines "A+B" modules with no networking and you split this into networked services, then you now have:
1) "A" Component
2) "B" Component
3) "A" Server
4) "A" Client
So for example if you started off with a single "project" in your favorite IDE, you now have 4, give or take. You might be able to code-gen your server and client code out of a single IDL file or something, but generally speaking you're going to be writing code like "B -> A client -> A server -> A" no matter what instead of simply "B -> A".
Now you have to worry about network reliability, back-pressure, queuing, retries, security, bandwidth, latency, round-trips, serialization, load-balancing, affinity, transactions, secrets storage, threading, and on and on...
A simple function call translates to a rat's nest of dependency injection, configuration reads, callbacks, and retry loops.
Then if you grow to 4 or more components you have to start worrying about the topology of the interconnections. Suddenly you may need to add a service bus or orchestrator to reduce the number of point-to-point connections. This is not avoidable, because if you have fewer than 4 components, then why bother to break things out into micro services in the first place!?
Now when things go wrong in all sorts of creative ways, some of which are likely still the subject of research papers, heaven help you with the troubleshooting. First, it'll be the brownouts that the load balancer doesn't correctly flag as a failure, then the priority inversions, then the queue filling up, and then it'll get worse from there as the load ramps up.
Meanwhile nothing stops you having a monolithic project with folders called "A", "B", "C", etc... with simple function calls or OO interfaces across the boundaries. For 99.9% of projects out there this is the right way to go. And then, if your business takes off into the stratosphere, nothing stops you converting those function call interfaces into a network interface and splitting up your servers. However, doing this when it's needed means that you know where the split makes sense, and you won't waste time introducing components that don't need individual scaling.
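As a rough illustration of "simple function calls or OO interfaces across the boundaries", here is a Python sketch. The `Billing` interface, `LocalBilling`, and `checkout` are hypothetical names standing in for module "A", its in-process implementation, and a caller in module "B".

```python
from typing import Dict, Protocol


class Billing(Protocol):
    """The boundary of module "A": the only thing "B" is allowed to see."""

    def charge(self, user_id: str, cents: int) -> bool: ...


class LocalBilling:
    """Plain in-process implementation; no network involved."""

    def __init__(self) -> None:
        self.ledger: Dict[str, int] = {}

    def charge(self, user_id: str, cents: int) -> bool:
        if cents <= 0:
            return False
        self.ledger[user_id] = self.ledger.get(user_id, 0) + cents
        return True


def checkout(billing: Billing, user_id: str, cart_cents: int) -> str:
    """Module "B": depends on the interface, not on where it runs."""
    return "paid" if billing.charge(user_id, cart_cents) else "rejected"
```

If the split ever becomes necessary, a class that implements `charge` by calling a remote service could be passed to `checkout` unchanged, at which point it would also need the timeouts, retries, and error handling listed above.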
For God's sake, I saw a government department roll out an Azure Service Fabric application with dozens of components for an application with a few hundred users total. Not concurrent. Total.
> 2.5 Bonus, there is likely no way to meaningfully decouple the services. Service X can be "tolerant" of service Y's failure, but it cannot ever function without service Y.
Nit: if service X could function without service Y, then it seems to follow that service Y should not exist in the first place. The same goes for the functionality that became service Y, before any microservice migration.
Classic example is recommendations on a product page. If the personal recommendation service is not available / slow to respond, you might fall back on recommendations based on the best sellers or even fall back further by not giving recommendations at all.
Recommendations are not necessary, but not showing them will significantly affect the bottom line, so you don't want to skip them if you can avoid it. But not showing the product page (in a timely manner) because the recommendation engine has a hiccup is even worse for your bottom line.
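A minimal Python sketch of that fallback chain. The `recommendations` function and the idea of passing sources as callables are assumptions for illustration; each source is assumed to enforce its own timeout and raise when it is down or too slow.

```python
from typing import Callable, List, Sequence


def recommendations(sources: Sequence[Callable[[], List[str]]]) -> List[str]:
    """Try each source in order of preference; never fail the page.

    A source that raises (timeout, connection error, ...) or returns
    nothing is skipped in favour of the next one.
    """
    for source in sources:
        try:
            items = source()
        except Exception:
            continue
        if items:
            return items
    # Last resort: render the product page without recommendations.
    return []
```

Called with something like `recommendations([personalised, best_sellers])`, the page gets the best list available and falls back to an empty list rather than an error.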
Not necessarily. Maybe service X logs in a user and service Y sends them an email letting them know that someone logged in from a new location. Y adds value but it's not essential to X.
> Most people think a micro-service architecture is a panacea because "look at how simple X is," but it's not that simple. It's now a distributed system, and very likely it's the worst of the worst: a distributed monolith. Distributed systems are hard; I know, I do it.
This line of argument fails to take into consideration any of the reasons why in general microservices are the right tool for the right job.
Yes, it's challenging, and yes it's a distributed system. Yet, with microservices you actually are able to reuse specialized code, software packages, and even third-party services. That cuts down on a lot of dev time and cost, and makes the implementation of a lot of POCs or even MVPs a trivial task.
Take for example Celery. With Celery all you need to do to implement a queuable background task system that's trivially scalable is to write the background tasks, get a message broker up and running, launch worker instances, and that's it. What would you have to do to achieve the same goal with a monolith? Implement your own producer/consumer that runs on the same instance that serves requests? And aren't you actually developing a distributed system anyway?
> Take for example Celery. With Celery all you need to do to implement a queuable background task system that's trivially scalable is to write the background tasks,
that's a little bit of a straw man because that's not the "microservice" architecture this post is talking about. I personally wouldn't call that a "microservice" architecture, I'd call it, "a background queue", although strictly speaking it can be described as such.
what this post is talking about are multiple synchronous pieces of a business case being broken up over the http / process line for no other reason than "we're afraid of overarchitecting our model". This means, your app has some features like auth, billing, personalization, reporting. You start from day one writing all of these as separate HTTPD services rather than just a single application with a variety of endpoints. Even though these areas of functionality may be highly interrelated, you're terrified of breaking out the GOF, using inheritance, or ORMs, because you had some bad experience with that stuff. So instead you spend all your time writing services, spinning up containers, defining complex endpoints that you wouldn't otherwise need...all because you really want to live in a "flat" world. I'm not allowed to leak the details of auth into billing because there's a whole process boundary! whew, I'm safe.
Never mind that you can have an architecture that is all separate processes with HTTP requests in between, and convert it directly to a monolithic one with the same strict separation of concerns; you just get to lose the network latency and complex marshalling between the components.
Programmers are, by and large, quite bad at not breaking encapsulation when they're dealing with a monolith. Not their fault, really, but when management comes to you and says, hey, can't you get it done in just a day, we really need this, and you know you could if you just hacked through that particular isolation barrier just this once, and yeah it will create bad bugs if things change in a particular way in the future, but you'll leave a comment so it will be fine and this way you can go home and have an ipa and finish re-runs of that show you like so you break the rules just this once. And that only happens a few more times, and then slowly bit by bit things become intertwined and rigid, and then five years down the road when half the people have left and the team has grown and other teams use the project and are trying to commit code to it you have to go to management, hat in hand, sorry, we need to rewrite it from scratch to keep adding features, new and exciting and unexpected things happen when we change things in the current project.
Independent services that create a coherent whole enforce isolation barriers. I don't believe in microservices. These things don't need to be micro. They can be just normal-sized services. I don't even particularly care internal to those barriers how bad things get. There are programmers that write shit I think is hot garbage and will cause all kinds of bugs as time goes on. But if they are confined to their own space, and they're solving their problem and are happy and making management happy then w/e. That piece of the whole will collapse and die eventually, but it won't take everything with it. It's just a piece, and it provided value for a while, so probably worth it from a business sense.
But when you have a monolith? And Developer Dave and Programmer Pete tell Big Boss Bob that they could sure get that feature out quick if only it weren't for those pesky rules preventing them from putting in a few mutexes in that one module so they can just read some data directly from it, and boy wouldn't that be swell? Well Big Boss Bob says we need to get this feature SHIPPED BOYES so buckle up and put in those mutexes, and now the fucking thread goes into a deadlock state every so often but it's really intermittent so you spend hours debugging the damn things and late nights because yeah features gotta ship but shit gotta work, and you trace it back to Developer Dave and Programmer Pete and their change but what do you do? Big Boss Bob said do it that way and what? You gonna whine to Director Dan about it? Is that gonna get you back your late night spent figuring out what was going on? Nah.
Make systems that are just small enough that when people break them completely it doesn't mean the end of the world to throw it out and start over.
I'm really tired of immensely awkward and problematic design patterns that are intended to act as guard rails for good design. It is a myth that this actually works. A rushed project is a rushed project; a team that is inclined to overbuild will overbuild no matter what artificial constraints you give them up front. A microservice design that has concurrency problems can be extremely difficult to debug; in my personal experience this is easily much more difficult than debugging a monolith. Having to spin up 30 services in order to reproduce an issue that could otherwise be done in a single in-memory unit test is a real thing.
Not every tool is right for every job. But the idea that services provide no guardrails is, in my experience, categorically false. Like I said, not a fan of "micro", but there is a ton of inherent value in defining implicit isolation barriers. It's absolutely true that there is NO design impervious to idiots. This has always been true and will remain true until the heat death of the universe. But man, "hard" isolation barriers are nice to have, either through services or packages. I worked for a company and I was on a team that built a set of packages which were distributed to a bunch of other teams, and due to organization changes another team took ownership of one of those packages. In almost no time flat it became absolutely convoluted. Not my problem though, right? Here's the thing: it became buggy and hard for them to maintain, but it had no effect except for when using that particular package. I had worked on a project previously where there was a legit monolith, no packages or library delineations, one code base, and boy with all the teams sticking their fingers in that particular pie things got out of hand in a BAD way. It was so bad that certain teams would get into change wars where one group would make a change in a module that would break another group who would change it back and break the first group, and back and forth.
After my team had taken that project and split it into a bunch of smaller packages, it's not like everyone magically became better programmers. The same people who were introducing fucking spaghetti code that did who-knows-what in overly complex ways were still around, and they were given effectively their own sandboxes. In fact quite a few more teams within the company began using those packages, which just became modules they shipped with. We no longer had to deal with people screwing with core packages because they no longer had ownership, they could no longer make merges into that area of the code base, so we could keep things stable.
So I'm a hard sell on monoliths. Like, I'm not actually pro micro-services, per se. I'm mostly just anti-monolith. Giant code bases with multiple teams simultaneously contributing are doomed to fucking disaster.
One last comment on this, if you have like 5 guys just hacking away on some project and you feel that splitting it up into 30 micro-services is the only way to make things work then you've got problems. I'm not talking about small teams building straightforward systems. I'm talking about giant organizations building giant complicated systems, trying to design and manage that.
Hard process boundaries, largely. I mean, unless you use mmio in which case you're probably going out of your way to break things. Also design constraints should be taken into account- if you need to build highly performant systems, yeah, probably shouldn't have a bunch of services talking to each other over http/json. But you could probably also isolate the part of the system that actually needs to be performant.
Also, like I said, not a fan of the "micro" terminology, it just confuses the issue.
And keep in mind that coupling is not the opposite of encapsulation. Encapsulation just means hiding internal state. So unless you're going completely out of your way to break things, you'll have services that communicate through some message passing protocol, which means services are inherently encapsulated. It's not that you can't break that, it's that generally it's hard.
Here's an example of breaking it, though:
You have an API that talks to the database. You need to get users, groups, and preferences. Instead of writing different endpoints to access all of these things individually, clever you decides to simply make an endpoint that is "postQuery" and you give it a sql string and it will return the results. Great. You have now made the API effectively pointless.
Another example: you have an API that needs to perform standard accounting analysis on an internal dataset. Instead of adding endpoints for each calculation (or an endpoint with different inputs), you create an endpoint "eval" that will internally eval a string in the language of your choice. Congrats, you can now use that API to execute arbitrary code! No need to define any more pesky endpoints.
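To make the "postQuery" case concrete, here is a rough Python sketch. The table, the column names, and both function names are made up for illustration, and an in-memory SQLite database stands in for whatever the real service would use.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, password_hash TEXT)"
    )
    conn.execute(
        "INSERT INTO users (name, password_hash) VALUES ('alice', 'secret-hash')"
    )
    conn.commit()
    return conn


def post_query(conn: sqlite3.Connection, sql: str) -> list:
    """Anti-pattern: callers send raw SQL, so the schema is now the public API."""
    return conn.execute(sql).fetchall()


def get_user(conn: sqlite3.Connection, user_id: int):
    """Narrow endpoint: callers get a fixed shape and never see the tables."""
    row = conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return None if row is None else {"id": row[0], "name": row[1]}
```

With `get_user` the service can rename columns or move storage without breaking callers. With `post_query` any caller can read `password_hash` and every schema change is a breaking change.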
So yeah, absolutely people can make shit decisions and build garbage. It's entirely possible. But hey, at least it's pretty obvious this way and if you see your team do this you can look for another job.
Physical isolation. I don't actually like the arguments but I can't quite say it's wrong, because the Java landscape just put a huge amount of effort into modularity and encapsulation because developers kept using reflection to bypass module boundaries. Usually for performance reasons. With a microservices architecture that is no longer possible, because there is physically no way to read across address space boundaries without sending a message and introducing an RPC, which would require the support of the target module.
Now that said, with the new jigsaw module system in the jvm, and the multi-language support that is constantly getting better, a disciplined enough senior management team could enforce module boundaries within the process. It means any change to jvm command line flags would require the approval of the most senior tech lead, because that's how module boundary enforcement can be disabled, but if you have that and it works you would get significant performance and simplicity benefits.
You'll never ever be able to stop bad developers from making poor choices and ruining things.
So you have encapsulated services... boss comes and says "we need feature X right away". What if feature X spans all of your microservices? The bad programmer will hack together a monstrosity between multiple services. It's the micro-lith problem: instead of a monolith, you have a monolith disguised as micro services. Now their poor choices are spread across a lot of services and distributed, it's not really confined to just one service.
> that's a little bit of a straw man because that's not the "microservice" architecture this post is talking about. I personally wouldn't call that a "microservice" architecture, I'd call it, "a background queue", although strictly speaking it can be described as such.
It is not a strawman; it's a concrete example of the technical, practical, operational, and economical advantages of microservices architecture, more specifically service reuse, specially managed services provided by third parties.
While you're grokking how a multithreading library is expected to be used, I already fired up a message broker that distributes tasks across a pool of worker instances. Why? Because I've opted not to go with the monolith and went with the microservices/distributed architecture approach.
> While you're grokking how a multithreading library is expected to be used, I already fired up a message broker that distributes tasks across a pool of worker instances. Why? Because I've opted not to go with the monolith and went with the microservices/distributed architecture approach.
I'm so confused by that statement... because I can't for the life of me figure out how you got there.
You absolutely can have a monolith which is multi-threaded or asynchronous and "resource/task pools." The JVM for instance has threads, I use BEAM (Elixir) personally, and it's even pre-emptively scheduling my tasks in parallel and asynchronously... but, I still don't get what multi-threading has to do with microservices.
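For what it's worth, a task pool inside a single process is only a few lines. This is a minimal Python sketch using the standard library; `resize_image` and `process_batch` are hypothetical names standing in for whatever work the monolith needs done.

```python
from concurrent.futures import ThreadPoolExecutor


def resize_image(image_id: int) -> str:
    """Stand-in for real work (I/O, a call to an image library, etc.)."""
    return f"image-{image_id}-resized"


def process_batch(image_ids, max_workers: int = 4) -> list:
    """Fan the work out over a thread pool that lives inside the monolith."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(resize_image, image_ids))
```

No broker and no separate deployment, which is the point: concurrency is an implementation detail, not a service boundary.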
Microservices and monoliths are boundaries for your application they aren't implementation details (i.e. all microservices must be asynchronous is strictly not true) in and of themselves, they're design details. That design can influence the implementation but they are separate.
Ex. there are plenty of people who use Sidekiq and Redis like you're using Celery but don't call it a microservice. It's just a piece of their monolith since it has largely the same dependencies.
beam is god damn magical. It would be hard to replicate that kind of decentralized monolith without considerably more work with any other technology.
I mean, consider a gen server mounted in your supervisor tree. It's 5 minutes of work, tops. Doing the same with Kubernetes would require coordinating a message broker, picking a client library, creating a restart strategy, and networking, all of which would add considerably to your development time.
I'm completely in your camp, and I'm surprised by the lack of nuance HN seems to show (especially regarding micro-services & Kubernetes).
There are many benefits to having microservices that people seem to forget because they think that everyone interested in microservices is interested in splitting their personal blog into 4 different services.
They take coordination, good CICD, and a lot of forethought to ensure each service is cooperating in the ecosystem properly, but once established, it can do wonders to dev productivity.
I can't tell if my project is a monolith or microservices, but it's going well so far. We use a single scalable database instance as a message broker and persistence source, and have a common framework that implements distributed algorithms, which every service uses to expose "OS-like" constructs (actor activation, persistent collections, workflows, Git, etc.). All communication is done through independent protocols. There's not much coupling between protocols (except for common ones like "schedule this job on this device"), so it's not really a cobweb of dependencies, but everything relies on that single database.
I think if the database gets too overloaded I'll partition certain tree nodes across multiple masters (this is feasible because the framework doesn't rely on a single timestream).
With the level of shared code (the framework) and the single database, it's somewhat monolithic but the actors themselves are quite well-behaved and independent on top of it.
I'm a database guy, so the question I get from clients is, "We're thinking about breaking up our monolith into a bunch of microservices, and we want to use best-of-breed persistence layers for each microservice. Some data belongs in Postgres, some in DynamoDB, some in JSON files. Now, how do we do reporting?"
Analysts expect to be able to connect to one system, see their data, and write queries for it. They were never brought into the microservices strategy, and now they're stumped as to how they're supposed to quickly get data out to answer business questions or show customers stuff on a dashboard.
The only answers I've seen so far are either to build really complex/expensive reporting systems that pull data from every source in real time, or do extract/transform/load (ETL) processes like data warehouses do (in which the reporting data lags behind the source systems and doesn't have all the tables), or try to build real time replication to a central database - at which point, you're right back to a monolith.
Reporting on a bunch of different databases is a hard nut to crack.
Okay, sounds reasonable enough for a complex enterprise.
> to feed a torrent of raw data to your lake
Well, there's the problem. Why is it taking a year to export data in its raw, natural state? The entire point of a data lake is that there is no transformation of the data. There's no need to verify the data is accurate. There's no need to make sure it's performant. It's just data exported from one system to another. If the file sizes, or record counts match, you're in good shape.
If it's taking a year to simply copy raw data from one system to another, the enterprise has deeper problems than architecture.
If you are "export[ing] data in its raw, natural state" then haven't you lost the isolation benefits of microservices? Now you have external systems dependent on your implementation details, and changing your schema will break them.
That's a problem for the future data engineers to deal with. The data lake is an insurance policy so you only need to think about these problems if you later want the data. If you already know you want to analyze the data, then a data lake is not a good choice.
Yes, it makes life harder for the data engineers in the future, but it might turn out that analysts only ever need 5% of the data in the lake, and dealing with these schema changes for 5% of the data is easier than carefully planning a public schema for 100% of it.
It can be helpful to include some small amount of metadata in the export though, with things like the source system name, date & time, # of records, and a schema version. Schema version could easily be the latest migration revision, or something like that.
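A sketch of what that metadata could look like, in Python. The field names and the `build_manifest` helper are my own assumptions, not any standard; the idea is just to write a small manifest next to each export.

```python
from datetime import datetime, timezone


def build_manifest(source_system: str, schema_version: str, records: list,
                   exported_at: datetime = None) -> dict:
    """Describe one export so future data engineers can interpret it."""
    exported_at = exported_at or datetime.now(timezone.utc)
    return {
        "source_system": source_system,
        "exported_at": exported_at.isoformat(),
        "record_count": len(records),
        # e.g. the latest migration revision of the source database
        "schema_version": schema_version,
    }
```

The record count gives you the cheap "did everything arrive" check, and the schema version tells a later reader which layout the raw files follow.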
But if I haven't spent the effort to extract it, do I really own it? Let me argue that I don't have it because all my implemented queries turn up none of your data. You wouldn't tax me on gold that hasn't yet been extracted, would you? (End of joke.)
What I think you'd typically do is put different data under different keys/paths, so that red is personally identifiable data, yellow contains pointers to such data, and green is just regular data. You could have a structure like s3://my-data-lake/{red|yellow|green}/{raw|intermediate}/year={year}/month={month}/day={day}/source={system}/dataset={table}
Then you just don't keep red data for longer than 30 days.
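Roughly, in Python, that layout and the retention rule could look like the sketch below. The bucket name comes from the path above; the function names, the zero-padded month/day, and the exact expiry check are assumptions for illustration.

```python
from datetime import date

SENSITIVITY = {"red", "yellow", "green"}   # red = personally identifiable
STAGES = {"raw", "intermediate"}
RED_RETENTION_DAYS = 30


def lake_key(sensitivity: str, stage: str, day: date,
             source: str, dataset: str) -> str:
    """Build the object prefix for one dataset on one day."""
    if sensitivity not in SENSITIVITY:
        raise ValueError(f"unknown sensitivity: {sensitivity}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return (
        f"s3://my-data-lake/{sensitivity}/{stage}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
        f"source={source}/dataset={dataset}"
    )


def is_expired(sensitivity: str, day: date, today: date) -> bool:
    """Only red data has a retention limit in this sketch."""
    return sensitivity == "red" and (today - day).days > RED_RETENTION_DAYS
```

Because the sensitivity level is the first path component, a single lifecycle rule on the `red/` prefix can enforce the 30 days instead of code like `is_expired`.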
Changing the schema of an upstream data source almost always breaks or requires updates to the downstream analytics system. It's an unavoidable problem whether it's a microservice or a monolith; you just get to choose where you put the pain.
Consider:
Source Data -> Data Lake -> ETL Process -> Reporting DataWarehouse(s)/DataMart(s) -> User Queries
vs
Source Data -> Data Lake -> User Queries
vs
MonolithDB -> User queries
vs
MonolithDB -> ETL Process -> Reporting DataWarehouse(s)/DataMart(s) -> User Queries
A schema change in the source data should be easily updated in the ETL process in example 1. Most changes are minimal (adding, removing, renaming columns). And for a complete schema redesign in the source data, a new entry in the data lake should be created and the owners of the ETL process should decide if the new schema should be mangled to fit their existing reporting tables or to build new ones. Across the four models I outlined above, the first is by far the easiest to update and maintain, IMO.
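As a sketch of why the ETL step is a cheap place to absorb a rename, assume the raw exports carry a schema version and the ETL keeps one column map per version. The column names and versions here are invented.

```python
# Source column -> reporting column, one map per source schema version.
COLUMN_MAPS = {
    1: {"user_id": "user_id", "email": "email"},
    2: {"user_id": "user_id", "email_address": "email"},  # v2 renamed "email"
}


def to_reporting_row(source_row: dict, schema_version: int) -> dict:
    """Translate one raw row into the stable reporting shape."""
    mapping = COLUMN_MAPS[schema_version]
    # Columns not listed in the map are dropped on purpose.
    return {target: source_row[src] for src, target in mapping.items()}
```

A rename upstream becomes a one-line change to `COLUMN_MAPS`, and the reporting tables and user queries never notice.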
If the benefit of microservices is primarily an organizational one around ownership, Conway's Law, and so on, then Example 1 still seems problematic, because it's likely a different team that has to deal with the fallout.
Another strategy is that the service has an explicit API or report specification; that way, the team that owns the services also owns the problem of continuing to support that while changing their internal implementation.
Of course, whether the benefits are worth the cost is probably organization specific, just like microservices in general.
The reasons for it are hard to explain succinctly in an HN comment, but if you look up data lakes you will find a lot of explanation. It basically comes down to "is it better for the data integrators or the data consumers to massage the data?" The insight behind data lakes was that it's really great when the consumers decide how to massage the data.
This implies that the engineers who lobbied so hard for microservices were even all that concerned with these benefits to begin with, and took this into account when designing the architecture of the system.
More often than not, in my experience, the developers involved are more concerned with code ownership than genuine architectural concerns.
Yeah, I think you'd want the microservice to expose a bulk export API endpoint that conforms to an API specification. It will most likely need to transform the data and possibly ignore some of it. And then you grab data from those endpoints and piss it into the lake. The lake now conforms to your published APIs.
To me this sounds great. And honestly you should do the same thing with a monolith. Nothing worse than "oh you can't make that schema change because a customer with a read-only view will have broken reports".
You also lose benefits of microservices if they have to stop and change the data exporting system all the time too, slowing down development.
The main benefit of a dumb copy is that the production service is not impacted by reporting, only a copy is. This relates to performance (large queries) but also implementation time.
One way to avoid this is to have the microservices publish data changes to some other (monolith) system, like an MQ system with a specified schema for the payload.
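A minimal Python sketch of that idea, with `queue.Queue` standing in for a real message broker. The `OrderChanged` event and its fields are hypothetical; the point is only that the payload shape is versioned and owned by the publishing service.

```python
import json
import queue
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderChanged:
    """Published contract; internal tables can change without touching this."""
    schema_version: int
    order_id: str
    status: str
    changed_at: str  # ISO 8601 timestamp


def publish(bus: queue.Queue, event: OrderChanged) -> None:
    """Serialize the event and hand it to the (stand-in) broker."""
    bus.put(json.dumps(asdict(event), sort_keys=True))
```

Reporting consumes these messages instead of the service database, so the service only has to keep the event shape stable, not its schema.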
On the other hand, the notion that “microservices == completely independent of everything else” is an unrealistic one to hold.
Where I was a bit more than a year ago they hired a really expensive consultant who did a data lake project which wasn't finished by the time my team was told to use it, so we rolled our own according to that team's instructions and best practices (best practices were a huge deal at this company, everything we did was best practices). We exported data to S3 under a particular structure and I built an ETL system around that and Spotify's Luigi. No one else on my team knew what ETL was, which made me feel old. We spent two, maybe three months on this. The BI team got their data and so did the marketing automation team.
But yeah, it's funny how these projects get complicated in larger organizations. Personally I would have rolled something even simpler on gnu/posix tools and scripts, in rather less than a month.
Google Analytics is associated with gleaning useful, actionable insights from your users' behavior on your web sites and in your apps, which was what the guy who sold you on the concept of a data lake was promising.
While this is absolutely true in my experience and Google Analytics has handled most of my needs in contrast with a homegrown data lake or ETL, there's always the spectre of Google pulling the rug out from under you with service shutdown or massive price increase.
Use off the shelf stuff but be prepared to have to move in a (relative) hurry.
"Google Analytics" is, as I understand the original context, shorthand for "something simple, cheap, and immediately useful." Feel free to substitute Mixpanel or some Show-HN-Google-Analytics-replacement Docker image or whatever.
Hook Microstrategy up to your lake then. If they just want inbound analytics and conversions then IT were probably recommending a data lake out of their own want to make it, rather than the actual need.
In my experience people arguing for data lakes talk a lot about some unspecified future benefit. Companies typically want to learn things about their interactions with their customers that allow them to make more money, and thus GA or its equivalent represents the 80/20 — or more likely 80% of the benefit for 1% of the cost — solution.
Your statement implies a lot of assumptions about a business’s model. Our company cares about user interactions in our app, software development metrics, quality metrics, sales metrics, etc. GA is just one small piece of the puzzle.
“Data lake” may not be the right answer, but GA certainly isn’t.
I'm mostly being sarcastic but I'm partly describing an exact scenario that I'm witnessing right now. Business wants "analytics". IT starts spending a load of money. Business has no idea what it's for and buys their own thing to just track website visits which is what they wanted.
> Some data belongs in Postgres, some in DynamoDB, some in JSON files. Now, how do we do reporting?
One of the key concepts in microservice architecture is data sovereignty. It doesn't matter how/where the data is stored. The only thing that cares about the details of the data storage is the service itself. If you need some data the service operates on for reporting purposes, make an API that gets you this data and make it part of the service. You can architect layers around it, maybe write a separate service that aggregates data from multiple other services into a central analytics database so that reporting can be done from there, or keep requests in real time but introduce a caching layer or whatever. But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.
Sorry, but "making an API that gets you this data" is the wrong answer.
Most APIs are glorified wrappers around individual record-level operations like- get me this user- or constrained searches that return a portion of the data, maybe paginated. Reporting needs to see all the data. This is a completely different query and service delivery pattern.
What happens to your API service written in a memory-managed/garbage-collected language when you ask it to pull all the data from its bespoke database, pass it through its memory space, then send it back to the caller? It goes into GC hell, is what.
What happens to your API service when it issues queries for a consistent view of all the data and winds up forcing the database to lock tables? It stops working for users, is what.
There are so many ways to fail when your microservice starts pretending it is a database. It is not. Databases are dedicated services, not libraries, for a reason.
It is also true that analysts should not be given access to service databases, because the schema and semantics are likely to change out from under them.
The least bad solution? The engineering team is responsible for delivering either semantic events or doing the batch transformation themselves into a model that the data team can consume. It's a data delivery format, not an API.
>It is also true that analysts should not be given access to service databases, because the schema and semantics are likely to change out from under them.
It's not perfect, but what we do is create a bunch of table views that represent each of the core data types in the system. We can then do all of the complex joins to collect the data analysts want into an easy-to-query table, as well as try to keep the views consistent even as the db changes.
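Something like this, sketched with Python and an in-memory SQLite database. The table and view names are invented; the ugly internal names are deliberate, to show the view hiding them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Internal tables: free to change as the service evolves.
    CREATE TABLE usr (id INTEGER PRIMARY KEY, full_nm TEXT);
    CREATE TABLE ord (id INTEGER PRIMARY KEY, usr_id INTEGER, total_cents INTEGER);
    INSERT INTO usr VALUES (1, 'Alice');
    INSERT INTO ord VALUES (10, 1, 1250), (11, 1, 750);

    -- Stable view for analysts: the join and the renames live here.
    CREATE VIEW customer_orders AS
        SELECT u.id                     AS customer_id,
               u.full_nm                AS customer_name,
               COUNT(o.id)              AS order_count,
               SUM(o.total_cents) / 100.0 AS total_spent
        FROM usr u
        LEFT JOIN ord o ON o.usr_id = u.id
        GROUP BY u.id, u.full_nm;
""")

rows = conn.execute(
    "SELECT customer_name, order_count, total_spent FROM customer_orders"
).fetchall()
```

If `usr.full_nm` gets renamed later, you update the view definition and the analysts' queries against `customer_orders` keep working.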
The service will need to read all its data and put it into objects, then extract the data from the objects to report it, then garbage collect all of that. For every single record in its entire data set.
You could say but oh, why not just return the underlying data without making objects? Well now you are exposing the underlying data format, which is what we’re trying to avoid by giving this job to the service.
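For reference, the usual mitigation is to stream the export in batches rather than materialize everything at once. The Python sketch below (hypothetical `users` table, SQLite as a stand-in) caps peak memory, but note it does not remove the per-record object churn being complained about; every row still becomes an object and gets collected.

```python
import sqlite3
from typing import Iterator


def export_rows(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[dict]:
    """Yield every user in the published shape, one batch in memory at a time."""
    cursor = conn.execute("SELECT id, name FROM users ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        for user_id, name in batch:
            # The caller sees this dict, not the underlying table layout.
            yield {"id": user_id, "name": name}
```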
And thus such patterns lead to the absurdity where 90% of enterprise apps do little actual computations beside serializing and deserializing JSON (or XML if a "legacy" app).
It's remarkable what you can do with just functions and nested data structures. Used to be big on the whole OOP thing, data roles, so much effort for so little.
Now I try to think about problems as "I have input data of shape X, I need shape Y" and fractally break it down into smaller shape-changes. I am kinda starting to get what those functional programmers are yammering on about.
The parent comment said “is asked for all records..GC hell “.
Since a micro service deals with only its own data and reporting is then across services, we’d need to query across services to get data and make sense of it. If we’d ever need to query all records, then such records would become domain objects in the micro services first before being passed along. A large number of domain objects would require a large amount of memory. Processing and releasing domain objects will result in GC on the released objects.
Wait, I would assume that the people in need of reporting would have a pretty good idea of what those reports should look like. That means you know exactly what data needs to be read from a data store optimized for reporting. Each micro-service contributes their share of data to a data store optimized for reading. This is a text-book use case for a non-relational document store. I'm really not seeing what's so difficult about building such a process.
Reporting and non-relational are like oil and water, coming from experience working with people who make reports.
It’s not like they come up with every report they think they might need while the micro service is being architected. They come up with a new report long after engineers have moved on. If it’s a SQL database, no problem. If it’s some silly resumeware data store, then what?
I came here simply to echo this statement! Design a reporting solution that is responsible for ingesting data from these micro services' persistence layers. Analysts should only ever be querying this reporting solution and should not be allowed to connect directly to any micro service persistence layer or API.
We have a whole industry around Analytics and Data and the tools and processes to build this reporting layer is well established and proven.
Nothing will give you as many nightmares as letting your analysts loose on your micro service persistence layers!
Having more than one schema owner is practically a death sentence for development and engineering...
We used to have a few of those, especially on exadata clusters. Finally carted them out of the local dc after moving to RDS Aurora databases with strict policies. Might have caused 3 or 4 people to quit, but totally worth it for the 500+ people that stayed who now can own their data, schema and development (and be held responsible for it! -- another issue of multi-db-access, it's always someone else's fault). Went from deploying once a day with a 'heads up' message to no-message deploying multiple times per hour.
Why monoliths? Everyone still wants to have OLAP and OLTP systems, with analytics done on the OLAP side. With that separation you can pull data from multiple sources into your analytics.
I cannot imagine people not doing that and having need to have stats in real time. For most shopping/banking stuff you can get away with once in 24 hours dumps and then analytics can be done on that.
> But you do not simply go and poke your reporting fingers into individual service databases.
This is why I distrust all of the monolith folks. Yes, it's easier to get your data, but in the long run you create unmaintainable spaghetti that can't ever change without breaking things you can't easily surface.
Monoliths are undisciplined and encourage unhealthy and unsustainable engineering. Microservices enforce separation of concerns and data ownership. It can be done wrong, but when executed correctly results in something you can easily make sense of.
You're saying "monoliths encourage unhealthy engineering" and then in the next sentence say "when executed correctly" for microservices. That sounds like a having/eating cake type situation.
> In a microservice architecture it's harder to pretend you're doing it right.
After seeing a few of them, I'd say: "it's less embarrassingly obvious that you're doing it wrong."
But dig into the code for a few endpoints and it usually doesn't take long to find the crazy spaghetti and the poorly-carved-out separation of responsibilities breaches.
The argument (which I sort of buy) is that microservices provide rails that keep people from doing certain stupid things like N clients depending on the data schema (making the schema a de-facto public interface).
The trick with microservices is that the ecosystem is maturing and there are still lots of ways to screw up other things that are harder to screw up with monoliths. In time 95% of those will go away (my specific prediction is that one day we will write programs that express concurrency and the compiler/toolchain will work out distributing the program across Cloud services--although "Cloud" will be an antiquated term by then--including stitching together the relevant logs, etc and possibly even a coherent distributed debugger experience).
> It can be done wrong, but when executed correctly [...]
Quite the self-fulfilling prophecy there.
> Yes, it's easier to get your data, but in the long run [...]
Systems can and should be evolved and adapted over time. E.g. deploying components of the monolith as separate services. You can't easily predict what the requirements for your software are going to be in, say, 10 years.
And depending on the stage a company is, easy access to data for business decisions outweighs engineering idealism.
> easy access to data for business decisions outweighs engineering idealism
I think there are different levels of sophistication of "engineering idealism". GP talks about "data ownership", and I get the desire to keep the data a microservice is responsible for locked in tightly with it. But let's be precise why it's good: because isolating responsibility reduces complexity. Not because code has some innate right to privacy.
In my own engineering idealism, there's no internal data privacy in the system. Things should be instrumentable, observable in principle. If an analyst wants to take your carefully designed internal NoSQL document structure and plug it into an OLAP cube for some reason, there must be a path to doing that; if that's an expected part of the business, the service needs to have it on the feature list, that this should be doable without degrading the service.
Software needs to be in boxes because otherwise we can't handle it mentally, but the boxes really shouldn't be that black.
Isolating responsibility reduces complexity for that piece of code. It increases complexity for assembling the whole thing into a holistic package, which is usually what analytics' primary need is.
YMMV, but the tradeoff is less complexity at the SWE/prod department, and more at the analytics team.
> But let's be precise why it's good: because isolating responsibility reduces complexity.
The thing is, it just shifts around complexity. Once you have microservices, you have to deal with a bunch of new failure modes, plus a bunch of extra code whose only purpose is to provide an interface to other services. And in terms of separating data, the worst part is that you've prevented accessing this data together with other data within the same transaction.
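To show what gets lost, here is a sketch of the single-database case in Python with SQLite. The `orders` and `accounts` tables and the `place_order` function are hypothetical. Both writes commit or roll back together, which is exactly what you can't get for free once the two tables live in different services.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER);
        INSERT INTO accounts VALUES (1, 100);
    """)
    return conn


def place_order(conn: sqlite3.Connection, user_id: int, amount: int) -> bool:
    """Record the order and debit the account atomically."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                (user_id, amount),
            )
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE user_id = ? AND balance >= ?",
                (amount, user_id, amount),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False  # the INSERT above was rolled back too
```

Split `orders` and `accounts` across two services and the rollback turns into compensation logic, retries, and a new failure mode for each.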
Microservices require your organization to have an engineering culture. I would be afraid of introducing them at, say, Home Depot where (I've heard) your average programmer doesn't even write tests.
If you have engineering talent within a small multiplicative factor of Google (say 0.5), then you can pull off Microservices at your org.
Edit: I'm being downvoted, but I don't think it's a dangerous assumption or point to make that it takes a certain amount of discipline and experience to implement microservices correctly. When you have that technical capacity and the project calls for it, the benefit is tremendous.
I think you're being downvoted because you're implying monoliths don't require an engineering culture and that microservices are a silver bullet in getting systems built correctly.
I've seen good and bad in each approach. It's certainly possible to enforce good SOCs and proper boundaries in monorepos, and also possible to plough a system into the ground with microservices.
They're all just tools in your toolbox and both have a part to play in modern development.
You’d be surprised how sophisticated Home Depot is. They switched their monolith to microservices using Spinnaker and even contributed back to Spinnaker.
My client is doing a lot of it wrong. To be fair, they got sold a lot of really horrible and ridiculous advice from IBM consultants (is there another kind?), but they also have people in charge (organizationally and technically) who aren't great decision-makers.
As the article says though, you can't fix a people problem (bad engineering practices and discipline) by going from one technology to another (monolith to microservices).
Only when done by folks that never learned how to write modular code and package libraries.
The same folks aren't going to magically learn how to do distributed computing properly, rather they will implement unmaintainable spaghetti network calls with all the distributed computing issues on top.
And untangling a monolith tends to be much less problematic than untangling a bunch of microservices. For one thing, you can refactor/untangle it all offline, do your testing, and do a single release with the updated version, as opposed to trying to coordinate releases of a bunch of services whose interfaces/boundaries were poorly defined.
It's about code quality. Microservices are easily replaceable. Modules are too.
With both systems, the core part (e.g. mesh, infrastructure, ...) is crucial.
I think experienced developers can see this, the ones that actually delivered products and had big code changes. The ones that handled their "legacy" code.
Microservices are just a way to enforce it, there are others. None are perfect or bad, both have their use-case.
I do not claim expertise here, but it would seem like microservices would add significant performance costs. Stitching together a bunch of results from different microservices is going to be a LOT more expensive than running a query with joins.
Humans are the most expensive part of the system. You have to make it easy for humans to understand and change the system, and at the end of the day that's the number one thing to optimize for. This is why microservices are compelling.
But to speak directly to your concern, you have to think about service boundaries and granularity correctly. Nobody is saying make a microservice out of every conceivable table. Think about the bigger picture, at a systems level. Wherever you can draw boxes you might have a service boundary.
Why would you need to join payment data to session and login data?
Do you need to compare employee roles and ACLs against product shipping data?
These things belong in different systems. If you keep them in the same monolith, there's the danger that people will write code that intertwines the model in ways it shouldn't. Deploying and ownership become hard problems.
The goal is to keep things that are highly functionally related together in a microservice and expose an API where the different microservices in your ecosystem are required to interact. (Eg, your employees will login.)
When the data analytics folks want to do advanced reporting on the joins of these systems (typically offline behavior), you can expose a feed that exports your data. But don't expose an internal view of it to them or they'll find ways of turning you into a monolith.
In my experience it is a lot more difficult to navigate around all the different microservices to understand what needs to be done compared to being in a monolith where you can jump from file to file.
What also happens is that microservices get written in different languages, which adds a lot of complexity to understanding what is going on at the big-picture level.
And code gets repeated a lot more. If there is a change or update in a microservice, everyone needs to figure out which services depend on it and how they will have to adapt. With a monolith you can just use your IDE to see what will break if you make a change. So much repeated business logic. Creating a new feature involves many meetings to figure out which services have to be updated and in what way.
It is crazy mess in my opinion.
I have been with a company that had monolith application which they split up to more than 15 services (some python, some js, Scala, Java, etc...). Monolith still is used for some parts that are not migrated. I was working on single service having no idea how the whole system worked together. Then I had to do something in the old parts and I very quickly got an understanding how everything works together.
>And code gets repeated a lot more. If there is change in a microservices or update everyone will need to figure out what services depend on and how they will have to adapt. With monolith you can just use your IDE to see what will break if you make a change. So much repeated business logic. Creating a new feature involves having to have many meetings to figure out what services in which way have to be updated.
This is what people mean when they say "distributed monolith" vs. microservices.
I work on a monolith with a team experimenting in microservices and good lord do I hate it. The microservice represents a required step in our user flow, and due to the way we're set up I have to spin up my own private copy. Very often there have been configuration or API changes that were not communicated to me, and so for the past few months that service has been broken and I've managed to avoid it for the most part. When I can't, I find it is faster to simply re-assign existing database records or simply bullshit them in a database editor rather than deal with the "why isn't the XXXXXXXXXXXX service working for me again?" flavor of the day.
And holy fuck is debugging that stuff difficult. HUUUUUGE waste of time, but management looooooooooves their blasted microservices...
Having to have that documentation, finding, reading, understanding and trusting it already adds so much overhead.
It is still nowhere close to the ability to jump around with an IDE.
It might be in a different language, with different design patterns, and to get to the details you have to check out that project anyway because you can't document absolutely everything outside of the code base. And if you do, you will end up with multiple sources of truth.
It is so much more likely that for every little issue, which you otherwise might be able to answer yourself very easily, you will have to contact the team owning that microservice.
It is not only mentally exhausting. It is time consuming, it requires so much back and forth. It creates so much dependence on other people because figuring out how things are related is so much more difficult.
Sometimes I have 8 or more different IDE windows open to understand what is going on.
> you have to think about service boundaries and granularity correctly.
This is the hardest part. I'd argue that this is almost impossible to do correctly without significant domain modeling experience. Also, microservices by nature make it hard to refactor these boundaries (compared to monoliths, where you'd get compile-time feedback).
I prefer to make a structured monolith first (basically multiple services with isolated data that are squished together into a single deployable) and pull them out only if I really need to. It also helps with keeping microservice sprawl under control.
If you already can't serve your requests from one DB, and you already want to factor out the analytics stuff, the long running background queries, modularize the spaghetti, scale the maintenance load, CI build + testing time, etc...
That's what SOA and microservices are supposed to solve.
At that scale you do reporting from a purpose-built service.
> enforce separation of concerns and data ownership
You can enforce separation of concerns and data ownership in a monolith just as much as you can not enforce these two characteristics in a micro service architecture. Microservices and monoliths are a discussion about deployment artifacts, full stop.
Microservices provide an abstraction. That is kind of the point. If you feel like the data your service operates on would be better off stored in a Redis database instead of an RDBMS, you can rewrite your persistence layer, test, and roll out the new version of the service. As long as your APIs do not change, nobody cares how you produce responses to requests. In a monolith, this would be a nightmare. You don't have a single persistence layer to change; you have to go through every module, find all the places where this specific table or tables are being accessed, and change retrieval and storage functionality everywhere.
So do modules/classes/interfaces etc. You don't need a layer of HTTP in between components to have abstraction.
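To make the point above concrete, here is a minimal sketch of the same "swap the storage, keep the API" move done inside one codebase. Everything in it is hypothetical: `SessionStore`, both implementations, and `greet` are made-up names, sqlite3 stands in for the RDBMS, and a plain dict stands in for a Redis-like store.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical contract; callers depend on this and nothing else."""

    def get(self, session_id: str) -> Optional[str]: ...

    def put(self, session_id: str, user_name: str) -> None: ...


class SqlSessionStore:
    """Stand-in for an RDBMS-backed implementation (sqlite3, in memory)."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, user_name TEXT)"
        )

    def get(self, session_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT user_name FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None

    def put(self, session_id: str, user_name: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO sessions (id, user_name) VALUES (?, ?)",
            (session_id, user_name),
        )


class KeyValueSessionStore:
    """Stand-in for a Redis-like key-value store (a plain dict here)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, session_id: str) -> Optional[str]:
        return self._data.get(session_id)

    def put(self, session_id: str, user_name: str) -> None:
        self._data[session_id] = user_name


def greet(store: SessionStore, session_id: str) -> str:
    """Caller code: unchanged no matter which store is passed in."""
    user_name = store.get(session_id)
    return f"hello {user_name}" if user_name else "please log in"
```

The catch, as the parent describes, is discipline: this only works if the rest of the monolith goes through `SessionStore` instead of querying the table directly.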
In addition, it feels like microservices solve a problem that very few people really have. I've never run into a case where I thought "boy, I'd sure like to have a different database for this one chunk of code". If that did happen, then sure, split it out, but I can hardly believe that splitting your entire code base into microservices has a net benefit. The real problem in nearly every project I've worked on is complexity of business logic. A monolith is much easier to refactor, and you can change the entire architecture if you need to without having to coordinate releases of many different applications.
Seems to me you are talking about a database access layer instead of microservices.
My understanding of microservices is a bunch of loosely connected services that can be changed with minimal impact to the others
The problem with the ideal is that in reality this never works: as complexity grows, the spaghetti code moves to spaghetti infrastructure. (Done a network map of a large k8s / Istio deployment lately?)
The impact would be minimal only if the API of the microservice didn't change. But in the same codebase too, if you have a module whose API doesn't change the changes from refactoring it would likewise be minimal.
Depending on where you work, it can be a problem, because the separation is not always appropriate, and can for political reasons be much harder to revert when visible at the service level (for example because the architect doesn't understand consistency, or because your manager tells you that the distributed architecture documentation has been sent to the client so it cannot be modified).
In case of undue separation, reworking the internals of the enclosing monolith should have less chance to cause frictions.
Splitting code into libraries works better; it's simpler and faster. The only thing microservices bring to the table is being able to deploy updates independently (although this is also possible with libraries). If you don't need to deploy independently, then microservices are useless complexity; if you can't deploy independently, then you've got a distributed monolith.
A co-worker had a smart solution for this: your service's representation in a reporting system (a data warehouse for example) is part of its API. Your team should document it, and should ensure that when that representation changes information about the changes is available to the people who need to know it.
This really makes sense to me. I love the idea that part of a microservice team's responsibility is ensuring that a sensible subset of the data is copied over to the reporting systems in such a way that it can be used for analysis without risk of other teams writing queries that depend on undocumented internal details.
> But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.
I agree. In a monolith architecture, though, you CAN do that (and many shops do.) That's where their pains come from when they migrate from monolith to microservice: development is easier, but reports are way, way harder.
> when they migrate from monolith to microservice: development is easier [...]
Not even that -- that idea is still highly debatable.
I would argue that it absolutely isn't easier, and the stepping-back-in-time of developer experience is one of the biggest problems with microservices.
> you do not simply go and poke your reporting fingers into individual service databases
Side point: This is a needlessly hostile and unprofessional way to refer to a colleague. Remember that you and the reporting/analytics people at your company are working towards the same goals (the company's business goals). You are collaborators, not combatants.
You can express your same point by saying something like "The habit of directly accessing database resources and building out reporting code on this is likely to lead to some very serious problems when the schemas change. This is tantamount to relying upon a private API." etc.
We can all achieve much more when we endeavor to treat one another with respect and assume good intentions.
I've noticed reporting/analytics people going extinct around my workplace as micro services make monitoring easier. There might be some pent up hostility towards the technology side
If you think telling colleagues not to "simply go and poke your reporting fingers into" things won't insult them or put them on a defensive footing, I encourage you to try it and closely note the reception you receive. In my experience, people do not appreciate being spoken to like that.
I'm going to disagree heavily here. The world of cloud computing, microservices, and hosted/managed services has made the analyst's and data engineer's job easier than ever. If the software team builds a new DynamoDB table, they simply give the AWS account for the analytics team the appropriate IAM permissions and the analytics team will set up an off-peak bulk extract. A single analyst can easily run an entire data warehouse and analytics pipeline basically part time without a single server using hosted services and microservices. With a team of analysts, the load sharing should be such that the ETL infrastructure is only touched when adding new pipelines or a new feature transformation.
And for data scientists working on production models used within production software, most inference is packaged as containers in something like ECS or Fargate which are then scaled up and down automatically. E.g., they are basically running a microservice for the software teams to consume.
Real time reporting, in my opinion, is not the domain of analysts; it's the domain of the software team. For one, it's rarely useful outside of something like a NOC (or similar control room areas) and should be considered a software feature of that control room. If real-time has to be on the analysts (been there), then the software team should dual publish their transactions to Kinesis Firehose and the analytics team can take it from there.
Of course, all of this relies heavily on buy-in to the world of cloud computing. Come on in, we all float down here.
Cloud computing helps here, but microservices still make this harder. Some of the data is in Dynamo, some of it is in Aurora, some of it is in MySQL RDS, some of it is in S3, and nobody knows where all of it is at once.
From a project management perspective, each data source should have some requirements behind it from the business team. Those requirements should be prioritized meaning you can prioritize which data source to tackle first. You automate the process in AWS data pipelines for that data source, write the documentation for the next analyst, and move on to the next data source.
The complexity you and the OP seem to be describing is more in the management and prioritization of analytics projects than in the actual "this is a hard technical problem" domain. It's just that a lot of it is tedious, especially compared to the "everyone just put all your data in the Oracle RACs and bug the DBA until they give you permission" model of the past.
Also, one of the service teams might need to change their schema, and the reporting team then needs to adjust their process to handle that. That's fine, but they need to know that in advance, and then they might have a backlog of other things that they need to do, and then some other team's schema changes without notice, so now they always have to play catch-up.
What/where do you run this mythical one-analyst pipeline, though? Is that in cloud services too? Airflow? Kubeflow? Apache Beam? It sounds like you're just pushing the problem around.
Lambda is mostly used for its trigger functionality for data or artifacts that are created at irregular intervals. E.g., an object is uploaded to S3, which triggers a Lambda, which runs a COPY command for that object into Redshift. The kind of stuff that's well below the threshold for leaving the free tier.
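A rough sketch of what such a trigger function could look like, assuming a CSV upload. The target table and IAM role ARN are placeholders I made up, and the step that actually runs the statement against Redshift is left out, so this only shows turning the S3 event into a COPY statement.

```python
from typing import List
from urllib.parse import unquote_plus

# Hypothetical values for illustration; substitute your own.
TARGET_TABLE = "analytics.events"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/redshift-copy"


def copy_statements(event: dict) -> List[str]:
    """Build one Redshift COPY statement per object in an S3 event."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Double any single quotes so the key stays inside the SQL literal.
        safe_key = key.replace("'", "''")
        statements.append(
            f"COPY {TARGET_TABLE} FROM 's3://{bucket}/{safe_key}' "
            f"IAM_ROLE '{IAM_ROLE_ARN}' FORMAT AS CSV;"
        )
    return statements


def handler(event, context):
    """Lambda entry point. Executing the statements is omitted here."""
    return copy_statements(event)
```

Keeping the statement-building in a plain function means it can be unit tested without any AWS account at all.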
It's a little sad because originally, people thought there would be a shared data base (now one word) for the whole organization. Data administrators would write rules for the data as a whole and keep applications in line so that they operated on that data base appropriately. A lot of DBMS features are meant to support this concept of shared use by diverse applications.
What ended up happening is each application uses its own database, nobody offered applications that could be configured to an existing data base, and all of our data is in silos.
Do you know why the shared database vision didn't work out? Because I still think it would be the best approach for many companies. Most companies are small enough that they could spend less than $10k/month for an extremely powerful cloud DB. Then you could replace most microservices with views or stored procs. What could be simpler?
I think one reason to avoid this approach is because SQL and other DB languages are pretty terrible (compared to popular languages like C#, Python, etc...) But why has no one written a great DB language yet?
I've worked on a service like this. 800k lines of PL/SQL and Java stored procedures (running inside the database, so you could call Java from PL/SQL and vice versa), powered by triggers.
* Testing is god-awful. To test a simple thing you had to know how the whole application worked, because there's validation in triggers, which triggers other triggers, which require things to be in a certain state. This made refactoring really hard/risky so it rarely got done.
* There's a performance ceiling, and when you hit that, you're done. We did hit a ceiling, did months of performance tuning, then upgraded to the biggest available box at the time (96 cores, 2TB RAM), which helped, but next time the upgrade won't be big enough. You're limited in what one box can do (and because stored procedures are tied to the transaction, there are limits to what you can do concurrently as well).
I think they go too crazy with the stored procedures. I would rather see constraints and views to make the data work. Triggers are useful for simple behind-the-scenes things like logging, or creating updateable views, but their behavior should be kept simple. If they're sending e-mails and launching missiles, it's probably too much.
Debugging PL/SQL without the PL/SQL debugger is nearly impossible. Unfortunately a lot of shops cheap out on developer tools after they buy the server licenses. I never liked the idea of Java on the database. The good thing about PL/SQL is that nobody wants to write it so it has a tendency to not be overused by most developers.
The performance ceiling probably wouldn't be too low if there weren't too much extra activity with each update. As with all databases, sharding and replication are your friend.
DBAs were a bottleneck. In the best case, you’d throw your application’s data needs over the wall to the DBAs (remember, this was the waterfall era) and hope they’d update the schema for you in a timely manner. In the worst case, the DBAs were petty tyrants who stifled progress. In the worser case, they were incompetent and you ended up with all these applications directly reading and writing the database, making schema changes impossible and coupling applications in obtuse, incredibly difficult to fix ways.
In any case, designing a single schema that encompassed all the needs of the organization and could grow and change as the organization did was nearly always too much to ask.
This was in the days of enterprise data modeling where people believed there really was just one data model or object model that could represent the whole org, independent of the needs of any given application. I don’t think anybody believes that any more.
Probably for the same reasons that waterfall development doesn't work out. This approach requires up-front specification of the data before the application domain is well understood. Any application desiring a migration to a different schema would need to work with the others to do it. Finally, developers bristle at the idea that they have to wait for another team to get their job done. In the absence of a strong company policy, it will inevitably drift toward shipping an application rather than keeping everything together.
Perhaps refactoring, had it been better understood around 1970, could have gone a long way toward harmonizing diverse schemas, allowing experimentation with eventual refactoring into the common database.
Our current environment makes this impossible. There's no way that Salesforce is going to ship a version that works with your company's database schema. You're going to have to supply that replication yourself. Same for Quickbooks. To get that kind of customization you need to be spending hundreds of thousands for enterprise software.
I didn't make this clear in my original comment, but I was envisioning that an app like Salesforce would use its own schema, but still live in the same DB with other schemas. I'm going to assume that Salesforce has some notion of an Employee. Salesforce would use its own Employee table by default, but it would provide extension points (views and SPs) that would allow you to read and write Employee data from tables in another schema if you want. This might be preferable to duplicating Employee data.
Edit - I just saw that you addressed this in your original post: "nobody offered applications that could be configured to an existing data base"
> I think one reason to avoid this approach is because SQL and other DB languages are pretty terrible (compared to popular languages like C#, Python, etc...) But why has no one written a great DB language yet?
SQL is actually hard to compare to programming languages, because in SQL you say what you want, while in an imperative language you say how. I only know one language that competed with SQL (and lost): QUEL (originally used by Ingres and Postgres).
BTW for triggers and stored procedures you actually can use traditional language, I know that PostgreSQL supports Python, you just need to load a proper extension to enable it.
For workloads which are read-heavy, I think a single database can be a great solution -- a small monolith has exclusive write access to the db, and any number of polyglot read-only services are connected to as many read-replicas as are needed for horizontal scaling.
I disagree with the conclusion. While every situation is unique, the default should be separate persistence layers for analytics and transactions.
Analytics has very different workloads and use cases than production transactions. Data is WORM, latency and uptime SLAs are looser, throughput and durability SLAs are tighter, access is columnar, consistency requirements are different, demand is lumpy, and security policies are different. Running analytics against the same database used for customer facing transactions just doesn't make sense. Do you really want to spike your client response times every time BI runs their daily report?
The biggest downside to keeping analytics data separate from transactions is the need to duplicate the data. But storage costs are dirt cheap. Without forethought you can also run into thorny questions when the sources diverge. But as long as you plan a clear policy about the canonical source of truth, this won't become an issue.
With that architecture, analysts don't have to feel constrained about decisions that engineering is making without their input. They're free to store their version of the data in whatever way best suits their work flow. The only time they need to interface with engineering is to ingest the data either from a delta stream in the transaction layer and/or duplexing the incoming data upstream. Keeping interfaces small is a core principle of best engineering practices.
In my last job I was a DevOps guy in a Data Eng. team and we used microservices (actually serverless) extensively, to the point that none of our ETL relied on servers (it was all serverless; AWS Lambda).
Now databases themselves are a different story; they are the persistence/data layer that the microservices themselves use. But it's actually doable, and I'd even say much easier, to use microservices/serverless for ETL because it's easier to develop CI/CD and testing/deployment with non-stateful services. Of course, it does take a certain level of engineering maturity and skill set, but I think the end results justify it.
This isn’t a new problem to microservices though, although maybe it’s amplified. Reporting was challenging before microservices became popular too with data from different sources. Different products, offline data sources etc that all had to be put together. The whole ETL, data warehousing stuff.
In the end everything involves tradeoffs. If you need to partition your data to scale, or for some other reason need to break up the data, then reporting potentially becomes a secondary concern. In this case maybe delayed reporting or a more complex reporting workflow is worth the trade off.
+1, Informatica was founded in 1993. Teradata in 1979. These are not new problems.
Data warehousing has drastically improved recently with the separation of storage and compute. A single analyst's query impacting the entire data warehouse is a problem that will, in the next few years, be a thing of the past.
Microservices are primarily about siloing different engineering teams from each other. If you have a singular reporting database that a singular engineering team manages, I'm not sure it's a big deal. Reporting might be a "monolith" but the system as a whole isn't. Teams can still deploy their services and change their database schemas without stepping on each other's toes.
> Teams can still deploy their services and change their database schemas
No, because as soon as you change your schema, you have to plan ahead with the reporting team for starters. The reports still have to be able to work the same way, which means the data needs to be in the same format, or else the reports need to be rewritten to handle the changes in tables/structures.
That's no different than changing your API specification or refactoring your code or anything else. Ideally the entrypoints into your team's services should be considered hard contracts to the outside world and the rest of the organization. Changing the contract out from under your customer is not something that should ever be easy.
IMO the devops folks should define some standard containers that include facilities for reporting on low-level metrics. Most of the monitoring above that should be managed by the microservice owner. The messages that are consumed for BI and external reporting should not have breaking changes any more than the APIs you provide your clients should.
"That's no different than changing your API specification"
This is a great point. The way to make backward-compatible changes to an API is by adding additional (JSON) keys, not changing / removing keys. The same approach works for a DB -- adding a new column doesn't break existing reporting queries.
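A small sketch of both halves of that claim, using made-up order data and an in-memory SQLite table as a stand-in for the reporting database. The names (`order_total`, `orders`, `REPORT_QUERY`) are purely illustrative.

```python
import json
import sqlite3


def order_total(payload: str) -> float:
    """An API consumer that only reads the keys it knows about."""
    order = json.loads(payload)
    return order["quantity"] * order["unit_price"]


# The v2 payload adds a key; it does not rename or remove one.
v1 = '{"quantity": 2, "unit_price": 5.0}'
v2 = '{"quantity": 2, "unit_price": 5.0, "currency": "EUR"}'

# Same idea for a table: the report names its columns explicitly.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders "
    "(id INTEGER PRIMARY KEY, quantity INTEGER, unit_price REAL)"
)
db.execute("INSERT INTO orders (quantity, unit_price) VALUES (2, 5.0)")

REPORT_QUERY = "SELECT SUM(quantity * unit_price) FROM orders"

before = db.execute(REPORT_QUERY).fetchone()[0]
db.execute("ALTER TABLE orders ADD COLUMN currency TEXT")  # additive change
after = db.execute(REPORT_QUERY).fetchone()[0]
```

Both the consumer and the report give the same answer before and after the addition. The caveat is that this holds for queries that name their columns; a report built on `SELECT *` with positional access can still break when a column is added.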
Isn't that a weakness of any data dependency? I could push a new service that deprecates a field and supplies a replacement. I still have to communicate it and get downstream consumers to migrate. Or is the problem that reporting teams are looking at the low-level implementation details, rather than some stable higher-level interface? (I don't know how to avoid that when you're just pulling someone else's database directly or indirectly.)
No one has solved that problem and it sucks, and what ends up happening is you end up again porting that data from those disparate SQL and NoSQL databases either to a warehouse which is RDBMS or you put it into a datalake. That's again possible if you somehow manage to find all the drivers. You're doubly screwed if you have a hybrid - cloud and on-prem setup.
The high latency between operational data and that data being reflected in reporting using traditional data warehouse pipelines makes it difficult for companies to make effective business decisions in a fast-paced business environment. Even in competent execution, that latency is frequently measured in weeks for big businesses. In the last few years, I've been approached by a number of traditional big businesses looking to rearchitect their database systems to make them more monolithic for the express purpose of reducing end-to-end latency in support of operational decisions.
It is extremely expensive and slow to move all the data to a data warehouse. Ignoring the cost element, the latency from when data shows up in the operational environment to when it is reflected in the output of the data warehouse is often unacceptably high for many business use cases. A large percentage of that latency is attributable solely to what is essentially complex data motion between systems.
It is not necessarily the case that data warehouses have high update latency. Open source streaming tech (e.g. Apache Kafka, Beam, etcetera) can be used to build an OLAP database updated in near real-time.
Sure, I've done this many times myself. The big caveat is that this only works if your data/update velocity is relatively low. I've seen many operational data models where the data warehouse would never be able to keep up with the operational data flow. Due to the rapidly increasing data intensity of business, there is a clear trend where this latter situation will eventually become the norm. I've already seen instances at multiple companies where Kafka-based data warehouse pipelines are being replaced with monolith architectures because the velocity of the operational system is too high.
For it to work, online OLAP write throughput needs to scale like operational databases. This is not the case in practice, so operational databases can scale to a point where the OLAP system can't keep up. The technical solution is to scale the operational system to absorb the extra workload created by the data warehousing applications, but current database architectures are not really designed for it so it isn't trivial to do.
Our data is near real time. But I can see how latency can be an issue depending on the size of the data and the transformations needed before it hits the DW.
But there are solutions, Kafka being one.
This is what Kafka is for. You put Kafka on top of your database to expose data and events. Now BI can take the events put them into their system as they want.
> You put Kafka on top of your database to expose data and events. Now BI can take the events put them into their system as they want.
Right, that's the data warehouse method that I described. "Put them into their system" is a lot harder work than just typing in "put Kafka on top of your database."
Things like Maxwell[0] help a lot with the getting stuff into Kafka from the DB, but I agree with your point entirely. Kafka is not something I’d recommend unless you’ve got a team dedicated to maintaining it. Thankfully Confluent is rising up to fill that need and reasonably priced. Confluent also has an offering for streaming binlogs into Kafka too.
Worth pointing out that running Kafka is itself no small thing. So now you've added BI and also Kafka, which itself requires a bundle of services to operate. And you have to keep BI and Kafka in sync with all your various other data stores' schemas, and with each other.
Shameless plug: feeding data from operational databases to DWH is one main use case of change data capture as e.g. implemented by Debezium (https://debezium.io/) for MySQL, Postgres, MongoDB, and more, using Apache Kafka as the messaging layer.
Data Warehouses are (usually) not optimized for individual inserts, updates, deletes (DMLs). Loading data into DWHs is usually done through copy/ingestion commands where aggregated/batched data is copied from blob storage (s3, azure, etc...) to a staging table in the DWH.
In a non-append-only scenario, Debezium tracks each source operation (insert, update, delete) from the replication log (oplog, binlog, etc.) as an individual operation that's emitted into a Kafka topic. How does one replicate this to a data warehouse efficiently?
I have not been able to use Debezium as a way to replicate to a data warehouse for this very reason. At least not without having to resort to very complicated data warehouse merge strategies.
Note, there exist Data Warehouses that allow tables to be created in either OLAP or OLTP flavor. I understand that Debezium could easily replicate to an OLTP staging table. But are there any solutions if this isn't an option?
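For what it's worth, one way the merge can be kept manageable is to compact each micro-batch down to one final state per primary key before bulk loading it. This is a sketch under assumptions, not a Debezium feature: the event shape is a simplified version of the change-event envelope (`op` of c/u/d/r plus `before`/`after` row images), the `compact` helper and the `id` key are made up, and primary keys are assumed never to change.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def compact(
    events: Iterable[dict], key: str = "id"
) -> Tuple[List[dict], List[Any]]:
    """Collapse an ordered batch of change events to one outcome per key.

    Returns (rows to upsert, keys to delete). Later events win.
    """
    latest: Dict[Any, Optional[dict]] = {}
    for event in events:
        if event["op"] == "d":
            # Deletes carry the old row image in "before".
            latest[event["before"][key]] = None
        else:
            # Create ("c"), update ("u") and snapshot read ("r")
            # carry the new row image in "after".
            latest[event["after"][key]] = event["after"]
    upserts = [row for row in latest.values() if row is not None]
    deletes = [k for k, row in latest.items() if row is None]
    return upserts, deletes
```

The upserts can then go to blob storage and be loaded into a staging table with a single copy, followed by one merge and one delete per batch instead of one DML statement per source operation. Whether that counts as a "very complicated merge strategy" depends on the warehouse.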
Without tone of voice over text, unsure if the suggestion is in sincerity or sarcasm.
The article talks about micro services being split up due to fad as opposed to deliberate, researched reasons. Putting Kafka over the database also makes the data distributed when in most cases, it’s not necessary!
This is a solved problem, "Data Engineering" teams solve this by building a data pipeline. It's not for all orgs, but for a large org, this is worth doing right.
"data pipeline" is just the new trendy phrase for ETL, which the GP mentioned. just because a solution exists, does not make it a solved problem. it's not the right solution for everyone
"Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4]."
Then it doesn't seem to be solved. Seems like teams operating at a lean scale would have an issue with this, especially teams with lopsided business:engineering ratios
The remark wasn't that there weren't solutions, but that it is still a problem that needs a solution. Storing all data within a single database is a much simpler way to get started if you want to run analytic queries. You spin up a read slave that is tuned for expensive analytic queries, rather than OLTP ones.
I don't think you're right back to a monolith with centralized reporting. Remember, microservices doesn't mean JSON-RPC over HTTP. Passing updates extracted via change data capture and forwarding them to another reporting system is a perfectly viable interface. Data duplication is also an acceptable consequence in this design.
> Passing updates extracted via change data capture and forwarding them to another reporting system is a perfectly viable interface.
Right, that's the data warehouse method I described, keeping a central database in a reporting system. But now you just have to keep that database schema stable, because folks are going to write reports against it. It's a monolith for reporting, and its schema is going to be affected by changes in upstream systems. It's not like source systems can just totally refactor tables without considering how the data downstream is going to be affected. When Ms. CEO's report breaks, bad things happen.
Sorry, I noticed you made the same observation in another thread after you got dogpiled by everyone; I left my question over there. Yeah, I've had to contend with that problem before -- hard to imagine how I forgot about it :)
You can also make your own connectors that make your services appear as tables, which you can query with SQL in the normal way.
So if the new accounts micro-service doesn't have a database, or the team won't let your analysts access the database behind it, you can always go in through the front-door e.g. the rest/graphql/grpc/thrift/buzzword api it exposes, and treat it as just another table!
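A rough illustration of the "treat the API as just another table" idea, with loud caveats: this is not how a Presto connector is written. It is a toy stand-in that copies API responses into an in-memory SQLite database, where a real connector would fetch rows on demand. The `fetch_accounts` and `fetch_invoices` functions and their data are hypothetical placeholders for calls to real services.

```python
import sqlite3


def fetch_accounts():
    """Stand-in for a call to the accounts service API (hypothetical data)."""
    return [
        {"account_id": 1, "plan": "free"},
        {"account_id": 2, "plan": "pro"},
    ]


def fetch_invoices():
    """Stand-in for a call to the billing service API (hypothetical data)."""
    return [
        {"invoice_id": 10, "account_id": 2, "amount_cents": 4900},
        {"invoice_id": 11, "account_id": 2, "amount_cents": 4900},
        {"invoice_id": 12, "account_id": 1, "amount_cents": 0},
    ]


def load_table(conn, name, rows):
    """Materialize a list of same-shaped dicts as a SQL table."""
    columns = list(rows[0].keys())
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")
load_table(conn, "accounts", fetch_accounts())
load_table(conn, "invoices", fetch_invoices())

# Once both services look like tables, the analyst writes an ordinary join.
revenue_by_plan = conn.execute(
    """
    SELECT a.plan, SUM(i.amount_cents)
    FROM accounts AS a
    JOIN invoices AS i ON i.account_id = a.account_id
    GROUP BY a.plan
    ORDER BY a.plan
    """
).fetchall()
```

The point is only the shape of the result: two services, one SQL query. The cost of pulling the rows over the network is hidden here and would not be in practice.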
Presto is great even for monoliths ;) Rah rah presto.
It's been about 4 years since I've been in this world, but I remember there being several products all doing a very similar thing: Presto, Hive, SparkSQL, Impala, perhaps some more I'm forgetting. Is the situation still the same? Or has Presto "won out" in any sense?
Presto and SparkSQL are SQL interfaces to many different datasources, including Hive and Impala, but also any SQL database such as Postgres, and many other types of databases, such as Cassandra and Redis; the SQL tools can query all these different types of databases with a unified SQL interface, and even do joins across them.
The difference between Presto and SparkSQL is that Presto is run on a multi-tenant cluster with automatic resource allocation. SparkSQL jobs tend to have to be allocated with a specific resource allocation ahead of time. This makes Presto (in my experience) a little more user-friendly. On the other hand, SparkSQL has better support for writing data to different datasources, whereas Presto pretty much only supports collecting results from a client or writing data into Hive.
I know Hive can definitely query other datasources like traditional SQL databases, redis, cassandra, hbase, elasticsearch, etc, etc. I thought Impala had some bit of support for this as well, though I'm less familiar with it.
And SparkSQL can be run on a multi-tenant cluster with automatic resource allocation - Mesos, YARN, or Kubernetes.
Presto can't deal with ElasticSearch. It can query basic data, but it's not optimized to translate SQL into native ES queries (ES's own SQL query support is for paying customers).
Where I work we use micro-services through lambda (we have dozens of them) and use DynamoDB for our tables. DynamoDB streams are piped through elasticsearch. We use it for our business intelligence. Took us about a week to set up proper replication and sharding. I don't have a strong opinion on monolith or micro-service: pick one or the other, understand its pitfalls, and write high quality (aka simple and maintainable) code.
We used to do that, but dynamodb being very dynamic (which we like) makes it harder to use a declarative schema. ES auto-detects the schema, which makes it a breeze. I’d say try with JSON fields, but I don’t think you’ll get great query speed.
But I’d argue Monoliths don’t have anything inherent to them which makes reporting easier. A proper BI setup requires a lot of hard work no matter how the backend services are built.
> But I’d argue Monoliths don’t have anything inherent to them which makes reporting easier.
It's easier to join tables in databases that live on a single server, in a single database platform, than it is to connect to lots of different data sources that live in different servers, possibly even in different locations (like cloud vs on-premises, and hybrids.)
Whether or not your system intentionally ended up with this architecture, GraphQL provides a unified way of fetching the data. You'll still have to implement the details of how to fetch from the various services, but it gives people who just want read access to everything a clean abstraction for doing so.
I am not sure about that. Both RESTful APIs and GraphQL APIs need to be designed, and flexibility is basically a measure of how good your API design is. I don't see how GraphQL is intrinsically more flexible.
It's nice to have a clean abstraction, but if the various services are on tables in different databases, things like joins get much more expensive, no? Performance is important.
There are many business environments where copying the data is effectively intractable so you need to run all of your data model operations in a single instance. More data models have this property with each passing year, largely because copying data is very slow and very expensive.
This is not a big deal if your system is designed for it; sophisticated database engines have good control mechanisms for ensuring that heavy reporting or analytical jobs minimally impact concurrent operational workload.
> Surely you wouldn't run analytics directly on your prod serving database, and risk a bad query taking down your whole system?
Uhh, yep, that's exactly how a lot of businesses work.
There are defenses at the database layer. For example, in Microsoft SQL Server, we've got Resource Governor which lets you cap how much CPU/memory/etc that any one department or user can get.
That's not a good idea. The usage patterns for a production database and one that runs reporting are very different. Reporting has long running queries with complex joins, production has many parallel short queries. If you start mixing the two, you can no longer reliably tune your database. For example, you would want the alert for slow queries set to a different timeout on production and on reporting.
Also, you complicate locking down access to the database. A reporting database can typically contain less sensitive info, reporting would not have password (hashes) for user accounts for example.
I don't think Brent was necessarily saying it was a good idea, just something that is commonly seen. At my own company we try hard to get customers to run reports against replicas/extracts rather than production, but some customers insist that they absolutely _need_ to pull in live data from production. So they run complex reports against a production OLTP schema and wonder why the system is running slow...
> Reading other comments Brent’s made here, I’m not so sure.
No no, it's not a good idea, but it's just the reality that I usually have to deal with. I wish I could lay down rules like "no analyst ever gets the rights to query the database directly," but all it takes is one analyst to be buddy-buddy with the company owner, and produce a few reports that have really high business value, and then next thing you know, that analyst has sysadmin rights to every database, and anybody who tries to be a barrier to that analyst's success is "slowing the business down."
Pretty trivial to set up read replicas using log shipping in MSSQL and postgresql to offload analytical loads to secondary servers.
I work on a monolith that does this, but it's usually not even necessary; a single db server on modern hardware with proper resource governing can handle quite a bit.
> Uhh, yep, that's exactly how a lot of businesses work.
Just because a lot of businesses do it, doesn’t mean it’s a good idea. A lot of businesses don’t do source control of any kind, so should we all do that too?
is that a bad thing? I don't think anyone is against a reporting monolith. to me, being able to silo data in the appropriate stores for their read / write patterns, and still query it all in a single columnar lake seems like a feature, not a bug to be solved
> now they're stumped as to how they're supposed to quickly get data out
I'd argue that (given a large enough business) "reporting" ought to be its own software unit (code, database, etc.) which is responsible for taking in data-flows from other services and storing them into whatever form happens to be best for the needs of report-runners. Rather than wandering auditor-sysadmins, they're mostly customers of another system.
When it comes to a complicated ecosystem of many idiosyncratic services, this article may be handy: "The Log: What every software engineer should know about real-time data's unifying abstraction"
Reporting on databases is a rather 90's thing to do. Why would you still actively pursue such a reporting method from the OLTP/OLAP world? If you use tools that are purposely made to work in such a way, an analyst using those tools will obviously not be able to utilise them in an incompatible environment.
Or you could have CQRS projectors (read models), which solve exactly this - they aggregate data from lots of different eventually consistent sources, providing you with locally consistent view only of events you might be interested in.
It will lag behind by some extent, roughly equal to the processing delay + double network delay, but can include arbitrary things that are part of your event model.
Though, it's not a silver bullet (distributed constraints are still a pain in the ass), and if the system wasn't designed as a DDD/CQRS system from the ground up, it would be hard to migrate it, especially because you can't make small steps toward it.
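For anyone who hasn't seen a projector, here is a minimal sketch. The event names (`OrderPlaced`, `OrderRefunded`, `CustomerRenamed`), the event dict layout, and the `OrderTotalsProjector` class are all hypothetical, and at-least-once delivery is an assumption. It just shows a read model being folded from a mixed event stream while ignoring events it doesn't care about.

```python
from collections import defaultdict


class OrderTotalsProjector:
    """Folds events from several services into one read model."""

    def __init__(self):
        self.totals = defaultdict(int)   # customer_id -> net cents spent
        self.seen = set()                # event ids already projected

    def handle(self, event):
        # At-least-once delivery is assumed, so duplicates are ignored.
        if event["id"] in self.seen:
            return
        self.seen.add(event["id"])
        # Only react to the events this view is interested in.
        if event["type"] == "OrderPlaced":
            self.totals[event["customer_id"]] += event["amount_cents"]
        elif event["type"] == "OrderRefunded":
            self.totals[event["customer_id"]] -= event["amount_cents"]


stream = [
    {"id": "e1", "type": "OrderPlaced", "customer_id": "c1", "amount_cents": 500},
    {"id": "e2", "type": "CustomerRenamed", "customer_id": "c1"},
    {"id": "e3", "type": "OrderPlaced", "customer_id": "c2", "amount_cents": 300},
    {"id": "e1", "type": "OrderPlaced", "customer_id": "c1", "amount_cents": 500},  # duplicate
    {"id": "e4", "type": "OrderRefunded", "customer_id": "c1", "amount_cents": 200},
]

projector = OrderTotalsProjector()
for event in stream:
    projector.handle(event)
```

The view is locally consistent for the events it has processed, and it lags the source services by however long delivery takes.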
Yes, but data lakes don't fill themselves. Each team has to be responsible for exporting every transaction to the lake, either in real time or delayed, and then the reporting systems have to be able to combine the different sources/formats. If each microservice team expects to be able to change their formats in the data lake willy-nilly, bam, there breaks the report again.
This problem exists in monolithic data stores and codebases too. It's not as if independent teams have absolute sovereignty over their table schemas.
Schemas evolve as the needs of the product change, and that evolution will always outpace the way the business looks at the data.
The best way I've seen to deal with this is to handle this at report query-time (e.g. pick a platform that can effectively handle the necessary transformations at query-time, rather than at load-time).
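A small sketch of what handling it at query-time can look like, assuming records carry a `schema_version` field (an assumption; the field names and both record shapes below are made up for illustration). Old and new records stay stored as they were written, and the report normalizes them when it reads.

```python
def normalize(record):
    """Map any known record version onto the shape the report expects."""
    version = record.get("schema_version", 1)
    if version == 1:
        # Hypothetical v1: a single "name" field and dollars as a float.
        return {
            "customer": record["name"],
            "amount_cents": round(record["amount"] * 100),
        }
    if version == 2:
        # Hypothetical v2: split name fields and integer cents.
        full_name = f'{record["first_name"]} {record["last_name"]}'
        return {"customer": full_name, "amount_cents": record["amount_cents"]}
    raise ValueError(f"unsupported schema_version: {version}")


raw_records = [
    {"name": "Ann Lee", "amount": 12.5},
    {"schema_version": 2, "first_name": "Bob", "last_name": "Ray", "amount_cents": 700},
]

report_rows = [normalize(r) for r in raw_records]
total_cents = sum(row["amount_cents"] for row in report_rows)
```

The trade-off is that every report pays the transformation cost on each read, and an unknown version fails loudly instead of silently producing a wrong number.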
You can't entirely escape this problem. Even when companies want to fit everything in a single database, they often find they can't. With enough data you'll eventually run into scalability limitations.
A quick fix might be to split different customers onto different databases, which doesn't require too many changes to the app. But now you're stuck building tools to pull from different databases to generate reports, even though you have a monolithic code base.
I've seen that BigQuery has federated querying, so you can build queries on data in BigQuery, BigTable, Cloud SQL, Cloud Storage (Avro, Parquet, ORC, JSON and CSV formats) and Google Drive (CSV, newline-delimited JSON, Avro or Google Sheets).
Even if you have a monolith, you’re still going to have multiple sources that you want to report on. Even in an incredibly simple monolith I could imagine you’d have: your app data, Salesforce, Google Analytics. Having an ELT > data warehouse pipeline isn’t difficult, and what reporting use case is undermined by the data being a few minutes old?
> Reporting on a bunch of different databases is a hard nut to crack.
Maybe, but your business analyst already needs to connect to N other databases/data-sources anyway (marketing data, web analytics, salesforce, etc, etc), so you already need the infrastructure to connect to N data sources. N+1 isn't much worse.
This is a problem but I’m not sure having everything in a single data store is a great idea either. Generally you want your analytics separate from your operations anyway. We do this by having a central ES instance which just informs on the data it needs, which has worked perfectly fine for our needs.
I thought that's what solutions like Calcite[1] were for: running queries across disparate data sources. Yes, you still need to have adapters for each source to normalize access semantics. No, not all sources will have the same schema. But if you're trying to combine Postgres and DynamoDB into a single query, you would narrow your view to something that exists in both places, e.g. customer keys, meta data, etc.
It seems like interacting with customers and enforcing business rules is one job, and observing what's happening is a different concern. Observing means collecting a lot of logs to a reporting database.
My employer adopted microservices for a very specific reason: it became nearly impossible to deploy the monolith. With hundreds of commits trying to go out every day, probability that at least one would break something approached 1. Then everything had to be rolled back. Getting unrelated concerns into separate deployable artifacts rescued our velocity.
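The arithmetic behind "approached 1" is easy to check. This sketch assumes each commit breaks the deploy independently with the same probability, which is a simplification, and the 1% rate and 300 commits are made-up numbers rather than figures from the comment above.

```python
def p_any_breakage(per_commit_failure_rate, commits):
    """Probability that at least one of `commits` independent commits breaks."""
    return 1 - (1 - per_commit_failure_rate) ** commits


# Hypothetical numbers: a 1% chance per commit, 300 commits in one deploy.
risk = p_any_breakage(0.01, 300)
```

Under those assumptions the batch fails roughly 95% of the time, even though each individual commit is 99% safe.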
It came with many of its own challenges, too! A great deal of infrastructure had to be built to get from O(N) to O(1) infrastructure engineering effort per service. But we did build it, and now it works great.
There is a reason monoliths were traditionally coupled with quarterly or even annual releases gated by extensive QA.
The solution to this is just writing modular code and using an artifact repository. It's a model I've rarely seen attempted even though it's much easier than microservices and serves the same purpose.
You can have individual dev teams, with their own repo, backlogs, own stakeholders, etc. all working at their own paces. They build modules (jars, nuget packages, npm modules) and deploy semver versioned artifacts to a repo like Nexus or JFrog. Any frontend/consumer applications can build towards versions of those modules and upgrade on their own schedule. Only the consumers need to worry about deployment.
This gives you the organizational flexibility but not the infrastructure overhead.
The discriminating factor that makes microservices necessary is if these individual services have divergent hardware needs.
When developers complain about Java being verbose and not a great language for coders, I counter that it doesn't exist to solve programming problems, but rather organizational problems. The killer feature that launched Java wasn't crap like checked exceptions, it was javadoc. Strict, self-documenting APIs are 10X more valuable than any intrinsic language feature.
i dont care about it not being great. its good enough as a language. but maven... but websphere... java is a hellish platform that ordinarily would not win against interpreted languages or those which focus on fast compilation. But it runs literally everywhere, including your toaster, but more importantly on mainframes which also run the real mvp aka Cobol. write once, run anywhere remains a killer feature no other platform has replicated as well.
Interesting. Ie, just the way the larger community (ie open source) ecosystem works, but just inside your company instead.
That makes an obvious kind of sense, when your number of contributors has grown too large to operate like a monolith... why not operate using models well-established for very large inter-entity communities?
I wonder why more very large companies don't do this, if they don't.
I work at a very large company where each team builds and deploys its own production artifacts, but we always build everything off of head rather than choosing versions of each dependency and upgrading on our own schedule. The choose-your-own-adventure approach seems like it might be nice until you write (e.g) a critical security patch and you have to go nag N teams to upgrade their version of your library. With our system it goes out in everyone's daily or weekly release.
The system only works because of very good integration testing infrastructure and a culture of being able to roll back almost any change that broke you.
If for some reason the more traditional versioned release dependency approach made more sense for other reasons (as the GP suggested they were using it) -- it shouldn't be too hard to write automated tooling to go tell everyone to upgrade for a security release, or even make PR's dependabot-style; if an org already has "very good integration testing infrastructure", adding that tooling for security updates of dependencies is perhaps within the capacities.
My previous company went this direction. I wouldn't recommend it.
Say you version each module and pull in specified versions. It'll work fine, right up until two modules both try to pull different versions of a third module. In practice, you have to update multiple modules at once to avoid conflicts, which, in turn, can require updating other team's code.
It also turns out some tools like Maven don't prevent conflicts by default. You can end up exploring pom.xml files in Eclipse, trying to add exclusions, or figure out which repo is dragging them in.
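The conflict is simple to detect mechanically if you have the pins in one place. A toy sketch follows; the module names, versions, and the `manifests` layout are all hypothetical, and this is not how Maven or Gradle resolve dependencies. It only flags any dependency that different modules pin to different versions.

```python
from collections import defaultdict

# Hypothetical module manifests: module -> {dependency: pinned version}
manifests = {
    "checkout": {"auth-client": "2.1.0", "logging": "1.4.0"},
    "search": {"auth-client": "1.9.3", "logging": "1.4.0"},
    "profile": {"logging": "1.4.0"},
}


def find_conflicts(manifests):
    """Return dependencies that are pinned to more than one version."""
    requested = defaultdict(dict)   # dependency -> {version: [modules]}
    for module, deps in manifests.items():
        for dep, version in deps.items():
            requested[dep].setdefault(version, []).append(module)
    return {
        dep: versions
        for dep, versions in requested.items()
        if len(versions) > 1
    }


conflicts = find_conflicts(manifests)
```

Running a check like this in CI at least tells you which teams have to move together before the build breaks.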
Yeah, it sure seems like Maven, by design, tries to avoid dependency locking and version resolution configuration, but in turn, becomes kind of a bear to manage once you get fairly large.
I've switched to Gradle, and one of the first things I do is usually flip on dependency locking, and then even go so far as to reject and/or flag anything that doesn't have a clear semantic version scheme.
Gradle's documentation can be a little overwhelming, but there's a lot more to these topics that developers usually overlook:
Many companies would benefit from real dependency locking, and making sure they have reproducible builds. It's tricky, but, it can be a lot easier than "containerization", which I've often heard touted as a solution. (Containers are useful, but you should fix your CI separately.)
Also, the way I see containerization being implemented (rebuilding from just a Dockerfile, instead of deploying the same image everywhere) essentially kills the reproducibility part.
That's what semver is for, right?
Breaking changes go in a major, same major means that the latest is always compatible.
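Spelled out as code, the rule is roughly the following. This is a minimal sketch that assumes plain `MAJOR.MINOR.PATCH` strings with no pre-release or build suffixes, and the function names are made up for illustration.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_drop_in_upgrade(current, candidate):
    """True if candidate is newer than current and shares its major version."""
    cur, cand = parse(current), parse(candidate)
    return cand[0] == cur[0] and cand > cur
```

So `is_drop_in_upgrade("1.1.0", "1.5.1")` holds while `is_drop_in_upgrade("1.5.1", "2.0.0")` does not. Note this only encodes the promise; whether a library actually keeps it is a separate question, and semver makes no compatibility promise at all for 0.x versions.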
You'll have the same issue with microservices if you introduce breaking changes.
That would have let you know when to expect the impact, but not eliminate the impact itself.
Occasionally people versioned their module's APIs, which seemed like a cleaner way to handle the module update, as you don't have to update everything at once. They only went to that effort when they realized they'd have to update thousands of callers.
Yeah, but even with semver library owners have problems. If you need to make a critical change (e.g. a security update), you can either (a) wait until everyone does a version bump on their own schedule, or (b) do the version bump yourself, which means you're deploying every app that relies on your library. (a) might not be feasible for critical things, and (b) means that you might be deploying changes that the app isn't ready for yet.
For example, if master is v1.5.0, and I'm an app that uses v1.1.0, then if the library owner bumps to 1.5.1 for a critical update, I need to go from 1.1.0 -> 1.5.1, which might involve changes I'm not ready for yet. I better have phenomenal integration testing to make sure the update is safe to do.
> For example, if master is v1.5.0, and I'm an app that uses v1.1.0, then if the library owner bumps to 1.5.1 for a critical update, I need to go from 1.1.0 -> 1.5.1, which might involve changes I'm not ready for yet. I better have phenomenal integration testing to make sure the update is safe to do.
That could be solved by backporting security fixes to version 1.1.
Of course at a certain point you should deprecate older versions of your package. At which point it's the client's responsibility to upgrade (like for any third party library they would be using).
My company does this in a sense. We build a (mostly) open-source eCommerce platform with almost 100 public repos and a lot more proprietary ones for larger retailers. It's not "microservices", but it's definitely a lot of packages to manage, and we have frequently had trouble keeping them all organized and in sync with the latest changes to the core platform. The core platform is designed internally with a modular, domain-driven architecture, and the applications made with the platforms are monoliths, but the features of the platform are distributed amongst a couple hundred RubyGems. It's a lot to keep track of, but it's still a great compromise between a modular, plug-and-play architecture, and a monolithic application with all those features installed at once (which would be really hard to maintain). We're looking for ways to reduce our workload in that sense, because our team is rather small (about 6 people) and as I was saying, it's a little difficult to manage all those repos. So we've been thinking about keeping the modular architecture but keeping everything in a monorepo, so it's easier to release new versions and figure out what's changed (or what needs to change).
Yep, that approach works best. I think microservices can still be introduced for things that are quite separate and don't require internal communication between components (the biggest thing that people underestimate when going with microservices is that communication between services is not an easy problem). Although not every business will have such situations.
If you have the problem for which microservices are the better solution (as you describe in your example), great!
But I have seen people try to build a solution using microservices "because this is the way to go". But this often trades the problems of commits and unit verification for problems of architecture. It can take a lot of work and skill to design a robust architecture using microservices and architectural bugs and failures can be a lot harder to debug, understand and resolve. Pick your poison.
> With hundreds of commits trying to go out every day, probability that at least one would break something approached 1. Then everything had to be rolled back. Getting unrelated concerns into separate deployable artifacts rescued our velocity.
Software tends to reflect the structure of the organization that creates it, so this makes sense to me. If you have multiple teams contributing to a stack, eventually it is easier to have the teams work on their own (micro-)service(s).
I recommend to most people that stacks should start out as monoliths though, and move to microservice architectures only when they encounter enough pain. I think starting out with microservices from the get-go just reduces initial velocity with little payoff until you hit a certain scale.
I think you are missing the intermediate step of libraries.
1. monolith
2. libraries
3. services
If you skip step 2, there is a high probability that the services you end up with are going to be just as disorganized as the monolith which is causing grief.
yup, agreed. I generally would consider your step2 as part of the lifecycle of a well architected monolith, but calling it out as a separate step is certainly clearer, and does reflect more of what I have seen "work well" in the real world too.
This is the right reason why an organization would want to adopt microservices, in my opinion. If you have hundreds of commits going out every day, you're probably dealing with a relatively large team, yes? Microservices makes sense for large companies, because it allows for organization of developers into teams that manage each microservice rather than having them all under the umbrella of the same application. But if your team was smaller, say, 6 people...managing 20 different services becomes a real chore and adds a lot of overhead. In this case, a monolithic application is a way better choice, because it allows all the developers to work together on the same codebase instead of spread out and distant amongst a bunch of different applications with varying degrees of quality.
My company is currently moving to microservices, but for different reasons.
The problem you raise was in fact fixed years ago in our org simply by properly decoupling our "monolithic" app into modules. The top-level build was simply a collection of all pre-built modules. After each dependency update, an automated regression would run, and if pass rate was less than X%, the change was not pushed upstream.
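The gate itself is tiny. Here is a sketch of the idea, where the test names, the result format, and the thresholds are hypothetical (the comment above only says "X%"), and treating an empty test run as a failure is my own assumption.

```python
def should_promote(results, min_pass_rate):
    """Decide whether a dependency update may be pushed upstream."""
    if not results:
        return False   # no evidence, so do not promote
    passed = sum(1 for ok in results.values() if ok)
    return passed / len(results) >= min_pass_rate


# Hypothetical regression run: test name -> passed?
results = {
    "test_login": True,
    "test_checkout": True,
    "test_search": False,
    "test_export": True,
}

promote_lenient = should_promote(results, 0.75)   # 3 of 4 passed, meets 75%
promote_strict = should_promote(results, 0.90)    # 75% is below 90%
```

The hard part is not this function but having a regression suite trustworthy enough that the pass rate means something.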
Really, microservices give you 2 things better than that: 1, the ability to combine more technologies; 2, much simpler (horizontal) scalability, since properly done microservices naturally scale well with multiple copies running on multiple machines. Of course, the costs are there, in terms of more debugging difficulties, more complex logging needs and usually a higher minimum overhead.
I've horizontally scaled plenty of monoliths. Tends to be easier. Less to monitor. You don't have to worry about "what" you're scaling. You just scale it all. Just throw another server on the pile, it reduces resources used by everything on the existing servers.
If you have a process that takes up a lot of a particular resource that others don't (e.g. disk throughput), maybe spin that into its own service. But in my experience, lots of things are just fine being lumped together and don't really exhibit resource use profiles that are all that different from each other.
Modules help some, but being tied to a deployment cadence with say, hundreds of other developers, can be very painful. Having to delay my team's features because another team broke the build doesn't help my customers. Isolating CI pipelines by team can help that somewhat, but you still end up sharing fate with teams with whom you might have no real interdependencies.
I think this is pretty much it. You can have one microservice per team. Then you can do whatever you want in terms of tools, languages, deployment workflows, etc. As long as you honor your API contracts.
I guess this also depends a lot on the type of product you build. For me, we are building a piece of software that we ship to customers, so the release cadence is limited by many factors outside engineering anyway.
But I can see how on a live service-type product, microservices would help a lot more on this front.
Feature toggles are a solution to this. That lets you do feature releases independent of code deployment. Doesn’t work 100% of the time but I can’t imagine doing a large monolith without feature toggles.
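For anyone unfamiliar, a bare-bones sketch of the pattern. The `FeatureToggles` class, the `bulk-discount` flag, and the pricing rule are all invented for illustration; a real setup would load flags from config or a flag service instead of memory.

```python
class FeatureToggles:
    """In-memory toggle store, standing in for config or a flag service."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_enabled(self, name):
        return name in self._enabled

    def enable(self, name):
        self._enabled.add(name)


def checkout_total(cart_cents, toggles):
    # The new pricing code ships with the deploy but stays dark
    # until someone flips the toggle.
    if toggles.is_enabled("bulk-discount") and cart_cents >= 10_000:
        return cart_cents - cart_cents // 10
    return cart_cents


toggles = FeatureToggles()
before = checkout_total(20_000, toggles)   # feature deployed, not released
toggles.enable("bulk-discount")
after = checkout_total(20_000, toggles)    # released without a new deploy
```

The same code is running before and after; only the flag changed, which is what decouples the release from the deployment.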
Our company migrated for very similar reasons - I'm surprised that people aren't mentioning this pain point more. Maybe the overall execution of microservices is poor, so people generally would rather go back to the old thing.
We deployed to prod 30x a day at Coinbase, precisely for this reason.
It worked great and I was honestly shocked when I worked for other companies that plan their releases like they're shipping shrink-wrapped CDs in 1999.
I've found it to be much harder to manage and deploy multiple projects than it was to deploy a WAR. It's also been a huge PITA to test the newer architecture because all these serverless artifacts need to be deployed together. What we have now is a distributed monolith, just like the article refers to.
Wouldn't continuous deployment and rolling forward be better than switching your entire application architecture?
In this scenario, you deploy ~30 times a day. If there's a bad commit, you revert it, and then do another deploy. So there's no rollbacks, you don't lose as much velocity, and since deploys are safe and a reverted commit is just a new commit, the revert is safe too.
It took at least 4 hours to deploy if everything went well; not sure if that was policy or an infrastructure limitation, but in any case more than 2 deploys per workday would have been impossible.
But how big are your microservices? How many did you build? A monolith that has hundreds of commits is well worth separating. But do you break it into 10, 100, or 1000 microservices? We have a case where we have 20 people managing 40 microservices. Way too granular. In the general literature there is no good consensus on finding boundaries and rightsizing. Many authors just duck the question.
Have a solid, standardized base/framework layer for your services, and a way to roll out changes to it that doesn’t involve manual work for each service. Migrations are the devil. You will have an extraordinarily difficult time evolving cross-cutting concerns like auth, tracing, metrics, profiling, health checking, RPC encoding, discovery and routing, secrets, etc. without it.
And definitely don’t start migrating stuff that matters until you have mature implementations and operations around all those things, and probably many others too.
I've been a software engineer for over 30 years and have dealt with companies always trying to jump on the next bandwagon. One company I worked with tried to move our entire monolith application, which was well architected and worked fine, over to a microservices-based architecture and the result was an unstable, complex mess.
Sometimes, if it's not broke, don't try to "fix" it.
I can say the same regarding a lot of what is going on in the JavaScript ecosystem, where people are trying to replicate stuff that works fine in other languages in JavaScript. Mostly because they are only familiar with JavaScript and don't realize this stuff already exists and doesn't need to be in JavaScript.
Any language that gets into enterprise architect hands, with projects spread around multiple development sites with several consulting agencies, gets their FactoryFactories and such.
No, Java has its own share of problems that it didn't inherit from anywhere. No custom value types, no operator overloading and the distaste for AOT compilation, to name a few. Also culture of code generation instead of using some kind of macro system.
Custom value types will arrive, that is the point of project Valhalla.
C and Go also don't do operator overloading.
Code generation was a thing in C and C++ during the 90's. Borland C++ 1.0 came with a macro library (BIDS), which was later replaced by a new template based version in Borland C++ 2.0.
And the Go culture of //go:generate goes beyond anything that Java has had in its almost 30 years of existence.
AOT compilation has existed since the early 2000's. The only distaste is that most developers don't want to pay for their tools, so they used the free beer JDK from Sun instead of third party vendors. So only big corporations got to buy the JDKs from IBM, Oracle, ExcelsiorJET, Aonix,....
However now AOT free beer exists on OpenJDK, OpenJ9, GraalVM and although not strictly Java, Android.
I love my XML, with comments and static type validation, making my graphical tooling and IDE based editing quite comfortable, while observing YAML and JSON formats catching up on XML features.
I can't imagine your level of cynicism. I've only been at this for ten years, and the number of times I've seen the wheel come full circle and old ideas come back into vogue, the problems with them rediscovered, reactions to those problems, and then the thing that preceded them again take precedence is somewhat depressing. At best I feel like we are grinding ahead a few inches each cycle.
Agree with your last point. And as someone who really liked old school callback/prototype/closure based JavaScript, I can say not only would these people have been better off using a better language, they also ruined JavaScript for the things it was great at.
There are very real reasons not to use DOM methods assuming you're using a front end framework. Not only is the coding style more imperative than say React, but frontend frameworks have been designed to efficiently update the DOM and have diffing algorithms that can check for necessary state changes. Is there really a compelling reason for writing your own DOM manipulation in a sizeable frontend codebase in 2020?
> Is there really a compelling reason for writing your own DOM manipulation in a sizeable frontend codebase in 2020?
It is several orders of magnitude faster. It’s also how I prefer to code because managing state isn’t challenging and I am not hopelessly paralyzed by imperative code.
Maybe they don't want to force their users to download several MB of framework libraries in order to use their website.
DOM manipulation is perfectly fine for interacting with styled documents, the web's forté. If your website amounts to a set of configuration forms and a blog then you probably don't need React.
> people are trying to replicate stuff that works fine in other languages in JavaScript. Mostly because they are only familiar with JavaScript and don't realize this stuff already exists and doesn't need to be in JavaScript.
Do you have a concrete example to illustrate this, and what issues it causes?
On the surface I'm not sure I agree, if what you're saying is people wanting a certain feature should switch languages to get it rather than build it into the language they already use.
Well, indeed, Node is a good example of the point I'm making - it provided a full environment for writing programs in JS outside the browser.
There were a plethora of other languages/runtimes available for writing such programs, but Node made this available to people who already knew & liked using JS without having to switch to and/or learn a new language.
It isn't really, though. This is Some Guy's Opinion™. There are many Some Guys, and there are countless anti-monolith articles being penned at this moment (probably).
People have different experiences with different groups and different tech stacks and different needs. Results may vary.
Just to give my own Some Guy opinion, people fail with so-called microservices when it's not really microservices but instead is a monolith with artificial walls (in the same way that firms do waterfall but pretend that they're agile by having incredibly frequent "scrums" that are nothing but status meetings). When you actually divide into lots of different projects and teams and they each get to construct their own internal world so long as they provide the appropriate robust and documented external API, it can be absolutely liberating. For some projects.
Your next-to-last sentence is spot on, but it requires well-defined interfaces and engineering, which is orthogonal to middle management’s desire to commoditize development.
Sometimes a microservice architecture is the best way to fix your problems. Sometimes it’s the worst. But you’ll only be able to tell after your monolith is, indeed, truly good and busted.
Your architecture is similar to something that we want to get towards. How do you handle standing up all these pieces (or even a subset of these pieces) in the dev environment?
I'm tempted to write a blog post... I bristle a little when microservices are described as "best practice". Monolith vs microservice is really about _people_ and _organizations_. Monoliths make sense in some contexts and microservices in others, but the deciding factor is really the size of the team and number of people working on different functional contexts.
The best analogy I can come up with is that monoliths in larger organizations are like a manifestation of Amdahl's law. The overhead of communication and synchronization reduces your development throughput. Each additional person does not add one person's worth of throughput once you cross a critical headcount threshold (The Mythical Man-Month and all that).
I'm not describing this clearly so I should probably actually commit to writing out my thoughts on this in a post describing my experience with this.
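For reference, the Amdahl's law analogy above can be written out. This is the standard formula; mapping it onto teams (n as headcount, p as the share of work that needs no coordination) is an assumption made here for illustration, not something the comment spells out.

```latex
% Amdahl's law: speedup with n workers when a fraction p of the work parallelizes
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
% Example: p = 0.9 caps the speedup at 10, however many workers are added.
```

Read this way, the (1 - p) term is the communication and synchronization overhead, and it bounds throughput no matter how many people join.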
Spot on. The metaphor I typically use here is cleaning up a mess vs spreading it around. If you have a really big mess and spend a year or two rearranging it into dozens or hundreds of smaller messes, yes the big obvious mess is gone, but the overall amount of mess has likely gone up and by segregating everything you’ve probably made it much harder to someday get to a clean state.
If you’re moving to microservices because the number of people working on a project is growing too large to manage and you need independent teams, great. If you’re refactoring to microservices because “we’re going to do everything right this time,” this is just big-rewrite-in-disguise.
Whatever engineering quality improvements you’re trying to make—tech stack modernization, test automation, extracting common components, improved reliability, better encapsulation—you’re probably a lot better off picking one problem at a time and tackling it directly, measuring progress and adjusting course, rather than expecting a microservices rewrite to magically solve a bunch of these problems all at once.
>but the overall amount of mess has likely gone up
I don't think this is really the intuitive outcome. Think of cabinets, dressers, shelves etc. They're all basically little messes but they're much easier to deal with than one large mess.
Complexity (as in unintended/unexpected behaviour) varies with N^p where p > 1.0, so having N messes of 1/N size is a definite advantage and does make it easier to clean up the little messes.
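A rough worked version of that claim, assuming complexity grows as k M^p for a system of size M (the constant k and the symbol M are introduced here for illustration) and ignoring any cost of interaction between the pieces:

```latex
% One system of size M versus N independent pieces of size M/N, with p > 1
C_{\text{one}} = k M^{p},
\qquad
C_{\text{split}} = N \cdot k \left(\frac{M}{N}\right)^{p} = \frac{k M^{p}}{N^{\,p-1}}
% Example: p = 2, N = 10 gives C_split = C_one / 10.
```

The advantage only holds to the extent the pieces really are independent; coupling between them adds terms this sketch leaves out.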
It depends on what the messes are. Separating into different services adds significant overhead to addressing cross-cutting concerns.
If the modules of your system are already relatively independent with well-defined interfaces, microservices would be fine and yes would make changes like upgrading the language runtime version easier.
But when I think of messy, tangled, poorly-tested code that prompts people to start talking about needing to refactor to microservices, I’m thinking about different sorts of problems. The messiness I usually see has to do with lots of missing abstractions, lots of low-level code reading and writing directly to files and message buses and databases and datastores instead of going through some clean API. This makes it really hard to change things, because instead of updating some API backend, you have to find and update all the low-level accesses.
Now the problem is, typically when going to microservices, people aren’t looking at the question of, “What common stuff can we pull out to make all our messy code simpler?” They’re taking the existing, messy modules, with lots of cross-cutting shared abstractions dying to get out, calling the existing module a service, and putting a bigger barrier around it.
There are many ways to approach the problem of moving to cleaner, simpler abstractions, and microservices can help. But you can easily go to microservices without addressing all the needless complexity, instead crystallizing that complexity in the process, and many organizations end up doing exactly that.
There are two big reasons to go to microservices (note that the exact definition of microservice can vary a lot).
1. Organizational streamlining. If the team working on the monolith becomes too large, then coordinating and pushing out changes quickly can become incredibly difficult. One rule of thumb I've heard is the two pizzas rule. If two pizzas can't feed the team working on a system, it's time to break up the system.
2. Horizontal scaling. If some components of your workflow require much more computing power than others, then it makes sense to break up your system to move computationally intensive tasks to their own services.
While there are lots of other decent reasons to break up a system, if you can't invoke at least one of the two above reasons, you may be shooting yourself in the foot. I think he's dead on when he points out that if you don't have engineering discipline in the monolith, then you won't have it in the microservices.
I have this idea for a new framework/language. I'm sure it either already exists or it's a dumb idea in practice, but anyway.
You build a monolithic application. Everyone works on the same code base. Things are broken up into modules/classes/packages. From the programmers point of view it's just like working on a standard Java project or something similar.
The magic happens at the method and module boundaries. When the application is first started everything works normally. Methods call other methods using addresses. As the application runs some parts of it become hotter than other parts. At some trigger point an included process starts that spins up 1+ cloud instances. Only the hot code is deployed to the instances. If necessary the instance is load balanced on multiple nodes. You configure the triggers and whatnot as part of the applications config. The framework/language would either come with support for popular cloud services or allow you to create whatever system you need to create the instances.
My hypothetical language/framework would proxy all method calls and remap object instances to the new instance(s). If the extracted code cools down enough it is integrated back into the main monolith. At that point proxying is turned off and the methods use address again.
Using this approach you get all the advantages of a monolith (interface compatibility checked by the compiler, no need for EVERY service to write its own HTTP code, etc). Of course you can't optimize latency as easily and merging is harder with monoliths. There are undoubtedly a hundred other reasons why this is a terrible idea.
It's not a terrible idea, but introducing a network boundary adds all sorts of constraints and issues that a normal program flow can safely ignore. Network partitions don't happen within a local system. I’d read up on CAP and the Fallacies of Distributed Computing. They’ll more or less explain the challenges.
This isn't really true, other than that network errors are more likely than a machine getting shut down. You should really be writing your code as if something could go wrong at any moment anyway.
It's a different class of error though. You can have code that ensures transactional integrity everywhere in the case of power failure, but that alone doesn't mean it handles network partitions, 429 responses, timeouts, token expirations, corrupt responses, etc efficiently or correctly.
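To make that "different class of error" concrete, here is a minimal Python sketch of a caller that separates retryable failures (timeouts, 429/503) from ones that retrying won't fix. `Response`, `TransientError`, `PermanentError` and `call_with_retries` are hypothetical names made up for this example, not any real client library's API.

```python
import random
import time
from dataclasses import dataclass


class TransientError(Exception):
    """Failure that may succeed on retry (timeout, 429, 503)."""


class PermanentError(Exception):
    """Failure that retrying will not fix (e.g. a 400)."""


@dataclass
class Response:  # hypothetical stand-in for an HTTP response
    status: int
    body: str


def call_with_retries(send, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call send() and retry only the failures that are worth retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            resp = send()
        except TimeoutError as exc:
            error = TransientError(f"timeout: {exc}")
        else:
            if resp.status == 200:
                return resp
            if resp.status in (429, 503):
                error = TransientError(f"status {resp.status}")
            else:
                raise PermanentError(f"status {resp.status}")
        if attempt == max_attempts:
            raise error
        # Exponential backoff with jitter so callers do not retry in lockstep.
        sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))


# Usage: a fake dependency that times out, rate-limits, then answers.
replies = iter([TimeoutError("no answer"), Response(429, ""), Response(200, "ok")])


def flaky_send():
    item = next(replies)
    if isinstance(item, Exception):
        raise item
    return item


print(call_with_retries(flaky_send, sleep=lambda s: None).body)  # ok
```

None of this logic is needed for an in-process method call, which is the point: crash-safety alone doesn't give you any of it.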
This reminds me of something I find myself constantly explaining to colleagues again and again: You can scale a monolith in its entirety to handle elevated traffic in one of its "subsystems"; having code paths that aren't receiving traffic doesn't cost anything (at least in the architecture of the systems I look after).
I don't see the value in separating a monolith to allow independent scaling, unless there are wildly different performance demands across its components which makes reliably autoscaling difficult.
I've gone through this thought process too, but where I arrived is that it's kind of the tail wagging the dog. The whole point of microservices (or at least one major one) is so that teams can operate independently without having to use the same framework. Each service provides a fairly static API and other services write code against it, but under the covers each service is relatively free to implement that contract however they want, whatever language, whatever database, whatever scaling strategy, whatever version control, whatever deployment cadence, whatever hardware SKUs, all specific to the service.
So with a hypothetical framework like this, yeah it would make microservices easier (in theory, though there are lots of technical problems too) in terms of "look ma, I made microservices", but it wouldn't actually address any of the problem microservice architectures actually try to solve. So, tail wagging the dog.
Some of the technical problems stem from memory working different from API calls: they can fail, the overhead is much higher (shouldn't call in a big loop), can't pass pointers, global state may differ on remote machine. So an application model that tries to abstract away those differences is bound to have problems. Also deployment: remotes will be temporarily broken if the API changes, until the deployment has fully propagated; a team running a microservice takes a different mindset with respect to API versioning than does a compilation unit. And that's really the crux of the whole thing: microservicing requires an entirely different mindset than monolithing, and approaching one with the mindset of another will cause problems.
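The "shouldn't call in a big loop" point can be sketched in a few lines of Python. `RemotePriceService` is a hypothetical stand-in that only counts round trips; no real network or library is involved.

```python
class RemotePriceService:
    """Stand-in for a service behind a network hop; counts round trips."""

    def __init__(self, prices):
        self._prices = prices
        self.round_trips = 0

    def get_price(self, sku):
        self.round_trips += 1  # each call would pay network latency
        return self._prices[sku]

    def get_prices(self, skus):
        self.round_trips += 1  # one request carries the whole batch
        return {sku: self._prices[sku] for sku in skus}


def total_chatty(service, skus):
    # Fine for an in-process call, expensive across a network boundary.
    return sum(service.get_price(sku) for sku in skus)


def total_batched(service, skus):
    prices = service.get_prices(set(skus))
    return sum(prices[sku] for sku in skus)


prices = {"a": 3, "b": 5, "c": 7}
chatty, batched = RemotePriceService(prices), RemotePriceService(prices)
print(total_chatty(chatty, ["a", "b", "c"]), chatty.round_trips)     # 15 3
print(total_batched(batched, ["a", "b", "c"]), batched.round_trips)  # 15 1
```

Same answer, three round trips versus one. A framework that silently turns the first version remote would hide exactly this cost.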
Why exactly do you want to independently scale a hot code path? If it's hot, it's already using most of the resources in your monolith. Take your monolith, distribute it to more servers, and it will reduce the load on your other servers regardless of which parts of your code are causing the load.
Sure you can do that, but it's not as efficient as deploying only the code that is getting hit. Also, it might be taking up most of your resources, but it might not be as well.
For example, let's say in an eCommerce application that the shipping calculator is getting hit a lot. You'd like to be able to scale this independently as a service, so you can handle all the requests without also having to replicate all of the other resources, such as the cart persistence, user sessions, etc. that are a lot more memory intensive.
> For example, let's say in an eCommerce application that the shipping calculator is getting hit a lot. You'd like to be able to scale this independently as a service, so you can handle all the requests without also having to replicate all of the other resources, such as the cart persistence, user sessions, etc. that are a lot more memory intensive.
Assuming you allocate different resources for it. If you're using the same instance type for all your microservices, you aren't benefiting from this. In fact, you're paying for resources you aren't using.
You might even be paying more - allocating high memory instances to high memory services, and regular instances to low memory services that don't use that memory. You might be able to get by with only regular instances if you distributed your services in a monolithic fashion.
In my experience most small teams I've seen aren't that specific with their resources. Unless something is obviously super high memory, like a cache, they tend to just use default instances.
I have worked with a service where we saved some money by splitting it up and specializing the VM SKUs like that. But it's far from as trivial as GP implies. You have to plan out the interface, shake out any shared data (shared in-memory cache etc), design it to not be so chatty, and as with everything perf-related, do lots of testing. So it's not like GGGP's solution that automatically spins up microservices could "just work".
This was a pretty high volume API at Azure, and the savings after all was said and done was sadly only around 15K/mo, so it'll take a while to pay for itself. So, I'd say for the average website this should not even factor into consideration.
(The other thing I forgot to mention was having a good consistent hashing strategy in the load balancer, so that the instances of each service consistently speak to the same instances of the other service, even as machines are spun up and down, but still maintains an even spread of load. This helps greatly in terms of allowing each instance to cache only the data for requests that target that instance. With a round-robin load balancer you end up with a lot more variation in your incoming requests, and thus require either more memory or expect more cache misses. That was probably the hardest part of the project, since the built-in load balancers don't have the specific functionality we needed).
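For anyone unfamiliar with consistent hashing, a minimal Python sketch of a hash ring with virtual nodes follows. It is a generic illustration, not the load balancer described above; `HashRing`, `vnodes` and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes  # virtual nodes per instance, for a more even spread
        self._ring = []        # sorted list of (hash, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring entry at or after the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["instance-a", "instance-b", "instance-c"])
keys = [f"customer-{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove("instance-c")  # a machine is spun down
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(all(before[k] == "instance-c" for k in moved))  # True
```

Only keys that lived on the removed instance get reassigned, which is what keeps per-instance caches warm as machines come and go. A round-robin balancer has no such property.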
I don't personally have much experience in this area, but something as valuable as this holy grail of automatic parallelization (be it threading or distribution), i.e. "just write the same good old code and the system will be smart and parallelize it" is bound to be something that people have been trying forever and failing, at least in the general case.
To begin with, anything with side effects is mostly a non-starter. You would need some way to annotate the boundaries where side effects can not happen, and this already requires considerable refactoring effort.
Even if you ignore this and suppose you are working with a purely functional language (this is probably what the "You just invented Erlang and OTP" meant), the overhead of the "smartness" can be pretty big. If you see a "map" over a list, you know you can parallelize it, but which one of the 100 "map"s should you really parallelize? If you try to be too smart, you will burn a lot of effort and may not get a payoff. If you try to be just a little bit smart and you choose the wrong one you shuffle a lot of data for nothing.
In some way it seems to me a lot of today's big data systems like (surprise) MapReduce and Spark do exactly this, they offer ways to explicitly mark those boundaries where you want your program to be parallel (and require that you obey their rules about side-effects there) and contain some of this "smartness" for how to distribute your data.
Even in OpenMP you also have something like this, you can easily say "this piece of code is a task and the runtime can decide to run it in parallel", but you need to also tell it all the inputs, outputs, etc. (since figuring this out automatically is hard if not impossible) so it doesn't fit very well in a "big ball of mud" project.
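As a small illustration of marking the boundary explicitly, here is a Python sketch using the standard library's `concurrent.futures`. The `score` and `score_all` functions are made up for the example, and a thread pool is used only to keep it short; for CPU-bound work a process pool would be the more likely choice.

```python
from concurrent.futures import ThreadPoolExecutor


def score(record: int) -> int:
    """Pure function: depends only on its input, touches no shared state."""
    return record * record + 1


def score_all(records, workers=4):
    # The programmer, not the runtime, picks this map as the parallel boundary
    # and promises that score() has no side effects.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, records))  # results keep input order


print(score_all([1, 2, 3]))  # [2, 5, 10]
```

The runtime still decides how to schedule the work, but only inside a region a human has declared safe, which is the trade these systems make.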
In the (near?) future as computing power is more abundant and cheaper but the gap between sequential and parallel computing continues growing ever wider I can see those general "smart" approaches paying off, even if they are much worse than hand-optimized code. It only needs to be better than the average programmer.
So I've never actually done this, but I think this makes sense:
You can just structure your code in this way -- divide hard boundaries in your monolith. Segment things apart the same as they would be in microservices. Have a collection of methods for accessing each segment of code, and don't allow calling anything but that collection (API) from other parts of the codebase.
Set up monitoring/logging. If a segment of your code is using a huge amount of resources, it'll now be trivial to pull that segment out into its own microservice because you already have a defined API and hard boundaries.
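A rough Python sketch of what such a hard boundary could look like, borrowing the shipping calculator example from elsewhere in the thread. `ShippingApi`, `InProcessShipping`, `Checkout` and the pricing formula are all hypothetical.

```python
from typing import Protocol


class ShippingApi(Protocol):
    """The only surface other parts of the codebase may call."""

    def quote(self, weight_kg: float, distance_km: float) -> float: ...


class InProcessShipping:
    """Current implementation: a plain function call inside the monolith."""

    def quote(self, weight_kg: float, distance_km: float) -> float:
        # Made-up pricing rule, only here to give the segment some behaviour.
        return round(2.0 + 0.5 * weight_kg + 0.01 * distance_km, 2)


class Checkout:
    def __init__(self, shipping: ShippingApi):
        self._shipping = shipping  # depends on the API, not the internals

    def total(self, subtotal: float, weight_kg: float, distance_km: float) -> float:
        return round(subtotal + self._shipping.quote(weight_kg, distance_km), 2)


print(Checkout(InProcessShipping()).total(20.0, 2.0, 100.0))  # 24.0
```

If the shipping segment ever has to move out, a class that makes a network call but satisfies the same `ShippingApi` could be passed to `Checkout` instead, at which point the failure modes of a remote call come along with it.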
I would also add cost savings and ops excellence. Having individual business processes broken up into separate services can often lead to being able to tune individual services and allocate just the resources it needs. Especially when used with containers. It's also easier to spot offending commits.
A monolith is hard to tune and often ends up being a money pit.
Regarding point 1, why is coordination required? I think that continuous deployment, where you're integrating dozens of times per day solves this problem much better.
In large, distributed teams working on very large monoliths, it's pretty easy to end up with conflicting changes. In such monoliths, the testing process also tends to be long. So you run your tests, get a pass, but someone else merges in before you and there's a conflict. You resolve the conflict quickly (if you're lucky. Some conflicts are not easy to resolve), you rerun your test suite, only to find out that someone else has merged ahead of you again and you have to go through the loop one more time. And all of this assumes that no one breaks the pipeline.
At this point, many teams institute a merge queue. Which works only until more devs are added to the teams, which makes the merge queue very long and it can take several days in the ideal case to get things merged.
I wouldn't say the journey was completely pointless, because the fact that we had to deploy 10+ services to make a single environment whole required us to build extremely powerful CI/CD management tools that we happen to be able to re-use in the (new) monolith case today. This journey was also a really good growth and learning opportunity for the team. Everyone who has touched this project and has seen both ends of the distributed<=>monolith spectrum is now radicalized towards preferring the monolith approach.
On the trip back into a monolith, we didn't just stop with the binary outputs of our codebase. We also made the entire codebase a monorepo. We have a single solution (VS2019) within that monorepo which tracks all of our projects. Prior, we had upwards of 15 different repositories to keep track of. Being able to right-click on a type, select "View all References" and legitimately get every possible reference to that type across the entire enterprise is the most powerful thing I have yet to see in my career.
My experience with microservices has been just a shift in worries. I don't worry about scale or ssh configs, but I worry instead about CloudFormation and CloudWatch or billing impact. It has also been a challenge to get local testing to work easily, and quite a lot of meetings and discussions have been used up on that alone. I don't find the microservice pitch from a developer perspective to be easier at all; it's actually harder overall. I do like the approach of gcloud or Elastic Beanstalk better for cloud, as you get auto-scaling but can still do local testing easily with a monolith approach. The use case for microservices IMO is more that you have a couple of highly-used sets of functionality that are disproportionate to your monolith and can be split out to save money, not that you build everything around microservices and pretend everything is easier. Personally I feel my cognitive load increases when using microservices purely.
Designing an application from scratch where pure microservices is implemented is in my opinion the same as over-engineering for possible future performance issues.
Splitting up your application in many services requires a lot of thinking and designing. Challenges with syncing, communication etc are not always easy to deal with.
That's why I agree with starting as a monolith, but with architectural principles so that you still have multiple modules/components. But I would, for example, never split up the database into multiple databases.
I'm totally with you here, but from my experience with a chaotic team, people end up doing weird things when they have access to the full database from every microservice: say you have an authorization service and several applications with a public API that rely on tokens signed by that service.
Suddenly we get a requirement to automatically generate a user account when something happens in another application.
I didn't look what my devs were doing for a second, and suddenly the offending application does an insert on the users table. The developer even went the extra mile to copy-paste the token generation method into their code.
I ended up restricting database access, but wouldn't have thought that was necessary. To me it was obvious all services should only communicate via their public API, but I guess that's not so much of a no-brainer as I thought.
> The developer even went the extra mile to copy-paste the token generation method into their code.
I had to make an authorization service and the idea to use a single authorizer to handle token generation and authorization was shot down by management due to worry about "lambda startup times". I complained that the startup time is less than a second (for nodejs) and honestly would not be an issue. They gave the task to someone else to have the token generated in their service. The developer did it by copying the code I'd written into their own service verbatim.
This is why I don't like microservices as we do them. Management would rather we wrote small programs with a lot of duplicated functionality in many repos instead of writing a large program where we can enforce some discipline. This is also better for them because they meet with us individually to ask for functionality rather than have design or architecture meetings where we can push back on implementation details.
Why is management making engineering and architectural decisions? Reeks of micromanagement. Tell them to solve people problems and let engineers solve technical problems. Update your resume regardless of outcome.
I did this. I built a big service microservices-first, with the single caveat that I wrote a local Python version that did everything with fake data in memory first, just to test it out.
I'm very happy with this approach.
> Splitting up your application in many services requires a lot of thinking and designing.
In my experience, you spend a bit of time asking "What does my project do, and how do I decompose it?" and that's about it. The same thing you'd do with a single binary that you decompose into modules.
> Challenges with syncing, communication etc are not always easy to deal with.
Depends on what you're doing certainly. For me, I'm basically doing an ETL and analytics pipeline, so I have no issues there.
I have found it much easier to reason about code and boundaries. I did a ton of experimentation with code and services (I didn't know how to build it when I started, so there was tons of iteration) and the ability to just rewrite any service fairly trivially helped a ton.
I haven't had to do any "splitting". Merging services is trivial, splitting is not. I'd much rather say "Ah, the synchronization here is too hard, I'll shove it into one process" than "How the hell am I going to scale these two modules separately with all of this shared memory between them?".
I've gone through a "We have to start splitting things for reliability/performance/conway's law" and it's years of very difficult, dangerous work.
It's really a "to each their own" but I don't think I overengineered things at all. Microservices just make sense for my use case.
I think the underlying point, as expressed by the author, is that trendy new architecture patterns will never be a panacea for bad engineering, though that's often how they're implicitly sold as ideas.
> architecture patterns will never be a panacea for bad engineering
That's undeniably true; but, it's too easy to slip from that to "good engineering doesn't require good architecture, good technology, etc." I think it has to be stressed that adapting the big picture features of a solution is as important as adapting the finer features.
Is it actually sold as a way to fix bad engineering? I've literally never heard that before. I think almost everyone knows that microservices are hard.
On the other hand they are often sold as a way to increase developer velocity. And I do sometimes wonder if that is the case (based on personal experience).
We did a massive rewrite of a 7 year old giant code base to microservices gradually over a couple years. One of the few positives that we gained from it was that when we crashed we didn't bring down everything at once. The biggest negative was that we ended up with an extremely noisy event bus that made backups and diagnosing out-of-sync issues very problematic.
Yes, it is a pretty common saying that it forces separation of concerns and avoids the spaghetti that monoliths end up like.
People are selling it as easy. You can deploy them independently, each service is small with low complexity.
That it is not that easy in reality and that the total complexity increase is so big is not told as often. There is a team at my work with 6 developers developing something that processes a couple of gigs of data each day and has a front end serving maybe a hundred users total, I would guess 20 would be logged in at the same time.
Kubernetes, 10 microservices, 2 different types of databases.
I often agree with Kelsey Hightower, but there are so many things he doesn't mention here. For example, being able to independently deploy components frees up certain kinds of development workflows. Distributed components also scale and fail independently, and you can use nifty things like message queues between them to provide resilience and soak up load spikes. I'm sure the pattern has often been applied in the wrong use cases, and that many people have over-applied it, but "the monolith is the future" seems just as wrong as "microservices are the future." We are nowhere near the size of a large bank... or even a small bank, and yet we've benefited from a distributed set of independently deployable and scalable components. You can call them microservices, or not. I can think of ways we could restructure on a monolithic backend, but just noodling on the idea leaves me with more constraints than benefits. Idk, it's a thought-provoking statement at least, but I sort of wish we'd stop reacting to fads with anti-fads.
Being able to independently deploy components is only useful if you can test the deployed components in isolation. This requires extremely well thought out interfaces and is really what makes the components independent.
It seems in many cases teams have decided to port their highly coupled monoliths over to highly coupled distributed monoliths and now they have the worst of both worlds.
Why do we need to choose between monolith and microservices? What about simply "services"? A monolith doesn't have to be split into 50 microservices; it can be split into 3 services.
I think there is a lot of grey area in this debate of microservice vs monolith. There are more aspects to this than how many executables you need to start before production is up.
According to most of the intent of the definition of "microservice", we do have many of these within our monolith. There is a nice big folder in our core project called "Services" and within this lurks such things as UserService, SettingService, TracingService, etc. Each with their own isolated implementation stack+tests, but common models and development policies. All of these services are simply injected into the DI of the hosting application. We are injecting approximately 120 dependencies into our core project and it is working flawlessly for us in terms of scalability and ease of implementation. Microsoft DI in AspNetCore is awesome. For those trickier cases we will usually just pass IServiceCollection to whatever needs to arbitrarily reference the bucket of injected services (e.g. rules engines).
I think you can have the best of both microservices and monoliths at the same time if you are clever with your architecture and code contracts.
Yeah, I don't have much experience with microservices but the one time I worked with them they were a nightmare and I think the major cause was there were too bloody many of them.
I feel like if you end up talking about eventual consistency or needing ids to be passed around you've surely built it wrong and you've split what should be a single service into a mess just to feel cool having more services and needing the hyped new tools to manage them.
The touted benefit that you can scale the bottleneck separately also suggests the need for 1 monolith plus at most 1 or 2 services. If you have more bottlenecks than that, the code is presumably doomed anyway?
Yeah, seems like people like extremes. I've seen a website split into multiple services that are largely independent and to make things manageable between teams the monoliths were split into smaller libraries.
The big thing that people often underestimate is the complexity of facilitating communication between microservices. Not only do you need to plan the API well, you also have to figure out the routing, and you pay extra overhead on every call.
No, tiers comprise slicing the application horizontally by technology, basically, ie. "All database stuff here" and "all frontend stuff here". To me that's like "let's put all the house sinks in one room to ease repairs for the craftsman".
Services is more like feature folders? Where you define everything you need related to a VERTICALLY sliced part of your application, ie. Products, or whatever.
What's really going on here is that a remote procedure call (RPC) to a microservice or REST API is conceptually equivalent to calling a function in a library specified by an interface in a header file. There is an incredible amount of handwaving that obfuscates minutiae around synchronous blocking vs asynchronous callbacks/promises/async-await, but there is no reason why we can't convert from the distributed to the local paradigm losslessly.
What I'm not seeing is any attempt to go in the opposite direction. A compiler should be able to look at ordinary code and slice it up into microservices automagically, converting the header interfaces to API specifications like OpenAPI/Swagger. We should literally be able to write a monolithic program in any functional or C-style imperative language and get a conversion to a bunch of lambda functions. If that doesn't work, then something is seriously wrong (probably having to do with determinism, like inadequate exception handling for timeouts, etc).
So frankly, the first day I saw lambdas, I was skeptical. I don't understand the point of writing all of the glue code by hand. Incidentally, I reached this same conclusion after manually building a large REST API around the JSON API standard just before GraphQL went mainstream and made a mockery of my efforts.
I think that the HTTP spec and things like separation of concerns serve a purpose for human readability. But we're well past the point where the gains made by the early internet are providing dividends in today's highly-interoperating stuff like Rust, Go and Node.js. Basically 90% of the work done today would be considered a waste of time (bike shedding and cargo culting) in the 1980s and 1990s. Just my two cents.
The author does not seem to understand when to correctly apply microservices. There are two basic use cases: 1) Different parts of your solution have different load patterns and it is economically beneficial to scale them at different rates and 2) Different teams need to be able to work & ship autonomously. It's not at all about technical merits or architectural beauty. It's about people and costs.
I kind of think the items (1) and (2) you list don't automatically mean micro-services, so much as they mean separation of concerns can be beneficial.
Isn't there room for a middle ground with modularity that can live in between a full blown monolith or a full blown microservices pattern, particularly for operations that are more medium scale?
Yea just take the ideas that apply to your problem domain and implement them in a way that's sane. Design/architecture patterns are much more useful as templates to be specialized for your problem.
I really don't like the dogmatic view of architectures. Leaves no room for craftsmanship, and it's only really useful for creating code monkeys that have to follow a spec and need to be interchangeable cogs in the machine.
I think people sometimes forget that a healthy level of pragmatism is what keeps shipping. Just because someone said "microservices is the new all" you don't need to do it. Just because someone said "monoliths are the future" it does not have to be true for you.
Take all these as case studies and solutions that apparently worked under specific conditions.
No, but the dude works at google and has written that and other books about Kubernetes so obviously he knows about scaling parts of a system and huge organizations working on software together.
That is not what he is commenting on here though. It is that microservice architecture is starting to become the default pattern on how to build applications for a lot of people. Instead it should be an exception when you reach those very specific problems that most people don't have.
I didn't declare their argument won, lost or invalid. I did suggest that they consider the bona fides of the person whose opinion they were tossing off as uneducated.
> I did suggest that they consider the bona fides of the person whose opinion they were tossing off as uneducated.
It would be more constructive to give reasons why one argument is better than another though, rather than resorting to status, and you did not reference any of the commenter’s points.
> Thanks for the downvote, though.
I don’t have the karma required to give a downvote. I’m not sure who you should thank.
Honestly, you're missing or avoiding my point: I'm not deferring to the author because he's respected or has a popular blog.
I am suggesting that the OP has little business writing off a legitimate expert's opinion in a domain where they are highly qualified to comment. This isn't controversial.
If you had the karma to downvote, would you give a hard time to the OP who started with "The author does not seem to understand when to correctly apply microservices."?
I have seen monoliths successfully transition parts of their functionality into small services. I have not seen a microservice-first approach work very well. When you're building something new, your intuitions about which parts are going to be tightly coupled and which parts are going to be relatively independent are just guesswork.
Once you've iterated on a monolith enough to see which parts are relatively independent and would actually benefit from decoupling, then you can move them into separate services.
One example that comes to mind: I wrote a recommendation service that also handled user feedback events. This was the easiest way to start. After about a year I saw that we were iterating faster on the event processing than on the actual rec delivery. We were also deploying this monolith across more machines mostly to scale up event handling capacity. So we broke the high volume event handling out into a separate service that was smaller and optimized exclusively for event processing.
With respect to the author, who probably is a much smarter person than I am, this is yet another in a long, long series of HN articles that should be grouped under "I don't know what the hell X is, but I was an expert in it, and I can tell you it sucks"
I've seen X be a dozen things: UML, databases, User Stories, Functional Programming, Testing... It's too much to list.
Yes. If you do it that way it will hurt, and you should stop. I don't know this author, but I suspect that many people who jump into microservices are not getting the foundations they need to be successful. The idea that microservices are just broken-up monoliths is a big clue. They're spot on about marketing and spend, though. In this community we're quick to hype and sell things to one another whether it's a good idea or not.
I've seen some great criticisms of microservices, some of which made me pause. Now, however, I think there's a reasonable way through the obstacles. It doesn't have to be a mess. Nothing is a magic bullet, but about anything will work if your game is good enough. You don't buy a bright and shiny to make your game better. Doesn't work like that.
In my opinion a lot of the problems I have seen people having with micro-services are because they jump into it way too naively, without a proper assessment, understanding of the tools, and planning, as well as basically expecting magic to solve all the world's problems.
For example, some time ago I talked with devs that were about to change their monolith to micro-services. I pointed out that having decentralized data is going to be tricky to deal with. It got immediately dismissed as not a problem, because all the services are completely independent. A couple of months later they were struggling hard because, it turns out, a business needs to be able to ask questions about all its data, not just per service.
Sure, a problem that can be fixed. But I got the impression that they hadn't spent 10 minutes looking at potential downsides of their decision before making it. In a similar vein, there are people who equate monolith with spaghetti code and then end up with a spaghetti system almost immediately.
Yes, the problem is that all these things are sold as magic bullets -- or worse -- that your current solution is sold as obsolete garbage. The word "monolith" has a negative connotation so obviously you can't have that and so you need something else.
When microservices started to catch on, it was just a name given to a really good solution to a specific problem. And there are plenty of problems for which creating an independent service is a great way to manage both technological and organizational issues. But it doesn't just magically solve those issues -- you can't apply that model to everything just because -- it has to fit the problem space.
I had a conversation on Reddit with a developer whose application had over 3,000 independent micro-services. He was very proud of this solution. But I can't imagine that could be anything but a monolith with function calls replaced with network I/O.
Yes. It wouldn't surprise me if it's reached the point where 90%+ of people saying they're doing microservices are making disasters. There's just so much stuff you have to un-learn, and that's uncomfortable.
I consulted with a team last year that was moving to microservices. They bought BigToolX and had already created a disaster ... and they weren't even through their design. Almost all of what they were doing was just best practices in some other paradigm. It was painful.
Whenever I fall in love too much with a technology, I get paranoid. There's usually something I'm missing.
> I've seen some great criticisms of microservices, some of which made me pause. Now, however, I think there's a reasonable way through the obstacles. It doesn't have to be a mess. Nothing is a magic bullet, but about anything will work if your game is good enough. You don't buy a bright and shiny to make your game better. Doesn't work like that.
With all due respect, the argument that microservices can work is not an argument for doing microservices instead of a monolith.
By default a monolith is simpler, lower latency, has lower operational costs (RPCs are not actually free!), tends to be easier to refactor, leads to less duplication of code, and has better tools for traceability. (Do not underestimate the value of stack backtraces!) With best practices (that few do), all of these problems except the latency one are solvable with microservices. But you should not expect to solve them in most organizations.
Given this, you should only adopt microservices if they solve a real problem. For example if your codebase is too big for a single server to hold it, or you need extreme horizontal scalability, microservices can be wonderful. But most people using microservices do not actually have those problems. Most organizations that are trying to use microservices would be better off with monoliths. Eventually reality will settle in and they will realize it.
Incidentally this is not a new debate. At its heart microservices vs monolithic is the same as microkernel vs monolithic kernel. It is worth reading https://yarchive.net/comp/microkernels.html for Linus' criticism of microkernels - much of it applies directly to most microservices deployments.
I work on a project where dev teams (plural) occupy time zones spanning 16h. Having a very hard boundary between service concerns helps immensely with remaining sane even if the project could be rolled into a monolith from the perspective of code or data complexity.
There are ways to get such boundaries without having separate microservices.
Also having dev teams across time zones is itself a challenge. The devs are cheaper, but integration is worse. In Steve McConnell's book Software Estimation the industry average seems to be that development is over 40% longer, and defect rates also go up.
“Nothing is a magic bullet, but about anything will work if your game is good enough. You don't buy a bright and shiny to make your game better. Doesn't work like that.”
This 100x. If you aren’t able to maintain a monolith you will most likely mess up microservices too. Every approach has its own set of trade offs and problems. If you know what you are doing you can make things work.
Kelsey Hightower is a rather pivotal person in the Kubernetes world. It's unusual that he's basically cautioning people not to use the system he's so involved in. His point is that many people are doing microservices wrong.
Kubernetes is a deployment strategy. It should be orthogonal to microservices.
I'll delete the comment if I was unnecessarily cruel or missed the sarcasm. It was not intentional. But it is important to understand that you want to think of persistence and deployment coupling as independently of your microservices strategy as possible. The vast majority of problems we see with people implementing microservices is people carrying baggage over from some previous project or pet technology. K8S's great. It's just not relevant here.
You're right, though k8s is often associated with microservices you can deploy a monolith with it. But there's a disconnect where as an expert in associated areas he's saying people aren't doing microservices properly, and you're saying just do them properly.
It is very relevant though as it has become so tightly connected with microservices and if you are one of the most well known people in the Kubernetes world you will see a lot of applications that should not be microservices.
I think Kubernetes can be used well with monoliths that are actually more monorepos. You keep the code and dependencies in one place, and then use the service definitions within your cluster to define your boundaries. (Imagine an application where your front end, back end, and background workers all share some objects, but not all of them)
A typical UML-bashing article/comment doesn’t suggest some potentially superior alternative, but takes the “everything is awful” stance.
The text of the article reads in that same vein, but given that the title is “Monoliths are the Future” I think the author’s original intent was to describe the advantages of monoliths and point out that microservices have advantages only in relatively rare cases. Too bad they made the article about microservices instead of monoliths.
Sounds to me like a good opportunity for the author to write a followup post.
"who probably is a much smarter person than I am" -- I have a different take-away from the quality of that writing. I'm also no fan of the kubernetes code.
It's okay to ship a bunch of services together, if you can be serious about keeping hard boundaries between subsystems. Microservices force you to do this (e.g., your microservices might have to communicate via REST APIs, but they can't access each other's internal implementation details).
Your customers do not care about your monolith. They don't see a monolith; all they see is features. Untangling it may or may not be the right choice.
In a certain set of situations, the path forward, instead of trying to untangle your monolith, is --if you so desire-- to create new services that are actually true microservices, and keep your monolith as-is.
I agree with your words but the realist in me has to point out that microservices don't force you to do anything, it's a pattern not a highly opinionated and restrictive framework.
There are plenty of clusterfuck hybrids out there with services sharing database state etc. Anything can be an antipattern when you add people into the mix.
I've settled on a compromise in this debate. Halfway between monoliths and microservices is the shared-library model. Instead of creating a microservice for your image processing, break it out into a standalone NPM or Composer or whatever module, then use that in your monolith. Gives you good separation of code and responsibilities, gives you good upgrade paths for your monoliths, avoids the overhead of microservices.
Sorry, but I don't think that's part of the 12-factor app. The 12-factor guideline is that apps shouldn't rely on implicit dependencies (I dunno, like having ImageMagick installed on the system), but instead should use a dependency management system (like NPM or Composer). So I don't think the statement "shared code is an anti-pattern" is true. But I could be wrong, haven't read the 12 factors in a while...
This is based on the assumption that the purpose of microservices is to split up code. And that isn't a good reason to use microservices, because it is possible to modularize code and still combine it into a monolith. But for me the purpose of microservices is to isolate different pieces of functionality in order to increase stability. Microservices allow you to scale the resources available to each service independently and allocate the appropriate resources for each service, deploy changes to one service at a time rather than changing the entire application at once, have different SLAs for different services, and, if done right, gracefully handle failure modes in one service without taking down the whole application.
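As a rough illustration of "gracefully handle failure modes in one service", here is a sketch of a circuit breaker with a fallback. The class name, the thresholds, and the injectable `clock` are all assumptions made up for this example, not something from the article or from any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stop calling a failing dependency for a while and serve a fallback."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after_s:
            # Cool-down elapsed: allow a trial call ("half-open").
            # One more failure will open the breaker again.
            self._opened_at = None
            self._failures = self.max_failures - 1
            return False
        return True

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # fail fast, do not touch the dependency
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0  # a success closes the breaker fully
        return result
```

The caller wraps each call to the other service, e.g. `breaker.call(fetch_live, serve_cached)` with two hypothetical functions, so a dead dependency costs a few failed calls and then nothing until the cool-down passes.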
The article touched on it and I've experienced the same thing... The added complexity of having to manage 10's to 100's of different code bases, pipelines and deployment concerns is a huge downside and should be considered before adopting the new shiny.
One thing a microservice architecture does really well is enforce bounded contexts. Oh you want to access that data? Well you need to go through the public API because it exists in a separate process. In a monolith it's all too easy to just 'call this piece of code and grab what I need' (no one will ever know). Project isolation can help but it's not a silver bullet.
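For what "project isolation" inside a monolith might look like, here is a small sketch of a boundary check that could run in CI. It assumes a convention invented for this example: every top-level package is a bounded context, and anything under `<package>._internal` is private to that package. It only catches absolute imports, so it is a speed bump rather than a wall, which is sort of the point being made above.

```python
import ast


def boundary_violations(module_name: str, source: str) -> list[str]:
    """Return imports in `source` that reach into another package's internals.

    Assumed convention: each top-level package (e.g. `billing`) is a bounded
    context, and anything under `<package>._internal` is private to it.
    """
    own_package = module_name.split(".")[0]
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b.c
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # from a.b import c  ->  treat as a.b.c
            targets = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # not an import, or a relative import (ignored here)
        for target in targets:
            parts = target.split(".")
            if "_internal" in parts and parts[0] != own_package:
                violations.append(target)
    return violations
```

A build step could call this for every module and fail when the list is non-empty. Someone can still get around it with a dynamic import, whereas a separate process makes that kind of shortcut physically impossible.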
The author makes good points though, there are many places doing microservices because it's the hip thing to do and a monolith would easily suffice. But if you have independent software teams in your org that should be able to deploy code independently, then microservices makes a lot of sense.
That is true and when you are at big enough scale it probably works.
But when you have chosen a cool new microservice architecture for your team to implement and you grab that small user story that spans 3-4 different services, things suddenly go from "Hey, easy implementation and refactor and the compiler will tell me if I fucked up" to something much more time consuming and error prone.
In an ideal world that would not happen of course. Just like in an ideal world a monolith is built correctly as well.
The actor model also gives you a good amount of isolation between your contexts (no public API, just async message passing). It gives you the advantage of running everything on a single node as long as it's feasible without having to go through some `data -> JSON -> HTTP -> JSON -> data` route and still when you need to scale out, it is easily and transparently done.
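A minimal single-node sketch of that idea, using plain `asyncio` and not a real actor runtime. `CounterActor` and its message names are invented for the example; it shows only the "private state plus mailbox" shape and says nothing about how a real actor system distributes actors across nodes.

```python
import asyncio


class CounterActor:
    """State is private; the only way in is a message on the mailbox."""

    def __init__(self) -> None:
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._count = 0

    async def run(self) -> None:
        """Process messages one at a time, in arrival order."""
        while True:
            message, reply = await self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply.set_result(self._count)

    async def send(self, message: str) -> None:
        """Fire-and-forget message."""
        await self._mailbox.put((message, None))

    async def ask(self, message: str) -> int:
        """Send a message and wait for the actor's reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((message, reply))
        return await reply


async def demo() -> int:
    actor = CounterActor()
    task = asyncio.create_task(actor.run())
    for _ in range(3):
        await actor.send("increment")
    count = await actor.ask("get")
    await actor.send("stop")
    await task
    return count
```

Running `asyncio.run(demo())` sends three increments and then asks for the count. No JSON or HTTP is involved, and nothing outside the actor ever touches `_count` directly.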
as always the truth is somewhere in the middle. monoliths make a lot of sense when you're starting out and you can see all your code in one place and you can build, test and deploy everything together. as the service grows there are arguments to be made around splitting it (based on usage patterns, loads, etc).
the thing that most people don't get is that microservices are not free (now you're doing all this devops stuff N times and you have to think long and hard about changes that need to happen across api boundaries). The anti-pattern is that you take your monolith and you split it in 10, but apart from actually doing all this work you still treat it as a monolith (i.e. you still do mono-repo because it's convenient, the deployment still happens at the same time for all services, you centralize everything when it comes to logging and metrics and you even force people to do things in a certain way when it comes to their service). Everything grinds to a halt and now you're more concerned about "growing" the team to fix the issues that popped up and maybe chasing the new shiny thing to keep your resume up-to-date. Even worse, people start feeling like they "own" their service and now the decisions that are made are maybe locally optimal, but who cares about global optimization.
So my take is: start with a monolith and in 85% of the cases you'll be just fine forever. You don't need all the bells and whistles to get the job done. Introduce new things only to solve actual pain-points, and when you do, actually think through what it means to introduce them (so go N->N+1 and never 1->N).
This is the greatest illusion of our time. We tend to think of all things in the world as "middle" like apples and oranges. If this is your viewpoint then you are biased: the truth is just as likely to be at one of the extremes as in the middle.
I agree that for most startups, monoliths are all you need. But at some point, as your engineering organisation keeps growing in size, there are other benefits to microservices. Benefits not addressed in the article.
Data isolation. Allow individual teams/services to own their own data stores, and prevent any other team/package/service from reading or writing to their data store, and inadvertently breaking the associated invariants.
Performance isolation. Prevent one team/feature hogging too much memory/cpu/io, and negatively impacting every other team as well. Debugging performance hogs in a sufficiently large monolith becomes infeasible at a certain point.
Deployment isolation. Allow individual teams to make code updates and deployments whenever they want, without having to be tied down by a company-wide deployment process.
Language/dependency isolation. Allow different teams to use whatever language, dependencies, and dependency versions make most sense, for their use case.
At bigger companies that have hundreds or thousands of engineers, monoliths simply do not scale, and need to be broken down into more manageable pieces. It's unfortunate that smaller companies start cargo-culting these same practices without thinking critically about whether they actually need them.
I don't think we're facing new problems with cloud computing and large systems. We're facing the same problems we've been faced with again and again since the very beginning of computer science. Over time, the ease and scale of what you can do with computer resources increases and we have to organize and understand the existing resources we have.
But we keep reinventing the same solutions at each scale. At one time we had to invent functions to enforce segregation of responsibilities and create abstractions and shorthand. We had to group these together in modules and libraries. We had this clump of programs running on a computer that we had to organize into an operating system. Now an operating system is nearly a program or function and people are regurgitating the Unix philosophy and the end-to-end principle like it's a new thing. In the end, we're going to wind up with a well-architected series of integrated microservices which present a comprehensible interface to users through a handful of abstractions that have proven useful over the years.
Computing is cheap enough that we can now talk about meta-computing, a higher level of abstraction from a computer, which is multiple layers of abstraction on top of each other. Now we just have to build the next layer. And I think it will basically be a sort of meta-operating system. The same things, but we'll call it "orchestration" and "microservices" instead of a file/process/whatever manager and threads.
At the moment, however, we're still offering piecemeal services and products and so we don't have many fully formed concepts of what it is to build a cloud system. So things are still a bit chaotic, but at some point in the near future we'll get there.
A lot of people seem to think that monoliths are a bad thing and microservices a good thing. That isn't the case. There is nothing wrong with monoliths.
Trying to make everything work as microservices just for the sake of it, or because it sounds cool, is just a terrible idea.
Start out with a monolith, and if you later see a need to create a microservice, then do it, when you have more knowledge about the boundaries of the service.
I love creating high performance services and playing with containers. It sure is cool to have microservices that can scale linearly over a lot of machines. I also enjoy using the latest frameworks.
But guess what, my first ever service is just using a cheap dedicated server, serves an average of ~250 highly dynamic webpages each second while still using less than 7 % CPU, on PHP and MariaDB. Last 12 years have resulted in about 6 hours of downtime. A couple of hours planned, a couple as a result of denial of service attacks and a couple when there was a power issue at the datacenter.
So what I'm trying to say is that more complicated doesn't mean that it's better.
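For a sense of scale, the numbers in that comment can be worked out in a few lines of Python. The only assumption added here is an average year of 365.25 days; everything else is taken from the figures quoted above. It comes out to roughly 99.994% availability and about 21.6 million pages a day, from one cheap dedicated server.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year, including leap days

years = 12
downtime_hours = 6
pages_per_second = 250

total_hours = years * HOURS_PER_YEAR          # 105,192 hours
availability = 1 - downtime_hours / total_hours
pages_per_day = pages_per_second * 60 * 60 * 24

print(f"availability: {availability:.4%}")
print(f"pages per day: {pages_per_day:,}")
```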
Some are best served with aggregates; some with monoliths.
For myself, I have always developed in a "layered," and "modular" manner, with discrete subprojects; each, given its own configuration management and lifecycle. The resultant applications tend to be "monolithic," but some are parts of a larger, loosely-connected architecture.
My approach to this is pretty simple. For existing systems, don't refactor from one to the other unnecessarily. If you do decide to refactor, do it in small steps, one piece at a time.
There are huge advantages to both patterns. For newer systems, if there's a clear enough split such as "backend" and "frontend" (where frontend is a statically-hosted SPA) then it could be advantageous to keep the codebases and deployments separate.
If data is shared between services, then keeping the code to interact with the data all in one service is likely most useful.
I like to use a few services, with one often ending up being the large "monolith" potentially with a few supporting microservices on the side as it makes sense. "As it makes sense" means that the service has a specific individual encapsulated concern. Billing could be a good example, depending on how it integrates with the rest of the system.
I find microservices very useful to encapsulate independent concerns and for experimentation (don't want to rewrite the whole app using some new tech, but the billing service is small enough to give it a shot). The main problem points are the glue that holds it all together, duplicating code shared between services, and changing apis / data schema.
Ultimately, it's best to know what you and your team is/will be most comfortable with managing based on everyone's skillsets and the product at hand. If you spend time to understand the differences between the patterns in practice, and remain realistic about the advantages and disadvantages of both, you can arrive at an informed decision that works well for your team.
And lastly, make sure you pick something and then build your product. These details don't mean anything to your customers. If you made the wrong choice, you'll know when it's the right time to switch.
I think most people who flock to Micro Services are looking for a better design / architecture choice for software. The thing that Micro Service can teach you is Single Responsibility Principle and learning to segregate responsibility. You need to define a clear scope of the 'modules' in your system.
At the end of the day if you don't architect your system correctly Monolith / Micro Services won't help you.
For me and my team now I have 1 ideology regarding this topic. I don't care whether it's monolith or micro service, as long as I can have clear segregation of responsibility between the different modules. Our company now has a monolith (core banking app) with modules that handle their own responsibilities and communicate with each other; whether it's over http or an internal communication bus we developed doesn't matter. We can easily move modules out into a separate service if we need to.
What determines whether we move things into their own service? A few things. If we need to deploy / scale something independently we will decide to take on the overhead and move things out into their own external service. Or if something has a specific security requirement that will increase the complexity of the overall system we will isolate that and deploy it separately. Otherwise we keep things as a monolith. For example in Banking there are many things like the ledger / transaction data that are highly sensitive and have certain security requirements, like being hosted on a cloud that meets certain standards. We will deploy this part on GCP. But
People seem to love to stereotype and find a one solution fits all. There is no such thing. Everything in engineering requires a deep level of understanding of the problem and making choices as the problems present themselves.
I believe most apps can start their life out as a monolith, and can grow and divide as needed. There just isn't a one size fits all for anything in tech. That's what I've learned.
I think the monolith vs microservice question is more about organizational needs than anything else. Technically neither is superior, and the debate pales compared to the need to only hire quality developers. Good developers can make either pattern work, and bad developers can break either pattern.
The argument I have always heard is that microservices are a way to solve problems arising in large organizations with multiple teams trying to push code all into the same repository and managing the deployments from such an organization. I get why separating the different teams' services into separate silos helps, but the fact that the solution puts a network between the different parts where there was none feels wrong. You are trading one problem for another problem. Networks are slower and less reliable than calling a function in the same application.
Microservices may be the solution to this problem right now, but I believe someone is going to come up with some other solution (tooling etc) that allows you to get the benefits of migrating to microservices without having to add an unnecessary network layer just to solve an organizational problem.
Last monolith I worked on was 6.5M LOC FinTech SaaS product.
It was extremely hard to work with from development perspective.
- Every change you made could break things elsewhere in a surprising way.
- Deploying changes was a nightmare - we had volunteer teams on daily merging rotations because merging was so incredibly difficult.
- Different teams and groups of teams would acquire this tribal knowledge of how to do things in their corner of the system. You needed to acquire the tribal knowledge before you could start to work in that region of code.
- Build times were atrocious! Sometimes folks would come up with a way to only build a part of the application and that would be considered innovation. "hey, 15 minute build instead of 1h!"
- Our QA's were STRESSED
Today I am convinced that the product was several applications masquerading as one. I am not willing to subject myself to that again.
Modular Monoliths would be a healthy middle-ground between monoliths and micro-services.
My comment on the same topic of modular monoliths is here: https://news.ycombinator.com/item?id=21853902
I'm happy to see this article on the front page; it certainly resonates with my perspective. However, I'm not sure it's really an either-or situation.
One thing I've noticed is that big tech is taking advantage of giant mono-repos, while everyone else is stuck with 10s-100s of git repositories haphazardly connected and managed. For example - most off-the-shelf CI systems and VCS platforms smaller organizations are using are per-repository (GH, GH Issues, CircleCI, etc).
Managing micro-services would be a far easier task when all of the services (and infrastructure as code) live in the same repository, changes can be staged across multiple services at once, and tests are automatically run for only the necessary dependencies.
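The "tests only for the necessary dependencies" part is basically a graph walk. Here is a minimal sketch, assuming you already have a map of which target depends on which; the `repo` layout and target names are hypothetical, and a real setup would get this graph from its build tool rather than a hand-written dict.

```python
from collections import deque


def affected_targets(depends_on: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return every target that must be retested when `changed` targets change.

    `depends_on[a]` is the set of targets that `a` depends on directly.
    A target is affected if it changed or (transitively) depends on one that did.
    """
    # Invert the graph: for each target, who depends on it?
    dependents: dict[str, set[str]] = {}
    for target, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(target)

    # Breadth-first walk outward from the changed targets.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical monorepo layout.
repo = {
    "lib/auth": set(),
    "lib/money": set(),
    "services/billing": {"lib/auth", "lib/money"},
    "services/search": {"lib/auth"},
    "services/checkout": {"services/billing"},
}
```

With this layout, a change to `lib/money` pulls in `services/billing` and `services/checkout` but leaves `services/search` alone, which is the behaviour a per-repository CI setup can't easily give you.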
Are there solutions for effective mono-repo management outside of FAANG? Am I wrong? :)
Not sure where in history people started to believe that micro-service architecture is a simpler and easier-to-operate architecture. Each service needs the same operation overhead as one monolith.
The big blocker most monoliths face as the application gets bigger and is deployed onto more and more machines is that _releases become a bottleneck_. Scaling monolithic applications is difficult because partial rollout is usually not possible, as "services" are often tightly coupled.
Micro-service architecture forces services behind a set of APIs. While the APIs may have breaking changes, each can be independently deployed. In other words, teams can do releases at their own pace.
The main cost-benefit question here is: how important are independent releases vs the cost of operational overhead?
If you’re choosing to hop the network as a means of decoupling code that’s absurd because you can decouple much more cheaply with PL and build facilities — without introducing a network hop.
But there are problems that justify a service. Decoupling code is just not one of those problems.
I call B.S. He's conflating two things. One is a deployment infrastructure, another is an application pattern. I can deploy a monolith in k8s and I can deploy micro-services on a single server or a fleet of on-prem servers using any of the legacy dev-ops deployment automation tools. Sure k8s makes doing micro-services easier because it’s got a lot of the raw building blocks necessary to handle the CD process but in no way are the two concepts related. They’re tackling different problems. He's making the same mistake that he's criticizing others for by equating the two. What he is really describing is an organization doing a “big bang” refactor with poor planning and execution.
I was trying to find that article someone wrote about how everyone doing microservices should have about 500 databases now. (actually I can't remember fully what that article was) But I was just hoping that whoever wrote it could provide details as it's clear the pendulum is definitely swinging back to monoliths.
Also I feel like all tech goes this way. Years ago to do "big data" you had drill, kafka, HDFS, pick your cloudera or hortonworks, roll up your HBase, your storm, spark - hire a team to install it.
It seems like now all we do is purchase Elastic cloud, and write a one-off spark script or pandas job and call it a f*cking night.
90% couldn't write a good monolith. 90% can't write good micro-services. Both share the same problem of modularity and separation of concerns. One comes with much lower development and operational costs for the competent. The other hides and creates new problems. I've developed on both architectures and there are use cases for each. It's never all or nothing. I can't even imagine how many engineers would be needed now to maintain the solid money making monoliths I've worked on with teams up to 8 over a decade ago. When the money dries up we'll be back on the monolith train.
I wish we could appreciate the middle ground more & stop evangelising extremes. Our work is complex & deserves considered approaches that don't always fall within the current zeitgeist of our industry.
So many problems arise from the idea that "we are going to do this thing to the maximum extreme and exclusion of other possibilities."
Just look at the US legislature right now. Anyway, it doesn't have to be Monoliths vs. Microservices. It can be a compromise. Perhaps the microservices are a bit less segmented than we have been imagining. It might be OK for a microservice to do more than one job. As the highlight shows, the fundamental ingredient is Engineering Discipline. If we strive for that it might work out in a Monolith, Microservice, or somewhere in between.
We have a similar thesis in Dark. People do microservices because they have to (well, some people have to, some people do them cause they're shiny). But in terms of actually understanding your application, microservices really don't help. And they bring so much complexity with them.
Our concept is to allow the decoupling of microservices, with the tooling of a monolith. Kinda hard to describe and we haven't done it yet, but basically give you the ability to write it as a monolith, but also have the separate scalability/deployment of a microservice.
I think the greater takeaway is, "there aren't many technical recommendations that can be made in a broad, sweeping way". Anyone who says differently is just selling you a trend. The right solution is based on the use-case. This is the third or fourth time we've oscillated between monoliths and microservices (under different names, of course). It won't be the last. It happens in every corner of the industry. There is no one-size-fits-all solution to anything, as seductive as that idea is.
I would agree with this article a lot more if it said that most people don't understand the problem microservices are trying to solve, but instead I think it contributes to the confusion.
It's true that a microservice doesn't magically create cleaner code, better designs, or anything like that. It can actually make all those things harder. Designing good remote APIs is hard, maintaining consistent code quality over lots of different codebases is hard.
All a microservice does is give you a way to independently release the code that lives behind a small chunk of your larger API (e.g. http://apis.uber-for-cats/v2/litter-boxes). This is why a good API gateway that's built for microservices is one of the first tools you actually need, and can get you surprisingly far.
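To make the gateway point concrete, here is a minimal sketch of the routing decision such a gateway makes, assuming a simple longest-prefix rule. The backend hostnames and the `pick_backend` helper are made up for illustration; real gateways have their own configuration formats.

```python
# Hypothetical routing table: one small chunk of the API is released
# independently, everything else under /v2 still goes to the monolith.
ROUTES = {
    "/v2/litter-boxes": "http://litter-boxes.internal:8080",
    "/v2": "http://monolith.internal:8080",
}


def pick_backend(path: str, routes: dict[str, str]) -> str:
    """Return the backend for a request path; the longest matching prefix wins."""
    # Match whole path segments only, so "/v2/litter-boxes-archive"
    # does not match the "/v2/litter-boxes" route.
    matches = [p for p in routes if path == p or path.startswith(p + "/")]
    if not matches:
        raise LookupError(f"no route for {path}")
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    print(pick_backend("/v2/litter-boxes/42", ROUTES))
    print(pick_backend("/v2/cats", ROUTES))
```

Moving another chunk of the API behind its own release cycle is then one more entry in the table, and callers never see the change.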
It turns out that despite the complexity, this is an enormously valuable capability in a lot of different situations. Say you have a monolith that you can only release once every six months and you urgently need to get a new feature out the door. Or maybe half your code can't change very fast because it's mission critical for millions of users, but the other half wants to change really fast because you're trying to expand your product.
Of course the big bang refactor into microservices that he describes isn't really going to help you in any of these situations, but then again big bang refactors don't tend to help in much of any situation regardless of whether microservices are involved. ;-)
That is a rather pessimistic view... While it's true that microservices aren't the solution to all problems, they simply extend the separation of concerns into infrastructure, which is practically a perpetuation of good software design. (just like if you do it right, you should have loose coupling and high cohesion)
Microservices make a lot of sense when you release often, run multiple versions, have a lot of people working on independent components or have a lot of different scalability needs. A monolith can only scale in its entirety and often only vertically. That means that even if just one component cannot be locally optimised, the whole application has to scale up.
If you only have a single application or task to build software for (i.e. a CRUD system for a CMS) then it makes no sense to split that out. Just like it makes no sense to build your own crypto, do your own CRM, do your own RDBMS, or do your own filesystem for that matter. That would just be adding overhead and engineering complexity where none is required.
While bad engineering will be bad engineering no matter how it's engineered, that doesn't make a whole pattern bad just because a lot of people apply it wrong. Goes for microservices as well as monoliths. (and XaaS)
I dunno if I'm taking crazy pills, but it seems like the organizational/deployment-level concerns supposedly solved by microservices are much better solved by continuous deployment, and it's weird to me that this seems to be such a contrarian viewpoint.
Continuous deployment alleviates merge/coordination issues by integrating small changes frequently, which makes conflicts rare. Deploys are safer, again because you're deploying small changes often. And if something bad does go out, you can "roll forward" instead of rolling back, by reverting the bad commit. This is less harmful to velocity, because it doesn't require rolling back the other good commits in the deploy along with the bad ones.
I have less experience with microservices than with continuous deployment, but they seem to bring a lot of problems. Microservices take the fixed costs of deploying an application and multiply them by the number of services. Instead of centralizing one team to update dependencies and infrastructure for the whole application, every team has to spend 10-20% of their time doing that work. In the monolith case, everyone on the engineering team is familiar with the single codebase and architecture. But in microservices land, there are often more microservices than engineers. So when an engineer leaves, they pass off a whole pile of code, infrastructure, and architecture patterns that almost no one has any familiarity with. I do think you could avoid these problems, but overall microservices seem very high risk for little reward.
The one case I really see for services is when you have tasks with different load characteristics. But in that case, you can still have N monoliths (for small N), rather than the massive proliferation of microservices.
I think it is about autonomy and planning as well. You own your API, you plan your features and your backlog will be filled and you will deliver it. You will sunset it and when you on-board new people into the team they only need to learn your part.
This requires quite a big application though for it to be worth it in my opinion.
You might always want to break up a monolith, but there is indeed little reason to do it with micro services. You could just use modules. Or better, break it into a number of libraries with well-defined interfaces, which you then compose into one monolith binary.
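As a rough illustration of the "libraries with well-defined interfaces, composed into one binary" idea, here is a minimal sketch. All names (`UserDirectory`, `Billing`, `build_app`) are invented for the example, and the two "libraries" live in one file only to keep it self-contained.

```python
from dataclasses import dataclass
from typing import Protocol


class UserDirectory(Protocol):
    """The interface the billing library depends on (hypothetical)."""

    def email_for(self, user_id: int) -> str: ...


class InMemoryUsers:
    """'users' library: one concrete implementation of UserDirectory."""

    def __init__(self, emails: dict[int, str]) -> None:
        self._emails = emails

    def email_for(self, user_id: int) -> str:
        return self._emails[user_id]


@dataclass
class Billing:
    """'billing' library: knows only the interface, not the implementation."""

    users: UserDirectory

    def invoice_header(self, user_id: int, amount_cents: int) -> str:
        email = self.users.email_for(user_id)
        return f"To: {email} | Due: {amount_cents / 100:.2f}"


def build_app() -> Billing:
    """Composition root: the only place where the libraries are wired together."""
    return Billing(users=InMemoryUsers({1: "ada@example.com"}))


if __name__ == "__main__":
    print(build_app().invoice_header(1, 1250))
```

Everything still ships as one process; the boundary is a type, not a network hop.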
But there are very good reasons to split out some code into services (which might or might not be micro services, just not in the same process).
One is that it more easily allows you to use more than one programming language. Normally you should avoid that, but there are sometimes reasons for it, for example if 80% of what you need is implemented in a library available in that language.
Another one is that you can have different reliability constraints for different parts of the system. (Like the number of instances handling load in parallel.)
Another one is reuse between different systems (e.g. sharing of user management by e.g. using OpenId Connect).
Another one is that you can upgrade part of the system without stopping other parts.
....(a bunch more)
So in the end I would break it into parts and compose those parts into a number of services, but I would not bother with the whole "micro" part and other cloud marketing bs (because that's what it degraded to).
Clear boundaries and contracts are just a side effect of working in a service architecture, I've found. If this is your reason for doing it - just define a style guide on how to isolate code and make sure people (or procedures) keep it.
But there are many other valid reasons for services - different deployment cycles, better resource utilization, faster and safe deploys, etc. It's just about using the right tool and thinking about implications.
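One way to let a "procedure" keep the isolation rules inside a single codebase is a small import check run in CI. The sketch below assumes a convention that is not from the comment above: each internal package exposes a `facade` submodule and everything else in it is private. The function name and the package names are hypothetical.

```python
import ast


def facade_violations(source: str, own_package: str, internal: set[str]) -> list[str]:
    """List imports in `source` that reach past another package's facade.

    Assumed convention: code in `own_package` may import `other.facade...`
    but nothing else under `other.` for any `other` in `internal`.
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            if node.level > 0 or not node.module:
                continue  # relative import: stays inside the own package
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        elif isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        else:
            continue
        for name in names:
            parts = name.split(".")
            foreign = parts[0] in internal and parts[0] != own_package
            if foreign and len(parts) > 1 and parts[1] != "facade":
                violations.append(name)
    return violations


if __name__ == "__main__":
    billing_code = (
        "from orders.facade import place_order\n"
        "from orders.entities import Order\n"
        "import inventory.db\n"
        "import os.path\n"
    )
    print(facade_violations(billing_code, "billing", {"orders", "inventory", "billing"}))
```

This only catches static imports. A bare `import orders` followed by attribute access slips through, so it is a guard rail rather than the hard boundary a separate process gives you.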
I don't know if such a framework exists, but I really want a system that abstracts this to a certain degree - while the contracts between parts of the system are defined, whether any module works as a service with its own deployment policy over network or as part of a monolith is not expressed in application code but as a configuration, and code generation handles the underlying logic. So you can write your app as a modular monolith, but when you think that for operational reasons there is a reason to spin off some part of it as a service, you reconfigure your build rules instead of your code.
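I don't know of a framework that does exactly this either, but the shape of the idea can be sketched by hand. In the sketch below the "generated" remote stub is written manually, the transport is a plain function instead of a real network call, and every name (`Pricing`, `make_pricing`, the `"pricing"` config key) is made up for illustration.

```python
import json
from typing import Callable, Protocol


class Pricing(Protocol):
    """The contract; application code depends only on this."""

    def quote(self, sku: str, qty: int) -> int: ...


class LocalPricing:
    """Runs in-process as part of the monolith."""

    PRICES = {"widget": 250}  # price in cents, sample data

    def quote(self, sku: str, qty: int) -> int:
        return self.PRICES[sku] * qty


class RemotePricing:
    """Same contract, but forwards the call over a transport (e.g. HTTP)."""

    def __init__(self, send: Callable[[bytes], bytes]) -> None:
        self._send = send

    def quote(self, sku: str, qty: int) -> int:
        reply = self._send(json.dumps({"sku": sku, "qty": qty}).encode())
        return json.loads(reply)["total"]


def make_pricing(config: dict, send: Callable[[bytes], bytes]) -> Pricing:
    """Pick the deployment mode from configuration, not from calling code."""
    if config.get("pricing") == "remote":
        return RemotePricing(send)
    return LocalPricing()


def fake_server(payload: bytes) -> bytes:
    """Stand-in for the network hop: the 'service' side reuses LocalPricing."""
    request = json.loads(payload)
    total = LocalPricing().quote(request["sku"], request["qty"])
    return json.dumps({"total": total}).encode()


if __name__ == "__main__":
    for mode in ("local", "remote"):
        pricing = make_pricing({"pricing": mode}, fake_server)
        print(mode, pricing.quote("widget", 3))
```

The caller's code is identical in both modes. What this sketch hides is exactly the hard part: once the remote mode is real, `quote` can time out or fail halfway, and the contract has to say what happens then.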
All designs have trade-offs. When trade-offs appear, you either accept them or mitigate them...
If it's important to know how many blue widgets are bought at night in Europe, vs. how many blue watcha-ma-call-its are bought in the evening in the US, and your location, orders and product data are in separate micro-services, you are kinda out of luck.
And, as mentioned by others, replication and API wrappers on micro-services suck for reporting.
If you built an eventing system, you'd be better off tapping into that to update the central reporting data store (warehouse/lake/etc.) I've used this myself to "good" effect. (some chance of failures, a little behind the times, etc.)
The central database may be "monolithic" in nature, but at least you'd be able to report on the data. If you expect to modify data in the feeding micro-service's databases, then yes, you do have a monolith. But, if it's "just" for reporting, it's like a dynamically-updated replica of the pertinent data for your reporting.
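For what it's worth, the "tap the event stream into a reporting store" approach can be sketched in a few lines. This is only an illustration under assumptions: the event fields, the `order_facts` table, and the definition of "night" as 22:00 to 06:00 are all invented, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3


def open_reporting_store() -> sqlite3.Connection:
    """Create the denormalized, read-only-for-reporting table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE order_facts ("
        " event_id TEXT PRIMARY KEY,"
        " product TEXT, region TEXT, hour INTEGER, qty INTEGER)"
    )
    return db


def apply_event(db: sqlite3.Connection, event: dict) -> None:
    """Fold one event into the store; other event types are ignored."""
    if event["type"] != "order_placed":
        return
    # INSERT OR IGNORE keyed on the event id makes redelivery harmless,
    # which matters because event buses usually deliver at least once.
    db.execute(
        "INSERT OR IGNORE INTO order_facts VALUES (?, ?, ?, ?, ?)",
        (event["id"], event["product"], event["region"], event["hour"], event["qty"]),
    )


def night_sales(db: sqlite3.Connection, product: str, region: str) -> int:
    """Units sold between 22:00 and 06:00 (assumed definition of 'night')."""
    row = db.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM order_facts"
        " WHERE product = ? AND region = ? AND (hour >= 22 OR hour < 6)",
        (product, region),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    store = open_reporting_store()
    order = {"type": "order_placed", "id": "e1", "product": "blue widget",
             "region": "EU", "hour": 23, "qty": 2}
    for ev in (order, order, {"type": "user_renamed", "id": "e2"}):
        apply_event(store, ev)
    print(night_sales(store, "blue widget", "EU"))
```

The store is only ever written by the consumer, so it stays a replica for reporting rather than a second source of truth.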
“We’re gonna break it up and somehow find the engineering discipline we never had in the first place.”
This, in a single sentence, captures all of my misgivings/discomfort/etc with the mad rush to micro-services in my organization. To wit... if we had the operational maturity to really effectively take on micro-services, we wouldn't need to rush into it.
That spiral thing just keeps spinning and spinning. Of course a monolith is an easier thing to work with now, after we figured out how to write big multithreaded services in a half-reliable way. We also have large boxes that do not fail too often and do not cost more than 10 smaller boxes orchestrated with all the crap you need to run microservices-based systems.
It was only ever useful for massive deployments that only massive systems like Facebook and others needed. However, their engineering teams dominated the discourse and others followed, pretending as if they too had the same requirements even though their engineering teams were small and their systems far simpler.
The way I look at it a microservices architecture is going to be the equivalent of a monolith with a number of unreliable network connections, being held together at the devops layer. You are pushing complexity from the application layer to the devops layer. Personally I don't see that as an advantage in itself.
Distributed monoliths (or micro services) do have some advantages:
1. Easier for users to see "who owns what" (albeit a module pattern could fix this as well).
2. Different hardware resources or scaling for different parts of a monolith really isn't possible. If one module requires 16GB, then every time you scale horizontally you must have at least 16GB; you're at the mercy of your worst module in the monolith.
3. Deploys are very difficult, and as you scale to over 10 developers increasingly becomes difficult to push up (it takes one persons bad commit to hold everyone in the organization from deploying).
4. Security boundaries are easier to define, each "module" in a monolith effectively has access to all resources for all modules.
5. Poly-languages are easier, albeit depending on the base language, you could do a lot of transpiling on a monolith but.. ew.
6. HTTP status codes and request paths can give you a clear view of how calls are happening in your system; in a monolith you'll only get stack traces generally on errors, not on successes, usually you need to invest more in static analysis and APM stuff for a monolith.
7. Microservices can be cheaper when you scale, you don't have the GCD of memory/CPU/disk requirements as you do in a monolith.
8. GCD of implementation details, if one request requires a sticky session, all of your requests require sticky sessions...
9. More complex and long builds, most monoliths have component-based hot reloads, but even those can take 30s to a minute in my experience, and a full build, that's at least 20.
10. Harder to unit test, this can vary by language but without clear boundaries and resource definitions monoliths can be very tricky to unit test, microservices/distributed monoliths are inherently smaller with clearly declared resources so it becomes easier to find where and how data flows in them.
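Points 2 and 7 are easy to check with back-of-the-envelope arithmetic. The sketch below uses a deliberately crude model: a monolith replica must hold every module and the replica count is driven by the busiest module, while services scale one by one. The module names and numbers are invented, and the model ignores the fixed per-service overhead (runtime, sidecars, extra load balancers) that makes microservices more expensive at small scale.

```python
# name -> (memory needed per replica in GB, replicas needed for its load)
Modules = dict[str, tuple[int, int]]


def monolith_memory_gb(modules: Modules) -> int:
    """Every replica carries all modules; the busiest module sets the count."""
    per_replica = sum(mem for mem, _ in modules.values())
    replicas = max(count for _, count in modules.values())
    return per_replica * replicas


def services_memory_gb(modules: Modules) -> int:
    """Each module is its own service and scales on its own."""
    return sum(mem * count for mem, count in modules.values())


if __name__ == "__main__":
    example = {"search": (16, 2), "checkout": (1, 10), "profile": (1, 3)}
    print("monolith:", monolith_memory_gb(example), "GB")   # 18 GB x 10 replicas
    print("services:", services_memory_gb(example), "GB")   # 32 + 10 + 3
```

With these made-up numbers the monolith needs 180 GB against 45 GB for separate services. With a single replica of everything the two totals are equal, and real per-service overhead would then tip the balance toward the monolith.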
>3. Deploys are very difficult, and as you scale to over 10 developers increasingly becomes difficult to push up (it takes one persons bad commit to hold everyone in the organization from deploying).
I feel that is way too low a number. If you don't have good enough engineering practices with branching, pull requests/code reviews, unit/integration tests to handle 10 people then microservices will be painful as well.
I would say like 3+ teams at least to really justify it? You can do it earlier but I don't see it as a necessary benefit.
>7. Microservices can be cheaper when you scale, you don't have the GCD of memory/CPU/disk requirements as you do in a monolith.
Yes, but by default they are more expensive until you reach a certain scale and it needs to be a specific type of scaling.
These are good points, I agree. While I generally don’t think the microservices I’ve worked on have been designed well, they’ve still offered advantages over the monoliths that I’ve worked on (and yep, you guessed it, I also didn’t think the monoliths were designed well either).
One more thing I really appreciate:
No more massive juggling acts to upgrade the language or libraries. I’ve spent way too much time having to worry about how to upgrade a monolith to the next major version or three of .NET and C++, worrying about major-version incompatibilities in libraries etc.
With smaller services you will have to fight this battle many times, but each battle will be manageable and lower risk.
The title mentions that Monolith is the future, but fails to explain why. Either way, let's focus on two items:
> We’re gonna break it up and somehow find the engineering discipline we never had in the first place.
Indeed. I worked in a company that had a monolith, but the project was structured in modules. Every module had a Facade, which was the official way of communicating. Although in practice you could access other modules' entities, you weren't allowed to do that. As you can imagine, this rule was broken many, many times. Developers would look at the entity, see the data they wanted was there, and ignore the facade right away, plain and simple.
If you split your project into separate services, and those services aren't in the same runtime application, there is no way to break this rule anymore. The team that didn't follow the rules has no other choice, it has to go through the APIs. Even better, they won't design the API themselves most of the time. Whoever maintains the service will want it to be cohesive, and will not care that much about the other team's need for an urgent fix. Putting in workarounds becomes way harder, and this change alone improves design a lot.
The second point is that anyone who shared the same service/application with another team probably faced the situation where they couldn't deploy (or were too afraid to do so) because the other team pushed a lot of new code to master. You suddenly don't know if the deploy will break everything or not. Before you know it, you're spending a lot of time coordinating with many people about whether you can deploy it or not. Something that should be in production in a few minutes sometimes gets delayed for days.
Of course microservices are not a silver bullet, and there are teams that will benefit a lot from a monolith. With that said, I find it hard to believe that monoliths will come back in companies where the development team grew to be more than a few developers, because the trade-offs are not worth it.
I'm living this nightmare now. The service I work on now is split up into a score of micro-services. Information about a single object is splattered across several services, databases, and caches. Little thought given to consistency, less to how these different bits of information will be combined in ways they obviously need to be. Coordination problems everywhere, creating bugs and sapping performance. Global resource/load control is practically impossible, as is rigorous testing.
Some of these are general distributed-system problems. Some would be less severe with a better but still microservice-based architecture. But in practice the microservice message that a lot of people get is that you should make every trivial bit of functionality its own service, and that road leads to disaster.
The truth in this is that these things just depend heavily on both culture (e.g. communication patterns, "This is infra team's problem", etc) and technology choice limitations (e.g. dependencies, api boundary safety, general safety, etc). And probably a ton of other things.
The underlying technology is a bigger deal than people give it credit I think. I've written frameworks and complex applications at prior workplaces to try to manage microservices well. Now (cloudsynth), we use go + grpc + typescript and everything feels like it can be isolated/sharded if and when it needs to. Golang and webpack have great tooling for splitting things off, isolating dependencies, etc.
Sometimes you don't have to live in the bimodal world of MicroServices vs Monolith.
1, back-end services with clear boundaries that decouple concerns based on dev teams' domain responsibilities, with fewer dependencies on each other and a respected source of record. This is very much what the "micro-service" is for.
2, middle tier services to consolidate or aggregate back-end APIs to serve the front-ends (especially the mobile apps) and take care of the business logic. Back-end guys all love micro-services, but someone must put them all together... GraphQL so far seems to fit this bill.
3, analytics and reporting: this is a totally different animal from product development, and has almost opposite requirements. This is where whatever your ETL or Data Lake or Data Pipeline is gets used, along with your preferred BI or analytics tooling.
Do not really agree. Monoliths aren't the answer for bad engineering practices any more than micro-services are.
The fundamental problem a lot of companies have, especially Fortune 500 type legacy shops, is that they haven't accepted (at the C-suite level) that they need to become tech companies to compete with the startups that are eating their lunch.
Switching to micro-services to try and deploy more features while starving your development teams of talent and funding won't make you a tech company. If you want faster development + more features then you need lots of development teams, and large development teams means micro-services so that you don't have a slow to change, interdependent mess after a year or two.
This blog post and the top response is an example of an echo chamber.
Microservices, when done right (driven by well defined bounded contexts) are simpler to develop and iterate against; but that's not why we do Microservices!
You should not do Microservices without considerable experience in authoring integration tests, a clear understanding of the domain, observability tools, and a team that can handle debugging a distributed system.
Bonus: You do not need a distributed system if you are working out of a single data center.
You should not do Microservices if you think they're cool. You should not title your blog post claiming Monoliths are the future. If your future has a horizon of never scaling out then yes I guess they are ...
I had trouble making the jump here from SOA to k8s. Those are orthogonal things. What a monolith solves and what microservices solve are also orthogonal things.
Many companies move to microservices so that they can evolve different parts of their platform at different rates, and invest differently in different business domains and product applications. Attracting talent for a problem in higher demand is one example of the lever you can pull, but so is writing a part of the application in R for data science or Java for stream processing, and hiring from a richer or different talent pool as a result.
Yet another extremely vague Hacker News article building controversy due to its failure to communicate anything tangible. Is this article talking about monolith deployment architectures or monolith code bases?
If it's about monolith deployment architectures, the use case is really important.
If it's about monolith code bases, you need to define what a monolith code base even means, because that could mean anything. Are we talking storage size, custom written code lines, framework architecture, or just the number of people and/or teams building the underlying technology?
It depends a lot on the app you're building. If you are a startup, it definitely is much easier to build a monolith and focus on the product features. Micro-services may not have that much upfront cost of building, but as your product grows they require a lot of engineering effort and budget to maintain. You can have an engineering team of 20 people maintaining a monolith which can serve a few million customers. The same product broken into micro-services will require 4-5 teams of 8 people. It is much easier to hire for a single skillset and grow the team.
Microservices are only useful when your engineering org is too large and you need to enable the engineering process to scale. If you're at the point where you can't deploy code because the engineering team is stepping on each other, that's the point where microservices come in handy.
But it also requires a heavy investment in dev-ops and on-call issues. Because when one small thing fails, it becomes catastrophic in ways you can't imagine. So there's a huge tradeoff between engineering convenience and actual customer impact and uptime risk.
You know, at some point, I get tired of these rehashed ideas. This isn't a new thought, this isn't a unique perspective. Yes, we fully realize that you shouldn't jump on the newest fad just because you want to. Microservices aren't a silver bullet. All the things that have been talked about for years/decades now.
Monoliths aren't the future. They never left. Rather, they are still an option, along with microservices.
Blindly adopting anything is silly and error prone.
It gets tiring hearing the same advice preached every few years about a new technology.
Perhaps there is a problem where people are splitting a perfectly good monolith into microservices, but I do wonder how you deal with large scale machine learning without microservices. I am a rather small operation and I still have models taking several gigabytes' worth of memory and ANN indexes of about the same size, which clearly couldn't operate in a monolith unless it was a massive machine, and even if it could, not every request would necessitate such power. How does a dogmatic monolith approach solve these problems?
Is there any problem with separating the ML into a different service/machine, and everything else together? Then you can treat the ML in the same way you treat an external service, or your DB or Redis (if external). While no longer a pure monolith that certainly doesn't qualify as a microservices architecture.
Note: no idea what I'm talking about, I'm genuinely curious if that's a valid solution.
You could do that, but then I wonder if that isn't going down the road of microservices? The ANN services for example would need to interface to the database if you want some kind of real time ANN service.
Each to their own. I've worked with both, success with micro-services is largely a function of the organisation. My current company started transitioning 18 months ago, I wouldn't say it was easy but we are in a far better position now than we were with our monolithic architecture. Our success is largely due to the company understanding that big changes would be necessary - things like org charts, engineering culture, engineer responsibilities and job descriptions (you would be forgiven for thinking we only hire SREs).
My experience is that devs, devops and admins are striving for better and more robust operations and software development workflows that in the end will help out the company in multiple ways. So they read what's new and learn and adopt new approaches and tools. But the BI department just wants to keep working with Excel because they don't know and don't want to spend any effort learning anything new. So now you have a conflict. Scary microservices vs good old comfy db select queries and Excel.
As someone in the middle of destroying a monolith I hope the title is not true. Of course distributed-yet-still-deeply-coupled systems are possible, but they are at least harder to create.
For me though the most important thing is grokability. Our monolith is at a point where literally no one on earth can understand the whole thing.
Even if the system is complex, the individual deployables being fully understood by some number of engineers is extremely valuable and drastically reduces the search space for the cases where things don’t go as planned.
Introducing boundaries between problem domains should drastically reduce cross-cutting concerns. It also makes issues easier to find.
Not that this isn't possible in a "well engineered" monolithic system, but design constraints are usually better than hoping for engineering discipline.
I personally quite like the "double monolith", one for the website and one for the api, but I think there are some really large deployments like AWS for instance, where using microservices makes a lot of sense, but it's tricky to pull off operationally because as the organisation changes shape, the code base needs to drastically change too, and probably a lot of teams just can't keep up, monoliths are more rigid but they don't crumble as easily.
Microservices can work, and fwiw, here are 3 things I've learned the hard way:
1. Use a package manager, and export interfaces to a given service in each of the consumers. It's great that GitHub now offers it for node.
2. Create a DAL library for I/O to a given database (e.g. PostgreSQL, or Mongo) that can be consumed by other services.
3. Enforce styling company-wide with one source of truth. We use GitHub Actions, so we can enforce styling with a shared GitHub Action.
Monoliths are "part of the future". Data shows that there is dissatisfaction with using both monolithic and microservices architectures at the same time. In 2017, 66% were using a hybrid approach, but only 54% in 2019. See the interactive graphic at the bottom of https://thenewstack.io/observability-and-elk/.
Monoliths and Microservices are technical expressions of organizational structure. Talking about them in only a technical sense misses the forest for the trees, IMO.
Completely agree and actually think this is the root of the problem.
People look at what really big successful companies are doing and draw inspiration from that. Problems come when they mix up cause and effect, and then view things through the wrong lens.
As you say, microservices are a mostly organisational, partly technical, effect of having to scale a huge technical org. But then when taking this end state and viewing it purely through a technical lens (as, naturally, technical people are wont to do) it's rather easy to convince yourself that it's actually the cause of this huge technical org's success.
Yes, I had a discussion with a very senior person (but not a developer) about why he thought we must enforce microservices in our applications from a technical perspective. He thought it was incredibly important to be able to scale parts of it independently, and it is a normal internal CRUD application with a couple of thousand users.
Of course he looks at Netflix and Google as best practices but that discussion probably spent more money in salary than our server costs.
I feel a lot of the excitement for microservices came from fatigue and frustration with overly complex monoliths. Turns out that the complexity was just brought out of microservices, one abstraction layer up into the architecture.
I think a lot of the principles in classic OOP design (SOLID) can be applied to microservice systems: Classes/Objects <> Services.
I don't think the microservice vs monolith process is as cut and dry as people make it out to be. Microservices are hard because abstractions are hard. I've found working on abstracting code within a monolith usually results in better microservices. Haven't seen a good process for starting a project with a series of microservices.
We need AI-driven microservices to manage all the versioning, deployment, fault-tolerance, distributed log-debugging mess... Humans should not be forced to write anything complex in microservices. The only good thing is that it brings many engineering jobs and managers can build huge teams, looking great to their superiors.
Code is meant to be thrown away. Not to mention there exists industrial-strength automated code refactoring which will just get cheaper and smarter.
So write code you can deploy today, monolith or microservices; in the not-too-distant future we'll be able to cheaply refactor it at scale into any style you want.
From what I understand the author is against a bad implementation of microservices. I don't see a good argument against microservices but rather "We just chopped up a monolith and things are not better therefore microservices are bad". Am I missing something?
I can guess that the following pattern will emerge:
"Develop like microservice but deploy like a monolith"
So a container will host a group of services rather than a single service. Something similar will happen for databases, where different databases will live on the same host.
If you can’t write well functioning monoliths, then you most definitely will fail implementing the same system using micro-services. Micro-services have all the complexity of the monolith plus the complexity of a distributed system added on top.
The adoption of a new technology can be an occasion to reorganize, reassign, and create new effective teams in the organization. It may allow the solution of people problems whether or not it actually addresses any technical problems.
You move to microservices now you are in the land of distributed systems so unless you are at a scale that leaves you no choice be very aware of this. You are multiplying edge cases by a factor of 100 in many cases.
This is like saying microservices are the future. Both are incorrect. Of course this is on a case-by-case basis, and one should keep an open mind and know the pros and cons of both. Apply either one when needed.
Eh, I think 5 years from now you'll be able to deploy and manage a distributed architecture like a monolith. There is so much energy in this space and the problems are being solved; we're just not there yet.
The problem is not that micro-services/monoliths/serverless are good or bad, it's people picking something for the wrong reasons. Everything has pros and cons.
Totally agree with this article. Split the application when necessary. But creating tens or hundreds of micro-services is a maintenance nightmare as well as a runtime nightmare.
FTA “You know what we should do? We should break it up. We’re gonna break it up and somehow find the engineering discipline we never had in the first place.”
Both arguments are wrong. There is no substitute for engineering discipline and no paradigm will save you from a lack of it.
I care not whether you build microservices or monoliths, but please sir, do not blame the paradigm when your team can't do anything right.
Depends on how big that monolith gets. Microservices can definitely be overkill, but at some point it's a real liability to not have well-isolated abstractions. Yes, you can do that within a monolith, but the temptation to break isolation is just greater when it's all in the same folder/repo.
Obligatory mention that services are mostly an enabler for organizational scaling and, in my opinion, shouldn't generally be something considered for only technical reasons. Having hundreds of engineers working on a monolith is at least as challenging as having ten engineers working on ten separate services.
There is also a clear distinction, in my mind, between microservices philosophy and 'macroservices', as I call it. Buying into a system with more services running than engineers is very different than having a number of teams, each working on their own single or handful of services.
I would argue that the organizational scaling gained from microservices hits diminishing returns somewhere between a single service (monolith) and more services than engineers (microservices).
The fact this even reached the Hacker News front page surprises me. Maybe because the writer is "famous"? Pure sensationalist title with very poor, marketing-driven content.
Yep. I call this resume oriented development, which I think is in the Gotime podcast he links at the bottom here. Martin Fowler wrote a good article about the same thing https://www.martinfowler.com/bliki/MonolithFirst.html
tl;dr: If you don't understand the problem domain, build a monolith following sensible engineering principles to get going ASAP and then split it out when you understand where the functional lines actually are.
Microservices are really about engineering around things. They make your team's stuff resistant to everyone else deploying things that break stuff. It's turning development into a weird biological-system hybrid that I've written about before:
I agree with the author on a lot of points. You shouldn't start out with bricks. You should build the house first, and once you get that figured out, only then should you turn the individual rooms into modular building blocks.
Perhaps the solution is a single server where one can drop a zip file containing their microservice and have it automatically deploy without the hassle of setting up a new pod/server/container. Congratulations, you have reinvented Java servlets 1.0, circa 1996.
My prediction: I think soon we will see public clouds provide WebAssembly hosting, where you upload a single WebAssembly binary and it just works. Your whole application is compiled and bundled to that WebAssembly. There will even be glue code that helps migration from a microservices architecture by stitching them together in parallel containers, all compiled down to WebAssembly of course.
Microservices pretty much happened because of managers. Managers are promoted to high-level jobs such as Director or VP based on their number of direct reports. That won't happen with a small team of devs. So they need an infrastructure team, a devops team, an SRE team, a QA team, a microservices core team, an access management team, a network engineering team, an AWS integration team, and the list goes on. What was once a four-person team is now a 50-person team costing 10 million per year, but, hey, the guy/girl gets their VP promotion for setting all this up.
I don't know why you were downvoted, the importance of projects (and their leaders) is measured on the team sizes and budgets. So increasing costs makes the project and its leaders more important and valued.
It's essentially the same argument the article is making: managers want to spend and hire. But there are lots of managers at work right now reading HN, so downvotes.
This is how military promotions work for officers. No one wants to work on smaller projects because performance is measured by "Led X number of troops". I see it as a contractor.
I think I need to explain this joke. I don't mean that monoliths are bad engineering, just that, as the article suggests, microservices don't prevent bad engineering. And in many regards bad engineering is inevitable without the right culture.