Take the Orders/Customers example of the article. Assume the Customers microservice has some notion of identity, and that the Orders microservice contains a "foreign" reference to Customers identity. If the microservices are truly independent and autonomous, then the Customer microservice should be able to change its scheme for Customer identities, or delete a Customer. But then what happens to the Orders that reference that now old and outdated Customer identity? If changing the Customer microservice breaks the Orders microservice, then what's the point of the separate encapsulation? Traditional shared database systems have ways to deal with this kind of referential integrity. As far as I can tell, this issue is ignored by microservice architectures. As is the larger issue of cross-microservice constraints (of which referential integrity is just one instance).
The thing that really gets my goat is the lack of cross-microservice transaction coordination. In its place, we get all sorts of hand-waving about "eventual consistency" and "compensating transactions". Growl. Eventual consistency has a meaning that's related to distributed/replicated storage and the CAP theorem. Maybe its principles can apply to microservices, but the onus is on the microservice proponents to actually connect those dots. And most importantly, eventual consistency is implemented and supported by the DBMS, not by application developers. The thought that each microservice will independently implement the transactional guarantees of consistency and isolation should fill everyone with dread. That job should belong to the overarching system, not to individual microservices. It's too hard to get right.
So for now, a shared database in microservices is not an anti-pattern. It may be the only workable pattern. When microservice frameworks grow up and offer the capabilities of shared database systems to microservice developers, then we can talk about anti-patterns.
So, in this customers/orders example, what if you delete a customer? Ideally you don't; you keep the customer as long as you need the orders. You could perhaps anonymize it. And you define what happens if you cannot find the customer. Perhaps the orders should just be deleted? You could have a service running nightly that prunes orders from customers that no longer exist. A cleanup service that checks for external dependencies and prunes them as needed (note: this can be rather dangerous if done wrong).
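A minimal sketch of what that nightly prune could look like (Python; record shapes are invented for illustration — a real job would read both sides from the services' APIs, batch the deletes, and log everything it removes, since silent pruning is the dangerous part):

```python
def prune_orphaned_orders(orders, existing_customer_ids):
    """Split orders into (kept, pruned), pruning those whose customer
    no longer exists.

    `orders` is a list of dicts with a `customer_id` key;
    `existing_customer_ids` is the current set of valid customer ids.
    """
    kept, pruned = [], []
    for order in orders:
        if order["customer_id"] in existing_customer_ids:
            kept.append(order)
        else:
            pruned.append(order)
    return kept, pruned
```

A real cleanup service would also want a grace period, so an order created just before its customer's creation event arrives isn't pruned by mistake.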
For people that have formal education in distributed systems, or work with them, this seems very familiar, because it's some of the same things that make distributed systems hard. And that is what a microservice architecture is - a distributed system, which is why it scales well when done right.
This is also a problem we've mostly solved with CQRS and event sourcing, but it requires a lot of manual work and orchestration. It's hard and I can only say - you probably do not need a microservice architecture, and if you do, hire people with real experience in distributed system architecture. They're expensive, and if you can't afford it you do not need microservices.
It could also be a possibility that we need to delete customer records per GDPR, but need to keep the order records due to other laws, perhaps in an anonymized form.
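As a hedged sketch of that anonymized form: keep the fields other laws require (amounts, dates) and replace the customer reference with a one-way hash, so orders from the same deleted customer can still be grouped but not identified. Field names here are invented for illustration:

```python
import hashlib

# Illustrative list of PII fields to strip from an order record.
PII_FIELDS = ("customer_name", "email", "shipping_address")

def anonymize_order(order):
    """Return a copy of `order` with PII removed and the customer id
    replaced by a one-way hash (deterministic, so grouping still works)."""
    result = {k: v for k, v in order.items() if k not in PII_FIELDS}
    digest = hashlib.sha256(str(order["customer_id"]).encode()).hexdigest()
    result["customer_id"] = digest[:16]
    return result
```

Note that a bare hash of a small id space is trivially reversible by brute force; a real implementation would use a keyed hash (e.g. HMAC with a secret) or a lookup table that is itself deletable.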
The denormalization and distribution of redundant data is required for it to scale. If you make the order component query the customer component, you haven't solved the problem from the other way around, and suddenly you have a hard coupling where a transient failure in one component automatically fails the other.
It might not be a tradeoff you're willing to make, but then you probably do not need the scaling - at least not along that vector.
First you have to decide what trade-offs you want. Ideally you will not expose any events from a service, but that is not realistic. And since the two services have some degree of connection, let's make the trade-off that we expose events for the creation and deletion of customers, so other systems can keep track of a current list of customers. We utilize an at-least-once delivery mechanism for events.
The orders service would subscribe to the two events and maintain an internal list of currently active customers. You cannot create orders for customers that do not exist, and when a customer is deleted, you can do what you need to do with the orders.
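A minimal sketch of that subscriber (event names and shapes are assumptions, not a real API). With at-least-once delivery the handler must be idempotent, since the same event can arrive twice; a set add/discard is naturally so:

```python
class CustomerDirectory:
    """Orders-service-local view of active customers, maintained from
    CustomerCreated / CustomerDeleted events."""

    def __init__(self):
        self.active = set()

    def handle(self, event):
        # Idempotent: replaying a duplicate event leaves the set unchanged.
        if event["type"] == "CustomerCreated":
            self.active.add(event["customer_id"])
        elif event["type"] == "CustomerDeleted":
            self.active.discard(event["customer_id"])

    def can_create_order(self, customer_id):
        return customer_id in self.active
```

A production version would also persist the last processed event offset, so the directory can be rebuilt after a restart without reprocessing from scratch.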
Part of the promise of microservices is that they're small, modular, independent components that you can connect and combine into higher-level services.
If you need to read information and write information to two or more microservices, then you have transaction issues. If you need to relate information across two microservices, you have referential integrity issues. It just comes with the territory.
At no point will the writes insta-percolate. Instead you’ll push the writes into an event queue, they’ll execute eventually, and when the consuming services eventually update they’ll read the new result.
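The point can be sketched with a toy in-process queue standing in for a durable one (Kafka, SQS, and the like): writers only enqueue, and reads see the new state only after the consumer drains the queue.

```python
from collections import deque

queue = deque()  # stand-in for a durable event queue
view = {}        # the consuming service's local read model

def write(key, value):
    """Writers never touch the read model directly; they only enqueue."""
    queue.append((key, value))

def consume():
    """The consumer applies pending events; only now do reads see them."""
    while queue:
        key, value = queue.popleft()
        view[key] = value
```

Between `write` and `consume` the system is observably stale, which is exactly the window the "insta-percolate" remark is about.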
Sounds to me like the marketing copy for microservices is missing a crucial observation: the reason independent components aren't hard to work with in regular software is partly that everything runs single-threaded (or, if you do end up multithreading, responses are predictable and near real-time) and the environment is reliable and under your control. These conditions essentially mask transactional and integrity issues, which only become apparent as you scale to multiple machines connected over a network.
You might eventually have to break out the microservice entirely and build it out such that the shared data model is fully represented in the API and not the database. That's fine too. That's a lot of work, but you're only closer to that goal when you start with SOA and microservices, not further away. Sure, you can get into trouble if you build a large interconnected monolith that you happen to be running in parts across many servers. I would, uh... advise against that. Try to separate your services in natural ways that won't cause massive headaches.
Basically, there are still some benefits, and this critique, while valid, only points out what you aren't getting for free. It's not actually a negative.
Isn't that what append-only event sourcing and CQRS are designed to solve?
This assertion completely misses the whole point of microservices: have highly specialized services that have a single and very limited responsibility, which clients can query independently and enable systems to be scaled (even horizontally) on their performance bottlenecks alone.
Describing a microservice architecture as a bunch of monoliths is simply missing the whole point.
If one shared database can serve your system well then you don't need microservices. You should build a monolith instead.
The main reason for using microservices is scale: first, the ability to scale development teams, and secondly, the ability to scale the infrastructure. With microservices you get vertical sharding out of the box. Yes, it means dealing with eventual consistency for at least a few use cases. But any system that wants to serve millions of concurrent users needs to deal with eventual consistency, as some sort of vertical or horizontal sharding is necessary at that scale anyway.
You should really build your monolith in a similar fashion to microservices, just leave out the remoting layer.
However, many times it makes sense to just do the join in the database rather than make an n^2 join in application space. This becomes problematic in collection cases, e.g. 100 cases belonging to 70 users: it's a quick left join in the database but a slowdown in application space.
No, there are enough other valid cases, e.g. displaying the last customers and their orders, or some simple "others have also bought". In practice you will need to join a lot spontaneously and cannot just defer that to a data warehouse.
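For comparison, here is roughly what the "customers and their orders" case looks like when the database does the join: one LEFT JOIN and an aggregate, versus a per-customer query or a manual merge in application space. (In-memory SQLite with made-up tables, purely illustrative.)

```python
import sqlite3

# Stand-in for the shared database: customers and their orders together.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 10), (11, 1, 5);
""")

# One query answers "each customer with order count and spend",
# including customers with no orders at all.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), COALESCE(SUM(o.total), 0)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
```

With orders living in a separate service, the same report means fetching customers, fetching orders per customer (or in bulk), and merging in code, with every failure mode of the network in between.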
Except the scaling breaks down, because suddenly everyone is bound by tight APIs, making change very, very hard. The potential for services written in entirely different languages makes reallocating dev power less efficient.
> Secondly ability to scale the infrastructure
Anyone who's run a bunch of microservices in production will tell you this is a lie; it becomes a constant game of whack-a-mole with whichever microservice is the current bottleneck. Not to mention the operational overhead of managing multiple microservices.
> Yes it means dealing with eventual consistency for at least a few usecases
> With microservices you get vertical sharding out of the box
Exactly, nobody is even thinking about it anymore because "Twitter has microservices, Netflix uses MSA".
Microservice architecture comes with a cost, like more complex deployment strategy (there are more dependencies) and overhead in transaction management, not to mention debugging a problem which hits multiple domains/services.
What enables this scale is the encapsulation you get with microservices. You lose this with a shared database. You get a similar problem with multitenancy because "isolated" microservices are impacted by what unrelated services do, so there's a loss of resiliency.
I'm not saying this is necessarily the way to go, I'm just saying that this is my reading of the microservices design pattern.
I can't imagine microservice architecture will ever work in a complex domain without every service having most of its dependent data in its own store.
What would help is if people looked at it not as 'duplication' but rather as 'caching'.
Each microservice maintains its own database, with a model that is tailored to its own domain. This is called a projection, to illustrate that it is really the service's own view on the world, often with redundancy of data.
The service sources just the events it needs, and stores only the data it needs, to maintain this projection. It handles domain object lifecycle events in the way that makes sense for that particular service.
In DDD land this is referred to as bounded context.
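A minimal sketch of such a projection (event names and shapes are invented): the service applies only the events it cares about and stores only the aggregate it needs, here order counts per customer.

```python
class OrderStatsProjection:
    """One service's view of the world, built from a subset of the
    event stream. Everything else in the stream is ignored."""

    INTERESTED_IN = {"OrderPlaced", "OrderCancelled"}

    def __init__(self):
        self.orders_per_customer = {}

    def apply(self, event):
        if event["type"] not in self.INTERESTED_IN:
            return  # not this projection's concern
        cid = event["customer_id"]
        delta = 1 if event["type"] == "OrderPlaced" else -1
        self.orders_per_customer[cid] = (
            self.orders_per_customer.get(cid, 0) + delta)
```

Because the projection is derived entirely from events, it can be dropped and rebuilt by replaying the stream, which is what makes "it's just the service's own view" more than a slogan.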
In this case the breakdown actually seems the same to me as with the common unit testing misunderstanding - breaking down by syntactic form instead of conceptual form.
In that situation, the other services sharing your database is madness. Of course they shouldn't use your database. If they did, it would mean you're responsible for their load; you'd need to balance your needs with their needs. When your database goes down, their service and its clients (that you don't need to know about) would also go down. No good at all. Don't do that.
And if you're not in a situation where different teams are responsible for different microservices, you probably shouldn't be using microservices.
* storing complex data and supporting multiple access patterns efficiently (like joins) is hard and involves trade-offs (e.g. normalisation vs denormalisation, benefit vs overhead of indices).
* data consistency is hard, particularly once you are at a scale that requires distributed storage
* durability and disaster recovery is hard
* failover and availability, particularly for a stateful system, is hard
The right solution is completely application dependent. Usually the best approach is to avoid hard problems to the extent your application allows it, e.g. by accepting relaxed consistency, partitioning the data, or limiting the use cases the system supports. Then you can outsource the hard problems that remain to an existing solution (e.g. a transactional database).
Microservices help address some of those problems, but you need to recognize when they don't (e.g. you need strong consistency across multiple services) and adjust your designs accordingly.
It’s hard to see this until you’ve built domain-driven, microservice-based systems properly.
Monolithic architectures have tended to last for decades in mature industries. I don't think we have a good handle on the lifetime of microservice architectures yet - typically, when they come into being out of necessity rather than fashion, they're part of a startup that hit a major hockey-stick and needed multiple teams working concurrently without stepping on each other, and microservices serve an organizational purpose rather than an architecture purpose.
I will second what other people say: the database tends to stick around. Code may rot, but the data is reused by the next generation.
I so hard agree with this.
The result would be the same features/guarantees you have with current DBMSs, but you get independent deployability, elastic scalability, redundant availability, and other advantages of microservices that are missing with traditional DBMSs.
I'd say it's exactly the other way around: the programming languages and technologies change every five years or so, but the data stays forever, so why couple the data with the technology du jour? I'd rather have my data managed in one place so I can adapt the technology layers that sit on top whenever requirements change, rather than writing joins and transactions in my application code. I'd rather not bother with all the complexity and keep it simple as long as I don't need an Amazon or Netflix scale.
Relational databases are tools to be used where it makes sense. NoSQL tables are tools. Graph databases are tools. Configuration files are tools.
Given your statement, Linux’s entire configuration footprint should be in a relational database.
As any good architect will say in response to a design question, “It depends.”
MSA’s aren’t a panacea either. Event Streams are an interesting development and new patterns are likely to emerge.
But today? I’d focus on reducing complexity with domain-driven design and choosing patterns that enable agility and support separation of concerns. MSA fits very well into that philosophy.
I absolutely agree with that. However I don't like being strawmanned. I never said linux configuration "should be in a relational database". I said I'd like to have my data in one place to reduce complexity and decoupled from the layers above.
You can optimize that away later if necessary.
Relational DBs are really just fantastic at high-performance sorts, merges, and joins. Why do you want to pay the continual ongoing cost of replicating that poorly and inefficiently in your application logic? For each microservice that refers to another in some way?
As to the shared db aspect: while splitting a miniservice into micro ones, I had this situation. All the code was split first, and the db changes, being the most difficult, were deliberately done last, at the point where we didn't expect to revert the choice. We weren't at a performance limit, so that was fine; any schema changes had to be clearly communicated and coordinated. All in all, not too bad. I don't think I'd want to leave it in that state permanently. The point of micro is independence and isolation of changes, and sharing a db leaves a sensitive area. Don't let that stop you if it's only a temporary state. Just get commitment as to how temporary that is, since in absolute terms everything's temporary.
I feel the OP’s pain, but I’ve built MSAs from DDD principles and it really is a better architecture for complex systems.
So, no silver bullet. You need to use something that makes sense, not what is fancy today.
Moreover, the real dependency is from the report to the orders and the users; the article takes for granted that the report is shoehorned into the orders component, but it's clearly arbitrary.
There’s a clear demonstration that MSA and domain oriented data stores are based on a decade of service oriented architecture design and development.
No, unless you don't want (or care about having) a single source of truth. If state is distributed, it's harder to backup/restore/rewind or even query atomically and reproducibly.
You could still make the single database sharded and duplicated, though; in most cases it's still one shared database. Even storing files outside of a DB is just sharding; the important thing is that the DB refers to the file and is still the single source of truth.
And when you want vertical segmentation, see designs like multi-tenant, but it's still not a database per microservice: quite the opposite.
When you don't want a single source of truth you can do without it, of course.
But since most of the devs and architects fail to identify when you are allowed to do it, the rule of thumb is to not do it as it is a safer option.
Yes, you can design around it. Good discipline around libraries with schemas, and such. These are almost always more code than letting a service own all communication with an infrastructure.
That is basically what is happening when using a service such as S3 or many distributed databases...
Microservices can be an antipattern by themselves, btw.