There's a bit of an emperor's clothes problem with microservices. If microservices are never combined together, they are essentially monoliths under a cooler name. But as soon as you combine them, you run into the same problems that are solved by traditional shared database systems. Here are two: cross-microservice referential integrity, and cross-microservice transaction coordination, but there are plenty more.
Take the Orders/Customers example of the article. Assume the Customers microservice has some notion of identity, and that the Orders microservice contains a "foreign" reference to that Customer identity. If the microservices are truly independent and autonomous, then the Customer microservice should be able to change its scheme for Customer identities, or delete a Customer. But then what happens to the Orders that reference that now old and outdated Customer identity? If changing the Customer microservice breaks the Orders microservice, then what's the point of the separate encapsulation? Traditional shared database systems have ways to deal with this kind of referential integrity. As far as I can tell, this issue is ignored by microservice architectures. As is the larger issue of cross-microservice constraints (of which referential integrity is just one instance).
The thing that really gets my goat is the lack of cross-microservice transaction coordination. In its place, we get all sorts of hand-waving about "eventual consistency" and "compensating transactions". Growl. Eventual consistency has a meaning that's related to distributed/replicated storage and the CAP theorem. Maybe its principles can apply to microservices, but the onus is on the microservice proponents to actually connect those dots. And most importantly, eventual consistency is implemented and supported by the DBMS, not by application developers. The thought that each microservice will independently implement the transactional guarantees of consistency and isolation should fill everyone with dread. That job should belong to the overarching system, not to individual microservices. It's too hard to get right.
So for now, a shared database in microservices is not an anti-pattern. It may be the only workable pattern. When microservice frameworks grow up and offer the capabilities of shared database systems to microservice developers, then we can talk about anti-patterns.
If you back your microservice architecture with a shared RDBMS and expect transactional and referential integrity, you've only scaled some parts of your system. Instead you should think about whether A: you really need that scalability, and B: what its real implications are.
So, in this customers / orders example, what if you delete a customer? Ideally you don't; you keep the customer as long as you need the orders. You could perhaps anonymize it. And you define what happens if you cannot find the customer. Perhaps the orders should just be deleted? You could have a service running nightly that prunes orders from customers that no longer exist. A cleanup service that checks for external dependencies and prunes them as needed (note, this can be rather dangerous if done wrong).
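A minimal sketch of such a cleanup job, assuming hypothetical `orders_db` and `customer_client` interfaces (neither is a real library), with a dry-run mode because of exactly that danger:

    # Hypothetical nightly cleanup job: prune orders whose customer no longer
    # exists. `orders_db` and `customer_client` are assumed interfaces.

    def prune_orphaned_orders(orders_db, customer_client, dry_run=True):
        orphaned = [
            customer_id
            for customer_id in orders_db.distinct_customer_ids()
            if not customer_client.exists(customer_id)
        ]
        for customer_id in orphaned:
            if dry_run:
                print(f"would prune orders for missing customer {customer_id}")
            else:
                # Dangerous if the customer service is merely unavailable
                # rather than the customer actually being gone: verify
                # before running this for real.
                orders_db.delete_orders_for_customer(customer_id)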
For people that have formal education in distributed systems, or work with them, this seems very familiar, because it's some of the same things that make distributed systems hard. And that is what a microservice architecture is - a distributed system, which is why it scales well when done right.
This is also a problem we've mostly solved with CQRS and event sourcing, but it requires a lot of manual work and orchestration. It's hard and I can only say - you probably do not need a microservice architecture, and if you do, hire people with real experience in distributed system architecture. They're expensive, and if you can't afford it you do not need microservices.
The system can simply mark the "deleted" customer as a former customer and add records of their dismissal without any referential integrity problems.
"Deleting" an entity doesn't mean that it should immediately vanish without a trace from the database, leaving a wake of destruction.
It's only a business-level change: we won't accept further orders from deleted customers, there might be something nasty to do to their outstanding orders, and so on.
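As a rough sketch of what that business-level "delete" could look like (the `Customer` shape here is made up for illustration):

    # Sketch of a "soft delete": the customer row stays, only its status
    # changes, so existing orders keep a valid reference. Field names are
    # illustrative.
    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Optional

    @dataclass
    class Customer:
        id: str
        name: str
        status: str = "active"                    # "active" or "former"
        dismissed_at: Optional[datetime] = None

    def dismiss_customer(customer: Customer) -> Customer:
        customer.status = "former"
        customer.dismissed_at = datetime.now(timezone.utc)
        return customer

    # The business rule lives at the edges: no new orders from former customers.
    def can_place_order(customer: Customer) -> bool:
        return customer.status == "active"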
This doesn't address the problem of a customer somehow vanishing from the system without notifying dependent services. With at-least-once message delivery we know that the order service will know at some point, but we need to handle the case in the meantime.
It could also be that we need to delete customer records per GDPR, but need to keep the order records due to other laws, perhaps in an anonymized form.
Are you advocating sending and processing notifications about customer changes so that the order component can maintain redundant stale copies of customer data? Why would one do that instead of an appropriate and selective query when e.g. a new order is being entered?
For an MSA system, yes, it should maintain just enough knowledge about customers to work, not everything. For instance it does not need to know the customer's name, just the id. Systems that query the orders can query the customer component for additional data.
The denormalization and distribution of redundant data is required for it to scale. If you make the order component query the customer component, you haven't solved the problem from the other way around, and suddenly you have a hard coupling where a transient failure in one component automatically fails the other.
It might not be a tradeoff you're willing to make, but then you probably do not need the scaling - at least not along that vector.
You have to use event sourcing as well, CQRS alone does not solve it.
First you have to decide what trade offs you want. Ideally you will not expose any events from a service, but that is not realistic. And since the two services have some degree of connection, let's make the trade off that we want to expose events of creation and deletion of customers so other systems can keep track of a current list of customers. We utilize an at-least-once delivery mechanism of events.
The orders service would subscribe to the two events and maintain an internal list of currently active customers. You cannot create orders for customers that do not exist, and when a customer is deleted, you can do what you need to do with the orders.
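A minimal sketch of that subscriber side, assuming an at-least-once event bus and made-up event shapes; the handlers are written to be idempotent so redelivered events are harmless:

    # The orders service's own view of "currently active customers", kept up
    # to date from CustomerCreated / CustomerDeleted events (names assumed).

    active_customers: set = set()

    def on_customer_created(event: dict) -> None:
        # Idempotent: adding the same id twice is a no-op.
        active_customers.add(event["customer_id"])

    def on_customer_deleted(event: dict) -> None:
        active_customers.discard(event["customer_id"])
        # ...then do whatever the order domain needs: cancel, anonymize, archive...

    def create_order(customer_id: str, lines: list) -> dict:
        if customer_id not in active_customers:
            raise ValueError("unknown or deleted customer")
        return {"customer_id": customer_id, "lines": lines}   # persisted in reality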
I'm just an ignorant nobody, but why would you split a microservice off along a facet that would require transaction and referential integrity? Wouldn't you want to pick off pieces that could be truly independent and not need to care about transactions or referential integrity? Otherwise, what problem are you actually solving by splitting the piece off?
If you never need to coordinate two or more microservices, then sure. But that means you're dealing with a monolith. In the original article, you could combine Orders and Users into a single microservice. That would resolve all the issues that are raised ... except that you might want to reference the Users in a different context. At that point, you either have to split Users into their own microservice, or duplicate the User information. Either way, you're backed into consistency issues.
Part of the promise of microservice is that they're small modular independent components that you can connect and combine into higher-level services.
If you need to read information from and write information to two or more microservices, then you have transaction issues. If you need to relate information across two microservices, you have referential integrity issues. It just comes with the territory.
Well, that isn’t a problem when you don’t care about the situation right now. For instance, let’s say you have VideoDescription, VideoSubtitling, and VideoContent as different microservices serving a description, the subtitles, and a handle to a list of content chunks. You need these things to line up (in that you may want all this for a single video). If VideoCentral says that you no longer have a movie it doesn’t matter. You can still serve video descriptions, subtitles, and content chunks until your view of the world changes. No big deal. If it is a big deal, then maybe the model doesn’t fit your problem. But for lots of software it really doesn’t matter. You don’t need instant consistency. If it’s consistent at some point that will do.
At no point will the writes insta-percolate. Instead you’ll push the writes into an event queue, they’ll execute eventually, and when the consuming services eventually update they’ll read the new result.
> Part of the promise of microservice is that they're small modular independent components that you can connect and combine into higher-level services.
Sounds to me like the marketing copy for microservices is missing a crucial observation: the reason independent components aren't hard to work with in regular software is partly because everything runs single-threaded (or, if you do end up multithreading, responses are predictable and near real-time), the environment is reliable, and everything is under your control. These conditions essentially mask transactional and integrity issues, which only become apparent as you scale to multiple machines connected over a network.
You're right that a shared resource like a database does endanger one of the microservice benefits and I think we're on the same page when I say the reality is "this is fine." You still get the benefit of being able to deploy, version, scale and manage a single microservice separately.
You might eventually have to break out the microservice entirely and build it out such that the shared data model is fully represented in the API and not the database. That's fine too. That's a lot of work, but you're only closer to that goal when you start with SOA and microservices, not further away. Sure, you can get into trouble if you build a large interconnected monolith that you happen to be running in parts across many servers. I would, uh... advise against that. Try to separate your services in natural ways that won't cause massive headaches.
Basically, there are still some benefits, and this critique, while valid, only points out what you aren't getting for free. It's not actually a negative.
> But then what happens to the Orders that reference that now old and outdated Customer identity? If changing the Customer microservice breaks the Orders microservice, then what's the point of the separate encapsulation?
Isn't that what append-only event sourcing and CQRS are designed to solve?
If you're putting a foreign key from the customer service's customers into the orders service's orders, then the customer service's key definition, once used as a foreign key, cannot change.
> If microservices are never combined together, they are essentially monoliths under a cooler name.
This assertion completely misses the whole point of microservices: highly specialized services with a single and very limited responsibility, which clients can query independently and which let systems be scaled (even horizontally) on their performance bottlenecks alone.
Describing a microservice architecture as a bunch of monoliths is simply missing the whole point.
Yes, shared database is an anti-pattern in micro services architecture.
If one shared database can serve your system well then you don't need microservices. You should build a monolith instead.
The main reason for using microservices is scale: first, the ability to scale development teams, and second, the ability to scale the infrastructure. With microservices you get vertical sharding out of the box. Yes, it means dealing with eventual consistency for at least a few use cases. But any system that wants to serve millions of concurrent users needs to deal with eventual consistency, as some sort of vertical or horizontal sharding is necessary at that scale anyway.
The issue isn't about microservices, it's about isolating dependencies between the various components of the system. When any part of the system can reach into the user table, you've created a big ball of mud. Even with a monolith, you should have a defined contract between the user service and the rest of the system. That way you can start with a monolith and move to microservices later as needed. You can also change the underlying persistence layer without disrupting the rest of the system. You could for instance start with MongoDB for user information while other system components use Postgres. If sometime later you decide Postgres handles JSON just as well, no problem: switch users over to Postgres and the rest of the system is unaffected.
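A sketch of that contract idea, with illustrative names: the rest of the system codes against `UserStore`, so the backing store can move from MongoDB to Postgres without touching callers (here the Postgres table is assumed to store the user document as JSON text).

    import json
    from abc import ABC, abstractmethod
    from typing import Optional

    class UserStore(ABC):
        """The contract the rest of the system depends on."""
        @abstractmethod
        def get_user(self, user_id: str) -> Optional[dict]: ...

    class MongoUserStore(UserStore):
        def __init__(self, collection):            # a pymongo collection
            self._col = collection
        def get_user(self, user_id):
            return self._col.find_one({"_id": user_id})

    class PostgresUserStore(UserStore):
        def __init__(self, conn):                   # a psycopg2 connection
            self._conn = conn
        def get_user(self, user_id):
            with self._conn.cursor() as cur:
                cur.execute("SELECT data FROM users WHERE id = %s", (user_id,))
                row = cur.fetchone()
                return json.loads(row[0]) if row else None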
You should really build your monolith in a similar fashion to microservices, just leave out the remoting layer.
However, many times it makes sense to just do the join in the database rather than make an n^2 join in user space. This becomes problematic with collections - say 100 cases belonging to 70 users. In the database it's a quick left join, but in application space it's a slowdown.
I don't disagree but I think in practice you're probably dealing with a single user at a time when working with orders. A query that referenced n users plus their orders is probably a reporting type query that could be supported in a data warehouse where all the tables could be joined together.
> query that referenced n users plus their orders is probably a reporting type query
No, there are enough other valid cases, e.g. displaying recent customers and their orders, or some simple "others have also bought". In practice you will need to join a lot spontaneously and cannot just defer that to a data warehouse.
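To make the trade-off concrete, here's the same "customers with their recent orders" question done both ways; table, column, and client names are all made up for illustration:

    # 1) One shared database: a single join, sorted and limited by the DB.
    RECENT_ORDERS_SQL = """
        SELECT c.id, c.name, o.id AS order_id, o.total
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        ORDER BY o.created_at DESC
        LIMIT 100
    """

    # 2) Two services: fetch from each and join in application code.
    def recent_customers_with_orders(order_client, customer_client, limit=100):
        orders = order_client.recent_orders(limit=limit)       # one call
        customer_ids = {o["customer_id"] for o in orders}
        customers = customer_client.get_many(customer_ids)     # batched, not N+1
        by_id = {c["id"]: c for c in customers}
        return [
            {"customer": by_id.get(o["customer_id"]), "order": o}
            for o in orders
        ]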
Except the scaling breaks down, because suddenly everyone is bound by tight APIs, making change very, very hard. The potential for services written in entirely different languages makes reallocating dev power less efficient.
> Second, the ability to scale the infrastructure
Anyone who's run a bunch of microservices in production will tell you this is a lie; it becomes a constant game of whack-a-mole with whichever microservice is the bottleneck. Not to mention the operational overhead of managing multiple microservices.
> Yes, it means dealing with eventual consistency for at least a few use cases
> With microservices you get vertical sharding out of the box
"If one shared database can serve your system well then you don't need microservices".
Exactly, nobody is even thinking about it anymore because "Twitter has microservices, Netflix uses MSA".
Microservice architecture comes with a cost, like more complex deployment strategy (there are more dependencies) and overhead in transaction management, not to mention debugging a problem which hits multiple domains/services.
> The main reason for using microservices is scale
What enables this scale is the encapsulation you get with microservices. You lose this with a shared database. You get a similar problem with multitenancy because "isolated" microservices are impacted by what unrelated services do, so there's a loss of resiliency.
My understanding is that the point of microservices isn't to just move what would have been a relational join in SQL to an equivalent join in the service layer, which is what the author is implying. Instead, the order service should already have the subset of user data it needs to perform its function. If a user name changes, for example, that would trigger an event that gets consumed by a handler in the order domain. You embrace some denormalization and duplication of data in order for each service to have as much of what it needs to perform its function without a ton of other dependencies.
I'm not saying this is necessarily the way to go, I'm just saying that this is my reading of the microservices design pattern.
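A rough sketch of that reading, with assumed event and order shapes: the order service keeps only the user fields it needs and refreshes them when the corresponding event arrives.

    # The order domain's deliberately minimal, denormalized copy of user data.
    order_side_users: dict = {}        # user_id -> {"name": ...}

    def on_user_name_changed(event: dict) -> None:
        # Consumed from the user domain; only the fields this service cares about.
        user = order_side_users.setdefault(event["user_id"], {})
        user["name"] = event["new_name"]

    def render_order_summary(order: dict) -> str:
        # No call to the user service at read time; the local copy is used.
        user = order_side_users.get(order["user_id"], {})
        return f"Order {order['id']} for {user.get('name', 'unknown user')}"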
you're right, I'm in the middle of a big project where the lead engineer decided 'data duplication is forbidden'. The mess it created is terrible, both in complexity and in performance.
I can't imagine microservice architecture will ever work in a complex domain without every service having most of its dependent data in its own store.
What would help is if people didn't look at it as 'duplication' but rather as 'caching'.
This is exactly correct, and it took me a good while to understand it because many people either don't implement this pattern or don't know about it, yet still give their 2 cents on microservices. Event sourcing and CQRS are the way to go.
Yeah, microservices and event sourcing go hand in hand.
Each microservice maintains its own database, with a model that is tailored to its own domain. This is called a projection, to illustrate that it is really the service's own view on the world, often with redundancy of data.
The service sources just the events it needs, and stores only the data it needs, to maintain this projection. It handles domain object lifecycle events in the way that makes sense for that particular service.
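As a sketch of such a projection (event names and fields are assumptions), the service rebuilds its local view by replaying only the events it cares about:

    def build_projection(event_stream):
        """Replay events into this service's own view of the world."""
        orders_by_customer: dict = {}              # customer_id -> [order_id, ...]
        for event in event_stream:                 # an iterable of event dicts
            if event["type"] == "OrderPlaced":
                orders_by_customer.setdefault(event["customer_id"], []).append(
                    event["order_id"]
                )
            elif event["type"] == "CustomerDeleted":
                # Lifecycle handled the way *this* service needs: here we drop
                # the link, but another service might keep or anonymize it.
                orders_by_customer.pop(event["customer_id"], None)
            # every other event type is ignored by this projection
        return orders_by_customer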
Without wanting to sound condescending: as a very late 'bloomer' - only working in development/software for 3 years and aiming towards an application architect role - I always thought this was the consensus.
It's similar to the "unit" in "unit testing": people will relate the most basic meaning to something they already understand, resulting in different, subtle distinctions.
In this case the breakdown actually seems the same to me as with the common unit testing misunderstanding - breaking down by syntactic form instead of conceptual form.
If your problem could be solved with buy rather than build - i.e. the services you'd split off could conceivably exist as AWS offerings in your perfect world - then microservices may be a smart idea. If you can fork off a team to own that ideal service, you can eat the integration costs, because you don't need to deal with the complexities of keeping that service running; it doesn't add to the total cost of ownership of getting your code running in production, it's just another remote API with an SLA.
In that situation, the other services sharing your database is madness. Of course they shouldn't use your database. If they did, it would mean you're responsible for their load; you'd need to balance your needs with their needs. When your database goes down, their service and its clients (that you don't need to know about) would also go down. No good at all. Don't do that.
And if you're not in a situation where different teams are responsible for different microservices, you probably shouldn't be using microservices.
Completely agree with the article. If each piece of code has its own data store, you lose all the advantages of the DBMS: you have to handwrite your joins, your transaction system and get all sorts of problems with cache invalidation, data inconsistencies and n+1 fetching. Encapsulation is important, but there are better ways to do it, such as using the authorization system of your DBMS.
Agreed, I think the issues you're touching on are fundamentally difficult problems that don't have a single solution. For example:
* storing complex data and supporting multiple access patterns efficiently (like joins) is hard and involves trade-offs (e.g. normalisation vs denormalisation, benefit vs overhead of indices).
* data consistency is hard, particularly once you are at a scale that requires distributed storage
* durability and disaster recovery is hard
* failover and availability, particularly for a stateful system, is hard
The right solution is completely application dependent. Usually the best approach is to avoid hard problems to the extent your application allows it, e.g. by accepting relaxed consistency, partitioning the data, or limiting the use cases the system supports. Then you can outsource the hard problems that remain to an existing solution (e.g. a transactional database).
Microservices help address some of those problems, but you need to recognize when they don't (e.g. when you need strong consistency across multiple services) and adjust your designs accordingly.
Foreign key constraints seem like an easy example. Not every service needs the power to create and manage user accounts, but it's nice to know that every user id in your data model is valid.
The only things that need a relational data store are reports. Operational data stores should only be concerned with their own domain and expose behaviors and events. There are exceptions, but if you're building a complex system, monolithic architectures have a known lifetime while MSAs tend to mitigate long-term code rot.
It’s hard to see this until you’ve built domain driven micro service based systems properly.
The trouble is, if you push state into different services behind APIs, you end up reinventing 50 to 80% of an RDBMS anyhow. You need transactions, replication, indexing, integrity, backups etc. etc. You can use an event stream (a substitute for a commit log) but then you need to migrate your event stream, be able to undo events in the stream, etc. It's complexity you really ought to avoid unless you need it for serious business reasons.
Monolithic architectures have tended to last for decades in mature industries. I don't think we have a good handle on the lifetime of microservice architectures yet - typically, when they come into being out of necessity rather than fashion, they're part of a startup that hit a major hockey-stick and needed multiple teams working concurrently without stepping on each other, and microservices serve an organizational purpose rather than an architecture purpose.
I will second what other people say: the database tends to stick around. Code may rot, but the data is reused by the next generation.
> multiple teams working concurrently without stepping on each other, and microservices serve an organizational purpose rather than an architecture purpose.
Agree you need all of these things and more... but if you could provide them above the microservice level, as a piece of K8s or whatever platform runs the MSA, then you might have an advance over the RDBMS.
The result would be the same features/guarantees you have with current DBMSs, but you get independent deployability, elastic scalability, redundant availability, and the other advantages of microservices that are missing with traditional DBMSs.
> monolithic architectures have a known lifetime while MSAs tend to mitigate long-term code rot
I'd say it's exactly the other way around: the programming languages and technologies change every five years or so, but the data stays forever, so why couple the data with the technology du jour? I'd rather have my data managed in one place so I can adapt the technology layers that sit on top whenever requirements change, rather than writing joins and transactions in my application code. I'd rather not bother with all the complexity and keep it simple as long as I don't need an Amazon or Netflix scale.
Because business complexity modeled relationally becomes stale over time, reduces agility, and develops massive amounts of code rot/tech debt.
Relational databases are tools to be used where it makes sense. NoSQL tables are tools. Graph databases are tools. Configuration files are tools.
Given your statement, Linux’s entire configuration footprint should be in a relational database.
As any good architect will say in response to a design question, “It depends.”
MSA’s aren’t a panacea either. Event Streams are an interesting development and new patterns are likely to emerge.
But today? I’d focus on reducing complexity with domain-driven design and choosing patterns that enable agility and support separation of concerns. MSA fits very well into that philosophy.
> As any good architect will say in response to a design question, “It depends.”
I absolutely agree with that. However I don't like being strawmanned. I never said linux configuration "should be in a relational database". I said I'd like to have my data in one place to reduce complexity and decoupled from the layers above.
You can, to start with, use a single database as long as the different services don't use or know about each other's tables. Then again, you could just use separate DBs.
Sure, but as I said, if you split your database, you have to deal with transactions, joins, consistency, and over-/underfetching. All this leads to complexity and performance issues you wouldn't have otherwise.
Reliable messaging and eventual consistency eliminates the need for transactions. Performance can be a mitigating factor. Joins are for reporting. In the cloud, performance is just a cost factor.
With a sane database design, the Report service can read user details of all users of interest from the user service and ask the Orders service for the orders of each user (a presumably efficient query), without replicating data.
Relational DBs are really just fantastic at high-performance sorts, merges, and joins. Why do you want to pay the continual ongoing cost of replicating that poorly and inefficiently in your application logic? For each microservice that refers to another in some way?
This is more likely to be a problem if you split your services up by noun (Users, Orders) rather than by function (makeOrder, loginService, etc). This isn't bulletproof either, but I've seen it help reduce this occurrence a lot. Pardon the poor examples.
This is the most important point when making microservices. There's lots of talk about bounded contexts but very little about how to draw the boundaries. Most are superficial and repeat bad examples like OrderService (order is both a verb and a noun; OrderingService sounds OK). Using two-part service names (ContentComposition, MessageDelivery) usually works out well.
As to the shared db aspect: while splitting a miniservice into micro ones I had this situation. All the code was split first, and the db changes, being the most difficult, were done last, as a point of "we don't expect to revert this choice". We weren't at a performance limit, so that was fine; any schema changes had to be clearly communicated and coordinated. All in all, not too bad. I don't think I'd want to leave it in that state as a normal state. The point of micro is independence and isolation of changes, and sharing a db leaves a sensitive area. Don't let that stop you if it's only a temporary state. Just get commitment as to how temporary that is, since in absolute terms everything's temporary.
We never had the intention to make miniservices. Some were microservices that acquired abilities that grew to become separate microservices. On other projects they were prototypes that lasted long enough and became small monoliths to get split up.
From my experience, at some scale (system and organization) it becomes easier to solve the technical issues of having multiple DBs than to deal with bottlenecks on one shared DB.
We had a fleet of microservices that used 3 semi-shared DBs (one for one domain, one for another domain, and one shared). It worked OK for a year and now we run 40 DBs. It is harder to maintain and harder to develop against, but we are not constantly stuck redeploying half of the stack because a schema just changed.
So, no silver bullet. You need to use something that makes sense, not what is fancy today.
I don't understand the problem. If the report needs order details and user details, and current data isn't available because some piece of the system has a problem, there's going to be no report today regardless of what the database is like and what concerns are separated or not. C'est la vie.
Moreover, the real dependency is from the report to the orders and the users; the article takes for granted that the report is shoehorned into the orders component, but it's clearly arbitrary.
"Then it sends request to users service to get missing information about users" What information would an order need to know about the user except for its ID? Shipping address? That could have been stored in the order db. Usually there is an orchestration layer above these services such as graphql, that merges all the data together for presenting to the UI. Somewhere there needs to be a defined contract between service boundaries so that internals can be changed without affecting all dependent systems. The db is not a good place for this contract.
A shared database offers the allure of transactions and persistent state management. But using a traditional RDBMS is not very scalable. Schema changes have hurt us many times. And then what of foreign keys and such. It is nice to be able to consistently open a single transaction and get an answer from the one true source of truth but it quickly becomes the single biggest bottleneck. I prefer the database per service model and leaving a global transaction manager to be implemented if needed. I think it adds much more in flexibility and scalability.
If you can't split your databases then maybe you don't have a correctly established domain?
There is no point in doing microservices when you still have tightly coupled data and you don't know how to split it efficiently. A monolith is not bad if it is well structured, tested, and maintainable. Otherwise, you'll have to change multiple services just to add a new data field.
One pattern is to subscribe to an event bus and retain a read-only replica of your dependency's data. So the CustomerService publishes a CustomerModify operation, which is picked up by the OrderService, which then knows the new value of the customer's shipping address. If CustomerService wants to make backwards-incompatible changes to its schema, it should handle that through versioning: CustomerService broadcasts both versions of its message until all subscribers are using the updated version, at which point it can 'deprecate' the old message version. Ideally OrderService uses a library provided by CustomerService to handle applying CustomerModify messages, so the next time OrderService is deployed (which should be continuously, right?) it automatically picks up the new schema, making this a painless and automatic affair.
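A sketch of that versioning step, with an assumed bus API and made-up payloads: both versions are published until every subscriber has moved to v2.

    def publish_customer_modified(bus, customer: dict) -> None:
        # Old shape, kept until every subscriber has migrated off it.
        bus.publish("CustomerModified.v1", {
            "customer_id": customer["id"],
            "address": f"{customer['street']}, {customer['postal_code']}",
        })
        # New shape with the backwards-incompatible change (structured address).
        bus.publish("CustomerModified.v2", {
            "customer_id": customer["id"],
            "shipping_address": {
                "street": customer["street"],
                "postal_code": customer["postal_code"],
            },
        })

    # OrderService subscribes to exactly one version at a time, e.g.:
    #   bus.subscribe("CustomerModified.v2", apply_customer_modified)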
> Is a shared database in microservices actually an anti-pattern?
No, unless you don't want (or care about having) a single source of truth. If state is distributed, it's harder to back up/restore/rewind, or even to query atomically and reproducibly.
You could still make the single database sharded and duplicated, but in most cases it's still one shared database. Even storing files outside of a DB is just sharding; the important thing is that the DB refers to the file and is still the single source of truth.
And when you want vertical segmentation, see designs like multi-tenancy, but that's still not a database per microservice: quite the opposite.
When you don't want a single source of truth, you can of course do without it.
My take is yes. Shared infrastructure is always a risk. It seems to encourage other bad choices.
Yes, you can design around it - good discipline around libraries with schemas, and such. But these approaches almost always involve more code than letting a single service own all communication with a piece of infrastructure.
Anti-pattern is strong, but I'd go with premature optimization, because most developers grossly overestimate their ability to create the correct architecture ahead of time and underestimate the performance, maintenance, reliability, and security costs. Most of the time, when I've seen a $$$ app struggle to match 90s single-server app performance, it's been because the team was struggling under the weight of unnecessary architecture.