
Is a shared database in microservices actually an anti-pattern? - kiyanwang
https://hackernoon.com/is-shared-database-in-microservices-actually-anti-pattern-8cc2536adfe4
======
quizotic
There's a bit of an emperor's clothes problem with microservices. If
microservices are never combined together, they are essentially monoliths
under a cooler name. But as soon as you combine them, you run into the same
problems that are solved by traditional shared database systems. Here are two:
cross-microservice referential integrity, and cross-microservice transaction
coordination, but there are plenty more.

Take the Orders/Customers example of the article. Assume the Customers
microservice has some notion of identity, and that the Orders microservice
contains a "foreign" reference to Customers identity. If the microservices are
truly independent and autonomous, then the Customer microservice should be
able to change its scheme for Customer identities, or delete a Customer. But
then what happens to the Orders that reference that now old and outdated
Customer identity? If changing the Customer microservice breaks the Orders
microservice, then what's the point of the separate encapsulation? Traditional
shared database systems have ways to deal with this kind of referential
integrity. As far as I can tell, this issue is ignored by microservice
architectures. As is the larger issue of cross-microservice constraints (of
which referential integrity is just one instance).

The thing that really gets my goat is the lack of cross-microservice
transaction coordination. In its place, we get all sorts of hand-waving about
"eventual consistency" and "compensating transactions". Growl. Eventual
consistency has a meaning that's related to distributed/replicated storage and
the CAP theorem. Maybe its principles can apply to microservices, but the onus
is on the microservice proponents to actually connect those dots. And most
importantly, eventual consistency is implemented and supported by the DBMS,
not by application developers. The thought that each microservice will
independently implement the transactional guarantees of consistency and
isolation should fill everyone with dread. That job should belong to the
overarching system, not to individual microservices. It's too hard to get
right.

So for now, shared database in microservices is _not_ an anti-pattern. It may
be the only workable pattern. When microservice frameworks grow up and offer
the capabilities of shared database systems to microservice developers, then
we can talk about anti-patterns.

~~~
hvidgaard
If you back your microservice architecture with a shared RDBMS and expect
transactional and referential integrity, you've only scaled some parts of your
system. Instead you should think about A: whether you really need that
scalability, and B: the real implications of it.

So, in this customers / orders example what if you delete a customer? Ideally
you don't, you keep the customer as long as you need the orders. You could
perhaps anonymize it. And you define what happens if you cannot find the
customer. Perhaps the orders should just be deleted? You could have a service
running nightly that prunes orders from customers that do not exist anymore.
A cleanup service that checks for external dependencies and prunes them as
needed (note, this can be rather dangerous if done wrong).
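
A minimal sketch of such a nightly cleanup, in Python. The record layout, the
7-day grace period and the function names are assumptions made for
illustration, not something from the thread. The dry-run default is there
because, as noted, pruning is dangerous if done wrong.

```python
from datetime import datetime, timedelta


def find_orphaned_orders(orders, existing_customer_ids, now, grace=timedelta(days=7)):
    """Return ids of orders whose customer is gone and that are older than the grace period."""
    return [
        o["id"]
        for o in orders
        if o["customer_id"] not in existing_customer_ids
        and now - o["created_at"] >= grace
    ]


def prune_orders(orders, existing_customer_ids, now, dry_run=True):
    """Return (remaining_orders, pruned_ids); with dry_run=True nothing is removed."""
    orphaned = set(find_orphaned_orders(orders, existing_customer_ids, now))
    if dry_run:
        # Only report what would be deleted, so a human can review it first.
        return orders, sorted(orphaned)
    kept = [o for o in orders if o["id"] not in orphaned]
    return kept, sorted(orphaned)
```

The grace period gives a late "customer created" message time to arrive
before an order is treated as orphaned.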

For people that have formal education in distributed systems, or work with
them, this seems very familiar, because it's some of the same things that make
distributed systems hard. And that is what a microservice architecture is - a
distributed system, which is why it scales well when done right.

This is also a problem we've mostly solved with CQRS and event sourcing, but
it requires a lot of manual work and orchestration. It's hard and I can only
say - you probably do not need a microservice architecture, and if you do,
hire people with real experience in distributed system architecture. They're
expensive, and if you can't afford it you do not need microservices.

~~~
HelloNurse
The system can simply mark the "deleted" customer as a former customer and add
records of their dismissal without any referential integrity problems.
"Deleting" an entity doesn't mean that it should immediately vanish without a
trace from the database, leaving a wake of destruction. It's only a business-
level change: we won't accept further orders from deleted customers, there
might be something nasty to do to their outstanding orders, and so on.
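
A minimal sketch of that kind of "business-level" delete, in Python. The
class and field names are hypothetical; the point is only that the row stays
in place, so existing order references keep resolving while new orders are
refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: str
    name: str
    deleted_at: Optional[str] = None  # None means the customer is still active


class CustomerStore:
    def __init__(self):
        self._rows = {}

    def add(self, customer):
        self._rows[customer.id] = customer

    def delete(self, customer_id, when):
        # Soft delete: keep the row and only record when the customer left.
        self._rows[customer_id].deleted_at = when

    def get(self, customer_id):
        # Old orders can still look up a former customer.
        return self._rows[customer_id]

    def can_order(self, customer_id):
        # Unknown and former customers may not place new orders.
        c = self._rows.get(customer_id)
        return c is not None and c.deleted_at is None
```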

~~~
hvidgaard
Doesn't address the problem of a customer somehow vanishing from the system
without notifying dependent services. With at least once message delivery we
know that the order service will know at some point, but we need to handle the
case in the meantime.

It could also be a possibility that we need to delete customer records per
GDPR, but need to keep the order records due to other laws, perhaps in an
anonymized form.

~~~
HelloNurse
Are you advocating sending and processing notifications about customer changes
so that the order component can maintain redundant stale copies of customer
data? Why would one do that instead of an appropriate and selective query when
e.g. a new order is being entered?

~~~
hvidgaard
For an MSA system, yes, it should maintain just enough knowledge about customers
to work, not everything. For instance it does not need to know the customer's
name, just the id. Systems that query the orders can query the customer
component for additional data.

The denormalization and distribution of redundant data is required for it to
scale. If you make the order component query the customer component, you
haven't solved the problem from the other way around, and suddenly you have a
hard coupling where a transient failure in one component automatically fails
the other.

It might not be a tradeoff you're willing to make, but then you probably do
not need the scaling - at least not along that vector.

------
pritambarhate
Yes, shared database is an anti-pattern in micro services architecture.

If one shared database can serve your system well then you don't need
microservices. You should build a monolith instead.

The main reason for using microservices is scale. First ability to scale
development teams and secondly ability to scale the infrastructure. With
microservices you get vertical sharding out of the box. Yes it means dealing
with eventual consistency for at least a few use cases. But any system that
wants to serve millions of concurrent users needs to deal with eventual
consistency as some sort of vertical or horizontal sharding is necessary at
that scale anyways.

~~~
ec109685
How many services actually have millions of concurrent users? That’s 100k
req/sec or so, which is a crapton.

~~~
zaarn
Probably less than 1% of corporations deploying microservices actually have a
load that justifies microservices.

------
thoman23
My understanding is that the point of microservices isn't to just move what
would have been a relational join in SQL to an equivalent join in the service
layer, which is what the author is implying. Instead, the order service should
already have the subset of user data it needs to perform its function. If a
user name changes, for example, that would trigger an event that gets consumed
by a handler in the order domain. You embrace some denormalization and
duplication of data in order for each service to have as much of what it needs
to perform its function without a ton of other dependencies.

I'm not saying this is necessarily the way to go, I'm just saying that this is
my reading of the microservices design pattern.

~~~
aedron
Yeah, microservices and event sourcing go hand in hand.

Each microservice maintains its own database, with a model that is tailored to
its own domain. This is called a projection, to illustrate that it is really
the service's own view on the world, often with redundancy of data.

The service sources just the events it needs, and stores only the data it
needs, to maintain this projection. It handles domain object lifecycle events
in the way that makes sense for that particular service.
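
A minimal sketch of such a projection on the order side, in Python. The event
names and fields are assumptions made for illustration; a real event store or
bus would deliver the events, here they are just dictionaries.

```python
class OrderSideCustomerProjection:
    """The order service's own view of customers: only the fields it needs."""

    def __init__(self):
        self.customers = {}  # customer_id -> {"name": ..., "active": ...}

    def apply(self, event):
        kind = event["type"]
        cid = event["customer_id"]
        if kind == "CustomerCreated":
            self.customers[cid] = {"name": event["name"], "active": True}
        elif kind == "CustomerRenamed":
            # Tolerate a rename arriving before the create event.
            row = self.customers.setdefault(cid, {"name": None, "active": True})
            row["name"] = event["name"]
        elif kind == "CustomerDeleted":
            # Lifecycle handled the way this service needs: keep the row
            # for old orders, but mark the customer inactive.
            if cid in self.customers:
                self.customers[cid]["active"] = False
        # Any other event type is ignored: this service does not need it.
```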

~~~
geezerjay
> Each microservice maintains its own database, with a model that is tailored
> to its own domain. This is called a projection

In DDD land this is referred to as bounded context.

------
barrkel
If your problem could be solved with buy rather than build, for services that
you buy (e.g. could conceivably exist via AWS in your perfect world), then
microservices may be a smart idea - if you can fork off a team to own that
ideal service, you can eat the integration costs because you don't need to
deal with the complexities of keeping that service running, it doesn't add to
the total cost of ownership of getting your code running in production, it's
just another remote API with an SLA.

In that situation, the other services sharing your database is madness. Of
course they shouldn't use your database. If they did, it would mean you're
responsible for their load; you'd need to balance your needs with their needs.
When your database goes down, their service and its clients (that you don't
need to know about) would also go down. No good at all. Don't do that.

And if you're not in a situation where different teams are responsible for
different microservices, you probably shouldn't be using microservices.

------
taffer
Completely agree with the article. If each piece of code has its own data
store, you lose all the advantages of the DBMS: you have to handwrite your
joins, your transaction system and get all sorts of problems with cache
invalidation, data inconsistencies and n+1 fetching. Encapsulation is
important, but there are better ways to do it, such as using the authorization
system of your DBMS.

~~~
ChicagoDave
The only thing that needs a relational data store are reports. Operational
data stores should only be concerned with their own domain and expose
behaviors and events. There are exceptions, but if you’re building a complex
system, monolithic architectures have a known lifetime while MSA’s tend to
mitigate long term coderot.

It’s hard to see this until you’ve built domain driven micro service based
systems properly.

~~~
barrkel
The trouble is, if you push state into different services behind APIs, you end
up reinventing 50 to 80% of RDBMS anyhow. You need transactions, replication,
indexing, integrity, backups etc. etc. You can use an event stream (a
substitute for a commit log) but then you need to migrate your event stream,
be able to undo events in the stream, etc. It's complexity you really ought to
avoid unless you need it for serious business reasons.

Monolithic architectures have tended to last for decades in mature industries.
I don't think we have a good handle on the lifetime of microservice
architectures yet - typically, when they come into being out of necessity
rather than fashion, they're part of a startup that hit a major hockey-stick
and needed multiple teams working concurrently without stepping on each other,
and microservices serve an organizational purpose rather than an architecture
purpose.

I will second what other people say: the database tends to stick around. Code
may rot, but the data is reused by the next generation.

~~~
NicoJuicy
> multiple teams working concurrently without stepping on each other, and
> microservices serve an organizational purpose rather than an architecture
> purpose.

I so hard agree with this.

------
jzoch
This is more likely to be a problem if you split your services up by noun
(Users,Orders) rather than by function (makeOrder, loginService, etc). This
isn't bulletproof either but I've seen it helps reduce this occurrence a lot.
Pardon the poor examples

~~~
karmakaze
This is the most important point when making microservices. There's lots of
talk about bounded contexts but very little about how to draw the boundaries.
Most are superficial and repeat bad examples like OrderService (order is both
a verb and a noun; OrderingService sounds ok). Using two part service names
(ContentComposition, MessageDelivery) usually works out well.

As to the shared db aspect, while splitting a miniservice into micro ones I had
this situation. All the code was split first, and the db changes, being the most
difficult, were done last, as a point of "we don't expect to revert this
choice". We weren't at a performance limit so that was fine; any schema
changes had to be clearly communicated and coordinated. In all not too bad. I
don't think I'd want to leave it in that state as a normal state. The point of
micro is independence and isolation of changes and sharing a db leaves a
sensitive area. Don't let that stop you if it's only a temporary state. Just
get commitment as to how temporary that is, since in absolute terms
everything's temporary.

~~~
sigi45
Why would you make Miniservices?

~~~
karmakaze
We never had the intention to make miniservices. Some were microservices that
acquired abilities that grew to become separate microservices. On other
projects they were prototypes that lasted long enough and became small
monoliths to get split up.

------
LaserToy
From my experience, at some scale (system and organization) it became easier
to solve the technical issues of having multiple DBs than to deal with
bottlenecks on 1 shared DB. We had a fleet of microservices that used 3
semi-shared DBs (1 for 1 domain, 1 for another domain and 1 shared). It worked
ok for a year and now we run 40 DBs. It is harder to maintain and harder to
develop against, but we are not constantly stuck redeploying half of the stack
because the schema just changed.

So, no silver bullet. You need to use something that makes sense, not what is
fancy today.

------
HelloNurse
I don't understand the problem. If the report needs order details and user
details, and current data isn't available because some piece of the system has
a problem, there's going to be no report today regardless of what the database
is like and what concerns are separated or not. C'est la vie.

Moreover, the real dependency is from the report to the orders and the users;
the article takes for granted that the report is shoehorned into the orders
component, but it's clearly arbitrary.

------
eweise
"Then it sends request to users service to get missing information about
users" What information would an order need to know about the user except for
its ID? Shipping address? That could have been stored in the order db. Usually
there is an orchestration layer above these services such as graphql, that
merges all the data together for presenting to the UI. Somewhere there needs
to be a defined contract between service boundaries so that internals can be
changed without affecting all dependent systems. The db is not a good place
for this contract.

------
gigatexal
A shared database offers the allure of transactions and persistent state
management. But using a traditional RDBMS is not very scalable. Schema changes
have hurt us many times. And then what of foreign keys and such. It is nice to
be able to consistently open a single transaction and get an answer from the
one true source of truth but it quickly becomes the single biggest bottleneck.
I prefer the database per service model and leaving a global transaction
manager to be implemented if needed. I think it adds much more in flexibility
and scalability.

------
mbrodersen
Don't ask if X is an "anti-pattern". Ask if X makes sense for YOUR project in
YOUR context.

------
ris
Microservices are an anti-pattern.

~~~
he0001
I’d say microservices for the sake of microservices is an anti pattern.
Microservices do have their uses.

------
ChicagoDave
I’ll take the hit on my karma and just suggest readers of this thread read the
comments of the original post on hackernoon.

There’s a clear demonstration that MSA and domain oriented data stores are
based on a decade of service oriented architecture design and development.

------
sigi45
If you don't even need to split out the database you probably don't need
Microservices.

------
Oras
If you can't split your databases then maybe you don't have a correctly
established domain? There is no point in doing microservices when you still
have tightly coupled data and you don't know how to split them efficiently.
Monolith is not bad if it is well structured, tested and maintainable.
Otherwise, you'll have to change multiple services for adding a new field of
data.

------
siliconc0w
One pattern is to subscribe to an event bus and retain a read-only replica of
your dependency's data. So the CustomerService publishes a CustomerModify
operation which is picked up by the OrderService which then knows the new
value of the customer's shipping address. If CustomerService wants to make
backwards incompatible changes to its schema it should be handled by
versioning so the CustomerService has to broadcast both versions of its
message until such a time as all subscribers are using the updated version and
then it can 'deprecate' the old message version. Ideally OrderService is using
a library provided by CustomerService to handle applying CustomerModify
messages so the next time OrderService is deployed (which should be
continuously right?) it automatically picks up the new schema so this is a
painless and automatic affair.
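
A minimal sketch of that versioning scheme, in Python. The message layout
(a "version" field, a flat address string in v1, a structured address in v2)
is an assumption made for illustration; the thread does not specify a schema,
and a real system would publish to an event bus rather than return a list.

```python
def publish_customer_modify(customer_id, street, city):
    """Publisher side: emit both schema versions while subscribers migrate."""
    v1 = {"type": "CustomerModify", "version": 1, "customer_id": customer_id,
          "shipping_address": f"{street}, {city}"}
    v2 = {"type": "CustomerModify", "version": 2, "customer_id": customer_id,
          "shipping_address": {"street": street, "city": city}}
    return [v1, v2]


class ShippingAddressReplica:
    """Subscriber side: a read-only copy of the one field OrderService needs."""

    def __init__(self, supported_version):
        self.supported_version = supported_version
        self.addresses = {}

    def handle(self, message):
        if message["type"] != "CustomerModify":
            return
        if message["version"] != self.supported_version:
            return  # skip the version this subscriber does not understand
        addr = message["shipping_address"]
        if message["version"] == 2:
            # v2 carries a structured address; flatten it for local use.
            addr = f'{addr["street"]}, {addr["city"]}'
        self.addresses[message["customer_id"]] = addr
```

Once every subscriber is constructed with `supported_version=2`, the
publisher can stop emitting the v1 message.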

------
ozim
Looks like the author does not know you can split business domain without making
microservices...

------
barrystaes
> Is a shared database in microservices actually an anti-pattern?

No, unless you don't want (or care about having) a single source of truth. If
state is distributed it's harder to backup/restore/rewind or even query
atomically and reproducibly.

You could still make the single database sharded and duplicated though, but
still: in most cases it's still one shared database. Even storing files outside
of a DB is just sharding; the important thing is that the DB refers to the file
and is still the single source of truth.

And when you want vertical segmentation, see designs like multi-tenant, but it's
still not a database per microservice: quite the opposite.

When you don't want a single source of truth you could do without it, of course.

------
stunt
The real answer: It depends.

But since most of the devs and architects fail to identify when you are
allowed to do it, the rule of thumb is to not do it as it is a safer option.

------
taeric
My take is yes. Shared infrastructure is always a risk. It seems to encourage
other bad choices.

Yes, you can design around it. Good discipline around libraries with schemas,
and such. These are almost always more code than letting a service own all
communication with an infrastructure.

------
etaioinshrdlu
If you have a database that can scale its QPS linearly with number of machines
in the cluster, maybe it stops being a bad idea?

That is basically what is happening when using a service such as S3 or many
distributed databases...

------
est
Database is just another service which does not speak protobuf. Use it as a
dependency or use multiple ones depending on your context.

------
rejschaap
TLDR: How to get all of the disadvantages of microservices and none of the
benefits

------
alanfranz
Shared database is an antipattern.

Microservices can be an antipattern by themselves, btw.

------
eouw0o83hf
Yes, yes it is.

------
suff
Yes, because any artificial linkage that prevents one service from scaling
independently from another violates the separation of concerns principle.

------
jrochkind1
What if microservices are actually an anti-pattern?

~~~
acdha
Anti-pattern is strong but I’d go with premature optimization because most
developers grossly overestimate their ability to create the correct
architecture ahead of time and underestimate the performance, maintenance,
reliability, and security costs. Most of the times I’ve seen a $$$ app
struggle to match 90s single-server app performance it’s been because the team
was struggling under the weight of unnecessary architecture.

