I agree. I like these patterns, but I never encourage anyone to start with them. First build a simple monolith to handle the situation; if there is a hard problem, then and only then should these patterns be applied. These days, though, I am seeing quite the opposite. I don't see enough evidence to justify starting with these patterns and always using them as a rule of thumb.
There's no question that it adds complexity, but I wouldn't agree that it adds a "huge amount." The Saga Pattern is actually very straightforward.
But as with anything, the question is "is this complexity worth the price you pay for it?" For me, given the advantages that microservices offer in many contexts, I find that using the Saga Pattern to maintain consistent state is totally worth it. But that won't be true for everybody in every situation.
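For anyone who hasn't seen one, a minimal orchestration-style saga can be sketched like this: each step pairs a forward action with a compensating action, and if a step fails, the steps that already succeeded are compensated in reverse order. This is a hypothetical in-memory illustration (the function and step names are made up), not MassTransit's or any other library's API, and it deliberately ignores the case where a compensation itself fails.

```python
from typing import Callable, List, Tuple

# A saga step: (name, forward action, compensating action that undoes it).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; if one fails, compensate completed steps newest-first.

    Returns a human-readable log. Compensation failures are NOT handled here.
    """
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # Undo only the steps that actually succeeded, in reverse order.
            for done_name, _, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"compensated {done_name}")
            return log
        completed.append((name, action, compensate))
        log.append(f"{name} ok")
    return log


if __name__ == "__main__":
    def charge_card() -> None:
        raise RuntimeError("card declined")  # simulate step C failing

    for line in run_saga([
        ("reserve_stock", lambda: None, lambda: None),  # hypothetical steps
        ("create_order", lambda: None, lambda: None),
        ("charge_card", charge_card, lambda: None),
    ]):
        print(line)
```

The orchestrator here is a single function for brevity; a real one would persist its progress so it can resume compensation after a crash.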
Any library/SDK that allows you to implement these patterns should have sufficient testing scaffolding available as well. We use MassTransit for a large, distributed .NET Core + RabbitMQ service layer, and unit tests are no more trouble than they usually are with the built-in Bus and Consumer test harnesses.
Yes, this is an all but solved problem. Sagas handle this as gracefully as you can in a distributed transaction, and with even a modicum of foresight a lot of these problems can be avoided.
Do sagas help with rolling back in response to errors? This seems like the nastiest aspect of any distributed transaction approach: step A succeeds, step B succeeds, step C fails, call to rollback A and B fails... and now?
Or you do a two-phase commit: A, B, and C tentatively succeed, but then one of the commit calls fails, and now?
It seems like inconsistencies are inevitable no matter what you do.
The post you're replying to does list compensating transactions (a form of rollback).
One gotcha that is not covered by Sagas (I could be wrong) is when one or many of the network paths involved in the distributed tx become unreachable (network partition event) and you have no idea of the state of that part of the tx. Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction? If I had to implement a distributed tx I would first verify my mental model using TLA+ and use a (persistent) transactional messaging system with at-least-once delivery as the backbone, and make other accommodations for such scenarios.
> Do you re-try that part and risk sending the same instruction twice (ok in some cases but not all) vs risk of having sent no instruction?
If you can make your compensating action idempotent, then yes, you can just keep retrying it. If it can't be made so for whatever reason, then a failure at that point demands manual intervention.
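Retrying is only safe when the compensation is idempotent. A minimal sketch, assuming a hypothetical RefundService that deduplicates on an idempotency key: the simulated failure happens after the refund was applied (the response is lost), which is exactly the case where a non-idempotent retry would refund twice. When retries run out, the function returns False so the caller can escalate to manual intervention.

```python
import time
from typing import Callable, Set


class RefundService:
    """Hypothetical downstream service; refunds are deduplicated by key."""

    def __init__(self, lost_responses: int = 0) -> None:
        self.lost_responses = lost_responses  # how many replies to "lose"
        self.processed: Set[str] = set()
        self.refund_count = 0

    def refund(self, idempotency_key: str) -> None:
        if idempotency_key not in self.processed:
            self.processed.add(idempotency_key)
            self.refund_count += 1  # the actual side effect, applied once
        if self.lost_responses > 0:
            self.lost_responses -= 1
            # The work is done, but the caller only sees a failure.
            raise ConnectionError("response lost")


def compensate_with_retry(call: Callable[[], None], max_attempts: int = 5,
                          base_delay: float = 0.5) -> bool:
    """Retry an idempotent compensation with exponential backoff.

    Returns False when attempts run out: the caller should escalate
    to manual intervention rather than assume anything about the state.
    """
    for attempt in range(max_attempts):
        try:
            call()
            return True
        except ConnectionError:
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return False
```

With two lost responses, dropping the key check in refund() would refund the customer three times.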
I suppose redundant communication channels (that go over different network modalities, e.g., data center native, satellite, 5G, etc.) can be used to recover from a network partition. Still, having a protocol with an at-least-once delivery guarantee is important, as it ensures that no messages are lost due to an unexpected crash of the sender/caller or receiver/callee.
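At-least-once delivery means the receiver will see duplicates, so it has to tolerate them. A minimal simulation with made-up names: the sender keeps resending until an ack gets back, and the receiver remembers message IDs so a redelivered message is acknowledged but not applied twice. A real receiver would keep the seen-ID set in durable storage, updated in the same local transaction as the side effect, and a real sender would persist the message before its first send.

```python
import random
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Receiver:
    """Applies each message at most once by remembering processed IDs."""
    seen: Set[str] = field(default_factory=set)
    applied: List[str] = field(default_factory=list)

    def handle(self, msg_id: str, body: str) -> bool:
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.applied.append(body)  # the side effect
        return True  # ack duplicates too, so the sender stops resending


def deliver_at_least_once(receiver: Receiver, msg_id: str, body: str,
                          ack_loss_rate: float, rng: random.Random,
                          max_attempts: int = 100) -> int:
    """Resend until an ack gets back; returns how many sends it took."""
    for attempt in range(1, max_attempts + 1):
        acked = receiver.handle(msg_id, body)
        # Simulate the ack being lost on the way back to the sender.
        if acked and rng.random() >= ack_loss_rate:
            return attempt
    raise RuntimeError("no ack received; delivery state unknown")
```

Even with most acks lost, the side effect is applied exactly once; the uncertainty only resurfaces when the sender gives up.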
> It seems like inconsistencies are inevitable no matter what you do.
At some level, barring guaranteed message delivery (which is effectively non-existent in any distributed system), you always reach a point where you can't guarantee consistency. It's the Two Generals' Problem, basically.
But based on empirical evidence, you can work out that a certain measure of effort dedicated to fault tolerance will yield correct results in X% of cases, and you can tune the value of X based on how much time/energy/money/effort you're willing to expend... up to a point.
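One crude way to put numbers on that trade-off, assuming each attempt fails independently with the same probability p: all n attempts fail with probability p^n, so the success rate is X = 1 - p^n, and more attempts (more effort) buy a higher X. The independence assumption is one reason for the "up to a point": correlated failures such as a network partition break it. The function name is illustrative.

```python
def attempts_needed(p_fail: float, target: float) -> int:
    """Smallest n with 1 - p_fail**n >= target.

    Assumes every attempt fails independently with the same probability,
    which real networks (partitions, outages) routinely violate.
    """
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must be in [0, 1)")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 1
    while 1.0 - p_fail ** n < target:
        n += 1
    return n


if __name__ == "__main__":
    for target in (0.9, 0.99, 0.999):
        print(target, attempts_needed(0.2, target))
```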
Accounting has been doing that for centuries already, so it's not new by any means. It's also not free: it imposes severe restrictions on your system's architecture and the kinds of problems it can solve.
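The accounting approach boils down to never editing a posted record: an error or a cancelled operation is corrected by appending a compensating (reversing) entry, so the full history stays auditable. A minimal single-sided sketch with hypothetical names; real double-entry bookkeeping would post balanced debit and credit pairs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    entry_id: int
    account: str
    amount_cents: int  # positive = credit, negative = debit (sketch convention)
    memo: str


class Ledger:
    """Append-only ledger: mistakes are fixed with new entries, never edits."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def post(self, account: str, amount_cents: int, memo: str) -> Entry:
        entry = Entry(len(self.entries) + 1, account, amount_cents, memo)
        self.entries.append(entry)
        return entry

    def reverse(self, entry: Entry) -> Entry:
        # The compensating entry cancels the original but keeps the history.
        return self.post(entry.account, -entry.amount_cents,
                         f"reversal of #{entry.entry_id}")

    def balance(self, account: str) -> int:
        return sum(e.amount_cents for e in self.entries if e.account == account)
```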
This speaks volumes about how fraught with peril building microservices is. For me the prevailing message of this post is "don't do it at all unless you are willing to absorb this much complexity for the promised benefits".
Services can talk to other services, e.g., in your example, the order service can speak to the inventory service to change the stock level. The 'API controller' (this is also a service!) can speak to the inventory service to get the current inventory, but it shouldn't be responsible for handling the transactions of other services.
I think the premise of using a monolithic service (your API "Controller") to handle transactions that it simply shouldn't be concerned with is the main problem here, i.e., the problem is that the transaction should not be distributed in the first place, not that it is handled through multiple services.
I'm further confused at the reconciliation that seems to occur later in your example. Why is it bypassing the services and writing directly to the service DB itself?
Yep. The article seems to muddle up and conflate a bunch of poor practices with "microservices." All the "problems" given are really just poor decomposition of the system into non-discrete components. The "audit" solution is a bit kludgy too.
In the first place, the components should not be decomposed like this. But you can't control everyone, and if it happens and goes to production, the only thing you can do is make it better. That's what I have done in many products (using an auditor or arbitrator).
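An auditor of this kind can be as simple as a periodic job that compares what two services believe and reports discrepancies for repair. A hypothetical sketch, assuming the order service and inventory service each expose a read API whose results can be turned into an order-ID-to-quantity map (the function name and data shapes are made up); repairs would then go back through the services rather than around them.

```python
from typing import Dict, List, Tuple


def audit(orders: Dict[str, int],
          reservations: Dict[str, int]) -> List[Tuple[str, str]]:
    """Compare each order's quantity with the inventory reservation for it.

    Returns (order_id, problem) pairs for a repair job or a human to act on.
    """
    problems: List[Tuple[str, str]] = []
    for order_id, qty in sorted(orders.items()):
        reserved = reservations.get(order_id)
        if reserved is None:
            problems.append((order_id, "no reservation"))
        elif reserved != qty:
            problems.append((order_id, f"quantity mismatch {qty} != {reserved}"))
    # Reservations with no matching order (e.g. a failed order never undone).
    for order_id in sorted(reservations.keys() - orders.keys()):
        problems.append((order_id, "orphan reservation"))
    return problems
```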
Management doesn't care. Management doesn't read this or really even understand what a microservice is. They only know their VP told them to use them and they are the current hot stuff.
I'll be honest I get pretty sick of these types of "management doesn't care" comments. Not because they're wrong but because they ignore the obvious solution.
If someone at VP-level is making low-level tech decisions, GTFO. If your non-technical executive management even wants to know what the low-level tech driving their business is, GTFO. If your manager, Director, VP, execs, etc will not listen to honest, calm "we really don't need ________ because {5 rational, evidence-backed reasons}," GTFO.
This, but also - these arguments tend to be very one-sided. The implication is that engineers would always do the right thing if pesky management just got out of the way. Is that true sometimes? Of course, there are plenty of shit management teams in the world. There are also plenty of engineers that, left undirected, would add unnecessary scope, introduce unnecessary technology, and create different types of problems.
If there really was one answer, and the answer was just as simple as "get rid of the whole management team, and you'll have a much better product at the end" then I have to imagine companies would have started doing this already. My experience being on both sides of this coin in my career is that: it's just not that easy.
Management shouldn't care about the stack because that isn't management's job. That's the IT Department's job. The scope of the project is management's job, as is the budget; the friction comes when they want to eat elephant on a dormouse budget, which is a good time for an IT staffer to leave if management is intransigent about refusing to understand trade-offs, but it's also the IT Department's job to explain costs and benefits in a way the non-technical can understand.
If the CEO is the lead developer and the head of IT and the CFO what signs all them checks, that's obviously different, and in that case everyone should probably understand everything. Otherwise, ask yourself how deep management gets into the minutiae of washing the toilets.
GTFO is easier said than done during these times. Jobs are about picking the set of awfulness one can tolerate. Microservices are one of the less awful options because they multiply the work by 10x. Management is probably picking microservices because they want a huge team so they can get a promo. Believe me, I know the awfulness I am putting up with. I just hit 20 years and management doesn't want to know what I think. In fact, four months into the job, I've had maybe two opportunities to talk tech with the manager, but I won't, because they picked the architecture and told me about it through Jira tasks.
Yes, they mostly won't care and will insist on microservices, but they won't ask for distributed transactions. I always try to get the work done with a batch process instead of having services call each other.
I feel like the e-commerce inventory example isn't a great one because the problem is generally solved by avoiding it.
Either:
Set the inventory amount in your e-commerce system to be less than the actual inventory (which is rarely accurate anyway). This is your safety stock and depends on how fast moving the item is and if it's a close-out that you are really trying to sell to zero. Then just handle exceptions at allocation-time when you're able to commit stock.
Or:
Avoid a two-phase commit problem by allocating stock at add-to-cart time with a ticketing system that allows a hold to be placed with a timeout. This is a more customer-friendly approach that handles stampedes better, such as those caused by marketing emails.
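Both options can be sketched together for a single SKU: the storefront shows on-hand stock minus a safety buffer minus live holds, add-to-cart places a hold that expires after a timeout, and checkout turns a live hold into a sale. This is a hypothetical in-memory illustration (class and method names are made up); a real system would keep holds in a shared store and still handle exceptions at allocation time when the physical count turns out to be wrong.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Stock:
    """Hypothetical single-SKU stock with a safety buffer and timed cart holds."""
    on_hand: int
    safety_stock: int
    hold_seconds: float = 900.0
    clock: Callable[[], float] = time.monotonic
    # hold_id -> (qty, expires_at)
    holds: Dict[str, Tuple[int, float]] = field(default_factory=dict)

    def _expire(self) -> None:
        now = self.clock()
        for hold_id in [h for h, (_, exp) in self.holds.items() if exp <= now]:
            del self.holds[hold_id]

    def available(self) -> int:
        """What the storefront shows: on hand minus buffer minus live holds."""
        self._expire()
        held = sum(qty for qty, _ in self.holds.values())
        return max(0, self.on_hand - self.safety_stock - held)

    def hold(self, qty: int) -> Optional[str]:
        """Place a hold at add-to-cart time; None if not enough is available."""
        if qty > self.available():
            return None
        hold_id = str(uuid.uuid4())
        self.holds[hold_id] = (qty, self.clock() + self.hold_seconds)
        return hold_id

    def commit(self, hold_id: str) -> bool:
        """Turn a live hold into a sale at checkout; False if it has expired."""
        self._expire()
        entry = self.holds.pop(hold_id, None)
        if entry is None:
            return False
        self.on_hand -= entry[0]
        return True
```

Injecting the clock keeps hold expiry testable without sleeping.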
Either way, inventory management is like banking, aiming to be eventually consistent is a lot more realistic than being always consistent.
"Either way, inventory management is like banking, aiming to be eventually consistent is a lot more realistic than being always consistent."
Yeah, I love those basic DB examples that use bank accounts to show why transactions are important, when in real life banks are all eventually consistent because they had to solve the problem before distributed transactions were available.
Thanks for sharing. If you include scale, the situation might be tricky: you may need separate DBs and endpoints, since there is a limit to how far you can scale vertically. With that in mind, this problem is not avoidable, but the distributed transaction is still avoidable by using eventual consistency or a batch process.
At the end of the day, I don't think anyone should be coding distributed transactions into their apps. If you need them, use a solution that abstracts them for you. Eventually we will get one into Vitess (https://vitess.io) when we have figured out something general-purpose enough to work for lots of workloads.
In one situation, we had a similar problem of scaling and consistency not going hand in hand; at that time, a batch process solved a lot of our problems.
Hmm, there are some reasons this isn't directly applicable. This mode of behavior is possible because it relies on the payment processor's ability to reliably transfer money, and on the business's ability to verify that potential. I don't think Starbucks's approach would work if they accepted handwritten IOUs.
If you can't rely on an actor's intent to such a degree, your actions also need to be less costly to be efficient.
Correlated IDs and idempotent endpoints with retry are pretty much a two-phase commit. Something eventually checks and deals with exceptions.
Accounting for loss and developing ways to operate with it make a process viable.
A long time ago in a galaxy far away, I remember wishing that our “SOA” stack had support for WS-Transaction. Having come from an Oracle DB experience, I couldn’t understand why anyone would willingly go “backwards” and give up the ability to elegantly manage distributed transactions.
It seems the term "scaling" changes everything. SOA is good, but in many cases scaling the SOA might not solve the problems, and you need to pick the path of optimization; microservices are one possible way to optimize the service. Now, if you need transactions as well (not in all cases), it becomes tricky.
Speaking of micro transactions - this was a recent serendipitous discovery that I’m still digesting but thoroughly enjoying: https://platformdesigntoolkit.com/
> Don’t try to build two-phase commit, instead go for an arbitrator pattern which essentially supports resiliency, retry, error handling, timeout handling, and rollback.
https://blog.couchbase.com/saga-pattern-implement-business-t...
https://microservices.io/patterns/data/saga.html
https://developers.redhat.com/blog/2018/10/01/patterns-for-d...
https://www.enterpriseintegrationpatterns.com/patterns/conve...
https://en.wikipedia.org/wiki/Compensating_transaction