What if, in a given use case, multiple microservices are involved but the operations must be transactional: if one of the services fails, all previous operations must roll back. What are the recommended ways of implementing this kind of transactional behavior in a modern HTTP/REST microservices architecture?
I know the pattern is called "distributed transactions" and is often related to two-phase commit protocol. But there doesn't seem to be a lot of practical information available about this topic!
I found this recent presentation that talks about it, but I'd like to learn more on the subject. Also, I'm looking for practical tutorials, not highly academic ones! I'd really love to see code samples, for instance.
Any links, suggestions?
Transactions can be cleanly replaced with reservations in most cases, i.e. "I'll reserve this stock for 10 minutes", after which point the reservation is invalid. So a typical flow for an order-pipeline payment failure would be:
1. Client places order to order service.
2. Order service calls ERP service and places reservation on stuff for 10 minutes.
3. Order service calls payment service (which is sloooow and takes 2-3 mins for a callback) and issues payment.
4. Payment service fails or payment fails.
5. Order service correlation times out.
6. Order service calls notification service and tells buyer that their transaction timed out and cancels the order.
7. ERP service doesn't hear back from the order service and kills reservation.
At step (4) you have an option to just chuck the message back on the bus to try again after say 2 minutes. If everything times out, meh.
Also if the ERP says "in ten days you can have that amount of potatoes" you can ask for a longer reservation and issue the payment later.
It's all about careful ordering and atomicity at the service level and determining what must be done synchronously and what can be done asynchronously.
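That reservation flow can be sketched in code. A toy sketch, assuming an in-memory store and made-up names (`ReservationStore`, `commit`, `sweep` are all hypothetical; a real ERP service would persist reservations):

```python
import time
import uuid

class ReservationStore:
    """Toy in-memory reservations with a TTL (hypothetical; a real ERP
    service would persist these)."""

    def __init__(self):
        self._reservations = {}  # id -> (sku, qty, expires_at)

    def reserve(self, sku, qty, ttl_seconds=600):
        rid = str(uuid.uuid4())
        self._reservations[rid] = (sku, qty, time.time() + ttl_seconds)
        return rid  # the order service correlates on this id

    def is_active(self, rid):
        entry = self._reservations.get(rid)
        return entry is not None and entry[2] > time.time()

    def commit(self, rid):
        """Make the reservation permanent once payment succeeds."""
        if not self.is_active(rid):
            raise RuntimeError("reservation expired or unknown")
        sku, qty, _ = self._reservations[rid]
        self._reservations[rid] = (sku, qty, float("inf"))

    def sweep(self):
        """ERP-side cleanup (step 7): drop reservations nobody confirmed."""
        now = time.time()
        self._reservations = {
            rid: v for rid, v in self._reservations.items() if v[2] > now
        }

store = ReservationStore()
rid = store.reserve("potatoes", 100, ttl_seconds=600)  # step 2
# ... payment runs; if it succeeds before the TTL expires:
store.commit(rid)  # otherwise sweep() eventually drops the reservation
```

The nice property is that the failure path needs no coordination: if nobody calls `commit`, the reservation simply ages out.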
1. reservation is placed.
2. Payment succeeds, but either success is not known, the process requesting payment crashes before the response, etc.
Since we never got to telling ERP "hey, that reservation will be permanent because the payment succeeded", but the payment succeeded… what do you do? Does the reservation expire (but my potatoes!)? How do you even know that the payment succeeded, if perhaps a network connection goes dark and requires 2h to fix?
What happens when your payment processor succeeds in processing the transaction, but you don't get the success code? You either retry/confirm/correct... One would assume that, upon not getting confirmation that your reservation was made permanent, you would retry the commit; if it was already committed, the ERP service can return the appropriate response.
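One common way to make that retry safe is an idempotency key. A toy sketch (the `PaymentService` name and in-memory dict are made up; a real service would persist the key-to-result mapping):

```python
class PaymentService:
    """Toy service that deduplicates requests by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency_key -> recorded outcome

    def charge(self, idempotency_key, amount):
        # If we've already seen this key, return the recorded outcome
        # instead of charging twice; this makes blind retries safe.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = {"status": "charged", "amount": amount}
        self._results[idempotency_key] = result
        return result

svc = PaymentService()
first = svc.charge("order-42", 100)
# The caller never saw the response (network went dark), so it retries:
retry = svc.charge("order-42", 100)
assert retry is first  # same outcome, no double charge
```

The same trick applies to the reservation-commit retry: commit with the same key until you get a definitive answer.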
At step 3 in your list above, the payment would time out and a refund would be issued. Payments usually time out as well, so you can reserve cash too (a pre-auth, in banking terms). So we end up with stacks of reservations.
If something breaks you can retry within a reasonable limit or wait for everything to drop all the reservations.
Overall, you have to learn to love eventual consistency, but small portions of the domain should absolutely be clustered together around transactional consistency needs that are absolutely necessary.
Check out "Implementing Domain Driven Design" by Vaughn Vernon; chapter 10 in particular talks about this.
1) Use a linearizable data store to store transaction metadata
2) Each step must have a complementary rollback step
3) Rollback steps must be idempotent. Depending on the type of transaction, sometimes 'rollback' is too strong and you instead implement other types of recovery (e.g. 'roll-forward')
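Those three rules sketch out as a minimal saga-style coordinator. Everything here is hypothetical and in-process; in production the `completed` log would live in the linearizable store from point 1:

```python
def run_saga(steps):
    """Run (action, compensation) pairs; on failure, run the completed
    steps' compensations in reverse order.

    Each compensation must be idempotent, since a crash mid-rollback
    means it may be retried later.
    """
    completed = []  # transaction log; in production, persist this
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()  # safe to repeat: idempotent by contract
        raise

# Hypothetical order-flow steps.
state = {"reserved": False, "charged": False}

def reserve():
    state["reserved"] = True

def unreserve():
    state["reserved"] = False  # idempotent: setting False twice is harmless

def charge():
    raise RuntimeError("payment declined")

def refund():
    state["charged"] = False  # idempotent for the same reason

try:
    run_saga([(reserve, unreserve), (charge, refund)])
except RuntimeError:
    pass  # the saga re-raises after compensating

assert state == {"reserved": False, "charged": False}  # reservation rolled back
```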
Didn't get many useful replies. http://ws-rest.org/2014/sites/default/files/wsrest2014_submi... looked promising.
Though it probably needs some JSON and more use of HTTP-specific features in order to be acceptable today.
A concrete example we've faced. A certain operation requires writing data to N flaky services. You successfully write to N-1 of them, but the Nth fails. Now what do you do?
If these N things were just database writes to the same DB, transactions would save you, as you could just rollback. Without that, the answer has to be handled in code -- do you reverse the previous changes (if possible) by sending delete events, or leave the system in some sort of half-baked state and rectify things later via some other process? (I'm interested in hearing of other options...)
The answers I got were:
1) apologetic computing (Amazon)
2) consensus algorithms / paxos
The problem I see is that these may be non-trivial to implement and/or not fully understood or standardized.
This one really works well.
There are a lot of approaches to this. I've explored these ideas with Obvious Architecture (http://retromocha.com/obvious/) and the talk I gave at MWRC 2015 on Message Oriented Programming (http://brianknapp.me/message-oriented-programming/).
I think the big lesson is that the Erlang stuff was WAY ahead of its time and it already solved a lot of the problems of large networked systems decades ago. Now that we are all building networked systems, we are relearning the same lessons telcos did a long time ago.
"Almost all the successful microservice stories have started with a monolith that got too big and was broken up.
Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble."
Also good: https://rclayton.silvrback.com/failing-at-microservices
There may be good ways of managing that (refactoring at the architectural level, essentially; microservices are in some ways like OOP on a different level), but the practices to do so probably haven't yet been developed.
That said, I feel the quote should be: "As of 2015, almost all successful microservice stories..."
As the tooling and knowledge around microservices builds up over the next few years, I could imagine a world where starting with microservices makes sense. For a new company, the flexibility you get with microservices to try new tech and throw out failed experiments could result in much faster iterations, helping to nail the product-market fit.
Breaking a monolith into services is difficult, but it's much harder to "rebalance" microservices once your product grows and you realize you have gotten the interface wrong. The monolith stage is important because it helps you figure out what the hell you're building. Establishing service boundaries too early risks getting you "stuck" in the wrong architecture.
Looking at the monolithic architecture, they just took each feature within the monolith and turned it into a microservice. Just because you have a monolith doesn't mean you can't have well-thought-out features and separation of concerns.
Before coding, before deciding on architecture, I like to think in these terms. What features make the most sense together? Far apart? It should be a prerequisite of any project, regardless of architecture. If you're building a monolith, each one just goes in a different module or package rather than having its own service.
And I was working on the premise that you are already doing microservices, so presumably you are already taking the overhead hit.
In the context of our little conversation here, Microservices is not an either-or choice. There's quite a penalty you take to productivity/agility/cost with a Microservices architecture just like there was with SOA. It's not free, even if you believe it is "right". Take a look at this: http://martinfowler.com/bliki/MicroservicePremium.html
So, YAGNI certainly does apply here, and I do toss the acronym around lightly on purpose because that is the blunt response we programmers need to hear and give WAY more often. You ain't gonna need it!!!
Architecture astronauts are everywhere and they are mostly a-holes that create chaos for the rank-and-file. You want hell? Ok, go smash your head against the wall implementing another BDFL's pipe dream.
We developers are most to blame in this and we need to cut it out with all the fun meta-work we like to create for ourselves. Run a tight ship, be professional, deliver precisely the product that our customers ask for with no extra bells or whistles.
When you have Mt. Everest size workloads like Netflix has, and you need maximum isolation and monitoring and deployment flexibility then, yeah, you're in another league and Microservices is a really awesome approach. I'm guessing you're in my league though, so, I'm doing you a favor here, you can thank me later: YAGNI.
Separating everything out into little APIs all with their own datastores that all talk to each other sounds great, but I would not want to do this on a three-person team. Just give me an old-fashioned monolithic API, a large database, and then I can spend 80% of my time programming and 20% on maintenance. One app is hard enough to run, why consciously choose to run 10 of them if you don't have the capacity?
I don't think microservices are a bad idea at all...I love the architecture. But the hype makes it hard to see that this architecture probably isn't for you unless you have the capacity for it.
I agree they're not for every team, but it definitely allowed us to move, grow and scale faster than other dev environments I've worked in.
Consider the following things that need to be done here.
- A main library that needs to load up a few gigs of data in memory
- A process that communicates with a queue of messages coming in
- A process that interfaces with mobile app (port x)
- A process that interfaces with a different kind of app (port y)
The goal is - every incoming message needs to go through to the main library and back to the app via the queue.
Monolith option: main.cc, which contains all of this. It takes a while to start and can't queue up incoming messages till everything starts up and loads into memory, even using threads and whatnot.
Now with microservices,
- I can build a service that exposes my big-data-load library through a port. This can be loaded and restarted at will.
- Queue is running as a separate process. Messages queue when main lib is down and processed later.
- Server A and server B run separately
- A bug in one won't crash all the others
- I can manage each service independently (run them via supervisor or whatnot)
- Scaling it is easy - I can deploy each service behind load balancers, on different machines in the future without ever needing to change anything but the urls in a config file
- Monitoring - I have latencies for each service available via haproxy and the like.
If you're building a REST interface to all your services, and something consumes them - they might be slower than a monolithic app unless you have something like a TCP or HTTP level keep-alive built in. Connections need to be long standing - otherwise the overhead of creating a new connection is pretty high.
Question here - what is a good way to make this long standing connection happen? Eg, if you use python urllib3 and nginx - can you keep these connections alive enough (with pings or whatever) that your latency is lower than bundling that service within the code itself as a library?
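One way to convince yourself keep-alive is working is to watch the socket get reused. A self-contained sketch using only the Python standard library (urllib3's `PoolManager` does this connection pooling for you in practice, and nginx honors HTTP/1.1 keep-alive on client connections via its `keepalive_timeout` directive):

```python
import http.client
import http.server
import threading

# Tiny local server so the example is self-contained.
class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables keep-alive

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection = one TCP connection; with HTTP/1.1 keep-alive,
# successive requests reuse it instead of paying a new handshake each time.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
sockets = []
for _ in range(3):
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()  # must drain the body before reusing the connection
    sockets.append(id(conn.sock))

assert len(set(sockets)) == 1  # same socket across all three requests
conn.close()
server.shutdown()
```

With a pool of long-lived connections like this, the per-call overhead drops to roughly one round trip, which is the best case short of linking the service in as a library.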
What I am more worried about with microservices is the data serialization overhead. Transforming data to some encoding that is robust against version changes (say using protobuf) can be quite costly on both sender and receiver, especially in languages with relatively slow object creation (e.g. python). This is highly application specific, but I'd love to hear others' thoughts on this trade-off.
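One way to ground that worry is to measure it for your own payloads. A quick sketch with the standard library (the payload shape is made up; numbers vary wildly by machine and data, so measure rather than guess):

```python
import json
import timeit

# Hypothetical payload roughly resembling a service response.
payload = {
    "ids": list(range(1000)),
    "name": "example",
    "nested": [{"k": i} for i in range(100)],
}

n = 1000
encode = timeit.timeit(lambda: json.dumps(payload), number=n)
wire = json.dumps(payload)
decode = timeit.timeit(lambda: json.loads(wire), number=n)

# total seconds / n iterations * 1e6 = microseconds per operation
print(f"encode: {encode / n * 1e6:.1f} us/op, decode: {decode / n * 1e6:.1f} us/op")
```

Swapping `json` for your actual encoder (protobuf, msgpack, etc.) in the same harness gives a fair apples-to-apples comparison for your data.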
That said, to my point of keep-alives: let's say you've got process A talking to process B on localhost, which is making web service calls to the internet. Every 5 ms of delay hurts your total response time, especially when your connection pool is waiting on a dropped connection to be reinstated.
"On the surface, the Microservice architecture pattern is similar to SOA. With both approaches, the architecture consists of a set of services. However, one way to think about the Microservice architecture pattern is that it’s SOA without the commercialization and perceived baggage of web service specifications (WS-) and an Enterprise Service Bus (ESB). Microservice-based applications favor simpler, lightweight protocols such as REST, rather than WS-. They also very much avoid using ESBs and instead implement ESB-like functionality in the microservices themselves. The Microservice architecture pattern also rejects other parts of SOA, such as the concept of a canonical schema."
So SOA implies the existence of some heavy enterprise tools like WSDL and SOAP or other RPC type systems. Microservices favor RESTful interfaces.
If microservices catch on, expect five years from now we'll be talking nano-services, and how microservices imply a whole stack of enterprise services that will have grown up around the microservices architecture.
> I cannot think of the "monolithic" variation they depicted as a "service-oriented architecture" of any kind.
I think you're on the right track here. Note that the article uses "monolithic application", not "monolithic services"; it really is the lack of services.
> unless "services" is synonymous with "API" and for my part that is too general a definition to be of much use
I agree that that is too general. To me, a microservice would of course expose an API, but whereas a monolithic application exposes the entire application (or, perhaps, the entire API for everything your company does…), a microservice is exposing a small facet of the overall application, and is only responsible for that facet.
I probably agree with you that "services" is probably what most of us are after; I think the "micro" may just be an attempt to re-emphasize that it shouldn't be one huge thing, and that perhaps you should split services off sooner, rather than later, as it only ever gets harder.
Then again, I've never worked with in a microservices-oriented architecture.
The Microservice architecture pattern significantly impacts the relationship between the application and the database. Rather than sharing a single database schema with other services, each service has its own database schema.
Is this a necessary prerequisite? One of the problems I'm dealing with now (and have been in the past) is the tyranny of multiple data stores. At any reasonable scale, this quickly leads to a lack of consistency, no matter how much you'd like to try.
It feels like most of the gain in a microservices architecture is from functional decomposition of code, with limited benefit from discarding the 'Canonical Schema' of SOA. I'd be interested to hear others' experiences with this, though.
Each of our services is a separate django app, and the database name is <consistent prefix>_<app name>. Originally, this meant we had 5-6 database schemas named <something>_friend, <something>_invite, <something>_news, etc., all on one database.
What ended up happening was some services rapidly outgrew the capacity of a single database server, such as our 'news' service, which handles chat services, private messages, and so on (and thus grows nonlinearly with community growth), unlike other services which grow linearly (like our 'identity' service). As a result, the 'news' database had to move to its own server. Thanks to this database schema separation, however, this was a trivial task. Dump the schema, restore the schema, change the DB host in the django config, and you're done.
If we had our data intermingled in the same schema, it would have been far, far harder to do this.
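For anyone curious what that looks like in Django terms, a hedged sketch (database names, hosts, and the router are made up; the real setup presumably differs):

```python
# settings.py sketch: one database alias per service, so moving a hot
# service's schema to its own server is just a host change here.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp_identity",
        "HOST": "db1.internal",  # hypothetical shared server
    },
    "news": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp_news",
        "HOST": "db2.internal",  # moved off the shared server once it outgrew it
    },
}

class ServiceRouter:
    """Database router sending each service app's queries to its own alias
    (wired up via Django's DATABASE_ROUTERS setting)."""

    service_apps = {"news"}

    def db_for_read(self, model, **hints):
        label = model._meta.app_label
        return label if label in self.service_apps else "default"

    db_for_write = db_for_read  # reads and writes go to the same alias
```

Because nothing outside the 'news' app ever touches the 'news' alias, the dump/restore/repoint move described above stays a config-only change.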
Fundamentally, your 'microservices' style architecture should be designed in such a way that you could take any of your services, tar up the code, and e-mail it to someone else, and they could use it in their architecture. For obvious reasons this isn't actually feasible (e.g. service interdependencies), but conceptually you should be able to draw firm, hard lines down your stack showing where each service starts and ends; this includes frontend services (nginx/haproxy/varnish/whatever configs), code (including interface definitions/client libraries), data persistence (database schemas, MongoDB collections, etc), and caching (Redis/Memcached/etc. instances).
The more interdependencies you have, the more problems you'll encounter down the road. If you intermingle MySQL data then any maintenance is downtime, any slowdown slows everything, any tuning is across your entire dataset, etc.
Consistency should be maintained at the application level if you want to build a robust service, because doing it in the database leads to a single point of failure (the database).
For something to really be using a microservice architecture? Yes.
Of course, real-world systems don't have to use pure architectural styles, though it's worth understanding why a named architectural style combines certain features before deciding to use some but not others.
> One of the problems I'm dealing with now (and have been in the past) is the tyranny of multiple data stores. At any reasonable scale, this quickly leads to a lack of consistency, no matter how much you'd like to try.
Honestly, I think if you have real inconsistency (rather than differences in data of similar form but different semantic meaning) with microservices with separate data stores, it means that you have designed your services improperly, such that they have overlapping responsibility.
If consistency is a necessary concern, and you have tightly coupled data, it's not a terrible idea to make the services a little bigger.
But also, if you have a service that depends on multiple other services to do work, I don't think it's so bad to get used to using the API for the other services (rather than trying to access their databases directly) -- despite the introduced latency overhead
Conceptually, though, I don't think it is a requirement.
Not only will there be resistance to the idea of splitting out that datastore, but major investment will be required to do it - implementing all of that disconnected messaging stuff you're going to need, reworking applications/services to communicate that way, and handling eventual consistency - which is a tough sell when the app works "perfectly fine" except for that scaling problem.
It's just a matter of time until someone writes a "nano-service" manifesto...
Prefix  Symbol     Size                               Example
yocto   y          1 bit                              Theoretical minimum
zepto   z          1 byte (close enough to 10 bits)   Really small APL program
atto    a          10 chars                           nc -l 8080
femto   f          1 line (roughly 100 chars)         netcat piped into something else
pico    p          10 lines                           tiny python service
nano    n          100 lines                          small python service
micro   μ          1,000 lines                        typical "smallish" service
milli   m          10,000 lines                       about as big as "microservices" would go these days, or a small monolithic app
centi   c          100,000 lines                      decent-sized monolithic app
deci    d          1 million lines                    large monolithic app
(none)  n/a        10 million lines                   roughly OS-level app
deca    da         100 million lines                  god help you beyond here
hecto   h          1 billion lines
kilo    k          10 billion lines
mega    M          100 billion lines
giga    G          1 trillion lines
tera    T          10 trillion lines
peta    P          100 trillion lines
exa     E          1 quadrillion lines
zetta   Z          10 quadrillion lines
yotta   Y          100 quadrillion lines
googol  NYSE:GOOG  ??? lines                          Google
I've considered using ZeroMQ Request/Response interfaces with a defined JSON/UTF8->GZ instead of REST layer... My testing worked pretty well, and it could even be used behind http requests (packaged).. with 0mq, you can setup layers of distribution for service points.
At one level or another micro services architecture trades complexity of an application as a whole for complexity in the system as a whole. In the end, most of the services being used in practice could handle the few ms of overhead that http/tcp rest services had over 0mq...
The hardest thing for me was simplifying things as much as possible; I worked really hard to avoid the SPOFs that many tend to end up with. In the end, instead of the likes of etcd, table storage with a caching client was sufficient... instead of complicated distribution patterns, for a very small load, having a couple of instances of each service on each machine was easier.
It really comes down to what you really need vs. what's simple enough to get the job done without locking you down. In the end, I love docker (and dokku-alt), but things like coreos, etcd, and fabric turned out to be overkill for our needs.
There are a couple of different ways that are obvious:
1) The act of scheduling a trip requires the trip service to get information from the driver service related to the driver for the trip (the action might be triggered from either service). While information about the driver in the driver service might change, the information about the driver that was recorded with that trip is fixed. All the information necessary to answer queries about involved drivers that are within the scope of the trip service is stored in the datastore for that service. The same thing is generally true of all services.
2) For generalized reporting, information required to support that function is sent by various services to a separate reporting service, which aggregates historical data for reporting purposes. (Even non-microservice architectures often involve this, having transactional operational databases export data into a analytical database, with different schema and capabilities, for reporting purposes, rather than using one datastore for operational and reporting use.)
I wrote about it here: http://productivedetour.blogspot.com.es/2014/12/connecting-t...
Notice his use of Kafka and 0mq to create an SOA for microservices. All of the stuff I had seen previously had the services communicate with each other via REST. So everything is synchronous. In the talk whose link I posted above, communication is asynchronous via messages. Is it reasonable to do both?
Also, how the heck does one deal with transactions across services?
Compare that to synchronous services, where the application has to be aware of all of them and coordinate calling them all. It is easier to use synchronous when the application requires an output from the service to respond.
Transactions across services are certainly not easy:
"Distributed transactions are notoriously difficult to implement and as a consequence microservice architectures emphasize transactionless coordination between services, with explicit recognition that consistency may only be eventual consistency and problems are dealt with by compensating operations."
The same thing is happening with microservices. Engineers are microservicing ALL THE THINGS at such a fine-grained level that it becomes a nightmare to maintain, orchestrate and manage. Therefore, "microservices suck!"
Most of the time, you can model your domain into a few key areas, say 'customers/security', 'interface' and 'processing'. That's a good 3 service start. You may never need to go beyond that. However, as your needs grow or change, you can start to refine your model based on changing business needs or scale/performance/infrastructure issues.
In my experience it's a completely logical way to design a system and is really no different than making 'libraries' of code all housed under a single master application. The only real difference is the underlying communication infrastructure.
For several reasons, I think our move toward microservices is a good one. But in our case I have seen complexity move from code to coordination.
Are you referring to mashups in that the application is using external APIs? I'd say that's similar with the difference being you don't have to coordinate the deployment part. With mashups you still have the testing challenge and reliance on another system to be running that comes with a microservice architecture.
* All RESTful services must be up and running for application to be fully up and running
* application must have all knowledge of RESTful services it calls
With messaging (and PubSub), services don't need to be running at all times and you can add as many services you'd like without the application needing to know. Applications just says "hey, something happened" and services go to work.
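That decoupling can be sketched with a toy in-process bus (topic names and handlers are made up; a real system would use a broker like Kafka or RabbitMQ, which additionally buffers events so subscribers can be down when they're published):

```python
from collections import defaultdict

class Bus:
    """Minimal in-process publish/subscribe bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher doesn't know who (or how many) will handle this,
        # so new services can be added without touching the publisher.
        for handler in self._subscribers[topic]:
            handler(event)

bus = Bus()
handled = []
bus.subscribe("order.placed", lambda e: handled.append(("email", e)))
bus.subscribe("order.placed", lambda e: handled.append(("inventory", e)))

# The application just says "hey, something happened":
bus.publish("order.placed", {"order_id": 42})
```

Contrast with REST: the publisher here has one line of code no matter how many downstream services react to the event.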
I agree with you; in either case, it is important to code defensively and be aware of possible request version mismatches. Deployment coordination is probably a wash between the two approaches. I think testing is more of a challenge with message-oriented too, as most integration testing tools are geared towards HTTP interactions.