Most people think a micro-service architecture is a panacea because "look at how simple X is," but it's not that simple. It's now a distributed system, and very likely the worst of the worst: a distributed monolith. Distributed systems are hard. I know, I build them.
Three signs you have a distributed monolith:
1. You're duplicating tables (information) into another database without transforming the data into something new (adding information). That's the worst cache ever; enjoy the split-brain.
2. Service X does not work without Y or Z, and/or you have no strategy for how to deal with one of them going down.
2.5 Bonus, there is likely no way to meaningfully decouple the services. Service X can be "tolerant" of service Y's failure, but it cannot ever function without service Y.
3. You push all your data over an event bus to keep your services "in sync" with each other, taking a hot shit on the idea of a "transaction." Over time the event bus pushes your data further out of sync, making you think you need an even better event bus... You need transactions, and (clicks over to the Jepsen series and laughs) good luck rolling those on your own...
I'm not saying service-oriented architectures are bad, and I'm not saying services are bad; they're absolutely not. They're a tool for a job, and one that comes with a lot of footguns and pitfalls, many of which people are not prepared for when they ship that first microservice.
I didn't even touch on the additional infrastructure and testing burden that a fleet of microservices brings.
 Simple tip: Don't duplicate data without adding value to it. Just don't.
We learned that, and we're still learning it.
I think the naming of the concept has been detrimental to its interpretation. In reality, most of the time what we really want is "one or more reasonably sized systems with well-enough-defined responsibility boundaries".
Perhaps "Service Right-Sizing" would steer people to better decisions. Alas, that "Microservices" objectively sounds sexier.
We're currently translating a 20-year-old ~50MLOC codebase into a distributed monolith (using a variety of approaches that all approximate the strangler pattern). I have far less motivation to go to work if I know that I will be buried in the old monorepo. I can change a service, build it, and get the change deployed in less than an hour. Touching the monorepo easily takes 1.5 days for a single change.
We seem to be gaining far more in terms of developer productivity than we are losing to operational overhead.
I also should have mentioned that it's definitely more pleasant for those in purely development roles. Troubleshooting, resiliency, and system effects don't impact everyone (and I actually like those types of hard problems). I'd also suggest that integrating tracing, metrics, and logging in a consistent way is imperative. If you're on Kubernetes, using a proxy like Istio (Envoy) or Linkerd is a great way to get retries, backoff, etc. established without changing code.
Finally, implementing a health-check endpoint on every service, and having any failures properly degrade dependent services, is really helpful both in troubleshooting and ultimately in creating a UI with graceful degradation (toasts with messages about what's not currently available are great). I have great hopes for the health-check RFC that's being developed at https://github.com/inadarei/rfc-healthcheck.
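A minimal sketch of such an endpoint in Python/Flask, loosely following the pass/warn/fail response shape from that draft RFC (the database check here is a hypothetical placeholder, not part of the RFC):

```python
from flask import Flask, jsonify

app = Flask(__name__)

def check_database():
    # Hypothetical probe: replace with a real connectivity check (e.g. SELECT 1).
    return "pass"

@app.route("/health")
def health():
    # One entry per dependency; the RFC keys checks as "component:measurement".
    checks = {"database:connectivity": check_database()}
    status = "fail" if "fail" in checks.values() else "pass"
    resp = jsonify({"status": status, "checks": checks})
    resp.status_code = 200 if status == "pass" else 503
    resp.headers["Content-Type"] = "application/health+json"
    return resp
```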
Whereas if you write a single monolithic program, its connections are described in code, preferably type-checked by a compiler. I think that gives you, at least theoretically, a better chance of understanding which things connect, and how they connect.
So if there were a good programming language for describing micro-services, that would probably resolve many management difficulties, and then the question would simply be whether we get performance benefits from running on multiple processors.
The runtime system has built-in support for concurrency, distribution, and fault tolerance. Because of the design goals for Erlang and its runtime, you get services that can all run on one system or be distributed across a network, but the code that you actually write is relatively simple; the entire distributed system acts as a fault-tolerant VM with functional guarantees.
If your startup node fails, then other nodes are elected. If a node crashes while in the middle of a method, another node will execute the method instead.
The runtime itself has some analogies to functional coding styles. It runs on a register machine rather than a stack machine. The call and return sequence is replaced by direct jumps to the implementation of the next instruction.
Not much is said about S3's design principles in public, but that one was one of them.
Disclaimer: Recalling from memory.
1. Microservices do not have some inherent property of having to duplicate data. You can keep data in a single source and deliver it to anyone who needs it through an API. There are infinitely many caching solutions if you are worried about this becoming a bottleneck.
2 and 2.5. There are tools for microservice architectures that counter these problems to a degree, the mainstream example being containers and container orchestration (e.g. Docker and Kubernetes). One can even argue that microservices force you to build your systems so they are more robust than your monolith would be. If the argument for monoliths is that it's easier to maintain reliability when every egg is in one basket, then you are putting all your bets on that basket, and it becomes a black hole for developer and operations resources, as well as making operational evolution very slow.
3. There are again tools for handling data and "syncing" the services (although I don't like the idea of having to "sync"), for example message queues and stream-processing platforms (e.g. Kafka); see the sketch after this comment. If some data or information might be of interest to multiple services, you push such data to a message queue and consume it from the services that need it. The "syncing" problem sounds like something that arises when you start duplicating your data across services, which shouldn't happen (see my argument on 1.).
Again, not to say microservices are somehow universally better. Just coming to the defense of their core concepts when they get unfairly accused.
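Re point 3 above, a minimal sketch with the kafka-python client (broker address, topic name, and handler are placeholders): one service publishes events of shared interest, and each consuming service reads them in its own consumer group.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Producer side: some service publishes a domain event.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("user-events", {"user_id": 42, "event": "signup"})
producer.flush()

def handle(event):
    print("got", event)  # hypothetical handler in the consuming service

# Consumer side: each interested service uses its own consumer group.
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers="localhost:9092",
    group_id="billing-service",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    handle(message.value)
```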
Now, there is a pattern called Event Sourcing (ES) which proposes that the source of truth should be the event bus itself, and microservice databases are mere projections of this data. This is all well and good, except it's very hard to implement in practice. If a microservice needs to replay all business events from months or years in the past, it may take hours or days to do so. What about the business in the meantime? If it's a service that significantly reduces the usability of your application, you effectively have a long downtime anyway.
Transactional activity becomes incredibly hard in the microservices world, with either two-phase commit (only good for very infrequent transactions due to performance) or so-called sagas (which are very complex to get right and maintain; a minimal sketch follows below).
Any company that isn't delivering its service to billions of users daily will likely suffer far more from microservices than they will benefit.
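To make the saga idea concrete, here is a minimal orchestration-style sketch, not a production pattern: each step is paired with a compensating action, and a failure part-way through runs the compensations in reverse order. All the step functions are hypothetical placeholders; a real implementation also has to persist and retry compensations.

```python
def run_saga(steps):
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception:
        # Undo completed steps in reverse order (best effort here).
        for compensate in reversed(done):
            compensate()
        raise

run_saga([
    (lambda: print("reserve inventory"), lambda: print("release inventory")),
    (lambda: print("charge card"),       lambda: print("refund card")),
    (lambda: print("create shipment"),   lambda: print("cancel shipment")),
])
```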
In my experience it's not hard to implement, but of course it depends on the problem domain (and probably also on not splitting things up willy-nilly because of the microservices fad). I think the key to event sourcing and immutability in general is to not overdo it. For example you will likely need to redact certain data (e.g. for legal compliance), so zero information loss is out. Systems like Kafka are a poor choice for long term data storage, the default retention is 1 week for a reason.
But the things that are wonderful about event sourcing (the ability to inspect, replay and fix because you haven't lost information) mostly materialize over a 1 week timeframe.
If you need to recover a lot of state from the event log, you will need to store aggregated event data at regular intervals to play back from in order to have acceptable performance. But in practice, in many cases the data granularity you need goes down as the data ages anyway, and you do some lossy aggregation as a natural part of your business process (as opposed to dealing with event sourcing performance problems). I.e. for the short timeframe Kafka is the source of truth, but for the stuff you care about long term it's some database, and this happens kind of naturally. So often you don't need to implement checkpointing.
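For cases where you do need checkpointing, the mechanics are roughly this (a toy sketch; the snapshot store and event shapes are made up): persist an aggregated snapshot plus the offset it covers, then a rebuild replays only the tail of the log.

```python
def rebuild_state(load_snapshot, events_since, apply_event):
    state, last_offset = load_snapshot()      # e.g. nightly aggregate + offset
    for offset, event in events_since(last_offset):
        state = apply_event(state, event)     # replay only the recent tail
        last_offset = offset
    return state, last_offset

# Toy usage: account balances, snapshot taken at offset 2.
snapshot = ({"alice": 100}, 2)
tail = [(3, ("alice", +10)), (4, ("alice", -5))]

state, _ = rebuild_state(
    load_snapshot=lambda: snapshot,
    events_since=lambda off: [(o, e) for o, e in tail if o > off],
    apply_event=lambda s, e: {**s, e[0]: s.get(e[0], 0) + e[1]},
)
print(state)  # {'alice': 105}
```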
Once you add in this consolidated data service, every other service is dependent on the "data service" team. This means almost any change you make requires submitting a request. Are their priorities your priorities? I would hate to be reliant on another team for every development task.
Theoretically you could remove this issue by allowing any team to modify the data service, but then at that point you've just taken an application and added a bunch of http calls between method calls.
This same problem pops up with resiliency. If you have a consolidated data service, what happens if your data service goes down? How useful are your other services if they can't access any data?
Re. teams: for any project above a certain size, you'll have teams. Whether the boundary is a network boundary, a process boundary, or a library boundary doesn't change that you'll have multiple teams for a large project.
I'm not sure I get the resiliency point. I worked on a project where the dependent data service was offline painfully frequently. We used async tasks and caching to keep things running and were able to let the users do many tasks. For us our tool was still fairly useful when dependencies went down. If we used monolith then everything would be down, right? That doesn't sound better.
For sure, and one of the big selling points of micro-services is that you can split those teams by micro-service, with each team having an independent service they are responsible for. But when a big chunk of everyone's development is done on one giant service everyone shares, you don't get the same benefits you would if the services were independent. Or put another way, splitting micro-services vertically can yield a bunch of benefits, but splitting them horizontally introduces a lot of pain with few benefits.
> I'm not sure I get the resiliency point. I worked on a project where the dependent data service was offline painfully frequently. We used async tasks and caching to keep things running and were able to let the users do many tasks. For us our tool was still fairly useful when dependencies went down. If we used monolith then everything would be down, right? That doesn't sound better.
I'm not saying to never spin off services. If you have a piece of functionality that you just can't get stable for the life of you, splitting it off into its own service, and coding everything up to be resilient to its failure, makes a lot of sense. (I am very curious what the cause of the data service crashing was that you couldn't fix.)
But micro-services aren't a free lunch for resiliency. You're increasing the number of systems, servers, configurations, and connections, which by default will decrease uptime until you do a ton of work. Not to mention tracking and debugging cross-service failures is much more difficult than on a single server.
What about running multiple versions of a microservice in parallel -- doesn't each need its own separate database that attempts to mirror the others as best it can?
The short answer is "no," as succinctly stated by the, I assume from the name, majestic SideburnsOfDoom. The versions shouldn't _EVER_ be incompatible with each other.
E.g. you need to rename a column.
Do not: rename the column in place, e.g. `ALTER TABLE ... RENAME COLUMN ...`, because your running systems are going to break against the new schema.
Do: add a new column with the new name and migrate data to the new column; once it's good, upgrade the rest of your instances, then drop the old column. This way both versions can run at the same time without breaking anything. Yes, it can be a little tricky to get the data synced into the new column, but that's a lot less tricky than doing it for _every_ table and column.
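Sketched end to end (SQLite here just for brevity, and it needs version 3.35+ for DROP COLUMN; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT)")
con.execute("INSERT INTO users VALUES (1, 'Ada')")

# Expand: add the new column and backfill; old instances still read `name`.
con.execute("ALTER TABLE users ADD COLUMN full_name TEXT")
con.execute("UPDATE users SET full_name = name WHERE full_name IS NULL")
# ...deploy instances that use full_name, dual-writing both columns meanwhile...

# Contract: only after every instance has moved off the old column.
con.execute("ALTER TABLE users DROP COLUMN name")
print(con.execute("SELECT id, full_name FROM users").fetchall())
```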
1) "A" Component
2) "B" Component
3) "A" Server
4) "A" Client
Now you have to worry about network reliability, back-pressure, queuing, retries, security, bandwidth, latency, round-trips, serialization, load-balancing, affinity, transactions, secrets storage, threading, and on and on...
A simple function call translates to a rat's nest of dependency injection, configuration reads, callbacks, and retry loops.
Then if you grow to 4 or more components, you have to start worrying about the topology of the interconnections. Suddenly you may need to add a service bus or orchestrator to reduce the number of point-to-point connections. This is not avoidable, because if you have fewer than 4 components, then why bother breaking things out into micro-services in the first place!?
Now when things go wrong in all sorts of creative ways, some of which are likely still the subject of research papers, heaven help you with the troubleshooting. First, it'll be the brownouts that the load balancer doesn't correctly flag as a failure, then the priority inversions, then the queue filling up, and then it'll get worse from there as the load ramps up.
Meanwhile nothing stops you having a monolithic project with folders called "A", "B", "C", etc... with simple function calls or OO interfaces across the boundaries. For 99.9% of projects out there this is the right way to go. And then, if your business takes off into the stratosphere, nothing stops you converting those function-call interfaces into network interfaces and splitting up your servers (see the sketch below). However, doing this when it's needed means that you know where the split makes sense, and you won't waste time introducing components that don't need individual scaling.
For God's sake, I saw a government department roll out an Azure Service Fabric application with dozens of components for an application with a few hundred users total. Not concurrent. Total.
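The conversion described two comments up can be cheap if the boundary is already a plain interface. A sketch, with a hypothetical billing component and URL: callers code against `charge()`, and only the wiring decides whether it's an in-process call or a network hop.

```python
import requests

class BillingInProcess:
    def charge(self, user_id, cents):
        return {"user_id": user_id, "charged": cents}  # plain function call

class BillingOverHttp:
    def __init__(self, base_url):
        self.base_url = base_url

    def charge(self, user_id, cents):
        # Same interface, now a network hop with a timeout.
        r = requests.post(f"{self.base_url}/charge",
                          json={"user_id": user_id, "cents": cents},
                          timeout=2)
        r.raise_for_status()
        return r.json()

def checkout(billing, user_id):
    return billing.charge(user_id, 999)  # caller doesn't care which one

checkout(BillingInProcess(), 42)  # monolith today, service tomorrow
```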
Nit: if service X could function without service Y, then it seems to follow that service Y should not exist in the first place. And equivalently for the functionality of service Y before some microservice migration.
Recommendations are not strictly necessary, but not showing them will significantly affect the bottom line, so you don't want to skip them if you can help it. But not showing the product page (in a timely manner) because the recommendation engine has a hiccup is even worse for your bottom line.
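Concretely, that trade-off is just a tight timeout plus a fallback; a sketch with a made-up internal URL:

```python
import requests

def get_recommendations(product_id):
    try:
        # Tight budget: recommendations aren't worth a slow product page.
        r = requests.get(f"http://recs.internal/related/{product_id}",
                         timeout=0.2)
        r.raise_for_status()
        return r.json()
    except requests.RequestException:
        return []  # degrade: render the page without recommendations

def render_product_page(product_id):
    recs = get_recommendations(product_id)
    return {"product": product_id, "recommendations": recs}
```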
This line of argument fails to take into consideration any of the reasons why in general microservices are the right tool for the right job.
Yes, it's challenging, and yes, it's a distributed system. Yet with microservices you actually are able to reuse specialized code, software packages, and even third-party services. That cuts down on a lot of dev time and cost, and makes the implementation of a lot of POCs or even MVPs a trivial task.
Take for example Celery. With Celery, all you need to do to implement a queueable background task system that's trivially scalable is to write the background tasks, get a message broker up and running, launch worker instances, and that's it. What would you have to do to achieve the same goal with a monolith? Implement your own producer/consumer that runs on the same instance that serves requests? And aren't you actually developing a distributed system anyway?
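For reference, the Celery setup described is roughly this much code (assuming a locally running Redis as the broker; the task body is a placeholder):

```python
# tasks.py
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def resize_image(image_id):
    print(f"resizing {image_id}")  # placeholder for the real work

# Enqueue from your web process:   resize_image.delay(123)
# Run a worker from the shell:     celery -A tasks worker
```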
that's a little bit of a straw man because that's not the "microservice" architecture this post is talking about. I personally wouldn't call that a "microservice" architecture, I'd call it, "a background queue", although strictly speaking it can be described as such.
what this post is talking about are multiple synchronous pieces of a business case being broken up over the HTTP/process line for no other reason than "we're afraid of overarchitecting our model". This means: your app has some features like auth, billing, personalization, reporting. You start from day one writing all of these as separate HTTPD services rather than just a single application with a variety of endpoints. Even though these areas of functionality may be highly interrelated, you're terrified of breaking out the GoF patterns, using inheritance, or ORMs, because you had some bad experience with that stuff. So instead you spend all your time writing services, spinning up containers, defining complex endpoints that you wouldn't otherwise need... all because you really want to live in a "flat" world. I'm not allowed to leak the details of auth into billing because there's a whole process boundary! Whew, I'm safe.
Never mind that you can have an architecture that is all separate processes with HTTP requests in between, and convert it directly to a monolithic one with the same strict separation of concerns; you just get to lose the network latency and complex marshalling between the components.
Independent services that create a coherent whole enforce isolation barriers. I don't believe in microservices. These things don't need to be micro. They can be just normal-sized services. I don't even particularly care how bad things get internal to those barriers. There are programmers who write shit I think is hot garbage and will cause all kinds of bugs as time goes on. But if they are confined to their own space, and they're solving their problem and are happy and making management happy, then w/e. That piece of the whole will collapse and die eventually, but it won't take everything with it. It's just a piece, and it provided value for a while, so probably worth it from a business sense.
But when you have a monolith? And Developer Dave and Programmer Pete tell Big Boss Bob that they could sure get that feature out quick if only it weren't for those pesky rules preventing them from putting in a few mutexes in that one module so they can just read some data directly from it, and boy wouldn't that be swell? Well, Big Boss Bob says we need to get this feature SHIPPED BOYES so buckle up and put in those mutexes, and now the fucking thread goes into a deadlock state every so often, but it's really intermittent, so you spend hours and late nights debugging the damn thing because yeah, features gotta ship but shit gotta work, and you trace it back to Developer Dave and Programmer Pete and their change, but what do you do? Big Boss Bob said do it that way, and what? You gonna whine to Director Dan about it? Is that gonna get you back your late nights spent figuring out what was going on? Nah.
Make systems that are small enough that when people break them completely, it doesn't mean the end of the world to throw them out and start over.
After my team had taken that project and split it into a bunch of smaller packages, it's not like everyone magically became better programmers. The same people who were introducing fucking spaghetti code that did who-knows-what in overly complex ways were still around, and they were given effectively their own sandboxes. In fact quite a few more teams within the company began using those packages, which just became modules they shipped with. We no longer had to deal with people screwing with core packages because they no longer had ownership, they could no longer make merges into that area of the code base, so we could keep things stable.
So I'm a hard sell on monoliths. Like, I'm not actually pro micro-services, per se. I'm mostly just anti-monolith. Giant code bases with multiple teams simultaneously contributing are doomed to fucking disaster.
What is there about microservices that makes them better at this? Most of the stuff I have seen has been highly coupled.
Also, like I said, not a fan of the "micro" terminology, it just confuses the issue.
And keep in mind that coupling is not the opposite of encapsulation. Encapsulation just means hiding internal state. So unless you're going completely out of your way to break things, you'll have services that communicate through some message passing protocol, which means services are inherently encapsulated. It's not that you can't break that, it's that generally it's hard.
Here's an example of breaking it, though:
You have an API that talks to the database. You need to get users, groups, and preferences. Instead of writing different endpoints to access all of these things individually, clever you decides to simply make an endpoint that is "postQuery" and you give it a sql string and it will return the results. Great. You have now made the API effectively pointless.
Another example: you have an API that needs to perform standard accounting analysis on an internal dataset. Instead of adding endpoints for each calculation (or an endpoint with different inputs), you create an endpoint "eval" that will internally eval a string of the language of your choice. Congrats, you can now use that API to execute arbitrary code! No need to define any more pesky endpoints.
So yeah, absolutely people can make shit decisions and build garbage. It's entirely possible. But hey, at least it's pretty obvious this way and if you see your team do this you can look for another job.
Now that said, with the new Jigsaw module system in the JVM, and the multi-language support that is constantly getting better, a disciplined enough senior management team could enforce module boundaries within the process. It means any change to JVM command-line flags would require the approval of the most senior tech lead, because that's how module-boundary enforcement can be disabled, but if you have that and it works, you would get significant performance and simplicity benefits.
So you have encapsulated services... boss comes and says "we need feature X right away". What if feature X spans all of your microservices? The bad programmer will hack together a monstrosity between multiple services. It's the micro-lith problem: instead of a monolith, you have a monolith disguised as micro services. Now their poor choices are spread across a lot of services and distributed, it's not really confined to just one service.
Wait, what is a good programmer supposed to do in this scenario?
It is not a strawman; it's a concrete example of the technical, practical, operational, and economic advantages of microservices architecture, more specifically service reuse, especially managed services provided by third parties.
While you're grokking how a multithreading library is expected to be used, I've already fired up a message broker that distributes tasks across a pool of worker instances. Why? Because I opted not to go with the monolith and went with the microservices/distributed architecture approach.
I'm so confused by that statement... because I can't for the life of me figure out how you got there.
You absolutely can have a monolith which is multi-threaded or asynchronous and has "resource/task pools." The JVM for instance has threads; I use BEAM (Elixir) personally, and it's even pre-emptively scheduling my tasks in parallel and asynchronously... but I still don't get what multi-threading has to do with microservices.
Microservices and monoliths are boundaries for your application; they aren't implementation details in and of themselves (i.e. "all microservices must be asynchronous" is strictly not true), they're design details. That design can influence the implementation, but they are separate.
E.g. there are plenty of people who use Sidekiq and Redis like you're using Celery but don't call it a microservice. It's just a piece of their monolith, since it's largely the same dependencies.
I mean, consider a gen server mounted in your supervisor tree. It's 5 minutes of work, tops. Doing the same with Kubernetes would require coordinating a message broker, picking a client library, creating a restart strategy, and networking, all of which would add considerably to your development time.
You would take Celery, get a message broker up and running and launch some worker instances.
There are many benefits to having microservices that people seem to forget because they think that everyone interested in microservices is interested in splitting their personal blog into 4 different services.
They take coordination, good CI/CD, and a lot of forethought to ensure each service cooperates in the ecosystem properly, but once established, it can do wonders for dev productivity.
I think if the database gets too overloaded I'll partition certain tree nodes across multiple masters (this is feasible because the framework doesn't rely on a single timestream).
With the level of shared code (the framework) and the single database, it's somewhat monolithic but the actors themselves are quite well-behaved and independent on top of it.
Analysts expect to be able to connect to one system, see their data, and write queries for it. They were never brought into the microservices strategy, and now they're stumped as to how they're supposed to quickly get data out to answer business questions or show customers stuff on a dashboard.
The only answers I've seen so far are either to build really complex/expensive reporting systems that pull data from every source in real time, or do extract/transform/load (ETL) processes like data warehouses do (in which the reporting data lags behind the source systems and doesn't have all the tables), or try to build real time replication to a central database - at which point, you're right back to a monolith.
Reporting on a bunch of different databases is a hard nut to crack.
1. Pay a ton of money to Microsoft for Azure Data Lake, Power BI, etc.
2. Spend 12 months building ETLs from all your microservices to feed a torrent of raw data to your lake.
3. Start to think about what KPIs you want to measure.
4. Sign up for a free Google Analytics account and use that instead.
Okay, sounds reasonable enough for a complex enterprise.
> to feed a torrent of raw data to your lake
Well, there's the problem. Why is it taking a year to export data in its raw, natural state? The entire point of a data lake is that there is no transformation of the data. There's no need to verify the data is accurate. There's no need to make sure it's performant. It's just data exported from one system to another. If the file sizes or record counts match, you're in good shape.
If it's taking a year to simply copy raw data from one system to another, the enterprise has deeper problems than architecture.
Yes, it makes life harder for the data engineers in the future, but it might turn out that analysts only ever need 5% of the data in the lake, and dealing with these schema changes for 5% of the data is easier than carefully planning a public schema for 100% of it.
It can be helpful to include some small amount of metadata in the export though, with things like the source system name, date & time, # of records, and a schema version. Schema version could easily be the latest migration revision, or something like that.
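Something like this, as one possible shape for such a manifest (names and layout are illustrative, not a standard):

```python
import json
from datetime import datetime, timezone

manifest = {
    "source_system": "orders-service",
    "exported_at": datetime.now(timezone.utc).isoformat(),
    "record_count": 48213,
    "schema_version": "20240117_add_full_name",  # e.g. latest migration id
}

# Ship the manifest alongside the raw export file.
with open("orders_2024-01-17.manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)
```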
But if I haven't spent the effort to extract it, do I really own it?
Then you just don't keep red data for longer than 30 days.
1. Source Data -> Data Lake -> ETL Process -> Reporting DataWarehouse(s)/DataMart(s) -> User Queries
2. Source Data -> Data Lake -> User Queries
3. MonolithDB -> User Queries
4. MonolithDB -> ETL Process -> Reporting DataWarehouse(s)/DataMart(s) -> User Queries
A schema change in the source data should be easily updated in the ETL process in example 1. Most changes are minimal (adding, removing, renaming columns). And for a complete schema redesign in the source data, a new entry in the data lake should be created and the owners of the ETL process should decide if the new schema should be mangled to fit their existing reporting tables or to build new ones. Across the four models I outlined above, the first is by far the easiest to update and maintain, IMO.
Another strategy is that the service has an explicit API or report specification; that way, the team that owns the services also owns the problem of continuing to support that while changing their internal implementation.
Of course, whether the benefits are worth the cost is probably organization specific, just like microservices in general.
The reasons for it are hard to explain succinctly in an HN comment, but if you look up "data lake" there will be a lot of explanation. It basically comes down to: "is it better for the data integrators or the data consumers to massage the data?" And the insight behind data lakes was that it's really great when the consumers decide how to massage the data.
To me this sounds great. And honestly you should do the same thing with a monolith. Nothing worse than "oh you can't make that schema change because a customer with a read-only view will have broken reports".
The main benefit of a dumb copy is that the production service is not impacted by reporting, only a copy is. This relates to performance (large queries) but also implementation time.
On the other hand, the notion that “microservices == completely independent of everything else” is an unrealistic one to hold.
But yeah, it's funny how these projects get complicated in larger organizations. Personally I would have rolled something even simpler on gnu/posix tools and scripts, in rather less than a month.
Use off the shelf stuff but be prepared to have to move in a (relative) hurry.
“Data lake” may not be the right answer, but GA certainly isn’t.
One of the key concepts in microservice architecture is data sovereignty. It doesn't matter how or where the data is stored. The only thing that cares about the details of the data storage is the service itself. If you need some data the service operates on for reporting purposes, make an API that gets you this data and make it part of the service. You can architect layers around it: maybe write a separate service that aggregates data from multiple other services into a central analytics database and do reporting from there, or keep requests real-time but introduce a caching layer, or whatever. But you do not simply go and poke your reporting fingers into individual service databases. In a good microservice architecture you should not even be able to do that.
Most APIs are glorified wrappers around individual record-level operations, like "get me this user", or constrained searches that return a portion of the data, maybe paginated. Reporting needs to see all the data. This is a completely different query and service-delivery pattern.
What happens to your API service, written in a memory-managed/garbage-collected language, when you ask it to pull all the data from its bespoke database, pass it through its memory space, then send it back down to the caller? It goes into GC hell, is what.
What happens to your API service when it issues queries for a consistent view of all the data and winds up forcing the database to lock tables? It stops working for users, is what.
There are so many ways to fail when your microservice starts pretending it is a database. It is not. Databases are dedicated services, not libraries, for a reason.
It is also true that analysts should not be given access to service databases, because the schema and semantics are likely to change out from under them.
The least bad solution? The engineering team is responsible for delivering either semantic events or doing the batch transformation themselves into a model that the data team can consume. It's a data delivery format, not an API.
It's not perfect, but what we do is create a bunch of table views that represent each of the core data types in the system. We can then do all of the complex joins to collect the data analysts want into an easy-to-query table, as well as trying to keep the views consistent even as the DB changes.
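A toy illustration of that approach (SQLite here, names invented): the view is the contract with analysts, so the raw tables can change underneath as long as the view is kept consistent.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, usr INTEGER, amt_cents INTEGER);
    CREATE TABLE users  (id INTEGER, email TEXT);

    -- The contract with analysts lives here, not in the raw tables:
    CREATE VIEW order_report AS
        SELECT o.id        AS order_id,
               u.email     AS customer,
               o.amt_cents / 100.0 AS amount
        FROM orders o JOIN users u ON u.id = o.usr;
""")
```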
Can you expand on this a little? Or is there a paper that I can read?
You could say: but oh, why not just return the underlying data without making objects? Well, now you are exposing the underlying data format, which is what we're trying to avoid by giving this job to the service.
Now I try to think about problems as "I have input data of shape X, I need shape Y" and fractally break it down into smaller shape-changes. I am kinda starting to get what those functional programmers are yammering on about.
Since a micro service deals with only its own data and reporting is then across services, we’d need to query across services to get data and make sense of it. If we’d ever need to query all records, then such records would become domain objects in the micro services first before being passed along. A large number of domain objects would require a large amount of memory. Processing and releasing domain objects will result in GC on the released objects.
It’s not like they come up with every report they think they might need while the micro service is being architected. They come up with a new report long after engineers have moved on. If it’s a SQL database, no problem. If it’s some silly resumeware data store, then what?
If you can't ask questions you didn't think of in advance, you didn't collect data.
Why pull it into memory like that? Why not just pump it through a stream?
We have a whole industry around analytics and data, and the tools and processes to build this reporting layer are well established and proven.
Nothing will give you as many nightmares as letting your analysts loose on your micro service persistence layers!
Our databases are open to way too many people. What's worse, they are multi tenant making refactoring really hard.
We used to have a few of those, especially on exadata clusters. Finally carted them out of the local dc after moving to RDS Aurora databases with strict policies. Might have caused 3 or 4 people to quit, but totally worth it for the 500+ people that stayed who now can own their data, schema and development (and be held responsible for it! -- another issue of multi-db-access, it's always someone else's fault). Went from deploying once a day with a 'heads up' message to no-message deploying multiple times per hour.
I cannot imagine people not doing that unless they need stats in real time. For most shopping/banking stuff you can get away with once-in-24-hours dumps, and then analytics can be done on those.
This is why I distrust all of the monolith folks. Yes, it's easier to get your data, but in the long run you create unmaintainable spaghetti that can't ever change without breaking things you can't easily surface.
Monoliths are undisciplined and encourage unhealthy and unsustainable engineering. Microservices enforce separation of concerns and data ownership. It can be done wrong, but when executed correctly results in something you can easily make sense of.
In a microservice architecture it's harder to pretend you're doing it right.
After seeing a few of them, I'd say: "it's less embarrassingly obvious that you're doing it wrong."
But dig into the code for a few endpoints and it usually doesn't take long to find the crazy spaghetti and the poorly-carved-out breaches of separation of responsibilities.
The trick with microservices is that the ecosystem is maturing and there are still lots of ways to screw up other things that are harder to screw up with monoliths. In time 95% of those will go away (my specific prediction is that one day we will write programs that express concurrency and the compiler/toolchain will work out distributing the program across Cloud services--although "Cloud" will be an antiquated term by then--including stitching together the relevant logs, etc and possibly even a coherent distributed debugger experience).
Quite the self-fulfilling prophecy there.
> Yes, it's easier to get your data, but in the long run [...]
Systems can and should be evolved and adapted over time, e.g. deploying components of the monolith as separate services. You can't easily predict what the requirements for your software are going to be in, say, 10 years.
And depending on the stage a company is, easy access to data for business decisions outweighs engineering idealism.
I think there are different levels of sophistication of "engineering idealism". GP talks about "data ownership", and I get the desire to keep the data a microservice is responsible for locked in tightly with it. But let's be precise why it's good: because isolating responsibility reduces complexity. Not because code has some innate right to privacy.
In my own engineering idealism, there's no internal data privacy in the system. Things should be instrumentable, observable in principle. If an analyst wants to take your carefully designed internal NoSQL document structure and plug it into an OLAP cube for some reason, there must be a path to doing that; if that's an expected part of the business, the service needs to have it on the feature list, that this should be doable without degrading the service.
Software needs to be in boxes because otherwise we can't handle it mentally, but the boxes really shouldn't be that black.
YMMV, but the tradeoff is less complexity at the SWE/prod department, and more at the analytics team.
The thing is, it just shifts complexity around. Once you have microservices, you have to deal with a bunch of new failure modes, plus a bunch of extra code whose only purpose is to provide an interface to other services. And in terms of separating data, the worst part is that you've prevented accessing this data together with some other data within the same transaction.
Microservices require your organization to have an engineering culture. I would be afraid of introducing them at, say, Home Depot where (I've heard) your average programmer doesn't even write tests.
If you have engineering talent within a small multiplicative factor of Google (say 0.5), then you can pull off Microservices at your org.
Edit: I'm being downvoted, but I don't think it's a dangerous assumption or point to make that it takes a certain amount of discipline and experience to implement microservices correctly. When you have that technical capacity and the project calls for it, the benefit is tremendous.
I've seen good and bad in each approach. It's certainly possible to enforce good separation of concerns and proper boundaries in monorepos, and also possible to plough a system into the ground with microservices.
They're all just tools in your toolbox and both have a part to play in modern development.
As the article says though, you can't fix a people problem (bad engineering practices and discipline) by going from one technology to another (monolith to microservices).
The same folks aren't going to magically learn how to do distributed computing properly; rather, they will implement unmaintainable spaghetti network calls with all the distributed computing issues on top.
It's about code quality; microservices are easily replaceable. Modules are too.
With both systems, the core part (e.g. mesh, infrastructure, ...) is crucial.
I think experienced developers can see this, the ones that actually delivered products and had big code changes. The ones that handled their "legacy" code.
Microservices are just one way to enforce it; there are others. None of them is perfect or bad; each has its use case.
But to speak directly to your concern, you have to think about service boundaries and granularity correctly. Nobody is saying make a microservice out of every conceivable table. Think about the bigger picture, at a systems level. Wherever you can draw boxes you might have a service boundary.
Why would you need to join payment data to session and login data?
Do you need to compare employee roles and ACLs against product shipping data?
These things belong in different systems. If you keep them in the same monolith, there's the danger that people will write code that intertwines the model in ways it shouldn't. Deploying and ownership become hard problems.
The goal is to keep things that are highly functionally related together in a microservice, and expose an API where the different microservices in your ecosystem are required to interact. (E.g., your employees will log in.)
When the data analytics folks want to do advanced reporting on the joins of these systems (typically offline behavior), you can expose a feed that exports your data. But don't expose an internal view of it to them or they'll find ways of turning you into a monolith.
What also happens is that microservices get created using different languages, which in turn adds so much complexity to understanding what is going on at the big-picture level.
And code gets repeated a lot more. If there is a change or update in a microservice, everyone will need to figure out what services depend on it and how they will have to adapt. With a monolith you can just use your IDE to see what will break if you make a change. So much repeated business logic. Creating a new feature involves many meetings to figure out which services have to be updated, and in what way.
It is a crazy mess, in my opinion.
I have been with a company that had a monolith application which they split up into more than 15 services (some Python, some JS, Scala, Java, etc.). The monolith is still used for some parts that have not been migrated. I was working on a single service, having no idea how the whole system worked together. Then I had to do something in the old parts, and I very quickly got an understanding of how everything works together.
This is what people mean when they say "distributed monolith" vs. microservices.
And holy fuck is debugging that stuff difficult. HUUUUUGE waste of time, but management looooooooooves their blasted microservices...
With microservices, without good documentation of how everything connects, you're going to be left with a very bad impression.
It is still nowhere close to the ability to jump around with an IDE.
It might be in a different language, with different design patterns, and to get to the details you have to check out that project anyway, because you can't document absolutely everything outside of the code base. And if you do, you will end up with multiple sources of truth.
It is so much more likely that for every little issue, which you otherwise might be able to answer yourself very easily, you will have to contact the team owning that microservice.
It is not only mentally exhausting. It is time consuming, it requires so much back and forth. It creates so much dependence on other people because figuring out how things are related is so much more difficult.
Sometimes I have 8 or more different IDE windows open to understand what is going on.
This is the hardest part... I'd argue that this is almost impossible to do correctly without significant domain-modeling experience. Also, microservices by their nature make these boundaries hard to refactor (compared to monoliths, where you'd get compile-time feedback).
I prefer to make a structured monolith first (basically multiple services with isolated data that are squished together into a single deployable) and pull them out only if I really need to... Also helps with keeping microservice sprawl under control.
That's what SOA and microservices is supposed to solve.
At that scale you do reporting from a purpose-built service.
There's going to be a relationship between data in your services, but it shouldn't be directly referential.
You can enforce separation of concerns and data ownership in a monolith just as much as you can not enforce these two characteristics in a micro service architecture. Microservices and monoliths are a discussion about deployment artifacts, full stop.
How does creating a tangle of microservices (effectively distributed objects) really solve the problem?
So do modules/classes/interfaces etc. You don't need a layer of HTTP in between components to have abstraction.
In addition, it feels like microservices solve a problem that very few people really have. I've never run into a case where I thought "boy, I'd sure like to have a different database for this one chunk of code". If that did happen, then sure, split it out, but I can hardly believe that splitting your entire code base into microservices has a net benefit. The real problem in nearly every project I've worked on is the complexity of business logic. A monolith is much easier to refactor, and you can change the entire architecture if you need to without having to coordinate releases of many different applications.
My understanding of microservices is a bunch of loosely connected services that can be changed with minimal impact on the others.
The problem with the ideal is that in reality this never works: as complexity grows, the spaghetti code moves to spaghetti infrastructure. (Done a network map of a large k8s/Istio deployment lately?)
Depending on where you work, it can be a problem, because the separation is not always appropriate, and can for political reasons be much harder to revert when visible at the service level (for example because the architect doesn't understand consistency, or because your manager tells you that the distributed architecture documentation has been sent to the client so it cannot be modified).
In case of undue separation, reworking the internals of the enclosing monolith has less chance of causing friction.
Then your monolith is just all the modules glued together.
This really makes sense to me. I love the idea that part of a microservice team's responsibility is ensuring that a sensible subset of the data is copied over to the reporting systems in such a way that it can be used for analysis without risk of other teams writing queries that depend on undocumented internal details.
At what point in time?
I agree. In a monolith architecture, though, you CAN do that (and many shops do.) That's where their pains come from when they migrate from monolith to microservice: development is easier, but reports are way, way harder.
Not even that -- that idea is still highly debatable.
I would argue that it absolutely isn't easier, and the stepping-back-in-time of developer experience is one of the biggest problems with microservices.
Microservices in general, are way, way harder.
Side point: This is a needlessly hostile and unprofessional way to refer to a colleague. Remember that you and the reporting/analytics people at your company are working towards the same goals (the company's business goals). You are collaborators, not combatants.
You can express your same point by saying something like "The habit of directly accessing database resources and building out reporting code on this is likely to lead to some very serious problems when the schemas change. This is tantamount to relying upon a private API." etc.
We can all achieve much more when we endeavor to treat one another with respect and assume good intentions.
Anyway, what’s the reason not to treat people on hackernews with the same respect you’d treat a coworker with?
And for data scientists working on production models used within production software, most inference is packaged as containers in something like ECS or Fargate which are then scaled up and down automatically. Eg, they are basically running a microservice for the software teams to consume.
Real-time reporting, in my opinion, is not the domain of analysts; it's the domain of the software team. For one, it's rarely useful outside of something like a NOC (or similar control-room areas) and should be considered a software feature of that control room. If real-time has to be on the analysts (been there), then the software team should dual-publish their transactions to Kinesis Firehose and the analytics team can take it from there.
Of course, all of this relies heavily on buy-in to the world of cloud computing. Come on in, we all float down here.
The complexity you and the OP seem to be describing are more in the management and prioritization of analytics projects than in the actual "this is a hard technical problem" domain. It's just a lot of it is tedious especially compared to "everyone just put all your data in the Oracle RACs and bug the DBA until they give you permission" model of the past.
What ended up happening is each application uses its own database, nobody offered applications that could be configured to use an existing database, and all of our data is in silos.
I think one reason to avoid this approach is because SQL and other DB languages are pretty terrible (compared to popular languages like C#, Python, etc...) But why has no one written a great DB language yet?
* Testing is god-awful. To test a simple thing you had to know how the whole application worked, because there's validation in triggers, which triggers other triggers, which require things to be in a certain state. This made refactoring really hard/risky so it rarely got done.
* There's a performance ceiling, and when you hit that, you're done. We did hit a ceiling, did months of performance tuning, then upgraded to the biggest available box at the time, 96 cores, 2TB ram, which helped, but next time the upgrade won't be big enough. You're limited in what one box can do (and due to stored procedures being tied to the transaction there's limits to what you can do concurrently as well)
Debugging PL/SQL without the PL/SQL debugger is nearly impossible. Unfortunately a lot of shops cheap out on developer tools after they buy the server licenses. I never liked the idea of Java on the database. The good thing about PL/SQL is that nobody wants to write it so it has a tendency to not be overused by most developers.
The performance ceiling probably wouldn't be too low if there weren't too much extra activity with each update. As with all databases, sharding and replication are your friend.
In any case, designing a single schema that encompassed all the needs of the organization and could grow and change as the organization did was nearly always too much to ask.
This was in the days of enterprise data modeling where people believed there really was just one data model or object model that could represent the whole org, independent of the needs of any given application. I don’t think anybody believes that any more.
Perhaps refactoring, had it been better understood around 1970, could have gone a long way toward harmonizing diverse schemas, allowing experimentation with eventual refactoring into the common database.
Our current environment makes this impossible. There's no way that Salesforce is going to ship a version that works with your company's database schema. You're going to have to supply that replication yourself. Same for Quickbooks. To get that kind of customization you need to be spending hundreds of thousands for enterprise software.
Edit - I just saw that you addressed this in your original post: "nobody offered applications that could be configured to an existing data base"
SQL is actually hard to compare to programming languages, because in SQL you say what you want, while in an imperative language you say how. I only know one language that competed with SQL (and lost): QUEL (originally used by Ingres and Postgres).
BTW, for triggers and stored procedures you actually can use a traditional language. I know that PostgreSQL supports Python; you just need to load the proper extension to enable it (see the sketch below).
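For example, with the plpython3u extension you can write a function body in plain Python (a sketch via psycopg2; the DSN and function are made up, loading the extension needs superuser rights, and plpython3u must be installed on the server):

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS plpython3u")
    cur.execute("""
        CREATE OR REPLACE FUNCTION normalize_email(addr text)
        RETURNS text AS $$
            return addr.strip().lower()   # plain Python, running in the database
        $$ LANGUAGE plpython3u;
    """)
    cur.execute("SELECT normalize_email('  Alice@Example.COM ')")
    print(cur.fetchone()[0])  # alice@example.com
```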
For write-heavy workloads, best of luck to you :)
Where did you get that from?
Analytics has very different workloads and use cases than production transactions. Data is WORM, latency and uptime SLAs are looser, throughput and durability SLAs are tighter, access is columnar, consistency requirements are different, demand is lumpy, and security policies are different. Running analytics against the same database used for customer facing transactions just doesn't make sense. Do you really want to spike your client response times every time BI runs their daily report?
The biggest downside to keeping analytics data separate from transactions is the need to duplicate the data. But storage costs are dirt cheap. Without forethought you can also run into thorny questions when the sources diverge. But as long as you plan a clear policy about the canonical source of truth, this won't become an issue.
With that architecture, analysts don't have to feel constrained about decisions that engineering is making without their input. They're free to store their version of the data in whatever way best suits their work flow. The only time they need to interface with engineering is to ingest the data either from a delta stream in the transaction layer and/or duplexing the incoming data upstream. Keeping interfaces small is a core principle of best engineering practices.
Now, databases themselves are a different story; they are the persistence/data layer that microservices themselves use. But it's actually doable, and I'd even say much easier, to use microservices/serverless for ETL, because it's easier to develop CI/CD and testing/deployment with non-stateful services. Of course, it does take a certain level of engineering maturity and skill, but I think the end results justify it.
In the end everything involves tradeoffs. If you need to partition your data to scale, or for some other reason need to break up the data, then reporting potentially becomes a secondary concern. In this case maybe delayed reporting or a more complex reporting workflow is worth the trade off.
Data warehousing has drastically improved recently with the separation of storage and compute. A single analyst's query impacting the entire data warehouse is a problem that will, in the next few years, be a thing of the past.
No, because as soon as you change your schema, you have to plan ahead with the reporting team for starters. The reports still have to be able to work the same way, which means the data needs to be in the same format, or else the reports need to be rewritten to handle the changes in tables/structures.
IMO the devops folks should define some standard containers that include facilities for reporting on low-level metrics. Most of the monitoring above that should be managed by the microservice owner. The messages that are consumed for BI and external reporting should not have breaking changes any more than the APIs you provide your clients should.
This is a great point. The way to make backward-compatible changes to an API is by adding additional (JSON) keys, not changing / removing keys. The same approach works for a DB -- adding a new column doesn't break existing reporting queries.
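A tiny illustration of the additive rule (hypothetical payloads): old consumers keep working because keys are only added, never renamed or removed.

```python
v1 = {"id": 42, "name": "Ada Lovelace"}
v2 = {"id": 42, "name": "Ada Lovelace", "full_name": "Ada Lovelace"}  # key added, not renamed

def old_consumer(user):
    return user["name"]   # untouched by the v2 change

assert old_consumer(v1) == old_consumer(v2)
```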
That's exactly how that problem has been solved successfully for the past 20 years.
It is extremely expensive and slow to move all the data to a data warehouse. Ignoring the cost element, the latency from when data shows up in the operational environment to when it is reflected in the output of the data warehouse is often unacceptably high for many business use cases. A large percentage of that latency is attributable solely to what is essentially complex data motion between systems.
For it to work, online OLAP write throughput needs to scale like operational databases. This is not the case in practice, so operational databases can scale to a point where the OLAP system can't keep up. The technical solution is to scale the operational system to absorb the extra workload created by the data warehousing applications, but current database architectures are not really designed for it so it isn't trivial to do.
Right, that's the data warehouse method that I described. "Put them into their system" is a lot harder work than just typing in "put Kafka on top of your database."
Which all gets back to the point of the OP.
Disclaimer: working on Debezium
In a non-append-only scenario, Debezium tracks each source operation (insert, update, delete) from the replication log (oplog, binlog, etc.) as an individual operation that's emitted into a Kafka topic. How does one efficiently replicate this to a data warehouse?
I have not been able to use Debezium as way to replicate to a Data Warehouse for this very reason. At least not without having to resort to very complicated data warehouse merge strategies.
Note, there exist Data Warehouses that allow tables to be created in either OLAP or OLTP flavor. I understand that Debezium could easily replicate to an OLTP staging table. But are there any solutions if this isn't an option?
Your casual tone is at odds with what I've seen when teams run Kafka clusters in production. Not a decision I would take so lightly.
The article talks about micro services being split up due to fad as opposed to deliberate, researched reasons. Putting Kafka over the database also makes the data distributed when in most cases, it’s not necessary!
ETL (Extract, Transform, Load) is really database-focused and batch-focused.
A data pipeline is a combination of streams and batch. For example, you can implement change data capture using something like https://debezium.io/
Here's how Netflix solves it https://netflixtechblog.com/dblog-a-generic-change-data-capt...
"Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers . CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions ."
Then it doesn't seem to be solved. Seems like teams operating at a lean scale would have an issue with this, especially teams with lopsided business:engineering ratios
Right, that's the data warehouse method I described, keeping a central database in a reporting system. But now you just have to keep that database schema stable, because folks are going to write reports against it. It's a monolith for reporting, and its schema is going to be affected by changes in upstream systems. It's not like source systems can just totally refactor tables without considering how the data downstream is going to be affected. When Ms. CEO's report breaks, bad things happen.
It can access all those different databases.
You can also make your own connectors that make your services appear as tables, which you can query with SQL in the normal way.
So if the new accounts micro-service doesn't have a database, or the team won't let your analysts access the database behind it, you can always go in through the front-door e.g. the rest/graphql/grpc/thrift/buzzword api it exposes, and treat it as just another table!
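Even without a real Presto connector, the front-door idea is easy to sketch in plain Python (the endpoint and field names here are entirely hypothetical): pull rows out of the service's API and materialize them somewhere queryable:

    import json
    import sqlite3
    import urllib.request

    # Hypothetical REST endpoint exposed by the accounts service.
    rows = json.load(urllib.request.urlopen("http://accounts.internal/api/accounts"))

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER, name TEXT, status TEXT)")
    conn.executemany("INSERT INTO accounts VALUES (:id, :name, :status)", rows)

    # Now the service "is" a table, joinable like any other.
    for row in conn.execute("SELECT status, COUNT(*) FROM accounts GROUP BY status"):
        print(row)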
Presto is great even for monoliths ;) Rah rah presto.
Presto and SparkSQL are SQL interfaces to many different datasources, including Hive and Impala, any SQL database such as Postgres, and many other types of datastores, such as Cassandra and Redis; the SQL tools can query all these different types of databases with a unified SQL interface, and even do joins across them.
The difference between Presto and SparkSQL is that Presto runs on a multi-tenant cluster with automatic resource allocation, whereas SparkSQL jobs tend to need a specific resource allocation up front. That makes Presto (in my experience) a little more user-friendly. On the other hand, SparkSQL has better support for writing data out to different datasources, whereas Presto pretty much only supports returning results to a client or writing data into Hive.
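For example, a cross-database join through Presto might look like this (a sketch assuming the presto-python-client package and a Presto cluster with `postgresql` and `cassandra` catalogs already configured; the host, schema, and table names are all invented):

    import prestodb  # pip install presto-python-client

    conn = prestodb.dbapi.connect(
        host="presto.internal", port=8080, user="analyst",
        catalog="postgresql", schema="public",
    )
    cur = conn.cursor()

    # One SQL statement joining a Postgres table with a Cassandra table.
    cur.execute("""
        SELECT o.customer_id, SUM(o.total) AS revenue, p.plan_name
        FROM postgresql.public.orders o
        JOIN cassandra.billing.plans p ON o.plan_id = p.plan_id
        GROUP BY o.customer_id, p.plan_name
    """)
    for row in cur.fetchall():
        print(row)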
I know Hive can definitely query other datasources like traditional SQL databases, Redis, Cassandra, HBase, Elasticsearch, etc. I thought Impala had some support for this as well, though I'm less familiar with it.
And SparkSQL can be run on a multi-tenant cluster with automatic resource allocation - Mesos, YARN, or Kubernetes.
Recently the Netflix engineering blog mentioned a tool called DBLog too, but I don’t believe they’ve released it yet.
It's easier to join tables in databases that live on a single server, in a single database platform, than it is to connect to lots of different data sources that live in different servers, possibly even in different locations (like cloud vs on-premises, and hybrids.)
1) what if your data set is larger than you can practically handle in a single DB instance?
2) nothing about a monolith implies you have a single data platform, let alone a single DB instance
I have clients at 10TB-100TB in a single SQL Server. It ain't a fun place to be, but it's doable.
That all depends on the API.
Simple enough. Surely you wouldn't run analytics directly on your prod serving database, and risk a bad query taking down your whole system?
This is not a big deal if your system is designed for it; sophisticated database engines have good control mechanisms for ensuring that heavy reporting or analytical jobs minimally impact the concurrent operational workload.
Uhh, yep, that's exactly how a lot of businesses work.
There are defenses at the database layer. For example, in Microsoft SQL Server, we've got Resource Governor, which lets you cap how much CPU/memory/etc. any one department or user can get.
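Roughly, setting that up looks like the following (a sketch only; the pool and group names, the 20% caps, and the login-based classifier are arbitrary choices, and the DDL has to run in the master database):

    import pyodbc  # assumes an ODBC driver for SQL Server is installed

    conn = pyodbc.connect(
        "Driver={ODBC Driver 17 for SQL Server};Server=localhost;"
        "Database=master;Trusted_Connection=yes;",
        autocommit=True,
    )
    # A capped pool plus a workload group that uses it.
    conn.execute("""
        CREATE RESOURCE POOL ReportingPool
            WITH (MAX_CPU_PERCENT = 20, MAX_MEMORY_PERCENT = 20);
        CREATE WORKLOAD GROUP ReportingGroup USING ReportingPool;
    """)
    # A classifier function routes sessions into the group, e.g. by login.
    conn.execute("""
        CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
        WITH SCHEMABINDING AS
        BEGIN
            RETURN CASE WHEN SUSER_SNAME() = N'reporting_user'
                        THEN N'ReportingGroup' ELSE N'default' END;
        END;
    """)
    conn.execute("ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);")
    conn.execute("ALTER RESOURCE GOVERNOR RECONFIGURE;")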
Also, you complicate locking down access to the database. A reporting database can typically contain less sensitive info; reporting would not need the password hashes for user accounts, for example.
Reading other comments Brent’s made here, I’m not so sure.
No no, it's not a good idea, but it's just the reality that I usually have to deal with. I wish I could lay down rules like "no analyst ever gets the rights to query the database directly," but all it takes is one analyst to be buddy-buddy with the company owner, and produce a few reports that have really high business value, and then next thing you know, that analyst has sysadmin rights to every database, and anybody who tries to be a barrier to that analyst's success is "slowing the business down."
I work on a monolith that does this, but it's usually not even necessary; a single DB server on modern hardware with proper resource governing can handle quite a bit.
Just because a lot of businesses do it, doesn’t mean it’s a good idea. A lot of businesses don’t do source control of any kind, so should we all do that too?
I'd argue that (given a large enough business) "reporting" ought to be its own software unit (code, database, etc.) which is responsible for taking in data-flows from other services and storing them in whatever form happens to be best for the needs of report-runners. Rather than wandering auditor-sysadmins, report-runners are mostly customers of another system.
When it comes to a complicated ecosystem of many idiosyncratic services, this article may be handy: "The Log: What every software engineer should know about real-time data's unifying abstraction"
It will lag behind by some amount, roughly the processing delay plus twice the network delay, but it can include arbitrary things that are part of your event model.
Though it's not a silver bullet (distributed constraints are still a pain in the ass), and if the system wasn't designed as a DDD/CQRS system from the ground up, it will be hard to migrate, especially because you can't take small steps toward it.
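A tiny sketch of the idea (all names invented): the reporting side is just a projection that folds the event log into whatever read model the reports need, so it lags by the propagation delay but never blocks the operational side:

    from collections import defaultdict

    # A slice of an (invented) event log, in commit order.
    events = [
        {"type": "OrderPlaced",   "customer": "c1", "amount": 40},
        {"type": "OrderPlaced",   "customer": "c2", "amount": 25},
        {"type": "OrderRefunded", "customer": "c1", "amount": 40},
    ]

    # Read model for reporting: revenue per customer.
    revenue = defaultdict(int)

    def project(event):
        if event["type"] == "OrderPlaced":
            revenue[event["customer"]] += event["amount"]
        elif event["type"] == "OrderRefunded":
            revenue[event["customer"]] -= event["amount"]

    for e in events:      # in production this would tail the log instead
        project(e)

    print(dict(revenue))  # {'c1': 0, 'c2': 25}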
Yes, but data lakes don't fill themselves. Each team has to be responsible for exporting every transaction to the lake, either in real time or delayed, and then the reporting systems have to be able to combine the different sources/formats. If each microservice team expects to be able to change their formats in the data lake willy-nilly, then bam, the report breaks again.
Schemas evolve as the needs of the product change, and that evolution will always outpace the way the business looks at the data.
The best way I've seen to deal with this is to handle it at report query-time (e.g. pick a platform that can effectively perform the necessary transformations at query-time rather than at load-time).
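For instance (a sketch with invented column names), an additive rename can be absorbed at query time with a COALESCE instead of rewriting already-loaded data:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    # Old loads populated `amount`; newer loads populate `amount_cents` instead.
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, amount_cents INTEGER)")
    conn.execute("INSERT INTO sales VALUES (1, 9.99, NULL)")
    conn.execute("INSERT INTO sales VALUES (2, NULL, 1250)")

    # The report resolves the schema evolution at query time, not at load time.
    query = """
        SELECT id,
               COALESCE(amount, amount_cents / 100.0) AS amount_dollars
        FROM sales
    """
    print(conn.execute(query).fetchall())  # [(1, 9.99), (2, 12.5)]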
A quick fix might be to split different customers onto different databases, which doesn't require too many changes to the app. But now you're stuck building tools to pull from different databases to generate reports, even though you have a monolithic code base.
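A minimal sketch of what that tooling ends up looking like (sqlite3 stands in for the per-customer databases, and the file names are invented): every report has to fan out across the shards and merge the results itself:

    import sqlite3

    SHARDS = ["customers_a.db", "customers_b.db", "customers_c.db"]  # invented

    def report_rows(query):
        """Run the same report query against every shard and merge the rows."""
        for path in SHARDS:
            conn = sqlite3.connect(path)
            try:
                yield from conn.execute(query)
            finally:
                conn.close()

    total = sum(amount for (amount,) in
                report_rows("SELECT total FROM orders"))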
(SRE here, but I work on databases as well all day)
Maybe, but your business analyst already needs to connect to N other databases/data-sources anyway (marketing data, web analytics, salesforce, etc, etc), so you already need the infrastructure to connect to N data sources. N+1 isn't much worse.
It's not necessarily a bad idea though :-/
Maybe I'm wrong.
You need to use something like Spark, Presto or Drill if you want to run queries across different data sources.
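In SparkSQL, for instance, that means registering each source and joining in one query (a sketch assuming PySpark with the relevant JDBC drivers on the classpath; the URLs, credentials, and table names are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cross-source-report").getOrCreate()

    # Two different databases, each loaded as a DataFrame over JDBC.
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://pg.internal/shop")
              .option("dbtable", "orders")
              .option("user", "report").option("password", "...")
              .load())
    users = (spark.read.format("jdbc")
             .option("url", "jdbc:mysql://mysql.internal/crm")
             .option("dbtable", "users")
             .load())

    orders.createOrReplaceTempView("orders")
    users.createOrReplaceTempView("users")

    # One SQL query spanning both sources.
    spark.sql("""
        SELECT u.region, SUM(o.total) AS revenue
        FROM orders o JOIN users u ON o.user_id = u.id
        GROUP BY u.region
    """).show()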