
This is what I tell engineers. Microservices aren't a performance strategy. They are a POTENTIAL cost saving strategy against performance. And an engineering coordination strategy.

Theoretically, if you have a monolith that can be scaled horizontally, there isn't any difference between having 10 replicas of your monolith and having 5 replicas each of two microservices with the same codebase. UNLESS you are trying to underscale part of your functionality. You can't underscale part of your app with a monolith. Your pipe has to be big enough for all of it. Generally speaking, though, if you are talking about 10 replicas of something, there's very little money to be saved anywhere.

Even then though the cost savings only start at large scales. You need to have a minimum of 3 replicas for resiliency. If those 3 replicas are too big for your scale then you are just wasting money.

The place where I see any real world benefit for most companies is just engineering coordination. With a single repo for a monolith I can make 1 team own that repo and tell them it's their responsibility to keep it clean. In a shared monolith however 0 people own it because everyone owns it and the repo becomes a disaster faster than you can say "we need enterprise caching".




You can scale those things with libraries also. The very browser that you're using to read this comment is an example of it. FreeType, Hurfbaz, Pango, Cairo, Uniscribe, GDI, zlib - this, that, and a deep, deep dependency tree built by people who have never talked to each other directly, other than through the documentation of their libraries - and it works perfectly well.

I assure you 98% of companies have a simpler and shallower code base than that of a modern A-class browser.

Microservices was a wrong turn in our industry for 98% of the use cases.


> Hurfbaz

This made me chuckle. :) It's HarfBuzz. Yours sounds like a goblin.


I couldn't recall the correct name, but it is originally Persian, حرف باز, and in that, باز would be transliterated as Baz.

But yes, you're right.


I wasn't even aware that this came from Persian! Thanks for educating me.


Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale. I think that's one of the biggest misconceptions among engineers. On the other end you have the monorepo crew, who are doing it for the same reasons.

On your note about resiliency and scale - it's always a waste of money until shit hits the fan. Then you really pay for it.


> Services, or even microservices, are more of a strategy to allow teams to scale than services or products to scale.

I've never really understood why you couldn't just break up your monolith into modules. So like if there's a "payments" section, why isn't that API stabilized? I think all the potential pitfalls (coupling, no commitment to compatibility) are there for monoliths and microservices, the difference is in the processes.

For example, microservices export some kind of API over REST/GraphQL/gRPC; they can have SDKs for it, version it, and so on. Why can't you just define interfaces to modules within your monolith? You can generate API docs, you can version interfaces, you can make completely new versions, etc.
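
A minimal sketch of what that could look like in Python - the payments module and all the names in it are made up purely for illustration:

    # payments/api.py - the one module other teams are meant to import;
    # everything else in the payments package is internal and free to change.
    from dataclasses import dataclass

    API_VERSION = "2.1"  # version the interface just as you would a service

    @dataclass(frozen=True)
    class ChargeResult:
        charge_id: str
        status: str  # e.g. "succeeded" or "declined"

    def charge(customer_id: str, amount_cents: int, currency: str = "USD") -> ChargeResult:
        """Stable entry point: callers depend on this signature, not on internals."""
        # ...internally this would delegate to private implementation modules...
        return ChargeResult(charge_id="ch_stub", status="succeeded")

Callers import payments.api and nothing else; the interface can be documented and versioned like any external API, minus the network hop.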

I just feel like this would be a huge improvement:

- You avoid all the extra engineering work of building service handler scaffolding (validation, serialization/deserialization, defining errors)

- You avoid the runtime overhead of serialization/deserialization and network latency

- You don't need to build SDKs/generate protobufs/generate clients/etc.

- You never have the problem of "is anyone using this service?" because you can use code coverage tools

- Deployment is much, much simpler

- You never have the problem of "we have to support this old--sometimes broken--functionality because this old service we can't modify depends on it". This is a really undersold point: maybe it's true that microservice architectures let engineers build things without regard for other teams, but they can't remove things without regard for other teams, and this dynamic is like a no limit credit card for tech debt. Do you keep that service around as it slowly accretes more and more code it can't delete? Do you fork a new service w/o the legacy code and watch your fleet of microservices grow ever larger?

- You never have the problem of "how do we update the version of Node on 50 microservices?"


> I've never really understood why you couldn't just break up your monolith into modules

You can! We used to do this! Some of us still do this!

It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline. The organisations I’ve worked at that have achieved this always had some form of dictator who could enforce the separation.

Look at the work done by John Lakos (and his various books) to see how well this can work. Bloomberg did it, so can you!

Splitting your system across a network boundary makes it a distributed system. There are times you need this, but the tradeoff is at least an order of magnitude increase in complexity. These days we have a lot of tooling to help manage this complexity, but it's still there. The number of possible failure states grows exponentially: with n components that can each fail independently, there are on the order of 2^n combinations to reason about.

Having said all this, the microservice architecture does have the advantage of being an easy way to enforce modularity and does not require the strict discipline required in a monolith. For some companies, this might be the better tradeoff.


> easy way to enforce modularity and does not require the strict discipline required in a monolith

In my experience, microservices require more discipline than monoliths. If you do a microservice architecture without discipline you end up with the "distributed monolith" pattern and now you have the worst of both worlds.


Yes, I completely agree. If your team doesn't have the skills to use a proper architecture within a monolith, letting them loose on a distributed system will make things a lot worse. I've seen that happen multiple times.


> does not require the strict discipline required in a monolith

How so? If your microservices are in a monorepo, one dev can spread joy and disaster across the whole ecosystem. On the other hand, if your monolith is broken into libraries, each one in its own repo, a developer can only influence their part of the larger solution. Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.


> Arguably, system modularity has little to do with the architecture, and much to do with access controls on the repositories and pipelines.

Monoliths tend to be in large monolithic repos. Microservices tend to get their own repo. Microservices force an API layer (defined module interface) due to imposing a network boundary. Library boundaries do not, and can generally be subverted.

I agree that modularity has nothing to do with the architecture, intrinsically, simply that people are pushed towards modularity when using microservices.
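
To illustrate the "can generally be subverted" point - a toy Python sketch, names made up:

    # A "private" boundary here is convention, not enforcement.
    class _Gateway:                    # leading underscore signals "internal"
        def submit(self, amount_cents):
            return {"status": "succeeded", "amount": amount_cents}

    def charge(amount_cents):          # the intended public surface
        return _Gateway().submit(amount_cents)

    # Nothing stops a caller elsewhere in the monolith from doing this:
    print(_Gateway().submit(500))      # reaches straight past the public function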


People make this argument as though it's super easy to access stuff marked "private" in a code base--maybe this is kind of true in Python but it really isn't in JVM languages or Go--and as though it's impossible to write tightly coupled microservices. The problem generally isn't reaching into internal workings or coupling, the problem is that fixing it requires you to consider the dozens of microservices that depend on your old interface that you have no authority to update or facility to even discover.

In a monolith you run a coverage tool. In microservices you hope you're doing your trace IDs/logging right, that all services using your service used it in the window you were checking, and you start having a bunch of meetings with the teams that control those services to coordinate the update. That's not what I think of when I think of modularity, and in practice what happens is your team forks a new version and hopes the old one eventually dies or that no one cares how many legacy microservices are running.
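
The "run a coverage tool" step really is about this cheap - a rough sketch with coverage.py, where the module path and the test-runner call are hypothetical:

    import coverage

    cov = coverage.Coverage(include=["payments/*"])  # hypothetical module path
    cov.start()
    run_full_test_suite()   # hypothetical: exercise the app however you normally do
    cov.stop()
    cov.report()            # 0% on an interface is strong evidence nobody calls it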


> It is, however, much more difficult. Not difficult technically, but difficult because it requires discipline.

Before that, people need to know it's even an option.

Years ago, when I showed a dev who had just switched teams how to do this with a feature they were partway through implementing (their original version had it threading through the rest of the codebase), it was like one of those "mind blown" images. He had never even considered this as a possibility before.


I agree that it's possible. From what I've seen, though, it's probably harder than just doing services. You are fighting against human nature, organizational incentives, etc. As soon as the discipline of the developers, or the vigilance of the dictator, lapses, it degenerates.


This is really hard for me to read. How can anyone think that it is harder to write a proper monolith than to implement a distributed architecture?

If you just follow the SOLID principles, you're already 90% there. If your team doesn't have the knowledge (it's not just "discipline", because every proper developer should know that they will make it harder for everyone including themselves if they don't follow proper architecture) to write structured code, letting them loose on a distributed system will make things much, much worse.


It's not really a technical problem. As others have mentioned on various threads, it's a people coordination problem. It's hard to socially/organizationally coordinate the efforts of hundreds of engineers on a single thing. It just is. If they're split into smaller chunks and put behind relatively stable interfaces, those people can work on their own thing, roughly however they want. That was a major reason behind the original Bezos service mandate email. You can argue that results in a harder overall technical solution (distributed is harder than monolith), but to me it is inarguably much easier organizationally.

You can sort of get there if you have a strong central team working on monolith tooling that enforces module separation, lints illegal coupling, manages sophisticated multi-deployments per use, allows team-based resource allocation and tracking, has per-module performance regression prevention, etc. They end up having many of the organizational problems of a central DBA team, but it's possible. Even then, I am not aware of many (any?) monoliths in this situation that have scaled beyond 500+ engineers where people are actually happy with the situation they've ended up in.


> some form of dictator who could enforce the separation.

Like a lead developer or architect? Gasp!

I wonder if the microservices fad is so that there can be many captains on a ship. Of course, then you need some form of dictator to oversee the higher level architecture and inter-service whatnots... like an admiral.


> You never have the problem of "how do we update the version of Node on 50 microservices?"

And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?" Which is, in my experience, an order of magnitude harder.

Ease of upgrading the underlying platform versions of Node, Python, Java, etc is one of the biggest benefits of smaller, independent services.


> And instead you have the problem of "how do we update the version of Node on our 10 million LOC codebase?"

I think if you get to that scale, everything is pretty hard. You'll have a hard time convincing me that it's any easier or harder to upgrade Node on eighty 125K-LOC microservices than on a 10M-LOC monolith. Both of those things feel like a big bag of barf.


Upgrading the platform also happens at least 10x less frequently, so that math doesn't necessarily work out in your favour though.


It's much easier to make smaller-scope changes at higher frequency than it is to make large changes at lower frequency. This is the entire reason the software industry adopted CI/CD.


I'm not sure that's measuring what you think. The CI pipeline is an incentive for a good test suite, and with a good test suite the frequency and scope of changes matters a lot less.

CI/CD is also an incentive to keep domain-level scope changes small (scope creep tends to be a problem in software development) in order to minimize disruptions to the pipeline.

These are all somewhat different problems than upgrading the platform you're running, which the test suite itself should cover.


CI/CD is usually a component of DevOps, and any decent DevOps team will have DORA metrics. Time-to-fix and frequency of deploys are both core metrics, and they mirror the frequency and scope of changes. You want changes to be frequent and small.

Yes, change failure rate is also measured, and that's why good test suites matter, but if you think frequency and scope of change don't matter for successful projects, you haven't looked at the data.

That means frequently updating your dependencies against a small code base is much more useful (and painless) than occasional boil-the-ocean updates.

(As always, excepting small-ish teams, because direct communication paths to everybody on the team can mitigate a lot of problems that are painful at scale)


> I've never really understood why you couldn't just break up your monolith into modules.

I think part of it is that many just don't know how.

Web developers deal with HTTP and APIs all the time, they understand this. But I suspect that a lot of people don't really understand (or want to understand) build systems, compilers, etc. deeply. "I just want to press the green button so that it runs".


Counterpoint: most monoliths are built like that. I wonder if they think that pressing a green button is too easy - like, it HAS to be more complicated, we HAVE to be missing something.


How do you square that with the fact that shit usually hits the fan precisely because of this complexity, not in spite of it? That's my observation & experience, anyway.

Added bits of "resiliency" often add brand new, unexplored failure points that are just ticking time bombs waiting to bring the entire system down.


Not adding that resiliency isn't the answer though - it just means known failures will get you. Is that better than the unknown failures introduced by your mitigations? I cannot answer that.

I can tell you 100% that eventually a disk will fail. I can tell you 100% that eventually the power will go out. I can tell you 100% that even if you have a computer with redundant power supplies each connected to separate grids, eventually both power supplies will fail at the same time - it will just happen a lot less often than if you have a regular computer not on any redundant/backup power. I can tell you that network cables do break from time to time. I can tell you that buildings are vulnerable to earthquakes, fires, floods, tornadoes and other such disasters. I can tell you that software is not perfect and eventually crashes. I can tell you that upgrades are hard if any protocol changed. I can tell you there is a long list of other known disasters that I didn't list, but a little research will uncover them.

I could look up the odds of each of the above. In turn that allows weighing the cost of each mitigation against the likely cost of not mitigating - but this is only statistical: you may decide something statistically cannot happen, and it happens anyway.

What I cannot tell you is how much you should mitigate. There is a cost to each mitigation that needs to be compared to the value.


Absolutely yeah, these things are hard enough to test in a controlled environment with a single app (e.g. FoundationDB) but practically impossible to test fully in a microservices architecture. It's so nice to have this complexity managed for you in the storage layer.


Microservices almost always increase the amount of partial failures, but if used properly can reduce the amount of critical failures.

You can certainly misapply the architecture, but you can also apply it well. It's unsurprising that most people make bad choices in a difficult domain.


Fault tolerance doesn't necessarily require microservices (as in separate code bases) though, see Erlang. Or even something like Unison.

But for some reason it seems that few people are working on making our programming languages and frameworks fault tolerant.


Because path dependence is real so we're mostly building on top of a tower of shit. And as computers got faster, it became more reasonable to have huge amounts of overhead. Same reason that docker exists at all.


> How do you square that with the fact that shit usually hits the fan precisely because of this complexity

The theoretical benefit may not be what most teams actually experience. Usually, the fact that microservices are seen as a solution to a problem that could be solved in other, much simpler ways is a pretty good indication that any theoretical benefits are going to be lost through other poor decision making.


Microservices is more about organisation than it is about technology.

And that is why developers have so much trouble getting it right. They can't without having the organisational fundamentals in place. It is simply not possible.

The architectural constraints of microservices will expose organisational weaknesses at a much higher rate, because of the pressure they put on the organisation to be very strict about ownership, communication and autonomy.

It takes a higher level of maturity as an organisation to enable the benefits of microservices, which is also why most organisations shouldn't even try.

Stop all the technical nonsense because it won't solve the root cause of the matter. It's the organisation. Not the technology


Except that most people build microservices in a way that ignores the reality of cloud providers and the fact that they are building (more) distributed systems, and often end up with lower resiliency.


> You can't underscale part of your app with a monolith. Your pipe has to be big enough for all of it.

Not necessarily. I owned one of Amazon's most traffic-heavy services (3 million TPS). It was a monolith with roughly 20 APIs, all extremely high volume. To keep the monolith simple and still be able to scale it up or down independently, the thousands of EC2 instances running it were separated into fleets, each serving a specific use case. Then we could, for example, scale the website fleet while keeping the payment fleet stable. The only catch is that teams were allowed to call only the APIs representing the use case of that fleet (no calling payment-related APIs on the website fleet). Given the low number of APIs and basic ACL control, it was not a challenge.
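
The ACL part really was that basic - something in the spirit of this Python sketch (the fleet names, API names, and FLEET variable are all made up for illustration):

    import os

    # Which APIs each fleet is allowed to serve.
    FLEET_ALLOWLIST = {
        "website":  {"GetProduct", "Search", "GetRecommendations"},
        "payments": {"Authorize", "Capture", "Refund"},
    }

    FLEET = os.environ.get("FLEET", "website")

    def check_api_allowed(api_name: str) -> None:
        """Reject calls that belong to another fleet's use case."""
        if api_name not in FLEET_ALLOWLIST[FLEET]:
            raise PermissionError(f"{api_name} is not served by the {FLEET} fleet")

    check_api_allowed("Search")    # fine on the website fleet
    # check_api_allowed("Refund")  # would raise on the website fleet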


> Microservices aren't a performance strategy

Who thinks that more I/O and more frequent cold starts are more performant?

Where I see microservices as very useful is for elasticity and modularization. They're probably slower than a monolith at your scale, but you don't want to fall over when load starts increasing and you need to scale horizontally. Microservices with autoscaling make that much easier.

But of course, updating services can be a nightmare. It's a game of tradeoffs.


Surely one could split a shared monolith into many internal libraries and modules, facilitating ownership?


Yes, but you are still dealing with situations where other teams are deploying code that you are responsible for. With microservices you can always say, "our microservice wasn't deployed, so it is someone else's problem."

But I think you are pointing out one of the reasons most places don't get benefits from microservices. If their culture doesn't let them do as you describe with a monolith, the overhead of microservices just brings additional complexity to an already dysfunctional team/organization.


It's crazy to me how opposed to building libraries everyone seems these days. We use a fuckton of libraries, but the smallest division of software we'll build is a service.


It's not crazy to me... I've worked at a couple of places that decided to try to use libraries in a web services context. They were all disasters, and the libraries ended up being considered severe tech debt. The practical aspects of updating all the consumers of your library, and of how the libraries interact with persistence stores, ended up being their undoing.


How do you square this with the fact that you likely used dozens of libraries from third parties?


Probably 1000s, hah. It's the speed of change. Third-party libraries don't change much, or fast - and you don't really want them to. The software you build internally at a product company with 500+ engineers changes much, much faster. And you want it to.


It's difficult to own a library end-to-end. Ownership needs to include deployment and runtime monitoring.


> And an engineering coordination strategy.

I've always felt the org chart defines the microservice architecture. It's a way to keep teams out of each other's hair. When you have the same dev working on more than one service, that's an indication you're headed for trouble.


That's not just a feeling, it's a direct consequence of Conway's law. You feel correctly.


> The place where I see any real world benefit for most companies is just engineering coordination. With a single repo for a monolith I can make 1 team own that repo and tell them it's their responsibility to keep it clean. In a shared monolith however 0 people own it because everyone owns it and the repo becomes a disaster faster than you can say "we need enterprise caching".

Microservices are supposed to be so small that they are much smaller than a team's responsibility, so I don't think that is a factor.

The bottleneck in a software engineering company is always leadership and social organisation (think Google - they tried to muscle in on any number of alternative billion-dollar businesses and generally failed despite having excellent developers who assembled the required parts rapidly). Microservices won't save you from a manager who doesn't think code ownership is important. Monoliths won't prevent such a manager from making people responsible for parts of the system.

I've worked on monoliths with multiple teams involved. It seems natural that large internal modules start to develop and each team has their own areas. A little bit like an amoeba readying to split, but never actually going through with it.


> In a shared monolith however 0 people own it because everyone owns it and the repo becomes a disaster faster than you can say "we need enterprise caching".

* I give two teams one repo each and tell each one: "This is your repo, keep it clean."

-or-

* I give two teams one folder each in a repo and tell each one: "This is your folder, keep it clean."

If you've got a repo or folder (or anything else) that no one is responsible for, that's a management problem, and micro services won't solve it.

Repos (and folders) don't really have anything to do with organizing a complex system of software -- they are just containers whose logical organization should follow from the logical organization of the system.

Microservices back you into a position where, when the code of one team calls the component of another team, it has to be high-latency, fault-prone, low-granularity, and config-heavy. Some calls fall that way anyway, so no problem in those cases, but why burn that limitation into your software architecture from the start? Just so you don't have to assign teams responsibility for portions of a repo?


I think doing releases, deployments, and rollbacks is trickier in a monorepo. With multiple services/repos, each team can handle their own release cycles and on-call rotation, and only deal with the bits, tests, and failures they wrote.

You lose some CPU/RAM/IO efficiency, but you gain autonomy, resilience, and faster time to resolution.

https://blog.nelhage.com/post/efficiency-vs-resiliency/

E.g. at Grafana we're working through some of these decoupling-to-microservices challenges, because the SREs who deploy to our infra should not need to deal with investigating and rolling back a whole monolith due to a regression introduced by one specific team in a core datasource plugin, or a frontend plugin/panel. The pain at scale is very real, and engineering hours are by far more expensive than the small efficiency loss you get by over-provisioning the metal a bit.


I didn't mean to say there are no reasons to have separate repos; I was mostly just responding to the post above mine.

But I will note that almost all the benefits listed aren't specific to micro services.

(Also, to me, it's flawed that a dependency can really push to production without coordination. At the least, dependent components should decide when to "take" a new version. There are a lot of ways to manage that -- versioned endpoints, e.g., so that dependents can take the new version when they are ready. But it's extra complexity that's tricky in practice. E.g., you really ought to semver the endpoints, and keep old versions around as long as there could be any dependents... but how many is that, and how do you know? From what I've seen, people don't do that; instead dependencies push new versions, ready or not, and just deal with the problems as breakages in production, which is pretty terrible.)
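
In code, versioned endpoints can be as plain as this sketch (Flask is assumed here; the paths and payloads are made up):

    from flask import Flask

    app = Flask(__name__)

    @app.route("/v1/charge", methods=["POST"])
    def charge_v1():
        # kept alive until the last known dependent has migrated off it
        return {"status": "succeeded", "schema": "v1"}

    @app.route("/v2/charge", methods=["POST"])
    def charge_v2():
        # new behaviour; dependents opt in when they are ready
        return {"status": "succeeded", "schema": "v2", "currency": "USD"}

The code is the easy part; knowing how long /v1 has to stay alive is the part nobody has a good answer for.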


> Your pipe has to be big enough for all of it.

What do you mean by "pipe" here? It's easier to share CPU and network bandwidth across monolith threads than it is across microservice instances. (In fact, that is the entire premise of virtualization - a VM host is basically a way to turn lots of disparate services into a monolith.)


I view micro services as a risk mainly in a Conway sense rather than technology.

Most companies can run on a VPS.


I mostly agree. I'd add though that isolation can be a legitimate reason for a microservice. For example, if you have some non-critical logic that potentially uses a lot of CPU, splitting that out can make sense to be sure it's not competing for CPU with your main service (and bringing it down in the pathological case).

Similarly, if you have any critical endpoints that are read-only, splitting those out from the main service where writes occur can improve the reliability of those endpoints.


> Microservices aren't a performance strategy. They are a POTENTIAL cost saving strategy against performance.

Yeah they were a way to handle databases that couldn't scale horizontally. You could move business logic out of the database/SQL and into Java/Python/TypeScript app servers you could spin up. Now that we have databases like BigQuery and CockroachDB we don't need to do this anymore.


If you plan to scale horizontally with a microservice strategy, you'll be in a lot of pain.

If you have, say, 100x more customers, will you divide your application into 100x more services?

As others said, microservices scale manpower, not performance. Performance only sometimes comes as a bonus.


Microservices can help with performance by splitting off performance-critical pieces and allowing you to rewrite them in a different stack or language (Rust or Go instead of Ruby or Python).

But yeah, they also tend to explode complexity


An important point that people seem to forget is that you don't need microservices to invoke performant native code, just a dynamic library and FFI support. The entire Python ecosystem is built around this idea.
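
For example, a minimal ctypes sketch calling straight into the system C math library, no service boundary involved (the library lookup assumes a typical Linux/macOS setup):

    import ctypes
    import ctypes.util

    # Locate and load libm in-process; no network hop, no serialization.
    libm = ctypes.CDLL(ctypes.util.find_library("m"))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]

    print(libm.cos(0.0))  # 1.0, computed by native code in the same process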


This is why Python stole enterprise big data and machine learning from Java. It actually has a higher performance ceiling for certain specific situations because, almost uniquely among garbage collected high-level languages, it can call native code without marshaling or memory pinning.


You don't need it, but you can also explode your repo, test, and build complexity when it'd be easier to keep things isolated.

For instance, you might not want to require all developers to have a C toolchain (with certain libraries) installed for a tiny bit of performance-optimized code that almost never gets updated.


I don't know, that seems like confusion of runtime and code organization boundaries. Adding a network boundary in production just to serve some code organization purpose during development seems completely unnecessary to me.

For development purposes you could build and distribute binary artifacts just like you would for any other library. Developers who don't touch native code can just fetch the pre-built binaries corresponding to current commit hash (e.g. from CI artifacts).
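
Roughly this kind of thing, sketched in Python - the artifact store URL and the extension filename are entirely hypothetical:

    import pathlib
    import subprocess
    import urllib.request

    ARTIFACT_BASE = "https://ci.example.com/artifacts"  # hypothetical CI artifact store
    EXTENSION = "fastpath.so"                           # hypothetical native extension name

    # Fetch the pre-built binary matching the commit currently checked out.
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    dest = pathlib.Path("build") / EXTENSION
    dest.parent.mkdir(exist_ok=True)
    urllib.request.urlretrieve(f"{ARTIFACT_BASE}/{commit}/{EXTENSION}", dest)
    print(f"fetched prebuilt {EXTENSION} for commit {commit}")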


>Adding a network boundary in production just to serve some code organization purpose during development seems completely unnecessary to me.

That's a pretty common approach. Service-oriented architecture isn't necessarily for "development" purposes - usually it's for more generic "business" purposes - but organization layout and development team structure are factors.

From an operational standpoint, it's valuable to limit the scope and potential impact that code deploys have - strong boundaries like the ones a network offers can help there. In that regard, you're serving a "development" purpose of lowering the risk of shipping new features.


One would imagine that by 2024 there would be a solution in every dynamic-language package manager for shipping pre-compiled native extensions, _and_ for safely allowing them to be opted out of in favour of native compilation - I don't use dynamic languages though, so I don't really know whether that has happened.

My overriding memory of trying to do literally anything with Ruby was having Nokogiri fail to install because of some random missing C dependency on basically every machine...


I think it's gotten a little better but is still a mess. I think the biggest issue is that dynamic languages tend to support a bunch of different OSes and architectures, and these all have different toolchains.

Node.js, as far as I know, ships entire platform-specific toolchains as dependencies that are downloaded with the dependency management tool. Python goes a different direction and has wheels, which are statically linked binaries built with a special build toolchain and then hosted on the public package repo, ready to use.

I don't think anything in Ruby has changed but I haven't used it in a few years.


I am fine with splitting off certain parts as services once there is a performance problem. But doing microservices from the start is just a ton of complexity for no benefit, and most likely you'll get the service boundaries wrong to some degree, so you'll still have to refactor when there is a need for performance.


Often you can get your boundaries close enough. There will always be cases where, if you knew then what you know now, you would have drawn them differently. However, you don't need to be perfect, just close enough. Web apps have been around in all sizes for 30 years - there is a lot of cultural knowledge. Do not prematurely pessimize just because we don't know what is perfect.

I'm not saying microservices are the right answer. They are a useful tool that sometimes you need and sometimes you don't. You should have someone on your team with enough experience to get the decisions close enough up front. This isn't anything new.


GitHub scaled all the way to acquisition with Ruby on Rails, plus some C for a few performance-critical modules, and it was a monolith.

It doesn't take a microservice to allow rewriting hot paths in a lower-level language. Pretty much everything has a C-based FFI. This is what makes Python usable for ML - the libraries are written in C/C++.


But you don't need micro services for that. You can always split things when useful, the issue with microservices is the idea that you should split things also when not necessary.


Keep it simple :)



