It seems like splitting into separate repos was a rash response to low-value automated tests. If tests don't actually increase confidence in the correctness of the code, they're of negative value. Maybe they should have deleted or rewritten a bunch of tests instead. Which is what they did in the end anyway.
>> A huge point of frustration was that a single broken test caused tests to fail across all destinations. When we wanted to deploy a change, we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos
They also introduced tech debt and did not responsibly address it. The result was entirely predictable, and they ended up paying back this debt anyway when they switched back to a monorepo.
>> When pressed for time, engineers would only include the updated versions of these libraries on a single destination’s codebase... Eventually, all of them were using different versions of these shared libraries.
To summarize, it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
This is a common pattern when it comes to semi-idealistic memes like microservices or agile. I think it's a bad idea to have such hairy, abstract ideas travel too far and wide.
They become a bucket of clichés and abstract terms. Clichéd descriptions of problems you're encountering, like deployments being hard. Clichéd descriptions of the solutions. This lets everyone in on the debate, whether they actually understand anything real to a useful degree or not. It's a lot easier to have opinions about something using standard agile or microservice terms than using your own words. I've seen heated debates between people who would not be able to articulate any part of the debate without these clichés; they have no idea what they are actually debating.
For a case in point, if this article described architecture A, B & C without mentioning microservices, monoliths and their associated terms... (1) Far fewer people would have read it or had an opinion about it. (2) The people who did would be the ones who actually had similar experiences and can relate or disagree in their own words/thoughts.
What makes these quasi-ideological in my view is how things are contrasted, generally dichotomously. Agile Vs Waterfall. Microservices Vs Monolithic Architecture. This mentally limits the field of possibilities, of thought.
So sure, it's very possible that architecture style is/was totally beside the point. Dropping the labels of microservices architecture frees you up to (1) think in your own terms and (2) focus on the problems themselves, not the clichéd abstract version of the problem.
Basically, microservice architecture can be great. Agile HR policies can be fine. Just... don't call them that, and don't read past the first few paragraphs.
Interesting perspective. I think that seeking and naming patterns "microservices", "agile", etc. is useful. It provides something like a domain specific language that allows a higher level conversation to take place.
The problem, as you identify, is that once a pattern has been identified people too easily line up behind it and denigrate the "contrasting" pattern. The abstraction becomes opaque. We're used to simplistic narratives of good vs evil, my team vs your team, etc., and our tendency to embrace these narratives leads to dumb pointless conversations driven more by ideology than any desire to find truth.
I agree that it's useful, I even think more people should do it more often. Creating your own language (and learning other people's) is a way of having deep thoughts, not just expressing them. Words for patterns (or abstractions generally) are the quanta of language.
I just think there can be downsides to them. These are theories as well as terms and they become parts of our worldview, even identity. This can engage our selective reasoning, cognitive biases and our "defend the worldview!" mechanisms in general. At some point, it's time for new words.
Glad people seem ok with this. I've expressed similar views before (perhaps overstating things) with fairly negative responses. I think part of it might be language nuance. The term "ideology" carries less baggage in Europe, where "idealist" is what politicians hope to be perceived as, while "ideologue" is a common political insult stateside, meaning blinded and fanatic.
The issue is that it is rare and difficult to be able to synthesize all the changes happening in computing and to go deep. So a certain “pop culture” of computing develops that is superficial and clichéd. We see this in many serious subjects: pop psychology, pop history, pop science, pop economics, pop nutrition. Some of these are better quality than others if they have a strong academic backing, but even in areas such as economics we can’t get to basic consensus on fundamentals due to the politicization, difficulty of reproducible experiments, and widespread “popular” concepts out there that may be wrong.
Concepts like microservices synthesize a bunch of tradeoffs and patterns that have been worked on for decades. They’re boiled down to an architecture fad, but have applicability in many contexts if you understand them.
Similarly with Agile: it synthesizes a lot of what we know about planning under uncertainty, continuous learning, feedback, flow, etc. But it’s often repackaged into clichéd, tepid forms by charlatans to sell consulting deals or Scrum black belts.
“computing spread out much, much faster than educating unsophisticated people can happen. In the last 25 years or so, we actually got something like a pop culture, similar to what happened when television came on the scene and some of its inventors thought it would be a way of getting Shakespeare to the masses. But they forgot that you have to be more sophisticated and have more perspective to understand Shakespeare. What television was able to do was to capture people as they were.
So I think the lack of a real computer science today, and the lack of real software engineering today, is partly due to this pop culture.”
I will take issue with one thing though... Shakespeare's plays were for something like a television audience, the mass market. The cheap seats cost about as much as a pint or two of ale. A lot of the audience would have been the illiterate, manual labouring type. They watched the same plays as the classy aristocrats in their box seats. It was a wide audience.
Shakespeare's stories had scandal and swordfighting, to go along with the deeper themes.
A lot of the best stuff is like that. I reckon GRRM is a great novelist, personally, with a deep contribution to the art. Everyone loves Game of Thrones. It's a politically driven story with thoughtful bits about gender, class and society. But it's not stingy on tits and incest, dragons and duels.
The one caveat was that Shakespeare's audience were all city slickers, and that probably made them all worldlier than the average Englishman who lived in a rural hovel, spoke dialect and rarely left his village.
What is an elitist pursuit is not really Shakespeare; it's watching 450-year-old plays.
Thank you for writing up a concise text about the actual problem. While reading the article I consistently felt bothered by the terminology thrown around but couldn't really pinpoint why.
We really like to think in silos and categorize everything to make it feel familiar and approachable. Which is useful, but sometimes we need to shake those categories off so we can actually see the problems.
I agree exactly. A concept like "agile" is a great way of organising your own thoughts. Inventing words and concepts is a powerful tool. But... we have to remember that we invented them. They aren't real. That's easier when you invented them yourself.
After a while... it's like the cliché about taxi drivers investing in startups... a sign it's time to get out. When people who I know have no idea start talking about the awesomeness of some abstract methodology... I'm out.
As an engineer in a large company this seems very similar to management structure. Every 6-12 months there's a re-organisation to split the business into vertically aligned business units, and then to horizontally aligned capabilities. Then back again. It's always fun to watch.
In reality this process has absolutely nothing to do with the structure of the organisation. Its true purpose is to shuffle out people who are in positions where they're performing poorly, and move in new people. It just provides cover ("It's not your fault, it's an organisational change").
This is exactly the same: they couldn't say "You've solved this problem badly, go spend 6 months doing it properly". So instead they say they need a new paradigm to organise how they build their solution. In the process they get to spend all the time they need fixing the bad code, but it's not because it's bad code, it's because the paradigm is wrong.
The problem is the same problem as with the organisational structure: if you don't realise the real purpose, and buy into the cover, you end up not addressing the issue. You end up with a shit manager managing a horizontal and then managing a vertical, then managing a horizontal. You end up with a bad monolithic service instead of bad micro-services.
it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
That seems... appropriate?
This is the general problem with the microservices bandwagon: Most of the people touting it have no idea when or why it's appropriate. I once had a newly hired director of engineering, two weeks into a very complicated codebase (which he spent nearly zero time looking at), ask me "Hey there's products here! How about a products microservice?" He was an idiot that didn't last another two months, but not before I (and the rest of the senior eng staff) quit.
I'm fully prepared to upvote more stories with the outline of "Microservices were sold as the answer! But they weren't."
The problem is not micro services. The problem is that the company's engineering leadership could not lead their way out of a paper bag.
Micro services are an extremely powerful pattern which solves a bazillion critical issues, the most important ones being:
* separation of concerns
* ability of different teams to maintain, develop and iterate on different subsystems independently of each other
* loose coupling of the subsystems
Do you have an auth server that your API accesses using auth.your.internal.name which does not share its code base with the API? You have a micro service.
Do you have a profile service that is responsible for the extra goodies on a profile of a user that the rest of the API business logic does not care about?
You have a micro service. Do you spin up some messaging layer in a cloud that knows a few things about the API but really is only concerned with passing messages around? You have a micro service.
The alternative is that you have a single code base and a single app that starts with ENV_RUNMODE=messaging or ENV_RUNMODE=API or ENV_RUNMODE=website or ENV_RUNMODE=auth (except in the case of auth it only implements creation/changes of new entries and password changes, but not validation, as validation is done by the code in any ENV_RUNMODE by accessing the auth database directly with read-write privileges, and no one ever implemented deletion of entries from the authentication database. Actually, even that would be a good step - there's no auth database, because that would require knowing the mode we are running in and managing multiple sets of credentials, so instead it is simply another table in the single database that stores everything).
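A minimal sketch of that single-app-with-run-modes setup, just to make the contrast concrete (all names hypothetical, not from any real codebase):

    import os

    def run_api():
        print("serving the public API")

    def run_auth():
        print("creating/updating auth entries (validation happens elsewhere...)")

    def run_messaging():
        print("passing messages around")

    def run_website():
        print("rendering the website")

    MODES = {"API": run_api, "auth": run_auth,
             "messaging": run_messaging, "website": run_website}

    if __name__ == "__main__":
        # Every mode still ships the whole codebase and reads/writes the one
        # shared database directly, so nothing is actually decoupled.
        MODES[os.environ.get("ENV_RUNMODE", "API")]()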
That is the alternative to micro services. So I would argue that unless Segment has that kind of architecture it does not have a monolith. It implements a sane micro services pattern.
Should the engineering be led by a blind squirrel that once managed to find a nut, in a winter, three years ago, the sane micro services pattern would be micro serviced even more -- I call it the nanoservice pattern, aka LeftPad as a service. We aren't seeing it much yet, but as Go becomes a bigger and bigger player in shops without excellent engineering leadership I expect to see it more and more, due to Go giving developers the tools to gRPC between processes.
This is a problem you get when senior leadership is excessively confident, doesn't know enough about engineering and underestimates the difficulties of software. «Why not just do this to solve your problems?» The response will often be «won't work/already doing it/will have unintended consequences/impossible to do cheaply».
> To summarize, it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
You try to remove the critique from microservices, but for me these issues are actually good arguments against microservices. It's hard to do right.
It is weird that they took some problems so lightly. Shared libraries are one example. Getting them right for hundreds of slightly different services is something I don't even want to think about. The only strategy I can come up with is to maintain them as if they were a 3rd-party lib that cannot contain business logic. So you're forced to build solutions around them and not with them.
And there have been quite a few warnings not to use shared code in microservices.
This was exactly my thought! Despite all the hoohah around the decisions made and whether or not they did things correctly, this idea of "we'll create a bunch of separate services and then use a common shared library for all/most of them" was the beginning of the end from where I'm sitting... doing this is exactly where the trouble starts with future code changes, as the shared library almost becomes a god-like object amongst the services using it: change something in the shared lib and all services using it need to be re-tested. Unless proper versioning takes place, but that, from my experience, seems to rarely be the case. Or you need proper service ownership and a chain of notification to inform service owners when particular versions are being deprecated or retired... which seems to rarely be the case as well.
Even so, imagine the chaos if engineers/devs frequently need to add code to one lib (the shared one), wait for PR approval, then use that new version in a different lib to implement the actual change that was needed. That seems to introduce a direct delay into getting anything productive done...
>> A huge point of frustration was that a single broken test caused tests to fail across all destinations. When we wanted to deploy a change, we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos
We went through this painful period. We kept at it, devoting a rotating pair to proactively address issues. Eventually it stabilized, but the real solution was to better decouple services and have them perform with more 9s of reliable latency. Microservices are hard when done improperly, and there doesn't seem to be a short path to learning how to make them with good boundaries and low coupling.
>>To summarize, it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
I read the article a few days ago and was struck by what a poor idea it was to take a hundred or so functions that do about the same thing and to break them up into a hundred or so compilation and deployment units.
If that's not a micro-service anti-pattern, I don't know what is!
I'm not sure what they've been left with is a monolith after all. I would say they just have a new service, which is the size of what they should have originally attempted before splitting.
In particular, as to their original problem, the shared library seems to be the main source of pain and that isn't technically solved by a monolith, along with not following the basic rule of services "put together first, split later".
I feel prematurely splitting services like that is bound to have issues unless they have 100 developers for 100 services.
The claim of "1 superstar" is misleading too: this service doesn't include the logic for their API, Admin, Billing, User storage, etc. It's still a service, one of a few that make up Segment in totality.
Reading about their setup and comparing with some truly large scale services I work with, I'm left with the idea that Segment's service is roughly the size of one microservice on our end.
Perhaps the takeaway is don't go overboard with fragmenting services when they conceptually fulfill the same business role. And regardless of the architecture of the system, there are hard state problems to deal with in association with service availability.
The most telling fact is that it "took milliseconds to complete running the tests for all 140+ of our destinations". I've never worked on a single service whose tests ran that fast, given that the time spent by the overhead of the test framework and any other one-time initialization can take a few seconds just itself. It's great to have tests that run fast, but that's a bit ridiculous.
Some rules of thumb I just came up with:
Number of repos should not exceed number of developers.
Number of tests divided by number of developers should be at least 100.
Number of lines of code divided by number of repos should be at least 5000.
Your tests should not run faster than the time it takes to read this sentence.
A single person should not be able to memorize the entire contents of a single repo, unless that person is Rain Man.
> never worked on a single service whose tests ran that fast
I'd say you've never had good tests.
I have a test-suite for a bunch of my frameworks that dates to the mid 90s, with tests added regularly with new functionality.
It currently takes 4 seconds total for 6 separate frameworks and 1000 individual tests. Which is actually a bit slower than it should be, it used to take around 1-2 seconds, so might have to dig a little to see what's up.
With tests this fast, they become a fixed part of the build-process, so every build runs the tests, and a test failure is essentially treated the same as a compiler error: the project fails to build.
The difference goes beyond quantitative to qualitative, and is hard to communicate. Testing becomes much less of a distinct activity and simply an inextricable part of writing code.
So I would posit:
Your tests should not run slower than the time it takes to read this sentence.
Unit tests that don’t read or write to disk and don’t try thousands of repetitions of things should be bleeding fast, but the most useful integration tests that actually help find faults (usually with your assumptions about the associated APIs) often need interaction with your disk or database or external service and tend to take a bit more than a few seconds. I find you need both.
I have tests which verify DNA analysis. The test data vectors are large -- a few hundred MB here, a couple GB there. The hundreds of tests that use these test vectors still run in a few seconds.
If you're using a tape drive or SD cards, sure. But even a 10-year-old 5400RPM drive on an IDE connection should be able to satisfy your tests' requirements in a few seconds or less.
I suspect your tests are just as monolithic as you think microservices shouldn't be. Break them down into smaller pieces. If it's hard to do that, then redesign your software to be more easily testable. Learn when and how to provide static data with abstractions that don't let your software know that the data is static. Or, if you're too busy, then hire a dedicated test engineer. No, not the manual testing kind of engineer. The kind of engineer who actually writes tests all day, has written thousands (or hundreds of thousands) of individual tests during their career. And listen to them about any sort of design decisions.
Sounds like you have tests that need to read (probably cached) data files while the parent poster has tests that need to write to disks (probably in a database transaction). Those are different enough that run times won't ever be comparable.
I have tests that need to read. I have tests that need to write. All data written must also be read and verified. You're right, the data is probably cached.
If you need to access a database in your tests you're probably doing it wrong. Build a mock-up of your database accessor API to provide static data, or build a local database dedicated for testing.
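A small example of that kind of accessor seam (names made up): the code under test depends on an interface rather than a live connection, so tests can hand it static data.

    class UserStore:
        """Accessor API the business logic depends on."""
        def get_user(self, user_id):
            raise NotImplementedError

    class FakeUserStore(UserStore):
        """Serves static, in-memory data so tests never touch a real database."""
        def __init__(self, users):
            self._users = users
        def get_user(self, user_id):
            return self._users[user_id]

    def greeting(store, user_id):
        # The business logic only sees the accessor API; it can't tell
        # whether the data behind it is static or a real database.
        return "Hello, " + store.get_user(user_id)["name"]

    def test_greeting():
        store = FakeUserStore({1: {"name": "Ada"}})
        assert greeting(store, 1) == "Hello, Ada"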
Sure. I'd venture to say that integration tests should be fewer than unit tests, see hexagonal etc. Hopefully those external interfaces are also more stable, so they don't need to be run as often.
I tend to use my integration tests also as characterization tests that verify the simulator/test-double I use for any external systems within my unit tests.
See also: the testing pyramid[1] and "integrated tests are a scam"[2], which is a tad click-bait, but actually quite good.
> I've never worked on a single service whose tests ran that fast, given that the time spent by the overhead of the test framework and any other one-time initialization can take a few seconds just itself. It's great to have tests that run fast, but that's a bit ridiculous.
It's not ridiculous. It's good.
I work on an analysis pipeline with thousands of individual tests across a half dozen software programs. Running all of the tests takes just a few seconds. They run in under a second if I run tests in parallel.
If your tests don't run that fast then I suggest you start making them that fast.
I'd be willing to bet that if you learned (or hired someone with the knowledge of) how to optimize your code, you could get some astounding performance increases in your product.
I felt this article is more about how to use microservices right way vs butchering the idea. It is not right to characterize this as microservices vs monolith service.
The initial version of their attempt went too far by spinning up a service for each destination. This is taking microservices to the extreme, which caused organizational and maintenance issues once the number of destinations increased. I am surprised they did not foresee this.
The final solution is also a microservice architecture with a better separation of concerns/functionalities: one service for managing the inbound queue of events and another service for interacting with all the destinations.
That cure is worse than the disease. Every service works differently and 80% of them are just wrong, and there’s nothing you can do because Tim owns that bit.
I work on that project. Every time some idiot starts talking about 'code coverage' my face turns red. Our code coverage is 1e-10%. Don't talk to me about this 70% bullshit.
It's not just code coverage that matters. It's the code path selection that matters. If you have a ton of branches and you've evaluated all of them once, then yeah, you sure might have 100% "coverage". But you have 0% path selection coverage, since a single invocation of your API might choose the true branch on one statement and the false branch on another statement, while a second invocation might choose the false branch on the first and the true branch on the second.
While the code was 100% tested, the scenarios were not. What happens if you have true/true or false/false? That's not tested.
There's a term for this but I forgot what it is and don't care to go spelunking to find it.
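A made-up illustration of the point: the two calls below execute every line and every branch arm, yet half the paths are never run.

    def describe(x, y):
        # Two independent branches: four paths (TT, TF, FT, FF).
        sign = "positive" if x > 0 else "non-positive"
        parity = "even" if y % 2 == 0 else "odd"
        return sign + "/" + parity

    def test_every_line_and_branch_arm():
        assert describe(1, 3) == "positive/odd"        # true / false
        assert describe(-1, 2) == "non-positive/even"  # false / true
        # 100% statement and branch coverage, but the true/true and
        # false/false paths were never exercised.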
How does the service architecture affect that? Tim could be as protective of a code file as he is of a service. At least with a service you could work around it.
One way is that with different services, it's more likely you'll have a different language, framework, and paradigms -- that perhaps only Tim is familiar with (that's been my experience). It's definitely got a different repo, perhaps with different permissions.
But if you can explain to the team or the CTO why Tim is doing it wrong and how it is impacting X, Y and Z, then Tim will fix it or be sent elsewhere, no?
Tim accuses everyone else of being lazy or stupid.
Tom (real guy) was too busy all the time to do anything other than the 80/20 rule. He was too busy because he didn't share. So of course he was a fixture of the company...
Each developer works in their own little silo and doesn't bother to learn the code outside their silo. Each team member develops their own idiosyncratic style. If they have to work with someone else's code, it's unfamiliar and they make slow progress and get cranky.
Now all the developers are going to the CTO or CEO and undermining the other developers, trying to persuade the CTO that so-and-so's code is shit.
It looks to me that the shared library issue got solved by the monorepo approach. They could have gone the monorepo way and still have microservices.
Managing a lot of repos and keeping them consistent with regard to dependencies is not easy. In reality you do not want everyone to use a different version of a dependency. You might allow deviations, but ultimately you want to minimize them.
Next to a monorepo they would also need a deployment strategy allowing them to deploy multiple services (e.g. every service that was affected by the library change) simultaneously, so that after deploying they can still talk to one another. For a single service this is doable enough (start up, wait for green health, route requests to new service instance), but it increases in complexity when there's >1 service. I'm sure the process can be repeated and automated etc, but it will be more complex. Doing zero-downtime deployments for a single service is hard enough.
> along with not following the basic rule of services "put together first, split later".
Agreed. I treat services like an amoeba. Let your monolith grow until you see the obvious split points. The first one I typically see is authentication, but YMMV.
Notice I also do not say 'microservices'. I don't care about micro as much as functional grouping.
> the basic rule of services "put together first, split later"
Is this rule mentioned or discussed somewhere? A quick google search links to a bunch of dating suggestions about splitting the bill. Searching for the basic rule of services "put together first, split later" reveals nothing useful.
I can see their reasoning though; most of those services are pretty straightforward I think (common data model in -> transform -> specific outbound API data out -> convert result back to common data model). The challenge they had is that a lot of the logic in each of those services could be reused (http, data transformation, probably logging / monitoring / etc), so shared libraries and such.
I've always said that if the Linux kernel can be a giant monolith, in C no less, then there's maybe 100 web applications in the world that need to be split into multiple services.
I've worked with microservices a lot. It's a never-ending nightmare. You push data consistency concerns out of the database and between service boundaries.
Fanning out one big service in parallel with a matching scalable DB is by far the most sane way to build things.
Right, but the thing that makes Linux actually useful isn't really the kernel is it? I would say what makes it useful is all the various small, targeted programs (some might call them microservices) it lets you interact with to solve real world problems.
If Linux tried to be an entire computing system all in one code base, (sed, vim, grep, top, etc., etc.) what do you think that would look like code base/maintainability wise? Sounds like a nightmare to me.
One example I'm familiar with that sounds like microservices is the Robot Operating System (ROS). At its heart it's just a framework for pub/sub over IP; it just happens to be targeted towards robotics. A ROS system comprises 'nodes' for each logical operation, e.g. image acquisition -> camera calibration -> analysis -> output.
The system is defined by a graph of these nodes, since you can pipe messages wherever they're needed; all nodes can be many-to-many. Each node is a self-contained application which communicates with a master node via tcp/ip (like most pub/sub systems, a master node is required to tell nodes where to send messages). So you can do cool stuff like have lots of separate networked computers all talking to each other (fairly) easily.
It works pretty well and once you've got a particular node stable - e.g. the node that acquires images - you don't need to touch it. If you need to refactor or bugfix, you only edit that code. If you need to test new things, you can just drop them into an existing system because there's separation between the code (e.g. you just tell the system what your new node will publish/subscribe and it'll do the rest).
There is definitely a feeling of duct tape and glue, since you're often using nodes made by lots of different people, some of which are maintained, others aren't, with different naming conventions, etc. However, I think that's just because ROS is designed to be as generic as possible, rather than a side effect of it running like a microservice.
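For a flavour of how small such a node is, here's a minimal rospy sketch (topic names and message types are illustrative, not from any real system):

    import rospy
    from std_msgs.msg import String

    def on_camera_info(msg):
        # This node only sees messages; it doesn't know or care who published them.
        rospy.loginfo("analysis node received: %s", msg.data)

    if __name__ == "__main__":
        rospy.init_node("analysis")
        rospy.Subscriber("camera/info", String, on_camera_info)
        pub = rospy.Publisher("analysis/result", String, queue_size=10)
        rate = rospy.Rate(1)  # publish at 1 Hz
        while not rospy.is_shutdown():
            pub.publish(String(data="ok"))
            rate.sleep()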
IMO, the main benefit of ROS is the message definitions. Having exactly one definition of "point cloud" that everybody accepts means everyone's code will be compatible. That isn't normally the case in library ecosystems. If ROS was replaced by <robotics_types.hpp> I think we'd get 90% of the benefit.
There's actually no reason you can't architect microservices like this. You can put RabbitMQ or some AMQP service as a comm layer between services. But then you have to architect your system to be event-driven. It's not a bad approach.
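As a rough sketch of what that looks like with RabbitMQ and the pika client (queue name and payload invented), one service publishes an event and another consumes it:

    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="orders", durable=True)

    # Producer side: emit an event and move on.
    channel.basic_publish(exchange="", routing_key="orders",
                          body=json.dumps({"order_id": 42, "status": "created"}).encode())

    # Consumer side (normally a separate service): react whenever an event arrives.
    def handle_order(ch, method, properties, body):
        event = json.loads(body)
        print("processing order", event["order_id"])
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="orders", on_message_callback=handle_order)
    channel.start_consuming()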
I've worked on a system that used Kafka in-between. It makes consistency issues a ton worse because everything is async. At least with HTTP you can do synchronous calls
This is just not true. Google uses almost entirely immediately consistent databases and gRPC internally. Eventual consistency is hardly ever required for REST type calls even at massive scale.
gRPC has no queuing and the connection is held open until the call returns. All of Google's cloud databases are immediately consistent for most operations
My understanding is that Spanner is optimistically immediately consistent, with very clever logic for deciding on whether to bounce the attempted transaction (TrueTime).
But strictly speaking, even inside a single multicore CPU, there is no such thing as immediate consistency. The universe doesn't allow you to update information in two places simultaneously. You can only propagate at the speed of light.
Oh, and the concept of "simultaneous" is suspect too.
Our hardware cousins have striven mightily and mostly successfully for decades to create the illusion that code runs in a Newtonian universe. But it is very much a relativistic one.
ROS is not a particularly good implementation of this concept though. And it tends to encourage extremely fine splitting of jobs, which basically just turns your robotics problem into a distributed robotics problem, making it much harder in the process (as well as less efficient).
> If Linux tried to be an entire computing system all in one code base, (sed, vim, grep, top, etc., etc.) what do you think that would look like code base/maintainability wise?
> Same with BSD; the userspace programs are not part of the kernel, and developed separately by the same entity.
They're part of the same repo and built at the same time as the kernel. Run a big "make universe" here: https://github.com/freebsd/freebsd and see for yourself. That they are different binaries does not matter a lot; it's just a question of user interface. See for instance busybox, where all the userspace is in a single binary.
I would argue that the fact that they're in the same repo does not make it a monolith at all. All of google and facebook is in a single repo and built at the same time, but it doesn't make them monoliths.
My understanding is that OpenBSD is constructed this way, and I have also heard that their code is well-organized and easy to follow.
There is the "base" system which is the OS itself and common packages (all the ones you mentioned), then there is the "ports" repo which contains many open-source applications with patches to make them work with OpenBSD.
I think OpenBSD has reaped many of the same advantages described by Segment with their monorepo approach, such as easily being able to add pledge[1] to many ports relatively quickly.
The whole thing is released and deployed together, which is the usual distinction between microservices and not. I don't think anyone's advocating not using modules and folders to structure your code in your repository (certainly I hope not), but a single deployment artifact makes life a lot easier.
But there is a big difference. These small targeted programs are invoked in user land, usually by the user. Microservices only get invoked directly by the user when debugging is going on. Otherwise they are expected to automagically talk to each other and, depending on the abstraction, even discover each other automatically.
Also I can pipe these tools together from the same terminal session, like
tail -f foo | grep something | awk ...
You don't have that in general with Microservices. Unix tools are Lego, Microservices aren't. They are Domino at best.
Probably one could come up with an abstraction to do Lego with Microservices but we're not there yet.
And they can be a right mess to wrangle because upstream gets into some artistic frenzy, and shouts down anyone that worries about the ensuing breakages as luddites and haters.
And still nobody would argue to pack everything together into one executable because of these issues. As everything in software engineering, it is about finding the right trade-off.
The big problems with microservices come from distributed transactions being difficult and clocks getting out of sync. Multiple services on a single machine don't have that problem.
With web apps the main concern is data consistency between relations. On the OS level you have these same concerns with memory and disk, and there's database-like systems in the kernel and drivers to handle it. Essentially all these utilities are running within the same "database" which is disk and memory management handled by the kernel. Usually microservices have their own databases, which is where consistency hell begins
Unless you mean "their own database tables", not "database servers". But that's just the same as having multiple directories and files in a Unix filesystem.
To be blunt, it's news to a lot of people, but it also isn't wrong. Microservices really shouldn't share a database, and if they do then they aren't "microservices".
If they do share a database, what you have is either:
(a) a prototype, or
(b) a set of applications that share a database.
You would have to have an oddly disconnected schema if modifications to the program don't result in programs accessing parts of the database that other programs are already accessing. If this isn't a problem it means you're using your database as nature intended and letting it provide a language-neutral, shared repository with transactional and consistency guarantees.
so maybe not microservices, but fine nonetheless.
EDIT: two more comments:
- this is exactly what relational databases were designed for. If people can't do this with their micro-services, maybe their choice of database is the issue.
- "micro-service" as the original post suggests, is not synonymous with good. "monolith" is only synonymous with bad because it got run-over by the hype-train. If you have something that works well, be happy. Most people don't.
I think you are exactly right. Microservice architectures are definitely not automatically good, and there is nothing wrong with a well architected "monolith".
If you're going to go microservices, you want service A to use service B's public API (MQ or RPC or whatever), not to quietly depend on the schema B happened to choose. And sharing a database server instance turns overloads into cascading failures, unless the stack is very good at enforcing resource limits on noisy neighbors.
That architecture is commonly known as a distributed monolith. If you put two services on the same DB you can guarantee that someone will be joining onto a table they shouldn't have before the week is out.
If you can afford all your components sharing the same database without creating a big dependency hell, then your problem is _too small_ for microservices.
If your problem is so large that you have to split it up to manage its complexity, start considering microservices (it might still not be the right option for you).
Yes, they do fine as examples. I have worked both on decades old booking systems and banking software. Both were dependency hells. Refactoring was impossible. Everybody was super-careful even with tiny changes because the risk to break something was just too high.
If it could be avoided, these systems were not touched anymore. Instead, other applications were attached to the front and sides.
I don't think so. For example, the Linux kernel is old, but still quite well maintainable.
So I would say: it applies to systems where proper modularization was neglected. In the anecdotal cases I referred to, one major element of this deficiency was a complex database, shared across the whole system.
Nah, just use a database that can scale with your app and run a ton of instances. You'll be hard pressed to find any size app that doesn't fit in RAM these days.
While that's true, it also means you are only one grant away from sharing database tables. Maybe with good discipline you will be ok, but all it takes is one dev taking that one shortcut.
Agreed. I've only seen proper user separation happen when legal gets involved. Usually it's an all-access free-for-all, and after all, why shouldn't it be? It's a lot easier to pull data and logs from adjacent services yourself rather than taking a ticket and hoping the team still exists for a response.
If you have different database servers then you are only a "connection string" away from breaking your separation. Using a different login and schema is perfectly fine in a lot of architectures.
> You push data consistency concerns out of the database and between service boundaries.
Sing that from the rooftops. That is exactly my observation as well. All the vanilla "track some resource"-style webapps I've worked on were never designed to cope with a consistency boundary that spans across service boundaries. Turning a monolith into distributed services is hard for that reason - you have to redesign your data access to ensure that consistency boundaries don't span across multiple services. If you don't do that, then you have to learn to cope with eventual consistency; in my experience, most people just don't think that way. I know I have trouble with it. Surely I'm not the only one.
It's worse than that; it's my observation that most microservice architectures just ignore consistency altogether ("we don't need no stinking transactions!") and blindly follow the happy path.
I've never quite understood why people think that taking software modules and separating them by a slow, unreliable network connection with tedious hand-wired REST processing should somehow make an architecture better. I think it's one of those things that gives the illusion of productivity - "I did all this work, and now I have left-pad-as-a-service running! Look at the little green status light on the cool dashboard we spent the last couple months building!"
Programmers get excited about little happily running services, these are "real" to them. Customers couldn't care less, except that it now takes far longer to implement features that cross multiple services - which, if you've decomposed your services zealously enough, is pretty much all of them.
I've heard of even Unicorns throwing consistency out the window. Apparently Netflix has a bunch of "cleanup jobs" that comb the database for various inconsistencies that inevitably show up.
You can't have consistent microservices without distributed transactions. If a service gets called, and inside that call, it calls 3 others, you need to have a roll back mechanism that handles any of them failing in any order.
If you write to the first service and the second two fail, you need to write a second "undo" call to keep consistent.
Worse, this "undo state" needs to be kept transactionally consistent in case it's your service that dies after the first call.
In reality, nobody does this, so they're always one service crash away from the whole system corrupting the hell out of itself. Since the state is distributed, good luck making everything right again.
Microservices are insane. Nobody that knows database concepts well should go near them
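A minimal sketch of the compensating-call ("undo") bookkeeping described above, with the service calls as hypothetical stand-ins:

    def call_with_compensation(steps):
        """Each step is a (do, undo) pair. If any `do` fails, run the `undo`s
        of the steps that already succeeded, in reverse order."""
        done = []
        try:
            for do, undo in steps:
                do()
                done.append(undo)
        except Exception:
            for undo in reversed(done):
                try:
                    undo()
                except Exception:
                    # An undo can fail too; without durable state recording
                    # which steps completed, the system is left inconsistent.
                    pass
            raise

    # Usage with made-up services:
    # call_with_compensation([
    #     (billing.charge,    billing.refund),
    #     (inventory.reserve, inventory.release),
    #     (shipping.schedule, shipping.cancel),
    # ])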
Data consistency is one of those things that sounds like it matters but often doesn't. There's not much in Netflix's platform that screams data consistency is an ultra high priority. Any application that deploys active/active multi-region is by definition going to encounter scenarios where data loss is possible. There's just no way around CAP.
I'd venture a guess that most applications have all sorts of race conditions that could cause data corruption. The fact of the matter is that almost nobody notices or even cares.
A little birdie who worked at Grab once told me that they pretty much don't use transactions anywhere. So... maybe there are problem domains where you can get away without transactions, but I'm quite sure that a marketplace that arranges transportation for a fee is not one of them. The next time you're standing in a monsoon waiting for the "coming" car that never comes, remember this post :-)
Aside from that, I've found that even in non-mission-critical scenarios ("it's just porn!") it's incredibly convenient to have a limited number of states the system can be in. It makes debugging easier and reduces the number of edge cases ("why is this null??") you have to handle.
> maybe there are problem domains where you can get away without transactions, but I'm quite sure that a marketplace that arranges transportation for fee is not one of them.
I think you'd be surprised/alarmed at how little transactions actually get used in the software world. Not just on small systems where it doesn't matter, but I've seen a complete absence of them in big financial ones handling billions of dollars' worth of transactions (the real-world kind) a day. Some senior, highly paid people even defend this practice for performance reasons because they don't realize the performance cost of implicit transactions. And this is just the in-process stuff where transactions are totally feasible; it gets even worse when you look at how much is moved around via csv files to FTP and excel sheets attached to emails. I've spent the last 2 weeks being paid to fix data consistency issues that should never have been issues in the first place.
Maybe when we're teaching database theory we shouldn't start at select/join but at begin transaction/commit/rollback?
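In that spirit, a toy example with Python's stdlib sqlite3 module (table and amounts invented): an explicit transaction keeps both rows consistent even if the second update blows up.

    import sqlite3

    conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")

    def transfer(amount):
        conn.execute("BEGIN")
        try:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # neither half of the transfer is applied
            raise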
Having spent a good amount of time fixing bugs related to code that fetches data with a consistency level of READ_UNCOMMITTED for "performance" reasons I can appreciate what transactions give you as an application developer.
However, I would argue that transactions are overkill. What's the worst case scenario if I book a ride for Grab and my request gets corrupted? I'm guessing I'll see an error message and I'll have to re-request my ride.
Re: I've heard of even Unicorns throwing consistency out the window. Apparently Netflix has a bunch of "cleanup jobs" [to fix "bad" transactions]
If your business model is to be cheap with high volume sales, then corrupting say 1 out of 10,000 customer transactions may be worth it. If you give customers a good price and/or they have no viable alternative, you can live with such hiccups, and the shortcuts/sacrifices may even make the total system cheaper. You are like a veterinarian instead of a doctor: you can take shortcuts and bork up an occasional spleen without getting your pants sued off. But most domains are NOT like that.
>It's worse than that; it's my observation that most microservice architectures just ignore consistency altogether ("we don't need no stinking transactions!") and blindly follow the happy path.
If two microservices have to share databases, they shouldn't be microservices.
One microservice should have write access to one database and preferably, all read requests run through that microservice for exactly the reason you mentioned.
>I've never quite understood why people think that taking software modules and separating them by a slow, unreliable network connection with tedious hand-wired REST processing should somehow make an architecture better.
If you're running microservices between regions and communicating with each other outside of the network it is living in, you're probably doing it wrong.
Microservices shouldn't have to incur the cost of going from SF to China and back. If one lives in SF, all should and you can co-locate the entire ecosystem (+1 for "only huge companies with big requirements should do microservices")
>Customers couldn't care less, except that it now takes far longer to implement features that cross multiple services - which, if you've decomposed your services zealously enough, is pretty much all of them.
Again, that is an example of microservices gone wrong. You'll have the same amount of changes even in a monolith and I'd argue adding new features is safer in microservices (No worries of causing side effects, etc).
I will give you +1 on that anyway because I designed a "microservice" that ended up being 3 microservices because of dumb requirements. It probably could've been a monolith quite happily.
My problem with microservices is the word 'micro'. It should just be services.
Problem domains (along with organizational structures) inherently create natural architectural boundaries... certain bits of data, computation, transactional logic, and programming skill just naturally "clump" together. Microservices ignore this natural order. The main driving architectural principle seems to be "I'm having trouble with my event-driven dynamically-typed metaprogrammed ball-of-mud, so we need more services!".
> Problem domains (along with organizational structures) inherently create natural architectural boundaries
The "natural" order is very often bad for reliability, speed and efficiency. It forces the "critical path" of a requests to jump through a number of different services.
In well built SOA you often find that the problem space is segmented by criticality and failure domains, not by logical function.
It took me longer than it probably should have to realize that the term microservices was made by analogy to the term microkernel. I think that part of my main issue with the "microservices" is that it conflates highly technical semantics with word forms that are a bit more wishy washy in meaning (i.e., a service is something formed more of human perception, not rooted directly in operating system abstractions).
I wonder if microservices will have a similar evolution to microkernels. While some microkernels almost closed the performance gap to monolithic kernels, the communication overhead killed them. It turned out that the kernel's complexity could be reduced by moving some functions into the applications ("library OS"). Linux kernel drivers are only accepted when the functionality can't be done in user space. The extreme version is running single-purpose, highly specialized unikernels on a hypervisor.
Yes! But it’s worse again. Because some bright spark comes along and says “hey, there’s some commonality between these n services. I’ll make a shared library that will remove that duplicated code.”
Then before you know it there are a dozen more shared libraries and you have the distributed monolith.
Then you either have to stand up every micro service every time you integration test, or make changes and hope for the best.
When I was designing a microservices architecture for a former employer, the justification for it was security. We needed separation of the components in the system because we were going to be handling money. It used ZeroMQ for communication, because I wanted something more lightweight and fault-resistant for message passing (in such a sensitive application, we didn't want to trust the network). Although it was of course fun to design, microservices weren't my first choice. It just made sense to us to use them in that particular scenario.
Sometimes, I've seen a lack of regard for data consistency within a monolith.
That is not completely on the developer, either. Pre 4.0 Mongodb, for example, does not do transactions. On the other hand, I've seen some pretty flagrant disregard for it just because there are no atomicity guarantees.
> That is not completely on the developer, either. Pre 4.0 Mongodb, for example, does not do transactions
I'd argue that's on the developer, if he was the one to choose a database that doesn't support transactions, and then didn't implement application-level transactions (which is very hard to do correctly).
Txns are just hard in general but mature RDBMS have spent a huge number of man-years on getting it right. After the time I've spent on Couchbase, Mongo, Elasticsearch, I doubt I'll ever use something non-transactional for anything OLTP or OLTP-adjacent. If Postgres or MySQL can't scale up to handle it, make them...or get a new job. Scale is a PITA.
I've seen it on monoliths too. We run a pretty large app just fine with the "nuclear option": database transaction isolation level SERIALIZABLE. It makes it impossible to have inconsistencies at the database level, and on some RDBMS like Postgres the performance impact is small.
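For the curious, a hedged sketch of what that looks like with psycopg2 against Postgres (connection string and table invented); the one catch with SERIALIZABLE is that you have to retry on serialization failures:

    import psycopg2
    from psycopg2 import errors, extensions

    conn = psycopg2.connect("dbname=app user=app")
    conn.set_isolation_level(extensions.ISOLATION_LEVEL_SERIALIZABLE)

    def withdraw(amount, retries=5):
        for _ in range(retries):
            try:
                with conn, conn.cursor() as cur:
                    cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = 1",
                                (amount,))
                return
            except errors.SerializationFailure:
                continue  # `with conn:` already rolled this attempt back; just retry
        raise RuntimeError("gave up after repeated serialization failures")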
I remember using BEA Tuxedo 15 years ago and I can't help wondering if theoretically one could do microservices using it or some OLTP to achieve consistency when needed. Has it progressed over the years? Is there any free alternative to it? Or is it dead tech? When I used it it was hard and too much extra work, but it did the job successfully.
"Need to" and "sane" are among my favourite subjective terms!
(Further below, I'll go into in which contexts I'd agree with your assessment and why. But for now the other side of the coin.)
In the real world, current-day, why do many enterprises and IT departments and SME shops go for µservice designs, even though they're not at multimillion-user scale? Not for Google/Netflix/Facebook scale, not (primarily/openly) for hipness, but because, among other reasons, they like:
- that µs auto-forces a certain level of discipline in areas that would be harder to enforce / easier for devs to preempt in other approaches --- modularity is auto-enforced, separation of concerns, separation of interfaces and implementations, or what some call (applicably-or-not) "unix philosophy"
- they can evolve the building blocks of systems less disruptively (keep interfaces, change underlyings), swap out parts, rewrites, plug in new features to the system etc
- allows for bring-your-own-language/tech-stack (thx to containers + wire-interop) which for one brings insights over time as to which techs win for which areas, but also attracts & helps retain talent, and again allows evolving the system with ongoing developments rather than letting the monolith degrade into legacy because things out there change faster than it could be rewritten
I'd prefer your approach for intimately small teams though. Should be much more productive. If you sit 3-5 equally talented, same-tech-stack and superbly-proficient-in-it devs in a garage/basement/lab for a few months, they'll probably achieve much more, and more productively, if they forgo all the modern µservices / dev-ops byzantine-rabbithole-labyrinths and churn out their packages / modules together in an intimate, tight, fast-paced, co-located, self-reinforcing collab flow. No contest!
Just doesn't exist often in the wild, where either remote distributed web-dev teams or dispersed enterprise IT departments needing to "integrate" rule the roost.
(Update/edit: I'm mostly describing current beliefs and hopes "out there", not that they'll magically hold true even for the most inept of teams at-the-end-of-the-day! We all know: people easily can, and many will, 'screw up somewhat' or even fail in any architecture, any language, any methodology..)
In my experience if your developers were going to make choices that lead to tight coupling in a monolith, they’re going to make the same choices in a distributed architecture. Only now you’ve injected a bunch of network faults and latency that wouldn’t have been there otherwise.
In this case it sounds like they started with a microservice architecture, but CI/CD automation necessary for robust testing and auto-scaling was not in place. The problem of queues getting backed up might have been addressed by adding a circuit breaker, but instead they chose to introduce shared libraries (again, without necessary testing and deployment), which resulted in very tight coupling of the so-called microservices.
Do they actually force a discipline? Do people actually find swapping languages easier with RPC/messaging than other ffi tooling? And do they really attract talent?!
You make some amazing claims that I have seen no evidence of, and would love to see it.
In my experience, there's a lot of cargo culting around microservices. The benefits are conferred by having a strong team that pays attention to architecture and good engineering practices.
Regardless of whether you are a monolith or a large zoo of services, it works when the team is rigorous about separation of concerns and carefully testing both the happy path and the failure modes.
Where I've seen monoliths fail, it was developers not being rigorous/conscientious/intentional enough at the module boundaries. With microservices... same thing.
Also, having a solid architectural guideline that is followed across the company in several places (both in infrastructure and application landscapes) makes up the major bulk of ensuring stability and usability.
The disadvantage is obviously that creating such a 'perfect architecture' is hard to do because of different concerns by different parties within the company/organisation.
> The disadvantage is obviously that creating such a 'perfect architecture' is hard to do because of different concerns by different parties within the company/organisation.
I think you get at two very good points. One is that realistically you will never have enough time to actually get it really right. The other is that once you take real-world tradeoffs into account, you'll have to make compromises that make things messier.
But I'd respond that most organizations I see leave a lot of room for improvement on the table before time/tradeoff limitations really become the limiting factor. I've seen architects unable to resolve arguments, engineers getting distracted by sexy technologies/methodologies (microservices), bad requirements gathering, business team originated feature thrashing, technical decisions with obvious anticipated problems...
> "You make some amazing claims that I have seen no evidence of"
I'm just relaying what I hear from real teams out there, not intending to sell the architecture. So these are the beliefs I find on the ground, how honest and how based-in-reality they are are harder to tell and only slowly over time at any one individual team.
A lot of this is indeed about hiring though, I feel, at least as regards the enterprise spheres. Whether you can as a hire really in-effect "bring your own language" or not remains to be seen, but by deciding on µs architecture for in-house you can certainly more credibly make that pitch to applicants, don't you think?
Remember, there are many teams that have suffered for years-to-decades from the shortcomings and pitfalls of (their effectively own interpretation of / approach to) "monoliths" and so they're naturally eagerly "all ears". Maybe they "did it wrong" with monoliths (or waterfall), and maybe they'll again "do it wrong" (as far as outsiders/gurus/pundits/coachsultants assess) with µs (or agile) today or tomorrow. The latter possibility/danger doesn't change the former certainties/realities =)
Pure anecdote, so I know it's meaningless, but I have twice rewritten/refactored old services with new code, or even new languages, with little problem because the interface was well defined. We had hundreds of devs working in other areas and we were all on different release cycles because changes were easy and decoupled. We let any team submit bug reports when we either weren't complying with our interface or had a bug somewhere.
The only teams I had to spend time on were the ones which were on a common DB before we moved off of it.
I think this is probably true for larger or more distributed corporate environments, but I think a modular monolith is going to be a more productive and flexible architecture for most teams, and should be the default option for most startups (many of whom are doing microservices from day 1 it seems).
1. Is auto-enforced modularity, separation of concerns, etc actually better than enforcing these things through development practices like code review? Why are you paying people a 6 figure salary if they can't write modular software?
2. Is the flexibility you gain from this loose coupling worth the additional costs and overhead you incur? And is it really more flexible than a modular system in the first place? And how does their flexibility differ? With an API boundary breaking changes are often not an option. In a modular codebase they can easily be made in a single commit as requirements change.
3. Is bring-your-own-language actually a good idea for most businesses? Is there a net benefit for most people beyond attracting and retaining talent? What about the ability to move developers across teams and between different business functions? Having many different tech stacks is going to increase the cost of doing this.
I do see the appeal of some of these things, but IMO the pros outweigh the cons for a smaller number of businesses than you've mentioned. And the above is only a small sample of that. Most things are just more difficult with a distributed system. It's going to depend on the problem space of course, but most backend web software could easily be written in a single language in a single codebase, and beyond that modularization via libraries can solve a lot of the same problems as microservices. I'm very skeptical of the idea that microservices are somehow going to improve reliability or development speed unless you have a large team.
These are great observations. For anyone interested in going more in depth on the topic, I highly recommend the book Building Evolutionary Architectures
Except for the language argument, don't you get all of that just by having modules in your code (assuming a statically typed language, since the boundaries are type-checked)?
Not really. For example, it's easier to mock a microservice than a module for testing purposes. Let's say you have component A and component B, A depends on B (the dependency implemented via a runtime sync or async call), and B is computationally intensive or has certain resource requirements that make it hard or impossible to test on a developer's machine. You may want to test only A: with a monolithic architecture you'll have to produce another build of the application that contains a mock of B (or you need something like OSGi for runtime module discovery). When both components are implemented as microservices, you can start a container with a mock of B instead of the real B (see the sketch below).
Running E2E black-box tests is equally simple for all kinds of architectures, especially today, when it's so easy to create a clean test environment with multiple containers, even on a developer's machine. It may be harder to automate this process for a distributed system, but frankly I don't see a big difference between a docker-compose file and a launch script for a monolith - I've been writing such tests for distributed systems casually for several years, and from my personal experience it's much easier to process the test output and debug the microservices than monolithic applications.
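To make the mocked-B case concrete, a toy sketch (Python standard library only; FakeB, component_a, and the /score endpoint are all invented) of testing component A against a stand-in that answers on B's address, which is essentially what a mocked B running in a container gives you:

    import json
    import threading
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class FakeB(BaseHTTPRequestHandler):
        # Stand-in for the expensive service B: always returns a canned answer.
        def do_GET(self):
            body = json.dumps({"score": 42}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep test output quiet
            pass

    def component_a(b_base_url):
        # Component A only knows B's address; swapping the real B for a fake
        # requires no rebuild of A.
        with urllib.request.urlopen(b_base_url + "/score") as resp:
            return json.load(resp)["score"] * 2

    if __name__ == "__main__":
        server = HTTPServer(("127.0.0.1", 0), FakeB)
        port = server.server_address[1]
        threading.Thread(target=server.serve_forever, daemon=True).start()
        assert component_a(f"http://127.0.0.1:{port}") == 84
        server.shutdown()
        print("A tested against fake B")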
> it's much easier to process the test output and debug the microservices than monolithic applications.
You find it easier to debug end-to-end tests of a microservice architecture than of a monolith? That's not my experience. How do you manage to line up all the events side by side when they are spread across a dozen files?
Using only files for logging is the last thing I would do in 2018.
I use Serilog for structured logging. Depending on the log destination, your logs are either stored in an RDBMS (I wouldn’t recommend it) or created as JSON with name-value pairs that can be sent directly to a JSON data store like ElasticSearch or Mongo, where you can do ad hoc queries.
I just don't use files for anything (services should be designed with the assumption that the container can be destroyed at any time, so files are simply not an option here). If you are talking about the logs, there are solutions like Graylog to aggregate and analyze them.
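Serilog is .NET-specific, but the idea carries over to any stack. A rough Python analogue (the "billing" logger and the field names are made up) that emits one JSON object per event for a collector such as Graylog or Elasticsearch to index:

    import json
    import logging
    import sys

    class JsonFormatter(logging.Formatter):
        # Render each record as a single JSON object instead of a text line.
        def format(self, record):
            event = {
                "timestamp": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            # Attach any structured fields passed via `extra=`.
            event.update(getattr(record, "fields", {}))
            return json.dumps(event)

    handler = logging.StreamHandler(sys.stdout)   # ship stdout to the aggregator
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("billing")
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    log.info("charge processed", extra={"fields": {"order_id": "o-123", "cents": 4200}})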
Easy until you have 100,000 of them anyway, in which case it's expensive and slow to run it for every dev. (At that point you have enough devs that microservices 100% make sense, though)
A dependency injection framework where you use flags at the composition root to determine whether the “real” implementation class or the mock class is used based on the environment.
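A minimal sketch of that idea (Python stand-ins; the flag name and the classes are invented), with the environment consulted only at the composition root:

    import os

    class RealBarModule:
        def compute(self, x):
            # imagine an expensive or resource-hungry computation here
            return x * 2

    class FakeBarModule:
        def compute(self, x):
            return 10   # canned answer for tests

    class FooModule:
        def __init__(self, bar):
            self.bar = bar

        def baz(self, x):
            return self.bar.compute(x)

    def composition_root():
        # The flag decides, in exactly one place, which implementation is wired in.
        use_fakes = os.environ.get("USE_FAKE_BAR") == "1"
        return FooModule(FakeBarModule() if use_fakes else RealBarModule())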
You will end up with something like OSGi. That can be the right choice, but is also a quite 'heavyweight' architecture.
For a certain class of applications and organizational constraints, I also would prefer it. But it requires a much tighter alignment of implementation than microservices (e.g., you can't just release a new version of a component, you always have to release the whole application).
> For a certain class of applications and organizational constraints, I also would prefer it. But it requires a much tighter alignment of implementation than microservices (e.g., you can't just release a new version of a component, you always have to release the whole application).
Why is that an issue with modern CI/CD tools? It’s easier to just press a button and have your application go to all of your servers based on a deployment group.
With a monolith and a statically typed language, refactoring becomes a whole lot easier. You can easily tell which classes are being used, do globally guaranteed-safe renames, and when your refactoring breaks something, you know at compile time, or with the correct tooling even before you compile.
> It’s easier to just press a button and have your application go to all of your servers based on a deployment group.
It's not so much about the deployment process itself (I agree with you that this can be easily automated), but rather about the deployment granularity. In a large system, your features (provided by either components or by independent microservices) usually have very different SLAs. For example, credit card transactions need to work 24x7, but generating the monthly account statement for these credit cards is not time-critical. Now suppose one of the changes in a less critical component requires a database migration which will take a minute. With separate microservices and databases, you could just pause that microservice. With one application and one database, all teams need to be aware of the highest SLA requirements when doing their respective deployments, and design for it. It is certainly doable, but requires a higher level of alignment between the development teams.
I agree with your remark about refactoring. In addition, when doing a refactoring in a microservice, you always need a migration strategy, because you can't switch all your microservices to the refactored version at once.
> With separate microservices and databases, you could just pause that microservice. With one application and one database, all teams need to be aware of the highest SLA requirements when doing their respective deployments, and design for it. It is certainly doable, but requires a higher level of alignment between the development teams.
That’s easily accomplished with a Blue-Green deployment. As far as the database, you’re usually going to have replication set up anyway, so your data is going to live in multiple databases regardless.
Once you are comfortable that your “blue” environment is good, you can slowly start moving traffic over. I know you can gradually move x% of traffic every y hours with AWS. I am assuming on prem load balancers can do something similar.
If your database is a cluster, then it is still conceptually one database with one schema. You can't migrate one node of your cluster to a new schema version and then move your traffic to it.
If you have real replicas, then all writes still need to go to the same instance (cf. my example of credit card transactions). So I also don't understand what your migration strategy would look like.
blue-green is great for stateless stuff, but I fail to see how to apply it to a datastore.
Do you realize that this is actually an anti-pattern that adds unnecessary complexity and potential security problems to your app? Test code must be separated from production code - something every developer should know.
It’s basically a feature flag. I don’t like feature flags but it is a thing.
But if you are testing an artifact, why isn’t the artifact testing part of your CI process? What you want to do is no more or less an anti-pattern than swapping out mock services to test a microservice.
I’m assuming the use of a service discovery tool to determine what gets run. Either way, you could screw it up by it being misconfigured.
First of all, it is test code, no matter whether it's implemented as a feature flag or in any other way. Test code and test data shall not be mixed with production code and data, for many well-documented and well-known reasons: security, additional points of failure, additional memory requirements, impact on architecture, etc.
>But if you are testing an artifact, why isn’t the artifact testing part of your CI process?
It is, and it shall be, part of the CI process. The commit gets assigned a build number in a tag, the artifact gets the version and build number in its name and metadata, deployment to the CI environment is performed, and tests are executed against that specific artifact, so every time you deploy to production you have proof that the exact binary being deployed has been verified in its production configuration.
>I’m assuming the use of a service discovery tool to determine what gets run.
Service discovery is irrelevant to this problem. Substitution of mock can be done with or without it.
If you are testing a single microservice and don’t want to test the dependent microservice - if you are trying to do a unit test and not an integration test, you are going to run against mock services.
If you are testing a monolith you are going to create separate test assemblies/modules that call your subject under test with mock dependencies.
They are both going to be part of your CI process then and either way you aren’t going to publish the artifacts until the tests pass.
Either way, your deployment pipeline would be some combination of manual and automated approvals operating on the same artifacts.
The whole discussion about which is easier is moot.
Edit: I just realized why this conversation is going sideways. Your initial assumptions were incorrect.
> You may want to test only A: with a monolithic architecture you'll have to produce another build of the application that contains a mock of B (or you need something like OSGi for runtime module discovery).
> What exactly are you trying to accomplish?
A good test must verify the contract at the system boundaries: in the case of an API, that means verification done by calling the API. We are discussing two options here: an integrated application hosting multiple APIs, and a microservice architecture. Verification at the system boundaries means running the app, not running a unit test (unit tests are good, but serve a different purpose). Feature flags only make it worse, because testing with them covers only non-production branches of your code.
> Your initial assumptions were incorrect.
With nearly 20 years of engineering and management experience, I know very well how modern testing is done. :)
> Verification at the system boundaries means running the app, not running a unit test
What is an app at the system boundaries if not a piece of code with dependencies?
Say you have a microservice, FooService, that calls BarService. The "system boundary" you are trying to test is FooService using a fake BarService. I'm assuming that you're calling FooService via HTTP using a test runner like Newman and asserting on the results.
In a monolithic application you have a class FooModule that depends on a BarModule, which implements IBarModule. In your production application you create your FooModule:
    var x = new FooModule(new BarModule());
    var y = x.Baz(5);
In your unit tests, you create your FooModule:
    var x = new FooModule(new FakeBarModule());
    var actual = x.Baz(5);
    Assert.AreEqual(10, actual);
And run your tests with a runner like NUnit.
There is no functional difference.
Of course FooModule can be at whatever level of the stack you are trying to test - even the Controller.
I was doing this with COM twenty years ago. It had the same advantages of modularity and language independence but without the unnecessary headaches of a distributed system.
I take your point, but it saddens me that there aren't better ways of achieving this modularity nowadays.
I was too. Of course, a distributed system could also be built with DCOM and MTS (later COM+), and the DTC (Distributed Transaction Coordinator) could be used when you needed a transaction across services (or DBs, or MQs). Obviously the DTC was developed in recognition of the fact that distributed transactional service calls were a real requirement - something that current microservice architectures over HTTP REST don't seem to support.
Are you telling me you chose micro services just to enforce coding standards and allow devs to be more comfortable?
The legacy concerns I don’t see being true, as it’s mainly a requirements/documentation problem and you can achieve the same effect with feature toggles.
The key to a large code base is hierarchical design. People are looking for a technical solution to a design problem. No matter what technology you use, if you don't maintain the hierarchical layers your code will turn to mush. Micro services can provide a really strict boundary, but if you draw the line in the sand at the wrong place it isn't going to matter.
Teams have a lot to do with microservice use. If you have a small team a monolith can work well. If you have a large team, microservices have some significant advantages. And in some cases, you might have a microservice in Swift when you need high performance, but other less intensive services might be in Ruby or some other language, etc.
Right tool for the job should be the goal, as opposed to chasing fashion. Microservices are definitely overused, but they do have many legitimate use cases. Your CRUD web application probably doesn’t need microservices, but complex build systems might.
Imagine the complexity if Amazon.com or Netflix were a monolith. But something like Basecamp is probably better as a monolith.
There is also the issue of scale. An ML processor might need more (and different) hardware than a user-facing system.
It depends, but in general I'd disagree. If the small balls of mud have all kinds of implicit dependencies, but you have to find them by searching across codebases (and languages) -- does that sound easier than finding them all in the same codebase and language? Overall, it's comparing bad design to bad design. I think the main argument I'd make here is that microservices don't actually solve the big-ball-of-mud problem; they solve a completely different problem.
You can step-through your big ball of mud in a debugger, but you can't do the same with your small balls of mud. Not easily at least. That alone makes a huge difference.
They're both bad. It feels like a reverse Sophie's Choice to have to pick one.
The real friction in the system is always in the boundaries between systems. With microservices it's all boundaries. Instead of a Ball of Mud you have Trees and No Forest. Refactoring is a bloody nightmare. Perf Analysis is a game of finger pointing that you can't defuse.
> 100 web applications in the world that need to be split into multiple services.
Services are not micro services. Most large scale applications can and should be split into multiple services. However, when approaching a new problem you should work within the monolith resisting the service until you absolutely can't any longer. Ideally this will make your services true services, that could capture an entire business unit. When it's all said and done you should be able to sell off the service as a business.
The other use case, which should be obvious, is compliance. If you are thinking about implementing anything that would require PCI or SOX, you should do that in a service to shield the rest of the dev org from the complexities. So, any web app that takes payment and interacts directly with a payment processor.
That said, you're correct in that you should not be rolling out a new service to avoid sharding.
Microservices aren't some magic bullet for scaling. If anything, they conform to Conway's Law [1]. I'd agree, though, if a single engineer is singularly responsible for 2+ microservices that are only supporting a single product...you're doing it wrong.
I think that's possibly wrong, as a key advantage of microservices over a monolith is scaling. We have a 'single' product built as two microservices: an external-facing orchestration service that calls a compute-heavy backend. This lets the external-facing service serve all the incoming traffic out of (on average) 4 containers while the backend computes with 32 containers and scales independently; very small changes in the incoming traffic can have large effects on the volume of traffic going to the compute service.
> I've always said if the Linux kernel can be a giant monolith, in C no less, then there's maybe 100 web applications in the world that need to be split into multiple services.
This is apples and oranges. The production profile of operating system kernels and web applications are so dissimilar that the analogy is not useful. It may be true that most web applications don't need to be split into multiple services, but the Linux kernel provides no evidence either way.
Yes, but "monolithic" web applications can be built in the same way. It might not be a 100% accurate usage of the term, but microservice/SOA advocates love to call modular applications monliths anyways, to the point that it's something that most people seem to do.
The kernel can't be categorized fully as a monolith: many kernel subsystems are developed individually, by different teams, and merged into the main branch by subsystem.
This is really missing the point. Microservices are not about code organisation - they are about runtime separation.
And even kernels have kernel threads, which are basically local microservices. Anything which needs to scale beyond a single system is more deserving of microservices than a kernel.
"I've worked with microservices a lot. It's a never-ending nightmare. You push data consistency concerns out of the database and between service boundaries."
This exactly. Me too. Data consistency concerns in sufficiently large real world projects can be practically dealt with only 2 ways IMO: transactions or spec changes.
I generally agree that for the vast majority of web applications KISS holds true: use a monolith.
In terms of the common-code divergence, why not just use a private NPM registry and enforce latest? Have a hard-and-fast rule that all services must always use the latest version of common.
I've always said if the Linux kernel can be a giant monolith, in C no less, then there's maybe 100 web applications in the world that need to be split into multiple services.
Similarly it’s made with make. If anyone has a project more complex than the Linux kernel or GCC I’ll gladly listen to why they need some exotic build system... never met anyone yet...
I’ve been tracking the comments and my sense is that almost no one here believes the business domain drives the technical solution.
Microservices, when constructed from a well-designed model, provide a level of agility I’ve never seen in 33 years of software development. They also wall off change control between domains.
My take from the Segment article is that they never modeled their business and just put services together using their best judgment on the fly.
That’s the core reason for doing domain driven design. When you have a highly complex system, you should be focused on properly modeling your business. Then test this against UX, reporting, and throughput and build after you’ve identified the proper model.
As for databases, there are complexities. Some microservices can be backed by a key-value store at a significantly lower cost, but some high-throughput services require a 12-cylinder relational database engine. The data store should match the needs of the service.
One complexity of microservices I’ve seen is when real-time reporting is a requirement. This is the one thing that would make me balk at how I construct a service oriented architecture.
See Eric Evans book and Vaughn Vernon’s follow up.
As a microservice agnostic, I wonder how you can deal elegantly with transactions across services, concurrent access & locks, etc. [Disclaimer: I have not read the article yet]
Reality is that the IT solution of any company consists of multiple separate applications which need to be coordinated into one working solution.
From this reality it's good to design everything as if it's a (micro|macro)service part of a larger landscape of apps.
Reality is also that you can never have transactions for everything across all your systems, so transaction alternatives like compensations are always something to deal with.
Exactly. And there are tons of techniques to do so. For instance, Stripe is a payment service (you would think a payment service is the number one candidate for transactions/locks/etc., right?) implemented on the good ol' unreliable internet. You can look at how they do it (idempotency, etc.).
Transactions are a nice and convenient shortcut when they're applicable but they're far from mandatory.
When she's discussing compensations she mentions that the transaction T_i can't have an input dependency on T_(i-1). What are some things I should be thinking about when I have hard, ordered dependencies between microservice tasks? For example, microservice 2 (M2) requires output from M1, so the final ordering would be something like: M1 -> M2 -> M1.
Currently, I'm using a high-level, coordinating service to accomplish these long-running async tasks, with each M just sending messages to the top-level coordinator. I'd like to switch to a better pattern though, as I scale out services.
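For what it's worth, a bare-bones sketch of that orchestration approach (Python; the step names and the reserve/enrich/commit flow are invented): the coordinator runs the ordered steps, passes earlier results forward via a shared context, and on failure runs the compensations of whatever already completed, in reverse order.

    class SagaStep:
        def __init__(self, name, action, compensation):
            self.name = name
            self.action = action              # callable(context) -> result
            self.compensation = compensation  # callable(context), undoes the action

    def run_saga(steps, context):
        completed = []
        try:
            for step in steps:
                # Each step may read results of earlier steps from the context,
                # which is how an M1 -> M2 -> M1 ordering is expressed.
                context[step.name] = step.action(context)
                completed.append(step)
        except Exception:
            # Roll back: compensate completed steps in reverse order.
            for step in reversed(completed):
                step.compensation(context)
            raise
        return context

    # Illustrative wiring (replace the lambdas with real calls to M1 and M2):
    steps = [
        SagaStep("m1_reserve", lambda ctx: "reservation-1",
                 lambda ctx: print("cancel", ctx["m1_reserve"])),
        SagaStep("m2_enrich", lambda ctx: ctx["m1_reserve"] + ":enriched",
                 lambda ctx: print("discard", ctx["m2_enrich"])),
        SagaStep("m1_commit", lambda ctx: print("commit", ctx["m2_enrich"]),
                 lambda ctx: None),
    ]
    run_saga(steps, {})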
Nice, thanks for sharing! I had not heard of this pattern before.
The only nit I have on that video is that after a great motivation and summary, their example application at the end (processing game statistics in Halo) didn’t seem to need Sagas at all. Their transactions (both at the game level and at the player level) were fully idempotent and could be implemented in a vanilla queue/log processor without defining any compensating transactions, unless there were additional complexities not mentioned in the talk.
You can't IMO. We have eventual consistency by all writes being commands that can be picked up by services listening to a Kafka queue but this represents LOADS of extra investment and work above building a monolith.
you avoid them. you design your system (a set of microservices) in such a way that you don't need a transaction across service boundaries ( or rather database boundaries )
First of all, I'm 54 and started coding at 15 on punch cards, so I've seen pretty much every paradigm in the last 40 years.
It's certainly a truism that technology can be cyclical, but that's not relevant in this case.
The OP's statement and the article "Goodbye Microservices" are anecdotal and incorrect.
I have been on teams developing microservice architectures for about six years and this particular paradigm shift has proven to be a dramatic leap forward in efficiency, especially between the business, technical architecture, and change control management.
When you develop a domain model, the business can ask questions about the model. The architects can answer them by modifying the model. The developers can improve services by adopting model changes in code. This is the fundamental benefit of domain driven design and works fluidly with a microservice architecture.
There's still a pervasive belief in technology circles that software should be developed from a purely technical perspective. This is like saying the plumber, electrician, and drywaller should design a house while they're building it. They certainly have the expertise to build a house, and they may actually succeed for a time, but eventually the homeowner will want to change something and the ad hoc design of the house just won't allow for it. This is why we have architects. They plan for change within the structure of a house. They enable modification and addition.
Software development is no different. The Segment developers have good intentions, but they needed to work with the business to properly model everything, then build it. Granted, it sounds like they're a fast moving and successful business, so there are trade-offs. But once the business "settles", they really should go back to the drawing board, model the business, then build Segment 3.0.
>> 33 years of software development.
> Wait till you get older.
Presumably someone with 33 years of software development experience is around 50 years old. How old do you think someone needs to be before they are qualified to comment on trends in software development practices? 70 years old?
He is correct though. Not only software, but lots of things are cyclical and come back in different forms. And you do have to have a few decades of life experience to be part of that.
I also would totally agree that companies should never just adopt "best practices", because those lead to super complex enterprise systems which are not necessary for most companies.
> I’ve seen the hype swing from micro services to soa and back to micro services a few times already.
SOA and microservices are the same thing; microservices is just a new name coined when the principles were repopularized so that it didn't sound like a crusty old thing.
Let me write a meta technology hype roadmap, so we can place these sorts of articles:
* Old technology is deemed by people too troublesome or restrictive.
* They come up with a new technology that has great long-term disadvantages, but is either easy to get started with short-term, or plays to people's ego about long-term prospects.
* Everyone adopts this new technology and raves about how great it is now that they have just adopted it.
* Some people warn that the technology is not supposed to be mainstream, but only for very specific use cases. They are labeled backwards dinosaurs, and they don't help their case by mentioning how they already tried that technology in the 60s and abandoned it.
* Five years pass, people realize that the new technology wasn't actually great, as it either led to huge problems down the line that nobody could have foreseen (except the people who were yelling about them), or it ended up not being necessary as the company failed to become one of the ten largest in the world.
* The people who used the technology start writing articles about how it's actually not that great in the long term, and the hype abates.
* Some proponents of the technology post about how "they used it wrong", which is everyone's entire damn point.
* Everyone slowly goes back to the old technology, forgetting the new technology.
* Now that everyone forgot why the new technology was bad, we're free to begin the cycle again.
> Now that everyone forgot why the new technology was bad, we're free to begin the cycle again.
We can very easily break the cycle by training a deep learning TensorFlow brain in the cloud, that will be fed the daily mouse gestures and key presses of all developers in the world. It's an awesome new technology that can solve any problem.
Pretty soon the global brain will start to see patterns emerging, for example when developers post hype phrases on forums with unsubstantiated claims about the potential of some awesome new technology. As soon as a hype event is detected, a strong electric shock is commanded via the device the developer is using, thereby stopping the hype flow and paralyzing the devellllllllllllllllllllllllllll
5 years later, tensor flow brain writes a medium post about why its very creation was a bad idea, then follows up with a quick rebuttal article about how it is just using itself incorrectly.
sounds like an awesome idea! Anyone who wants to do a startup around this I have an awesome idea about how to implement this using blockchain and a serverless tech stack :D
I have been doing some tech-advice jobs on the side to see what's going on in the world, and it's really scary what I found. Only yesterday I was talking with the CTO of a niche social networking company that has a handful of users and probably won't get many more, who was telling me the tech they use: Node, Go, Rust, Mongo, Kafka, some graph DB I forgot, Redis, Python, React, GraphQL, Cassandra, Blockchain (for their voting mechanism...), some document database I had never heard of, and a lot more. A massive, brittle, SLOW (!) bag of microservices and technologies tied together, where in 'every micro part' they used best practices as dictated by the big winners (Facebook, Google, whatever) in Medium blogs. It was a freak show for a company of 10 engineers. But this is not the first time I have encountered it; 3 weeks ago, on the other side of the world, I found a company with about the same 'stack'.
People really drink the koolaid that is written on these sites, and it is extremely detrimental to their companies. PostgreSQL with a nice boring Java/.NET layer would blow this stuff out of the water performance-wise (for their actual real-life use case), and would be far easier to manage, deploy, find people for, etc. I mean, using these stacks is good for my wallet as an advisor, but I have no clue why people do it when they are not even close to 1/100000th of Facebook.
Been there. OTOH, my last job was at a startup that used a LAMP stack but made enough money to be self-sufficient and not depend on VC money to keep running.
When the legacy systems started to hurt us (because they were written by the founder in a couple of weeks in the most hacky way), we decided against microservices and went on to improve the actual code into something more performant and more maintainable, also moving from PHP5 to PHP7.
As much as we all wanted to go microservices and follow the buzz, we were rational enough to see that it didn't make any sense in our case.
Resume-driven development. I've worked for a similar small company with barely any users/data, but the tech choices were driven by how useful the tech would be for the engineers' future job prospects, not by the current or future needs of the organization.
Maybe if companies gave sufficient raises and promotions, and actually tried to retain talent, then we wouldn't have this culture where people keep having to switch jobs, and therefore always be looking out for what will get them the next gig.
I think engineers would still jump ship just as often even if they were paid more. When you really get down to it, most programming is pretty tedious. What makes it fun, for some engineers, is the opportunity to learn new things, even if it means doing so to the detriment of the business.
I had a similar experience last year as an advisor to a large "enterprise" type company. They are a slow-moving company with no competitive pressure and tons of compliance constraints; they had several teams of Java engineers who all know Java really well and do not know JavaScript well. They had been attempting to migrate all their services to a modern, cool, bells-and-whistles JS SPA architecture.
Get ready for the surprise twist: It wasn't going well. I was hired as an expert JS consultant to advise them on which JS framework to use.
My advice? Get ready for surprise #2: "don't use javascript [or use it sparingly as needed]."
Look, for most websites renting less than a handful of dedicated boxes would be enough for all purposes -- dev, staging, production. Instead they rack up an astonishing cloud bill from Amazon. It's not hype. It's boring. But it goddamn works. Sometimes I feel I am the last warrior of the YAGNI tribe.
You are not. I am only 38 and have only worked professionally as a programmer for 16.5 years, but I am already quite conservative in terms of ruthlessly eliminating complexity and removing any tech that can be consolidated into a monolith repo without huge amounts of effort.
Experience helps a lot. When you know that complexity and too much diversity breeds tech debt, you learn to say "No" decisively.
Decisions above my pay grade have us replacing queries via DNS [1], where we have to finish our work in at most 1 second, with a nice HTTPS REST API that returns JSON and is guaranteed to return within 1.7 seconds.
No one above my pay grade seems to see a problem with this. But hey! REST! JSON! HTTPS! Pass the Kool-Aid!
[1] NAPTR records: given a phone number, return name information (RFC 3401 to RFC 3404)
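For context, the DNS side looks roughly like this (a hypothetical sketch using the third-party dnspython package; the number, the 1-second budget, and the e164.arpa suffix are placeholders, since real deployments often query a private ENUM tree):

    import dns.resolver

    def naptr_lookup(phone_number, suffix="e164.arpa"):
        # Build the ENUM-style query name: reversed digits under the suffix.
        digits = [c for c in phone_number if c.isdigit()]
        qname = ".".join(reversed(digits)) + "." + suffix
        answers = dns.resolver.resolve(qname, "NAPTR", lifetime=1.0)  # 1-second budget
        return [r.to_text() for r in sorted(answers, key=lambda r: (r.order, r.preference))]

    print(naptr_lookup("+1 555 0100"))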
I saw this at my last company. Even worse, breaking it into microservices allowed teams of 2 or 3 to start building entire realms where only they could commit, and they used this for internal politics.
I witnessed someone that wanted to leverage their service into a promotion so they started pushing for an architecture where everything flowed through their service.
It was the slowest part of our stack and capped at 10tps.
This cycle paints every new technology as a misstep that we’ll eventually revert from. The real cycle is that we continually test new ideas and new approaches, learn their strengths and weaknesses, and ultimately keep what works and discard what doesn’t. Sure, we miss the ball more than we connect. But take this example: we have had native JSON support since Postgres 9.2. So did we abandon the document model, or did we learn more about how and where it might be appropriate? Some advancements are even more sweeping: I don’t think we’ll ever go back to FTP or even CVS from the Git-centric workflows we now have.
Technology, in my experience, never seems to reward too much optimism or too much cynicism.
Maybe FTP pervades in front-end dev, small companies, and small, largely static websites, but I'd argue that most developers who deal with any real complexity probably use some kind of source control system to 'share' files rather than some kind of FTP program. And I'd be very surprised if relative FTP usage hadn't decreased significantly over the last 3-4 years (but this is all quite anecdotal, and I could be wrong).
Something that does surprise me is that Panic's Coda and Transmit apps still seem quite successful, so maybe my perception is out of whack.
Which is why, I feel, after 20 years in the industry a good portion of people start consultancies, realizing there's good money to be made on this hype-cycle churn.
It's like a heat engine of hype that you can extract useful energy from.
$_max = 1 - (H_c / H_h)
That is, the maximum amount of money you can extract via consulting grows with the gap between "hot" and "cold", H_h and H_c being the maximum and minimum hype for a given technology during the cycle (by analogy with Carnot efficiency).
With hype always comes a throng of people who aren't capable of doing the necessary analytical thinking to solve their problem and who think they can substitute a magical pattern for designing their application instead of truly understanding their problem domain and building a real solution to that problem.
Does that make it the fault of the technology/pattern? I don't think so. I think it just means that there are no magic bullets in tech, and people who don't know what they're doing will always cause problems no matter what models they follow.
What this actually looks like is a case of very imperfect knowledge transfer within the technology industry. What's interesting about the entire craze of microservices and containers and the cloud is that they are not a new technology. This entire architecture -- the so-called lambda architecture -- has been the standard approach to developing high-performance trading systems on Wall St. since the early 90s. The architecture is literally 30 years old and certain technologies (event sourcing, event driven architecture) are even older.
The problem is that all the best knowledge has clearly not made it out. For example, this design introduces a "Centrifuge" process that redirects requests to destination-specific queues... congratulations, you've just reinvented a message bus, a technology that goes back to the 80s. There is absolutely nothing new about virtual queues as described here, but unfortunately the authors are likely not at all aware of the capabilities of real enterprise messaging systems (even free, open-source ones like Apache Artemis), and certainly not aware of the architecture, technologies, and algorithms that underlie them and the (admittedly much more expensive) best-of-breed commercial systems.
(I won't even go into the craziness of 50+ repos. That's just pure cargo cult madness.)
Watching the web/javascript reinvent these 30 year old technologies is a little disheartening but who knows they may come up with something new. (Then again, recently the javascript guys have discovered the enormous value of repeatable builds. Unfortunately the implementations here all pretty much suck.) Still, we ought to perhaps ask ourselves why this situation has come about...
> Old technology is deemed by people too troublesome or restrictive.
"deemed" is the keyword here. Old tech is "deemed" bad, new one is "deemed" good. Without any numbers attached, just by way of hand-waving and propaganda. And it's all "deemed" Computer Science :)
Once we got into a big discussion at c2.com about "proving nested blocks were objectively better than go-to's". Since most agreed nested blocks are usually "better", it seemed like it would be an easy task. Not! Too much depends on human psychology/physiology, which both varies between people and is poorly understood. We couldn't even agree on a definition of nested blocks, since hybrid structures were presented as case studies.
I personally believe nested blocks produce a visual structure (indenting) that helps one understand the code by its "shape". Go-to's have no known visual equivalent.
Computers don't "care" how you organize software, they just blindly follow commands. Thus, you are writing it for people more than machines, and people differ too much in how they perceive and process code. For the most part, software is NOT about machines.
If you'd ever had to support some old Fortran or Cobol littered with gotos you'd probably sing a different tune. I'd guess you've never seen that sort of mess though. It's gotten pretty rare.
I wish to make it clear I am NOT defending goto's as an actual practice. I'm only saying that objective proof that they are "bad" is lacking. Goto's served as a test case for objective evidence about code design.
Do note that some actual coders have claimed that if you establish goto conventions within a shop, people handle them just fine.
Computer Science... Now there's an oxymoron. At most institutions the subject matter is not really about computers nor is it science, at least not in the physics or chemistry sense.
Reminds me of the opening lines of the SICP lectures[0]:
"I'd like to welcome you to this course on Computer Science. Actually that's a terrible way to start. Computer science is a terrible name for this business. First of all, it's not a science. It might be engineering or it might be art. We'll actually see that computer so-called science actually has a lot in common with magic. We will see that in this course. So it's not a science. It's also not really very much about computers. And it's not about computers in the same sense that physics is not really about particle accelerators. And biology is not really about microscopes and petri dishes. And it's not about computers in the same sense that geometry is not really about using a surveying instruments."
I like to think of it as dynamic math. Calculus is also a form of "math over time", but is more about a single equation to describe that change over time. Computer science is more like "math with discrete logical steps over time". It allows you to answer questions like "which algorithm can sort a given array in the fewest steps"?
It's not an English-language problem; terms like informatics [0], computing, computation, or <x> engineering are valid and used by some universities.
My degree is simply "MEng Computing". It's recognized[1] by the engineering association, although that's so irrelevant in most IT that I had to look up the organization: the "BCS (BCS - Chartered Institute for IT) and the IET (Institute of Engineering and Technology)."
[0] My job title includes the word "informatician".
I think that their point is that "Computer Science" does not seem very very interested in the provability of their science (the "Science" part) nor the applicability for general purpose computing (the "Computer" part).
> "Computer Science" does not seem very very interested in the provability of their science
...uh, what computer science are you talking about? Formal verification is a huge part of CS, and provability is a tiny part of what makes science science - systematic study through observation and experimentation. Science is a discipline, not in itself a fact to be proved.
Also, what parts of CS do you think are inapplicable to general purpose computing?
Science can be both experimental and formal (see Math). That said, there are certainly aspects of CS that require collecting data, designing experiments, etc.
Yup. Everyone wants to use the latest and greatest, but there is a reason that certain things like SQL have stood the test of time.
I think the same could be said of the design world. There was a time not too long ago when designs actually felt polished and had real shapes, shadows, gradients. When you clicked on a button you actually knew you were clicking on a button. Then iOS 7 came along and everything became white and flat and buttons were replaced with text with no borders. I think we are slowly moving back to where we were ten years ago.
PS How long until people start ditching React for jQuery?
Ha, JQuery. I just had to look at a legacy JQuery code base... and I shudder at the thought of ever going back to that paradigm. Sometimes new patterns/frameworks/architectures really do change the game for good.
I still use jQuery regularly. It works well and doesn't get in the way. I'd rather work on a jQuery codebase written by a good engineer than a React codebase written by an average engineer, and there's no amount you could pay me to work on a Javascript codebase written by a bad engineer.
The problem you've identified is that most code, in general, is terrible. The code written by people who chase trends tends to be worse than average.
You literally can't write spaghetti code in React due to one-way data flow. That's what prevents the spaghetti. UI is a function of state and props. JQuery is an imperative DOM altering library that isn't even necessary with modern browsers.
> The code written by people who chase trends tends to be worse than average.
This. Instead of learning a handful of technologies well, they learn a lot of technologies very poorly. If I am hiring, I now look at it as a red flag when people have too many frameworks listed.
I have an Angular codebase to maintain, on Angular 1.3 - probably the peak of the hype cycle. It's a spaghetti mess. Using a different framework doesn't make you immune to these problems, especially when people don't seem to understand the new tech especially well.
I agree with what you are saying but you can't compare jQuery and React - they are very different tools.
If something works for you and makes life easier then you should use it. There is no right answer. You just need to be honest with yourself when planning things out - am I using this technology because it's new and shiny or because it is the right tool for the job right now.
You don't really need jQuery anymore, due to the browser APIs being far more mutually compatible and useful than a decade ago. So really it would be more like React vs vanilla JS. There's definitely a place for vanilla, when you just need minor dynamism on an otherwise basic UI.
> You don't really need jQuery anymore, due to the browser APIs being far more mutually compatible and useful than a decade ago
You say that, but from time to time I still discover slight variations in browser behavior or bugs that were opened 8 years ago that would've been avoided if I had just used jQuery. Most modern frameworks will abstract away these differences, but sometimes you'll need to access the DOM directly.
It doesn’t. I was referring to the idea of new technologies eventually circling back to old and saying that I think that is true in the design world too.
The test of time isn't always a good metric. html, css and javascript is a huge hack, yet it has withstood the test of time.
SQL to me is a huge design flaw despite its ubiquity. On the web, bottlenecks happen at IO and algorithmic searches. Databases are essentially the bottlenecks of the web, and how do we handle such bottlenecks? SQL: a high-level, almost functional language that is further away from the metal than a traditional imperative language. A select search over an index is an abstraction that is too high level to be placed over a bottleneck. What algorithm does a select search execute? Why does using select * slow down a query? Different permutations of identical queries cause slowdowns or speedups for no apparent reason in SQL. SQL is a leaky abstraction that has created a whole generation of SQL admins, or people who memorize a bunch of SQL hacks rather than understand algorithms.
I strongly disagree with your assessment of SQL. All of the questions that you asked have actual answers, but I read (from what I perceive as your tone) that you think these are all mysteries generated by an unknowable black box. While some of them may be implementation dependent (like "What algorithm does a select search execute?"), others have common and knowable answers.
> Why does using select * slow down a query?
Because the database first has to perform a translation step, reading from its system tables to enumerate the columns to be returned, and then has to fetch and send every one of those columns whether you need them or not.
> Different permutations of identical queries causes slow downs or speed ups for no apparent reason in SQL.
The key word there is "apparent", and again, just because it's not apparent to you, doesn't mean that it's not knowable and apparent to someone else. I also take exception to the concept of "permutations" of "identical" queries. Because if your query is permuted, it's no longer identical. The way you write your SQL has an impact on how it's evaluated. Just because you don't understand the rules, doesn't make it a mystery.
Those are rhetorical questions. I ask them because they are questions you need to be able to answer in order to work well with SQL. I cannot become a master of SQL by only learning SQL. I have to learn the specific implementations, I have to run EXPLAIN... etc.
A good abstraction only requires you to know the abstraction, not what lies underneath. What we have with SQL is a leaky abstraction. My argument is that a high-level leaky abstraction placed over a critical bottleneck of the web is a design mistake.
Lol, of course you need a general understanding of how your database works on the inside. There are a million different ways you can store your data and I would argue that choosing your data storage is the most important and tricky decision we have to make as software engineers.
Back at my first big tech company, I remember reading the best document I have ever read related to software engineering. It was entirely devoted to choosing your database/storage system. The very first paragraph of the document was entirely devoted to engraining in your head that "choosing a database is all about tradeoffs". They even had a picture where it just repeated that sentence over and over to really engrain it in you.
Why? Because every database has different performance characteristics such as consistency, latency, scalability, typing, indexing, data duplication and more. You really need to think about each and every one because choosing the wrong database/not using it correctly usually cause the biggest problems/most work to solve that you will ever have to face.
>Lol, of course you need a general understanding of how your database works on the inside. There are a million different ways you can store your data and I would argue that choosing your data storage is the most important and tricky decision we have to make as software engineers.
You aren't responding to my argument; everything you said is something I already know. So lol to you. You're making a remark and extending the conversation without addressing my main point. I'm saying that the fact that you need "a general understanding of how a database works on the inside" is a design flaw. It's a leaky abstraction.
A C++ for loop has virtually the same performance across all systems/implementations; if I learn C++ I generally don't need to understand implementation details to reason about performance. Complexity theory applies here.
For "SELECT * FROM TABLE", I have to understand the implementation. This is a very different design decision from C++. My argument is that this high-level language is a bad design choice to be placed over the most critical bottleneck of the web: the database.
The entire reason why we can use slow-ass languages like PHP or Python on the web is that the database is 10x slower. The database is the bottleneck. It would be smart to have a language/API for the database that is highly optimizable. The problem with SQL is that it is a high-level leaky abstraction, so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm; it involves memorizing SQL hacks and gotchas and understanding implementation details. This is why SQL is a bad design choice. Please address this reasoning directly rather than regurgitating common database knowledge.
> A C++ for loop has virtually the same performance across all systems/implementations;
> For "SELECT * FROM TABLE", I have to understand implementation
This is not even remotely an apples-to-apples comparison. One is a fairly simple code construct that executes locally. The other is a call to a remote service.
It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL; any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented. If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use The Index, Luke.
Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
>This is not even remotely an apples-to-apples comparison. One is a fairly simple code construct that executes locally. The other is a call to a remote service.
>It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL, any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented.
Dude, then put your database on a local machine and execute it locally, or do an HTTP RPC call to your server and have the web app run a for loop. Whether it is a remote call or not, the code gets executed on a computer regardless. This is not a factor. RPC is a bottleneck, but that's a different type of bottleneck that's handled on a different layer. I'm talking about the slowest part of code executing on a computer, not passing an electronic message across the country.
So whether you use XML, JSON, or SQL it matters because that is the topic of my conversation. Not RPC boundaries.
>If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use the Index Luke.
I choose to understand a SQL implementation because I have no choice. Like how a front-end developer has no choice but to deal with the headache that is CSS or javascript.
Do you try to understand how C++ compiles down into assembler? For virtually every other language out there I almost never have to understand the implementation to write an efficient algorithm. SQL DBs are the only technologies that force me to do this on a regular basis. Heck, they even devoted a keyword called EXPLAIN to letting you peer under the hood. Good APIs and good abstractions hide implementation details from you. SQL does not fit this definition of a good API.
If that doesn't stand out like a red flag to you, then I don't know what will.
>Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
In the 90s? Have you heard of NoSQL? This was done after the 90s and is still being done right now. There are alternative database APIs that DON'T INVOLVE SQL. The problem isn't about re-learning; the problem is about learning itself. Learn a new paradigm rather than responding to every alternative opinion with a sarcastic suggestion: "Hey, you don't like airplanes? Well, build your own airplane then..."
> I'm saying that the fact that you need "a general understanding of how a database works on the inside" is a design flaw. It's a leaky abstraction.
And I just said that "choosing a database is all about tradeoffs" which you need to understand (aka: the leaky abstractions).
> A C++ for loop has virtually the same performance across all systems/implementations
> For "SELECT * FROM TABLE", I have to understand implementation.
No you don't; it has the same performance characteristics as a for loop. However, because all of your data is grouped onto one server, for loops there are much more costly than on your regular application servers, of which you likely have orders of magnitude more than databases. Fortunately, your SQL database supports indexes, which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries utilize them has solved pretty much every scaling problem I have thrown at them.
> It would be smart to have a language api for the database to be highly optimize-able. The problem with SQL is that it is a high level leaky abstraction so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm. It involves memorizing SQL hacks and gotchas and understanding implementation details.
It is optimizable, and 90% of the optimizations I have made simply involve adding an index and then running a few EXPLAINs/tests to make sure the queries are actually using it (a toy illustration of that loop follows below).
If you'll only answer me this though: what database would you recommend, then? I'm dying to know, since you think you know better, and Google, a company that probably has more scaling problems than anyone else, doubled down on SQL with Spanner, which from what I have read requires even more actual fine-tuning.
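Not taking sides here, but as a concrete toy version of the add-an-index-then-EXPLAIN loop mentioned above (SQLite from the Python standard library; the users table and email column are invented):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
    conn.executemany(
        "INSERT INTO users (email, plan) VALUES (?, ?)",
        [(f"user{i}@example.com", "free" if i % 2 else "paid") for i in range(10_000)],
    )

    query = "SELECT id FROM users WHERE email = ?"

    def plan(sql):
        # EXPLAIN QUERY PLAN shows whether SQLite scans the table or uses an index.
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("user42@example.com",))
        return [row[-1] for row in rows]

    print(plan(query))   # expect a full-table SCAN before the index exists

    conn.execute("CREATE INDEX idx_users_email ON users (email)")
    print(plan(query))   # expect a SEARCH ... USING INDEX idx_users_email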
>And I just said that "choosing a database is all about tradeoffs" which you need to understand (aka: the leaky abstractions).
And I'm saying the tradeoff of using a leaky abstraction is entirely the wrong choice. A hammer and a screwdriver each have tradeoffs, but when dealing with a nail, use a hammer; when dealing with a screw, use a screwdriver. SQL is a hammer to a database screw.
>No you don't, it has the same performance: a for loop. However, because all of your data is grouped onto one server, for loops there are much more costly than they would be on your regular application servers, of which you likely have orders of magnitude more than database servers.
See, you don't even know what algorithm most SQL implementations use when doing a SELECT call. It really depends on the index, but usually it's something like a binary search over an index that is basically a B-tree. It's possible to index by a hash map as well, but you don't know any of this because SQL is such a high level language. All you know is that you add an index and everything magically scales.
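For reference, Postgres at least lets you pick the structure explicitly (made-up table and column):

    -- the default is a B-tree, good for equality and range lookups
    CREATE INDEX idx_users_email_btree ON users (email);

    -- a hash index only helps equality lookups
    CREATE INDEX idx_users_email_hash ON users USING hash (email);

But nothing in the SELECT itself tells you which of these the planner will pick, which is exactly my point.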
>Fortunately, your SQL database supports indexes which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries utilize them have solved pretty much every scaling problem I have thrown at them.
Ever deal with big data analytics? A typical SQL DB can't handle million-row, multi-dimensional GROUP BYs. Not even your indexes can save you here.
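By multi-dimensional GROUP BYs I mean rollups along several dimensions at once, something like this (illustrative schema, using the GROUPING SETS/CUBE syntax Postgres supports):

    -- roll up a fact table along every combination of three dimensions
    SELECT region, product, channel, SUM(revenue)
    FROM sales
    GROUP BY CUBE (region, product, channel);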
>It is optimizable and 90% of those optimizations I have made simply involve adding an index and then running a few explains/tests to make sure you are using them properly.
I don't have to run an EXPLAIN on any other language that I have ever used. Literally. There is no other language on the face of this planet where I had to regularly go down into the lower level abstraction to optimize it. When I do it's for a rare off case. For SQL it's a regular thing... and given that SQL exists at the bottleneck of all web development this is not just a minor flaw, but a huge flaw.
>If you'll only answer me this though, what database would you recommend then? I'm dying to know, since you think you know better and Google, a company that probably has more scaling problems than anyone else, doubled down on SQL with Spanner, which from what I have read requires even more actual fine tuning.
I don't know if you're aware of the glaring flaws in CSS and JavaScript that everybody complains about in front end web development, but they're a good analogy to what you're addressing here. JavaScript and CSS are universally known to have some really stupid flaws, yet both technologies are ubiquitous. No one can recommend any alternative because none exists. SQL is kind of similar. The database implementations and domain knowledge have been around so long that even alternative NoSQL technologies have a hard time overtaking SQL.
Which brings me full circle back to the front end. WASM is currently an emerging contender with JavaScript for the front end. Yet despite the fact that WASM has a better design than JS (it wasn't made in a week), current benchmarks against Google's V8 JavaScript engine indicate that WASM is slower than JS. This is exactly what's going on with SQL and NoSQL. Google hiring a crack team of genius engineers to optimize V8 for years, turning a potato into a potato with a rocket booster, has made the potato faster than a Formula One race car (WASM).
These problems are not unique to SQL, they are issues with any datastore. And if you're building your architecture correctly, the datastore is always going to be your bottleneck.
I'm not talking about solving the bottleneck. No api can change that. I'm talking about mitigating the effects of this bottleneck. Namely, SQL is a bad design decision for this area of web dev.
Part of the reason SQL has stood the test of time is the very fact that it allows such a high level of abstraction. The big problem that it solved, compared to much of what existed at the time, was that it allowed you to decouple the physical format of the data from the applications that used it. That made it relatively easy to do two things that were previously very hard: Ask a database to answer questions it wasn't originally designed to answer, and modify a database's physical structure without having to change the code of every application that uses it.
A lot of "easier" technologies - including, arguably, ORM on top of relational databases - make things easier by sacrificing or compromising those very features that allow for such flexibility. Which speaks to the grandparent's point about technologies that make it easy to get started in the short term, at the cost of having major disadvantages in the long term.
> Different permutations of identical queries causes slow downs or speed ups for no apparent reason in SQL.
EXPLAIN does a very good job of explaining why one query is faster than another.
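For example, two logically equivalent filters can get very different plans (hypothetical table):

    EXPLAIN SELECT * FROM orders WHERE created_at >= '2018-01-01';

    -- wrapping the column in a function defeats a plain index on created_at
    EXPLAIN SELECT * FROM orders WHERE date(created_at) >= '2018-01-01';

The second form will typically show a sequential scan where the first showed an index scan, which is the "no apparent reason" made visible.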
> SQL is a leaky abstraction that has created a whole generation of SQL admins or people who memorize a bunch of SQL hacks rather than understand algorithms.
To misquote Winston Churchill, SQL is the worst data approach except for all others. As far as query languages, I'd like to see more language competition. I imagine it would take a decade or two to be competitive with SQL, which has a big head start. (I'm partial to SMEQL myself.)
> A select search over an index is an abstraction that is too high level to be placed over a bottleneck.
The high-level was the point because, in the original idea, there was a separation of concerns assumed: The dev writes, in SQL, what the DB should do and the DBA decides how the DB does it.
Of course that assumes there is a competent DBA...
High level is a mistake because the slowest part of a web project is the database. It's the bottleneck of the web. The reason why we can use slow languages like python, php or ruby on the web rather than C++ is because the database is 10x slower.
Putting a high level leaky abstraction over the bottleneck of the web is a mistake. A language that is a zero cost abstraction is a better design choice.
You've demonstrated above that using RDBMS and SQL involves making tradeoffs, like everything else. But you probably can't imagine a world without relational databases. I know I can't.
Many of the problems you mention above occur because the database handles stuff for programmers. Sure, you could create a custom solution around your biggest bottlenecks, but do you want to create a custom solution for every query, or do you want the database to do it for you? The generation of SQL admins is a replacement for a much larger group of programmers that would be needed if they weren't here, and more importantly, an army of people to deal with security, reliability, etc. that people using a good RDBMS get to take for granted.
I'm saying there could be another high level language that took the place of SQL. An imperative language with explicit algorithmic declarations. I'm not talking about custom solutions, I'm talking about how the web has evolved away from ideas that could have been better, because SQL to me is clearly not what I imagine to be the best we could possibly do for database queries. Same story with javascript and CSS.
Parameters could have default values, or heuristically chosen defaults if a value isn't specified explicitly. Joins can be done imperatively as well. This leads to a language that is more clear and optimizable by hand. For example:
    x = binary_search(column_name=id, value=56, show_all_columns=True)
    y = dictionary_search(column_name=id, value=56, show_all_columns=True)
    z = join(x, y, joinFunc=func(a,b)(a==b))
The example above could be a join. With the search function name itself specifying the index. If no such index is placed over the table it can throw an exception: "No Dictionary Index found on Table"
This is better API design. However, years of optimization and development on SQL implementations make it so that most NoSQL APIs will have a hard time catching up to the performance of SQL. It's like Google's V8: V8 is a highly optimized implementation of a terrible language (JavaScript) which is still faster than WASM.
This is still a declarative language, just with the algorithms spelled out explicitly. (or some algorithms, as you didn't specify it for the join).
However, I think specifying the algorithms in the queries is really not a good idea. Your performance characteristics can change over time (or you might not know them at all yet when you start the project). With your solution, if you, e.g., realize later that it makes sense to add a new index, you'd have to rewrite every single query to use that index. With SQL, you simply add the index and are done.
>This is still a declarative language, just with the algorithms spelled out explicitly. (or some algorithms, as you didn't specify it for the join).
Declarative yes, but unlike SQL my example is imperative. An imperative language is easier to optimize than a functional or even an expression-based language (SQL) because computers are basically machines that execute imperative assembly instructions. This means the abstractions are less costly and have a better mapping to the underlying machine code.
>However, I think specifying the algorithms in the queries is really not a good idea. Your performance characteristics can change over time (or you might not know them at all yet when you start the project). With your solution, if you, e.g., realize later that it makes sense to add a new index, you'd have to rewrite every single query to use that index. With SQL, you simply add the index and are done.
Because the DB sits over a bottleneck in web development, you need to have explicit control over this area of technology. If I need to do minute optimizations, then an API should provide full explicit control over every detail. You should have the power to specify algorithms, and the language itself should never hide from you what it's doing unless you choose to abstract that decision away...
What I mean by "choose to abstract that decision away" is that the language should also offer along with "binary_search" a generic "search" function as well, which can automatically choose the algorithm and index to use... That's right, by changing the nature of the API you can still preserve the high level abstractions while giving the user explicit access to lower level optimizations.
Or you can memorize a bunch of SQL hacks and gotchas and use EXPLAIN to decompile your query. I know of no other language that forces you to decompile an expression on a regular basis just to optimize it. Also, unlike any other language I have seen, Postgres literally provides a language keyword, EXPLAIN, that lets users run this decompilation process, as if they already knew SQL has this flaw. If that doesn't stand out like a red flag to you, I don't know what will.
One is going to need a way to debug and study ANY complex query for performance and bugs. EXPLAIN is a tool, not a crutch. "Dot-chained" APIs don't solve this by themselves. In bigger shops one typically has skilled DBAs who are good and quick at optimizing SQL anyhow because of their experience with it. An app-side programmer won't (or shouldn't) do this often enough to become a guru with query API optimization. Modern economies rely on specialization.
Ever profile an application? That's what EXPLAIN helps you with. SQL is different from a lot of languages in that it runs through query planner/executor, often based on properties of the actual data which change over time. Not a lot of other programs do this, and certainly not your typical imperative or procedural code. The JVM is one that comes to mind. Do you know of others?
I KNOW what it helps you with. No programming language has a keyword that decompiles expressions to their base components. Also profiling is only done when optimization is needed.
SQL, on the other hand... EXPLAIN is used on a regular basis, it's built into the language, and rather than just marking lines of code with execution time deltas, it literally functions as a decompiler, deconstructing the query into another imperative language. This is the problem with SQL.
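To be concrete, it doesn't just report timings, it emits a whole execution plan, something like (abridged, made-up numbers):

    EXPLAIN SELECT * FROM users WHERE id = 56;
    --  Index Scan using users_pkey on users  (cost=0.29..8.31 rows=1 width=72)
    --    Index Cond: (id = 56)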
Many application languages have "reflection" APIs that can examine internal or low-level code structure. I used database languages such as dBASE (and clones) that are based more or less on sequential query languages. While I did like having more control over the intermediate steps, including the ability to analyze them, the problem is that different people do things too differently. SQL reins in the "creativity" to a large extent. What works well for an individual may not scale to "team code". Working with a Picasso-coder's code can be a bear.
This is a valid argument. A high level abstraction definitely serves as a restriction on code that gets too "creative", at the expense of obfuscation. Following this line of logic, the argument then truly does become apples to apples.
> * Everyone slowly goes back to the old technology, forgetting the new technology.
This step is just as misguided and cult-y as "Everyone adopts this new technology and raves about how great it is now that they have just adopted it."
In some cases the technology WAS the right idea, just implemented incorrectly or not sufficiently broadly, and the baby ends up getting thrown out with the bathwater.
I think that regardless of whether microservices works for anyone or not, they came about to address a real issue that we still have, but that I’m not sure anyone has fully solved.
I think that microservices are an expression of us trying to get to a solution that enables loose coupling, hard isolation of compute based on categorical functions. We wanted a way to keep Bob from the other team from messing with our components.
I think most organizations really need a mixture of monolithic and microservices. If anyone jumps off the cliff with the attitude that one methodology is right or wrong, they deserve the outcome that they get. A lot of the blogs at the time espoused the benefits without bothering to explain that Microservices were perhaps a crescent wrench and really most of the time we needed a pair of pliers.
The problem is it's not clear when to use what. Some get confused and use it in the wrong place. Here are some questions to ask before you use microservices.
Does the service need an independent and dedicated team to manage its complexities, or is it a "part time" job? If it's the second, try a Stored Procedure first (see the sketch after this list).
Is the existing organization structure (command hierarchy) prepared and ready for a dedicated service? (Conway's law) Remember, sharing a service introduces a dependency between all service users. Sharing ain't free.
Do you really have a scalability problem, or have you just not bothered to tune existing processes and queries? Don't scrap a car just because it has a flat tire.
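To illustrate the Stored Procedure point above, I mean something as small as this (Postgres, made-up example):

    -- the kind of "part time" logic that doesn't need its own service
    CREATE FUNCTION deactivate_stale_users(cutoff interval) RETURNS integer AS $$
        WITH updated AS (
            UPDATE users SET active = false
            WHERE last_seen < now() - cutoff
            RETURNING 1
        )
        SELECT count(*)::integer FROM updated;
    $$ LANGUAGE sql;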
I kind of get what systemizer is saying. People may think of evolution of technologies as cycles but it is never that. A new technology 'Y' is always developed because the incumbent 'X' has some shortcomings. And even after a period of disillusionment when we revert back to 'X', it is not always the same. We synthesize the good points of 'Y' back to 'X'.
Coming to this topic, I see Microservices as a solution to the problem of Continuous Delivery which is necessary in some business models. I can't see those use cases reverting back to Monolith architecture. For such scenarios, the problems associated with Microservices are engineering challenges and not avoidable architecture choices.
I mean, this is the nature of hype, including scientific paradigms, fast fashion, music tastes. But this narrative obscures underlying causes and basically assumes the underlying cause is human love of hype. However, this isn't a 2D circle as you imply, but a 3D spiral: you do learn over time. You can see which patterns apply to which problems. There's a tendency in these narratives to bemoan the individual that represents culture, but the real pain is that you're watching people learn things you already learned, and there's this irrational reaction where you think "I learned this already; so should have they."
But none of this implies that we are losing knowledge, just that the curve of engineers is fatter at the inexperienced level.
As someone maintaining a SPA that uses GraphQL, they're not really comparable. Everyone and their dog has a SPA, while GraphQL has been somewhat restricted by the small number of React users who dared to install the first version of Relay.
GraphQL could wither like Backbone and Angular and nobody would really notice or care. An industry-wide shift away from SPAs would be something else entirely.
Have you ever tried to go back to no longer writing SPAs?
Sure, it's easy to work with at first but then you start to realize that every framework and every language has different ways of doing all the niceties that you now expect. Asset pipelining, layouts, conditional rendering, and template helpers all end up becoming stuff that every language has to individually develop with varying levels of success.
Even with those features baked in, you probably still want to modify the page using JavaScript, anyway, so then you have to re-render parts of the page without the aid of the expansive view system that rendered your page. And, of course, the more JS you put in your app, the more you have some bastardized hybrid of a SPA and a server rendered page.
I’d still pick PHP between the two for usual web apps. There is beauty in the statelessness of PHP and also leverage on Nginx (+ Lua) for more critical and high-speed work.
The issue I have with Node.js is that the entire ecosystem moves way too fast. However, when I need realtime interaction via WebSockets, for example, it really feels like a good choice.
The notion that the node.js ecosystem moves way too fast is a time-traveling statement from, like, 2012. When it really was, in a lot of ways, true. But today it really has congregated on a relatively small set of best practices and the tools you see today are mostly the tools you would have seen two years ago. They're also mostly the same tools you would have seen five years ago, and plenty of people use "old" tools completely and perfectly acceptably. You don't have to throw out Grunt if you like it, you don't have to rush onto Parcel if Webpack works for you. Because other people still use them and other people still maintain them.
Node is no less stable, today, than Ruby or PHP, and is only arguably less stable than Python because Python has largely ossified.
SQL database usage is still an ongoing bummer, though.
I just had a project crash that was set up in 2015 (using Gulp); I had to update one dependency and that dependency forced me to update the entire package.json to the latest versions, so I had to fix the entire build setup and config files.
The same happened with a Rails/React project that was set up last year; I tried to update one package that was marked as vulnerable, but it ended up requiring the same thing. I opted to leave that package alone.
I've been working with Node for a long time, but I feel like it has had the same level of stability since the start. That's very different from Ruby or PHP, where you can have old setups working just fine, maybe requiring some extra steps to build certain dependencies, but overall working.
1) React is frontend JS which can be consumed on the server side--but I think a reasonable person might hang Express on Node and raise eyebrows at React.
2) Can you explain to me how updating a Gemfile or composer.json file is not going to result in a similar dependency cascade? 'Cause, from experience, it certainly will if the project isn't dead. About the only environment I've ever worked in where keeping up on your dependencies on a regular basis isn't required is a Java one--and that's assuming you don't care that much about security patches.
Delayed_job has two dependencies if you're running on Ruby (rather than JRuby): rake and sqlite3. Neither rake nor sqlite3 have any dependencies of their own - in production mode, of course.
On the other hand, Kue has nine direct dependencies, each of which have their own. The full dependency tree of Kue has one hundred and eighty two separate dependencies.
I know this can be the case for any project with dependencies, but the JS community is known for introducing breaking changes even in small releases. There's no such thing as security patches on most JS packages.
I've been working about the same way with Node for the past 4-5 years, with only minor adjustments, most of which have been QOL net increases. The paradigm hasn't changed much, and it solves my employer's problems.
I had managed to instill into my brain that Webpack is the JS moving target du jour. I am blown away that it has already been "surpassed" before our team managed to even have a serious review of it. Parcel it is then.
It isn't, though. I expect Parcel will eventually be superseded again by Webpack, and continue to use it for new projects because it's more entrenched.
And, moreover, it basically doesn't matter. Pick what you like, change it later if you care. (You probably won't.)
"But today it really has congregated on a relatively small set of best practices and the tools you see today are mostly the tools you would have seen two years ago."
Has it? I'm pretty sure if I were to ask what those best practices and tools are, I'd get a number of different answers.
I actually see Vue as more filling the gap for Angular. React uses JavaScript as the templating language instead of trying to use HTML attributes to turn HTML into a programming language like both Vue and Angular do. For that reason alone I have zero interest in Vue and learning yet another templating language. I'm pretty sure React will be here for a few more years. User base is still growing in fact.
And Yarn is a drop in replacement for npm. Took me 10 minutes to learn. If you already know npm, there is almost zero learning curve.
I have, actually. This week has made me decide to move away from it. I have been using it, and filed an issue[0] asking them to document the schemas emitted by their entity-tree code; I was then told that I was doing it wrong and that I should use their code-first/synchronization feature and trust in their magic rather than writing my own explicit migrations which are informed by my (tbh, pretty extensive) understanding of PostgreSQL.
The code seems fine. But I don't really trust anybody who is that insistent on owning the changes made to my database schema--I am certain it is well-intentioned but it makes me itch. Although I will say that there's an interesting project[1] that creates entities from a database that I need to examine further and see if it's worth using to get around TypeORM's unfortunate primary design goals.
That's a common pitfall with ORMs and dumb stacks like Spring Data: they're made for inexperienced developers who can't be bothered to understand any technology other than their favourite programming language. When it comes to databases, however, SQL is the standard, is already at a higher level of abstraction than the primitive record-like abstractions built on top of it, and direct SQL access will be required for any meaningful data manipulation at scale, for locking, for schema evolution, BI, etc. anyway.
Objection.js is an ORM for Node.js that aims to stay out of your way and make it as easy as possible to use the full power of SQL and the underlying database engine while keeping magic to a minimum.
- I was worried at first because I progressively trust non-TypeScript projects less and less, but the test suite looks fine and they have official typings so there's that mitigation at least.
- I really don't love the use of static properties everywhere in order to define model schema. Which is probably a little hypocritical, because one of my own projects[0] does the same thing, but IMO decorators are a cleaner way to do it that reads better to a human.
- Require loop detection in model relations is cool. I like that.
- In general it's a little too string-y for my tastes. I think `eager()` should take a model class, for example, rather than a string name. Maybe behind the scenes it uses the model class and pulls the string name out of it? But I think using objects-as-objects is a better way to do things than using strings-as-references.
Overall, though, it seems very low-magic and I did understand most of it from a five minute peek, so I kind of like this. I think it's a little too JavaScript-y (rather than TypeScript-y) for my tastes, but maybe that can be addressed by layering on a bit of an adapter...I'll need to look deeper.
It's not possible to fit everybody's needs. For some people those design goals are unfortunate, for some they are fortunate. And trust me, for 90% of people they are fortunate and make their development productive and effective.
JavaScript prior to ES6 was "old" and obsolete and hated. Building JS objects pre-classes is pretty terrible (to say nothing of the .bind(this) that gets spammed everywhere without the => operator).
JS has improved massively over the last few years, it's very nearly an entirely different language than what it used to be.
There's a reason everyone was desperate to avoid writing JS not all that long ago be it the form of coffeescript, silverlight, flash, or GWT. ES5 JS sucks. Like... a lot.
Interesting. I haven't touched, or even read, any JavaScript since well before ES6 arrived. I suppose I don't really know the language if it's changed so much (I've never even heard of JavaScript classes). I jumped on the JavaScript hater train and left the station.
This argument is less about specific technologies, and more about patterns and practices.
For example: web UI started as declarative (HTML -> PHP), and then transitioned towards more imperative (jQuery, Backbone, Angular), and is now moving back towards declarative (React, Polymer)
I don't really understand calling Perl declarative... I would consider the Perl CGI services I used to work on to be quite a bit more imperative than the Rails applications I've worked on. Similar with PHP, though I can see where peppering the imperative code into the declarative HTML could be considered more declarative.
In theory, maybe. In practice, no. I've been round and round on ADA issues, and there is no clear distinction between "meaning" and "presentation". Clean separation is either a pipe dream, or too complex for most mortals to get right. I should become an ADA lawyer because I can now pop the purists' BS and win case$.
Yeah, div soup is so declarative. All I need to do is look at the HTML to understand how my app behaves...oh wait, no I need to reference a CSS class and iterate over 15 cascading rules and then I need to fetch whatever JS is referencing that div to understand how it's being rendered to the page. So declarative.
I believe good architectures/stacks depend far more on the skill of the architects than the languages involved. Domain fit also matters.
As far as PHP, it's an ugly language from a language-only perspective, but it's easy to deploy and has lots of existing web-oriented libraries/functions. Think of PHP as a glue language for its libraries, plus some front-end JavaScript to improve the UI. Python and server-side JavaScript may someday catch up, but they still have a lot of ground to make up.
PHP might not be the best designed or nicest looking language out there, but you cannot even compare it to JS. It has real classes, doesn't require you to put an underscore before private methods, and has type checking. And it issues a warning if you divide by zero or try to access an array index that doesn't exist.
"blank" and "empty" are not the same thing in PHP. If you expect them to be, then you read the manual wrong. I generally include a set of common utilities where I define my own "isBlank()" function, which trims before comparing. There are different definitions of "blank" such that one size may not fit all. (Consider tabs and line-feed characters, for example.)
The model in the GP is useful, but it doesn't account for the unequal distribution of information. E.g. some people are perceiving the cycle at a different phase and can be convinced to work "harder, not smarter" for an interval to shift their position within the cycle.
E.g. "Let's get really good at microservices so we don't need a monolith." IOW there's no consensus about when to dig in and go deeper. And even if there was, some people would leverage that information and try to pull ahead of the others digging in.
Stick to whatever works for you. I'm assuming you mean backend JS, but if you don't, you can use both. PHP can render your page and JS can make it interactive. I can see benefits of being able to re-use models for frontend and backend, though.
It's possible to do serverless without lock-in, in the same way it's possible to do containers without lock-in.
At the moment serverless vendors make that hard, and the frameworks (like serverless.com) are still emerging to make that simple.
For me the problem is most people don't think through the cost/benefits. Amazon don't ever get tired of saying "never pay for idle" in their sales pitches, but quite a lot of applications out there are never idle and can be quite accurately managed in terms of scaling, and therefore you're actually paying a premium for something you don't need.
Hundreds of new languages, technologies, frameworks, etc pop up every year, but the odds of any one of them being durable in the long run is extremely low.
Not OP, but recently a group at my company abandoned MongoDB and went back to a relational database after going through the exact scenario outlined by the OP (in fact, the OP's comment aligned so closely with what happened that I wonder if he/she is actually a developer on that team at my company ;-)
Interesting. I'm currently working MongoDB into a couple of projects I have going at work (they're not really outward facing except for potentially one generated view).
For these projects it seemed to make a ton of sense given the rest of the stack, the nature of the projects, the deadlines I'm facing, and I've really loved working with it so far.
The simple fact is MongoDB fails to have sufficient durability guarantees for business data when compared to a traditional RDBMS. That is one of the reasons why it's so fast in comparison. A power loss event during data writes can and will lead to data loss.
I thankfully don't have to worry about power loss (large company, teams dedicated to critical systems infrastructure in a big way), and a cron job can handle any archival concerns in the projects I'm running here.
It does sound like they've improved much since 2011.
MongoDB 4.x also brings transactions to the table, which further reduces the vulnerability to inconsistent data. Data loss is far less of a problem than inconsistency due to partially completed operations and data damage.
Well, with mainframes, there was a very good reason that the industry went to PCs: the networking infrastructure just wasn't there. Now that fiber connections are becoming ubiquitous and intranets are super fast, mainframe computing makes sense again. (Not that it ever went away, but the real world use cases for mainframe computing have really caught up to the original vision.)
Or rather, relational databases and non-relational databases. This precisely played out at my company recently where a group moved from MongoDB back to a relational database.
In my experience, this is driven more about engineers wanting to play with the "new shiny thing" than people wanting to pad their resume, but I'm sure that happens a lot too.
In my experience, it's a bit of both, and they are mutually reinforcing. If you want to always play with shiny new things, you might be more likely to jump around to different places, which a padded resume helps with.
You are describing "design" which encompasses things like modern art, designer clothes and design patterns.
The thing with design is that it isn't a science or formal field of logic. I can use math to determine the shortest distances from point A to point B but I can't prove why a design for product A is definitively better than a design for product B.
With no science we are doomed to iterate over our designs without ever truly knowing which design was the best.
You could apply that pattern to a lot more than just the tech sector. It seems humans are doomed to this cyclical process of 3 steps forward 2 steps back.
Maybe someone needs to create a cloud based serverless SaaS SPA web app developed in F# to help track this stuff and prevent it happening in the future.
Only thing is, for any given real world tension between dual paradigms, it's arbitrary which one you label as old or new. At this point, the oscillation has been going on longer than anyone has been in the field.
Edit: Also, some of those features providing better development ergonomics turned out to cause massive security issues in production systems (eg: https://gist.github.com/peternixey/1978249).
Rails was never really a major leap forward in anything, more so an evolution with convention-over-configuration and a strong community presence that established best practices early. It also hasn't been "cool" for years, it's an entrenched player that "just works". One could argue with all the Rubyists going to Elixir that's part of the cycle, but for many people that's been a net positive.
And then there's those altogether dropping Elixir for Go...
If we believe the cycle model, then the people dropping elixir for go will come back when they're tired of dealing with kubernetes and when they realize that being a dynamic language isn't all that terrible (if done correctly).
I think that Rails using Ruby was a big deal. It was definitely different. Not sure if it was forward, backwards or lateral but it certainly was a big change beyond just "convention of configuration."
Yeah, I think this is actually calling out a major weakness in the original comment's hype cycle model. Sometimes step #5 ("Five years pass...") doesn't happen in that way. Sometimes what happens is that the new technology has been found to be generally useful, though imperfect, as everything is. People may be evolving it in different directions, perhaps incorporating elements of the old technology the new one replaced, but it isn't being fundamentally abandoned. The tough part is that you don't know which version of step #5 will happen when you're surveying the world at step #4.
If it was easy to figure this stuff out, they wouldn't pay us to try :)
After Rails burned through the hype cycle, its entire niche had disappeared. The skinny-jeans hipsters had moved on to single page apps and Node.JS and the "let's try and use Rails to write backend services" crowd had dispersed to a variety of better technologies.
That might be how it seems if you hang around on HN but activity in the Ruby and Rails communities is still increasing. New conferences, new implementations, and thousands of new developers.
Twitter had issues with stability but I would say that they were under MUCH heavier than a 'non-trivial' load. I also think that it remains an open question as to whether the old twitter was just simply not well written at first or if it was really a fundamental issue with Rails. There are many very large websites that are claimed to be written in Rails. For example: github, airbnb, kickstarter, basecamp,...These seem to be scaling fine.
FWIW twitter switched out of rails before some of the huge performance benefits made their way into ruby itself. A rails stack could handle twitter capacity these days, given ruby/rails improvements made since.
That's the point - we'll bounce between multiple paradigms growing frustrated at the problems and forgetting /failing to recognize the frustrations of the past ( and thus future) options. And not just with service paradigms, but with lots of technical areas. Heck, FP is being treated like it's a hot new thing.
So they split everything apart because their tests were failing and they didn't want to spend time fixing them, and then they merged it back together by spending time fixing and improving their tests?
It seems like the problem here was bad testing and micro repos, not microservices.
It's amazing to me how people still "meh" away testing as a secondary concern, and then regret it later. Over and over again.
WRITING software is easy, anyone can do it. CHANGING software is extremely difficult. THAT is why we have tests. Also, if you are smart about it, you can get documentation out of the deal for relatively little additional cost.
My go to example is on-boarding new developers:
New dev: "OK, i'm here! How do I start?"
with tests: Clone the repo, install dependencies, and run the test suite. As you develop new features, be sure to write additional tests.
They are up and going in a matter of minutes.
without tests: Clone the repo, install deps, download testing database, achieve homeostasis with your dev environment, learn the entire system, build up the state you require to write your feature, iterate on it by hand over and over again.
>without tests: Clone the repo, install deps, download testing database, achieve homeostasis with your dev environment, learn the entire system, build up the state you require to write your feature, iterate on it by hand over and over again.
My first job out of college was like that. I had been doing professional-ish (I was paid and employed but I basically worked alone with no other engineers around) work for two years but this still didn't raise any flags.
I agree, and their conclusion even features this salient bit:
>However, we weren’t set up to scale. We lacked the proper tooling for testing and deploying the microservices when bulk updates were needed. As a result, our developer productivity quickly declined.
My impression after reading this post was that microservices were symptoms of problems in how their organization wasn't set up to implement them effectively, rather than the actual cause of those problems.
So the initial problem was a single queue? Well, then split the queue, no need to go all crazy splitting all the code.
Switching to 100+ microservices? There is no need to switch to 100+ repos too, runtime services don't need to have one repo per service, just use a modular approach, or even feature flags.
100+ microservices, some of them with much lower load than others? Then consolidate the lower load ones, no need to consolidate "all" of the microservices at once.
Library inconsistencies between services? No, just no, always use the same library version for all services. Automate importing/updating the libraries if you need to.
A single change breaks tests in a way you need to fix unrelated code? WTF, don't you have unit tests to ensure service boundary consistency and API contracts?
Little motivation to clean up failing tests? Yeah... you're doing it wrong.
Only then did you figure out you should record traffic for the tests? HUGE FACEPALM, that's the FIRST thing you should do when dealing with remote services!
> Library inconsistencies between services? No, just no, always use the same library version for all services. Automate importing/updating the libraries if you need to.
This answer drove me crazy in the article. "We had trouble keeping the libraries up to date and fixing breakages, so our solution was... to update them all and fix the breakages." And per your second point, tests can go both ways - if libX is used by serviceY, write a libX integration test for serviceY instead of / in addition to a serviceY integration test for libX.
> Library inconsistencies between services? No, just no, always use the same library version for all services. Automate importing/updating the libraries if you need to.
If you did this then you'd have to go and update all services whenever you wanted to introduce a breaking change. I don't think what you're suggesting is as easy as it sounds.
Depends on how the library is used. If the change to the library will change how the service behaves externally (to either upstream or downstream services), then you indeed need to update them all simultaneously. However, that has the smell of poorly-defined API contracts (or poor implementations) if that is possible.
Otherwise, just update to the latest library versions whenever you touch a codebase.
100's of problem children sounds like a step too far.
A few services > monolith
monolith > 100's of services.
The big trick with any technology is to apply it properly rather than dogmatically, and if you are breaking up your monolith into hundreds(!) of microservices you are clearly not in control of your domain. That's a spaghetti of processes and connections between them rather than a spaghetti of code. Just as bad, just in a different way:
- Harder to reason about, refactor, change/add functionality
- You've (basically) turned a lot of the operations your services need to perform into RPCs, probably killing performance on top of everything else, leading to
- More complex/demanding (or just MOAR) infrastructure requirements
- Higher dev, maintenance and infrastructure costs
- Slower delivery of value to the business and customers
- Potentially crippling opportunity costs
It surprises me how often people don't see this coming. Seriously: keep your systems as simple as you possibly can. Unless you're Netflix, dozens or hundreds of microservices probably isn't as simple as you possibly can.
Probably because when they suggested it they had about 10, growing by 1 per 6 months. Then a few years later they end up with 100, growing by a few a month, and need to rethink.
I thought this was an honest and interesting look at a decision which in retrospect was a bad idea. Hopefully it'll stop some other people making similar mistakes (too many repos, too many services, fast-changing libraries shared between many services, etc...).
It'd be better if it wasn't framed as having found that the one true way is the monolith, but there are some lessons here for most devs.
That's basically how microservices are operated on orchestrators like Kubernetes—just substitute "container" for "VM", which is a mostly-academic difference from the perspective of your application. Operations tooling—distributed tracing, monitoring, logging...—is essential.
> While our systems would automatically scale in response to increased load, the sudden increase in queue depth would outpace our ability to scale up, resulting in delays for the newest events.
This strikes me as the core of their problem, and every step taken was a band-aid over this limitation. Would the cost of moving to faster-scaling infrastructure have been as high as rearchitecting the entire system?
> When we wanted to deploy a change, we had to spend time fixing the broken test even if the changes had nothing to do with the initial change.
This seems like a separate and even larger problem. Changes are breaking tests for unrelated code areas? Is the code too tightly coupled? Sounds like it. The unit tests are doing exactly what they're designed to do. Hard to feel sympathy for the person who's breaking them and then trying to figure out a way to sidestep them rather than fix the underlying issues.
The job of these services is to transform their internal event format to 140 different output formats. You can imagine that there is a lot of duplication in the functionality that these services need to do. Are you suggesting that they avoid any shared libraries and just rewrite the same code over and over hundreds of times and update them independently?
If making a change to a shared library breaks half the services, should it have been shared in the first place? It still smells odd to me that there's so much interdependence amongst the individual services that they break at will.
Maybe, but it’s conceivable to me that with 140 endpoints growing organically it would be very hard for an individual engineer to know for sure what should be abstracted or not. I think it’s a fundamentally hard problem even though on the surface it seems like it should be simple. Adding arbitrary new unaffiliated services is exactly the kind of thing that leads to irreducible complexity and a moving target that is very difficult to design for.
Reading this post-mortem was very useful, and I appreciate the segment engineering team sharing it.
It seems like the primary problem causes were flaky/unreliable tests, and difficulty making coordinated changes across many small repositories.
Having worked on similar projects before (and currently), with a small team driving microservices oriented projects, I would probably recommend:
1) single repository to allow easy coordinated changes.
2) a build system that only runs tests that are downstream of the change you made (Bazel is my favorite here, but others exist). This means all services use the HEAD version of libraries, and you find out if a library change broke one of the services you didn't think about. This also allows for faster performance.
3) Emphasis on making tests reliable. Mock out external services, or if you must reach out to dependencies use conditional test execution, like golang's Skip or junit's Assume if you can't verify a working connection.
If you still can't build a reliable service with those choices, then it's time to think about changing the architecture.
I am a strong believer that microservices is over hyped. I usually resist when senior management asks for us to use it (that proves the point of hype).
But by reading the first paragraphs of the article you see that the guys from Segment made a series of grave mistakes on their "microservices architecture", the most important one being the use of a shared library on many services. The goal of microservices is to achieve isolation, and sharing components with specific business rules between them not only defeats the purpose, but results in increased headaches.
Without deep knowledge of the solution, it's hard to judge, but it seems this was never really a case for microservices. They needed infrastructure isolation when the first delay issues surfaced, but there wasn't anything driving splitting the code up.
Sam Newman discusses in his book how to find the proper seams to split services (DDD aggregates being the most common answer), and it seems people are making rather arbitrary decisions.
Microservices solve for smaller logical teams, speed, deployment isolation, and a ton of other problems. Every solution comes with a trade off. There is no perfect solution. It is up to us to decide whether we need a monolith or microservices for our need and use case instead of comparing them.
Not only that, but there is a lot of room between a 'monolith' and 'micro services'. How about medium services? You break somethings up and leave other things combined.
Some people call this... microservices! Common advice is that a microservice should align with a bounded context in domain driven design, which can involve a LOT of code.
Many large companies have millions of LOC behind their microservices. Your average startup probably doesn't.
Personally, I don’t like the terms ‘microservices’ or ‘nanoservices.’ What’s the value add in describing the relative size of the service? The _domain_ should drive what becomes its own service. Every service should handle the business logic within a particular domain. It’s definitely a goldilocks problem, though, in that there’s a too-small and too-large, and we’re looking for the just-right fit!
That's exactly it. But when you start doing 'microservices' you get the architecture astronauts who go and see into how many silly little services a monolith can be broken up. The end results are as predictable as the original monolith, both end up as an unmaintainable mess in a couple of years.
I predict the same will happen to the 'superstar', it just isn't old enough yet (and at least it was built with some badly needed domain knowledge).
"2020 prediction: Monolithic applications will be back in style after people discover the drawbacks of distributed monolithic applications."
-Kelsey Hightower on Twitter
Been using JS for years and still loving it. The only thing I would move towards from here is some kind of ML like Elm, Reason, Haskell, etc. Certainly wouldn't go back to Java.
Am I understanding correctly that they had 3 engineers and >140 microservices? Microservices definitely have their own costs and tradeoffs, but 140 services and 3 engineers sounds like just a terrible engineering choice.
I feel that when it comes to making any decision about separating things out, you really do have to consider the size of the team you've got before you commit to it. If you've got three devs and they all work across all of the microservices, then what on earth do they add except a hell of a lot of overhead?
If you had ten teams of seven and they managed two services each... it's easier to see how that architecture could actually help.
Same as if you have a two person team building a web-app and they go for client-server architecture rather than a basic full stack web framework. If you're both working the frontend and backend at the same time, save yourself the extra ops effort.
Agreed. "Micro" is a terribly defined term, and it sounds like this team went absolutely nuts in one direction. (And then in response to their problems, went as far back as possible to a single monolith.) This suggests a bit of a lack of nuance in their decision making process to me.
I'm not interested in figuring out exactly what the right marketing term for it is, but I've had good experiences with teams of 6-10 engineers owning something like 2-5 services with a larger ecosystem of dozens to hundreds of services. Of course, I've been working at very large companies with extremely high traffic for several years now, so my experience is skewed in that direction.
If I had three engineers on my team I'd be unlikely to end up with more than a small handful of services. Half the benefit of splitting up your services has to do with keeping your independent teams actually independent -- if it's just one team then that isn't a problem in the first place.
> If I had three engineers on my team I'd be unlikely to end up with more than a small handful of services.
We have a small team and have a handful of services. We also have a fully automated CI/CD pipeline. It's worked really well. I doubt any would be considered 'micro', but instead they are designed around functional areas like authentication or backend processing.
> I doubt any would be considered 'micro', but instead they are designed around functional areas like authentication or backend processing.
Yeah, who knows what micro means, but that's exactly how I like to split up services. If at some point a service gets too large, split it up. Hundreds of services out the gate is a gross premature optimization. (And like most premature optimizations, ends up costing much more time both in development and in maintenance.)
At the time we did the split, we had <30 destinations and 3-4 engineers working on this system. At the time we decided to merge, we had > 140 destinations with ~8 engineers, of which roughly 3 full-time engineers spending most of their time just keeping the system alive.
How much of the code was shared among these services? It sounds like you essentially had mostly the same code running in 140 different configurations with only some translation logic and glue varying between each. I'm not surprised you found this untenable. This is akin to running a microservice per web page.
I don't think so, the article says there were 3 engineers working just to keep the system up and running. I'm sure there were many more engineers than that working on and with those services.
This might come off as being snide, but I'm genuinely curious: was their solution really just having all the services in one repository? That doesn't seem like a problem with microservices at all but more of a devops problem. To be clear, I'm not arguing for microservices, I'm just trying to understand if this was really a problem with splitting off multiple repos. Maybe I'm just really dense and someone can set me straight.
I've actually had experiences with seemingly this same problem at a previous startup. Once we started spinning off individual repositories for small pieces of business logic stuff started to go downhill as the logistics of communicating and sharing one another's code became more and more complex.
It seems like splitting off small repos for everything is a solution looking for a problem. Some of the most successful software companies out there have monolithic repos, but not monolithic services.
If you have 140 somewhat similar entities that all share common code, then they don't fulfill the very important microservice criterion of being independent. In your case, I would recommend a plugin-based system. Do it the other way around: have one application that contains the common code (the previously shared library code) and create 140 plugins. This way you can update the single application, load all plugins, execute the tests, check if everything is fine, and deploy the application. Every plugin can live in its own repository and can be versioned separately, but a new version can only be deployed if it works with the latest version of the application.
This coincides with my own experiences in the financial sector. Distributed computing is undoubtedly the way to scale, but the trick is making the distributed nature of the system completely invisible (or as much as possible) to the developers, the applications and the supporting staff.
I have seen this phenomenon of thinking that a large system, broken down into tiny parts, is somehow easier to manage time and time again over 30 years of development. In every case, the one thing the central thinkers fail to realize is that complexity is like conservation of energy: it can be transformed, but it cannot be destroyed.
Also, when it comes to large teams I have seen one thing work when it comes to sharing a resource(s) critical to a larger system - shared pain. If the central/reusable code/service breaks everyone's stuff, then everyone forms a team to immediately address the problem before continuing on. The solution is almost never "find a way to let the other teams continue while something important is on fire." It seems like a major motivation for a microservices architecture seeks to avoid the pain - which perhaps is not the best reason to use microservices.
I like the idea of unseen, but indispensable, complexity. For instance, the human brain is probably the most complex thing in the world, but the interface is fairly simple :)
Oh man, this is just the tip of the iceberg with microservices. There is nothing micro about them; they are so difficult to deal with that it becomes impossible to actually iterate or build user value, and they introduce loads of difficult-to-debug problems.
We have architects here that dictate the design of the system, but IMO they have not done the simplest implementation of anything. We have Kafka to provide ways of making each service eventually consistent, so when we delete something out of our domain service it requires absolutely huge amounts of code in various other services to delete things in each place listening for events. Every feature is split across N different services, which means N times more work, N times more difficult to debug, and N times more difficult to deploy.
The system has been designed with buzzwords in mind - Go and GRPC have been a disaster in terms of how quickly people have developed software (as has concourse - so many man hours wasted trying to run our own CI infrastructure it's unreal), loads of small services that are individually difficult to deploy and configure (and come with scary defaults like shared secret keys for auth - use a dev JWT on prod for example). The difficulty in dealing with debugging the system - there simply aren't the tools to understand what is going wrong or why - you have to build dashboards yourself and make your application resilient to services not existing.
Never ever try to build Microservices before you know what your customers really want - we've spent the last 6 months building a really buggy CRUD app that doesn't even have C and D fully yet. Love your Monolith.
Ugh. Sounds like a severe case of resume-driven architecture.
In my experience, you won't know whether you need microservices until you're on at least v2.0 of your application. By then, you have a better understanding of what your real problems are.
Resume-driven architecture will benefit me, assuming I ever want to waste this much time writing a delete-documents method again (4 people on my team worked on it on and off for a month).
I'm not sure how worthwhile it is writing any more "microservices are dumb" articles - all the people who have spent the last 5 years leaving microservice messes in their wake appear to have moved on to creating "serverless" messes of lambda functions which people like you and me are going to be going around tidying up in about 5 years from now.
As with J2EE EJBs, microservices conflates two things:
- a strong API between components
- network calls
The former is a very good idea that should be implemented widely in most code bases, especially as they mature, using techniques like modules and interfaces.
The latter is incredibly powerful in some cases but comes at a huge cost in system complexity, performance and comprehensibility. It should be used sparingly.
I think that with the rise in popularity of functions as a service (lambda, gcf, azure), we are heading more and more towards nanoservices.
Small services are easier to develop with several teams, in my opinion. Each team knows what to input, and output. They can do whatever in between as long as these two contracts are respected.
But keeping an overview of all these moving pieces, changing at different paces, is tough. And the smaller the services get, the harder it will become.
>I think that with the rise in popularity of functions as a service (lambda, gcf, azure), we are heading more and more towards nanoservices.
There are some pretty big asterisks next to running "nanoservices". Mostly how expensive they actually are to run at large scale and the weird caveats that can happen due to them not always being up.
And I wouldn't advocate for "always-up nanoservices".
The basic answer to both "nanoservices" and microservices is do what you think is right but don't go too far. There are good reasons to make a nanoservice and good reasons not to, same with microservices.
Not that I disagree with microservices easily going awry, the problem here seems to be traced to shared library code. Each microservice should be as standalone as possible. Your contract with that service is the service contract. Not some shared library.
As soon as you have a shared library, you have coordinated deployments. And that is just not fun and will cause problems.
The trick here is that this does mean you will duplicate things in different spots. But that duplication is there for a reason. It is literally done in two places. When you update the service, you have to do it in a backwards compatible way. And then you can follow with updates to the callers. This makes it obvious you will have a split fleet at some point, but it also means you can easily control it.
This is deployment tooling related though - the ops part of devops really.
If your build and test process doesn't actually exercise your deployment pipeline across versions, then it's not testing anything. I don't think shared libraries are a problem at all - they're probably a good idea - the problem is when they're used as an excuse to not worry about testing your upgrade and rollback scenarios.
I'd pan that criticism a bit wider too: it's not just about having a testing process either - it's about making sure your devs are able to easily use it and watch it work as part of their regular cycle.
This is in fact something I'm about to start working on at my new job for a new project - pushing the development of each service down so someone can write `make test` and not only run tests, but see if what they've done can upgrade between the currently deployed version.
Fair. But don't get to where you expect code to go out as a single deployment if it can't. Advanced tooling helps. So does a more transparent codebase.
What did each service own, if they all shared code? Make those ownership lines as crisp as you can. And unless you have x teams, consider not having too many more than x services.
Doesn’t that get excessive, making network calls to do what could be more naturally expressed as a method call?
If you have two different teams collaborating or you expand beyond what a single box can do, create services. But if you can express things reasonably as a single service, why make things more complicated and error prone?
They were already breaking things into separate services. They violated DRY by putting the same or similar code into all of them. Then instead of cleaning up the DRY violations by refactoring into a shared service or refactoring into a shared library as a static asset, they factored it out into a mushy shared library still in lots of flux.
Whether the goal was to factor into another service with a defined and mostly static API, or into a common library with a defined and mostly static API, factoring the common code into an amorphous blob that gates the release of all the other services was the failure. Instead, they've de-modularized the code and called that a success.
If you're drawing hard lines between services and having them talk to one another, having one of these models for at least parts of that makes a lot of sense:
* a flow-based filter (pipeline) system
* REST API that does a transformation and returns transformed data
* message broker with producer/consumer model where the consumer of one queue does a transform and puts the data into another queue (see the sketch after this list)
* a full actor model
* a full flow-based model
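For instance, a minimal sketch of the message-broker option, using Go channels as a stand-in for a real broker (Kafka, RabbitMQ, etc.) purely for illustration:

```go
// One consumer reads from an input queue, transforms each message, and
// produces the result onto an output queue for downstream consumers.
package main

import (
	"fmt"
	"strings"
)

func main() {
	inbound := make(chan string, 4)  // stand-in for the input queue
	outbound := make(chan string, 4) // stand-in for the output queue

	// Consumer of "inbound" transforms each message and produces to "outbound".
	go func() {
		for msg := range inbound {
			outbound <- strings.ToUpper(msg) // the "transform" step
		}
		close(outbound)
	}()

	// Producer writes raw messages to the input queue.
	for _, msg := range []string{"signup", "page_view", "purchase"} {
		inbound <- msg
	}
	close(inbound)

	// A downstream consumer reads the transformed messages.
	for msg := range outbound {
		fmt.Println(msg)
	}
}
```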
Seems to me the problem is the shared libraries. Yes, without sharing it means you have to repeat a fair amount of code, but in most cases the representation that each service cares about is not necessarily the same, which reduces the value of these shared libraries. It seems that they would have solved a lot of the really critical issues by simply not sharing as much code.
I maintained the shared library for five teams trying to move data around.
The biggest challenge is making the shared library forward and backward compatible with itself for at least a few releases in either direction, because not everyone will redeploy at the exact same moment.
If you can't solve that problem everything gets painful. Doing that right was the second hardest part of that job (meetings were the hardest). The job title (securing the data interchange) came in third place.
Or use versioning and a package manager. You should be able to introduce vNext and then have each service update at its own pace, with a deprecation strategy so that service owners are responsible to get off the old version by such-and-such time. But as others mentioned, part of the problem seems to be too many services given the size of the team.
I'm hardly a microservice apologist, but sharing code or data across services is a major smell. It means these things are related, and should likely be bundled together. Don't just break a service apart because it's de rigueur.
I don’t see any reason you shouldn’t always design a system as domain specific micro services.
Now those micro services shouldn’t always be out of process modules that communicate over HTTP/queues, etc. A microservice can just as easily be separately compiled modules within a monolithic solution with different namespaces, public versus private classes, and communicate with each other in process.
Then if you see that you need to share a “service” across teams/projects, or a module needs to be separately deployed or scaled, it’s quite easy to separate out the service into versioned packages or a separate out-of-process service.
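As a rough sketch of that progression (all names here are hypothetical): define the module by an interface, start with an in-process implementation, and swap in a network-backed one only when the module genuinely needs to be separately deployed or scaled. Callers never change.

```go
// Callers depend only on the Sender interface, so the dispatch mechanism
// (method call vs. HTTP) is an implementation detail that can change later.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Sender is the module's public contract.
type Sender interface {
	Send(to, body string) error
}

// inProcessSender is the "monolith" flavour: just a method call.
type inProcessSender struct{}

func (inProcessSender) Send(to, body string) error {
	fmt.Printf("sending %q to %s\n", body, to)
	return nil
}

// httpSender is the "separate service" flavour: same contract, network dispatch.
type httpSender struct {
	baseURL string
	client  *http.Client
}

func (s httpSender) Send(to, body string) error {
	payload, _ := json.Marshal(map[string]string{"to": to, "body": body})
	resp, err := s.client.Post(s.baseURL+"/send", "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("send failed: %s", resp.Status)
	}
	return nil
}

// notify only ever sees the interface.
func notify(s Sender) error {
	return s.Send("user@example.com", "welcome!")
}

func main() {
	_ = notify(inProcessSender{}) // today: a method call
	// Later, if the module needs independent deployment or scaling:
	// _ = notify(httpSender{baseURL: "http://email.internal", client: http.DefaultClient})
}
```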
It's refreshing to read an article which challenges common wisdom.
I've endured a lot of suffering at the hands of the microservices fan club. It's good to see reason finally prevail over rhetoric.
It would have been nice if people had written articles like this 2 years ago but unfortunately, people with such good reasoning abilities would probably not have been able to find work back then.
Software development rhetoric is like religion. If you're not on board you will be burned at the stake.
So many times during technical discussions, I had to keep my mouth shut in the name of self-preservation.
> So many times during technical discussions, I had to keep my mouth shut in the name of self-preservation.
This sounds like an issue with being able to articulate why something is or isn't going to net the expected benefits, or with being able to foresee unexpected risks. Keeping silent is better than throwing out silly hyperbolic risks, but not bringing up real risks because "they don't want to hear it" is completely bogus. Any solid engineer will bite at another potential risk to ensure they don't find themselves engineered into a corner 65% of the way through a project. Your comment also makes it sound like the article's position, monolith over microservices, is gospel for every situation; that in no circumstances would it ever make sense to use microservices and that only naive zealots would espouse the wisdom (dogma) of using them. You can use any piece of technology poorly; that doesn't mean the core concept is flawed, just that your problem space is different from what that software is trying to solve. Use HDFS as a primary data store in place of MySQL where it doesn't make sense, and you might end up wishing someone had told you HDFS is terrible and to just use the tried-and-true MySQL of olden days.
>> This sounds like an issue with being able to articulate why something is...
The issue is not articulation of ideas; the issue is that when all the books, all the articles and all people believe that something is true, there is no amount of articulation which will be able to convince them otherwise.
You have to wait for the hype to go away before even considering bringing up the argument.
Didn't take long for "we used shared libraries, and then found out we couldn't deploy independently". Sounds like you weren't quite doing microservices?
This is a classic case of not understanding micro services and trying to fit a problem around a tool.
At work, we have close to ~50 services (no one calls them microservices), but they do not suffer from this brittleness. We segregate our services based on language. So, all C services go under coco/, all Java services go under jumanji/, all Go services go under goat/, all JS services go under js/. This means that every time you touch something under a repo, it affects everyone. You are forced to use existing code or improve it, or you risk breaking code for everyone else. What does this solve? It solves the fundamental problem a lot of leetcode/hackerrank monkeys miss: programming is a social activity, not a go-into-a-cave-and-come-out-with-a-perfect-solution-in-a-month activity. More interaction among developers means engineers are forced to account for trade-offs. Software engineering, in its entirety, is all about trade-offs, unlike theoretical computer science.
Anyway, this helps because as engineers we must respect and account for other engineers' decisions. This method helps tremendously with that. No one complains; anyone who wants 1000 more microservices usually turns out to be a code monkey entangled in the new fad, or someone who doesn't want to work with other engineers.
You want to use rust? There is a repo named fe2O3/, go on. Accountability and responsibility is on your shoulders now.
If you think about it, an Engineer is tied to his tools, why not segregate repos at language level instead of some arbitrary boundary no one knows about in a dynamic ecosystem?
I'd wager that microservices, a lot of the time, are basically used as a management structure rather than for their benefits as pure tech, so less mature teams can silo themselves off and avoid communication (e.g. "I can work just on my backend image processing bit without dealing with the React guys now", "now the CTO won't be on my back so much," or whatever).
The irony being that anything approaching SOA (or microservices) requires exactly the same amount of communication. More likely they require more since it's almost certain that such a decision introduced chaos.
I think what was missed, in the article, is that the fundamental problem was centered around a shared architecture of destinations and shared code.
You cannot possibly have every destination be a separate repo and then have the development lifecycle of your shared code be so active that it ultimately puts at risk the architecture of your entire organization.
What makes shared code work is stability: you extract the code that varies into your non-shared code. Shared code should evolve at a much slower pace than your non-shared code, or you risk this very outcome.
Microservices are not dead, nor are they the solution to everything. We need better architects.
> ... the fundamental problem was centered around a shared architecture of destinations and shared code.
From an architectural perspective, there is absolutely no difference between a micro service and a library. The only real difference is in the dispatch mechanism.
The problems around configuration management are the same problems we've had as programmers for decades. It's just that the people who are keen on micro services are usually not old enough to have experienced the pain in a different context.
Should shared libraries be in different repos, or should you put everything in a single repo? How do you deal with versioning? What happens if one app wants version 1 of the library and another app wants version 2? How do you deal with backwards compatibility of the API? Do you make a whole new library when you decide the old API is incompatible with the new vision? Blah, blah, blah, blah. None of this is a new problem.
I can't remember which version of Windows it was (maybe 7?) where they were seriously delayed mainly because they had so many programs using different versions of libraries. Integrating it at the end was apparently complete hell. Since they wanted to have separation of responsibility in their groups, each group was just pounding away implementing the features that they needed, but not integrating as they went -- because that would mean lots of cross team communication. The exact same thing is likely in a large organisation with a ton of micro services.
There isn't just one way to solve the problem. Mono repos and monoliths help in certain ways and cause problems in other ways. There are other techniques as well (should we implement ld.so for micro services? :-) ) But as you mention, the real answer is that the solution requires humans, not technology.
Architecturally speaking, there are some similarities between micro services and libraries because they’re both forms of modularization and usually have an API, but there are some stark differences beyond the “dispatch mechanism”.
The main difference is that a service’s deployment lifecycle is completely up to the service admin. Microservices are like websites - they can continuously evolve (within their API’s contract) without asking permission from consumers. This is their main superpower and why they’re a way of scaling a development organization without slowing it down too much.
In the case of a shared library, it’s completely up to the host admin as to when to upgrade. In the case of a static library, it’s up to the consuming software to determine when to upgrade. A service can upgrade when it feels like it.
Issues of API backwards compatibility, forwards compatibility, extensibility, self descriptiveness, versioning, etc. are old issues but usually have different answers when upgrades are truly happening all the time and not just in theory. It tends towards much fewer hard versions and more evolutionary backwards compatibility.
IOW, microservices aren’t a cure all, but they do encourage a set of behaviors. Many articles detracting from them seem to have not wanted those behaviors in their org in the first place.
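A tiny illustration of that evolutionary backwards compatibility, with hypothetical names: the service adds an optional field to its response, and consumers that haven't upgraded keep working because they simply ignore it.

```go
// The upgraded service emits a richer payload; the old consumer's struct
// doesn't know about the new field and is unaffected.
package main

import (
	"encoding/json"
	"fmt"
)

// The new response shape: Currency is added later and optional.
type accountV2 struct {
	ID       string `json:"id"`
	Balance  int64  `json:"balance"`
	Currency string `json:"currency,omitempty"` // new; old clients never ask for it
}

// What an old consumer still deserializes into.
type accountV1 struct {
	ID      string `json:"id"`
	Balance int64  `json:"balance"`
}

func main() {
	// The upgraded service emits the richer payload...
	payload, _ := json.Marshal(accountV2{ID: "a-1", Balance: 4200, Currency: "USD"})

	// ...and the un-upgraded consumer keeps working: encoding/json ignores unknown fields.
	var old accountV1
	_ = json.Unmarshal(payload, &old)
	fmt.Printf("%+v\n", old) // {ID:a-1 Balance:4200}
}
```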
You are assuming that microservices must be deployed on the internet, however they can be deployed on another type of network or on a single OS installation, which would prevent them from being completely transparently updated.
Shared libraries can be an implementation of micro services. On a platform (e.g. Android) it can be that a shared library is updated and then all consumers are forced to update.
No, I’m assuming they must be deployed on an IP network of some sort, not necessarily the Internet. A microservice by definition is a networked service. The whole REASON the term was coined was to differentiate from services which often were developed and deployed monolithically, as opposed to truly autonomous runtime processes (aka. bounded contexts).
Services, whether web services, messaging services, or network services (as in SOA), describe a client/server architecture with an API. Usually these APIs are designed following the principles of Domain-Driven Design, where different teams map to bounded contexts that each have their own published API.
Microservices are a form of SOA where each API also runs in its own process and thus has independent deployment lifecycle. Many in the SOA world advocated for this 10-15 years ago (and often ignored), and now the industry has gone back and coined a term for this practice.
In an organization that does microservices properly at scale, like Amazon, you have teams that build, run, and upgrade their service autonomously from others. Read the Steve Yegge rant about Amazon and platforms to understand this. It allows tremendous parallelisation of effort and allows for thousands of deploys to production daily without breakage. This is hard to pull off, though, and a new initiative generally doesn't warrant more than a small handful of services.
If it’s a shared library, then call it a shared library. It has a completely different lifecycle from a microservice. In your Android example, shared libraries by definition are controlled by whomever can dictate OS updates, not the app developer.
> "should we implement ld.so for micro services? :-)"
Working exactly along those lines this week on multiple services exposed over REST APIs, I was wondering if tools exist to check compatibility between them. Said differently,
- I have a `swagger.yaml` for my service A managing chipmunks, and it says endpoint `/chipmunks` supports a 'color' query parameter.
- In service B, I have a `handleToServiceA` that encapsulates calling A. Then say I write `const chipmunks = handleToServiceA.getChipmunks({'colour': 'blue'})`.
Are there (whatever the ecosystem) tools that would read serviceA's swagger.yaml, detect my error in serviceB (color -> colour) and report the issue at compile time rather than run time?
At work, to alleviate this issue, we use a client library for service A (i.e., one that provides service.getChipmunks(color='blue')). This gives you the compile time check, assuming the client library is a compiled language.
One step better is autogenerated client libraries where a new version is created every time a new version of swagger.yaml is deployed. However, I don't know an open source project that does this.
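A hand-written sketch in Go of what such a client library might look like (all names - `ChipmunkParams`, `GetChipmunks`, the `/chipmunks` endpoint - are hypothetical): because the parameters are struct fields rather than free-form string keys, misspelling `color` as `colour` becomes a compile-time error.

```go
// A typed wrapper for service A; consumers can no longer pass arbitrary,
// misspelled query-parameter names.
package chipmunkclient

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

type Chipmunk struct {
	Name  string `json:"name"`
	Color string `json:"color"`
}

// ChipmunkParams mirrors the query parameters declared in service A's swagger.yaml.
type ChipmunkParams struct {
	Color string // maps to the "color" query parameter
}

type Client struct {
	BaseURL string
	HTTP    *http.Client
}

// GetChipmunks calls GET /chipmunks with the typed parameters.
func (c *Client) GetChipmunks(p ChipmunkParams) ([]Chipmunk, error) {
	q := url.Values{}
	if p.Color != "" {
		q.Set("color", p.Color)
	}
	resp, err := c.HTTP.Get(c.BaseURL + "/chipmunks?" + q.Encode())
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("service A returned %s", resp.Status)
	}
	var out []Chipmunk
	return out, json.NewDecoder(resp.Body).Decode(&out)
}

// Usage in service B:
//   chipmunks, err := client.GetChipmunks(ChipmunkParams{Color: "blue"})
//   // ChipmunkParams{Colour: "blue"} would not compile.
```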
> Should shared libraries be in different repos, or should you put everything in a single repo? How do you deal with versioning? What happens if one app wants version 1 of the library and another app wants version 2? How do you deal with backwards compatibility of the API? Do you make a whole new library when you decide the old API is incompatible with the new vision?
Could you share methods that worked well for you?
Asking because those questions come up a lot and there never seems to be any conclusion.
"From an architectural perspective, there is absolutely no difference between a micro service and a library. The only real difference is in the dispatch mechanism."
I always tell people if you can't write and maintain a library then don't do microservices.
Agreed. Although with tools like Visual Basic it was quite nice to use. Now, DCOM was another story. Pretty much everybody I knew who used it started to become suicidal after a while...
Microservices do have various organizational tradeoffs. Individual teams can now own deployment, operation, language, and tooling choices for better or worse. This is probably advantageous for large companies where the number of teams scales beyond what a hierarchical control structure can support. In other words, a single devops team can't make good decisions and prescriptions about languages, tooling, deployment, etc nor effectively react to feedback from the dev teams. Control (power and responsibility) becomes distributed instead of centralized, with all the tradeoffs that entails.
The practices of doctors are based on science and theory. Design patterns and microservices, while technical, are not based on science. They are ideas without quantitative basis or science.
I agree that we need more empirical investigations of software 'diseases' and 'cures'.
That being said, practical medicine is much less scientific than many think. There is a lot of master/apprentice learning going on, just as in software engineering.
I don't even think we need empirical investigations. A formal logic system built from axiomatic rules for architecture is really enough. I use the term science here incorrectly. I don't mean experimentation. I mean formalized logic, like graph theory, complexity theory, number theory... Architecture Theory.
In the world of math you don't need empirical data to verify a point. It's all logic derived from a small set of axioms.
Ah, here I disagree. Architecture is usually influenced not only by the software requirements, but also by many non-technical constraints (how is your development organization structured, how skilled are the devs, how sure are you that requirements will not change over time, etc.). There are just too many soft factors involved to 'compute' a solution.
Actually computing science is the most well understood science. Physicists can only give results with a few millionths* of accuracy, a computing scientist can prove a theorem about his subject matter with perfect accuracy.
Everyone else, including doctors, who call themselves scientists are just trying to float on the cachet physicists earned with their astonishingly good predictions. Properly speaking, they are phenomenologists. Please note that I'm not saying what they do isn't of great societal and intellectual value! The study and categorization of phenomena is certainly a noble enterprise. But none of them can make predictions good to 9 decimal places.
*might be billionths or quadrillionths by now in QED, but doesn't change my point.
You are diving too deep and talking about the difference between formal logic and empirical sciences. I am not talking about that.
Things like algorithmic complexity are well understood and formalized, but design patterns are not a science, nor have the concepts ever been formalized.
There is no theory or formalized system that says monolithic is better than micro or vice versa; it's all opinion. That's why it's called design.
Formal logic is a tool used by computing scientists and that they made almost incredible contributions to. It’s a tool that’s available to other scientists too, and the more rigorously minded ones use it.
For your latter point, it’s a matter of elegance, which is a pretty way of saying cognitively manageable. Think of epicycles vs Newtonian mechanics as an analogy. With enough epicycles you can compute the same result, but Newton’s approach is still a clear scientific advance.
Your point about design is well taken. Any given design is analogous to a theory. So we should aim for the simplest and most cognitively manageable design that satisfies our needs. That’s not literally formalized, but it’s a well established principle with an excellent record.
What do you mean by model checking? Usually anything with the keyword "Design" like design patterns for microservices have no science or mathematics to back it up.
You may be technically correct, but I'd argue that what we do have, practical experience, is generally more useful. When one says "we need better architects", to me that implies that our current architects are creating design patterns and reference implementations that are both 1) not practical, and 2) don't adhere to known-good best practices.
Systems design in real life is an "artistic science": there are always known limitations (and some expected unknown ones) that rule out the theoretic optimal design for good reasons. The problem is that limitations and compromises are often not disclosed, and that many programmers and architects are too inexperienced to really grok the implicit meaning behind specific design decisions.
So we struggle along, with bloggers, researchers and FOSS contributors halfheartedly collaborating to make point improvements as solutions are discovered. Stodgy enterprises suffer, big tech companies make decisions with global impact, startups are left wondering WTF to do, and the rest of us largely don't care. Why? Because nit picking doesn't solve business problems [almost ever. I'd argue this point in cases of things like SCADA systems and other mission critical control systems.].
The thing with practical experience is you can put two engineers with practical experience in a room and they can argue for days about an architecture or design pattern.
Nobody argues about which algorithm is better for sorted data sets: linear search or binary search. Theory already establishes one is faster than the other, but no theory establishes which architecture is better than the other.
A model checker is a software program you use to validate a given mathematical model of a system. If the proposed properties of the model hold the model checker will let you know. More interestingly if the properties fail to hold you'll get a trace back into where your assumptions fell apart.
TLA+ is one such system that includes a language for writing models and a model checker to verify them for you. In the context of microservices you would write a model of your services and the checker would help you to verify that certain properties of your model will hold for all possible executions of the model. Properties people seem to be interested in are consistency and transaction isolation. You can develop a model of your proposed microservices architecture and work out the errors in your design before you even write a lick of code.
Or if you already have a microservice system you could write a model of it and find if there are flaws in its design causing those annoying error reports.
Amazon wrote a paper about how they use it within the AWS team [0]. Highly worth the read. And if any of this sounds interesting I suggest checking out Hillel Wayne's course he's building [1].
The certificate means nothing, the training and instruction is priceless. Add to that things like TOGAF and a deep understanding of the current state of existing architectures and you'll understand what I originally meant, but failed to explain.
The closest mathematical formal system I've seen that dictates architecture is category theory. But even this system doesn't say anything about being "better" or more efficient than another design pattern.
I agree, though I'm not so cynical about it. Well-defined API contracts are themselves a communication mechanism. If I provide an API, I am declaring that if you interact with me in a given way, I will behave in a certain way. Given that one of the hardest parts of scaling an organization is the boundaries between individuals and teams, providing a structured mechanism to define system behavior is incredibly valuable.
I hated that in our monolith Java web app, any random team could come along and pepper our team's module with global variable lookups and short-term hacks that never get cleaned up. Even though we see the changes, often they are urgent changes needed right now, accompanied by promises to clean up later (which usually doesn't happen).
The nice thing about a SOA architecture is that it makes it harder to do this kind of cowboy programming. Yes maybe some quick business wins are harder than in a monolith, but I think (at least in our org) the cleaner architecture pays off by letting us move quicker on a different class of product projects since there is less technical debt.
Yes. And that's okay, sometimes. To me, the difference is whether it's an organizational challenge or an organizational dysfunction. There will always be challenges inherent to operating a business—and technical solutions often play a critical part. However, if the issue is really an organizational malady that should be addressed, technical solutions probably aren't the right approach.
I would flip this argument around 180 degrees. Once you have any system that's built by large numbers of people, you have to discuss and negotiate your interfaces. You can either do this implicitly, by having a shared codebase and making sure that your test coverage and code reviews guarantee backwards compatibility, or explicitly, by defining service interfaces and black-boxing behind them.
This is why things like ticketing systems and other workflow tools almost universally suck. Manager needs field x to run some report, so field x becomes mandatory, and the software is a pain to use because no one apart from that one manager knows what the field even means.
Currently our JIRA kanban board is crippled because someone just had to take a simple system and impose a workflow that can't be deviated from.
Yeah. We're currently in a second push to introduce microservices and this time I'm on board. The first time was more about enforcing code ownership and such, and that was going to be a mess.
Now, people are pondering things like: Ok, management of elasticsearch has performance issues and it's a general pain if the elasticsearch documents change. So let's try to move the schema of elasticsearch documents into a strictly semver'd artifact. And let's move management of our search indexes into a service depending on that artifact, so the schema changes in a controlled way. And ops and the search team can scale and optimize the service as they need to minimize search outages.
Creating smaller services based on problems is a good thing. Creating smaller services because of ... reasons... not so much.
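As a rough sketch of the semver'd schema artifact idea (names and mapping here are hypothetical): the schema lives in its own versioned package, and only the index-management service applies it to Elasticsearch.

```go
// The document schema is data in a versioned artifact; no other service
// needs to know how the index is managed.
package searchschema

// Version is bumped according to semver whenever the mapping below changes;
// a major bump signals a reindex, a minor bump a compatible field addition.
const Version = "2.1.0"

// ProductMapping is the Elasticsearch mapping for the product index, kept as data
// so the index-management service can apply it without any other service caring.
const ProductMapping = `{
  "mappings": {
    "properties": {
      "name":       { "type": "text" },
      "price":      { "type": "double" },
      "created_at": { "type": "date" }
    }
  }
}`
```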
What is the value proposition of tying together an ingester, parser, and transactional service (e.g. all written in Go), and then tying together your API, client app, and mobile app (all in JS), just because they share a common language?
Arguably they aren't close to each other in tiers of the stack and don't really overlap. The sorts of libraries you might choose to use could differ (e.g. for JSON processing, in the front end you'd pick usability and security over a lower-level, more optimized transcoder for the API).
That said, I agree with your core premise - not understanding the tool, and chasing a nail with a hammer - but geekily named repos per language just sounds like someone who doesn't understand how something like git works...
>> This means, everytime you touch something under a repo, it affects everyone.
>This is horrible at scale.
I just want to reiterate this. In the early 2000s I worked in the online platform group at EA. The list of things done poorly there was long, but picture:
* 40+ engineers
* Monorepo with hundreds of thousands of classes; all code deployed to all servers.
* Hundreds of different services running across thousands of servers.
* Communication based on Java serialization, so all code had to be deployed to all servers at the same time.
* Deployments (and thus downtime) sometimes lasted up to an hour. Worldwide audience; it was always in the middle of someone's day.
* Rational Clearcase for version control. It took nearly an hour to sync to tip.
Pretty much every morning you'd come in, spend an hour syncing, find that someone broke the build, hunt them down, and resync for another hour. Generally speaking the first few hours of every day were wasted for 40 engineers.
This was a very poor platform.
Sometimes I wonder how it's going there these days.
This can work fine at scale, Google does it with however many tens of thousands of engineers they have these days. Having everything in a monorepo doesn't solve the communication problem, it doesn't prevent solving it either.
Sort of. Google is also well known for having a dozen chat apps. What most people don't know is that there are also likely a dozen different libraries/utilities/interfaces performing near-identical services, just because an engineer or product team wasn't 100% satisfied with what already existed. So yes, it's a monorepo, but holy heck there's a lot of duplicative cruft.
> all code deployed to all servers
This is the main point of the post. Google most certainly doesn't do this, even if they have all their code (or at least, all their private code) in one repo (piper).
I'm a Python guy, but there are lots of things about JS that make me jealous. For example, the main implementations are quite fast, and the tooling is good. In Python, there is Pypy which doesn't get enough investment, or CPython where your only optimization lever is "rewrite it in C!". TypeScript also seems quite a lot nicer/less-broken than Python's mypy. Lastly, I like that JS has actual, multi-expression lambdas and a pleasant syntax.
Some things are more pleasant in Python--everything is sync by default, there's less churn, the standard library gets me a lot farther, it's less permissive (no 'undefined is not a function' nor '!= vs !=='), etc. Both languages serve similar niches well, but the feeling I get is that JS is quite a bit faster and nicer for IO-heavy workloads (JS just has a more mature async story than Python) while Python is generally more intuitive and perhaps better for general application development. But for most things, one isn't dramatically better than the other, and they're both a good deal more disappointing than Go. :p
2 or 3? Or all of the problems that come with choosing between the two (no, it is still not an easy choice).
Mypy coverage, even in the stdlib, is awful. When it comes to thirdparty, mostly nonexistent. Mypy feels so young - I love the team, love their work, but I still run into cases where inference fails when it shouldn't, where error messages are extremely unhelpful, etc.
Exceptions everywhere, for control flow even - iterators are implemented with exceptions.
Absolutely AOT unoptimizable. Pypy's cool, never got it working for my use case. In theory a JIT could help.
Speaking of calling out to C... you think you're writing in a memory safe language, but actually, you're writing in a memory safe language that's probably been hollowed out and replaced with a fast C implementation. But it's actually worse
Exploiting C code loaded by Python is like exploiting C code from the 1990s.
No parallelism. Multiprocessing? Good luck with that - pay the cost of pickling, pay the cost of an additional interpreter, pay the cost of debugging hell.
An ecosystem split in two, and don't let anyone tell you otherwise - a few hundred top packages moving over after many, many years, is a sad state for a language that was known for having an absurdly large ecosystem.
I could really just go on and on and on, but at some point it just feels mean.
Huge respect for the project and the team but Python has made mistakes (as all languages do). They were understandable mistakes, but they were mistakes. It's fine for some things, but there's plenty wrong with it, just like there's a ton wrong with javascript. But javascript gets probably 1000x the flack.
> Mypy coverage, even in the stdlib, is awful. When it comes to thirdparty, mostly nonexistent. Mypy feels so young - I love the team, love their work, but I still run into cases where inference fails when it shouldn't, where error messages are extremely unhelpful, etc.
Yeah, the typing stuff is a pretty big disappointment. The ergonomics are pretty terrible (probably because they wanted to push as far as they could without introducing more syntax support for typing). Mypy isn't just young, but it's buggy and its codebase was a sloppy mess last I checked. There's no support for recursive types (you can't define a JSON type, for example). And it absolutely falls over in the face of common libraries, like SQLAlchemy, which are too dynamic for it.
> Absolutely AOT unoptimizable. Pypy's cool, never got it working for my use case. In theory a JIT could help.
Yeah, Pypy is the best hope for Python's performance. They're making great progress, but I also couldn't get it working in our Python 3 codebase (Numpy and Pandas installation issues).
> No parallelism. Multiprocessing? Good luck with that - pay the cost of pickling, pay the cost of an additional interpreter, pay the cost of debugging hell.
Yeah, this is a real pain point. Some die-hard Python folks say otherwise, but there's really no good parallelism option for lots of workloads. Pickling is just too expensive.
> An ecosystem split in two, and don't let anyone tell you otherwise - a few hundred top packages moving over after many, many years, is a sad state for a language that was known for having an absurdly large ecosystem.
This hasn't been a problem for me for years. Most things that have seen active development in the last 5 years have good Python 3 support. The only time I've run into a Python-2 only utility, it was 7 years stale. Well, except for Centos's `yum`.
These are all reasons I like Go, by the way. Super fast, great tooling, and fairly stable (except for the package management story). I do wish there was a lightweight scripting language with a great VM and real parallelism--something like JS without the inheritance, OO baggage, etc; just objects and arrays and functions running on a JIT VM like V8 but with parallelism as a first-class citizen a la BEAM. And preferably optionally typed from the start, in a way that the runtime could leverage for optimization purposes.
Easy to say. Tell that to Google and Dropbox - Guido has worked for both and they're on 2. Tell that to the companies that can't afford the creator of the language.
"2 or 3" wasn't a choice for Dropbox/Google, and even if it had been, it's not a choice that concerns you. All the stuff they produce publicly is Python 3-compatible.
I should have mentioned that "async coroutines" (i.e., goroutines) are also a requirement. Also, I understand that Lua doesn't have much of a standard library and isn't really used as a general purpose programming language (compared to Python, for example)?
Your understanding is correct, it's an embedded-first language. It also makes it very hackable which means that I fully believe golang-like goroutines would be achievable in Lua.
People hack the `continue` statement into it all the time, it wouldn't be terribly surprising.
> Buts it's especially terrible because it was designed by a guy without much language expertise in a handful of weeks.
The core isn't too bad; I'd argue that the main reason why it's hard to work with is because you don't have tight control over your execution environment, mostly since the language wasn't really standardized until pretty late. It also solves a much different problem than most languages since it has to optimize for better UX (not crashing the whole program on exceptions, for example).
Ok so if I want to build a fledgling project all I need to do is leverage a language nobody uses yet. Lolcats ftw.
Seriously though, constraining devs to use a monolithic repository in the name of discouraging cowboys is like tying people's legs together to ensure they walk in an aligned direction. Sure, it works, but it's seriously unnecessary pain. Your company should invest in integration tests.
This means either your contract agreement between services is not language agnostic, or someone did not do their job when the interface schema was changed during reviews.
Just like any other software, interfaces live in separate repos e.g protoplasm/ for proto-buf definitions. avrobber/ for avro and so on so forth.
So, any change to protoplasm/ triggers automated tests on all other services irrespective of boundaries.
OK. But it's _much_ easier to have compile-time checking between projects in the same language/repo. Not saying this approach is wrong, but it's a little like riding a unicycle when a perfectly good bicycle is sitting right there.
I think I did not present this correctly, I apologize for that.
What I mean is this. Any IDL interfaces live at their functional boundaries. E.g. all proto definitions shared internally by Java services live under jumanji/proto/, all proto definitions shared internally by Go services live under goat/proto/, and so on and so forth.
But anything which is shared in a public manner, e.g. any proto definitions shared between Java and C, lives in the separate shared repos mentioned above (e.g. protoplasm/).