>> A huge point of frustration was that a single broken test caused tests to fail across all destinations. When we wanted to deploy a change, we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos.
They also introduced tech debt and did not responsibly address it. The result was entirely predictable, and they ended up paying back this debt anyway when they switched back to a monorepo.
>> When pressed for time, engineers would only include the updated versions of these libraries on a single destination’s codebase... Eventually, all of them were using different versions of these shared libraries.
To summarize, it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
They become a bucket of clichés and abstract terms. Clichéd descriptions of problems you're encountering, like deployments being hard. Clichéd descriptions of the solutions. This lets everyone in on the debate, whether or not they actually understand anything real to a useful degree. It's a lot easier to have opinions about something using standard agile or microservice terms than using your own words. I've seen heated debates between people who would not be able to articulate any part of the debate without these clichés; they have no idea what they are actually debating.
As a case in point, if this article had described architectures A, B & C without mentioning microservices, monoliths and their associated terms... (1) Far fewer people would have read it or had an opinion about it. (2) The people who did would be the ones who had actually had similar experiences and could relate or disagree in their own words/thoughts.
What makes these quasi-ideological in my view is how things are contrasted, generally dichotomously. Agile vs. Waterfall. Microservices vs. Monolithic Architecture. This mentally limits the field of possibilities, of thought.
So sure, it's very possible that architecture style is/was totally beside the point. Dropping the microservices-architecture labels frees you up to (1) think in your own terms and (2) focus on the problems themselves, not the clichéd abstract version of the problem.
Basically, microservice architecture can be great. Agile HR policies can be fine. Just... don't call them that, and don't read past the first few paragraphs.
The problem, as you identify, is that once a pattern has been identified, people too easily line up behind it and denigrate the "contrasting" pattern. The abstraction becomes opaque. We're used to simplistic narratives of good vs. evil, my team vs. your team, etc., and our tendency to embrace these narratives leads to dumb, pointless conversations driven more by ideology than by any desire to find the truth.
I just think there can be downsides to them. These are theories as well as terms and they become parts of our worldview, even identity. This can engage our selective reasoning, cognitive biases and our "defend the worldview!" mechanisms in general. At some point, it's time for new words.
Glad people seem OK with this. I've expressed similar views before (perhaps overstating things) with fairly negative responses. I think part of it might be language nuance. The term "ideology" carries less baggage in Europe, where "idealist" is what politicians hope to be perceived as, while "ideologue" is a common political insult stateside, meaning blinded and fanatical.
Concepts like microservices synthesize a bunch of tradeoffs and patterns that have been worked on for decades. They’re boiled down to an architecture fad, but have applicability in many contexts if you understand them.
Similarly with Agile, it synthesizes a lot of what we know about planning under uncertainty, continuous learning, feedback, flow, etc. But it’s often repackaged into tepid, clichéd forms by charlatans to sell consulting deals or Scrum black belts.
Alan Kay called this one out in an old interview:
“computing spread out much, much faster than educating unsophisticated people can happen. In the last 25 years or so, we actually got something like a pop culture, similar to what happened when television came on the scene and some of its inventors thought it would be a way of getting Shakespeare to the masses. But they forgot that you have to be more sophisticated and have more perspective to understand Shakespeare. What television was able to do was to capture people as they were.
So I think the lack of a real computer science today, and the lack of real software engineering today, is partly due to this pop culture.”
I will take issue with one thing though... Shakespeare's plays were for something like a television audience, the mass market. The cheap seats cost about as much as a pint or two of ale. A lot of the audience would have been the illiterate, manual labouring type. They watched the same plays as the classy aristocrats in their box seats. It was a wide audience.
Shakespeare's stories had scandal and swordfighting, to go along with the deeper themes.
A lot of the best stuff is like that. I reckon GRRM is a great novelist, personally, with a deep contribution to the art. Everyone loves Game of Thrones. It's a politically driven story with thoughtful bits about gender, class and society. But it's not stingy on tits and incest, dragons and duels.
The one caveat was that Shakespeare's audience were all city slickers, and that probably made them all worldlier than the average Englishman who lived in a rural hovel, spoke dialect and rarely left his village.
What is an elitist pursuit is not really Shakespeare, it's watching 450 year old plays.
We really like to think in silos, categorizing everything to make it feel familiar and approachable. That's useful, but sometimes we need to shake the categories off so we can actually see the problems.
After a while... it's like the cliché about taxi drivers investing in startups... a sign it's time to get out. When people I know to have no idea start talking about the awesomeness of some abstract methodology... I'm out.
In reality this process has absolutely nothing to do with the structure of the organisation. Its true purpose is to shuffle out people who are performing poorly in their positions and move in new people. It just provides cover ("It's not your fault, it's an organisational change").
This is exactly the same: they couldn't say "You've solved this problem badly, go spend 6 months doing it properly". So instead they say they need a new paradigm to organise how they build their solution. In the process they get to spend all the time they need fixing the bad code, but officially it's not because the code is bad, it's because the paradigm is wrong.
The problem is the same as with the organisational structure: if you don't realise the real purpose and buy into the cover, you end up not addressing the issue. You end up with a shit manager managing a horizontal, then managing a vertical, then managing a horizontal again. You end up with a bad monolithic service instead of bad microservices.
That seems... appropriate?
This is the general problem with the microservices bandwagon: Most of the people touting it have no idea when or why it's appropriate. I once had a newly hired director of engineering, two weeks into a very complicated codebase (which he spent nearly zero time looking at), ask me "Hey there's products here! How about a products microservice?" He was an idiot that didn't last another two months, but not before I (and the rest of the senior eng staff) quit.
I'm fully prepared to upvote more stories with the outline "Microservices were sold as the answer! But they weren't."
It's almost like engineering techniques aren't magic pixie dust that you can sprinkle over your project and get amazing results...
Micro services is an extremely powerful pattern that solves a bazillion critical issues, the most important ones being:
* separation of concerns
* the ability of different teams to maintain, develop and iterate on different subsystems independently of each other
* loose coupling of the subsystems
Do you have an auth server that your API accesses at auth.your.internal.name and that does not share its code base with the API? You have a micro service.
Do you have a profile service that is responsible for the extra goodies on a user's profile that the rest of the API business logic does not care about? You have a micro service.
Do you spin up some messaging layer in a cloud that knows a few things about the API but really is only concerned with passing messages around? You have a micro service.
The alternative is that you have a single code base and a single app that starts with ENV_RUNMODE=messaging, ENV_RUNMODE=API, ENV_RUNMODE=website or ENV_RUNMODE=auth (except that in auth mode it only implements creating and changing entries and changing passwords, not validation, because validation is done by the code in every ENV_RUNMODE by accessing the auth database directly with read-write privileges, and no one ever implemented deleting entries from the authentication database. Actually, even that would be a good step: there's no auth database, because that would require knowing which mode we are running in and managing multiple sets of credentials, so instead it is simply another table in a single database that stores everything.)
That is the alternative to micro services. So I would argue that unless Segment has that kind of architecture it does not have a monolith. It implements a sane micro services pattern.
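For illustration, a minimal Python sketch of the single-codebase setup described above, assuming one entry point that picks its role from the `ENV_RUNMODE` environment variable. The handler functions and their return strings are hypothetical placeholders; the point is only that every role lives in, ships with, and deploys as one artifact.

```python
import os
from typing import Callable, Dict, Mapping, Optional


# Hypothetical role handlers. Every role lives in the same code base and is
# selected at start-up rather than deployed as a separate service.
def run_api() -> str:
    return "serving API"


def run_website() -> str:
    return "serving website"


def run_messaging() -> str:
    return "relaying messages"


def run_auth() -> str:
    # Per the comment: creates/changes entries and passwords, but does not
    # validate credentials; every other mode reads the auth table directly.
    return "managing credentials"


MODES: Dict[str, Callable[[], str]] = {
    "API": run_api,
    "website": run_website,
    "messaging": run_messaging,
    "auth": run_auth,
}


def main(env: Optional[Mapping[str, str]] = None) -> str:
    """Dispatch to exactly one role based on ENV_RUNMODE."""
    env = os.environ if env is None else env
    mode = env.get("ENV_RUNMODE", "")
    handler = MODES.get(mode)
    if handler is None:
        raise SystemExit(f"unknown ENV_RUNMODE: {mode!r}")
    return handler()
```

A hypothetical `app.py` calling `main()` would then be started as, e.g., `ENV_RUNMODE=auth python app.py`, and the same binary would serve every role.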
Should the engineering be led by a blind squirrel that once managed to find a nut, in a winter, three years ago, the sane micro services pattern would be micro-serviced even more. I call it the nanoservice pattern, aka LeftPad-as-a-service. We aren't seeing it much yet, but as Go becomes a bigger and bigger player in shops without excellent engineering leadership, I expect to see it more and more, because Go gives developers the tools to make gRPC calls between processes.
This made me lol
You try to deflect the critique away from microservices, but for me these issues are actually good arguments against microservices. They're hard to do right.
This is correct; I'd argue doing microservices right is even harder than doing a monolith right (like, keeping the code base clean).
Also, there have been quite a few warnings against using shared code in microservices.
Even so, imagine the chaos if engineers frequently need to add code to one lib (the shared one), wait for PR approval, and then use that new version in a different lib to implement the change that was actually needed. That seems to introduce a direct delay into getting anything productive done...
We went through this painful period. We kept at it, devoting a rotating pair to proactively addressing issues. Eventually it stabilized, but the real solution was to better decouple services and have them perform with more 9s of reliable latency. Microservices are hard when done improperly, and there doesn't seem to be a short path to learning how to build them with good boundaries and low coupling.
>>To summarize, it seems like they made some mistakes, microed their services in a knee-jerk attempt to alleviate the symptoms of the mistakes, realized microservices didn't fix their mistakes, finally addressed the mistakes, then wrote a blog post about microservices.
I read the article a few days ago and was struck by what a poor idea it was to take a hundred or so functions that do about the same thing and to break them up into a hundred or so compilation and deployment units.
If that's not a micro-service anti-pattern, I don't know what is!
In particular, as to their original problem: the shared library seems to be the main source of pain, and that isn't technically solved by a monolith. The other source was not following the basic rule of services, "put together first, split later".
I feel prematurely splitting services like that is bound to have issues unless they have 100 developers for 100 services.
The claim of "1 superstar" is misleading too: this service doesn't include the logic for their API, Admin, Billing, User storage, etc. It's still a service, one of a few that make up Segment in its entirety.
Reading about their setup and comparing with some truly large scale services I work with, I'm left with the idea that Segment's service is roughly the size of one microservice on our end.
Perhaps the takeaway is don't go overboard with fragmenting services when they conceptually fulfill the same business role. And regardless of the architecture of the system, there are hard state problems to deal with in association with service availability.
Some rules of thumb I just came up with:
Number of repos should not exceed number of developers.
Number of tests divided by number of developers should be at least 100.
Number of lines of code divided by number of repos should be at least 5000.
Your tests should not run faster than the time it takes to read this sentence.
A single person should not be able to memorize the entire contents of a single repo, unless that person is Rain Man.
I'd say you've never had good tests.
I have a test-suite for a bunch of my frameworks that dates to the mid 90s, with tests added regularly with new functionality.
It currently takes 4 seconds total for 6 separate frameworks and 1000 individual tests, which is actually a bit slower than it should be. It used to take around 1-2 seconds, so I might have to dig a little to see what's up.
With tests this fast, they become a fixed part of the build-process, so every build runs the tests, and a test failure is essentially treated the same as a compiler error: the project fails to build.
The difference goes beyond quantitative to qualitative, and it's hard to communicate. Testing becomes much less of a distinct activity and more simply an inextricable part of writing code.
So I would posit:
Your tests should not run slower than the time it takes to read this sentence.
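A minimal sketch of what "tests as part of the build" can look like, using Python's standard `unittest` runner. The `slugify` function, the test class and the 5-second budget (a rough stand-in for "the time it takes to read this sentence") are hypothetical; the idea is simply that a red or slow suite fails the build the same way a compile error would.

```python
import sys
import time
import unittest


def slugify(title: str) -> str:
    # Stand-in for real project code (hypothetical).
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_spaces_become_dashes(self) -> None:
        self.assertEqual(slugify("Goodbye Microservices"), "goodbye-microservices")

    def test_extra_whitespace_is_collapsed(self) -> None:
        self.assertEqual(slugify("  a   b "), "a-b")


def build(budget_seconds: float = 5.0) -> int:
    """Hypothetical build step: run the whole suite on every build and treat a
    failing (or too slow) suite like a compiler error by returning non-zero."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    start = time.perf_counter()
    result = unittest.TextTestRunner(stream=sys.stderr, verbosity=0).run(suite)
    elapsed = time.perf_counter() - start
    if not result.wasSuccessful():
        print("build failed: tests did not pass", file=sys.stderr)
        return 1
    if elapsed > budget_seconds:
        print(f"build failed: tests took {elapsed:.2f}s", file=sys.stderr)
        return 2
    return 0
```

A CI script could end with `sys.exit(build())` so that a non-zero status fails the pipeline.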
If you're using a tape drive or SD cards, sure. But even a 10-year-old 5400 RPM drive on an IDE connection should be able to satisfy your tests' requirements in a few seconds or less.
I suspect your tests are just as monolithic as you think microservices shouldn't be. Break them down into smaller pieces. If it's hard to do that, then redesign your software to be more easily testable. Learn when and how to provide static data with abstractions that don't let your software know that the data is static. Or, if you're too busy, then hire a dedicated test engineer. No, not the manual testing kind of engineer. The kind of engineer who actually writes tests all day, has written thousands (or hundreds of thousands) of individual tests during their career. And listen to them about any sort of design decisions.
If you need to access a database in your tests you're probably doing it wrong. Build a mock-up of your database accessor API to provide static data, or build a local database dedicated for testing.
I tend to use my integration tests also as characterization tests that verify the simulator/test-double I use for any external systems within my unit tests.
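A minimal sketch of both ideas above, assuming a hypothetical `UserStore` accessor with a single `get_email` method: application code depends only on the interface, unit tests use an in-memory fake that serves static data, and one shared set of characterization checks runs against both the fake and a real implementation, so the fake can't silently drift from reality. An in-memory SQLite database stands in here for the local test database.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class UserStore(Protocol):
    """The accessor interface application code depends on (hypothetical)."""

    def get_email(self, user_id: int) -> Optional[str]: ...


class InMemoryUserStore:
    """Test double serving static data; callers can't tell it isn't a database."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = dict(rows)

    def get_email(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)


class SqliteUserStore:
    """'Real' implementation backed by SQL (here an in-memory SQLite database)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_email(self, user_id: int) -> Optional[str]:
        row = self._conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def greeting(store: UserStore, user_id: int) -> str:
    """Application logic under test; it only knows about the interface."""
    email = store.get_email(user_id)
    return f"Hello, {email}" if email else "Hello, stranger"


def check_user_store_contract(store: UserStore) -> None:
    """Characterization checks that every implementation must pass."""
    assert store.get_email(1) == "ada@example.com"
    assert store.get_email(99) is None


def make_sqlite_store() -> SqliteUserStore:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'ada@example.com')")
    return SqliteUserStore(conn)
```

Fast unit tests would call `greeting(InMemoryUserStore({1: "ada@example.com"}), 1)`, while the slower integration run calls `check_user_store_contract` on both stores.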
See also: the testing pyramid and "integrated tests are a scam", which is a tad click-bait, but actually quite good.
Either way, the idea of considering slow tests a feature was novel to me.
It's not ridiculous. It's good.
I work on an analysis pipeline with thousands of individual tests across a half dozen software programs. Running all of the tests takes just a few seconds. They run in under a second if I run tests in parallel.
If your tests don't run that fast then I suggest you start making them that fast.
I'd be willing to bet that if you learned (or hired someone with the knowledge of) how to optimize your code, you could get some astounding performance increases in your product.
I felt this article is more about how to use microservices the right way vs. butchering the idea. It is not right to characterize this as microservices vs. a monolithic service.
The initial version of their attempt went too far by spinning up a service for each destination. This is taking microservices to the extreme, which caused organizational and maintenance issues once the number of destinations increased. I am surprised they did not foresee this.
The final solution is also a microservice architecture, with a better separation of concerns/functionality: one service for managing the inbound queue of events and another service for interacting with all destinations.
> unless they have 100 developers for 100 services.
It's not just code coverage that matters. It's the code path selection that matters. If you have a ton of branches and you've evaluated each of them once, then yeah, you might well have 100% "coverage". But you have 0% path selection coverage, since a single invocation of your API might choose the true branch on one statement and the false branch on another, and a second invocation might choose false on the first and true on the second.
While the code was 100% tested, the scenarios were not. What happens if you have true/true or false/false? That's not tested.
There's a term for this but I forgot what it is and don't care to go spelunking to find it.
sqlite calls this "branch coverage"
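A small illustration of the distinction in this exchange, using a hypothetical `price` function with two independent `if` statements. Exercising every combination of branch outcomes is usually called path coverage; branch coverage, as commonly defined, only requires each branch to be taken both ways, which the two-call suite below already achieves while missing the one path that misbehaves.

```python
from itertools import product
from typing import List, Tuple

Path = Tuple[bool, bool]


def price(base: int, member: bool, coupon: bool) -> int:
    """Hypothetical pricing rule with two independent branches."""
    total = base
    if member:
        total -= 10
    if coupon:
        total -= 5
    return total


# Two calls take every branch both ways: 100% branch coverage.
branch_suite: List[Path] = [(True, False), (False, True)]

# Path coverage needs every combination of outcomes: 2 ** 2 = 4 paths.
all_paths: List[Path] = list(product([True, False], repeat=2))


def negative_prices(paths: List[Path], base: int = 12) -> List[Path]:
    """Return the paths on which the price goes negative (the latent bug)."""
    return [p for p in paths if price(base, *p) < 0]


print(negative_prices(branch_suite))  # []
print(negative_prices(all_paths))     # [(True, True)]
```

With n independent branches in sequence the number of paths grows as 2^n, which is why full path coverage quickly becomes impractical.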
Tom (real guy) was too busy all the time to do anything other than the 80/20 rule. He was too busy because he didn't share. So of course he was a fixture of the company...
Now all the developers are going to the CTO or CEO and undermining the other developers, trying to persuade the CTO that so-and-so's code is shit.
This is a business decision, reality has no influence here.
That's why they invented the term "Business reality".
They also just might have had too many repos.
Agreed. I treat services like an amoeba. Let your monolith grow until you see the obvious split points. The first one I typically see is authentication, but YMMV.
Notice I also do not say 'microservices'. I don't care about micro as much as functional grouping.
Is this rule mentioned or discussed somewhere? A quick Google search links to a bunch of dating suggestions about splitting the bill. Searching for "the basic rule of services: put together first, split later" reveals nothing useful.
Changing one "shared library" shouldn't mean deploying 140 services immediately.
They had one service to begin with forked for each destination. Of course that was a nightmare to maintain!
I've worked with microservices a lot. It's a never-ending nightmare. You push data consistency concerns out of the database and between service boundaries.
Fanning out one big service in parallel with a matching scalable DB is by far the most sane way to build things.
If Linux tried to be an entire computing system in one code base (sed, vim, grep, top, etc.), what do you think that would look like, code-base- and maintainability-wise? Sounds like a nightmare to me.
The system is defined by a graph of these nodes, since you can pipe messages wherever they're needed; all nodes can be many-to-many. Each node is a self-contained application which communicates with a master node via TCP/IP (as in most pub/sub systems, a master node is required to tell nodes where to send messages). So you can do cool stuff like have lots of separate networked computers all talking to each other (fairly) easily.
It works pretty well and once you've got a particular node stable - e.g. the node that acquires images - you don't need to touch it. If you need to refactor or bugfix, you only edit that code. If you need to test new things, you can just drop them into an existing system because there's separation between the code (e.g. you just tell the system what your new node will publish/subscribe and it'll do the rest).
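A toy, in-process sketch of the publish/subscribe wiring described above. The `Master` class is a hypothetical simplification: in ROS the master only brokers connections and nodes then exchange messages directly over TCP, whereas here delivery goes through one object so the example stays self-contained. What it does show is the decoupling: a new node is added by subscribing to a topic, without editing the nodes that publish to it.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Master:
    """Hypothetical stand-in for a pub/sub master: a registry of topic subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Publishers never learn who (if anyone) is listening.
        for callback in list(self._subscribers[topic]):
            callback(message)


master = Master()
log: List[str] = []

# "Detector" node: consumes images and publishes detections.
master.subscribe("image", lambda img: master.publish("detections", f"objects in {img}"))

# "Logger" node: listens to both topics (many-to-many wiring).
master.subscribe("image", lambda img: log.append(f"image: {img}"))
master.subscribe("detections", lambda det: log.append(f"detections: {det}"))

# "Camera" node: publishes frames without knowing about the other nodes.
master.publish("image", "frame-1")
```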
There is definitely a feeling of duct tape and glue, since you're often using nodes made by lots of different people, some of which are maintained and others aren't, with different naming conventions, etc. However, I think that's just because ROS is designed to be as generic as possible, rather than a side effect of it running like a microservice.
I get where you are coming from. But you will go async sooner or later if you need any reasonable error recovery or reliability.
The only question is how much pain you will suffer before you do so.
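A minimal sketch of the async approach, assuming a hypothetical `deliver` callable for some downstream destination: the producer only enqueues work, and a worker drains the queue later, re-queuing failed deliveries and parking jobs that exhaust their attempts in a dead-letter list instead of failing the original caller. A real system would use a durable queue and add backoff between attempts; both are left out here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Job:
    payload: str
    attempts: int = 0


def drain(
    queue: Deque[Job], deliver: Callable[[str], None], max_attempts: int = 3
) -> List[Job]:
    """Process queued jobs; return the ones that ended up dead-lettered."""
    dead_letters: List[Job] = []
    while queue:
        job = queue.popleft()
        job.attempts += 1
        try:
            deliver(job.payload)
        except Exception:
            if job.attempts < max_attempts:
                queue.append(job)  # retry later, after the other queued jobs
            else:
                dead_letters.append(job)  # give up; keep it for inspection
    return dead_letters


# Usage with a hypothetical flaky destination.
transient_failures = {"event-a": 2}
delivered: List[str] = []


def deliver(payload: str) -> None:
    if payload == "event-b":
        raise ConnectionError("destination down")
    if transient_failures.get(payload, 0) > 0:
        transient_failures[payload] -= 1
        raise ConnectionError("transient error")
    delivered.append(payload)


dead = drain(deque([Job("event-a"), Job("event-b")]), deliver)
```

Here `event-a` succeeds on its third attempt, while `event-b` ends up dead-lettered after three failures.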
gRPC has no queuing and the connection is held open until the call returns. All of Google's cloud databases are immediately consistent for most operations.
But strictly speaking, even inside a single multicore CPU, there is no such thing as immediate consistency. The universe doesn't allow you to update information in two places simultaneously. You can only propagate at the speed of light.
Oh, and the concept of "simultaneous" is suspect too.
Our hardware cousins have striven mightily and mostly successfully for decades to create the illusion that code runs in a Newtonian universe. But it is very much a relativistic one.
You mean like BSD?
Busybox is a slightly better example (even though it’s also a userspace program).
They're part of the same repo and built at the same time as the kernel. Run a big "make universe" here: https://github.com/freebsd/freebsd and see for yourself. That they are different binaries does not matter a lot; it's just a question of user interface. See for instance busybox, where all the userspace is in a single binary.
There is the "base" system which is the OS itself and common packages (all the ones you mentioned), then there is the "ports" repo which contains many open-source applications with patches to make them work with OpenBSD.
Here is a Github mirror of their repos: https://github.com/openbsd
I think OpenBSD has reaped many of the same advantages described by Segment with their monorepo approach, such as easily being able to add pledge to many ports relatively quickly.
Also I can pipe these tools together from the same terminal session, like
tail -f foo | grep something | awk ...
Probably one could come up with an abstraction to do Lego with Microservices but we're not there yet.
That's news to me, and seems insane.
Unless you mean "their own database tables", not "database servers". But that's just the same as having multiple directories and files in a Unix filesystem.
You would have to have an oddly disconnected schema if modifications to the program don't result in programs accessing parts of the database that other programs are already accessing. If this isn't a problem it means you're using your database as nature intended and letting it provide a language-neutral, shared repository with transactional and consistency guarantees.
So maybe not microservices, but fine nonetheless.
EDIT: two more comments:
- this is exactly what relational databases were designed for. If people can't do this with their micro-services, maybe their choice of database is the issue.
- "micro-service" as the original post suggests, is not synonymous with good. "monolith" is only synonymous with bad because it got run-over by the hype-train. If you have something that works well, be happy. Most people don't.
If you can afford all your components sharing the same database without creating a big dependency hell, then your problem is _too small_ for microservices.
If your problem is so large that you have to split it up to manage its complexity, start considering microservices (it might still not be the right option for you).
If it could be avoided, these systems were not touched anymore. Instead, other applications were attached to the front and sides.
So I would say: it applies to systems where proper modularization was neglected. In the anecdotal cases I referred to, one major element of this deficiency was a complex database, shared across the whole system.
This video is strangely relevant in this thread...
You mean GNU coreutils?
Sing that from the rooftops. That is exactly my observation as well. All the vanilla "track some resource"-style webapps I've worked on were never designed to cope with a consistency boundary that spans across service boundaries. Turning a monolith into distributed services is hard for that reason - you have to redesign your data access to ensure that consistency boundaries don't span across multiple services. If you don't do that, then you have to learn to cope with eventual consistency; in my experience, most people just don't think that way. I know I have trouble with it. Surely I'm not the only one.
I've never quite understood why people think that taking software modules and separating them by a slow, unreliable network connection with tedious hand-wired REST processing should somehow make an architecture better. I think it's one of those things that gives the illusion of productivity - "I did all this work, and now I have left-pad-as-a-service running! Look at the little green status light on the cool dashboard we spent the last couple months building!"
Programmers get excited about little happily running services, these are "real" to them. Customers couldn't care less, except that it now takes far longer to implement features that cross multiple services - which, if you've decomposed your services zealously enough, is pretty much all of them.
You can't have consistent microservices without distributed transactions. If a service gets called, and inside that call, it calls 3 others, you need to have a roll back mechanism that handles any of them failing in any order.
If you write to the first service and the second two fail, you need to write a second "undo" call to keep consistent.
Worse, this "undo state" needs to be kept transactionally consistent in case it's your service that dies after the first call.
In reality, nobody does this, so they're always one service crash away from the whole system corrupting the hell out of itself. Since the state is distributed, good luck making everything right again.
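To make the "undo call" idea concrete, here is a minimal sketch of compensating actions (often called the saga pattern), with hypothetical in-memory stand-ins for the remote services. Each step pairs an action with its undo; if a later step fails, the completed steps are compensated in reverse order. It also exhibits exactly the weakness described above: the list of completed steps lives only in this process's memory, so a crash mid-saga loses it, and a compensation can itself fail. A real implementation would persist that state durably.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]
Step = Tuple[Action, Action]  # (action, compensation)


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    compensations: List[Action] = []  # in-memory only: lost if this process dies
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(compensations):
                undo()  # if this raises, the system is left inconsistent
            return False
        compensations.append(compensate)
    return True


# Hypothetical "services", modelled as in-memory lists.
orders: List[str] = []
payments: List[str] = []


def ship() -> None:
    raise RuntimeError("shipping service unavailable")


ok = run_saga([
    (lambda: orders.append("order-1"), lambda: orders.remove("order-1")),
    (lambda: payments.append("charge-1"), lambda: payments.remove("charge-1")),
    (ship, lambda: None),
])
```

After the shipping step fails, `ok` is `False` and both the order and the charge have been undone.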
Microservices are insane. Nobody who knows database concepts well should go near them.
I'd venture a guess that most applications have all sorts of race conditions that could cause data corruption. The fact of the matter is that almost nobody notices or even cares.
Aside from that, I've found that even in non-mission-critical scenarios ("it's just porn!") it's incredibly convenient to have a limited number of states the system can be in. It makes debugging easier and reduces the number of edge cases ("why is this null??") you have to handle.
I think you'd be surprised/alarmed at how little transactions actually get used in the software world. Not just on small systems where it doesn't matter: I've seen a complete absence of them in big financial systems handling billions of dollars' worth of transactions (the real-world kind) a day. Some senior, highly paid people even defend this practice for performance reasons, because they don't realize the performance cost of implicit transactions. And this is just the in-process stuff, where transactions are totally feasible; it gets even worse when you look at how much is moved around via CSV files over FTP and Excel sheets attached to emails. I've spent the last 2 weeks being paid to fix data consistency issues that should never have been issues in the first place.
Maybe when we're teaching database theory we shouldn't start at select/join but at begin transaction/commit/rollback?
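In that spirit, a small sketch of why BEGIN/COMMIT/ROLLBACK matter, using Python's built-in `sqlite3` module and a hypothetical `accounts` table. The transfer credits first and debits second; when the debit violates the `CHECK (balance >= 0)` constraint, the rollback also undoes the credit, so no money appears out of nowhere.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT/ROLLBACK statements below control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: either both updates happen or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undoes the credit to dst as well
        return False
    conn.execute("COMMIT")
    return True


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Without the explicit transaction, a failed debit would leave bob credited with money alice never paid.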
However, I would argue that transactions are overkill. What's the worst-case scenario if I book a ride on Grab and my request gets corrupted? I'm guessing I'll see an error message and I'll have to re-request my ride.
If your business model is to be cheap with high volume sales, then corrupting say 1 out of 10,000 customer transactions may be worth it. If you give customers a good price and/or they have no viable alternative, you can live with such hiccups, and the shortcuts/sacrifices may even make the total system cheaper. You are like a veterinarian instead of a doctor: you can take shortcuts and bork up an occasional spleen without getting your pants sued off. But most domains are NOT like that.
No, you just need loosely-coupled services, where inconsistency in this circumstance doesn't manifest as a problematic end-state.
If two microservices have to share databases, they shouldn't be microservices.
One microservice should have write access to one database and preferably, all read requests run through that microservice for exactly the reason you mentioned.
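A minimal in-process sketch of that ownership rule, with hypothetical `ProfileService` and `BillingService` classes (method calls standing in for network calls). Only the profile service touches its store, so an invariant such as "display names are trimmed and non-empty" is enforced in exactly one place, and other services read through it rather than querying the table themselves.

```python
from typing import Dict, Optional


class ProfileService:
    """Sole owner of profile data; nothing else reads or writes self._store."""

    def __init__(self) -> None:
        self._store: Dict[int, str] = {}

    def set_display_name(self, user_id: int, name: str) -> None:
        cleaned = name.strip()
        if not cleaned:
            raise ValueError("display name must not be empty")
        self._store[user_id] = cleaned

    def get_display_name(self, user_id: int) -> Optional[str]:
        return self._store.get(user_id)


class BillingService:
    """A consumer that reads profile data only through the owning service."""

    def __init__(self, profiles: ProfileService) -> None:
        self._profiles = profiles

    def invoice_header(self, user_id: int) -> str:
        name = self._profiles.get_display_name(user_id) or "Customer"
        return f"Invoice for {name}"
```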
>I've never quite understood why people think that taking software modules and separating them by a slow, unreliable network connection with tedious hand-wired REST processing should somehow make an architecture better.
If you're running microservices between regions and communicating with each other outside of the network it is living in, you're probably doing it wrong.
Microservices shouldn't have to incur the cost of going from SF to China and back. If one lives in SF, all should and you can co-locate the entire ecosystem (+1 for "only huge companies with big requirements should do microservices")
>Customers couldn't care less, except that it now takes far longer to implement features that cross multiple services - which, if you've decomposed your services zealously enough, is pretty much all of them.
Again, that is an example of microservices gone wrong. You'll have the same amount of changes even in a monolith and I'd argue adding new features is safer in microservices (No worries of causing side effects, etc).
I will give you +1 on that anyway because I designed a "microservice" that ended up being 3 microservices because of dumb requirements. It probably could've been a monolith quite happily.
Problem domains (along with organizational structures) inherently create natural architectural boundaries... certain bits of data, computation, transactional logic, and programming skill just naturally "clump" together. Microservices ignore this natural order. The main driving architectural principle seems to be "I'm having trouble with my event-driven dynamically-typed metaprogrammed ball-of-mud, so we need more services!".
The "natural" order is very often bad for reliability, speed and efficiency. It forces the "critical path" of a request to jump through a number of different services.
In well built SOA you often find that the problem space is segmented by criticality and failure domains, not by logical function.
Which, strangely enough, ends up looking almost the same as running services on a microkernel.
Then before you know it there are a dozen more shared libraries and you have the distributed monolith.
Then you either have to stand up every microservice every time you run an integration test, or make changes and hope for the best.
That is not completely on the developer, either. MongoDB before 4.0, for example, does not support multi-document transactions. On the other hand, I've seen some pretty flagrant disregard for them just because there are no atomicity guarantees.
Microservices makes reasoning on that harder.
I'd argue that's on the developer, if he was the one to choose a database that doesn't support transactions, and then didn't implement application-level transactions (which is very hard to do correctly).
(Further below, I'll go into in which contexts I'd agree with your assessment and why. But for now the other side of the coin.)
In the real world, current-day, why do many enterprises and IT departments and SME shops go for µservice designs, even though they're not multimillion-user-scale? Not for Google/Netflix/Facebook scale, not (primarily/openly) for hipness, but they do like among other reasons:
- that µs auto-forces a certain level of discipline in areas that would be harder to enforce (and easier for devs to preempt) in other approaches: modularity is auto-enforced, separation of concerns, separation of interfaces and implementations, or what some call (applicably or not) "unix philosophy"
- they can evolve the building blocks of systems less disruptively (keep interfaces, change underlyings), swap out parts, rewrites, plug in new features to the system etc
- allows for bring-your-own-language/tech-stack (thx to containers + wire-interop) which for one brings insights over time as to which techs win for which areas, but also attracts & helps retain talent, and again allows evolving the system with ongoing developments rather than letting the monolith degrade into legacy because things out there change faster than it could be rewritten
I'd prefer your approach for intimately small teams, though. It should be much more productive. If you sit 3-5 equally talented, same-tech/stack and superbly-proficient-in-it devs in a garage/basement/lab for a few months, they'll probably achieve much more, and more productively, if they forgo all the modern µservices / dev-ops byzantine-rabbithole-labyrinths and churn out their packages / modules together in an intimate, tight, fast-paced, co-located, self-reinforcing collab flow. No contest!
That setup just doesn't exist often in the wild, where remote distributed web-dev teams or dispersed enterprise IT departments needing to "integrate" rule the roost.
(Update/edit: I'm mostly describing current beliefs and hopes "out there", not that they'll magically hold true even for the most inept of teams at-the-end-of-the-day! We all know: people easily can, and many will, 'screw up somewhat' or even fail in any architecture, any language, any methodology..)
Do they actually force a discipline? Do people actually find swapping languages easier with RPC/messaging than other ffi tooling? And do they really attract talent?!
You make some amazing claims that I have seen no evidence of, and I would love to see some.
Regardless of whether you are a monolith or a large zoo of services, it works when the team is rigorous about separation of concerns and carefully testing both the happy path and the failure modes.
Where I've seen monoliths fail, it was developers not being rigorous/conscientious/intentional enough at the module boundaries. With microservices... same thing.
The disadvantage is obviously that creating such a 'perfect architecture' is hard to do because of the different concerns of different parties within the company/organisation.
I think you get at two very good points. One is that realistically you will never have enough time to actually get it really right. The other is that once you take real-world tradeoffs into account, you'll have to make compromises that make things messier.
But I'd respond that most organizations I see leave a lot of room for improvement on the table before time/tradeoff limitations really become the limiting factor. I've seen architects unable to resolve arguments, engineers getting distracted by sexy technologies/methodologies (microservices), bad requirements gathering, business team originated feature thrashing, technical decisions with obvious anticipated problems...
I'm just relaying what I hear from real teams out there, not intending to sell the architecture. So these are the beliefs I find on the ground; how honest and how grounded in reality they are is harder to tell, and only becomes clear slowly, over time, for any one individual team.
A lot of this is indeed about hiring though, I feel, at least as regards the enterprise spheres. Whether you can as a hire really in-effect "bring your own language" or not remains to be seen, but by deciding on µs architecture for in-house you can certainly more credibly make that pitch to applicants, don't you think?
Remember, there are many teams that have suffered for years-to-decades from the shortcomings and pitfalls of (their effectively own interpretation of / approach to) "monoliths" and so they're naturally eagerly "all ears". Maybe they "did it wrong" with monoliths (or waterfall), and maybe they'll again "do it wrong" (as far as outsiders/gurus/pundits/coachsultants assess) with µs (or agile) today or tomorrow. The latter possibility/danger doesn't change the former certainties/realities =)
The only teams I had to spend time on were the ones which were on a common DB before we moved off of it.
1. Is auto-enforced modularity, separation of concerns, etc. actually better than enforcing these things through development practices like code review? Why are you paying people a six-figure salary if they can't write modular software?
2. Is the flexibility you gain from this loose coupling worth the additional costs and overhead you incur? And is it really more flexible than a modular system in the first place? And how does their flexibility differ? With an API boundary, breaking changes are often not an option. In a modular codebase they can easily be made in a single commit as requirements change.
3. Is bring-your-own-language actually a good idea for most businesses? Is there a net benefit for most people beyond attracting and retaining talent? What about the ability to move developers across teams and between different business functions? Having many different tech stacks is going to increase the cost of doing this.
I do see the appeal of some of these things, but IMO the pros outweigh the cons for a smaller number of businesses than you've mentioned. And the above is only a small sample of that. Most things are just more difficult with a distributed system. It's going to depend on the problem space of course, but most backend web software could easily be written in a single language in a single codebase, and beyond that modularization via libraries can solve a lot of the same problems as microservices. I'm very skeptical of the idea that microservices are somehow going to improve reliability or development speed unless you have a large team.
You find end-to-end tests of a microservice architecture easier to debug than a monolith? That's not my experience. How do you manage to put all the events side by side when they are spread across a dozen files?
I use Serilog for structured logging. Depending on the log destination, your logs are either stored in an RDBMS (I wouldn't recommend it) or written as JSON name/value pairs that can be sent directly to a JSON data store like Elasticsearch or MongoDB, where you can run ad hoc queries.
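The reason this beats reading a dozen files side by side is that every event carries the same request identifier. A minimal Python sketch of the idea, not Serilog's API: it assumes each service writes one JSON object per line and stamps it with a shared ID (the `correlation_id` field name is an assumption), so reconstructing one request becomes a filter plus a sort:

```python
import json
from datetime import datetime, timezone


def log_event(service: str, correlation_id: str, message: str, **fields) -> str:
    """Serialize one event as a single JSON line of name/value pairs."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "correlation_id": correlation_id,  # assumed to be propagated across calls
        "message": message,
        **fields,  # arbitrary structured properties, e.g. order_id=123
    }
    return json.dumps(record)


def trace(log_lines, correlation_id: str) -> list:
    """Gather every event for one request across services, oldest first."""
    events = (json.loads(line) for line in log_lines)
    matching = [e for e in events if e["correlation_id"] == correlation_id]
    # Parse timestamps rather than comparing strings; sorted() is stable, so
    # events with identical timestamps keep their input order.
    return sorted(matching, key=lambda e: datetime.fromisoformat(e["timestamp"]))
```

In a real setup the filter and sort would be a query against the log store rather than Python code, but the principle is the same.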
IOrderService -> OrderService in production.
IOrderService -> FakeOrderService when testing.
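The same binding can be sketched as a tiny hand-rolled registry. This is a hypothetical Python illustration, not any real DI container's API; `place_order` and `build_container` are made up for the example:

```python
from typing import Protocol


class IOrderService(Protocol):
    def place_order(self, item: str) -> str: ...


class OrderService:
    def place_order(self, item: str) -> str:
        # In production this would talk to the real database / payment code.
        return f"placed:{item}"


class FakeOrderService:
    def __init__(self):
        self.calls = []  # records calls so tests can assert on them

    def place_order(self, item: str) -> str:
        self.calls.append(item)
        return f"fake:{item}"


def build_container(testing: bool) -> dict:
    """Map the interface to a concrete implementation per environment."""
    return {IOrderService: FakeOrderService() if testing else OrderService()}
```

Callers only ever ask for `IOrderService`, so swapping the implementation is a one-line change at composition time.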
For a certain class of applications and organizational constraints, I also would prefer it. But it requires a much tighter alignment of implementation than microservices (e.g., you can't just release a new version of a component, you always have to release the whole application).
It works similarly in almost every language.
Why is that an issue with modern CI/CD tools? It’s easier to just press a button and have your application go to all of your servers based on a deployment group.
With a monolith in a statically typed language, refactoring becomes a whole lot easier. You can easily tell which classes are being used, do globally guaranteed safe renames, and when your refactoring breaks something, you know at compile time, or, with the right tooling, even before you compile.
It's not so much about the deployment process itself (I agree with you that this can be easily automated), but rather about the deployment granularity. In a large system, your features (provided by either components or by independent microservices) usually have very different SLAs. For example, credit card transactions need to work 24x7, but generating the monthly account statement for these credit cards is not time-critical. Now suppose one of the changes in a less critical component requires a database migration which will take a minute. With separate microservices and databases, you could just pause that microservice. With one application and one database, all teams need to be aware of the highest SLA requirements when doing their respective deployments, and design for it. It is certainly doable, but requires a higher level of alignment between the development teams.
I agree with your remark about refactoring. In addition, when doing a refactoring in a microservice, you always need a migration strategy, because you can't switch all your microservices to the refactored version at once.
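A common migration strategy is to make consumers tolerant of both the old and the new message shape until every producer has been switched over. A minimal sketch, assuming a hypothetical refactoring that moves a flat `customer_name` field into a nested `customer.name`:

```python
def read_customer_name(message: dict) -> str:
    """Accept both the pre-refactoring (v1) and post-refactoring (v2) shapes."""
    customer = message.get("customer")
    if isinstance(customer, dict) and "name" in customer:
        return customer["name"]  # v2: {"customer": {"name": ...}}
    if "customer_name" in message:
        return message["customer_name"]  # v1: {"customer_name": ...}
    raise ValueError("message has no customer name in any known format")
```

Once no producer emits v1 anymore, the v1 branch can be deleted; that extra cleanup step is exactly the coordination cost a single-commit refactoring in a monolith avoids.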
That’s easily accomplished with a Blue-Green deployment. As for the database, you’re usually going to have replication set up anyway. So your data is going to live in multiple databases anyway.
Once you are comfortable that your “blue” environment is good, you can slowly start moving traffic over. I know you can gradually move x% of traffic every y hours with AWS. I am assuming on-prem load balancers can do something similar.
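The "x% every y hours" idea boils down to a step schedule. A sketch with hypothetical parameters (`step_percent`, `interval_hours`); real load balancers and deployment tools expose their own settings for this:

```python
def traffic_schedule(step_percent: int, interval_hours: float) -> list:
    """Return (hours_elapsed, percent_on_new_env) pairs until it gets 100%."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    schedule = []
    percent, elapsed = 0, 0.0
    while percent < 100:
        percent = min(100, percent + step_percent)  # never overshoot 100%
        schedule.append((elapsed, percent))
        elapsed += interval_hours
    return schedule
```

For example, `traffic_schedule(25, 2)` shifts a quarter of the traffic immediately and reaches 100% after six hours. Note that this only addresses stateless routing; it says nothing about the shared database schema, which is the objection raised in the replies.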
If your database is a cluster, then it is still conceptually one database with one schema. You can't migrate one node of your cluster to a new schema version and then move your traffic to it.
If you have real replicas, then all writes still need to go to the same instance (cf. my example of credit card transactions). So I also don't understand what your migration strategy would look like.
blue-green is great for stateless stuff, but I fail to see how to apply it to a datastore.
But if you are testing an artifact, why isn’t the artifact testing part of your CI process? What you want to do is no more or less an anti-pattern than swapping out mock services to test a microservice.
I’m assuming the use of a service discovery tool to determine what gets run. Either way, you could screw it up by it being misconfigured.
>But if you are testing an artifact, why isn’t the artifact testing part of your CI process?
It is, and it shall be, part of the CI process. The commit gets a build number in a tag, the artifact gets the version and build number in its name and metadata, it is deployed to the CI environment, and tests are executed against that specific artifact, so every time you deploy to production you have proof that the exact binary being deployed has been verified in its production configuration.
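The "exact binary" guarantee can be checked mechanically by recording a content digest when CI verifies an artifact and comparing it again at deploy time. A sketch; the naming convention and the in-memory `registry` dict are assumptions standing in for whatever artifact store you actually use:

```python
import hashlib


def artifact_name(app: str, version: str, build: int) -> str:
    # Hypothetical convention, e.g. "billing-1.4.2+build.317.tar.gz"
    return f"{app}-{version}+build.{build}.tar.gz"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def record_tested(registry: dict, name: str, data: bytes) -> None:
    """Called by CI after the tests pass against this exact artifact."""
    registry[name] = sha256_hex(data)


def verify_before_deploy(registry: dict, name: str, data: bytes) -> None:
    """Refuse to deploy anything that is not byte-for-byte what CI tested."""
    expected = registry.get(name)
    if expected is None or expected != sha256_hex(data):
        raise RuntimeError(f"{name} does not match a CI-verified artifact")
```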
>I’m assuming the use of a service discovery tool to determine what gets run.
Service discovery is irrelevant to this problem. Substitution of mock can be done with or without it.
If you are testing a single microservice and don’t want to test the dependent microservice - if you are trying to do a unit test and not an integration test, you are going to run against mock services.
If you are testing a monolith you are going to create separate test assemblies/modules that call your subject under test with mock dependencies.
They are both going to be part of your CI process then and either way you aren’t going to publish the artifacts until the tests pass.
Your deployment pipeline either way would be some type of deployment pipeline with some combination of manual and automated approvals with the same artifacts.
The whole discussion about which is easier is moot.
Edit: I just realized why this conversation is going sideways. Your initial assumptions were incorrect.
you may want to test only A: with monolithic architecture you'll have to produce another build of the application, that contains mock of B (or you need something like OSGi for runtime module discovery).
That’s not how modern testing is done.
> Your initial assumptions were incorrect.
With nearly 20 years of engineering and management experience, I know very well how modern testing is done. :)
What is an app at the system boundaries if not a piece of code with dependencies?
If you have a microservice, FooService, that calls BarService, the "system boundary" you are trying to test is FooService using a fake BarService. I'm assuming that you're calling FooService via HTTP using a test runner like Newman and checking the test results.
In a monolithic application you have class FooModule that depends on BarModule, which implements IBarModule. In your production application you create FooModule:
var x = new FooModule(new BarModule());
var y = x.Baz(5);
In your unit tests, you create your FooModule:
var x = new FooModule(new FakeBarModule());
And run your tests with a runner like NUnit.
There is no functional difference.
Of course FooModule can be at whatever level of the stack you are trying to test - even the Controller.
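To make the "no functional difference" point concrete, here is a Python sketch in which the same `FooModule` is exercised once with an in-process fake and once through a fake BarService served over HTTP on localhost. The names follow the comment, but the behaviour (`baz` returning `double(n) + 1`) and the `/double` endpoint are invented for the example:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class FooModule:
    def __init__(self, bar):
        self.bar = bar  # injected dependency: anything with a double(n) method

    def baz(self, n: int) -> int:
        return self.bar.double(n) + 1


class FakeBarModule:
    """Monolith-style test double, called in process."""

    def double(self, n: int) -> int:
        return n * 2


class HttpBarClient:
    """Calls a (real or fake) BarService over HTTP."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        # Empty ProxyHandler: bypass any proxy configured in the environment.
        self.opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def double(self, n: int) -> int:
        with self.opener.open(f"{self.base_url}/double?n={n}") as resp:
            return json.loads(resp.read())["result"]


class FakeBarHandler(BaseHTTPRequestHandler):
    """Microservice-style test double: a stub BarService endpoint."""

    def do_GET(self):
        n = int(parse_qs(urlparse(self.path).query)["n"][0])
        body = json.dumps({"result": n * 2}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_fake_bar_service() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), FakeBarHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

`FooModule` never knows which double it received; only the wiring differs.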
I take your point, but it saddens me that there aren't better ways of achieving this modularity nowadays.
The legacy concerns I don’t see being true, as it’s mainly a requirements/documentation problem and you can achieve the same effect with feature toggles.
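In its simplest form, a feature toggle is a lookup consulted at the branch point, so the old and new implementations ship in the same build and can be swapped at runtime. A minimal sketch; the flag name, the tax functions and the in-memory flag store are hypothetical (real setups usually read flags from config or a flag service):

```python
class FeatureFlags:
    """In-memory flag store; unknown flags default to off."""

    def __init__(self, **flags: bool):
        self._flags = dict(flags)

    def enabled(self, name: str) -> bool:
        return self._flags.get(name, False)


def legacy_tax(amount_cents: int) -> int:
    return amount_cents * 20 // 100  # old behaviour: truncates


def new_tax(amount_cents: int) -> int:
    return (amount_cents * 20 + 50) // 100  # new behaviour: rounds half up


def tax(amount_cents: int, flags: FeatureFlags) -> int:
    # Both code paths ship together; the flag picks one at runtime.
    if flags.enabled("new_tax_engine"):
        return new_tax(amount_cents)
    return legacy_tax(amount_cents)
```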
The right tool for the job should be the goal, as opposed to chasing fashion. Microservices are definitely overused, but they do have many legitimate use cases. Your CRUD web application probably doesn’t need microservices, but complex build systems might.
Imagine the complexity if Amazon.com or Netflix were a monolith. But something like Basecamp is probably better as a monolith.
There is also the issue of scale. An ML processor might need more (and different) hardware than a user with system.
Right tool (and architecture) for the job.
That was called Obidos, and its shortcomings were why Jeff pushed through the services mandate at Amazon.
You really do have to modularize. In some languages, you can even use separate compilation units for separate modules to enforce the separation.
You can do all of that but get simultaneous deployment, which cuts out whole classes of integration nightmares.
The real friction in the system is always in the boundaries between systems. With microservices it's all boundaries. Instead of a Ball of Mud you have Trees and No Forest. Refactoring is a bloody nightmare. Perf Analysis is a game of finger pointing that you can't defuse.
Services are not micro services. Most large scale applications can and should be split into multiple services. However, when approaching a new problem you should work within the monolith resisting the service until you absolutely can't any longer. Ideally this will make your services true services, that could capture an entire business unit. When it's all said and done you should be able to sell off the service as a business.
The other use case, which should be obvious, is compliance. If you are thinking about implementing anything that would require PCI or SOX, you should do that in a service to shield the rest of the dev org from the complexities. So, any webapp that takes payment and interacts directly with a payment processor.
That said, you're correct in that you should not be rolling out a new service to avoid sharding.
This is apples and oranges. The production profiles of operating system kernels and web applications are so dissimilar that the analogy is not useful. It may be true that most web applications don't need to be split into multiple services, but the Linux kernel provides no evidence either way.
No architecture principle will help you if you design things the wrong way.
The point of the microservices architecture is to design large working systems from smaller working systems.
And even kernels have kernel threads, which are basically local microservices. Anything which needs to scale beyond a single system is more deserving of microservices than a kernel.
Gotta put that CS Masters to work somehow. Can't just sit here doing plumbing all day every day.
Maybe there could be a software development corollary to the Politician's Syllogism, or even just a webdev one.
Similarly it’s made with make. If anyone has a project more complex than the Linux kernel or GCC I’ll gladly listen to why they need some exotic build system... never met anyone yet...
This exactly. Me too. Data consistency concerns in sufficiently large real world projects can be practically dealt with only 2 ways IMO: transactions or spec changes.
In terms of the common code divergence, why not just use a private npm registry and enforce latest? Have a hard-and-fast rule that all services must always use the latest version of common.
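A rule like that is easy to enforce in CI with a check over every service's `package.json`. A sketch, assuming a hypothetical private package named `@acme/common`, exact version pins (no `^` ranges), and that the latest published version has already been fetched from the registry in an earlier step:

```python
import json

COMMON = "@acme/common"  # hypothetical private package name


def outdated_services(manifests: dict, latest: str) -> list:
    """Return services that depend on COMMON but do not pin it to `latest`.

    `manifests` maps service name -> raw package.json text.
    """
    offenders = []
    for service, text in sorted(manifests.items()):
        deps = json.loads(text).get("dependencies", {})
        if COMMON in deps and deps[COMMON] != latest:
            offenders.append(service)
    return offenders
```

Failing the build whenever this list is non-empty turns the convention into an enforced rule.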
Agreed but I also prefer to keep any changing storage data as a separate concern (s3 or similar).
So the trinity of services would be DB, storage, application.
Microservices, when constructed from a well-designed model, provides a level of agility I’ve never seen in 33 years of software development. It also walls off change control between domains.
My take from the Segment article is that they never modeled their business and just put services together using their best judgment on the fly.
That’s the core reason for doing domain driven design. When you have a highly complex system, you should be focused on properly modeling your business. Then test this against UX, reporting, and throughput and build after you’ve identified the proper model.
As for databases, there are complexities. Some microservices can be backed by a key-value store at a significantly lower cost, but some high-throughput services require a 12-cylinder relational database engine. The data store should match the needs of the service.
One complexity of microservices I’ve seen is when real-time reporting is a requirement. This is the one thing that would make me balk at how I construct a service oriented architecture.
See Eric Evans's book and Vaughn Vernon’s follow-up.
From this reality it's good to design everything as if it's a (micro|macro)service part of a larger landscape of apps.
Reality is also that you can never have transactions for everything across all your systems, so transaction alternatives like compensations are always something to deal with.
Transactions are a nice and convenient shortcut when they're applicable but they're far from mandatory.
In general the separation of the domain should be such that you don't need a transaction across services.
Although personally, I've never felt the need to try and apply it specifically, but the idea is interesting.
When she's discussing Compensations she mentions that the Transaction (T_i) can't have an input dependency on T_i-1. What are some things I should be thinking about when I have hard, ordered dependencies between microservice tasks? For example, microservice 2 (M2) requires output from M1, so the final ordering would be something like: M1 -> M2 -> M1.
Currently, I'm using a high-level, coordinating service to accomplish these long-running async tasks, with each M just sending messages to the top-level coordinator. I'd like to switch to a better pattern though, as I scale out services.
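For the ordered M1 -> M2 -> M1 case, an orchestrated saga can keep each step's output in a shared context, so later steps may depend on earlier ones, and run the compensations of the completed steps in reverse order on failure. A minimal in-process sketch; the step names are hypothetical, real steps would be messages to services, and it ignores failures inside compensations:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[dict], dict]  # returns outputs merged into the context
    compensate: Callable[[dict], None]  # undoes the action using the context


def run_saga(steps, context=None) -> dict:
    context = dict(context or {})
    done = []
    for step in steps:
        try:
            context.update(step.action(context))  # later steps see earlier outputs
        except Exception:
            # Undo only the steps that completed, newest first, then re-raise.
            for finished in reversed(done):
                finished.compensate(context)
            raise
        done.append(step)
    return context
```

The coordinator you describe is essentially this loop; the alternative is choreography, where each service reacts to the previous service's events instead of a central orchestrator.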
The only nit I have on that video is that after a great motivation and summary, their example application at the end (processing game statistics in Halo) didn’t seem to need Sagas at all. Their transactions (both at the game level and at the player level) were fully idempotent and could be implemented in a vanilla queue/log processor without defining any compensating transactions, unless there were additional complexities not mentioned in the talk.
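Idempotent processing of that kind can be as simple as deduplicating on a per-event ID before applying the effect, so replays and redeliveries are harmless and no compensation is needed. A sketch; the event shape (`id`, `player`, `kills`) is invented for illustration, and a real processor would persist the seen-ID set atomically with the totals:

```python
class StatsProcessor:
    def __init__(self):
        self.seen_ids = set()
        self.kills = {}

    def handle(self, event: dict) -> bool:
        """Apply an event at most once; return False for a duplicate."""
        if event["id"] in self.seen_ids:
            return False
        player = event["player"]
        self.kills[player] = self.kills.get(player, 0) + event["kills"]
        self.seen_ids.add(event["id"])
        return True
```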
Plus of course options around Paxos etc:
I think the key is doing something that works, for you, and care a little less about what other people are doing.
Most paradigms have their advantages and disadvantages and it’s really just about working around that to best utilize the stuff you have.
It's certainly a truism that technology can be cyclical, but that's not relevant in this case.
The OP's statement and article "Goodbye Microservices" is anecdotal and incorrect.
I have been on teams developing microservice architectures for about six years and this particular paradigm shift has proven to be a dramatic leap forward in efficiency, especially between the business, technical architecture, and change control management.
When you develop a domain model, the business can ask questions about the model. The architects can answer them by modifying the model. The developers can improve services by adopting model changes in code. This is the fundamental benefit of domain driven design and works fluidly with a microservice architecture.
There's still a pervasive belief in technology circles that software should be developed from a purely technical perspective. This is like saying the plumber, electrician, and drywaller should design a house while they're building it. They certainly have the expertise to build a house, and they may actually succeed for a time, but eventually the homeowner will want to change something and the ad hoc design of the house just won't allow for it. This is why we have architects. They plan for change within the structure of a house. They enable modification and addition.
Software development is no different. The Segment developers have good intentions, but they needed to work with the business to properly model everything, then build it. Granted, it sounds like they're a fast moving and successful business, so there are trade-offs. But once the business "settles", they really should go back to the drawing board, model the business, then build Segment 3.0.
Presumably someone with 33 years of software development experience is around 50 years old. How old do you think someone needs to be before they are qualified to comment on trends in software development practices? 70 years old?
I also would totally agree that companies should never just adopt "best practices", because that leads to super complex enterprise systems which are not necessary for most companies.
SOA and microservices are the same thing; microservices is just a new name coined when the principles were repopularized so that it didn't sound like a crusty old thing.
* Old technology is deemed by people too troublesome or restrictive.
* They come up with a new technology that has great long-term disadvantages, but is either easy to get started with short-term, or plays to people's ego about long-term prospects.
* Everyone adopts this new technology and raves about how great it is now that they have just adopted it.
* Some people warn that the technology is not supposed to be mainstream, but only for very specific use cases. They are labeled backwards dinosaurs, and they don't help their case by mentioning how they already tried that technology in the 60s and abandoned it.
* Five years pass, people realize that the new technology wasn't actually great, as it either led to huge problems down the line that nobody could have foreseen (except the people who were yelling about them), or it ended up not being necessary as the company failed to become one of the ten largest in the world.
* The people who used the technology start writing articles about how it's actually not that great in the long term, and the hype abates.
* Some proponents of the technology post about how "they used it wrong", which is everyone's entire damn point.
* Everyone slowly goes back to the old technology, forgetting the new technology.
* Now that everyone forgot why the new technology was bad, we're free to begin the cycle again.
We can very easily break the cycle by training a deep learning TensorFlow brain in the cloud, that will be fed the daily mouse gestures and key presses of all developers in the world. It's an awesome new technology that can solve any problem.
Pretty soon the global brain will start to see patterns emerging, for example when developers post hype phrases on forums with unsubstantiated claims about the potential of some awesome new technology. As soon as a hype event is detected, a strong electric shock is commanded via the device the developer is using, thereby stopping the hype flow and paralyzing the devellllllllllllllllllllllllllll
> In the beginning the Universe was created.
> This has made a lot of people very angry
> and been widely regarded as a bad move.”
The new way to do it is not AI, but NI -- natural intelligence.
You take human babies, and you send them to schools and colleges where they learn programming.
Then you make them use daily mouse gestures and key presses to solve problems.
It's hard to explain, but it is the new leap forward.
The OP at the time suggested serverless should instead be re-branded Function as a Service, which is a considerable improvement.
"Maybe he was taking dictation?!"
It's amazing how many situations that are stupid to the point of craziness seem perfectly natural nowadays. Thank computers!
People really drink the Kool-Aid that is written on these sites and it is extremely detrimental to their companies. PostgreSQL with a nice boring Java/.NET layer would blow this stuff out of the water performance-wise (for their actual real-life use case), and would be far easier to manage, deploy, find people for, etc. I mean, using these stacks is good for my wallet as an advisor, but I have no clue why people do it when they are not even close to 1/100000th of Facebook.
When the legacy systems started to hurt us (because they were written by the founder in a couple of weeks in the most hacky way), we decided against microservices and went to improve the actual code into something more performant and more maintainable, also moving from PHP5 to PHP7.
As much as we all wanted to go microservices and follow the buzz, we were rational enough to see that it didn't make any sense in our case.
Get ready for the surprise twist: It wasn't going well. I was hired as an expert JS consultant to advise them on which JS framework to use.
Experience helps a lot. When you know that complexity and too much diversity breeds tech debt, you learn to say "No" decisively.
No one above my pay grade seems to see a problem with this. But hey! REST! JSON! HTTPS! Pass the Kool-Aid!
NAPTR records: given a phone number, return name information; RFC 3401 to RFC 3404
I witnessed someone that wanted to leverage their service into a promotion so they started pushing for an architecture where everything flowed through their service.
It was the slowest part of our stack and capped at 10 TPS.
Technology, in my experience, never seems to reward too much optimism or too much cynicism.
Something that does surprise me is that Panic's Coda and Transmit apps still seem quite successful, so maybe my perception is out of whack.
$_max = 1 - (H_c / H_h)
Does that make it the fault of the technology/pattern? I don't think so. I think it just means that there's no magic bullets in tech and people who don't know what they're doing will always cause problems no matter what models they follow.
The problem is that all the best knowledge has clearly not made it out. For example, this design introduces a "Centrifuge" process that redirects requests to destination-specific queues... congratulations, you've just reinvented a message bus, a technology that goes back to the 80s. There is absolutely nothing new about virtual queues as described here, but unfortunately the authors are likely not at all aware of the capabilities of real enterprise messaging systems (even free, open-source ones like Apache Artemis) and certainly not aware of the architecture, technologies and algorithms that underlie them and the (admittedly much more expensive) best-of-breed commercial systems.
(I won't even go into the craziness of 50+ repos. That's just pure cargo cult madness.)
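The core of "redirect to destination-specific queues" fits in a few lines, which is rather the point. A toy in-memory sketch of per-destination virtual queues (destination names and the `deliver` callback are hypothetical; a real broker adds persistence, acknowledgements, redelivery and backoff). A failing destination only backs up its own queue:

```python
from collections import defaultdict, deque


class Router:
    def __init__(self):
        self.queues = defaultdict(deque)  # one virtual queue per destination

    def publish(self, destination: str, payload) -> None:
        self.queues[destination].append(payload)

    def drain(self, destination: str, deliver) -> int:
        """Deliver queued messages in order; stop at the first failure."""
        q = self.queues[destination]
        sent = 0
        while q:
            try:
                deliver(q[0])
            except Exception:
                break  # leave the message at the head for a later retry
            q.popleft()
            sent += 1
        return sent
```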
"deemed" is the keyword here. Old tech is "deemed" bad, new one is "deemed" good. Without any numbers attached, just by way of hand-waving and propaganda. And it's all "deemed" Computer Science :)
I personally believe nested blocks produce a visual structure (indenting) that helps one understand the code by its "shape". Gotos have no known visual equivalent.
Computers don't "care" how you organize software, they just blindly follow commands. Thus, you are writing it for people more than machines, and people differ too much in how they perceive and process code. For the most part, software is NOT about machines.
Dijkstra spent many, many pages on that. And it's still not a clear cut "this one is always better" case, as there are some obvious exceptions.
Do note that some actual coders have claimed that if you establish goto conventions within a shop, people handle them just fine.
I wrote a lot of code in the Fortran-inspired MS BASIC as a child. I know quite well how bad they can become.
Yes, for example I love dense code. A 1 line regex is a lot simpler to me than the 50-60 lines of code equivalent.
Yet some programmers find the 1 line regex significantly more difficult even if they know regex.
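As a small illustration of that density trade-off (the check itself is invented for the example), here is the same `YYYY-MM-DD` shape check written both ways. It only checks the shape, not whether the date exists on the calendar. `[0-9]` is used instead of `\d` because in Python `\d` also matches non-ASCII digits, which would make the two versions disagree:

```python
import re

DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def looks_like_date_regex(s: str) -> bool:
    return DATE_RE.fullmatch(s) is not None


def looks_like_date_manual(s: str) -> bool:
    # Same rule spelled out: fixed length, dashes at 4 and 7, digits elsewhere.
    if len(s) != 10 or s[4] != "-" or s[7] != "-":
        return False
    digits = s[:4] + s[5:7] + s[8:]
    return all(c in "0123456789" for c in digits)
```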
"I'd like to welcome you to this course on Computer Science. Actually that's a terrible way to start. Computer science is a terrible name for this business. First of all, it's not a science. It might be engineering or it might be art. We'll actually see that computer so-called science actually has a lot in common with magic. We will see that in this course. So it's not a science. It's also not really very much about computers. And it's not about computers in the same sense that physics is not really about particle accelerators. And biology is not really about microscopes and petri dishes. And it's not about computers in the same sense that geometry is not really about using a surveying instruments."
In Portuguese it is never called Computer Science as such, rather Informatics, Computation or Informatics Engineering if I do a literal translation.
And the ones that have Engineering in their name may only be called that if the country's engineering association recognises them as such.
My degree is simply "MEng Computing". It's recognized by the engineering associations, although that's so irrelevant in most IT that I had to look them up: the BCS (The Chartered Institute for IT) and the IET (Institution of Engineering and Technology).
My job title includes the word "informatician".
With how far mankind has come, it's a little silly to think that "natural" systems will remain the only things that science is concerned with.
...uh, what computer science are you talking about? Formal verification is a huge part of CS, and provability is a tiny part of what makes science science - systematic study through observation and experimentation. Science is a discipline, not in itself a fact to be proved.
Also, what parts of CS do you think are inapplicable to general purpose computing?
I think the same could be said of the design world. There was a time not too long ago when designs actually felt polished and had real shapes, shadows, gradients. When you clicked on a button you actually knew you were clicking on a button. Then iOS 7 came along and everything became white and flat and buttons were replaced with text with no borders. I think we are slowly moving back to where we were ten years ago.
PS How long until people start ditching React for jQuery?
The problem you've identified is that most code, in general, is terrible. The code written by people who chase trends tends to be worse than average.
This. Instead of learning a handful of technologies well, they learn a lot of technologies very poorly. If I am hiring, I now look at it as a red flag when people have too many frameworks listed.
Yeah, but JS buys you a lot. There are certain things that you can accomplish with JS that you absolutely cannot accomplish without it.
OTOH, anything that you can accomplish with React, you can accomplish without React. I'm with the GP on that one.
If something works for you and makes life easier then you should use it. There is no right answer. You just need to be honest with yourself when planning things out - am I using this technology because it's new and shiny or because it is the right tool for the job right now.
I am well aware. It was mostly a joke :]
You say that, but from time to time I still discover slight variations in browser behavior or bugs that were opened 8 years ago that would've been avoided if I had just used jQuery. Most modern frameworks will abstract away these differences, but sometimes you'll need to access the DOM directly.
SQL to me is a huge design flaw despite its ubiquity. On the web, bottlenecks happen at IO and algorithmic searches. Databases are essentially the bottlenecks of the web, and how do we handle such bottlenecks? SQL: a high-level, almost functional language that is further away from the metal than a traditional imperative language. A select search over an index is an abstraction that is too high-level to be placed over a bottleneck. What algorithm does a select search execute? Why does using select * slow down a query? Different permutations of identical queries causes slow downs or speed ups for no apparent reason in SQL. SQL is a leaky abstraction that has created a whole generation of SQL admins, or people who memorize a bunch of SQL hacks rather than understand algorithms.
> Why does using select * slow down a query?
Because the database first has to perform a translation step, doing an initial read from its system tables in order to enumerate the columns to be returned as a result of the final query.
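Besides the translation step described above, another commonly cited cost of `SELECT *` is that it can stop the database from answering a query from an index alone (a "covering" index), forcing an extra table lookup for every matching row. A minimal sketch using SQLite through Python's built-in `sqlite3` module; the `orders` table and `idx_orders_customer` index are hypothetical, and the exact plan wording differs between SQLite versions:

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query plan for `sql` as one string."""
    # The last column of each EXPLAIN QUERY PLAN row is the readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, notes TEXT)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Only columns stored in the index (id is the rowid): no table lookup needed.
narrow = plan(conn, "SELECT id, customer_id FROM orders WHERE customer_id = 42")
# SELECT * also needs `notes`, which lives only in the table itself.
star = plan(conn, "SELECT * FROM orders WHERE customer_id = 42")

print(narrow)
print(star)
```

With only `id` and `customer_id` requested, SQLite reports a covering index; asking for every column adds the lookup for `notes`.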
> Different permutations of identical queries cause slowdowns or speedups for no apparent reason in SQL.
The key word there is "apparent", and again, just because it's not apparent to you, doesn't mean that it's not knowable and apparent to someone else. I also take exception to the concept of "permutations" of "identical" queries. Because if your query is permuted, it's no longer identical. The way you write your SQL has an impact on how it's evaluated. Just because you don't understand the rules, doesn't make it a mystery.
As a side note, I'd highly recommend reading up on the Relational Algebra that underpins SQL and other relational databases.
A good abstraction only requires you to know the abstraction, not what lies underneath. What we have with SQL is a leaky abstraction. My argument is that a high-level leaky abstraction placed over a critical bottleneck in the web is a design mistake.
Back at my first big tech company, I remember reading the best document I have ever read related to software engineering. It was entirely devoted to choosing your database/storage system. The very first paragraph of the document was entirely devoted to engraining in your head that "choosing a database is all about tradeoffs". They even had a picture where it just repeated that sentence over and over to really engrain it in you.
Why? Because every database has different performance characteristics such as consistency, latency, scalability, typing, indexing, data duplication and more. You really need to think about each and every one, because choosing the wrong database, or not using it correctly, usually causes the biggest and most work-intensive problems you will ever have to face.
You aren't responding to my argument, everything you said is something I already know. So lol to you. You're making a remark and extending the conversation without addressing my main point. I'm saying that the fact that you need "a general understanding of how a database works on the inside" is a design flaw. It's a leaky abstraction.
A C++ for loop has virtually the same performance across all systems/implementations; if I learn C++ I generally don't need to understand implementation details to know about performance metrics. Complexity theory applies here.
For "SELECT * FROM TABLE", I have to understand implementation. This is a highly different design decision from C++. My argument is that this high level language is a bad design choice to be placed over the most critical bottleneck of the web: the database.
The entire reason why we can use slow-ass languages like PHP or Python on the web is because the database is 10x slower. The database is the bottleneck. It would be smart to have a language API for the database that is highly optimizable. The problem with SQL is that it is a high-level leaky abstraction, so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm. It involves memorizing SQL hacks and gotchas and understanding implementation details. This is why SQL is a bad design choice. Please address this reasoning directly rather than regurgitating common database knowledge.
> For "SELECT * FROM TABLE", I have to understand implementation
This is not even remotely an apples-to-apples comparison. One is a fairly simple code construct that executes locally. The other is a call to a remote service.
It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL, any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented. If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use the Index Luke.
Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
>It doesn't matter if the language you use to write it is XML, JSON, protocol buffers or SQL, any and all calls across an RPC boundary are going to have unknown performance characteristics if you don't understand how the remote service is implemented.
Dude, then put your database on a local machine and execute it locally, or do an HTTP RPC call to your server and have the web app run a for loop. Whether it is a remote call or not, the code gets executed on a computer regardless. This is not a factor. RPC is a bottleneck, but that's a different type of bottleneck that's handled on a different layer. I'm talking about the slowest part of code executing on a computer, not passing an electronic message across the country.
So whether you use XML, JSON, or SQL it matters because that is the topic of my conversation. Not RPC boundaries.
>If you are the implementer, and you still choose not to understand how it works, that's your choice, not the tool's. Every serious RDBMS comes with a TFM that you can R at any time. And there are quite a few well-known and product-agnostic resources out there, too, such as Use the Index Luke.
Do you try to understand how C++ compiles down into assembly? For virtually every other language out there, I almost never have to understand the implementation to write an efficient algorithm. SQL DBs are the only technologies that force me to do this on a regular basis. Heck, they even devoted a keyword, 'EXPLAIN', to let you peer under the hood. Good APIs and good abstractions hide implementation details from you. SQL does not fit this definition of a good API.
If that doesn't stand out like a red flag to you, then I don't know what will.
>Alternatively, feel free to write your own alternative in C++ so that you can understand how it works in detail without having to read any manuals. It was quite a vogue for software vendors to sink a few person-years into such ventures back in the 90s. Some of them were used to build pretty neat products, too. Granted, they've all long since either migrated to a commodity DBMS or disappeared from the market, so perhaps we are due for a new generation to re-learn that lesson the hard way all over again.
In the 90s? Have you heard of NoSQL? This was done after the 90s and is still being done right now. There are alternative implementations of database APIs that DON'T INVOLVE SQL. The problem isn't about re-learning, the problem is about learning itself. Learn a new paradigm rather than responding to every alternative opinion with a sarcastic suggestion: "Hey, you don't like airplanes? Well, build your own airplane then..."
And I just said that "choosing a database is all about tradeoffs" which you need to understand (aka: the leaky abstractions).
> A C++ for loop has virtually the same performance across all systems/implementations
> For "SELECT * FROM TABLE", I have to understand implementation.
No, you don't; it has the same performance: a for loop. However, because all of your data is grouped onto one server, for loops there are much more costly than on your regular servers, which likely outnumber your database servers by orders of magnitude. Fortunately, your SQL database supports indexes, which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries use them has solved pretty much every scaling problem I have thrown at them.
> It would be smart to have a language API for the database that is highly optimizable. The problem with SQL is that it is a high-level leaky abstraction, so optimizing SQL doesn't involve using complexity theory to write a tighter algorithm. It involves memorizing SQL hacks and gotchas and understanding implementation details.
It is optimizable and 90% of those optimizations I have made simply involve adding an index and then running a few explains/tests to make sure you are using them properly.
If you'll only answer me this, though: what database would you recommend, then? I'm dying to know, since you think you know better, and Google, a company that probably has more scaling problems than anyone else, doubled down on SQL with Spanner, which from what I have read requires even more actual fine-tuning.
And I'm saying the tradeoff of using a leaky abstraction is entirely the wrong choice. A hammer and a screwdriver each have tradeoffs, but when dealing with a nail, use a hammer; when dealing with a screw, use a screwdriver. SQL is a hammer to a database screw.
>No you don't, it has the same performance: a for loop. However, by grouping all of your data onto 1 server, for loops are much more costly than the likely orders of magnitude more regular servers you have than a database.
See, you don't even know what algorithm most SQL implementations use when doing a SELECT call. It really depends on the index, but usually it uses an algorithm similar to binary search over an index that is basically a balanced search tree (typically a B-tree). It's possible to index by a hash map as well, but you don't know any of this because SQL is such a high-level language. All you know is that you add an index and everything magically scales.
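To make the scan-versus-index distinction concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how any particular database is implemented: real engines typically use paged B-trees with many keys per node rather than one flat sorted list, but the lookup idea (binary search over sorted keys instead of checking every row) is the same. The `rows` data is hypothetical.

```python
import bisect

# Hypothetical table: each row's position in the list acts as its row id.
rows = [{"id": i * 3, "name": f"user{i}"} for i in range(1000)]


def full_scan(rows, key):
    """No index: compare against every row, O(n)."""
    return [r for r in rows if r["id"] == key]


# Toy sorted index: (key, row position) pairs ordered by key.
index = sorted((r["id"], pos) for pos, r in enumerate(rows))
index_keys = [k for k, _ in index]


def index_lookup(rows, key):
    """Binary search to the first matching key, O(log n), then collect duplicates."""
    i = bisect.bisect_left(index_keys, key)
    out = []
    while i < len(index) and index[i][0] == key:
        out.append(rows[index[i][1]])
        i += 1
    return out


# Toy hash index: average O(1) equality lookups, but no ordered range scans.
hash_index = {}
for pos, r in enumerate(rows):
    hash_index.setdefault(r["id"], []).append(pos)


def hash_lookup(rows, key):
    return [rows[pos] for pos in hash_index.get(key, [])]


def index_range(rows, low, high):
    """Only the sorted index can serve `low <= id < high` without scanning everything."""
    lo = bisect.bisect_left(index_keys, low)
    hi = bisect.bisect_left(index_keys, high)
    return [rows[pos] for _, pos in index[lo:hi]]
```

All three equality lookups return the same rows; they differ only in how much work they do, and the sorted index is the one that also handles ranges.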
>Fortunately, your SQL database supports indexes which speed up those queries. Granted, I'm no database expert, but adding the right indexes and making sure your queries utilize them have solved pretty much every scaling problem I have thrown at them.
Ever dealt with big-data analytics? A typical SQL DB can't handle million-row, multi-dimensional GROUP BYs. Not even your indexes can save you here.
>It is optimizable and 90% of those optimizations I have made simply involve adding an index and then running a few explains/tests to make sure you are using them properly.
I don't have to run an EXPLAIN in any other language that I have ever used. Literally. There is no other language on the face of this planet where I have had to regularly go down into the lower-level abstraction to optimize it. When I do, it's for a rare edge case. For SQL it's a regular thing, and given that SQL sits at the bottleneck of all web development, this is not just a minor flaw but a huge one.
>If you'll only answer me this, though: what database would you recommend, then? I'm dying to know, since you think you know better, and Google, a company that probably has more scaling problems than anyone else, doubled down on SQL with Spanner, which from what I have read requires even more actual fine-tuning.
Part of the reason SQL has stood the test of time is the very fact that it allows such a high level of abstraction. The big problem that it solved, compared to much of what existed at the time, was that it allowed you to decouple the physical format of the data from the applications that used it. That made it relatively easy to do two things that were previously very hard: Ask a database to answer questions it wasn't originally designed to answer, and modify a database's physical structure without having to change the code of every application that uses it.
A lot of "easier" technologies - including, arguably, ORM on top of relational databases - make things easier by sacrificing or compromising those very features that allow for such flexibility. Which speaks to the grandparent's point about technologies that make it easy to get started in the short term, at the cost of having major disadvantages in the long term.
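A small illustration of decoupling the physical format from the applications that use it, again with SQLite via Python's `sqlite3` module (schema and data are hypothetical): the table behind an application's query is split in two, a view recreates the old logical shape under the old name, and the application's SQL keeps working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original physical layout (hypothetical schema).
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")

# The query an application was written against.
APP_QUERY = "SELECT name, city FROM customers WHERE id = ?"
before = conn.execute(APP_QUERY, (1,)).fetchall()

# Later, the physical structure changes: addresses move to their own table.
conn.executescript("""
    CREATE TABLE customer_core (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_address (customer_id INTEGER REFERENCES customer_core(id), city TEXT);
    INSERT INTO customer_core SELECT id, name FROM customers;
    INSERT INTO customer_address SELECT id, city FROM customers;
    DROP TABLE customers;
    -- A view under the old name keeps the old logical shape available.
    CREATE VIEW customers AS
        SELECT c.id, c.name, a.city
        FROM customer_core c JOIN customer_address a ON a.customer_id = c.id;
""")

# The application's SQL is untouched, and it still gets the same answer.
after = conn.execute(APP_QUERY, (1,)).fetchall()
print(before, after)
```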
EXPLAIN does a very good job of explaining why one query is faster than another.
> SQL is a leaky abstraction that has created a whole generation of SQL admins or people who memorize a bunch of SQL hacks rather than understand algorithms.
This just smacks of sound bite material.
The high-level was the point because, in the original idea, there was a separation of concerns assumed: The dev writes, in SQL, what the DB should do and the DBA decides how the DB does it.
Of course that assumes there is a competent DBA...
Putting a high-level leaky abstraction over the bottleneck of the web is a mistake. A language that is a zero-cost abstraction is a better design choice.
Many of the problems you mention above occur because the database handles stuff for programmers. Sure, you could create a custom solution around your biggest bottlenecks, but do you want to create a custom solution for every query, or do you want the database to do it for you? The generation of SQL admins is a replacement for a much larger group of programmers that would be needed if they weren't here, and, more importantly, for an army of people to deal with security, reliability, etc., which people using a good RDBMS get to take for granted.
Which would be clearer to you?
SELECT * FROM TABLE WHERE id = 56
binary_search(column_name=id, value=56, show_all_columns=True)
x = binary_search(column_name=id, value=56, show_all_columns=True)
y = dictionary_search(column_name=id, value=56, show_all_columns=True)
z = join(x, y, joinFunc=func(a,b)(a==b))
The example above could express a join, with the search function's name itself specifying the index. If no such index is placed over the table, it can throw an exception: "No Dictionary Index found on Table"
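For comparison, the proposed imperative API can be sketched as ordinary Python over a toy in-memory table. Everything here is hypothetical: `Table`, `binary_search`, `dictionary_search`, `join` and `NoIndexError` are illustrative names, the table is passed in explicitly, `show_all_columns` is dropped (whole rows are always returned), and the join is a simple nested loop. Searching a column that lacks the named kind of index raises, as the comment suggests.

```python
import bisect


class NoIndexError(Exception):
    """Raised when a search names an index type the column doesn't have."""


class Table:
    """Toy in-memory table whose indexes must be created explicitly (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows
        self.sorted_indexes = {}  # column -> sorted list of (value, row position)
        self.dict_indexes = {}    # column -> {value: [row positions]}

    def add_sorted_index(self, column):
        self.sorted_indexes[column] = sorted(
            (row[column], pos) for pos, row in enumerate(self.rows)
        )

    def add_dict_index(self, column):
        index = {}
        for pos, row in enumerate(self.rows):
            index.setdefault(row[column], []).append(pos)
        self.dict_indexes[column] = index


def binary_search(table, column_name, value):
    if column_name not in table.sorted_indexes:
        raise NoIndexError(f"No Sorted Index found on column {column_name!r}")
    index = table.sorted_indexes[column_name]
    # (value,) sorts before every (value, pos) pair, so this finds the first match.
    i = bisect.bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table.rows[index[i][1]])
        i += 1
    return matches


def dictionary_search(table, column_name, value):
    if column_name not in table.dict_indexes:
        raise NoIndexError(f"No Dictionary Index found on column {column_name!r}")
    return [table.rows[pos] for pos in table.dict_indexes[column_name].get(value, [])]


def join(left, right, join_func):
    """Nested-loop join: merge every (a, b) pair for which join_func(a, b) is true."""
    return [{**a, **b} for a in left for b in right if join_func(a, b)]


# Usage mirroring the pseudocode, with hypothetical data.
users = Table([{"id": 56, "name": "alice"}, {"id": 7, "name": "bob"}])
orders = Table([
    {"order_id": 1, "user_id": 56},
    {"order_id": 2, "user_id": 7},
    {"order_id": 3, "user_id": 56},
])
users.add_sorted_index("id")
orders.add_dict_index("user_id")

x = binary_search(users, "id", 56)
y = dictionary_search(orders, "user_id", 56)
z = join(x, y, lambda a, b: a["id"] == b["user_id"])

try:
    dictionary_search(users, "id", 56)  # users has no dictionary index on "id"
    missing_index_error = None
except NoIndexError as err:
    missing_index_error = str(err)
```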
However, I think specifying the algorithms in the queries is really not a good idea. Your performance characteristics can change over time (or you might not know them at all yet when you start the project). With your solution, if you, e.g., realize later that it makes sense to add a new index, you'd have to rewrite every single query to use that index. With SQL, you simply add the index and are done.
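The "simply add the index" point can be checked with SQLite's `EXPLAIN QUERY PLAN` via Python's `sqlite3` module. A minimal sketch with a hypothetical `events` table; the query string stays identical while the plan changes from a full scan to an index search (exact wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")

# The application's query: written once, never edited below.
QUERY = "SELECT kind FROM events WHERE user_id = 56"


def plan(sql):
    """SQLite's description of how it will run `sql` (last column of each plan row)."""
    return " | ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(QUERY)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = plan(QUERY)   # same SQL, now resolved through the new index

print(before)
print(after)
```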
Declarative, yes, but unlike SQL my example is imperative. An imperative language is easier to optimize than a functional or even an expression-based language (SQL) because computers are basically machines that execute imperative assembly instructions. This means the abstractions are less costly and map better to the underlying machine code.
>However, I think specifying the algorithms in the queries is really not a good idea. Your performance characteristics can change over time (or you might not know them at all yet when you start the project). With your solution, if you, e.g., realize later that it makes sense to add a new index, you'd have to rewrite every single query to use that index. With SQL, you simply add the index and are done.
Because the DB sits over a bottleneck in web development, you need to have explicit control over this area of technology. If I need to do minute optimizations, then an API should provide full explicit control over every detail. You should have the power to specify algorithms, and the language itself should never hide from you what it's doing unless you choose to abstract that decision away...
What I mean by "choose to abstract that decision away" is that the language should also offer along with "binary_search" a generic "search" function as well, which can automatically choose the algorithm and index to use... That's right, by changing the nature of the API you can still preserve the high level abstractions while giving the user explicit access to lower level optimizations.
Or you can memorize a bunch of SQL hacks and gotchas and use EXPLAIN to decompile your query. I know of no other language that forces you to decompile an expression on a regular basis just to optimize it. Also, unlike any other language I have seen, Postgres literally provides a language keyword, EXPLAIN, that allows users to execute this decompilation process, as if they already knew SQL has this flaw. If that doesn't stand out like a red flag to you, I don't know what will.
SQL, on the other hand... EXPLAIN is used on a regular basis; it's built into the programming language, and rather than just mark lines of code with execution-time deltas, it literally functions as a decompiler to deconstruct the query into another imperative language. This is the problem with SQL.
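The layered design proposed in this comment (explicit algorithms plus a generic `search` that picks one automatically) can be sketched in Python under the same kind of assumptions: a toy in-memory table and hypothetical function names. Explicit access paths stay callable directly, while `search` chooses among them based on which indexes exist, a far simpler version of what a query planner does.

```python
import bisect


class Table:
    """Toy in-memory table; indexes are hypothetical and built explicitly."""

    def __init__(self, rows):
        self.rows = rows
        self.dict_indexes = {}    # column -> {value: [row positions]}
        self.sorted_indexes = {}  # column -> sorted list of (value, row position)

    def add_dict_index(self, column):
        index = {}
        for pos, row in enumerate(self.rows):
            index.setdefault(row[column], []).append(pos)
        self.dict_indexes[column] = index

    def add_sorted_index(self, column):
        self.sorted_indexes[column] = sorted(
            (row[column], pos) for pos, row in enumerate(self.rows)
        )


# Explicit, low-level access paths the caller may pick directly.
def linear_scan(table, column, value):
    return [row for row in table.rows if row[column] == value]


def dictionary_search(table, column, value):
    return [table.rows[pos] for pos in table.dict_indexes[column].get(value, [])]


def binary_search(table, column, value):
    index = table.sorted_indexes[column]
    i = bisect.bisect_left(index, (value,))  # first entry with key >= value
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table.rows[index[i][1]])
        i += 1
    return matches


STRATEGIES = {
    "dictionary_search": dictionary_search,
    "binary_search": binary_search,
    "linear_scan": linear_scan,
}


def choose_strategy(table, column):
    """Prefer a hash lookup for equality, then a sorted index, else scan."""
    if column in table.dict_indexes:
        return "dictionary_search"
    if column in table.sorted_indexes:
        return "binary_search"
    return "linear_scan"


def search(table, column, value):
    """High-level entry point: the caller says what, the function picks how."""
    return STRATEGIES[choose_strategy(table, column)](table, column, value)


# Usage with hypothetical data: only "id" gets an index.
users = Table([{"id": 56, "name": "alice"}, {"id": 7, "name": "bob"}])
users.add_sorted_index("id")
```

Code that calls `search` picks up a newly added index automatically, while code that needs control can still call `binary_search` or `dictionary_search` by name.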
Having worked on a couple projects that used Mongo at the core, I wish I could say the same.
> * Everyone slowly goes back to the old technology, forgetting the new technology.
This step is just as misguided and cult-y as "Everyone adopts this new technology and raves about how great it is now that they have just adopted it."
In some cases the technology WAS the right idea, just implemented incorrectly or not sufficiently broadly, and the baby ends up getting thrown out with the bathwater.
I think that regardless of whether microservices works for anyone or not, they came about to address a real issue that we still have, but that I’m not sure anyone has fully solved.
I think that microservices are an expression of us trying to get to a solution that enables loose coupling and hard isolation of compute based on categorical functions. We wanted a way to keep Bob from the other team from messing with our components.
I think most organizations really need a mixture of monolithic and microservices. If anyone jumps off the cliff with the attitude that one methodology is right or wrong, they deserve the outcome that they get. A lot of the blogs at the time espoused the benefits without bothering to explain that Microservices were perhaps a crescent wrench and really most of the time we needed a pair of pliers.
Does the service need an independent and dedicated team to manage its complexities, or is it a "part-time" job? Try a stored procedure first if it's the second.
Is the existing organization structure (command hierarchy) prepared and ready for a dedicated service? (Conway's law) Remember, sharing a service introduces a dependency between all service users. Sharing ain't free.
Do you really have a scalability problem, or have you just not bothered to tune existing processes and queries? Don't scrap a car just because it has a flat tire.
Thesis --> Antithesis --> Synthesis
Antithesis: microservices in separate repos
Synthesis: microservices in a monorepo
Coming to this topic, I see Microservices as a solution to the problem of Continuous Delivery which is necessary in some business models. I can't see those use cases reverting back to Monolith architecture. For such scenarios, the problems associated with Microservices are engineering challenges and not avoidable architecture choices.
But still, how is the initial comment an example of the Hegelian Dialectic?
But none of this implies that we are losing knowledge, just that the curve of engineers is fatter at the inexperienced level.
GraphQL could wither like Backbone and Angular and nobody would really notice or care. An industry-wide shift away from SPAs would be something else entirely.
Sure, it's easy to work with at first but then you start to realize that every framework and every language has different ways of doing all the niceties that you now expect. Asset pipelining, layouts, conditional rendering, and template helpers all end up becoming stuff that every language has to individually develop with varying levels of success.
The issue I have with Node.js is that the entire ecosystem moves way too fast. However, when I need real-time interaction, e.g. via WebSockets, it really feels like a good choice.
Node is no less stable, today, than Ruby or PHP, and is only arguably less stable than Python because Python has largely ossified.
SQL database usage is still an ongoing bummer, though.
The same happened with a Rails/React project that was set up last year: I tried to update one package that was marked as vulnerable, but it ended up requiring the same thing. I opted to leave that package.
I've been working with Node for a long time, but I feel like it has had the same level of stability since the start. That's very different from Ruby or PHP, where you can have old setups working just fine, maybe requiring some extra steps to build certain dependencies, but overall working.
2) Can you explain to me how updating a Gemfile or composer.json file is not going to result in a similar dependency cascade? 'Cause, from experience, it certainly will if the project isn't dead. About the only environment I've ever worked in where keeping up on your dependencies on a regular basis isn't required is a Java one--and that's assuming you don't care that much about security patches.
As an arbitrary example, consider the deps of two similar packages, delayed_job (Ruby, https://github.com/collectiveidea/delayed_job) and Kue (JS, https://github.com/Automattic/kue).
Delayed_job has two dependencies if you're running on Ruby (rather than JRuby): rake and sqlite3. Neither rake nor sqlite3 have any dependencies of their own - in production mode, of course.
On the other hand, Kue has nine direct dependencies, each of which have their own. The full dependency tree of Kue has one hundred and eighty two separate dependencies.
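Counting "the full dependency tree" means collecting every package reachable from the root, each counted once even when several packages depend on it. A minimal Python sketch; the graph below is made up for illustration and is not the real dependency data of Kue or delayed_job:

```python
from collections import deque


def transitive_dependencies(graph, root):
    """All packages reachable from `root`, each counted once; `root` itself excluded."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg == root or pkg in seen:
            continue  # skip cycles back to the root and already-counted packages
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


# Made-up graph: package -> its direct dependencies.
graph = {
    "app": ["queue-lib", "logger"],
    "queue-lib": ["redis-client", "logger", "json-utils"],
    "redis-client": ["json-utils", "buffer-utils"],
    "logger": ["json-utils"],
    "json-utils": [],
    "buffer-utils": [],
}

direct = graph["app"]
full = transitive_dependencies(graph, "app")
print(len(direct), len(full))
```

Even in this tiny graph, two direct dependencies pull in five packages in total, which is the gap the nine-versus-182 comparison is pointing at.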
And, moreover, it basically doesn't matter. Pick what you like, change it later if you care. (You probably won't.)
Has it? I'm pretty sure if I were to ask what those best practices and tools are, I'd get a number of different answers.
But I bet they'd be the same ones those people used two years ago, or at least a really close variation.
webpack is six years old.
npm is eight years old.
nodejs is nine years old.
Anecdotal, but I personally feel like things have stabilized a great deal.
Likewise, yarn in place of npm.
And Yarn is a drop in replacement for npm. Took me 10 minutes to learn. If you already know npm, there is almost zero learning curve.
The code seems fine. But I don't really trust anybody who is that insistent on owning the changes made to my database schema--I am certain it is well-intentioned but it makes me itch. Although I will say that there's an interesting project that creates entities from a database that I need to examine further and see if it's worth using to get around TypeORM's unfortunate primary design goals.
 - https://github.com/typeorm/typeorm/issues/2453
 - https://github.com/Kononnable/typeorm-model-generator
Objection.js is an ORM for Node.js that aims to stay out of your way and make it as easy as possible to use the full power of SQL and the underlying database engine while keeping magic to a minimum.
^^ copy+pasted from github
- I was worried at first because I progressively trust non-TypeScript projects less and less, but the test suite looks fine and they have official typings so there's that mitigation at least.
- I really don't love the use of static properties everywhere in order to define model schema. Which is probably a little hypocritical, because one of my own projects does the same thing, but IMO decorators are a cleaner way to do it that reads better to a human.
- Require loop detection in model relations is cool. I like that.
- In general it's a little too string-y for my tastes. I think `eager()` should take a model class, for example, rather than a string name. Maybe behind the scenes it uses the model class and pulls the string name out of it? But I think using objects-as-objects is a better way to do things than using strings-as-references.
 - https://github.com/eropple/taskbotjs
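The objects-versus-strings point can be sketched in Python (the discussion is about a JavaScript ORM, and this is not Objection.js's API). The hypothetical `eager()` below accepts either a relation name or a model class and, when given a class, looks up the relation name itself. A misspelled class name is an undefined name that linters and type checkers flag, while a misspelled string is only caught when the call runs.

```python
class Model:
    """Base class: subclasses map relation names to related model classes."""
    relation_mappings = {}


class Pet(Model):
    relation_mappings = {}


class Person(Model):
    relation_mappings = {"pets": Pet}


class Query:
    """Hypothetical query builder, not any real ORM's API."""

    def __init__(self, model):
        self.model = model
        self.eager_relations = []

    def eager(self, relation):
        if isinstance(relation, type):
            # Object-as-object: derive the relation name from the target class.
            names = [n for n, cls in self.model.relation_mappings.items() if cls is relation]
            if len(names) != 1:
                raise ValueError(
                    f"{self.model.__name__} has no unique relation to {relation.__name__}"
                )
            name = names[0]
        else:
            # String-as-reference: only validated against the mapping at call time.
            if relation not in self.model.relation_mappings:
                raise ValueError(f"unknown relation {relation!r}")
            name = relation
        self.eager_relations.append(name)
        return self


by_class = Query(Person).eager(Pet)
by_name = Query(Person).eager("pets")
```

One design caveat: if a model has two relations pointing at the same class, the class alone is ambiguous, so the sketch raises and a string name is still needed in that case.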
JS has improved massively over the last few years, it's very nearly an entirely different language than what it used to be.
There's a reason everyone was desperate to avoid writing JS not all that long ago, be it in the form of CoffeeScript, Silverlight, Flash, or GWT. ES5 JS sucks. Like... a lot.
As does most of Hacker News with every technology they hate on.
For example: web UI started as declarative (HTML -> PHP), and then transitioned towards more imperative (jQuery, Backbone, Angular), and is now moving back towards declarative (React, Polymer)
Each HTML document describes what it wants to have rendered, but does not describe how it should be rendered.
E.g. "Let's get really good at microservices so we don't need a monolith." IOW there's no consensus about when to dig in and go deeper. And even if there was, some people would leverage that information and try to pull ahead of the others digging in.
At the moment serverless vendors make that hard, and the frameworks (like serverless.com) are still emerging to make that simple.
For me the problem is most people don't think through the cost/benefits. Amazon don't ever get tired of saying "never pay for idle" in their sales pitches, but quite a lot of applications out there are never idle and can be quite accurately managed in terms of scaling, and therefore you're actually paying a premium for something you don't need.
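The "never pay for idle" trade-off comes down to a break-even calculation. A minimal Python sketch; the prices are hypothetical placeholders, not any vendor's actual rates, and the model ignores the capacity limits of a single always-on instance:

```python
HOURS_PER_MONTH = 730  # roughly 365 * 24 / 12


def always_on_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Monthly cost of an instance that runs (and bills) whether or not it's idle."""
    return hourly_rate * hours


def per_request_cost(requests, cost_per_million):
    """Monthly cost when billing is purely per request (idle time is free)."""
    return requests / 1_000_000 * cost_per_million


def break_even_requests(hourly_rate, cost_per_million, hours=HOURS_PER_MONTH):
    """Monthly request volume above which always-on becomes the cheaper option."""
    return always_on_cost(hourly_rate, hours) / cost_per_million * 1_000_000


# Hypothetical prices, for illustration only.
HOURLY_RATE = 0.10       # dollars per hour for the always-on instance
COST_PER_MILLION = 5.00  # dollars per million requests, all-in

print(always_on_cost(HOURLY_RATE))
print(break_even_requests(HOURLY_RATE, COST_PER_MILLION))
```

With these made-up numbers, an always-on instance costs about $73 a month, so per-request billing only wins below roughly 14.6 million requests a month; steady traffic above that pays a premium for elasticity it doesn't need.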
See: The Lindy effect https://en.wikipedia.org/wiki/Lindy_effect
For these projects it seemed to make a ton of sense given the rest of the stack, the nature of the projects, the deadlines I'm facing, and I've really loved working with it so far.
Why did you end up moving off of it?
I thankfully don't have to worry about power loss (large company, teams dedicated to critical systems infrastructure in a big way), and a cron job can handle any archival concerns in the projects I'm running here.
It does sound like they've improved much since 2011.
Anyway thanks for the insight
This should be the top post on articles like this. Just to keep it fresh always in everyone's mind.
The thing with design is that it isn't a science or formal field of logic. I can use math to determine the shortest distances from point A to point B but I can't prove why a design for product A is definitively better than a design for product B.
With no science we are doomed to iterate over our designs without ever truly knowing which design was the best.
Maybe someone needs to create a cloud based serverless SaaS SPA web app developed in F# to help track this stuff and prevent it happening in the future.
Also, by merging Merb into Rails 3, Rails got a fresh lease of life. This is similar to how Struts lived on by rebranding WebWork as itself.
Finally, Ruby shows up prominently as a language developers dislike: https://stackoverflow.blog/2017/10/31/disliked-programming-l...
And then there's those altogether dropping Elixir for Go...
If it was easy to figure this stuff out, they wouldn't pay us to try :)
And Ruby was a major leap forward ~ in productivity.
Are stuck using the new tech, because implementing it burned bridges with the old tech.
Many would call that bad practice, I call it static typing benefits