Shopify Ruby on Rails distributed monolith runs 19M queries per second on MySQL (twitter.com/shopifyeng)
204 points by HatchedLake721 on Nov 29, 2023 | 142 comments


I worked at a Fortune 10 company, where backing up our MySQL db daily took 8 hours. We had a job that processed data that also took 8 hours. We had a small window of time during work hours to fetch the data; if you happened to query while either was running, you'd probably not get any data back.

A few months in, both services reached 12 hours to run. We had to change the start time just so they'd happen back to back. I rolled up my sleeves and got to work on the db.

A week later, the job's runtime was down to 2 hours. I was nearly promoted out of the job. I kept making improvements to the database, reorganizing the application, and optimizing. One day I manually ran the job, then went to get coffee. When I came back, the prompt was awaiting my next command. I thought it had silently failed. I ran it several times throughout the day and checked the response. It ran in ~17 minutes. The backup was also reduced to less than 20 minutes. This was all on MySQL 5.7.

We literally gained 23 hours of availability. We had no idea what to do with it. I was fired shortly after.


Can you go into some high-level detail as to why it was slow and what you did to make it fast? That's always the most interesting part of a post like this.


The db connections were poorly managed. Each query started by opening a new connection. It then checked if it failed, slept, retried, then ran. There were several queries called in loops, so the connection pool was always dry. The code spent most of its time sleeping. Then the queries themselves were highly inefficient: bad joins, no indexes, etc. It was satisfying to fix the mess.
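The anti-pattern described above (open a connection, fail, sleep, retry, per query, inside loops) is exactly what a pooled client avoids. A minimal sketch in Ruby, with a made-up `TinyPool` class standing in for the `connection_pool` gem or ActiveRecord's built-in pool:

```ruby
# Fixed-size connection pool sketch: connections are created once up front
# and reused, instead of being opened (and retried) per query.
class TinyPool
  def initialize(size:, &factory)
    @queue = Queue.new
    size.times { @queue << factory.call }
  end

  # Borrow a connection for the duration of the block, then return it.
  def with
    conn = @queue.pop # blocks until a connection is free; no sleep/retry loop
    yield conn
  ensure
    @queue << conn if conn
  end
end

# Stand-in for a real DB handle; opening one is the expensive part.
OPENED = []
pool = TinyPool.new(size: 2) { OPENED << :conn; Object.new }

100.times { pool.with { |c| c } } # 100 "queries", still only 2 connections
```

Even this toy version makes the failure mode obvious: with per-query connections, 100 queries mean 100 handshakes; with a pool, the cost is paid twice.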


I had two similar experiences in past years: I added some database indices, the application became super fast, the rest of the team started acting weird, and I was shown the door.

At this point I'm pretty sure anything IT-related has become a bullshit job.


I've done something similar: you start out with something naive and simple like mysqldump that takes an age, and move on to more specialised tools like Percona XtraBackup that allow for incremental backups.


Any tips on similar tools for Postgres?


Depends on the environment. If you can do disk snapshots, that's the way to go (can be hard with disk striping). wal-g works for storing both base backups & WAL to various storages in parallel & can be throttled with env variables.

Source: worked on Azure's managed Citus pg


> I was fired shortly after.

Such a simple summary of how companies will chew you up and spit you out.


> We literally gained 23 hours of availability. We had no idea what to do with it. I was fired shortly after.

Unfortunately that's the case when you do your job too well; always leave a good 15-25% on the table to keep yourself 'required'.


Ha, that was quite a twist at the end there. Was this due to your self-created redundancy?


"job security through database complexity"


or embarrassment and concern on the part of the people who created this system before him.

I've made similar improvements (40x speedup, 8x reduction in RAM footprint) and was shown the door. This was for software that was a bit over 30 years old too, so it was a bit involved.

The team winning does not help a narcissist feel better. You'd be better off getting nothing done while stroking their ego daily.


it seems like they did you a solid showing you that you were wasted working in that environment


What reason did they give for firing you?


Why do so many people on this forum have kittens whenever someone points to a success story about Rails and Ruby? It's like the hamburger people can't stand it when someone enjoys a plain hotdog with ketchup and mustard. It's tiring. Get a life, and just enjoy your bland, overpriced, gourmet, too-big-to-fit-in-your-mouth, Instagrammed burgers.


heh, I know right? I see all those comments shitting on ruby and rails, and it's like, yeah, I get it dude. They hate rails, but luckily nobody is forcing them to use it. The more choices the better. Some people hate dynamic types, some people hate static types. There's room for both, imo. It's not like it's a secret there are faster or more efficient stacks.

I guess they can't help themselves and need to make their position known. Personally, I love ruby, and rails is pretty good too.


Neat, it seems like Shopify's product design choices make it easy to shard across database servers: https://shopify.engineering/mysql-database-shard-balancing-t...


Was running a rails monolith with up to 300 concurrent users and a huge growing database on a $20 VPS and free cloudflare.

Sure it only had highly optimized queries (something rails makes kinda easy too), but that's what you are supposed to do anyway.

There wasn't ever a need to recode to run it on a $10 instance. Nor do I think 99% of all database-driven projects, including mine, are ever going to reach that many concurrent users.

IMO: talking about performance for ANY major coding stack is just premature optimization


What does distributed monolith mean? Is it just a monolithic app deployment with distributed (master and replica) db servers in this case?


I'm sure there are many variants and definitions, but the company I'm at runs one.

The gist of it is that there's one codebase with multiple separate "modules". This codebase is packed and linked as a library and then we build different super slim hosts that load different parts of the monolith in production containers. Usually just different environment variables or config.

But locally, we can run the whole thing in one process. We're using .NET, so `dotnet run` brings up the whole app. Whereas we might run parts of the app in different console hosts in prod, locally they are hosted as background services in-process.

From a debug perspective, this is super awesome since you can just launch and trace one codebase. If we broke it out into 3-4 separate services, we'd have to run 3 processes and 3 debuggers. 3 sets of configuration, 3 sets of CI/CD, 3 sets of testing. Terrible for productivity.
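The "one codebase, many hosts" idea above can be sketched in a few lines. The commenter's stack is .NET; this plain-Ruby stand-in, its `APP_MODULES` variable, and the module names are all invented for illustration:

```ruby
# One codebase defining every "module" the system has. In prod, an env var
# selects which subset a given container boots; locally, boot them all.
MODULES = {
  "web"    => -> { "serving HTTP" },
  "worker" => -> { "draining the queue" },
  "cron"   => -> { "running scheduled jobs" },
}

# Start only the modules named in APP_MODULES (comma-separated);
# with no setting, start everything in one process, as in local dev.
def boot(env)
  wanted = (env["APP_MODULES"] || MODULES.keys.join(",")).split(",")
  wanted.map { |name| MODULES.fetch(name).call }
end

boot({})                            # local dev: whole monolith, one process
boot({ "APP_MODULES" => "worker" }) # prod container: just the worker
```

The payoff is exactly what the comment describes: one build artifact, one debugger locally, and per-host slimming in production is just configuration.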

We have parts of the system connected to SQS for processing events, and if we need more throughput, we simply start more instances of the container, all running the same monolith.

I think GCP is probably one of the best platforms for building modular monoliths because of its tight orientation around HTTP push.


I implemented this in one of my past startup jobs. Basically a core banking system that implemented multiple roles of the system, including open banking. Depending on config, it would act as a bank, as an OB service provider, as an OB registry, as a merchant, etc. In addition, it was built in a way that instances, if allowed could talk to each other in 2 ways:

1. As part of the same entity, so you could scale your operation.

2. As part of an ecosystem, so you could for example create an entire open banking network, or just a regular network with bank transfers and card payments using proper protocols such as ACH, ISO 8583 and ISO 20022.


We call it "multi-tenant applications" or "role-based applications": https://www.youtube.com/live/Zk0Il6I5MQI?si=BFVL3JkHaj1hcGrZ

And there is a separate concept of a configurable application which can completely reshape its component graph according to some high level configuration flags (like database=prod|dummy), we call it "multi-modal applications".

And we created the perfect tool for wiring them: https://izumi.7mind.io/distage/index.html


I'm looking to break an old Java monolith application into something like that: modular code base, single deployment artifact, multiple configurable use cases. I've yet to find a good existing tool to compose the application at build time, other than using a godawful lot of Maven profile combinations.


Hmm; at least in the more recent versions of .NET, Microsoft has really cleaned up the runtime host paradigm so that it's consistent across console (think background services, timer jobs, pull-oriented processing) and web (classic HTTP).

For us, it just becomes a matter of configuring the correct construction of the dependency injection container at host startup using some flag (usually environment variable) to pick the right bits and pieces to load into the container and which services to run from the monolith.

Then each of the host "partitions" gets its own Dockerfile.


You might want to read this post from Shopify's engineering blog https://shopify.engineering/deconstructing-monolith-designin...


Often this is viewed as a counter to the microservices pattern. Microservice architecture assumes each service has its own data storage and is logically independent from other services.

Monolith on the other hand is a single application with all of the logic in one place.

A distributed monolith is a set of applications/services, like in the microservices pattern, but they can share common data storage and depend on each other.


Data storage and compute should be separate, orthogonal issues; storage isn't needed in this comparison.

Stateful vs Stateless.

Your monolith is a binary that gets distributed to hosts to perform some function. The binary has multiple entry points that can be invoked. Most calls are internal library calls.

Microservices (also stateless) have different artifacts for each component; services call other services via a private API (often gRPC/HTTP RPC).


Good point. I was just trying to (awkwardly) say that a monolith can also be split into separate parts communicating via a private API.


blasphemy, this should immediately be moved to microservices and AWS and cost optimized and each component should be scaled separately (maybe some components only need 0.5 CPU, think of the cost saving t3.micro could bring and you have a highly available 3AZ potato to make sure it never goes down!!)

Then to handle any load we need to build autoscaling and spin up the toaster to medium potato and 20 instances (this costs 60k a month, but no worries we only pay for what we use so we will only run this for 27 minutes during our big sale).

Oh what wonderful world we live in and the pain we inflict on ourselves.

GJ Shopify for running a sane (tm) tech stack :)

(btw of course Rails scales; it's a shared-nothing setup, spin up infinite app servers as long as the db can handle it. It's pretty expensive for compute though)


Maybe you haven't worked on a large production Rails app? Rails takes 3-4x the infrastructure hosting costs as other languages/frameworks. Not to mention the insane testing costs from Ruby test bloat.


This could be / definitely was true in the 2010s, when most of the world's developer wages hadn't gone up evenly and infrastructure cost/performance was expensive.

But we continue to see cost/performance improving. The Rails framework and the Ruby VM together have likely gotten a 2x speed improvement. We will get 96-core Graviton 3 or 128-core Zen 4 EPYC. Cost-per-core performance is coming down and will continue to do so at least until Zen 6.

Depending on your app, somewhere along the line it will surely lean in Ruby on Rails's favour. Assuming you do value what Rails has to offer.

I am just waiting for fibers/async (or something similar) to be built into Rails and Active Record, along with an even faster Ruby JIT.


I worked at a very large Ruby shop where errors in production were very expensive. This meant that we spent many times more money on instances running the test suite for every build than we spent on all production servers combined.


I got burnt out on Rails after the third app in a row that I was responsible for upgrading. I appreciate Rails' contribution to web development. It took about a decade for the front-end framework/library ecosystem to figure out that MPAs are more effective than SPAs for most apps. Fortunately, Astro is part of that ecosystem. It's sort of like the fullstack JS version of Rails/Sinatra, without the ORM, which IMO adds incidental complexity.

If you have a thriving business already using Rails, it's difficult to justify moving off of Rails, now that painful upgrades seem to be mostly in the past. However, I do find isomorphic JS components & state management to be a pleasure to develop & maintain compared to developing an app in two languages.


I had to upgrade a project from rails 4.2 to 5.0 and it took like 3 weeks because things just broke randomly.

Upgraded a .NET 4.7 project to .NET 6 in 3 days and haven’t touched it since. And the .NET project was larger.

I haven’t done Rails since that upgrade and I hope I never have to touch it again.


Just to highlight: that's 6 major versions of .NET (so at least 7+ years in between), including the Core era.


Come check out Phoenix. You get a lot of the same batteries-included benefits of RoR but the architecture is more modular. Runtime performance is closer to Go, and there's a really fantastic websocket and reactive frontend system built in. Erlang OTP takes it to a whole other level beyond that too if you're trying to scale up.


> runtime performance is closer to go-lang

If you squint, yes. Erlang has had NIFs and ports since forever precisely because it isn't great at CPU intensive tasks.

In most cases, however, you're still looking at an order of magnitude fewer servers than the equivalent Rails/Django application.


If you plan the updates from the beginning, and are up to date with what's coming next, you are good. Having the app dual-boot, running it in your CI to catch upcoming changes, is an easy way to avoid such headaches.


This is an ironic comment on a Shopify story; they have wasted a lot of time upgrading Rails https://shopify.engineering/upgrading-shopify-to-rails-5-0


yeah, people learned with time. GitHub had the same problem too: https://github.blog/2018-09-28-upgrading-github-from-rails-3..... In 2023 it is so well documented, and so easy to automate, that it shouldn't happen anymore.


It's not easy to automate. As you know, GitHub now works directly off the Rails main branch. They have both core Ruby and core Rails maintainers on staff. They have to find, fix, approve, and merge introduced bugs every day, as well as deal with Rails bugs that emerge in the main branch itself. This is a huge overhead cost for keeping software up to date, and does not (and should not) apply to any other company.


That's absolutely not a huge overhead, quite the contrary it has many benefits.

We do the same at Shopify, and running off the edge allows us to catch bugs much sooner and identify them much more easily.

It also very significantly cuts down on maintenance cost, because we no longer have to work around bugs; we can fix them upstream and do a small update.

As for you pointing at the Rails 5 migration taking a very long time: it's true that certain major migrations were a pain, this one in particular, but it's because a major API had been removed (attr_protected). We (both Shopify and GitHub) work with the edge also in part to make sure the community won't have to suffer this kind of harsh migration ever again.

All this to say, you are blowing things out of proportion. The Ruby and Rails teams at Shopify and GitHub are not an overhead; they pay for themselves, and at any tech company as large as Shopify or GitHub you will find major contributors to the stack they use. e.g. there used to be a JVM team at Twitter (half the size of Shopify's, even at peak).


I'm not surprised you don't consider the team you work on as overhead. But it is just that. This is a Rails only problem. This is not a problem that applies to other engineering ecosystems, because upgrades are easy to trivial in other ecosystems. It is a problem that shouldn't exist. It makes sense that the only way Shopify has found to do efficient Rails upgrades and keep using Rails is to dedicate a team to it and subsidize Rails core development. But this isn't a model other companies should follow.

I don't see the JVM team as the same thing as a team dedicated to working off a project's core. The JVM team looks closer to the YJIT project. Making a JIT also calls into question Shopify's scalability: Rails was slow enough that they allowed a JIT to be built internally? That's quite the trade-off to make.


> The JVM team looks closer to the YJIT project.

That's the same team (Rails + Ruby)...

> Making a JIT also calls into question Shopify's scalability: Rails was slow enough that they allowed a JIT to be built internally?

You are conflating scaling ability with language speed, there is of course some relation between the two, but it's mostly orthogonal.

At the scale of Shopify (several thousand developers), having a few people focus on improving Rails and Ruby is a drop in the bucket and pays off immensely.

Rails and Ruby are both open source projects not backed by a for-profit organization. It's perfectly normal for a user of such a project to contribute patches... That's how open source is supposed to work...

There is plenty of organizations contributing patches to the Linux Kernel (Google, IBM, etc). Using your logic that means Linux is slow enough that Google allowed a new scheduler to be built internally?

That's quite the silly line of reasoning...


I have just spent weeks at $dayjob resolving Java dependency issues I wouldn't wish on my worst enemy. As much as I dislike Rails, it's nonsense that this is a Rails-specific issue.


guess what? 99% of apps are not GitHub, nor Shopify. You add some automation with https://github.com/fastruby/next_rails to your pipeline and you are done. I'm maintaining a Rails app from 2006, with more than 1000 LOC (excluding the 2k specs and views), and we are running on Rails 7.1. We had one critical moment around Rails 5.2, but after that we managed to keep our stack always up to date.


Did you really mean 1,000 and 2,000 LOC for tests? Or did you mean 1000k or something else? If you meant 1,000, that is an objectively tiny app, and not relevant to the difficulty of maintaining large Rails applications.


sorry, i forgot some zeros ;)

checking again:

CODE: 198235 LOC total

SPEC: 127708 LOC total


Well, the rest of the comment still holds. Rails doesn't get easier if you split it up into 200 microservices.


Infrastructure is cheap. Labor is expensive.


I agree labor is expensive, which is why you shouldn't use Rails. Shopify brags it took them only a year to upgrade Rails one major version https://shopify.engineering/upgrading-shopify-to-rails-5-0


That's from 6 years ago. Shopify has used Rails' main branch since 2019.

https://shopify.engineering/living-on-the-edge-of-rails


Infra is cheap when you're in hyper build mode. When your growth rate stalls and your expenses are greater than your income, infra gets expensive very quick and will kill your company.


a) Shopify has been deployed to the cloud since day one.

b) They run multiple app servers so each component can in fact be scaled separately.

c) They built their own PaaS which runs on Kubernetes.

d) You would have to be utterly incompetent to not care about optimising costs.

But I am glad we both agree that Shopify is doing a great job running a sane stack.


Just to clarify, Shopify spent about ~10 years running on metal in multiple DCs until ~2015/2016, when it moved most of the workloads to GCP.


I guess the more interesting data point is not how many queries it served but what the cost was for serving them.


I love it. Apologies if you didn't mean it this way, but given the usual context of this forum, the insinuation here is so clever: Imply that it takes significantly more "compute" costs for a Rails platform, as opposed to what it would take if it were running on something "more performant," say, Java, and the JS front end flavor of the week.

I would argue that the computing costs of any computing endeavor (short of AI, at this point) are dwarfed by the human costs, and my personal experience is that Ruby/Rails is at least a 10x headcount savings over the programmer sprawl required for Java/JS.

Java and JS sure are good for job security. I'll give them that. Throw in an unequivocal demand for Oracle, and Cisco networking gear, and you have the whole Fortune 500 world that got dumped on us from the people who couldn't manage the mainframes, either.


Ultimately the interesting thing to know is why they decided to stay on Rails instead of rewriting everything in a more performant language.



Modern CPUs are fast, and it's trivial to scale compute instances if you need that.

The database would be the bottleneck and that's got nothing to do with rails - it's all mysql.


CPU cost is the obvious driver of who will win in this space, ergo, the correct optimization is to rewrite in assembly.


Don’t they also have a ton of custom ruby stuff like their own jit?



Shopify did develop a JIT, but it's part of the main Ruby runtime (they made other performance improvements too, all available in standard Ruby).


That they can solve it at all is critical information. One could have expected the stack not to be able to horizontally scale this far. You are right, though, that there is also a question of a trade-off between hosting cost on one hand and the developer hours, risk, opportunity cost and lower hosting cost that come with a rewrite on the other, for which one would need to know hosting cost among other critical factors.


Gigachad Monolith vs virgin micro service.


A while ago I read that Shopify hosts stores in pods. Pods are self-contained instances that run everything inside them, including MySQL, Redis, etc. Does that mean this number is collective, counting all the pods, and not a single database?


Cool, MySQL scales. What does RoR have to do with it?


Would love to see the infrastructure bill. Anything is possible if you throw enough $$$ at it.

I’m not a Ruby hater, but the average joe can’t accomplish this. If you consider each store an individual instance with an isolated shard of a db, it makes sense. But the underlying foundation is immense.

Partitions are usually the key to scale.

I’d imagine parts of their infrastructure would be better served by different runtimes. They’d save a lot of money.

But if your entire team is hyper focused on ruby there is something to be said for a huge monolith.


> the average joe can’t accomplish this

The average joe does not need this.

If you have decent margins on the pages you're serving, Rails is fine. Where you might want to investigate other things is if you're, say, an ad driven business with really thin margins and you want to minimize costs. Or if you've got things dialed and you're just not changing things much any more and you want to eke out some savings.

And in any event, Rails is a good choice to figure out the problem space you're working in. Even Twitter started with it, and objectively, Twitter is very much not the sweet spot for Rails.


If your app is just a postgresql database that needs to be exposed with auth, access control, etc then yes rails is fine.

If you’re doing a tremendous amount of parallel processing, it will fall over without throwing lots of compute at it and scaling horizontally. Rails doesn’t scale vertically. You need to give it compute, and every other resource it uses will also need more compute. Average joes are fine with a 2-core VPS. Lots of businesses are not.

I need you to send 5-10 million API calls per day to 20 different APIs… are you using Rails? Every API is rate limited differently, with different batch sizes (each having unique parameters), and it needs to literally be done ASAFP to make certain deadlines. If you want to throw money at the problem, sure. If you want to do the same thing with 1/10th of the resources, you’ll use a better runtime like Erlang/Elixir or Clojure/Golang, something with CSP.
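For what it's worth, the per-API rate-limit bookkeeping described here is runtime-agnostic. A hedged token-bucket sketch in Ruby (the API name, limits, and `call_api` helper are invented; the real HTTP call is stubbed out):

```ruby
# Token bucket: each upstream API gets its own bucket sized to its limits,
# so callers block until that API may be hit again, independently of others.
class TokenBucket
  def initialize(rate_per_sec:, burst:)
    @rate, @capacity = rate_per_sec.to_f, burst.to_f
    @tokens = @capacity
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @lock = Mutex.new
  end

  # Block until a token is available, then consume it.
  def take
    loop do
      @lock.synchronize do
        now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        @tokens = [@capacity, @tokens + (now - @last) * @rate].min # refill
        @last = now
        if @tokens >= 1
          @tokens -= 1
          return
        end
      end
      sleep 0.01
    end
  end
end

# One bucket per upstream API, tuned to its documented limits (made up here).
limits = { "payments_api" => TokenBucket.new(rate_per_sec: 50, burst: 10) }

def call_api(name, limits)
  limits.fetch(name).take # wait for this API's bucket before dispatching
  :sent                   # placeholder for the real HTTP request
end
```

Whether this is pleasant at 5-10M calls/day depends on the concurrency model around it, which is the parent's actual point.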

HN is so quick to dismiss things like kubernetes and wax poetic about simpleton life but there are very legitimate reasons to choose alternative tools for your problem space.


The key to scaling Rails applications is effective use of caching (at multiple levels).

For an online store (compared to, say, a live action game) so much of the content that you serve will be cached that regardless of your apps runtime a lot of the user experience will be defined by how effective your caching strategy is.

I think that Rails still offers enough compelling advantages for developer productivity to offset the (possibly) higher hosting costs.

Although they probably spend most of their time on the application layer, I suspect the most important and trickiest work was done to scale the database.
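As a rough illustration of the fetch-style caching pattern mentioned above: `Rails.cache.fetch` works along these lines, but this tiny stand-in is not the Rails API.

```ruby
# Fetch-style cache: the block only runs on a miss or after the TTL expires,
# so repeated renders of the same page hit memory instead of the database.
class TinyCache
  def initialize
    @store = {}
  end

  def fetch(key, ttl: 300)
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > Time.now
    value = yield # cache miss: do the expensive work (query, render, ...)
    @store[key] = { value: value, expires_at: Time.now + ttl }
    value
  end
end

cache = TinyCache.new
calls = 0
3.times { cache.fetch("product/42") { calls += 1; "rendered page" } }
# three lookups, but the expensive block ran only once
```

In a real Rails app the same shape appears at several levels (fragment caching, low-level `Rails.cache`, HTTP caching), which is what makes the runtime's raw speed matter less for a storefront.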


Elixir's a lot of fun, but it's not the most performant thing either; it just does concurrency pretty well.

It seems most really big companies make the boring but safe choice to go with the JVM.


Depends on what you’re doing. Elixir isn’t winning the war on raw compute, but if you need to do literally 2 million things at once it can’t be beat. Phoenix is ostensibly a drop-in replacement for Rails but will do everything Rails does better, including background tasks.

JVM is indeed a powerhouse and that’s where I’d go Clojure without hesitation. Scala might be an easier sell but I’m a lisp fan and have used Clojure in prod (was briefly CTO at FarmLogs a YC company where most of our core infra was Clojure) and it’s really an amazing language but it requires very senior engineers to do correctly.


> I need you to send 5-10 million API calls per day to 20 different API’s … are you using Rails?

Yes.


We stand on the shoulders of giants - in this case Shopify. Once they have done things once, it is easier for the rest of us


Having built similar software, at a smaller scale, I think you're massively overestimating the complexity of shopify.

At its core, what they provide is a (largely) read-only set of products, and a (largely) append-only set of purchases.

The only write operation that needs to lock the database is when you adjust the quantity of available inventory. You need to pay attention and think things through to do that with good performance, but it's not that complex. And they wouldn't be doing millions of sales per second.
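One common way to keep that inventory adjustment cheap is a single conditional UPDATE rather than a broad transaction, something like `UPDATE products SET inventory = inventory - 1 WHERE id = ? AND inventory >= 1` (0 rows affected means sold out). This is an illustrative sketch, not Shopify's schema; an in-process mutex stands in for the row lock:

```ruby
# Simulates the conditional-decrement pattern: the check and the decrement
# happen atomically under one lock, so concurrent buyers can't oversell.
class Product
  attr_reader :inventory

  def initialize(inventory)
    @inventory = inventory
    @lock = Mutex.new # stand-in for the database row lock
  end

  # Returns true if a unit was reserved, false if sold out.
  def buy
    @lock.synchronize do
      return false if @inventory < 1
      @inventory -= 1
      true
    end
  end
end

product = Product.new(3)
results = 5.times.map { product.buy } # five buyers race for three units
```

The point being made holds: only this one hot path needs serialization, while product reads and purchase appends scale without contention.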


You can probably build an 80% Uber or Twitter simply. It's the 20% that's the rub. Did you know the Uber mobile app has thousands of native screens, for example? [1] It's common to believe that a small simple version is the same as the scaled version, but that's rarely the case.

[1]: https://news.ycombinator.com/item?id=25376346


They might save money on infra, but their speed of development would probably go down, potentially requiring them to either hire more people or deliver features slower. The impact on overall cost might not be that great.


Shopify hasn’t released a new feature since Obama was in office.


Most software organizations (remember most are not FAANG) will be better served by paying the $$$ to have their engineers focused on building more/better features for customers.


True up to a point. Once you plateau and your infra spend is $25-50k+ per month, it's wise to start rearchitecting.


Rearchitecting doesn't necessarily entail changing technology/platform


> If you consider each store an individual instance with an isolated/shard of a db it makes sense. But the underlying foundation is immense.

Each instance is probably just a k8s pod

/s


My Master's thesis was on high performance distributed computing. And my conclusion was you likely don't have a problem that is hard enough to justify it. Thank you for promoting reality!


Microservices usually solve organisational problems, not technical ones. Like here, it is probably very painful to deploy to production.


Microservices solve revenue problems at cloud hosting and CI companies.


And maintenance and onboarding... I guess finding the right balance is the key. I tend to go with a monolith and use microservices to offload heavy-duty tasks or to benefit from other languages and frameworks when they are a better fit for a specific problem domain.


That doesn't sound like "microservices", but like a monolith with a few services split out where it makes sense. People have been doing that for a million years, long before the web and certainly long before "microservices" was coined. It's just "using common sense" basically.


I'm not aware of such nuanced definitions. For me a microservice is a one-endpoint service, designed to do one task, with full introspection, and deployed on the "cloud". As far as I know, I can orchestrate them with monoliths and they're still microservices... or where do you draw the line between services and microservices? The size of the service itself, or whom it interacts with?


There is no One True Definition™, but I think most people understand "microservices" as "an application where splitting the functionality up into small services is the core of the architecture", or something along those lines.

"We split off 2 things in to small services because it made sense" is rather different. I mean, I don't really care if you call this 'microservices" I guess, but it is different, right?


yup, a monolith goes a very long way; only at the point where you have hundreds of engineers or some very specific niche technical problem would I start thinking about microservices.


That Rails app pipeline deploys hundreds of PRs a day to `main`

https://shopify.engineering/software-release-culture-shopify


> Microservices usually solve organisational problems

Or create organizational problems.

- "Who owns the flip-flop service?"

- "That's Bob's team, we fired them last summer!".


"Who owns /monorepo/shopify/infra/cloud/flip-flop/api/foo_bar.rb"?

"That's Bob's team..."


It would be without the tooling, yes. Shopify has a merge queue thingy that you can just shove your MR into, and it will eventually get around to deploying it. It even gives you a ping on Slack when your changes are about to go live, IIRC.


Oh, "painful" is to work under the sun. Installing a program in a computer is far away from painful.


But almost everyone benefits from high availability.

So unless you're using managed services, e.g. RDS, you're going to be exposed to the same complexities as distributed computing.

Especially with the cloud where instances can die at any point.


How is that the case? This example uses a distributed MySQL cluster which was of course tuned for high performance. Similarly the Rails app is distributed as well. Arguably the Rails app likely wouldn't qualify as "high performance", but it's distributed.


It amazes me that even when we have numbers people still dismiss Rails.

Nothing will run at that scale on a single VPS. All companies will have a wide range of languages used.

If this is not Rails supporting high traffic, then what more do we need?


Sorry, I love Rails, but the fact that something can scale (which I never thought it couldn't) doesn't make it a high-performance system. That's totally fine; Rails makes other tradeoffs that IMO are more universally useful, even though some people seem unable to understand that server cost for most companies is tiny compared to developer cost.


For some reason, some people will discount any example of Rails scalability as not counting.


They're talking about "distributed" as in a system of services communicating, rather than just copies of the same monolith across multiple instances. The former adds communication and synchronisation overheads and the complexities of failover for every extra service introduced.


That's a totally bizarre definition. Having worked on a high-performance in-memory data grid for the last eight years, I can guarantee that you'll get all the fun distributed systems problems even with a single code base. That definition also excludes pretty much all famous distributed systems like most databases, messaging systems like Kafka and Rabbit etc.

What you seem to be getting at isn't distributed systems, but the totally self-inflicted pain of a service-oriented architecture.
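To make the point concrete, here's a toy Python sketch of my own (not anything from the grid in question): two concurrent copies of the same code doing an unsynchronized read-modify-write against a shared value already exhibit the classic lost-update problem, no service-oriented architecture required.

```python
import threading
import time

# Shared "database" row that both workers read-modify-write.
store = {"counter": 0}

def worker(increments):
    for _ in range(increments):
        value = store["counter"]      # read
        time.sleep(0)                 # yield, widening the race window
        store["counter"] = value + 1  # write; may clobber the peer's write

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 20_000 only if no update was lost; a lower number means updates raced.
print(store["counter"])
```

The fix (a lock, or an atomic update in the data store) is the same whether the two workers are one monolith scaled to two instances or two "services".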


> Having worked on a high-performance in-memory data grid for the last eight years, I can guarantee that you'll get all the fun distributed systems problems even with a single code base.

Having spent the last 28 years building distributed network-connected systems, this comes across as wildly obtuse.

The point is that there are orders of magnitude of difference in complexity between scaling a system with few communication paths and little distribution of state across process or network boundaries, and scaling one with many paths and state distributed in many locations. We don't tend to start talking about distributed systems when you have a tiered stack of a horizontally scalable component sandwiched between a load balancer and a database, even though in a very strict technical sense that is already "distributed".

Once you start adding message queues etc., it certainly becomes more and more reasonable to talk about a distributed system, but even there a distinct grey area remains, with respect to the intent clearly expressed by the original comment, if you're dealing with e.g. queues just triggering jobs in the same code base against the same database.

Put another way, ignore the word "distributed", re-read the original comment, and consider that irrespective of which label you're comfortable with, what the comment is doing is drawing a distinction between two classes of systems with wildly different complexity in the distribution of responsibility and state. Where precisely you draw the line is entirely irrelevant.

> What you seem to be getting at, isn't distributed systems, but the totally self-inflicted pain of a service oriented architecture

No, it really was not. This separation between basic 2/3 tier apps and systems with a more complex data flow pre-dates the SOA buzzword literally by decades.


Maybe the distinction here would be one of which scope the respective maintainer cares about. For Shopify MySQL is mostly a black box, they don't need to re-implement their own atomic commit protocol, network partition detection etc., since MySQL did that for them. Implementors of MySQL did have to solve these distributed systems problems though and pick their CAP trade-offs, but I guess that's not the scope Shopify cares about here.
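For concreteness, the kind of machinery being delegated to MySQL here can be sketched as a toy two-phase commit: a coordinator collects prepare votes from every participant and commits only if all vote yes. This is a minimal illustrative sketch with made-up names, not MySQL's actual XA implementation.

```python
class Participant:
    """A toy resource (e.g. one shard) taking part in a transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote on whether this participant can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    # Collect every vote first (no short-circuiting), then decide.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision


# One reluctant participant aborts the whole transaction.
shards = [Participant("shard-a"), Participant("shard-b", can_commit=False)]
print(two_phase_commit(shards))  # False
```

The real protocol also has to survive a coordinator crash between the two phases, which is exactly the sort of problem Shopify gets to treat as someone else's.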


Isn't the full set of these numbers definitionally "high performance"?


Oh, I read the parent comment as thanking them for confirming that "you likely don't have a problem that is hard enough to justify it". But reading it again, it could be read both ways.

Edit: To be clear, I agree that this is an example of a distributed, high-performance system, which is why the comment made little sense to me.


Yes, if you take distributed to just mean "the same code on multiple machines". The GP above probably means "different code on different machines interacting" which brings its very own set of problems.


By that definition, pretty much any problem you study in distributed systems theory can occur in a system that doesn't fit that definition, and the most well-known examples of distributed systems, like distributed data stores and message queues, aren't distributed systems.


My impression is that it's simply harder to get promoted as an engineer in the industry by using boring, sustainable, unexciting solutions that have been used by everybody else and their dog. How do you even stand out that way? Looks bad on the resume, like you didn't even try. Great for the business, but maybe terrible for one's career?

You could turn that one Rails app into a complex microservices architecture and do a conference talk about it, and get a promotion. Then you can undo the microservices architecture, write a blog post about returning to the majestic monolith, and do another conference talk about it, maybe get another promotion. Abstract, de-abstract, bundle, unbundle, rinse and repeat.

It feels like a tragic situation that's nobody's fault, just the reality of human psychology being wired to reward the wrong things.


Microservices are 20 years old.

The only boring thing is the people on HN who are still having this played-out monolith-versus-microservice argument.

Whereas actual developers in the real world have moved on and realised that both are useful in different situations.


I absolutely agree with this. I sometimes use the phrase "optimising for the resume".


> optimising for the resume.

Otherwise known as RDD (Resume Driven Development) - https://rdd.io

We value:

- Specific technologies over working solutions

- Hiring buzzwords over proven track records

- Creative job titles over technical experience

- Reacting to trends over more pragmatic options


I like to think I coined 'Resume Driven Development', but chances are I'm dev #3747 to say that phrase.



Hey #628 is way better! Thanks!


Progress!


Oh no, I want my pizza-sized teams with five thousand microservices written in Clojure, Haskell, Erlang and Scala.

The cost argument about this monolith is just clutching at straws. Microservices are not cheaper than a monolith, operationally or infrastructure-wise: logs, monitoring, tracing and whatnot for each microservice.


Microservices are a solution to certain problems. If the stack is already pretty diverse, needs a lot of separate teams, hiring is hard, coordinating deployments is hard, etc., then they make sense.

Shopify probably looks more like what you ridiculed than not. We can guess that it's not one big team, and that it's not hundreds of identical copies of this big monolith (but configured during deployment to run in different roles).


> if the stack is already pretty diverse, needs a lot of separate teams, hiring is hard, coordinating deployments is hard, etc, then it makes sense.

So what you are saying is that microservices are not a solution to a technical problem but a solution to organizational problems?


I'm saying that microservices as a tech-cultural phenomenon (or even era) were a response to a very specific set of constraints. The whole cloud thing was new, hiring was hard, expertise was scarce, business was booming so loudly people went deaf from simply thinking about it, the church of true scalability was omnipotent, tools were crude, marvelously maintained monorepos were only artifacts of seriously wet [FAAN]G-fueled dreams, etc.

It made sense for Netflix, because they had a big Cassandra cluster, too much money, and a very picky organizational/hiring culture, and so on.

https://www.infoq.com/presentations/microservices-netflix-in... (https://www.youtube.com/watch?v=TOM6UhCetQ0)


This is great, but it would also be great if they accept other solutions than monolith Ruby on Rails on MySQL for their systems design interview rounds.

I've interviewed twice, and both times I argued that MySQL wasn't the right approach there was resistance from the engineers. Sure, MySQL could possibly run any use case in the world if you throw enough engineering resources at it and highly optimize for that use case, but why do that if there is another engine specifically designed for your use case? The argument was "Shopify runs on MySQL and we can handle millions of queries… blah… blah…".


I dislike MySQL as much as the next guy, but if you go into an interview situation and argue that their technology choices are wrong, that is a big red flag. Not so much because you're wrong (you might be 100% right), but because it is likely to indicate to interviewers that you will be difficult to work with, possibly pushing for changes they don't want to happen, and that you won't read social and political situations within the team well; for starters, by arguing with an interviewer.

It's fine to argue with interviewers, but the threshold where it indicates to both sides that you're probably wrong for that job is pretty low.


As much as producing 'technically sound' decisions is paramount, so is being able to work with other people, namely your boss, whose decisions are always technically sound if asked. ;)


I think if asked it's fine to probe how much disagreement your boss has the stomach for. Sometimes you're being brought on because they know they need to fix things. But, yeah, don't start ripping apart technical decisions they appear wedded to from the first moment. Not least because, as you say, a lot of the time 'technically sound' decisions aren't everything.

A whole lot of technical decisions we - me included - have very strong opinions about don't really have that much of a measurable effect on technical outcomes.

I once, many years ago, had a conflict with someone reporting to me because I refused to entertain rewriting our entire frontend in Rails. This was just after Rails was released, and we had a lot of PHP code. We had PHP code because PHP frontend devs were "cheap" and plentiful, not because any of us liked PHP.

I agreed with him about the preference for Ruby, and we used Ruby for other things (ironically, all our Ruby use was on the backend), but he kept pushing with no insight into the hiring-market conditions at the time, and with assumptions about how "trivial" a rewrite would be, because he had no insight into why our then-current frontend had the capabilities it had, capabilities he'd chosen to ignore because he didn't know the roadmap.

He went to my boss - the CEO and co-founder - and tried to get me fired. The CEO went to me and asked if I wanted to fire him instead. I didn't, but I did have a rather serious chat with said developer about our respective roles, and how it was about more than technical preferences, and how he might get a lot further if he actually tried to work more constructively with me instead of thinking he had the clout to get me fired.

Said CEO hired me to run a development department again at my previous job, while to my knowledge he has never again worked with that developer. Looking like a troublemaker to the wrong person can have long-term effects.


I am not arguing their decision is wrong on their real-world system. It is a systems design interview with a hypothetical situation completely unrelated to their prod system, where I suggest a different db than MySQL. They are the ones arguing that my decision is wrong and MySQL is the correct choice, at which point you have to defend your choices.


> both the times I argued that MySQL wasn’t the right approach

Looking at it holistically, MySQL is ALWAYS going to be the correct approach if you already have an internal team of MySQL DBAs. Either your problems are small enough that it really doesn't matter whether you use CSV files, MongoDB or MySQL, or they are large and important enough that you want to stick with technology you know and understand, even if it requires 25% extra hardware.

We did a project where we were looking into OpenStack, which was objectively the technologically correct choice. Factoring in training, ramp-up cost and hardware, it made more sense to just pay VMware.

Without knowing you, I suspect the issue is in how you answer such questions, not whether you're technically correct. I'd go the route of presenting two options: the one you find to be "the right approach" and the one that fits into the company's current infrastructure. Coming mostly from the operations side of things, I find that developers can be pretty clueless about the cost and complexity of operations. Often to the point where you wouldn't trust them to design anything unsupervised, because doing so would end in an operations nightmare.


I am not arguing their decision is wrong on their real-world system. It is a systems design interview with a hypothetical situation completely unrelated to their prod system, where I suggest a different db than MySQL. They are the ones arguing that my decision is wrong and MySQL is the correct choice, at which point you have to defend your choices.


Okay, then yes, that's not normal. I'd never tell a candidate that they are just flat-out wrong; that honestly seems unprofessional and kinda aggressive. I might still say something like "So I'd go with MySQL here myself; can you guide me through your reasoning for picking another database system, and the circumstances that might cause you to go with MySQL as well?".


Have you worked at Shopify before to know that MySQL is not the right approach?

If not, how do you know it was wrong for them?

FYI Uber migrated from Postgres to MySQL https://news.ycombinator.com/item?id=26283348


> but why do that if there is another engine specifically designed for your use case

And what engine is that?

MySQL has been proven by Meta and numerous others to scale to ridiculous levels.


What do they use to shard MySQL? Are they using upstream? FB's fork? What storage engine? InnoDB? MyRocks? At millions of q/s, "MySQL" is a bit meaningless :D
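Whatever the exact stack, sharded MySQL setups of this kind sit behind some routing layer that pins each tenant to one shard. A minimal sketch of the idea (hypothetical names and shard count; Shopify's actual pod routing is more involved):

```python
def shard_for(shop_id: int, num_shards: int = 4) -> str:
    """Route all of one shop's queries to a single shard so the
    tenant's rows stay together. Names and counts are assumptions,
    not Shopify's real topology."""
    return f"shard-{shop_id % num_shards}"

# Routing is stable: the same shop always lands on the same shard,
# so cross-shard joins are avoided for single-tenant queries.
print(shard_for(42), shard_for(42) == shard_for(42))
```

The hard parts hide behind this one-liner: resharding when a shard fills up, and moving a hot tenant without downtime.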


Imagine what they could do with a real stack and programming language ;-)


Such as?


They may be bragging about that, but they still make people use Liquid and tell their customers that features were released when they're in beta for another 3-6 months.


Liquid is a pretty phenomenal piece of engineering IMO


What's wrong with Liquid?


For one, it can't loop over arrays with indices from a variable. It's basically a 90s scripting language that has had no growth.


> But Rails doesn't scale so what are we even doing

Without context, this is meaningless. What is the cost in $$ and engineering time to scale to that level? Would a native image be able to scale to the same level at half the total cost?


You seem to imply that you know better than the CEO whether Rails is a good fit from both a product and a cost perspective. Unless you are the CFO or have inside information, I will be skeptical of the idea that I, an outsider to Shopify, would know better than them what would work best for them in terms of tech stack and cost savings.

Even if I were a consultant for them, I would first try to understand the current situation rather than imply that a native image at half the total cost is a good solution. What if reducing the hosting costs actually damaged their speed of pivoting and adapting to changes?


The ability to scale is independent of the cost to build / change that thing.



