lizztheblizz's comments (Hacker News)

Hi there, PS employee here. In AWS, the instance types backing our Metal class are currently in the following families: r6id, i4i, i3en and i7ie. We're deploying across multiple clouds, and our "Metal" product designation has no direct link to Amazon's bare-metal offerings.


Lambda was a means to an end for us here, and we're not specifically endorsing its use in _this_ way. Our goal was explicitly to test our ability to handle many parallel connections, and to observe what that looked like from different angles.

We're a DBaaS company, and we do need to be prepared for anything users may throw at us. Our Global Routing infrastructure has seen some major upgrades/changes recently to help support new features like PlanetScale Connect and our serverless drivers.

From our point of view, this was a sizing exercise with the interesting side benefit that many people do happen to use Serverless Functions similarly.


Article author here. I fully endorse Aaron's correction and appreciate the call-out.

For context: I initially wrote this paragraph with more flavor and history around crash recovery challenges in relational databases. The point was that while your data might be safe, it is still preferable, even today, to avoid crashes by accurately sizing the database AND limiting it to live within its means. Crash recovery can still take considerable time, and when people are weighing bringing their app back up faster against maintaining data integrity, taking a shortcut in a high-pressure situation is sadly not unheard of.

Alas, in my editing, I opted to spend less time in the weeds there, and without the proper context, the use of the term "data corruption" lost all meaning, and no longer belonged in that sentence. Totally fair correction.


So, disclaimer: I am not an Amazon billing wizard, but since we ran the Lambdas from an isolated sub-account, I can be reasonably certain that I filtered this down accurately.

We hit the one million connections mark probably 10-20 times over the course of a couple of days, and spent at least another 20-30 runs working our way up to it, testing various things along the way. Keep in mind, these were all very short-lived test runs, lasting at most around 8 minutes.

Our total Lambda bill for the month of October came out to just over 50 USD.
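For the curious, here is a rough back-of-envelope showing how a bill in that ballpark comes together. The run parameters (function count, memory size, run count) are illustrative guesses, not figures from the article; the rates are AWS's published standard x86 Lambda prices at the time of writing:

```python
# Hedged back-of-envelope: does ~$50 for a month of short load tests add up?
# Assumed (NOT from the article): 128 MB functions, ~50 total runs,
# ~1,000 concurrent functions per run, 8 minutes each.
GB_SECOND_PRICE = 0.0000166667    # USD per GB-second, standard x86 tier
REQUEST_PRICE = 0.20 / 1_000_000  # USD per invocation

def run_cost(functions: int, seconds: float, memory_gb: float) -> float:
    """Cost of one test run: compute (GB-seconds) plus request charges."""
    compute = functions * seconds * memory_gb * GB_SECOND_PRICE
    requests = functions * REQUEST_PRICE
    return compute + requests

monthly = 50 * run_cost(functions=1_000, seconds=8 * 60, memory_gb=0.125)
print(f"~${monthly:.2f}")  # prints ~$50.01
```

With those made-up numbers each run costs about a dollar, so a month of experimentation landing near $50 is entirely plausible.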


Absolutely agreed. As others have already pointed out, there is no underlying implication of preference to this kind of application architecture. Since we do run an actual DBaaS, one of our main internal goals in running these experiments was to specifically test our Global Routing Infrastructure, and construct a scenario that allowed us to help size specifically those components for capacity planning.

As long-time DBAs ourselves, we do as much as we can to educate and empower users to architect their applications wisely... but we still need to be prepared for the worst. As it turns out, Lambda was an easy way to accomplish that. :)


Article author here, interesting question! We didn't run into that issue, explicitly.

Our setup was effectively as follows:

- AWS Lambda functions spawned in us-east-1, from a separate AWS sub-account.

- Connections all made to the public address provisioned for MySQL protocol access to PlanetScale, on port 3306. That infrastructure also resided in us-east-1.

- Between the Vitess components themselves, once inside our own network boundaries, we use gRPC to communicate.
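As a massively simplified sketch of what each load-generating worker in a setup like this might do (all names are hypothetical, and raw TCP sockets stand in for the real MySQL protocol handshake the test actually performed):

```python
import socket

def connections_per_worker(total_target: int, worker_count: int) -> int:
    """Split a global connection target evenly across workers (ceiling division)."""
    return -(-total_target // worker_count)

def handler(event, context=None):
    """Hypothetical Lambda entry point: open this worker's share of idle
    connections to the database endpoint and hold them for the run."""
    target = connections_per_worker(event["total_target"], event["worker_count"])
    socks = []
    for _ in range(target):
        # Simplification: a real client would complete the MySQL handshake here.
        s = socket.create_connection((event["host"], 3306), timeout=10)
        socks.append(s)
    # ... hold the connections open for the duration of the test run ...
    for s in socks:
        s.close()
    return {"opened": target}
```

With 1,000 concurrent functions, each worker only needs to hold 1,000 connections to reach one million in aggregate.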

Since our stated goal was to hit one million, and we realized we were staying just barely within the default Lambda quotas, we didn't aggressively push beyond that. Some members of our infrastructure team did notice what appeared to be some kind of rate limiting when running the tests multiple times consecutively. Many tests before and after succeeded with no such issues, so we attributed it to a temporary load balancer quirk, but it might be worth going back to confirm whether that is the behavior we saw.


Two hypotheses, one of which you can falsify easily. Perhaps Vitess is doing connection pooling, i.e. dispatching requests made by multiple clients over fewer DB connections? This is quite typical to do.

The other is that you may have simply had a fast enough query that Little’s Law worked out for you.
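To illustrate the second hypothesis: Little's Law (L = λW) says average concurrency equals arrival rate times average time in the system, so fast queries need surprisingly few simultaneous connections. A quick sketch with made-up numbers:

```python
def concurrent_connections(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's Law: L = lambda * W (average concurrency in steady state)."""
    return arrival_rate_per_s * avg_latency_s

# Fast queries: heavy traffic, tiny concurrency.
assert concurrent_connections(10_000, 0.005) == 50.0    # 10k qps at 5 ms
# Slow queries at the same rate need two orders of magnitude more.
assert concurrent_connections(10_000, 0.5) == 5_000.0   # 10k qps at 500 ms
```

So if the queries behind those million client connections completed in milliseconds, a much smaller pool of backend connections could have absorbed the load.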


12+ years of massive scale production use for Vitess and 26+ years of hardening for MySQL and InnoDB. PlanetScale adds some (imho) great features on top of that, but it's standing on the shoulders of giants that have proven themselves over and over.

... Also, I guess it's MySQL-compatible rather than Postgres-compatible? :)



To be clear, this is not a Vitess/PlanetScale-specific opinion or choice. Foreign key constraints are a bit of a controversial topic in large-scale MySQL environments in general, which is the greater context in which this design decision was made by the Vitess team.

PlanetScale's (and Vitess') non-blocking schema changes rely on open-source MySQL tools like pt-online-schema-change and gh-ost, which are widely used in production environments everywhere, and neither is entirely comfortable supporting FKs, though pt-osc does accommodate them to some extent (https://www.percona.com/doc/percona-toolkit/3.0/pt-online-sc...). gh-ost's lack of support was previously discussed on HN here: https://news.ycombinator.com/item?id=16983620

A good collection of resources on why they're considered problematic, and why many companies designing large-scale MySQL schemas tend to drop them, can be found here: https://federico-razzoli.com/foreign-key-bugs-in-mysql-and-m...


Do you know if foreign keys tend to be a problem on postgresql as well?


I don't have nearly the experience in Postgres environments to have seen the same level of real-world impact there, but a quick search presents me with the following documentation, which seems to indicate mostly similar performance challenges related to the use of Foreign Key Constraints: https://www.postgresql.org/docs/13/populate.html#POPULATE-RM...


Vitess' compatibility with MySQL has made major leaps in the past couple of versions and the team has started focusing on locking in ongoing compatibility with various popular development frameworks. You can find those here, and more are getting added regularly: https://github.com/planetscale/vitess-framework-testing/

The basics of MySQL compatibility are described in here, though it's important to keep in mind that just because something "works" doesn't always mean it's the best way to do things in a sharded environment: https://vitess.io/docs/reference/compatibility/mysql-compati...


Thanks! Not having window functions and CTEs is a significant limitation for any kind of data analysis. But I guess the main use case is pure OLTP where it is – a bit – less relevant.

