Scaling Databases at Activision [pdf] (sched.com)
241 points by ksec on April 21, 2023 | 120 comments



These numbers make more sense on hardware from 5 years ago (to be fair, that's what they were dealing with), but postgres, on modern consumer hardware (Yes, consumer! Ex: KC3000 + Ryzen 7950x), already does over 50k QPS without being overburdened. You could probably squeeze out 100k QPS with faster NVMe drives.

That's like 10 tall servers for their peak QPS? (500k QPS on slide 19)

Lots of natural sharding points here too: company, game, usage of data, etc.

This is assuming you avoid or go light on expensive features like foreign keys (as Vitess already does).

The "scale magic" in Spanner (Big Table) inspired DB's are just hidden automation of traditional sharding-

CockroachDB: Sharded indexes, then runs map reduce for you.

Scylla/Cassandra: Sharded indexes again, but with more limitations in exchange for speed: eventually consistent, no fast deletes, only updates (Discord uses tombstones), and JOINs are "in app" only.

Vitess: Proxy that dismantles/routes your query to the correct server. This scales, but is eventually consistent. JOINs happen on co-located data, or "in app". In simple terms, Vitess is like an externally managed "in app" query parser/router.


I think describing newsql databases as "just" hidden sharding is underselling them substantially. The hard part is in the details of distributed transactions and consistency, among other things.


Yup. Consistency is by far the biggest advantage. Sharding isn't incredibly hard, but it makes you think about consistency on every query. Often you have a natural sharding key and it isn't much of a concern at all (for example, isolated user accounts), but at some point you need to go cross-shard and it gets very complicated (or you give up consistency and you get complicated bugs).

In theory a distributed SQL database can be a huge developer win because of this. Of course there are usually other tradeoffs to consider as well.


I see 500k qps with 30TB data doubling every 2 years on slide 19.

In my experience this isn't close to being possible with a single Postgres box. There is no way around sharding this workload.

Vitess has different consistency models, it's not as eventually consistent as you describe.


You'd certainly want to shard from the start in any case, but it's an order of magnitude fewer postgres boxes for the QPS.


Wonder what the tradeoff on connection overhead would be? I've never been around a connection pooler that would be anywhere near this load.


Speaking only for postgres 15+, but in my testing, app drivers have been great about facilitating "connection pooling" without a separate server. Eliminates connection overhead at the app level for many projects.

Golang and ASGI Python (FastAPI, Starlette, Sanic) can do a small connection pool per worker; each worker can handle thousands of users at once.

PHP in 2023 has database connection persistence using FPM + PDO (effectively, one big connection pool shared between all workers on a server). Again, this eliminates connection overhead and works well.

Just remember to raise your max_connections in postgres, like in this: https://gist.github.com/gnat/cfe3754c3dc817c7fb8b2225ef4db62...
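For reference, a rough sketch of that knob (the value here is arbitrary, and changing max_connections requires a server restart to take effect):

  ALTER SYSTEM SET max_connections = 2000;
  -- sanity check how many connections are actually in use
  SELECT count(*) FROM pg_stat_activity;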


Not too much if connections are persistent. A problem is that pgbouncer is single-threaded (one CPU core only), so you need several of them, which makes the pooling less efficient.


Does pgbouncer tend to run out of CPU? Or just at the large scale this article is about?


I have a 100k qps workload and we run 10 pgBouncer instances. I ran into the CPU limit with 6. Connections to pgBouncer are persistent so it's not caused by connection churn. The amount of incoming connections is in the low hundreds per instance. I tried without SSL as well and it didn't make a measurable difference.

I'm not quite sure if this is a common problem or if it's our workload. It always made me wonder if moving pooling into Postgres is really a good idea.


I really wonder about the justification for using MySQL/Postgres sometimes. How is it not simply superior in almost every way to start with ScyllaDB, which is faster in every way? The only downside I can see is that you have a very limited way to query things. But so what, at least it's fast. Isn't that the no. 1 thing you want from a database? It's not about comfort, it's about storing data. It is genuinely annoying, though, that you can't even alter a column type after the column was created in ScyllaDB.


> the only downside I can see is that you have a very limited way to query things

Which is a huge downside.

> But so what, at least it's fast

So is an RDBMS for most read/write workloads. Few will see an overall benefit from Cassandra once you factor in the labor needed to replicate the conveniences offered by an RDBMS.

> it's about storing data

It's not about storing data, it's about using data to accomplish business objectives with the promises that entails. You opt for something like Cassandra/Scylla because it solves a technical challenge otherwise untenable or more expensive, but it's a poor choice if your problem is reasonably solved with what Postgres can offer.


It's a great point- I've been looking into this path myself: ScyllaDB + logic inside app.

The main issue I've run into is deployment. It's just very rough compared to CockroachDB or Postgres, which are single binary or very close to it.

If ScyllaDB became as easy to set up as CockroachDB (or close-ish), they'd have a lot more users, I think.

You can make Postgres setup this easy: https://gist.github.com/gnat/cfe3754c3dc817c7fb8b2225ef4db62...

CockroachDB, also dead easy: https://www.cockroachlabs.com/docs/v22.2/start-a-local-clust...

ScyllaDB is an enormous mess of daemons and ports: https://university.scylladb.com/courses/scylla-operations/le...


Yes, the easy setup with crdb is really attractive, but then you realize that they hide a few very important features behind the paid option: no incremental backups and no partitions (of any kind, I think). Those are some big features to lose. Obviously I'm not complaining, just comparing. That was the main reason I started developing my project on MySQL: at least MySQL has a lot of different free providers.

YugabyteDB exists for Postgres too, but initially I was more impressed with Vitess and TiDB, because TiDB is used by some really huge Chinese companies, and if they don't have a problem then it must be okay. Unfortunately TiDB essentially doesn't want you to run a single node, ever, and their smallest recommended deployment is 5 nodes. That's a lot.

Citus also exists for Postgres, but their docs basically tell you it's only recommended for analytics, so I don't know what to think of it. And it sounds like all it's doing is a basic master-slave Postgres setup with quite a few manual steps you have to take to even benefit (manually altering tables to make them sharded/partitioned).


I totally hear you, been down a similar path.

As for the "huge chinese" databases, not to sound like a conspiracy, but CockroachDB has outperformed them in my server load testing (stress test with very basic "SELECT FROM users" and "INSERT INTO users" on KC3000 + Ryzen 7950x)- so I'm beginning to seriously question all of the self-published benchmarks from mainland china.

That, and when one considers they have a similarly rough deployment story compared to ScyllaDB, is it maybe worth just going directly to ScyllaDB in the first place?

One other damning thing: I've also observed that all Spanner (Bigtable) inspired databases have lower throughput compared to Postgres when under 5-ish servers. I.e., you only begin to see throughput benefits after you have a sizeable cluster.

ScyllaDB / Cassandra is a different architecture, though.

Heck even Vitess, pretty rough deploy story as well (they really push k8s on you), but that's why PlanetScale exists, IMO.


One nice thing I found today is that ScyllaDB has a pretty nice JSON feature, they let you insert and select directly via json strings. I can't do that in MySQL which is annoying, I could do something equivalent in Postgres. One of my biggest complaints about SQL in general is the datatype mapping / serialization and if I can simply use a json library in my language of choice then I save on a lot of work. https://www.scylladb.com/2019/01/10/json-support-in-scylla/
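Roughly what that looks like in CQL (a sketch; the table and columns here are made up, not taken from the linked post):

  CREATE TABLE users (user_id uuid PRIMARY KEY, email text, address text);

  -- insert a whole row from a JSON string
  INSERT INTO users JSON '{"user_id": "9affeac3-5d92-111d-779c-55eb6d78a806", "email": "a@b.c", "address": ""}';

  -- read each row back as a single JSON document
  SELECT JSON * FROM users;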

But you are right about how important it is to have a simple deployment, it's true for all of the databases. Sure there is docker but that usually has its own problems, like with scylla where it wouldn't properly forward ports for whatever reason so I could never connect to it. Then I had to do all kinds of manual fixes and install it on a VM to edit configs to make it run. This is really the stuff that makes you want to use crdb and be done with it all, even though it's slower and eventually requires payment.


So ScyllaDB allows you to choose how much Consistency in CAP you desire, in order to gain performance. Pretty interesting.

In contrast to CockroachDB, Postgres, etc, where Consistency in CAP is always enforced and there's no way to turn it off.

You turn it on/off by simply specifying IF NOT EXISTS, which is insanely simple: https://www.scylladb.com/2020/07/15/getting-the-most-out-of-... (called "lightweight transactions" or LWT)
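Roughly, with a hypothetical users table (a sketch, not from the linked post):

  -- plain write: fast, no coordination beyond the normal replica quorum
  INSERT INTO users (user_id, email) VALUES (9affeac3-5d92-111d-779c-55eb6d78a806, 'a@b.c');

  -- lightweight transaction: a Paxos round under the hood, linearizable but noticeably slower
  INSERT INTO users (user_id, email) VALUES (9affeac3-5d92-111d-779c-55eb6d78a806, 'a@b.c') IF NOT EXISTS;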

I wish CockroachDB had a way to trade consistency for speed when desired.

Another sidenote from my limited investigation so far: when using LWT in ScyllaDB, the performance of CockroachDB INSERTs and ScyllaDB INSERTs lines up fairly evenly. Still investigating, but this makes sense; only when you avoid IF NOT EXISTS does ScyllaDB pull away massively in performance.


One strategy I've considered:

Write Postgres in a way that's fully compatible with CockroachDB, with the intention of transitioning to CockroachDB if needed.

Basically:

1. BIGINT or UUID or TEXT or composite primary keys only.

2. Limited use of sequential indexes. (Ex: timestamp. Not the end of the world, but it creates a hotspot and will require a hash-sharded index.)

3. Avoid advanced features such as stored procedures.

4. Avoid, or make very light use of, foreign keys. (Performance.)
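A rough sketch of a table that follows those rules and runs unchanged on both (the names are made up):

  CREATE TABLE users (
      user_id    UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- rule 1: non-sequential key
      email      TEXT NOT NULL,
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
      -- rule 2: created_at is sequential; on CockroachDB you'd index it USING HASH to avoid a hotspot
      -- rules 3/4: no stored procedures, no foreign keys
  );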

Going straight to CockroachDB seems logical as well, but you'll hit the performance wall much sooner on 1-2 servers, and will require a cluster to match it.


If crdb had some kind of perpetual fallback license like JetBrains, then I would be far happier with the prospect of eventually switching to crdb as an option. But "contact us" pricing is not exactly reassuring. If you think about it, even the infamously opaque Microsoft SQL Server has a pricing table nowadays.

So I tried to put together some kind of pricing comparison, looking at CPU alone. They ask $0.46/h for m5.large and $6.34/h for their m5.8xlarge (managed dedicated offering). Using the AWS price calculator and selecting these instance types directly, it would cost 4x less to rent the AWS instances directly. Paying 4 times your raw server cost for licensing is not exactly a dream come true.

But ScyllaDB really has so many weird footguns. I just discovered that you can't even query for NULL values. It just doesn't work; there is no support for WHERE x = NULL, even with an index. That's crazy to me: if I ever make a mistake and end up with a few NULLed values, I can't even find them again to set them to an empty string and make them queryable. I just don't know, I almost feel like ScyllaDB has too many tradeoffs.


Found a solution to address the lack of WHERE x = NULL

  SELECT * FROM users;

Then plug the affected PKs into:

  UPDATE users SET address='' WHERE user_id IN (9affeac3-5d92-111d-779c-55eb6d78a806, ...) IF address=null;

Then you can do your SELECT to finish the repair:

  SELECT * FROM users WHERE address='';

_______________________

Another footgun: No default values for columns in CREATE TABLE. Very annoying.

Another footgun: Only "=" and "IN (...)" are supported in WHERE for partition keys. https://docs.scylladb.com/stable/cql/dml.html#the-where-clau...

This means you cannot do UPDATE ... WHERE user_id > 0.

I'm starting to really appreciate the effort CockroachDB went to in making their query language Postgres compatible, even though architecturally both are key-value store databases under the hood... it's just a shame CockroachDB has far lower performance because of enforced consistency with no way to turn it off.

To be fair, ScyllaDB is removing footguns with every new version, just not fast enough for my taste.


Can confirm the inability to select null values. Have not yet found a reasonable pattern/workaround. The reasoning seems to be that null values are not stored: it's displayed as null in cqlsh, but in reality that data just does not exist.

For the record, CockroachDB does not store null either, but it allows querying null.

A workaround may be stepping through the full table manually using the PK (which can never be null), checking for null "in app", and cleaning them up that way.

Noticed INSERTS are treated as UPSERT unless you use IF NOT EXISTS... ScyllaDB doesn't give a f__k if it's overwriting an existing row, lol. I can live with the UPSERT footgun, though. Agreed the null footgun is far more annoying.
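A quick sketch of the difference (hypothetical table):

  INSERT INTO users (user_id, address) VALUES (9affeac3-5d92-111d-779c-55eb6d78a806, 'old');

  -- same key again: silently overwrites the existing row
  INSERT INTO users (user_id, address) VALUES (9affeac3-5d92-111d-779c-55eb6d78a806, 'new');

  -- only the LWT form refuses to clobber an existing row (it returns [applied] = false instead)
  INSERT INTO users (user_id, address) VALUES (9affeac3-5d92-111d-779c-55eb6d78a806, 'newer') IF NOT EXISTS;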


We're running ScyllaDB on Docker via Nomad and it's been an absolute breeze. One simple container with a handful of open ports. I don't get what's complicated about ScyllaDB deployment.


Hi Nathanba, I am a technical program manager for the Citus extension at Microsoft.

> Citus also exists for Postgres but their docs basically tell you that it's only recommended for analytics

Could you point me at the docs that made you think that? We find Citus very good at multi-tenant SaaS apps (OLTP), IoT (HTAP) workloads and analytics (OLAP).

> And it sounds like all it's doing is a basic master-slave postgres setup with quite a few manual things you have to do to even benefit (manually altering tables to make them sharded/partitioned)

This is true for now. We are looking into ways to make onboarding easier. That said, the time spent on defining a good sharding model for your data often leads to very good perf characteristics. Regarding the architecture, I personally find Citus closer in spirit/design to what Vitess is doing. Additionally, every node in the cluster is able to take both writes and reads, so I don't see the parallel to a basic primary/secondary setup.


I'm looking at this page: https://www.citusdata.com/use-cases. "Multi-Tenant SaaS" to me very clearly sounds like there are features specifically for isolating the database into different customer domains ("tenants"). Which to me implies that this is sort of a scaling trick that isn't going to provide performance for a regular application where I can't partition my data like that.


The main difference is eventual vs strong consistency. Scylla scales great, but then the developer has to guard consistency themselves. And that is quite a challenging job, one that usually trips up even the most experienced developers.

There is a middle ground, storage systems/databases that both scale and provide strong consistency. FoundationDB, Citus, etc.

SQL databases are not only easier to query, but also enforce data consistency. While querying is their core job, it's hardly the only one.


> the only downside I can see is that you have a very limited way to query things

That's a pretty big one. If you know you need the scale, the trade-off might be worth it. But in 99% of projects, Postgres is more than adequate and has great flexibility.


Yeah, but I don't think we can just trust people when they say this. They have probably been invested in these single-node databases for 10+ years. They are not going to tell you: hey, if you architect your app differently you could have 10x the performance... but don't, because it's a bit annoying when you query data. Is that being objective?

Lots of people are not okay with the performance of MySQL/Postgres; that's the whole point of solutions like crdb/tidb/singlestore/vitess/yugabyte/citus and so many others. To tell someone they should start with MySQL or Postgres is the equivalent of telling them they'll have to migrate in the future and relearn a whole bunch of operational stuff, like backups. I don't really like the feeling of choosing a technology with the mindset that I'll have a lot of problems once there is a performance issue, and that at some point the pain will be so great that I'll have to switch to Vitess.


It's really not just about annoyance; it's about correctness and speed of iteration. Many apps that can be built trivially in a couple of SQL queries cannot be implemented correctly in Scylla without reinventing half of tidb/vitess/etc. On top of that, it's very hard to hire and pay the engineers who are capable of doing that in the first place. Finally, even if you manage to do it, your competitors will have shipped 100x more functionality than you in that time.

That's why people use these systems, and why so many very smart people spend so much time trying to make them better.


Well, I do everything myself. I think what you said is mostly true only from the perspective of people who use ORMs, and then sure, everything works initially. But then are you really in control of your data, and do you really know what's going on? I can only speak from a .NET perspective, but the popular ORMs are all extremely slow, so with some research you end up using Dapper. But then you end up basically writing SQL manually, because that's just how it is. It wouldn't be any different for many languages, like Rust. Essentially, unless you use Java+Hibernate or some other extremely common combination, you'll end up writing SQL yourself, and then it's pretty painful anyway because you have a second textual DSL embedded in your code, with its own datatypes that you constantly have to map to your language's datatypes and vice versa. So iteration-wise it doesn't even seem to be about MySQL vs ScyllaDB; it's about ORM vs. no ORM.

And regarding correctness and iteration speed: yes, sure, but it's easy to be correct when you only run on a single node. The reason those distributed solutions are less correct is that it's harder to be correct. For example, in Vitess you end up having to do a bunch of manual work with their sequences for every table. In ScyllaDB, afaik, you just trust that the timeuuids are unique, or you do the same thing and pregenerate bigints into a table.

My point is: saying that MySQL/Postgres are safer because they are correct and you can iterate faster is only true if you go down a very narrow path with your ORM and you stay on a single node. You are essentially postponing all the pain. At some point you'll stop using the ORM and write your own; at some point you'll stop using the single node.


I think it's the last paragraph I'm really disagreeing with. Many, many applications will do just fine with a single Postgres. If that is not enough even on a big box, they can scale out to a large single-master system like Aurora. If that is not enough, it's much easier to try to scale with something like Vitess or Cockroach, which offer a lot more consistency and correctness out of the box.

Scylla is great for specific high-volume workloads, but it's a niche tool, and I wouldn't recommend it as a general-purpose application development database. It will slow you down a lot, and a lot of really simple features will be hard to implement.

Also I'm not talking about ORMs. Writing SQL by hand in your application is 10-100x more productive for feature development than using something like Scylla's APIs directly, IMO.


CQL to me is just another SQL dialect; databases all have different, incompatible dialects. And on top of that, it's not like SQL is a highly productive programming language anyway; it's a pretty hard-to-use language for anything even slightly complicated. I don't really see where this 10-100x productivity gain would come from compared to Scylla's APIs, which are just another form of SQL dialect, so it isn't that different. Sure, I can't write e.g. (recursive) CTEs or scheduled events in ScyllaDB, but do I really want them anyway? Wouldn't it be better to write everything as a set of simple queries? Wouldn't it be better if, instead of an event on an interval in SQL, I just ran a simple SQL query in a background thread from my application on an interval?


> CQL to me is just another SQL dialect

SQL is for relational databases. That's a different paradigm to CQL. The similarity is superficial. CQL is not a dialect of SQL. Not even close.


People saying the performance of Postgres is not adequate is my red flag for them not knowing what they’re doing.

Of course, sometimes they do have actual performance issues, but 99 out of 100 times they just don’t know what an index is.


Looking back over my ~30 yr career I could pretty neatly divide my projects into two categories:

* Low volume, complicated queries

* High volume, simple queries

In the rare case that I've seen high volume, complicated queries it generally involves a lot of pre-indexing, denormalization, and exotic data structures.

Postgres really shines in the first case, which is probably the dominant case for "data important enough to pay people to work on".


Mongo's sharding also uses routers like Vitess but can be strongly consistent depending on connection and write settings.

Data is divided into chunks and the routers route queries based on where your query lands on chunk ranges.


When you give up one of the major components of relational databases, like foreign keys, you pick up a lot of performance opportunity.

It is amazing what Vitess / PS will help you accomplish, but they don't talk a lot about the tradeoffs you make to get there (which are similar to what you face when sharding MySQL without additional tooling).


IIRC CockroachDB had (has?) interleaved tables for this reason. It seemed to allow collocating shards with the same foreign key in the same machine, leveraging composite keys for that.

For example, a multi-tenant application could have a tenants table with PK id, and an orders table with PK (tenant_id, id), and a FK from orders(tenant_id) to tenants(id). Then CockroachDB would keep the orders with tenant_id = 123 in the same machine where the tenant with id = 123 was.
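Roughly the schema described (a sketch from memory; the composite-key part still works today, while the interleaving clause is the piece that was later removed):

  CREATE TABLE tenants (
      id UUID PRIMARY KEY
  );

  CREATE TABLE orders (
      tenant_id UUID,
      id        UUID,
      PRIMARY KEY (tenant_id, id),
      FOREIGN KEY (tenant_id) REFERENCES tenants (id)
  );
  -- older CockroachDB versions let you append something like
  --   INTERLEAVE IN PARENT tenants (tenant_id)
  -- to co-locate each tenant's orders with its tenants row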

I have the impression they deprecated interleaved tables, but I can't find more info about it.


Here you go:

https://github.com/cockroachdb/cockroach/issues/52009

Not only deprecated, but completely removed a few years ago as seen in:

https://github.com/cockroachdb/cockroach/issues/69150


Things can be really fast when you throw out referential integrity


We have foreign keys coming soon!


Hi Sam!

I've stumbled upon your DB service several times, and considered suggesting it as the DB of choice for my customers (uxwizz.com), but the pricing is a bit confusing/hard to estimate in my case, which makes it hard for me to recommend it. Any suggestions?


The reasons Vitess didn't have foreign key support historically actually weren't a bold performance tradeoff or anything like that. It was more of a classic, boring, backlog prioritization thing: everyone heavily using Vitess was using gh-ost for schema changes, and gh-ost didn't support foreign keys.

Now that Vitess has native schema change tools it's more reasonable to revisit user-friendly, out-of-the-box foreign key support.


Kind of. It was definitely an opinionated choice at some point, justified with performance. Now it is just something we need to add, and it's nearly done. It will still have trade-offs associated with it, like all distributed systems.


Databases are my interest, so if you don't mind… How would removal of foreign keys speed up read access?

I'm also going to disagree with you a little, again on the issue of reading: if you have foreign keys, then you can do optimisations like cutting out chunks of joins. Adding constraints means you know more, which you can feed to the optimiser, which often gets you better performance (read performance, that is).


Not OP, but they said that you pick up performance, not specifically in reading. The performance you gain is larger aggregate write throughput (and read generally - although reads that require spreading queries across shards and aggregating results will likely perform a little worse). You also gain scale in that you have turned something that was limited by scale up into something limited by scale out (while giving up some data guarantees and performance on some queries).

As to why you can't really have foreign keys, basically, sharding schemes like this sit at an intermediate layer between the application and the various DB clusters that are the shards. You CAN have strong data consistency (and useful foreign keys) within a shard because it's all inside a single database; however across shards, you CAN NOT. The sharding layer doesn't perform checks for you, so if you have two logical tables that are partitioned differently across the shards, you can't have a foreign key that will be enforced correctly as the foreign key of a given row in one table may live on a different shard in the other table. The local database within the shard would reject the insert. Transactions across shards can also be tricky to impossible.


In my experience most databases are read-heavy so that's what I was asking about. That said, thanks for a clear explanation.


Foreign keys are so overrated. Write good code and you don’t have to rely on the DB to enforce referential integrity. It’s the equivalent of running test cases with every db write.


I'm curious, how would you obviate the need for foreign keys with "good" code? Can you provide a toy example or a reference to an article so I can understand better? I've used NoSQL databases a long time ago and currently rely on on good ole PostgreSQL, but I'm having a hard time understanding how "good" code can be a better solution for managing relationships between data than a foreign key.


> for managing relationships between data than a foreign key.

You're conflating the concept of a normalized database with insanely slow DB-enforced referential integrity/foreign keys.

Toy example? Sure. Take a well-formed 3NF schema and disable foreign key constraints.


Sorry, assume I'm dense.

> Take a well-formed 3NF schema and disable foreign key constraints.

I'm familiar with 3NF, but can you expand on how 3NF enables you to remove foreign keys? Or feel free to point me to an article/blog, I don't want to waste your time if it's too much to explain. I did some googling but wasn't sure where to proceed from your post.


They're not saying 3NF enables you to remove foreign keys. They are talking about removing foreign key constraints from your RDBMS of choice.

Something like SQL Server can enforce foreign key constraints if you explicitly tell it your relationships between tables. The downside is that having this referential integrity costs you performance as the database has to check your relations when inserting/updating/deleting rows. E.g. checking that a foreign key is pointing at a valid primary key, checking that you aren't leaving invalid foreign keys when deleting a primary key, etc. This is to prevent you inserting bad data into the database.

You can delete these constraints and still have the exact same behavior so long as your code is correct. It just means that the database isn't going to stop you writing bad data.
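In SQL terms, a minimal sketch of the difference (table and column names are made up):

  -- DB-enforced referential integrity: every write to orders is checked against users
  CREATE TABLE orders (
      order_id BIGINT PRIMARY KEY,
      user_id  BIGINT NOT NULL REFERENCES users (user_id)
  );

  -- without the constraint: same column, same joins, but the database never verifies the relationship
  CREATE TABLE orders_unchecked (
      order_id BIGINT PRIMARY KEY,
      user_id  BIGINT NOT NULL
  );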


That makes sense. For workloads where write performance isn't very important but read performance is, this sounds like it may still be a worthwhile tradeoff to have that extra level of data integrity.

My sense is that many typical CRUD apps aren't writing gargantuan volumes of data or making very complex edits, and if they do, it's ok if it takes a second longer. Usually read speed is more of a bottleneck for user-facing applications, but I'm sure there are probably some examples where this tradeoff is worth it.


> so long as your code is correct

This is a pretty tough definition of correct, though. Without foreign key constraints you'll have a really tough time dealing with concurrency artifacts without raising your isolation levels, which generally brings larger performance concerns.


Why do you think that?

My experience is that if you have a moderate amount of foreign keys, a lot of DBMS (not Postgres) will refuse the `ON DELETE CASCADE` (in the diamond case), and you have to do it "manually" anyway (from your query builder).


I think that because a significant fraction of my career has been spent fixing db-concurrency-related mistakes for people once they hit scale :-).

I’m not talking about using cascade - this applies perfectly well to use of on delete restrict. FKs are more or less the only standard way to reliably keep relationships between tables correct without raising up the isolation level (at least, in most dbs) or doing explicit locking schemes that would be slower than the implicit locking that foreign keys perform.


Say you have a super basic setup:

  user = {user_id, email}
  order = {order_id, user_id}
order.user_id is a foreign key to user.user_id. That's a perfectly valid and reasonable way to organize things.

Enabling RDBMS-enforced foreign key constraints is the issue. It slows everything down dramatically.


I see, thanks. Basically just store the foreign keys yourself as columns in relevant tables and perform the joins in SQL without having the DB enforce FK integrity for every insertion/update/delete. Would it be fair to say that read speeds are unaffected by this?


Yes. Quite literally the only difference is not enabling foreign key constraints. Read speeds are unaffected, correct.


>It slows everything down dramatically

How much slower is it really? MySQL automatically creates indexes for foreign keys, so I don't think it slows down "dramatically", just an additional indexed retrieval?

Do you know of any benchmarks which show the difference?


How does this stay correct in the presence of concurrent activity?


Using transactions or UUIDs/ULIDs, though maybe I'm misunderstanding your question. How do foreign key constraints help with concurrency?


Table User: userid, etc

Table Resources: resourceid, userid, etc

If I want to restrict deletion of a user to only be possible after all the resources are deleted, I'm forced into using higher-than-default isolation levels in most DBs. This has significant performance implications. It's also much easier to make a mistake - for example, if when creating a resource I check that the user exists prior to starting the transaction, then start the tran, then do the work, it will allow insertion of data into a nonexistent user.


Add your user check as a where clause on the resources insert?


Can you give an example? I’m not aware of a mechanism like that that will protect you from concurrency artifacts reliably - certainly not a general one.


  start transaction;
  select id from users where id = ? for update;
  if row_count() < 1 then raise 'no user' end if;
  insert into sub_resource (owner, thing) values (?, ?);
  commit;

??


Do that in most relational dbs in the default isolation level (read committed), and concurrently executing transactions will still be able to delete users underneath you after the select.

If we take postgres as an example, performing the select takes exactly zero row level locks, and makes no guarantees at all about selected data remaining the same after you’ve read it.

edit: my mistake - I missed that the select is for update. Yes, this will take explicit locks and thus protect you from the deletion, but is slower/worse than just using foreign keys, so it won't fundamentally help you.

further edit: let's take an example even in a higher isolation level (repeatable read):

  -- setup
  postgres=# create table user_table(user_id int);
  CREATE TABLE
  postgres=# create table resources_table(resource_id int, user_id int);
  CREATE TABLE
  postgres=# insert into user_table values(1);
  INSERT 0 1

  Tran 1:
  postgres=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN
  postgres=# select * from user_table where user_id = 1;
   user_id 
  ---------
         1
  (1 row)

  Tran 2:
  postgres=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  BEGIN
  postgres=# select * from resources_table where user_id = 1;
   resource_id | user_id 
  -------------+---------
  (0 rows)
  postgres=# delete from user_table where user_id = 1;
  DELETE 1
  postgres=# commit;
  COMMIT

  Tran 1:
  postgres=# insert into resources_table values (1,1);
  INSERT 0 1
  postgres=# commit;
  COMMIT

  Data at the end:

  postgres=# select * from resources_table;
   resource_id | user_id 
  -------------+---------
             1 |       1
  (1 row)

  postgres=# select * from user_table;
   user_id 
  ---------
  (0 rows)
You can fix this by using SERIALIZABLE, which will error out in this case.

This stuff is harder than people think, and correctly indexed foreign keys really aren't a performance issue for the vast majority of applications. I strongly recommend just using them until you have a good reason not to.


I guess you can use indices instead of foreign keys? And somehow implement all the `ON DELETE CASCADE` manually within any transaction that removes the original row? Not sure how it's "good" code but it could be faster.


A foreign key doesn't necessarily imply an index. If you are using postgres, you would have to add an index in addition to the foreign key, if you want one.

On delete cascade, depending on how many rows it cascades to, can be problematic because it's a very long running blocking operation. That's something one might want to do as a background operation and in batches. Although that won't make it faster.
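A common shape for that batched version (a sketch for Postgres, which has no DELETE ... LIMIT, hence the ctid subquery; table names are made up):

  -- delete one chunk of a user's rows; the application repeats this until 0 rows are affected
  DELETE FROM resources
  WHERE ctid IN (
      SELECT ctid FROM resources WHERE user_id = 42 LIMIT 1000
  );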


Faster in aggregate throughput can often be different from faster for a specific operation.

Personally, I find delete on cascade dangerous. I mean, lots of fun for a pen tester, sure…


> Personally, I find delete on cascade dangerous. I mean, lots of fun for a pen tester, sure…

Intriguing. Can you tell me more?


> A foreign key doesn't necessarily imply an index. If you are using postgres, you would have to add an index in addition to the foreign key, if you want one.

Sure, I meant an index in which the key is primary. But, on second thought, that's probably me misreading the GP's message.

> On delete cascade, depending on how many rows it cascades to, can be problematic because it's a very long running blocking operation. That's something one might want to do as a background operation and in batches. Although that won't make it faster.

That can definitely be a problem (just like destructor deallocation stampedes in C++ or Rust). Still less risky than cascading manually and asynchronously, I suspect.


You can have foreign keys (in the sense of a column on one table pointing to an ID column on another), just without the database itself enforcing the correctness of those relationships on every write.


"Write good code and you won't have bugs" :)

I think it's good practice to enforce consistency rules both in the DB and in code. If you make a mistake in your code, the DB won't allow it, and vice versa.


The exact analogy that came to mind for me.

Plus, relational databases don't just sit under a single application. There are usually multiple applications/services talking to them. Worse, humans connect to them and do all sorts of things they shouldn't do. That's the whole point of managing referential integrity in the DBMS, since you can only control "just write good code" across so many application domains.

Of course whether the performance tradeoff is worth it is a complicated decision for many of the reasons people have mentioned. But in 20 years of working with relational databases at big companies, I've seen few examples where the performance win exceeded the business risk.


In theory it is good practice, but in practice enforcing referential integrity is not free and it makes satisfying other non-functional requirements harder. Engineering is about tradeoffs. Foreign keys are overkill in modern architectures where DBs/tables are typically only written to by a single application. They solve a problem that lots of people don't really have anymore.


In the legacy project our team inherited, there's a lot of data consistency issues stemming from the lack of foreign keys (we're adding them back now). Data consistency is important because we deal with financial data. Typically, code and data change more often than DB schemas, so every release has a chance that someone will forget to update all relations manually in code, introducing dangling references, for example. Foreign keys help as an additional safety measure, because they don't change as often. I think you should care about the performance of foreign keys only when speed is more important than data consistency. It's a good default.


Personally, I think foreign keys are great in dev and bad at any scale where you have many db servers.

And as you point out, there are exceptions, like financial data. But not marketing funnels where you might throw everything away.

You should probably have different types of engineers working on such different projects as well.


> DBs/tables are typically only written to by a single application

I’ve seen several teams regret making this assumption. Unless you enforce draconian access control over a database, you’re going to discover that customer support and accounting and biz dev have been quietly relying on it.


In my experience almost every time I've heard people saying they don't want foreign keys because of "the performance hit" - they've been nowhere near the scale or the skill that would make it sensible.


From experience, you are very right. Also from experience, most teams are not good enough to write code of the quality that would let you give up referential integrity.


I don't think foreign keys can be replaced by good code. Probably their main advantage is what happens when there are edge cases or a DB write fails: they make sure referential integrity is maintained in exceptional circumstances.


    > ● SQL query compatibility
    > ● Minimal changes to the application
    > ● Runs MySQL in backend
    > ● Kubernetes native
    > ● Provides Kubernetes operator
    > We evaluated multiple candidates and chose Vitess
https://github.com/vitessio/vitess

But it's very light on numbers and doesn't show trade-offs or anything.


Vitess was used for all of YouTube (they moved to Spanner a couple of years ago); Vitess is proven and scalable tech.


> Vitess has been a core component of YouTube's database infrastructure since 2011, and has grown to encompass tens of thousands of MySQL nodes.

Would love to hear more about this implementation


Me too, especially what it does differently from manually sharding/partitioning your database. Is it just the automated sharding?


KubeCon talks should be out in the next month or so, and the talk has more details.


3 months from start to roll out for their large scale deploy is hella impressive.

Just trying to do basic scaling or other simple tasks sounded like an utter nightmare on their old system. Having autonomic computing at their back seems like an obvious win. Tell it your desired state & let the controller do the job.

The new paradigm for computer operations is so wonderful.


The timing slide caught my attention as well. 3 months for anything is fast...


They may well have had a good small-scale success already, it seems, but it still seems quite solid.


I wonder what game this was for.

To me this is validation that going the SQL route in almost 99% of new apps is the right way to go. It will be a rare case that you won't be able to scale out given how mature some of these technologies are.


Demonware handle the online services for the CoD franchise.


They handle online services for pretty much any Activision-published game, with carve-outs for Blizzard and King - they still do most of their own backend stuff themselves, with some collaborations.


MongoDB ftw. Even has graph capabilities.


It's web scale!


GitHub had DAYS of an outage because they unsuccessfully tried to scale out SQL rather than dealing with NoSQL.

There are very few things that structured NoSQL can't do, if you ignore reporting. Once you scale out traditional SQL, you aren't going to be using it for reporting either way, though.


This is a gross mischaracterization of what caused that outage, even just based on the public information from the (excellent) blog post [0] about it.

Automating master failover on MySQL without a human in the loop - in the topology GitHub used back then - is risky.

0: https://github.blog/2018-10-30-oct21-post-incident-analysis/


How is it a gross mischaracterization? You get master-master multi-regional replication out of the box with Cassandra, and GitHub spent an inordinate amount of engineering effort trying, unsuccessfully, to slap that on top of MySQL.


They have a PlanetScale logo on the last slide. Is this just to say "Thanks for your work on Vitess!" or are they using the PlanetScale service for some of this?


Interesting slide deck, but I feel we are missing some of the most interesting details out. Particularly, slides 24 and 25 could be an entire article on their own.

* How did you tune Vitess for resiliency? What were the tradeoffs and how was the performance?

* How did you migrate from shared in app config to single DB endpoint? (this is an issue we are facing right now)

* What do you mean by some queries being too shard aware? How did you optimize queries for efficient routing with Vitess?


The thing that kills you is IOPS per query.

At 30 TB the hot data set is huge and there is a long tail of queries hitting data not loaded into memory.

Plus they were saying 10k+ connections per server. So we are probably looking at 500k+ connections.

There is also a regular influx of unoptimized / badly written queries as people add features/games etc. Plus you can end up with random spikes.

This setup is likely optimized for high read availability. The slides mention the mental overhead of failovers. This is a thing. Try to explain and help each team understand how to do those safely. Downtime is also not an option.

So there is quite a bit of information and context (likely) missing. I used to manage a similar-scale zoo of MySQL (similar peak QPS, less total data iirc). I am not sure I would subscribe so much to the Vitess-as-solution side though. Galera and Group Replication can give you hassle-free failovers; add some read replicas and you have a really nice DB setup that can scale horizontally and vertically until you hit the per-server write limitations.


Ahh, I wanted to go see this talk at KubeCon yesterday but the room was completely full. Glad to see the finer points got posted here.


Anyone have more blog posts about video game backend tech like this one?


None that I know of. I do wish the gaming industry in general produced more content like this, from networking (extremely latency sensitive) to DBs to in-game frame rate, where they use techniques for faster response times that you don't see in browsers and editors.

But I think the industry is being pushed hard and doesn't have much time to produce content for our reading interest.


Very interesting read. I've worked on game services, and there are a lot of scaling challenges compared to "traditional" apps. The majority of traffic is write heavy (many common patterns for scaling read traffic don't help you), with massive spikes of launch-day players, followed by big drop-offs and occasional spikes with content updates. The height of these spikes can also be unpredictable; you're never sure how many players you'll get. The team I worked on had success with DynamoDB, which did a lot of the operational heavy lifting to scale up and down as well as handle the scale.

I would have liked a bit more specifics on their manual sharding solution, and how the Vitess implementation differs.


I assume Activision actually knows how to scale databases, but has "join queues" for hype? I think everyone knows the infamous Blizzard queues in release week, where you have to wait an hour with only 5000 people in front of you. Happened recently with the OW2 release.


I have to be careful what I say here, given I work for ATVI, I'm former Demonware myself.

I do feel safe in saying though that the login queues are _definitely_ not hype.

They're designed to constrain a quite-complex distributed system to a login rate that has been load-tested thoroughly, e.g. they know it'll work at that rate.


Databases are not the only contended-for resource in online games.

Those queues are usually the result of intentional capacity planning. Overprovisioning is expensive.


> Those queues are usually the result of intentional capacity planning. Overprovisioning is expensive.

I'd imagine they could spin up new servers on demand.


Could, but after near 20 years of WoW and however many expansion launches with big ole wait queues each time, they really don't seem to bother.


> PROBLEM: Certain Vitess scaling operations fail on FK violation

> SOLUTION: Temporarily disable FK checks on the target shards

I always found disabling FK checks weird. As in, once you disable them, isn't the integrity permanently lost? Or does it check the integrity again once they're reactivated?


Just the right time to ask this question. We have an RFC open to add support for foreign keys in Vitess: https://github.com/vitessio/vitess/issues/12967


It's insane that online gaming even works in near realtime, and even more so in the past. You can buy old WoW servers. That was a world.


I'd be extremely interested to understand what was the bottleneck that made sharding necessary. Storage capacity on a single node? Memory capacity? CPU? Memory bandwidth? I/O bandwidth? Depends on the specific app? Depends on the specific query?


This is an interesting discussion on database performance and scaling. Thanks for sharing!


What kind of queries per second did you see while running on kubernetes, compared to VMs?


Considering MySQL 5.x will EOL this year [1], and MySQL 8.0 will EOL in 2026, does anyone know if MySQL 9.0 will come soon?

[1] https://endoflife.software/applications/databases/mysql


It would have been interesting to know what alternatives they considered. Perhaps this would have been easy to solve using Nomad and TiDB.

So it's a story of people who used Kubernetes and MySQL and continued to use Kubernetes and MySQL. The end.


At my previous company when we made the decision to go with Vitess it was because we had confidence that we could accomplish our scalability goals using Vitess for our critical system in a predictable amount of time with a predictable amount of effort and with a safe, incremental, non-disruptive, revertable, testable rollout without downtime.

We had no way to have that kind of confidence about migrating off MySQL to an entirely different database (like TiDB). If we'd been addressing a new storage use case from scratch instead of migrating an existing one, the case for using something other than MySQL might have been stronger.



It seems that, on that list, it's almost entirely Chinese companies that have migrated to TiDB.

Thanks for the list (and the graph), very interesting!



