Are NULLs treated as distinct or duplicates in your UNIQUE INDEX?
It depends: SQL != NoSQL, and specific implementations differ from both.
Which option offers the best developer experience? Comments welcome
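To make the question concrete, here is a minimal sketch using SQLite from Python (table and column names are made up). SQLite's default, like PostgreSQL's historical default, treats NULLs as distinct, so a UNIQUE constraint accepts multiple NULL rows; PostgreSQL 15 added `UNIQUE NULLS NOT DISTINCT` to flip that behavior.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (email TEXT UNIQUE)")
con.execute("INSERT INTO t VALUES (NULL)")
con.execute("INSERT INTO t VALUES (NULL)")  # no UNIQUE violation: NULLs are distinct
nulls = con.execute("SELECT COUNT(*) FROM t WHERE email IS NULL").fetchone()[0]
print(nulls)  # 2: both NULL rows coexist under the UNIQUE constraint
```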
YugabyteDB supports much more than the basics. I've been a dev advocate for Yugabyte for 3+ years, and triggers have always been supported. LISTEN/NOTIFY is not there yet (it is an anti-pattern for horizontal scalability, but we will add it because some frameworks use it). It's not yet 100% compatible, but no Distributed SQL database has more PG compatibility. Many (Spanner, CRDB, DSQL) offer only the wire protocol plus a dialect. YugabyteDB runs Postgres code and provides the same behavior (locks, isolation levels, datatype arithmetic...)
Index creation should not be controlled by the statement timeout but by backfill_index_client_rpc_timeout_ms, which defaults to 24 hours. It may have been lower in old versions.
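As a sketch only: the flag is a gflag, presumably passed to the YugabyteDB server process at startup (which server process owns it is an assumption here, not stated above); the value shown is the stated 24-hour default expressed in milliseconds.

```
--backfill_index_client_rpc_timeout_ms=86400000   # 24 h * 3600 s * 1000 ms
```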
It depends on the use case and the performance goals. You may want to distribute the rows that you insert, and then a random UUID makes sense. However, it is too scattered for B-Tree indexes, and the problem is not only cache misses but also the amount of modifications due to leaf block splits. This includes MySQL, where InnoDB stores the table clustered on the primary key in a B-Tree index.
Other use cases may benefit from colocating the rows that are inserted together. Think of time series, or simply an order-entry system where you query the recent orders. A sequence makes sense there, to get a good correlation between the index on time and the primary key. This avoids too many random reads with low cache-hit ratios.
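A toy sketch of the leaf-split argument (not a benchmark): using a sorted Python list as a stand-in for B-Tree leaves, random UUID keys almost always land in the middle of the sorted order (mid-leaf insertions that trigger block splits), while sequential keys always append to the tail, touching only the rightmost leaf.

```python
import bisect
import uuid

# Random UUID keys: count inserts that land mid-list rather than at the tail.
random_keys, random_mid = [], 0
for _ in range(1000):
    k = uuid.uuid4().hex
    pos = bisect.bisect_left(random_keys, k)
    if pos != len(random_keys):   # not appended at the tail
        random_mid += 1
    random_keys.insert(pos, k)

# Sequential keys (a sequence/serial column): every insert goes to the tail.
seq_keys, seq_mid = [], 0
for i in range(1000):
    pos = bisect.bisect_left(seq_keys, i)
    if pos != len(seq_keys):
        seq_mid += 1
    seq_keys.insert(pos, i)

print(seq_mid)            # 0: sequential inserts always append
print(random_mid > 900)   # random UUIDs almost always land mid-index
```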
It is wrong to think that distributed databases do not need sequences. YugabyteDB allows them. With YugabyteDB you use hash sharding to distribute the sequence values over a small number of hash ranges, so that they don't all go to the same place but are not scattered across the whole database either. CockroachDB and Spanner don't have hash sharding, and that's why they do not recommend sequences. There are also use cases where range sharding on the sequence is good: when you don't need to distribute the data ingest but benefit from colocation when querying.
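A sketch of that trade-off in YugabyteDB DDL (table and column names are made up; HASH/ASC in the primary key is YugabyteDB's sharding clause):

```sql
-- Hash sharding: consecutive sequence values spread over a bounded set
-- of hash ranges instead of all hitting one tablet.
CREATE TABLE orders (
  id bigint GENERATED ALWAYS AS IDENTITY,
  created_at timestamptz DEFAULT now(),
  PRIMARY KEY (id HASH)
);
-- Range sharding instead: PRIMARY KEY (id ASC) keeps recent inserts
-- colocated, which helps queries on recent data when the ingest rate
-- does not need to be distributed.
```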
Writing a database from scratch is not easy. YugabyteDB reuses PostgreSQL, Kudu, and RocksDB code that had already been heavily optimized. Those are good codebases, and only some parts need to be enhanced to make them distributed.
2. Their Go version of RocksDB, Pebble, seems less efficient. They did it for a good reason: they didn't have the C++ skills to enhance RocksDB itself.
3. The repo holds more than the database.
C: the SQL layer, based on PostgreSQL
C++: the transactional distributed storage, heavily modified Kudu and RocksDB
Java: some regression tests, the managed service automation, sample applications
TS: the Graphical User Interface
Python: some tooling to build the releases, some support tools
YugabyteDB is and will always be Apache 2.0 licensed. It is PostgreSQL compatible (the query layer is a fork of PostgreSQL), so the migration from CockroachDB, which implements a subset of PostgreSQL features, is easy.
What I do not understand is that they say "we explored CockroachDB, TiDB, Spanner, and Vitess". Those are not compatible with PostgreSQL beyond the protocol, and migration would require massive rewrites and tests to get the same behavior. YugabyteDB uses PostgreSQL for the SQL processing, to provide the same features and behavior, and distributes storage with a Spanner-like architecture.
I'm not saying that there's no risk and no effort, but they are limited. And testing is easy, as you don't have to change the application code.
I don't understand why they didn't spend a few days on a proof of concept with YugabyteDB and explored only solutions where the application cannot work as-is.
I think Denis addressed this in his post: "Overall, as an engineer, you will never regret taking part in the development of a sharding solution. It’s a complex engineering problem with many non-trivial tasks to solve". In other words, it might be not-invented-here syndrome (with all due respect to the Figma team). Or there might be more nuances we are unaware of.
Maybe it’s just a matter of it being difficult to list all the things they didn’t use. The Figma article itself is a little clearer about their goals…
It’s not really just Postgres compatibility they are after, but compatibility with the Amazon RDS version of Postgres. They also wanted something they could adopt incrementally and back out of when something unanticipated goes wrong.
Also, I think Yugabyte uses an older version of the Postgres processing engine, which may or may not be a big deal, depending on what they are using.
Maybe it is not the planner. One difference between those other databases and PostgreSQL is that their plans do not depend on how recently the table was vacuumed. The cost of your "correct index" may become worse as rows are updated, until they are vacuumed.
In all databases you avoid bad plans (and the unpredictable performance related to plan changes) by providing the right index. You have two selective filters, WHERE and LIMIT, so the right index covers both.
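A sketch with a hypothetical orders table: the index puts the WHERE equality column first and the ORDER BY column next, so the scan returns rows already in order and can stop as soon as the LIMIT is satisfied.

```sql
-- Hypothetical query: WHERE is selective and LIMIT bounds the rows read.
SELECT * FROM orders
 WHERE customer_id = 42
 ORDER BY created_at DESC
 LIMIT 10;

-- The "right index" covers both filters: equality column first, then the
-- ORDER BY column, so no sort is needed and the scan stops after 10 rows.
CREATE INDEX orders_by_customer_recent
    ON orders (customer_id, created_at DESC);
```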
If changing random_page_cost from 4 to 2 makes a difference, then there are probably no good indexes. The choice between Seq Scan and Index Scan should be obvious without depending on small adjustments, or one day, with a slightly different data distribution, the plan will flip to a bad one.
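For example, in a psql session (table and query are hypothetical):

```sql
SET random_page_cost = 2;  -- session-level; the PostgreSQL default is 4
EXPLAIN SELECT * FROM orders
 WHERE customer_id = 42
 ORDER BY created_at DESC
 LIMIT 10;
-- If the plan flips between Seq Scan and Index Scan on this change alone,
-- the existing indexes are probably not a clear match for the query.
```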