Hacker News new | past | comments | ask | show | jobs | submit | franckpachot's comments login

Are NULLs treated as distinct or duplicates in your UNIQUE INDEX? It depends SQL != NoSQL != some specific implementations Which option offers the best developer experience? Comments welcome


YugabyteDB supports much more than basic things. I've been a 3+ years dev advocate for Yugabyte, and I've always seen triggers. LISTEN/NOTIFY is not yet there (it is an anti-pattern for horizontal scalability, but we will add it as some frameworks use it). Not yet 100% compatible, but there's no Distributed SQL with more PG compatibility. Many (Spanner, CRDB, DSQL) are only wire protocol + dialect. YugabyteDB runs Postgres code and provides the same behavior (locks, isolation levels, datatype arithmetic...)


Index creation should not be controlled by statement timeout, but backfill_index_client_rpc_timeout_ms which defaults to 24 hours. May have been lower in old versions


It depends on the use cases and performance goals. You may want to distribute the rows that you insert, and then a random UUID makes sense. However, it is too much distributed for B-Tree indexes and the problem is not only cache but the amount of modifications due to leaf block splits. This includes MySQL which stores the primary key in a B-Tree index. Other use cases may benefit from colocating the rows that are inserted together. Think of timeseries, or simply an order entry where you query the recent orders. A sequence makes sense there, to have a good correlation between the index (on time) and the primary key. This avoids too many random reads with low cache hits.

It is wrong to think that distributed databases do not need sequences. YugabyteDB allows it. With YugabyteDB you use hash sharding to distribute them to a small number of hash ranges, so that they don0t go all at the same place, but are not scattered across the whole database. CockroachDB and Spanner doesn't have hash sharding and that's why they do not recommend sequences. There are also use cases where range sharding on the sequence is good when you don't need to distribute the data ingest, but benefit from their colocation when querying.


1. Regarding performance, I recently did a simple test. CockroachDB uses a considerable number of CPU instructions compared to YugabyteDB: https://dev.to/yugabyte/comparing-sql-engines-by-cpu-instruc...

Writing a database from scratch is not easy. YugabyteDB uses some PostgreSQL, Kudu, and RocksDB code that has been heavily optimized before. Those are good codebases, and only some parts need to be enhanced to make them distributed.

2. Their Go version of RocksDB, Peeble, seems less efficient. They did it for a good reason. They didn't have the C++ skills to enhance RocksDB itself.

3. The repo holds more than the database.

C: is the SQL layer, based on PostgreSQL

C++: the transactional distributed storage, heavily modified Kudu and RocksDB

Java: some regression tests, the managed service automation, sample applications

TS: the Graphical User Interface

Python: some tooling to build the releases, some support tools

The database itself is C and C++


YugabyteDB is and will always be Apache2. It is PostgreSQL compatible (the query layer is a fork of PostgreSQL) so the migration from CockroachDB, which implements a subset of PostgreSQL features, is easy.


What I do not understand is they say "we explored CockroachDB, TiDB, Spanner, and Vitess". Those are not compatible with PostgreSQL beyond the protocol and migration would require massive rewrites and tests to get the same behavior. YugabyteDB is using PostgreSQL for the SQL processing, to provide same features and behavior and distributes with a Spanner-like architecture. I'm not saying that there's no risk and no efforts, but they are limited. And testing is easy as you don't have to change the application code. I don't understand why they didn't spend a few days on a proof of concept with YugabyteDB and explored only the solutions where application cannot work as-is.


I think Denis addressed this in his post: "Overall, as an engineer, you will never regret taking part in the development of a sharding solution. It’s a complex engineering problem with many non-trivial tasks to solve". In other words, it might be not invented here syndrome (with all due respect to Figma team). Or there might be more nuances we are unaware about.


they wanted to stay on RDS, maybe not "them", maybe it was the decision of some manager

also, it's... strange that they had 18 months and "extremely tight timeline pressure" we simply don't know enough about the situation


Maybe it’s just a matter of it being difficult to list all the things they didn’t use. The Figma article itself is a little more clear on their goals…

It’s not really just postgres compatibility they are after, but compatibility with the Amazon RDS version of postgres. They also wanted to have something they could adopt incrementally and back out of when something unanticipated goes wrong.

Also, I think yugabyte uses an older version of the postgres processing engine, which may or may not be a big deal, depending on what they are using.


Maybe it is not the planner. On difference between those other databases and PostgreSQL is that their plan do not depend on how freshly the table was vacuumed. The cost of your "correct index" may becomes worse when the rows are updated until they are vacuumed.


In all databases you will avoid bad plans (and the unpredictable performance related to plan changes) by providing the right index. You have two selective filters: WHERE and LIMIT so the right index have both


If changing random_page_cost from 4 to 2 makes a difference, then probably there are no good indexes. The choice between Seq Scan and Index Scan should be obvious without depending on small adjustments or one day, with slightly different data distribution the plan will flip to a bad one


Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: