Hacker News | franckpachot's comments

It depends on the use case and performance goals. You may want to distribute the rows that you insert, and then a random UUID makes sense. However, random values are too scattered for B-Tree indexes, and the problem is not only cache misses but also the number of modifications caused by leaf block splits. This includes MySQL, which stores the primary key in a B-Tree index. Other use cases may benefit from colocating the rows that are inserted together. Think of time series, or simply an order entry system where you query the recent orders. A sequence makes sense there, to get a good correlation between the index (on time) and the primary key. This avoids too many random reads with low cache hit rates.
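To make the scattering argument concrete, here is a rough Python sketch of a toy B-Tree leaf level. It is not how any real database implements its indexes: the leaf capacity, the 50/50 split policy, the batch size, and all function names are assumptions chosen for illustration. It counts how many distinct leaf blocks are modified per batch of inserts, for sequential keys versus random 128-bit keys standing in for random UUIDs.

```python
import bisect
import random

LEAF_CAPACITY = 64  # assumed number of keys per leaf block (illustrative only)
BATCH = 1000        # number of inserts per measured batch


def leaves_modified_per_batch(keys):
    """Insert keys into a toy B-Tree leaf level.

    Returns a list with, for each batch of BATCH inserts, the number of
    distinct leaf blocks that were modified (inserted into or split).
    """
    leaves = [[]]     # each leaf is a sorted list of keys
    separators = []   # separators[i] is the lowest key of leaves[i + 1]
    modified = set()  # identities of the leaves touched in the current batch
    counts = []

    for n, key in enumerate(keys, start=1):
        i = bisect.bisect_right(separators, key)  # leaf that owns this key
        leaf = leaves[i]
        bisect.insort(leaf, key)
        modified.add(id(leaf))

        if len(leaf) > LEAF_CAPACITY:
            # Naive 50/50 split: move the upper half to a new right sibling.
            mid = len(leaf) // 2
            right = leaf[mid:]
            del leaf[mid:]
            leaves.insert(i + 1, right)
            separators.insert(i, right[0])
            modified.add(id(right))

        if n % BATCH == 0:
            counts.append(len(modified))
            modified.clear()
    return counts


def sequential_keys(n):
    """Keys as a sequence would generate them: 0, 1, 2, ..."""
    return range(n)


def random_keys(n, seed=42):
    """Random 128-bit integers, used here as a stand-in for random UUIDs."""
    rng = random.Random(seed)
    return [rng.getrandbits(128) for _ in range(n)]


if __name__ == "__main__":
    print("sequential:", leaves_modified_per_batch(sequential_keys(20000))[-1])
    print("random    :", leaves_modified_per_batch(random_keys(20000))[-1])
```

In this toy model the sequential keys only ever touch the rightmost leaves, while the random keys touch most of the leaves in every batch. Real engines add optimizations (for example, different split points for rightmost inserts), so treat the numbers as an illustration of the trend only.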

It is wrong to think that distributed databases do not need sequences. YugabyteDB supports them. With YugabyteDB you use hash sharding to distribute the sequence values to a small number of hash ranges, so that they don't all go to the same place, but are not scattered across the whole database. CockroachDB and Spanner don't have hash sharding and that's why they do not recommend sequences. There are also use cases where range sharding on the sequence is good: when you don't need to distribute the data ingest, but benefit from colocation when querying.
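As a rough illustration of the idea of hashing sequence values into a small number of hash ranges, here is a toy Python model. It is not YugabyteDB's actual hash function or tablet layout: the use of SHA-256, the 16-bit hash space, the number of ranges, and the function names are all assumptions made for this sketch.

```python
import hashlib

HASH_SPACE = 65536  # assumed 16-bit hash space (illustrative)
NUM_RANGES = 8      # assumed small number of hash ranges (illustrative)


def hash_range_for(seq_value):
    """Map a sequence value to one of NUM_RANGES contiguous hash ranges."""
    digest = hashlib.sha256(str(seq_value).encode()).digest()
    hash16 = int.from_bytes(digest[:2], "big")  # value in 0..65535
    return hash16 * NUM_RANGES // HASH_SPACE


def distribute(seq_values):
    """Group sequence values by the hash range they land in."""
    ranges = {r: [] for r in range(NUM_RANGES)}
    for value in seq_values:
        ranges[hash_range_for(value)].append(value)
    return ranges


if __name__ == "__main__":
    for r, values in distribute(range(1, 1001)).items():
        print(f"range {r}: {len(values)} values, first few: {values[:5]}")
```

Consecutive sequence values land in different ranges, so the ingest is not concentrated in one place, yet there are only a handful of insertion points rather than one per row.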


1. Regarding performance, I recently did a simple test. CockroachDB uses considerably more CPU instructions than YugabyteDB: https://dev.to/yugabyte/comparing-sql-engines-by-cpu-instruc...

Writing a database from scratch is not easy. YugabyteDB uses some PostgreSQL, Kudu, and RocksDB code that has been heavily optimized before. Those are good codebases, and only some parts need to be enhanced to make them distributed.

2. Their Go version of RocksDB, Pebble, seems less efficient. They did it for a good reason: they didn't have the C++ skills to enhance RocksDB itself.

3. The repo holds more than the database.

C: the SQL layer, based on PostgreSQL

C++: the transactional distributed storage, heavily modified Kudu and RocksDB

Java: some regression tests, the managed service automation, sample applications

TS: the Graphical User Interface

Python: some tooling to build the releases, some support tools

The database itself is C and C++


YugabyteDB is and will always be Apache2. It is PostgreSQL compatible (the query layer is a fork of PostgreSQL) so the migration from CockroachDB, which implements a subset of PostgreSQL features, is easy.


What I do not understand is that they say "we explored CockroachDB, TiDB, Spanner, and Vitess". Those are not compatible with PostgreSQL beyond the protocol, and migration would require massive rewrites and tests to get the same behavior. YugabyteDB uses PostgreSQL for the SQL processing, to provide the same features and behavior, and distributes with a Spanner-like architecture. I'm not saying that there's no risk and no effort, but they are limited. And testing is easy as you don't have to change the application code. I don't understand why they didn't spend a few days on a proof of concept with YugabyteDB and explored only the solutions where the application cannot work as-is.


I think Denis addressed this in his post: "Overall, as an engineer, you will never regret taking part in the development of a sharding solution. It’s a complex engineering problem with many non-trivial tasks to solve". In other words, it might be not-invented-here syndrome (with all due respect to the Figma team). Or there might be more nuances we are unaware of.


they wanted to stay on RDS, maybe not "them", maybe it was the decision of some manager

also, it's... strange that they had 18 months and "extremely tight timeline pressure". we simply don't know enough about the situation


Maybe it’s just a matter of it being difficult to list all the things they didn’t use. The Figma article itself is a little more clear on their goals…

It’s not really just postgres compatibility they are after, but compatibility with the Amazon RDS version of postgres. They also wanted to have something they could adopt incrementally and back out of when something unanticipated goes wrong.

Also, I think yugabyte uses an older version of the postgres processing engine, which may or may not be a big deal, depending on what they are using.


Maybe it is not the planner. One difference between those other databases and PostgreSQL is that their plans do not depend on how recently the table was vacuumed. The cost of your "correct index" may become worse as rows are updated, until they are vacuumed.


In all databases you will avoid bad plans (and the unpredictable performance related to plan changes) by providing the right index. You have two selective filters, WHERE and LIMIT, so the right index has both
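A small sketch of what "an index that has both" can look like, using Python's built-in sqlite3 module only because it is easy to run anywhere. This is SQLite, not PostgreSQL or YugabyteDB, and the planners differ. The table, column, and index names are hypothetical, and the sketch assumes the LIMIT goes with an ORDER BY, so the index covers the WHERE column followed by the ORDER BY column.

```python
import sqlite3

# Hypothetical query: the ten most recent orders of one customer.
QUERY = (
    "SELECT id, created_at FROM orders "
    "WHERE customer_id = ? ORDER BY created_at DESC LIMIT 10"
)


def query_plan(conn):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return [row[3] for row in rows]


def plans_before_and_after_index():
    """Compare the plan without an index and with an index on both columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "created_at TEXT, amount REAL)"
    )
    before = query_plan(conn)
    # The index serves the WHERE filter and returns rows already ordered,
    # so the LIMIT can stop after reading ten index entries.
    conn.execute(
        "CREATE INDEX orders_customer_created "
        "ON orders (customer_id, created_at)"
    )
    after = query_plan(conn)
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = plans_before_and_after_index()
    print("without index:", before)
    print("with index   :", after)
```

Without the index, the plan needs a separate sort step for the ORDER BY. With it, the sort step disappears because the index provides both the filter and the order.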


If changing random_page_cost from 4 to 2 makes a difference, then there are probably no good indexes. The choice between Seq Scan and Index Scan should be obvious without depending on small adjustments. Otherwise, one day, with a slightly different data distribution, the plan will flip to a bad one


Even if bugs are fixed instantly, nobody will apply a patch in production without prior testing. Changing system-wide behavior to fix a single query may make things worse. Hints are the only way to fix a problem at the scope of one statement with the guarantee that it doesn't break others


All those methods are guesswork. With hints you can take a scientific approach to understand why the bad plan was chosen and find the right plan. Then, you can address the root cause. join_collapse_limit=1 may set the join order but not the join direction, so that's not enough if cardinality is misestimated. And pg_hint_plan can set this parameter for one statement if that's what you want, which is better than setting it for the whole transaction


(YugabyteDB Developer Advocate here)

@flagged24 If you can send me more info about your migration problems, I would love to look at it (fpachot@yugabyte.com). The postgres-compatibility, performance, and YB Voyager are improving from feedback.

The default parameters may not be the best for trying an existing app. Here is a docker image I've made with the best defaults for a quick start: https://github.com/FranckPachot/yb-pglike to check the compatibility, and then look at more tuning.

@gunapologist99 I'm not a big fan of benchmarks, especially on fast-evolving products. The best approach is to test with something similar to your app and, if it is slow, open an issue (github, forum, slack) to be sure it's not a configuration issue or a recently fixed bug.

Franck


Thanks, will do when I give it another try.

