TiDB: A Raft-based HTAP Database [pdf] (vldb.org)
50 points by Lilian_Lee 30 days ago | 7 comments

Here's a further reference: https://pingcap.com/blog/how-tidb-htap-makes-truly-hybrid-wo... This post introduces the design details of TiDB's HTAP architecture, including the real-time updatable columnar engine, the multi-Raft replication strategy, and smart selection.

These hybrid databases are really exciting. I also really like how mature PingCAP is from a technical perspective. On the benchmarking page, this paragraph really stood out.

> If you don't have reproducible, fair benchmarking, you can be blinded by your own hubris. Our contributors and maintainers depend on benchmarks to ensure that as we strive to improve TiDB, we don't negatively impact its performance. To us, not having benchmarks is like not having logging or metrics.

One important limitation is that TiDB only offers snapshot isolation, not serializability.

Is serializability desirable in a distributed database?

It's just as desirable as with a simple database, since it avoids certain anomalies (write skew). Snapshot isolation is strong enough for many operations, but it adds the risk that you run into problematic anomalies you did not anticipate.

> In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur “first”, and be visible to the other. In contrast, snapshot isolation permits write skew anomalies.
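To make the quoted definition concrete, here is a minimal sketch of write skew in a toy in-memory model of snapshot isolation. Everything in it (`SnapshotStore`, `Txn`, `run_write_skew`, and the invariant V1 + V2 >= 1) is a hypothetical illustration, not how TiDB or any real engine is implemented.

```python
# Toy model of snapshot isolation; all names here are made up for illustration.
class SnapshotStore:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit time of last write per key
        self.clock = 0                           # global commit counter

    def begin(self):
        return Txn(self)


class Txn:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)  # reads see this frozen copy
        self.start = store.clock
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Snapshot isolation only checks write-write conflicts:
        # abort if a key we wrote was committed by someone else after we started.
        for key in self.writes:
            if self.store.version[key] > self.start:
                raise RuntimeError(f"write-write conflict on {key}")
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock


def run_write_skew():
    # Assumed application invariant: V1 + V2 >= 1.
    store = SnapshotStore({"V1": 1, "V2": 1})
    t1, t2 = store.begin(), store.begin()
    # Both transactions read the overlapping set {V1, V2} from their snapshots.
    if t1.read("V1") + t1.read("V2") >= 2:
        t1.write("V1", 0)  # T1 updates only V1
    if t2.read("V1") + t2.read("V2") >= 2:
        t2.write("V2", 0)  # T2 updates only V2
    t1.commit()
    t2.commit()            # no write-write conflict, so this also commits
    return store.data


print(run_write_skew())
```

Both commits succeed because the write sets are disjoint, yet the final state breaks the invariant that each transaction checked on its own snapshot.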

TiDB has a workaround, but I don't know how well that works in practice.

> In TiDB, you can use SELECT … FOR UPDATE statement to avoid write skew anomaly. In this case, TiDB will use locks to serialize writes together with MVCC to gain some of the performance gains and still support the stronger “serializability” level of isolation.

Serializability is more expensive of course, since the database needs to track what was read and not only what was modified. It is possible to achieve this in a distributed database; FoundationDB is one example that supports it.
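The extra cost of tracking reads can be sketched with a toy optimistic scheme: each transaction records the keys it read, and commit aborts if any of them changed since the transaction started. This is an assumption-laden illustration in the spirit of optimistic concurrency control; the names (`Store`, `SerializableTxn`, `try_write_skew`) are hypothetical, and it is not claimed to be how FoundationDB or TiDB implements it.

```python
# Toy optimistic validation with read-set tracking; names are illustrative only.
class Store:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}  # commit time of last write per key
        self.clock = 0


class SerializableTxn:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)
        self.start = store.clock
        self.reads = set()   # the extra bookkeeping snapshot isolation avoids
        self.writes = {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate reads as well as writes: abort if anything we touched
        # was committed by another transaction after we started.
        for key in self.reads | set(self.writes):
            if self.store.version[key] > self.start:
                raise RuntimeError(f"conflict on {key}")
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.clock


def try_write_skew():
    # Same scenario as write skew: invariant V1 + V2 >= 1, disjoint writes.
    store = Store({"V1": 1, "V2": 1})
    t1, t2 = SerializableTxn(store), SerializableTxn(store)
    if t1.read("V1") + t1.read("V2") >= 2:
        t1.write("V1", 0)
    if t2.read("V1") + t2.read("V2") >= 2:
        t2.write("V2", 0)
    t1.commit()
    try:
        t2.commit()
        aborted = False
    except RuntimeError:
        aborted = True  # T2 read V1, which T1 changed after T2 started
    return store.data, aborted


print(try_write_skew())
```

Here the second transaction is rejected because its read set overlaps the first transaction's write, so the invariant survives; a real client would retry the aborted transaction against fresh data.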


I got the TiDB coffee mug, it looks better than the MongoDB mug

Tandem (who sold high-availability via redundant hardware) coffee mugs had two handles. What are some other companies whose mugs reflected their value proposition?
