
TiDB: A Raft-based HTAP Database [pdf] - Lilian_Lee
http://www.vldb.org/pvldb/vol13/p3072-huang.pdf
======
Lilian_Lee
Here's another reference:
https://pingcap.com/blog/how-tidb-htap-makes-truly-hybrid-workloads-possible
This post introduces the design details of TiDB's HTAP architecture, including
the real-time updatable columnar engine, the multi-Raft replication strategy,
and smart selection (the optimizer choosing between row and columnar storage).

~~~
sitkack
These hybrid databases are really exciting. I also really like how technically
mature PingCAP is. This paragraph on their benchmarking page really stood out:

> If you don't have reproducible, fair benchmarking, you can be blinded by
> your own hubris. Our contributors and maintainers depend on benchmarks to
> ensure that as we strive to improve TiDB, we don't negatively impact its
> performance. To us, not having benchmarks is like not having logging or
> metrics.

------
CodesInChaos
One important limitation is that TiDB only offers snapshot isolation, not
serializability.

~~~
nextaccountic
Is serializability desirable in a distributed database?

~~~
CodesInChaos
It's just as desirable as in a single-node database, since it rules out
certain anomalies such as write skew. For many operations snapshot isolation is
strong enough, but it adds the risk of running into anomalies you did not
anticipate.

> In a write skew anomaly, two transactions (T1 and T2) concurrently read an
> overlapping data set (e.g. values V1 and V2), concurrently make disjoint
> updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently
> commit, neither having seen the update performed by the other. Were the
> system serializable, such an anomaly would be impossible, as either T1 or T2
> would have to occur “first”, and be visible to the other. In contrast,
> snapshot isolation permits write skew anomalies.
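To make the quoted definition concrete, here is a minimal sketch of write skew, assuming a toy in-memory store (the names `SnapshotStore`, `SITxn` and `write_skew_demo` are hypothetical, not TiDB or TiKV code). It models snapshot isolation as "each transaction reads a frozen snapshot, and commit aborts only on write-write conflicts". The invariant V1 + V2 >= 1 is an assumed example constraint, not something from the quote.

```python
# Hypothetical in-memory model of snapshot isolation (SI); not TiDB/TiKV code.
class SnapshotStore:
    def __init__(self, data):
        self.data = dict(data)
        self.version = 0                        # commit counter
        self.last_write = {k: 0 for k in data}  # key -> version of its last committed write

    def begin(self):
        return SITxn(self)


class SITxn:
    def __init__(self, store):
        self.store = store
        self.start = store.version
        self.snapshot = dict(store.data)  # every read sees this frozen snapshot
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # "First committer wins": only write-write conflicts abort a transaction.
        # Keys that were merely read are never checked.
        if any(self.store.last_write.get(k, 0) > self.start for k in self.writes):
            return False
        self.store.version += 1
        for k, v in self.writes.items():
            self.store.data[k] = v
            self.store.last_write[k] = self.store.version
        return True


def write_skew_demo():
    # Assumed invariant for illustration: V1 + V2 >= 1 (e.g. at least one doctor on call).
    store = SnapshotStore({"V1": 1, "V2": 1})
    t1, t2 = store.begin(), store.begin()
    if t1.read("V1") + t1.read("V2") >= 2:  # T1 sees both set, so it clears V1
        t1.write("V1", 0)
    if t2.read("V1") + t2.read("V2") >= 2:  # T2 sees the same snapshot, clears V2
        t2.write("V2", 0)
    return t1.commit(), t2.commit(), store.data


print(write_skew_demo())  # prints (True, True, {'V1': 0, 'V2': 0})
```

Both commits succeed because their write sets are disjoint, and the store ends with V1 = V2 = 0, violating the invariant that each transaction checked on its own.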

TiDB has a workaround, but I don't know how well that works in practice.

> In TiDB, you can use SELECT … FOR UPDATE statement to avoid write skew
> anomaly. In this case, TiDB will use locks to serialize writes together with
> MVCC to gain some of the performance gains and still support the stronger
> “serializability” level of isolation.
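A rough sketch of why the `SELECT … FOR UPDATE` workaround helps, assuming a toy lock table (the names `LockingStore`, `select_for_update` and `locking_demo` are hypothetical, not TiDB's implementation). It assumes locking reads take exclusive row locks and return the latest committed values. A real database would make the second transaction wait; this sequential model signals "would block" with `None` and lets it retry.

```python
# Hypothetical model of locking reads (SELECT ... FOR UPDATE); not TiDB's implementation.
class LockingStore:
    def __init__(self, data):
        self.data = dict(data)
        self.locks = {}  # key -> id of the transaction holding the exclusive lock

    def select_for_update(self, txn_id, keys):
        # Lock all keys or none. A real database would make the caller wait;
        # here we return None to signal "would block" and let the caller retry.
        if any(self.locks.get(k, txn_id) != txn_id for k in keys):
            return None
        for k in keys:
            self.locks[k] = txn_id
        # In this model, locking reads return the latest committed values.
        return {k: self.data[k] for k in keys}

    def commit(self, txn_id, writes):
        for k, v in writes.items():
            if self.locks.get(k) != txn_id:
                raise RuntimeError(f"{txn_id} wrote {k} without holding its lock")
            self.data[k] = v
        # Release every lock held by this transaction.
        self.locks = {k: owner for k, owner in self.locks.items() if owner != txn_id}


def locking_demo():
    store = LockingStore({"V1": 1, "V2": 1})
    rows = store.select_for_update("T1", ["V1", "V2"])
    t2_blocked = store.select_for_update("T2", ["V1", "V2"]) is None
    store.commit("T1", {"V1": 0} if rows["V1"] + rows["V2"] >= 2 else {})
    # T2 retries once T1 has released its locks and now sees V1 = 0.
    rows = store.select_for_update("T2", ["V1", "V2"])
    store.commit("T2", {"V2": 0} if rows["V1"] + rows["V2"] >= 2 else {})
    return t2_blocked, store.data


print(locking_demo())  # prints (True, {'V1': 0, 'V2': 1})
```

Because both transactions lock the rows they read, T2 cannot act on the stale "both are 1" view, so it leaves V2 alone. The catch is that the application has to remember to use locking reads wherever write skew matters.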

Serializability is more expensive, of course, since the database needs to track
what was read and not only what was modified. It is possible to achieve this
in a distributed database; FoundationDB is one example that supports it.
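The "track what was read" cost can be sketched as optimistic commit-time validation. This is a simplified model only loosely in the spirit of FoundationDB's read-conflict checking, not its actual code; `SerializableStore`, `OccTxn` and `serializable_demo` are hypothetical names. The only change from a plain snapshot model is the extra read set, and commit aborts if anything the transaction read was overwritten after its snapshot.

```python
# Hypothetical optimistic store that validates read sets at commit time.
# Simplified in the spirit of FoundationDB's conflict checking; not its actual code.
class SerializableStore:
    def __init__(self, data):
        self.data = dict(data)
        self.version = 0
        self.last_write = {k: 0 for k in data}  # key -> version of its last committed write

    def begin(self):
        return OccTxn(self)


class OccTxn:
    def __init__(self, store):
        self.store = store
        self.start = store.version
        self.snapshot = dict(store.data)
        self.reads = set()  # the extra bookkeeping plain snapshot isolation skips
        self.writes = {}

    def read(self, key):
        self.reads.add(key)
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any key we READ was overwritten by a commit after our snapshot,
        # because our decision may have been based on stale data.
        if any(self.store.last_write.get(k, 0) > self.start for k in self.reads):
            return False
        self.store.version += 1
        for k, v in self.writes.items():
            self.store.data[k] = v
            self.store.last_write[k] = self.store.version
        return True


def serializable_demo():
    store = SerializableStore({"V1": 1, "V2": 1})
    t1, t2 = store.begin(), store.begin()
    if t1.read("V1") + t1.read("V2") >= 2:
        t1.write("V1", 0)
    if t2.read("V1") + t2.read("V2") >= 2:
        t2.write("V2", 0)
    return t1.commit(), t2.commit(), store.data


print(serializable_demo())  # prints (True, False, {'V1': 0, 'V2': 1})
```

T2 is aborted because V1, which it read, changed after its snapshot. The client would retry it, see V1 = 0, and leave V2 untouched. The price is keeping read sets around and checking them on every commit.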

https://tikv.org/deep-dive/distributed-transaction/isolation-level/

------
balboah
I got the TiDB coffee mug; it looks better than the MongoDB one.

~~~
082349872349872
Tandem (which sold high availability via redundant hardware) had coffee mugs
with two handles. What are some other companies whose mugs reflected their
value proposition?

