
TiDB – A distributed NewSQL database compatible with MySQL protocol - the_duke
https://github.com/pingcap/tidb
======
yogthos
Wonder how this compares to ActorDB
[http://www.actordb.com/](http://www.actordb.com/)

~~~
biokoda
(ActorDB developer here)

Most distributed NewSQL databases are an SQL layer on top of a distributed KV
store. What they try to do is hide the distributed reality of their database
so it acts like a regular database from the client side. Of course there are
always caveats that might not be completely obvious but can cause terrible
performance.
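As a rough picture of what "an SQL layer on top of a distributed KV store"
means, each row can be flattened into a key that sorts by table and row ID,
with the column values serialized as the value. The key layout
(`t{tableID}_r{rowID}`) and the JSON value encoding below are simplifications
assumed for illustration; real systems use compact, order-preserving binary
encodings.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Row is a hypothetical table row: column name -> value.
type Row map[string]interface{}

// encodeRowKey builds a simplified record key (assumed layout). Zero-padding
// non-negative IDs makes byte order match numeric order, so scanning one
// table's key range returns its rows sorted by row ID.
func encodeRowKey(tableID, rowID int64) string {
	return fmt.Sprintf("t%010d_r%020d", tableID, rowID)
}

func main() {
	kv := map[string][]byte{} // stand-in for the distributed KV store

	rows := map[int64]Row{
		2: {"name": "bob", "age": 31},
		1: {"name": "alice", "age": 28},
	}
	for id, r := range rows {
		val, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		kv[encodeRowKey(7, id)] = val
	}

	// A full table scan becomes an ordered scan over table 7's key range.
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, string(kv[k]))
	}
}
```

The point of the ordered key layout is that SQL operations like range scans
map directly onto KV range scans, which is what lets the SQL layer stay
unaware of how the keys are distributed.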

We take the opposite approach. We make the user aware of the distributed
nature and force the user to use the distributed database like a distributed
database should be used. You must split your data into chunks (actors) and you
have a full raft replicated SQL engine (SQLite) within that chunk.
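One way to picture the actor approach: every actor (say, one per user) owns
its own small database, and a router sends all operations for that actor to
the one shard holding it. The hash-based placement, the `cluster`/`actor`
types, and the map standing in for each actor's Raft-replicated SQLite
database are all hypothetical; this is not ActorDB's placement scheme or API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// actor is a hypothetical self-contained chunk of data. In ActorDB each
// actor has its own Raft-replicated SQLite engine; a map stands in for it.
type actor struct {
	rows map[string]string
}

// cluster holds a fixed number of shards, each mapping actor name -> actor.
type cluster struct {
	shards []map[string]*actor
}

func newCluster(n int) *cluster {
	c := &cluster{shards: make([]map[string]*actor, n)}
	for i := range c.shards {
		c.shards[i] = map[string]*actor{}
	}
	return c
}

// shardFor picks a shard by hashing the actor name (an assumed scheme).
func (c *cluster) shardFor(name string) int {
	h := fnv.New32a()
	h.Write([]byte(name))
	return int(h.Sum32() % uint32(len(c.shards)))
}

// actorFor returns the named actor, creating it on its shard if needed.
// Every read and write for that actor is routed to that single shard.
func (c *cluster) actorFor(name string) *actor {
	shard := c.shards[c.shardFor(name)]
	a, ok := shard[name]
	if !ok {
		a = &actor{rows: map[string]string{}}
		shard[name] = a
	}
	return a
}

func main() {
	c := newCluster(3)
	c.actorFor("user:alice").rows["email"] = "alice@example.com"
	c.actorFor("user:bob").rows["email"] = "bob@example.com"

	fmt.Println("alice lives on shard", c.shardFor("user:alice"))
	fmt.Println(c.actorFor("user:alice").rows["email"])
}
```

The trade-off is visible in the shape of the API: anything scoped to one
actor is a local operation, while anything spanning actors is explicitly a
distributed one.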

I think TiDB is more comparable to CockroachDB.

------
lobster_johnson
Previous discussion:
[https://news.ycombinator.com/item?id=10180503](https://news.ycombinator.com/item?id=10180503)

------
rystsov
Many distributed databases supporting linearizability fail to provide
consistent backups.

MongoDB's docs: "To capture a point-in-time backup from a sharded cluster you
must stop all writes to the cluster"

Cassandra's docs: "To take a global snapshot, run the nodetool snapshot
command using a parallel ssh utility ... This provides an eventually
consistent backup. Although no one node is guaranteed to be consistent with
its replica nodes at the time a snapshot is taken"

Riak's docs: "backups can become slightly inconsistent from node to node"

CockroachDB's docs: "The table data is dumped as it appears at the time that
the command is started ... there is no guarantee that NOW() is monotonic in
transaction order"

Does TiDB support consistent backups? Are there any docs covering how it is
implemented?

~~~
ngaut
Yes, it does. Thanks to MVCC, TiDB supports the repeatable read isolation
level, which guarantees that data read within a transaction cannot change: if
the transaction reads the same data again, it will find the previously read
data in place, unchanged, and available to read.

So we can use MySQL tools such as mysqldump or mydumper to back up the
database consistently.
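The snapshot reads described above can be sketched with a toy multi-version
store: every write appends a new version tagged with a commit timestamp, and a
reader pinned to a start timestamp (for example, a dump) only sees versions
committed at or before it. The `mvccStore` type and its single in-process
counter are hypothetical simplifications, not TiKV's implementation.

```go
package main

import "fmt"

// version is one committed value of a key.
type version struct {
	commitTS uint64
	value    string
}

// mvccStore is a hypothetical in-memory multi-version store. Versions of a
// key are appended in increasing commitTS order.
type mvccStore struct {
	versions map[string][]version
	clock    uint64 // stand-in for a global timestamp source (assumed)
}

func newStore() *mvccStore {
	return &mvccStore{versions: map[string][]version{}}
}

// put commits a new version of key at the next timestamp.
func (s *mvccStore) put(key, value string) {
	s.clock++
	s.versions[key] = append(s.versions[key], version{commitTS: s.clock, value: value})
}

// snapshot returns the timestamp a new read-only transaction reads at.
func (s *mvccStore) snapshot() uint64 { return s.clock }

// get returns the newest version of key committed at or before ts.
func (s *mvccStore) get(key string, ts uint64) (string, bool) {
	vs := s.versions[key]
	for i := len(vs) - 1; i >= 0; i-- {
		if vs[i].commitTS <= ts {
			return vs[i].value, true
		}
	}
	return "", false
}

func main() {
	s := newStore()
	s.put("balance:alice", "100")
	s.put("balance:bob", "50")

	ts := s.snapshot() // the backup starts here

	// Writes that commit after the backup started...
	s.put("balance:alice", "70")
	s.put("balance:bob", "80")

	// ...are invisible to the backup, which still sees one consistent state.
	a, _ := s.get("balance:alice", ts)
	b, _ := s.get("balance:bob", ts)
	fmt.Println("backup sees:", a, b) // backup sees: 100 50

	latest, _ := s.get("balance:alice", s.snapshot())
	fmt.Println("latest alice:", latest) // latest alice: 70
}
```

Because old versions are kept rather than overwritten, a long-running dump
never has to block writers; the cost is that old versions must eventually be
garbage-collected once no snapshot can still read them.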

See the MVCC implementation here:
[https://pingcap.github.io/blog/2016/10/17/how-we-build-tidb/](https://pingcap.github.io/blog/2016/10/17/how-we-build-tidb/)

------
tyingq
TiDB just rolled out RC1. This Google Groups post might be helpful to get an
idea of where they are at:
[https://groups.google.com/forum/#!topic/tidb-user/4_cpTCSkKZ...](https://groups.google.com/forum/#!topic/tidb-user/4_cpTCSkKZo)

~~~
shenli3514
Please feel free to discuss in this Google Group or raise a new issue on
GitHub.

------
andrewchambers
This is very similar to CockroachDB, only MySQL-compatible instead of
Postgres-compatible.

~~~
ngaut
There are lots of other differences. The default distributed storage engine of
TiDB is TiKV, and TiKV is written in Rust. The transaction model is different,
and so on. What they have in common is that both are NewSQL databases and both
use Raft to replicate data.

~~~
simonw
Any idea why the same people appear to have used Rust for their KV store but
Go for their SQL frontend to it? They are both great languages, I'm just
surprised to see one team with major projects spread across the two.

~~~
shenli3514
Go is good at concurrency and has great development efficiency. We could
easily develop complex SQL logic and parallelize many operators. But its GC
and cgo overhead make it unsuitable for developing a storage engine, so we
chose Rust to develop TiKV.

~~~
lobster_johnson
Isn't the overhead of calling Rust from Go quite high?

Also, I'd assume that Rust would be better at expressing and pattern-matching
against the sort of complex execution plan trees and query expressions you
need for this sort of thing.

I was looking at Apache Spark the other day, which is written in Scala, and
which has a planner that goes SQL -> AST -> logical plan -> physical plan. The
planner/optimizer relies extensively on pattern matching to apply rules to the
logical plan (pushdown and so on), and it manages a lot of this with "match"
statements and not a lot of recursion.

But that stuff is murder in Go, which doesn't have pattern matching, or
generics for that matter. I know it's bad because I'm in the middle of
something similar right now, in Go. The Spark code is _exactly_ how I wanted
to organize it (minus the awful class inheritance they do), but that's not
possible in Go.

~~~
shenli3514
Go code communicates with Rust code over the network. So there is no overhead.
Go has interfaces, which can be used to express the AST and plan tree. We
define interfaces for AST nodes and plan tree nodes, which makes it convenient
to apply rules (predicate pushdown/column pruning/cost computation) to the
tree. You could refer to the code in
[https://github.com/pingcap/tidb/blob/master/plan/plan.go](https://github.com/pingcap/tidb/blob/master/plan/plan.go)
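A minimal sketch of the interface-based approach: plan nodes implement a
common Go interface, and a rewrite rule uses a type switch, which is the
closest Go gets to the Scala `match` statements mentioned above. The node
types (`TableScan`, `Projection`, `Selection`) and the `pushDown` rule are
hypothetical simplifications, not the definitions in TiDB's plan.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Plan is the interface every plan-tree node implements.
type Plan interface {
	String() string
}

type TableScan struct{ Table string }

// Projection keeps only the listed columns of its child.
type Projection struct {
	Cols  []string
	Child Plan
}

// Selection filters its child's rows with a single-column predicate.
type Selection struct {
	Col   string // column the predicate reads, e.g. "age"
	Cond  string // predicate text, e.g. "age > 30"
	Child Plan
}

func (t *TableScan) String() string { return "Scan(" + t.Table + ")" }
func (p *Projection) String() string {
	return "Proj(" + strings.Join(p.Cols, ",") + ", " + p.Child.String() + ")"
}
func (s *Selection) String() string { return "Sel(" + s.Cond + ", " + s.Child.String() + ")" }

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// pushDown rewrites Sel(Proj(x)) into Proj(Sel(x)) when the projection keeps
// the filtered column, so rows are filtered as early as possible.
func pushDown(p Plan) Plan {
	switch n := p.(type) {
	case *Selection:
		child := pushDown(n.Child)
		if proj, ok := child.(*Projection); ok && contains(proj.Cols, n.Col) {
			n.Child = proj.Child
			proj.Child = pushDown(n) // keep pushing the selection downward
			return proj
		}
		n.Child = child
		return n
	case *Projection:
		n.Child = pushDown(n.Child)
		return n
	default:
		return p
	}
}

func main() {
	// SELECT name, age FROM users WHERE age > 30, planned naively.
	plan := &Selection{Col: "age", Cond: "age > 30", Child: &Projection{
		Cols: []string{"name", "age"}, Child: &TableScan{Table: "users"},
	}}
	fmt.Println(plan)           // Sel(age > 30, Proj(name,age, Scan(users)))
	fmt.Println(pushDown(plan)) // Proj(name,age, Sel(age > 30, Scan(users)))
}
```

The type switch does handle the dispatch, but each rule still has to unpack
node fields by hand, which is the verbosity the parent comment is pointing at
compared with Scala's structural patterns.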

~~~
biokoda
> So there is no overhead.

Other than going through the kernel to talk to a library...

~~~
shenli3514
We use RocksDB as the single-machine storage engine and build TiKV on top of
RocksDB as a distributed storage engine. At first we considered using Go to
build TiKV, but the cgo overhead between TiKV and RocksDB is considerable, so
we turned to Rust. The network communication overhead between TiDB and TiKV
has nothing to do with cgo.

------
devty
Has Go become the go-to language for building SQL frontends for distributed
databases? If so, why is that?

~~~
jpgvm
Likely because of its nature as a network-first language. Broadly speaking,
Go is a language designed to push bytes over sockets to lots of concurrent
clients.

~~~
shenli3514
There are two more things:

1. Go has high development efficiency, so we can build complex SQL logic
easily.

2. It is easy to write concurrent code in Go, so we can take advantage of
multi-core CPUs. For example, we can use multiple goroutines to scan data,
perform joins, and do aggregation.
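The idea of using multiple goroutines to scan and aggregate can be sketched as
a two-phase aggregation: the rows are split into chunks, each goroutine
computes a partial SUM and COUNT for its chunk, and the partials are merged at
the end. The in-memory slice standing in for scanned data and the worker count
of 4 are assumptions for illustration; this is not TiDB's executor code.

```go
package main

import (
	"fmt"
	"sync"
)

// partial is one worker's partial aggregate over its chunk of rows.
type partial struct {
	sum   int64
	count int64
}

// parallelSumCount splits rows into contiguous chunks, aggregates each chunk
// in its own goroutine, then merges the partial results.
func parallelSumCount(rows []int64, workers int) (sum, count int64) {
	if workers < 1 {
		workers = 1
	}
	chunk := (len(rows) + workers - 1) / workers // ceiling division
	results := make([]partial, workers)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(rows) {
			lo = len(rows)
		}
		if hi > len(rows) {
			hi = len(rows)
		}
		wg.Add(1)
		go func(slot int, part []int64) {
			defer wg.Done()
			var p partial
			for _, v := range part {
				p.sum += v
				p.count++
			}
			results[slot] = p // each goroutine writes only its own slot
		}(w, rows[lo:hi])
	}
	wg.Wait()

	// Final merge of the per-worker partial aggregates.
	for _, p := range results {
		sum += p.sum
		count += p.count
	}
	return sum, count
}

func main() {
	rows := make([]int64, 1000)
	for i := range rows {
		rows[i] = int64(i + 1) // stand-in for scanned column values 1..1000
	}
	sum, count := parallelSumCount(rows, 4)
	fmt.Println(sum, count, float64(sum)/float64(count)) // 500500 1000 500.5
}
```

Giving each goroutine its own result slot avoids locks entirely; only the
cheap merge step runs sequentially. AVG is derived from the merged SUM and
COUNT rather than by averaging per-chunk averages, which would be wrong for
uneven chunks.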

------
aamederen
Comparison Request: Citus
([https://github.com/citusdata/citus](https://github.com/citusdata/citus))

------
eternalban
Do they mean a relational database? SQL is a language.

~~~
coderholic
NewSQL has a specific meaning:
[https://en.wikipedia.org/wiki/NewSQL](https://en.wikipedia.org/wiki/NewSQL)

~~~
eternalban
Coined by a journalist, no less ...
[https://451research.com/analyst-team/analyst/Matt+Aslett](https://451research.com/analyst-team/analyst/Matt+Aslett)

