Hacker News
tiDB – Scalable RDBMS Inspired by Google F1 with Support for MySQL Protocol (github.com)
88 points by haisumbhatti on Sept 7, 2015 | 32 comments

So a SQL layer on top of a KV base? Isn't this very similar to what FoundationDB did (and had a lot of trouble with)?

If so, there's a good post by John Hugg of VoltDB about it: http://voltdb.com/blog/foundationdbs-lesson-fast-key-value-s...

How does tiDB compare to this?

Darnit, acquired by Apple and shut down!? I didn't even know! Thanks for the link. His critique is actually the first I've seen on their tech. He speculates quite a bit despite them giving plenty of detail and claiming to use many of the same choices as Google's F1. If they failed, I'd love to know in detail how to improve on their successes. Some of his gripes have obvious work-arounds, and the others probably have work-arounds too.

That said, I think props still go to Google's teams on GFS, Spanner, and F1 as the best techs out there. An F1 variant, which two teams are trying, is the best approach given it's proven and has much published detail.

Seems like AWS Aurora just got an open source competitor.

I'm just a little sad that it's all MySQL again, now that we're betting on PostgreSQL and don't really look back.

On the performance side I'm just a little worried, however. While Go is an excellent choice for performant and concurrent app-level code, I'm not too bullish on the language at the database level. InfluxDB sucked hard when we put it under some more load, but let's see what comes out there.

PostgreSQL support is on our roadmap: https://github.com/pingcap/tidb/blob/master/ROADMAP.md

InfluxDB's performance problems have very little to do with the implementation language, quite a lot to do with fundamental architectural and team constraints.

Can you detail, or point me to a blog post which details this stuff? As someone who is about to invest a lot of time in researching use of InfluxDB at scale, I'm interested in documentation of the performance problems, and even more so if the performance is troubling in the long term due to the architecture or team. Anything you can point me to will save me a lot of time.

We chose the MySQL protocol first because it's widely used and we are more familiar with its tool chain and protocol than with PostgreSQL's. As for performance, Go was designed for building distributed systems at Google, and the development productivity is excellent. InfluxDB's performance problems are partly caused by other reasons.

> I'm just a little sad that it's all MySQL again, now that we're betting on PostgreSQL and don't really look back.

Doesn't Postgres have its own multi-master solution in the works with BDR?

BDR is async replication and doesn't support sharding.

aws-aurora doesn't support sharding/scaling

Can you expand on this a bit with some actual technical details about how you think it doesn't scale enough?

aws-aurora is a fancy storage engine + fancy replication

You can't divide the data on multiple servers (sharding)

There will come a time when you'll exhaust the biggest machine they have available, and then you're stuck

Wouldn't the classic sharding approach at that point simply be to use a second instance? I mean, it'd be great if they handled that for you but I'd tend to assume most apps would have hit the wall on the approach of shoving everything into a single database long before maxing out a 32-core/244GB system with 64TB on SSD.

[meme]You can't simply use a second machine/instance.

While it's true that most companies will be fine in 256GB ram, we were talking about sharding, which it doesn't have.

> [meme]You can't simply use a second machine/instance.

I'm assuming [meme] is shorthand for “overly-broad assertion”? It's nice if your database has horizontal scaling built in, but it's not like we don't have an entire generation of successful companies who had application-level sharding logic, either by necessity or because they found the control it offered was valuable compared to the built-in generic logic.

> While it's true that most companies will be fine in 256GB ram, we were talking about sharding, which it doesn't have.

You still haven't supported the assertion that it's common for places to have massive, heavily-queried databases like this which would not be better split following natural application-level boundaries. This is particularly relevant when discussing AWS, as some of the common reasons for keeping everything in one database are sweet spots for other services (e.g. migrating data for long-term historical reporting over to Redshift).

Again, I'm not questioning that integrated sharding would have its uses – only your sweeping assertion that this is a likely problem for most people and that it's a dead-end (“you're stuck”) rather than merely one of many growing pains which you'll deal with on a successful product. In particular, it's unlikely that everyone will have the same right answer at that scale since access patterns vary widely.

Wrong: Aurora supports read replicas and, of course, client-side sharding.

Every database supports client-side sharding.
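
For readers unfamiliar with the term, here is a minimal sketch of what client-side (application-level) sharding can look like: the application, not the database, decides which instance holds a given key. The `ShardRouter` class and the connection strings are hypothetical names made up for illustration; nothing here is specific to Aurora, MySQL, or TiDB.

```python
import hashlib


class ShardRouter:
    """Maps a sharding key to one of several independent database instances.

    Illustrative only: the class name and the DSN strings are assumptions.
    """

    def __init__(self, shard_dsns):
        if not shard_dsns:
            raise ValueError("at least one shard is required")
        self.shard_dsns = list(shard_dsns)

    def shard_for(self, key: str) -> str:
        # Use a stable hash rather than Python's per-process randomized hash(),
        # so every application server routes the same key to the same shard.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shard_dsns)
        return self.shard_dsns[index]


# Two hypothetical instances; the application opens its connection to
# whichever one the router returns for the key it is about to query.
router = ShardRouter([
    "mysql://db-0.example.internal/app",
    "mysql://db-1.example.internal/app",
])
target = router.shard_for("user:42")
```

The cost of this approach is what the thread is arguing about: with plain modulo hashing, changing the number of shards remaps most keys, and cross-shard joins or transactions become the application's problem.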

I'm curious to know the motivation behind publicizing the project at this stage in development, as it seems like the key feature (distributed transactional storage engine) is quite far away on the road map.

Are there any design documents detailing its implementation? I checked the wiki but it didn't look like there was anything there. What alternatives were considered, and why were they abandoned?

Also, is there a concrete use case for which this system is being built? If so, what are some (publicly releasable) details about the use case, e.g. access patterns, data volume, etc.?

We open-sourced at an early stage because we want to get feedback from the community and build the project with the community involved.

We will keep delivering new features such as HBase support, and the documentation will be improved as well.

Could one of the authors provide some more information about the goals and architecture? Do I understand correctly that the goal is to implement a relational database on top of one of a couple of different key-value stores, aiming, among other things, for drop-in compatibility with MySQL? What are the expected benefits?

EDIT: The architecture diagram wasn't visible to me before, now that it suddenly appeared things are way more clear.

The goal of TiDB is to create a fault-tolerant distributed RDBMS, especially one with distributed transaction support, so that developers get the benefits without changing any of their existing code.

Please correct me if I am wrong, but it looks like it is not distributed yet. From the roadmap: https://github.com/pingcap/tidb/blob/master/ROADMAP.md Distributed KV and distributed transactions have no checkmark! So in its current state, how does that differ from using SQLite?

How does it compare to CockroachDB?

CockroachDB is closer to Spanner, which provides the globally distributed primitives that power the SQL layer on top of it, F1. I guess that TiDB sits a layer above Cockroach.

Cockroach dev here. We're actively working on a SQL layer, actually. We really need to update our GitHub README to contain info about it. It's under active development and, of course, it's inspired by F1.

Cockroach dev here. We're actually currently implementing a SQL layer within Cockroach. So it won't be layered; it's the main method of interaction with CockroachDB.

One of the TiDB authors here. Our design is more similar to Google F1; currently we're focusing on building an independent SQL layer above any transactional KV storage.
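
To make "a SQL layer above KV storage" concrete, here is a rough sketch of the general idea: rows and secondary-index entries are encoded as keys in an ordered key-value store, so a primary-key lookup becomes a `get` and an index lookup becomes a prefix scan. The key layout (`t/<table>/r/<pk>`, `t/<table>/i/<column>/<value>/<pk>`) and all class and function names are assumptions made up for this example; this is not TiDB's or F1's actual encoding, and it ignores transactions entirely.

```python
import bisect


class SortedKV:
    """Tiny in-memory ordered KV store standing in for a real storage engine."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order for range scans
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def scan_prefix(self, prefix):
        # All keys sharing a prefix are contiguous in sorted order.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._data[key]


def row_key(table, pk):
    # Zero-padded so that string order matches numeric order of the primary key.
    return f"t/{table}/r/{pk:010d}"


def index_key(table, column, value, pk):
    # Simplification: assumes indexed values never contain "/".
    return f"t/{table}/i/{column}/{value}/{pk:010d}"


def insert_row(kv, table, pk, row, indexed=()):
    kv.put(row_key(table, pk), row)
    for column in indexed:
        # The index entry points back at the primary key.
        kv.put(index_key(table, column, row[column], pk), pk)


def select_by_pk(kv, table, pk):
    return kv.get(row_key(table, pk))


def select_by_index(kv, table, column, value):
    prefix = f"t/{table}/i/{column}/{value}/"
    return [kv.get(row_key(table, pk)) for _, pk in kv.scan_prefix(prefix)]


kv = SortedKV()
insert_row(kv, "users", 1, {"name": "ann", "city": "berlin"}, indexed=("city",))
insert_row(kv, "users", 2, {"name": "bob", "city": "paris"}, indexed=("city",))
insert_row(kv, "users", 3, {"name": "cy", "city": "berlin"}, indexed=("city",))
berliners = [r["name"] for r in select_by_index(kv, "users", "city", "berlin")]
```

Note that `insert_row` writes several keys per row. In a distributed store those writes have to commit atomically, which is why the transactional KV layer matters so much in this discussion.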

Is there any open-source distributed transactional KV storage engine?

Yes. Currently TiDB works well with HBase (coming soon). There are also some distributed transaction layers on top of HBase, e.g. https://github.com/XiaoMi/themis

So TiDB is similar to the (unfortunately) abandoned FoundationDB SQL Layer, but written in a proper language? Nice :)

Aha! You got it. And we also want to build the distributed KV part in a proper way.

Looks good, especially the PostgreSQL support coming up.

What about MPP (Massively Parallel Processing) support? This is one of the things that make CitusDB so powerful.
