
Tidis - Redis protocol on top of TiKV - jinqueeny
https://github.com/yongman/tidis
======
justicezyx
I am actually surprised that, given today's wide availability of cloud services,
new low-level storage software is still being invented.

Question for fellow HNers: if you needed to deploy such software for your
business, was it because optimizing on GCP/AWS was not cost-effective, or
because the same level of performance is not possible on a public cloud?

~~~
gregwebs
Your viewpoint of just using something already hosted on GCP/AWS is probably
the default today, for good reason. I would break the reasons not to do that
into three: cost, performance, and customization. Let's ignore cost (which is
complex) and focus on the TiDB and AWS example.

TiDB was built to operate OLTP at a scale (scaling writes horizontally just by
adding more nodes) that doesn't exist on Amazon's MySQL. You would have to use
DynamoDB and downgrade from SQL transactions to eventual consistency.

In terms of customization, TiDB offers something AWS does not: the ability to
store ever more of your data in one distributed data store (TiKV) and to
leverage it in different ways (query it with TiSpark, and perhaps more projects
like this in the future).

~~~
elvinyung
Correction: AWS does have this kind of strongly consistent, scalable OLTP
database in Aurora.

~~~
ddorian43
Correction: No. Aurora is single node. No horizontal scaling.

~~~
wgjordan
Aurora currently supports automatic horizontal scaling for reads [1], and
horizontal scaling for writes is currently in preview [2].

[1]: https://aws.amazon.com/about-aws/whats-new/2017/11/amazon-aurora-now-supports-auto-scaling-for-aurora-replicas/

[2]: https://aws.amazon.com/about-aws/whats-new/2017/11/sign-up-for-the-preview-of-amazon-aurora-multi-master/

~~~
antoncohen
> horizontal scaling for writes is currently in preview

I don't expect multi-master to increase write performance much. It will scale
out network connections and maybe query processing, but not actual writing of
data.

It will replicate all data to all nodes, and the replication will be
synchronous to any node it can automatically fail over to. So a client will be
able to connect to any master, but that master will have to synchronously
write to at minimum all other masters, if not all other nodes, before the
server can acknowledge the write back to the client. Every node will need
enough write capacity to apply every write. That isn't scaling out writes.

~~~
wgjordan
Feel free to maintain your skepticism until public benchmarks are available,
but Amazon has specifically advertised multi-master will horizontally scale
out writes for increased write performance [1]:

> Workloads that require even higher write throughput will benefit from
> horizontally scaling out their writes with additional master instances.

[1]: https://aws.amazon.com/about-aws/whats-new/2017/11/sign-up-for-the-preview-of-amazon-aurora-multi-master/

~~~
ddorian43
But no sharding. All writes will go to all servers.

~~~
wgjordan
You're probably thinking of 'group replication' in standard MySQL. Aurora's
horizontal scaling for reads, writes, and storage capacity is quite different,
since compute and storage are decoupled, and the concepts of 'sharding' or
sending writes to 'servers' don't translate directly.

In Aurora [1], data is partitioned into 10GB chunks ('protection groups'),
with copies stored in 6 'storage nodes' spread across 3 Availability Zones.
When the 'server' (database node) handles a write, the data is sent in
parallel to all 6 storage nodes, and the write is accepted as soon as a quorum
of storage nodes (4 of 6) have written the data to their redo logs and
responded. The storage nodes also continuously back up the data to S3 for
extra durability.

These 6 parallel storage-node operations are performed for every write, even
in a single-master Aurora setup. Multi-master simply scales out the number of
'database nodes' (compute) for handling queries, while keeping the number of
storage nodes (and the total write-amplification) constant. In the rare case
when conflicting write-transactions occur that can't be resolved locally at
either the database or storage layer (e.g., when data changed at both multiple
database nodes AND multiple storage nodes), a hierarchical conflict-resolution
mechanism is invoked to decide which transaction succeeds and which gets
rolled back [2]. This reportedly provides 'near-linear performance scaling
when there is no or low levels of conflicts', even when dealing with multiple
masters spread across multiple regions [3].

[1]: https://aws.amazon.com/blogs/database/introducing-the-aurora-storage-engine/

[2]: https://www.slideshare.net/AmazonWebServices/deep-dive-on-the-amazon-aurora-mysqlcompatible-edition-dat301-reinvent-2017/35

[3]: https://www.slideshare.net/AmazonWebServices/deep-dive-on-the-amazon-aurora-mysqlcompatible-edition-dat301-reinvent-2017/38

~~~
ddorian43
When (and ONLY WHEN) they make the chunks only be available on SOME servers,
then it will be. See tikv/tidb. Tikv is storage but data is PARTITIONED while
tidb is computation (no persistence). Currently, difference is that, functions
only happen on "compute" (so data transfer) while on tikv, functions happen on
"storage" layer, and "compute" layer is for aggregation.

When they do that it will be called horizontal scaling.

------
littlestymaar
I think this is a good example of the different use cases Go and Rust excel
at: the front-end (Tidis), which offers a lot of features but little
complexity, is written in Go, and the back-end (TiKV), where the complex work
is done and performance must be impeccable, is written in Rust.

~~~
ddorian43
Pretty soon you'll have to write the whole thing in Rust/C++, depending on how
much efficiency you really need.

At the least, move many things into coprocessors in the TiKV layer.

------
imauld
Are there docs or examples of how to use this somewhere? I love Redis and I
love using Go so I would be interested in trying this out for hobby projects
and it would be helpful to have some documentation/examples.

~~~
ceohockey60
Not directly related to Tidis, but I did read about another Redis-on-top-of-
TiKV use case, by Ele.me, which is apparently a big food-delivery platform,
like DoorDash, that's super popular in China. Here's the post:
https://zhuanlan.zhihu.com/p/35226447 (too bad it's only in Chinese right now;
hopefully someone translates it into English soon...)

~~~
c4pt0r
I think the team is working on it right now.

------
jazoom
>This repo is WIP now and has lots of work to do, and for test only.

Cool project though.

------
maxpert
Benchmarks please?

------
etaioinshrdlu
Redis has been called a "data structure server".

Isn't that what a database is?

Is there anything fundamentally better about Redis than, say, a single
connection to Postgres?

Except that with Postgres you would be allowed many concurrent connections and
greater throughput, while keeping the same external behavior thanks to
transaction serializability: it's as if the operations were applied by a
single thread in a defined order, but in reality they can be parallelized.
Redis cannot do that intelligently.

I use Redis occasionally because it's a very simple system and it's easy, and
has been micro-optimized very well. I don't think it necessarily contains any
good computer science ideas. Am I wrong here?

Tidis is more evidence that Redis's single-threaded design leaves a ton on the
table.

~~~
meowface
Many in-memory databases don't have native data types for sets, sorted sets,
hash tables, bitmaps, or HyperLogLogs. Postgres and Redis have different use
cases.

~~~
etaioinshrdlu
A unique index in Postgres is an efficient set data structure.

A sorted set is an index in Postgres, stored as a B-tree.

Postgres has hash indexes, basically the same as a hash table.

And you can run Postgres in memory if you want via tmpfs.

Bitmaps and HyperLogLogs are basically math functions running in the database,
and Postgres supports user-defined math functions. It seems disingenuous to
point out individual math functions when both databases support arbitrary
user-defined math functions.

~~~
matthewmacleod
You are conflating structures used internally to implement a relational
database with structures exposed as first-class building blocks for your own
systems.

Think of Redis as a shared, in-memory, well-tested implementation of basic
data-structure primitives that takes care of a lot of the fiddly nonsense
around them without you having to do the work. Postgres is awesome, but it's
targeted at a different use case and has different characteristics; it's
overkill for something like a shared key-value store, for example.

