
YugaByteDB – A Transactional Database with Cassandra, Redis and PostgreSQL APIs - mountainview
https://github.com/YugaByte/yugabyte-db
======
lukeqsee
Does anyone know a comparison between this and CockroachDB? Or have experience
running either in production?

They seem to compare against other databases, but not against Cockroach which
seems to be the biggest competitor. I'm looking at implementing a global-scale
database cluster with very specific requirements, and YugaByte seems to meet
those but a comparison against CockroachDB seems warranted.

Edit: just noticed they do compare features against CockroachDB:
[https://docs.yugabyte.com/latest/comparisons/#distributed-
sq...](https://docs.yugabyte.com/latest/comparisons/#distributed-sql-
databases), but they don't have an in-depth comparison.

~~~
atombender
YugaByte only added distributed transactions in the new 1.1 release
(previously, it only supported single-row transactions). The architecture is a
variation on 2PC and seems sound, but I think it's fair to say that it's early
days. Meanwhile, Cockroach and TiDB both have battle-tested implementations.

YugaByte is pretty quirky. Rather than settle on a native data model, they
offer several "personalities" that mimic other products (Cassandra, Redis and
PostgreSQL), and you can mix and match them. But they've implemented each API
with their own set of weird warts. For example, their CQL implementation has
CREATE INDEX, but it does not index existing data [1]. You have to either
create the secondary index _before_ inserting data, or force a reindex of
everything with a dummy UPDATE statement. Who would ship such a product?
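
To make that concrete, here is a toy model in Python. It is not YugaByte's
actual implementation; the `ToyTable` class and its methods are made up for
this sketch. It assumes the index is maintained only on the write path, so
rows written before the index existed stay invisible to it until they are
rewritten:

```python
class ToyTable:
    """Toy table whose secondary indexes are maintained only on writes."""

    def __init__(self):
        self.rows = {}      # primary key -> row dict
        self.indexes = {}   # column name -> {value -> set of primary keys}

    def create_index(self, column):
        # No backfill: the index starts empty even if rows already exist.
        self.indexes[column] = {}

    def upsert(self, key, **columns):
        old = self.rows.get(key, {})
        new = {**old, **columns}
        for column, index in self.indexes.items():
            # Drop the stale entry, if this row was indexed before.
            if column in old:
                index.get(old[column], set()).discard(key)
            # Index the row under its current value.
            if column in new:
                index.setdefault(new[column], set()).add(key)
        self.rows[key] = new

    def lookup(self, column, value):
        return sorted(self.indexes[column].get(value, set()))


table = ToyTable()
table.upsert(1, city="Oslo")            # written before the index exists
table.create_index("city")
table.upsert(2, city="Oslo")            # written after the index exists
before = table.lookup("city", "Oslo")   # row 1 is missing from the index
table.upsert(1, city="Oslo")            # dummy update re-indexes row 1
after = table.lookup("city", "Oslo")    # now both rows are found
```

In this model the dummy write is the only way to get row 1 into the index,
which is the workaround described above.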

More warts/Cassandraisms: UPDATE is actually an upsert; an update that doesn't
match a row will insert it. And SELECT ... WHERE expressions can only use AND
expressions (!) [2].

Hopefully the forthcoming SQL API will be saner, but it's very limited at
the moment. It does not seem to support joins, transactions or indexes, for
example. Meanwhile, CockroachDB and TiDB both have rich SQL implementations,
including joins and aggregations, with cost-based query optimizers that can
take advantage of multiple indexes and table statistics.

It seems more appropriate to compare YugaByte with distributed key/value
stores like FoundationDB, Cassandra, Scylla and Redis.

[1]
[https://docs.yugabyte.com/latest/explore/transactional/secon...](https://docs.yugabyte.com/latest/explore/transactional/secondary-
indexes/)

[2]
[https://docs.yugabyte.com/v1.0/api/cassandra/dml_select/#roo...](https://docs.yugabyte.com/v1.0/api/cassandra/dml_select/#root)

~~~
manigandham
Those things you mentioned are just how Cassandra works. It's a wide-column
key/value store and everything is an upsert because that's how the write path
is designed: there is no read-then-write (other than LWTs). If you're using an
application that expects Cassandra then this is normal so why would Yugabyte
change the semantics?
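
A rough sketch of why that is, as a toy model in Python. This is not
Cassandra's or Yugabyte's code; `ToyWriteLog` and the counter standing in for
write timestamps are assumptions made for illustration. If the write path only
appends and never reads, INSERT and UPDATE cannot be told apart:

```python
import itertools


class ToyWriteLog:
    """Toy append-only write path with no read-before-write."""

    def __init__(self):
        self._clock = itertools.count(1)   # stand-in for write timestamps
        self._log = []                     # entries: (timestamp, key, columns)

    def write(self, key, **columns):
        # The write never checks whether the row already exists.
        self._log.append((next(self._clock), key, columns))

    # INSERT and UPDATE are the same operation on this write path.
    insert = write
    update = write

    def read(self, key):
        # Reads merge every write for the key; the newest value wins per column.
        row = {}
        for _, k, columns in sorted(self._log, key=lambda entry: entry[0]):
            if k == key:
                row.update(columns)
        return row or None


log = ToyWriteLog()
log.update("user:1", name="Ada")                # no such row yet, still creates it
log.insert("user:1", email="ada@example.com")   # "insert" merges into the same row
row = log.read("user:1")
missing = log.read("user:2")
```

Making UPDATE fail on a missing row would require a read before every write,
which is the cost that LWTs pay.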

It's true that indexing is unfinished, but indexes have long been problematic
in the CQL data model, so Yugabyte is moving faster and actually releasing
something for those who can work with it. Cassandra took years to come up with
several types of indexes that all have problems, and Scylla is overdue by 18
months on their implementation.

~~~
atombender
That's why I referred to the behaviour as Cassandraisms.

The question was posed in the context of CockroachDB, so my answer stands --
CDB vs. YugaByte is an apples to oranges comparison, seeing as YugaByte isn't
an RDBMS by any useful measure, and has a very long way to go to get there.

------
lykr0n
When I see databases like this pop up, a part of me wonders why they don't
devote effort to develop a plugin for MariaDB or PostgreSQL? Follow in the
steps of Citus Data.

~~~
isoos
Several months ago, setting up a Citus cluster did not seem straightforward,
while at the same time I was able to set up and run a CockroachDB cluster in
less than half an hour with Docker. I've checked the YugaByte documentation
about setup, and CockroachDB still won on operational simplicity.

I doubt that the same will be done with PostgreSQL or MariaDB anytime soon.
People have been begging for a good and easy PostgreSQL HA cluster setup for
ages, and it is just not happening (yeah, they are getting closer and closer),
while such DBs fill a need with good results.

~~~
bmatican
@isoos, I'd be curious to understand more about why you thought YugaByte would
be more operationally complex to deploy, as we strive to make things as easy
to get started with as possible...

For example, for local testing, our Quick Start docs section covers:
mac/linux, which involves just downloading and unpacking the release and you
should be good to go; local docker, where you can download our control script;
and k8s, where you download our sample yml.

Finally, for non-local testing, we have a Deploy > Manual Deployment section,
highlighting the 3 steps for downloading YugaByte, bringing up Masters
(metadata nodes) and then Tservers (data nodes).

Note: I work at YugaByte.

~~~
isoos
With CockroachDB, I was able to create a cluster with three commands, one
command on each node (docker run ....), one port on each node. It can't really
get more simple than that.

With YugaByteDB: I needed to think about master and tserver nodes, to run them
and connect them separately. There was no clear guide that said "run these
three docker commands on the three different nodes" and be done with it.

Note: I run everything in simple docker images (and not compose, not swarm,
not Kubernetes, just plain docker images).

------
ralfn
Those are a lot of claims. Databases that make half those claims turn out to
be misrepresenting some of them. So far it's empty promises on a box.

When can we expect independent 3rd party evaluation of all these claims? The
most famous one is Jepsen:

[https://jepsen.io/analyses](https://jepsen.io/analyses)

Because this all sounds way too good to be true. What's the catch? How stable
is this? Are the claims tested other than in theory?

~~~
manigandham
They made a post about testing with Jepsen: [https://blog.yugabyte.com/jepsen-
testing-on-yugabyte-db-data...](https://blog.yugabyte.com/jepsen-testing-on-
yugabyte-db-database/)

~~~
ralfn
Awesome. Thanks. This is very very encouraging!

------
sunnycpp
The design is very similar to HBase, minus the dependency on ZooKeeper and
HDFS. RocksDB has been very smartly modified to serve as an optimized storage
layer.

