
Jepsen: YugaByte DB 1.1.9 - aphyr
https://jepsen.io/analyses/yugabyte-db-1.1.9
======
the_duke
First time I heard about them.

Seems to be another distributed SQL (aka 'newsql') alternative to TiDB and
CockroachDB.

Based on RocksDB (like Cockroach) with a custom distributed key/val layer and
an additional SQL layer on top. PostgreSQL protocol compatible.

Open source, under the Apache license.

Seems interesting. (when ignoring the "planet scale SQL" marketing speak...
[1])

[1] [https://www.yugabyte.com/planet-scale-sql/](https://www.yugabyte.com/planet-scale-sql/)

~~~
marknadal
As an expert in the DB space, I'm extraordinarily cynical.

But their willingness to license it as truly open, Apache-style, is instantly
a big win.

I'm a competitor, but I can tell these guys/gals are genuine in their efforts.

We need more people, teams, and DBs like YugaByteDB in the world.

Thank you for your efforts.

~~~
the_duke
The question is, how many of these very similar databases can the market
support?

The field is getting crowded, and the database market is already quite
competitive as it is, without these new competitors.

There just are not that many use cases where a larger Postgres/MySQL instance
with one or two replicas is insufficient.

From a user perspective, I'd much rather have one or two successful companies
where I can be reasonably certain that the product will be maintained in 5
years than too much competition.

~~~
marknadal
This is why the license matters.

Even if the commercial/AGPL/GPL ones "win" for a few years, they won't be able
to compete with these more Open licensed DBs when they catch up.

So at any given moment, too many DBs may be annoying, but for the _long term_
and _long game_ it is important there is this type of competition & research
going on.

Although, I absolutely agree, when it comes to Master-Slave based systems
(I've been very vocal in criticizing them) that market is drying up to some
very limited use cases (banking, etc.). 99%+ of use cases will be Strong
Eventual Consistency and CRDT with distributed or decentralized/P2P tools.

Some really old, yet very relevant, thoughts on this subject:

[https://hackernoon.com/the-implications-of-rethinkdb-and-par...](https://hackernoon.com/the-implications-of-rethinkdb-and-parse-shutdowns-c076460058f7)

~~~
jimktrains2
I'm thoroughly confused by your association of (A)GPL with commercial and
calling Apache/BSD licenses more open. AGPL ensures that users always retain
the 4 freedoms, by restricting developers. BSD allows developers to do
whatever, including restricting the users. Neither is "more open"; they both
make trade-offs, and neither is comparable to proprietary except to say the
BSD-style licenses allow for it if the developer chooses.

Look at the kerfuffle around MongoDB, Redis, and Elasticsearch because of their
licenses. However, you don't hear the same issues coming from the Postgres
community. The licenses you're claiming will win the day cause problems for
for-profit companies, for exactly the reason you think they're "more open".

In the end, either entrenched proprietary software or open, community-focused,
community-stewarded software will win the day.

~~~
marknadal
I believe we both have reasonable arguments from our paradigm, it is just the
paradigms have conflicting definitions.

When people who share camp with me say "Open" or "Freedom" we mean Free Speech
AND Free Beer.

Where the disagreement happens is on Free Speech:

There are many people/governments that define Free Speech as "Free Speech as
long as someone does not shout 'fire' in a crowded room." This is the spirit
of (a)GPL in restricting people.

The other group defines Free Speech and/or "Freedom" as "without restriction".
Not because they _want_ people to yell "fire" but because they attribute
restriction/regulation as the mechanism towards monopoly & centralization. Not
that regulation/restriction on its own is bad (every individual ought exercise
self-discipline), but it is particularly dangerous once monopoly &
centralization emerges because it produces totalitarian or fascist structures.

To counter my own view, many people in the camp opposite me have expressed
the same end-goal concerns: "we want to restrict hate speech so fascism doesn't
rise". I think it is admirable we have shared goals (stopping
totalitarianism), but for reasons you probably don't share, I think it is more
effective to stop fascism by removing the ability for fascists to enforce
rules/regulation/restrictions on individuals, even if that comes at the cost
or risk of someone yelling "fire".

Why? (I don't assume anyone cares about my view, so don't feel obligated to
read) Because I have higher optimism that humans will eventually overcome
their individual immaturity (shouting "pen--" in a crowd), especially through
incentive design, than in humans overcoming their tendency towards abuse of
power (or even worse, most people who "abuse" power don't think they are
abusing it, they have a conviction that the use of power is for some greater
good). Wielding power is often the end game of any incentive structure, but
yelling "fire" or "p--is" often ruins your reputation/power so naturally is
disincentivized over time (or where it matters most).

~~~
jimktrains2
I feel like your "fire" and "totalitarian" examples are confusing, entirely
off-base and non-illustrative of anything useful to this conversation.

Why? Because the difference between copyleft and non-copyleft licenses isn't
akin to censorship vs no-censorship. The argument for copyleft is more
akin to the arguments for laws in general: someone's absolute freedoms need
to be trodden on to have a free society.

I similarly fail to see how a copyleft is a power to abuse. Surely the ability
to close the source of an application has more power that can be abused?

~~~
marknadal
Your 2nd paragraph says pretty much what I was trying to say (except for
difference in law views) that your 1st paragraph says is off-base.

Another way for me to say it is, that _of course_ you would think my thoughts
are off-base since I come from a different foundational base than you. I was
just trying to explain the difference itself, not saying that you need to
change views (your view is logical from your "base").

You think people's freedoms need to be trodden upon for a free society.

I don't. That scares me and many others.

Edit: I did not downvote you, just FYI, I don't know who/why would.

~~~
jimktrains2
> You think people's freedoms need to be trodden upon for a free society.

Do you take this stance with laws against murder and theft? Society has laws
and rules. People as a whole, as all available examples show, do not optimize
for the greater good by default and without any rules or norms.

There are good talking points in the copyleft debate, but the claim that
copyleft imposes rules and non-copyleft doesn't is false and doesn't move this
debate forward in any meaningful way.

------
spullara
Since it doesn't support serializable transactions I'm not sure why
FoundationDB would be mentioned as a comparison in the write-up. The
operations it does support seem to set the bar pretty low as to what to test.

edit: good reply by the founder of YugaByte but for some reason the comment is
dead. I have noticed that when founders don't have an account on here and then
something comes up where they need to reply their comments are often deaded.

~~~
gigatexal
We use CockroachDB in production and before that we were on MySQL and as of
yet we don’t have a specific use case where we use serializable transactions.
Snapshot isolation or even read committed is just fine. So I don’t think it’s
absolutely necessary.

~~~
gigatexal
To be clear, there’s no way around serializable transactions in CockroachDB. We
have had to adapt our monolith to it (we’re thinking of ways to make it more
nimble by breaking out services etc). But the point I was making was that we
had MySQL for a while and never ran into issues with its isolation levels
until it stopped scaling. Instead of Vitess or some other MySQL system we went
with Cockroach after finding Vitess didn’t fit us: too complicated and too
many moving parts. CockroachDB just works. Also, moving to k8s adds complexity
too for a monolith built and run on VMs. But so far so good. Cockroach runs
fast and is performant given production queries. And ops is happy because it
self-heals.

~~~
redwood
First time I've heard about a production use case. Care to share any details?

------
danburkert
Does YugaByte still use the Raft and HybridTime implementations from Apache
Kudu? If so, how relevant are these results for Kudu?

~~~
kmuthukk
I wanted to add a few details to the previous reply.

While the Raft/HybridTime implementation has its roots in Apache Kudu, the
results will NOT be quite applicable to Kudu. Aside from the fact that the
code base has evolved/diverged over the 3+ years, there are key/relevant areas
(ones very relevant to these Jepsen tests) where YugaByte DB has added
capabilities or follows a different design than Kudu. For example:

\-- Leader Leases: YugaByte DB doesn't use Raft consensus for reads. Instead,
we have implemented "leader leases" to ensure safety in allowing reads to be
served from a tablet's Raft leader.

\-- Distributed/Multi-Shard Transactions: YugaByte DB uses a home-grown
([https://docs.yugabyte.com/latest/architecture/transactions/t...](https://docs.yugabyte.com/latest/architecture/transactions/transactional-io-path/))
protocol based on two-phase commit across multiple Raft groups.
Capabilities like secondary indexes and multi-row updates use multi-shard
transactions.

\-- Allowing online/dynamic Raft membership changes so that tablets can be
moved (such as for load-balancing to new nodes).
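
As a rough illustration of the leader-lease idea in the first item above, here is a minimal sketch. It is not YugaByte DB's implementation: the class name, the fields, and the way the drift bound is applied are all assumptions made for the example. The point is only that a leader may answer reads locally, without a Raft round trip, while it still holds an unexpired lease.

```python
from dataclasses import dataclass


@dataclass
class LeaderLease:
    """Hypothetical leader lease; names and fields are illustrative only."""

    duration: float          # lease length in seconds
    max_drift: float         # assumed bound on clock drift during one lease
    expires_at: float = 0.0  # local monotonic time at which the lease lapses

    def renew(self, sent_at: float) -> None:
        # Called once a majority has acknowledged a heartbeat that was sent
        # at local time `sent_at`. Measuring from the send time and shaving
        # off the drift bound keeps the leader's view of its lease
        # conservative.
        candidate = sent_at + self.duration - self.max_drift
        self.expires_at = max(self.expires_at, candidate)

    def can_serve_read(self, now: float) -> bool:
        # Local reads are allowed only while the lease is still valid.
        return now < self.expires_at
```

A caller would pass readings from a monotonic clock (for example `time.monotonic()`) as `sent_at` and `now`, and fall back to a consensus round or reject the read when `can_serve_read` returns `False`.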

regards, Kannan (Co-founder @ YugaByte)

~~~
mpercy
FWIW, we implemented dynamic consensus membership change in Kudu way back in
2015
([https://github.com/apache/kudu/commit/535dae](https://github.com/apache/kudu/commit/535dae))
but presumably that was after the fork. We still haven't implemented leader
leases or distributed transactions in Kudu though due to prioritizing other
features. It's very cool that you have implemented those consistency features.

~~~
kmuthukk
hi @mpercy,

Thanks for correcting me on the dynamic consensus membership change. Looks
like the basic support was indeed there, but several important enhancements
were needed (for correctness and usability).

\- To make the "online" piece of the membership change work correctly we added
support for LEARNER (PRE VOTER) role (where the new member enters in a
non-voting mode till it's caught up).
[https://github.com/YugaByte/yugabyte-db/commit/909d26e31ecd0...](https://github.com/YugaByte/yugabyte-db/commit/909d26e31ecd0ef0f87eb677961dcf238f9d7853).

\- Load Balancing (which uses the membership changes) is automatic.
([https://github.com/YugaByte/yugabyte-db/commit/e4667eb7ec0e6...](https://github.com/YugaByte/yugabyte-db/commit/e4667eb7ec0e6b870eeb6a8cc34273fe1b9b576b))

\- Remote bootstrap (due to membership changes) also has undergone substantial
changes given that YugaByte DB uses a customized/extended version of RocksDB as
the storage engine and does a tighter coupling of Raft with the RocksDB storage
engine.
([https://github.com/YugaByte/yugabyte-db/blob/master/docs/ext...](https://github.com/YugaByte/yugabyte-db/blob/master/docs/extending-rocksdb.md))

\- Dynamic Leader Balancing is also new-- it causes leadership to be
proactively altered in a running system to ensure each node is the leader for
a similar number of tablets.
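
To make the leader-balancing item above concrete, here is a small greedy sketch. It is a generic illustration and not YugaByte DB's actual algorithm: the function name, the input shapes, and the "move one leader at a time from a busier node to a less busy replica" rule are assumptions chosen for brevity.

```python
def rebalance_leaders(leaders, replicas):
    """Greedy sketch of leader balancing (hypothetical, for illustration).

    leaders:  dict mapping tablet -> node currently leading it
    replicas: dict mapping tablet -> set of nodes hosting a replica
    Returns a new tablet -> leader mapping with leadership spread out.
    """
    leaders = dict(leaders)
    nodes = sorted({n for hosts in replicas.values() for n in hosts})
    while True:
        # Count how many tablets each node currently leads.
        counts = {n: 0 for n in nodes}
        for node in leaders.values():
            counts[node] += 1
        moved = False
        # Try the busiest nodes first.
        for src in sorted(nodes, key=lambda n: -counts[n]):
            for tablet in sorted(t for t, n in leaders.items() if n == src):
                # Only a node that already hosts a replica can take over,
                # and only if the move strictly reduces the imbalance.
                candidates = [
                    n for n in sorted(replicas[tablet])
                    if counts[n] + 1 < counts[src]
                ]
                if candidates:
                    leaders[tablet] = min(candidates, key=lambda n: counts[n])
                    moved = True
                    break
            if moved:
                break
        if not moved:
            return leaders
```

Each move shifts a leader from a node leading `c` tablets to one leading at most `c - 2`, so the loop always terminates. A real system would also have to perform the leadership transfer through the consensus protocol and would likely throttle such moves.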

regards, Kannan

~~~
mpercy
Interesting. Just last year we implemented improved re-replication
([https://github.com/apache/kudu/commit/79a255](https://github.com/apache/kudu/commit/79a255))
which sounds very similar to what you did with LEARNER roles, and we added
manually-triggered rebalancing
([https://github.com/apache/kudu/commit/ccdcf6](https://github.com/apache/kudu/commit/ccdcf6)
and
[https://kudu.apache.org/releases/1.8.0/docs/administration.h...](https://kudu.apache.org/releases/1.8.0/docs/administration.html#rebalancer_tool)).

I'm curious if you did anything to prevent automatic rebalancing from being
triggered at a "bad time" or have throttled it in some way, or whether moving
large amounts of data between servers at arbitrary times was not a concern.

I am also curious if you added some type of API using the LEARNER role to
support a CDC-type of listener interface using consensus.

By the way, we also recently added support for rack/location awareness in a
series of patches including
[https://github.com/apache/kudu/commit/ebb285](https://github.com/apache/kudu/commit/ebb285)

We should really start some threads on the dev lists to periodically share
this type of information and merge things back and forth to avoid duplicating
work where possible. I know the systems are pretty different at the catalog
and storage layers but there are still many similarities.

------
robterrell
Not a comment on YugaByte, but... I love it when a new Jepsen report gets
released. Kyle Kingsbury has single-handedly raised the bar on an entire
industry. (Well, not single-handedly anymore, but still.)

~~~
Jupe
Couldn't agree more. There are 3 sources of information regarding database
serializability/linearizability:

1\. Marketing material (mostly useless)

2\. Individual projects/post-mortems (50/50 here; some just mis-use the
technology from the get-go, others have valid feedback, but it's tough to
determine when either applies)

3\. Jepsen tests (which are more like independently verifiable science)

Sure, you can decide that your social-media solution has no need for
consistency (or even durability!) - but in my experience, most solutions don't
have that flexibility.

------
shin_lao
If they rely on clocks, why don't they use PTP? Am I missing something?

~~~
aphyr
I think the YB team members are probably best equipped to talk about this, but
I can note that while some databases do build their own clock synchronization
protocol, many prefer to let the OS handle clocks. For one thing, clock sync
is surprisingly tricky to do well, so it makes sense to write daemons that do
it well _once_ and be able to re-use them in lots of contexts. There's also
the question of HW support: in theory, datacenter and hardware providers could
do better than pure-software time synchronization by, say, offering dedicated
physical links to a local atomic + GPS clock ensemble. AWS TimeSync is a step
in this direction, and I wouldn't be surprised if we see more accurate clocks
in the future.

There are still tons of caveats with this idea--Linux and most database
software ain't realtime, for starters--but you can imagine a world in which
clock errors are sufficiently bounded and infrequent that they no longer
represent the most urgent threat to safety. That's ultimately a quantitative
risk assessment.
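
One way to picture what "sufficiently bounded" clock error buys you is the sketch below. It is a generic illustration, not how YugaByte DB or CockroachDB order events: the function name and the assumption that every node's clock is within `max_skew` seconds of true time are hypothetical.

```python
def definitely_before(ts_a: float, ts_b: float, max_skew: float) -> bool:
    """Return True only if event A must have preceded event B.

    Assumes (hypothetically) that each timestamp was read from a clock
    that is within `max_skew` seconds of true time, so the true times lie
    in [ts - max_skew, ts + max_skew].
    """
    # A's latest possible true time must precede B's earliest possible one.
    return ts_a + max_skew < ts_b - max_skew
```

If the real skew ever exceeds the assumed bound, the function can return a wrong answer, which is why the size and frequency of clock errors matter for safety.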

My suspicion is that DB vendors like YugaByte and CockroachDB are making a
strategic bet that although clocks right _now_ are pretty terrible, they won't
be that way forever. I'd like to see more rigorous measurement on this front,
because while I've got plenty of anecdotes, I don't think we have a broad
statistical picture of how bad typical clocks are, and whether they're
improving.

~~~
shin_lao
My comment was that PTP is a much better protocol than NTP, and in the doc
they only talk about NTP:

[https://docs.yugabyte.com/latest/deploy/checklist/#clock-syn...](https://docs.yugabyte.com/latest/deploy/checklist/#clock-synchronization)

[https://en.wikipedia.org/wiki/Precision_Time_Protocol](https://en.wikipedia.org/wiki/Precision_Time_Protocol)

~~~
sllabres
[https://en.wikipedia.org/wiki/The_White_Rabbit_Project](https://en.wikipedia.org/wiki/The_White_Rabbit_Project)

------
truth_seeker
Is it also tested against ScyllaDB? ScyllaDB could be up to 10x more performant
than Cassandra as backend storage.

~~~
aphyr
Jepsen is not a performance test; we verify safety. I haven't looked at
ScyllaDB personally, but you can read about Scylla's own work testing their
database here [1], and see some of the issues they found here [2].

[1]: [https://www.scylladb.com/2016/02/11/jepsen-testing/](https://www.scylladb.com/2016/02/11/jepsen-testing/)

[2]:
[https://github.com/scylladb/scylla/issues?utf8=%E2%9C%93&q=i...](https://github.com/scylladb/scylla/issues?utf8=%E2%9C%93&q=is%3Aissue+jepsen+)

~~~
truth_seeker
I never said Jepsen is about performance. My comment was only about
considering ScyllaDB as backend choice. Thanks for the links anyway.

~~~
sidch
YugaByte product manager here. The YCQL API which passes Jepsen has its roots
in Cassandra Query Language but does not use Cassandra as its backend store.
Its backend store is DocDB, which is a Google Spanner-inspired distributed
document store.

