
Jepsen: YugaByte DB 1.3.1 - aphyr
https://jepsen.io/analyses/yugabyte-db-1.3.1
======
dougwbrunton
I cannot recall seeing a public Jepsen report that didn’t result in better
software. Kudos to the Jepsen team and their clients.

------
pdeva1
The DB doesn't guarantee linearizability (assuming clock skew). Even the post
says stale counter values can be read.

Doesn't that mean that the following use cases are not possible?

    1. keeping count of page views
    2. vote count per ticket/post
    3. like/dislike counts per entity

One reason to use a highly scalable DB like Yuga would be to keep such counts.

~~~
aphyr
All those should be possible; you just might not observe the state of the
counter as it was a few milliseconds(?) ago, rather than what it might be
right now. If you're dealing with extremely frequent updates, where the chances
of observing that sort of anomaly are higher, chances are the precise
value of the counter doesn't matter so much, and an occasional off-by-one or
off-by-two isn't the end of the world. YMMV, of course. :)

~~~
pdeva1
> you just might not observe the state of the counter as it was a few milliseconds

Did you mean 'might observe' instead of 'might not'? A stale read would result
in the older value being read.

I believe the anomaly would then occur depending on how close the two events
are in time, regardless of frequency.

In that case, precise counts can be needed even at low frequency. Say you
are a restaurant with 1 burger left. Two folks order at the same time, but you
decrement the count by only 1.

Also, for anything involving monetary values, you definitely never want to
read stale values.

~~~
aphyr
Yes, thank you. I was rushing to get out the door and edited the sentence
carelessly! You'll likely see a current value _most_ of the time, even under
high contention, but might occasionally see a slightly stale one.

As grogers notes, a stale read is a different phenomenon than the burger
problem you describe, which is also called "lost update". We didn't observe
any cases of lost updates in YugaByte DB.
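To make the distinction concrete, here is a toy, single-process Python sketch. All names are hypothetical, and nothing here is YugaByte DB's code. In the first case, two uncoordinated read-modify-write decrements both read before either writes, so one update is lost. In the second case, the two decrements are applied in order and nothing is lost; a lagging reader merely observes an older value that was once correct.

```python
# Toy model: a counter kept as a list of committed versions, oldest first.
# Purely illustrative; this is not how YugaByte DB stores data.

class VersionedCounter:
    def __init__(self, initial):
        self.versions = [initial]  # committed values, oldest first

    def latest(self):
        return self.versions[-1]

    def read_stale(self, lag):
        # A stale read returns a value `lag` commits behind the latest one.
        return self.versions[max(0, len(self.versions) - 1 - lag)]

    def write(self, value):
        self.versions.append(value)


def lost_update_interleaving(counter):
    # Two uncoordinated read-modify-write "transactions": both read, then both write.
    a_read = counter.latest()
    b_read = counter.latest()
    counter.write(a_read - 1)
    counter.write(b_read - 1)  # clobbers A's decrement: the lost update
    return counter.latest()


def stale_read_only(counter):
    # Two decrements applied one after the other: no update is lost...
    counter.write(counter.latest() - 1)
    counter.write(counter.latest() - 1)
    # ...but a lagging reader may still observe an older (once-correct) value.
    return counter.latest(), counter.read_stale(lag=1)


if __name__ == "__main__":
    print(lost_update_interleaving(VersionedCounter(2)))  # 1, though two decrements ran
    print(stale_read_only(VersionedCounter(2)))           # (0, 1)
```

A stale read only changes what an observer sees. A lost update changes the data itself, and that is what the burger example needs to prevent.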

As for monetary values, you might find it interesting to ask someone who works
in banking or fintech about recency properties of their data systems.
Latencies on the order of multiple days are surprisingly common.

~~~
pdeva1
Correct me if I am wrong here:

1. Burger case: I do count = count-1. If both txs see count=1, we get count=0
at the end.

2. I didn't say banking per se. It can involve a simple billing system at a
startup, or any critical data where you need to ensure you are reading
accurate, up-to-date values. Maybe a leaderboard.

>You'll likely see a current value most of the time

'most' is not a guarantee :) Either the system is designed to show up-to-date
values or it isn't. And if 'some' of the time the value is stale, you have
to program with weak consistency in mind.

The marketing on YugaByte's page makes it seem like it can replace DBs like
Cassandra and give you a consistent view of your data. But if one is seeing
stale values, you are back to coding as if the data isn't consistent.

~~~
aphyr
_Correct me if I am wrong here... I do: count = count-1; if both txs see
count=1, we get count=0 at the end._

I, er, don't mean to be rude, but... both grogers and I have already explained
that this idea is somewhat less than correct. The anomaly you're describing is
called lost update, and is explicitly prohibited by both snapshot isolation
and serializable isolation, the two isolation levels supported by YugaByte DB.
Linearizability is not necessary to prevent lost updates. This is not only a
theoretical fact, but supported by experimental evidence: we have extensively
tested YugaByte DB for lost updates, and have not (yet) observed any.
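Here is a hypothetical, single-threaded sketch of why snapshot isolation rules out the burger anomaly without needing linearizability. It assumes the "first-committer-wins" rule that many SI implementations use: a transaction is aborted if it tries to commit a write to a key that someone else changed after its snapshot was taken. This is an illustration of the rule, not YugaByte DB's actual conflict-detection code.

```python
class WriteConflict(Exception):
    """Raised when a transaction loses the first-committer-wins check."""


class MVCCStore:
    """Toy multi-version store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = 0  # logical commit timestamp

    def put_initial(self, key, value):
        self.versions.setdefault(key, []).append((self.clock, value))

    def begin(self):
        # Each transaction reads from a snapshot fixed at its start.
        return Txn(self, snapshot_ts=self.clock)


class Txn:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # buffered until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed at or before our snapshot timestamp.
        visible = [v for ts, v in self.store.versions[key] if ts <= self.snapshot_ts]
        return visible[-1]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First-committer-wins: if another transaction committed a write to a key
        # we also wrote, after our snapshot, abort instead of overwriting it.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise WriteConflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))


def burger_race():
    store = MVCCStore()
    store.put_initial("burgers", 1)
    t1, t2 = store.begin(), store.begin()  # both snapshots see 1 burger
    t1.write("burgers", t1.read("burgers") - 1)
    t2.write("burgers", t2.read("burgers") - 1)
    t1.commit()  # the first committer wins
    try:
        t2.commit()
        outcome = "t2 committed (lost update!)"
    except WriteConflict:
        outcome = "t2 aborted"
    retry = store.begin()  # a retry takes a fresh snapshot
    return outcome, retry.read("burgers")


if __name__ == "__main__":
    print(burger_race())  # ('t2 aborted', 0)
```

The second order fails instead of silently selling a burger that doesn't exist. When it retries, it sees 0 burgers, however stale anyone's *reads* may have been.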

_I didn't say banking per se. It can involve a simple billing system at a
startup, or any critical data where you need to ensure you are reading
accurate, up-to-date values. Maybe a leaderboard._

While linearizability may be a nice property for these systems to have, it is
rarely _necessary_. Billing systems and leaderboards, like bank ledgers and
shopping carts, are often designed as append-only ledgers with eventual
consistency, employing sealing windows, compensating transactions, and
time-shifting to handle late discovery of events. Others are designed as reports
periodically derived from some underlying datastore via, say, an ETL process;
data may be not only milliseconds, but even days or weeks out of date. That's
not to say this is a universal pattern, but I can think of a few dozen systems
off the top of my head.
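Here is a hypothetical sketch of the append-only pattern described above. Entries are never edited or deleted, and closed billing periods are "sealed". Late-arriving events for a sealed period are time-shifted into the current open period, and mistakes are fixed by appending compensating (reversal) entries. Every name here is illustrative, and real billing systems differ widely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    account: str
    amount_cents: int  # positive = charge, negative = credit/refund
    period: str        # billing period the entry belongs to, e.g. "2019-08"
    note: str = ""


class Ledger:
    """Toy append-only ledger: entries are only ever appended."""

    def __init__(self):
        self.entries = []
        self.sealed = set()  # billing periods that are closed

    def append(self, entry, current_period):
        # Late events for a sealed period are time-shifted into the open period.
        if entry.period in self.sealed:
            entry = Entry(entry.account, entry.amount_cents, current_period,
                          f"late event from {entry.period}: {entry.note}")
        self.entries.append(entry)

    def seal(self, period):
        self.sealed.add(period)

    def compensate(self, original, current_period, reason):
        # Corrections are new entries that cancel the original, never in-place edits.
        reversal = Entry(original.account, -original.amount_cents,
                         original.period, f"reversal: {reason}")
        self.append(reversal, current_period)

    def balance(self, account, period):
        return sum(e.amount_cents for e in self.entries
                   if e.account == account and e.period == period)


if __name__ == "__main__":
    ledger = Ledger()
    first = Entry("acme", 1000, "2019-07")
    ledger.append(first, current_period="2019-07")
    ledger.seal("2019-07")
    ledger.append(Entry("acme", 250, "2019-07", "usage reported late"), current_period="2019-08")
    ledger.compensate(first, current_period="2019-08", reason="billing error")
    print(ledger.balance("acme", "2019-07"), ledger.balance("acme", "2019-08"))  # 1000 -750
```

No reader ever needs the "current" value. A sealed period's total never changes after sealing, and anything discovered later shows up as a new entry in the next bill.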

I know this because I've worked on several systems like this, including in
fintech, and consulted for companies and government organizations building
others. I can also offer some anecdotal experience here. For example, I have
no fewer than three emails in my inbox from AWS's billing system informing me
of missing data or other mistakes in the previous month's bill, accompanied by
updated reports providing newer data. Last summer, my bank deferred visibility
of some transaction data for over _six weeks_.

This might feel counter-intuitive, but since money is fungible and addition is
commutative, it's one of the _easiest_ things to work with in a stale or even
eventually-consistent manner. Where you _really_ want linearizability (or at
least sequential consistency) is in domains where operations _don't_ commute.
Linearizability is particularly important where those non-commutative
operations involve side-channels, but that's a longer discussion for another
time.
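Here is a small illustration of the commutativity point, using a plain integer as a stand-in for an account balance (an assumption for illustration only). Blind deltas give the same total in every order, so reordered or late entries still converge. A withdrawal that first checks the balance does not commute with a deposit, because its outcome depends on what it observed.

```python
from itertools import permutations


def apply_all(start, ops):
    """Apply operations to a balance in the given order."""
    balance = start
    for op in ops:
        balance = op(balance)
    return balance


# Blind deltas (deposits, charges) commute: every arrival order gives the same total.
deltas = [lambda b: b + 500, lambda b: b - 200, lambda b: b + 75]


def withdraw_if_covered(amount):
    """A withdrawal that is rejected (a no-op) when the balance is too low."""
    def op(balance):
        return balance - amount if balance >= amount else balance
    return op


# A guarded withdrawal does not commute with a deposit: its effect depends on
# which balance it observes, i.e. on ordering.
guarded = [lambda b: b + 500, withdraw_if_covered(300)]

if __name__ == "__main__":
    print(sorted({apply_all(0, order) for order in permutations(deltas)}))   # [375]
    print(sorted({apply_all(0, order) for order in permutations(guarded)}))  # [200, 500]
```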

_'most' is not a guarantee_

This is true, but there is a qualitative difference between a system which
exhibits, say, a 5ms stale read once per thousand transactions during clock
skew, and a 10-day stale read one in every two transactions all the time. I
mention these numbers to provide a rough characterization of anomaly frequency
and severity.
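Taking those two illustrative figures at face value, with system A's rate assumed to apply only while clocks are skewed, the expected staleness per transaction works out as follows:

```latex
\mathbb{E}[\text{staleness}]_A = \tfrac{1}{1000} \times 5\,\text{ms} = 0.005\,\text{ms}
\qquad
\mathbb{E}[\text{staleness}]_B = \tfrac{1}{2} \times 10\,\text{days} = 5\,\text{days} = 4.32 \times 10^{8}\,\text{ms}

\frac{\mathbb{E}[\text{staleness}]_B}{\mathbb{E}[\text{staleness}]_A}
  = \frac{4.32 \times 10^{8}\,\text{ms}}{5 \times 10^{-3}\,\text{ms}}
  = 8.64 \times 10^{10}
```

That is a gap of roughly eleven orders of magnitude, which is why "not a guarantee" alone says little about how an application will actually behave.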

_The marketing on YugaByte's page makes it seem like it can replace DBs like
Cassandra and give you a consistent view of your data. But if one is seeing
stale values, you are back to coding as if the data isn't consistent._

Linearizability is not the only form of consistency model, and many models
have no realtime properties whatsoever. You might find
[https://jepsen.io/consistency](https://jepsen.io/consistency) informative.

~~~
pdeva1
Thanks for clarifying. I appreciate the detailed answer.

------
pdeva1
It seems like YugaByte, like Postgres, also spawns a process per connection,
thus requiring the use of an external connection pool.
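Here is a minimal, hypothetical sketch of what such a pool buys you. `fake_connect` stands in for a real driver's connect call (for example, a Postgres-compatible driver pointed at YSQL), and `ConnectionPool` is illustrative rather than any particular library's API. The number of backend connections, and therefore server processes, is bounded by `size` rather than by the number of application threads. Standalone poolers such as PgBouncer do the same thing out of process.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Toy fixed-size pool: at most `size` backend connections are ever opened."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Pay the per-connection (per-process on the server) cost once, up front.
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._idle.get(timeout=timeout)  # blocks while all connections are busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # return it for reuse instead of closing it


def run_demo(threads=50, size=4):
    opened = []

    def fake_connect():
        # Hypothetical stand-in for a real driver's connect(); records each "backend".
        conn = object()
        opened.append(conn)
        return conn

    pool = ConnectionPool(fake_connect, size=size)
    used = set()
    lock = threading.Lock()

    def worker():
        with pool.connection() as conn:
            with lock:
                used.add(id(conn))

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return len(opened), len(used)


if __name__ == "__main__":
    print(run_demo())  # 50 threads, but only 4 backend connections are ever opened
```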

~~~
rkarthik007
Very astute observation! Two things:

* We implemented a feature to share memory between these connection-handling processes, which makes them a bit more efficient.

* Longer term we are thinking of switching to a thread based model. This is how our other API YCQL works.

(cto / co-founder)

~~~
ralusek
I think this is a very important thing to prioritize for a DB of this nature.
Postgres/RDBMS connections in general are an extreme weak spot for
containerized/serverless applications.

DynamoDB/Elasticsearch's approach of not needing to establish/maintain a
connection seems like the ideal solution. I think for applications that would
use your DB, a minor hit to the speed of an individual request is well worth
it in exchange for being able to scale up the number of simultaneous
requests.

~~~
rkarthik007
Yes absolutely spot on!

Peeling the onion one more layer, there are two underlying features that are
required:

1) Move to a threaded model to be able to scale instantaneously. This is
already the case with YCQL, we're planning on doing this for YSQL.

2) The client drivers need to become smarter about how the app connects to
the DB. In fact, we're working on enabling this for the Spring (Java)
ecosystem by enhancing the JDBC driver
(still in the early stages):
[https://github.com/YugaByte/ybjdbc](https://github.com/YugaByte/ybjdbc)

(cto/co-founder)

~~~
pdeva1
What does the JDBC driver have to do with thread-per-connection in the
database? JDBC doesn't care whether you use threads or processes for the connection.

~~~
rkarthik007
True for a single node. However, there will be multiple IP addresses/connections
anyway, since YugaByte DB is a distributed DB and it runs across multiple
nodes. JDBC drivers connect to only one node, so to provide true connection
scaling, the client side would need to become aware of a multi-node cluster
and do connection pooling across the nodes. Of course, there are other,
related issues that also need to be solved:

* The node given to the JDBC driver could be down, or nodes could get added/removed over time. This makes "discovery of nodes" a problem.

* We could use a random or round-robin strategy, where the client connects to a random cluster node, which internally connects to the appropriate node. This would increase latency and also increase the net number of connections needed.

These may not matter if the load balancer is smart, as in the case of
Kubernetes (where the extra hop becomes mandatory anyway). But for
non-k8s deployments, these help.
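Here is a hypothetical sketch of the client-side logic described above: node discovery plus round-robin connection placement that skips unreachable nodes. `list_live_nodes`, `connect`, and the IP addresses are placeholders supplied by the caller. This is not the API of the ybjdbc driver or of any real driver, just an illustration of the idea.

```python
import itertools


class ClusterAwareConnector:
    """Hypothetical smart-driver logic: spread connections across all live nodes
    instead of pinning every connection to the single node in the URL."""

    def __init__(self, seed_nodes, list_live_nodes, connect):
        self._list_live_nodes = list_live_nodes  # placeholder: cluster-membership lookup
        self._connect = connect                  # placeholder: opens one connection
        self._nodes = list(seed_nodes)
        self._rr = itertools.cycle(self._nodes)

    def refresh(self):
        # Discovery: ask any reachable known node for the current membership, so
        # added nodes are picked up and removed nodes are dropped.
        for node in self._nodes:
            try:
                self._nodes = list(self._list_live_nodes(node))
                self._rr = itertools.cycle(self._nodes)
                return
            except ConnectionError:
                continue
        raise ConnectionError("no known node reachable")

    def connect(self):
        # Round-robin over known nodes, skipping any that refuse connections.
        for _ in range(len(self._nodes)):
            node = next(self._rr)
            try:
                return self._connect(node)
            except ConnectionError:
                continue
        raise ConnectionError("all known nodes are down")


def demo():
    down = {"10.0.0.2"}  # pretend this node is unreachable

    def fake_list_live_nodes(node):
        return ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

    def fake_connect(node):
        if node in down:
            raise ConnectionError(node)
        return f"conn-to-{node}"

    connector = ClusterAwareConnector(["10.0.0.1"], fake_list_live_nodes, fake_connect)
    connector.refresh()
    return [connector.connect() for _ in range(4)]


if __name__ == "__main__":
    print(demo())  # alternates between 10.0.0.1 and 10.0.0.3, skipping the down node
```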

------
darshanime
Great. I am not sure about using YugaByte DB now.

~~~
aphyr
Keep in mind that this analysis focuses on features which are still in beta;
while YugaByte DB advertises SQL features throughout their marketing, the
documentation does state that SQL isn't production-ready yet. We should expect
bugs at this stage! :-)

