Jepsen: YugaByte DB 1.3.1 (jepsen.io)
106 points by aphyr on Sept 5, 2019 | 23 comments

I cannot recall seeing a public Jepsen report that didn’t result in better software. Kudos to the Jepsen team and their clients.

the db doesnt guarantee linearizability (assuming clock skew). even the post says stale counters values can be read.

Doesn't that mean that following use cases are not possible:

  1. keeping count of page views   
  2. vote count per ticket/post  
  3. like/dislike counts per entity
a reason to use a highly scalable db like Yuga would be to keep such counts.

All those should be possible; you just might not observe the state of the counter as it was a few milliseconds(?) ago, rather than what it might be right now. If you're dealing with extremely frequent updates where the chances of observing that sort of anomaly are more common, chances are the precise value of the counter doesn't matter so much, and an occasional off-by-one or off-by-two isn't the end of the world. YMMV, of course. :)

>you just might not observe the state of the counter as it was a few milliseconds

did you mean 'might observe' instead of 'might not'? stale read would result in the older value being read.

i believe the anomaly would then occur depending on how close the 2 events are in time, regardless of frequency.

in that case, precise counts can be needed even with low frequency. say you are a restaurant with 1 burger left. 2 folks order the same time, but you decrement the count by 1 only.

also anything involving monetary values, you definitely don't want to read stale values ever.

If you earmarked the last burger to multiple people, it would violate serializability, not linearizability. So you should be good on that. Violating linearizability would look like this: with no burgers left, the kitchen cooks a burger, marks it as available in the DB and then shouts that it's ready. A customer hears them shout and immediately tries to order it, but fails because the customer's transaction happened at an earlier logical time than the kitchen's. Kyle is saying there are still relatively small time bounds on such violations (usually?) so it's probably not the biggest problem your restaurant has to worry about (just apologize and have the customer try again).
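The kitchen/customer scenario above can be sketched with a toy multi-version store. This is only an illustration under stated assumptions: `ToyMVCCStore`, the integer timestamps, and the 5-tick skew are hypothetical, and nothing here describes how YugaByte DB actually assigns timestamps.

```python
from bisect import bisect_right


class ToyMVCCStore:
    """Toy multi-version register: each write is tagged with a timestamp."""

    def __init__(self, initial):
        self.timestamps = [0]
        self.values = [initial]

    def write(self, ts, value):
        # Assumes writes arrive in increasing timestamp order (true below).
        self.timestamps.append(ts)
        self.values.append(value)

    def read(self, ts):
        # Return the latest version whose timestamp is <= the read timestamp.
        idx = bisect_right(self.timestamps, ts) - 1
        return self.values[idx]


burgers = ToyMVCCStore(initial=0)

# Kitchen's node: its clock reads 100 when it marks one burger as available.
burgers.write(ts=100, value=1)

# Customer's node: its clock runs 5 ticks behind, so a read issued after the
# kitchen's shout in real time (real time 103) is still stamped 98.
skewed_read = burgers.read(ts=98)

# A retry a moment later gets a timestamp past the kitchen's write.
fresh_read = burgers.read(ts=101)

print(skewed_read)  # 0: stale read, the burger looks unavailable
print(fresh_read)   # 1: the retry sees the burger
```

Both reads are internally consistent; the first one is merely ordered before the write even though it happened afterwards in real time, which is the linearizability violation being described.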

Dealing with money is similarly not a problem. ACH takes much longer to clear than the uncertainty window of linearizability violations. The client probably won't be able to tell the difference.

For banking systems in reality, monetary values in databases are immutable and only ever appended to. Banking is often used as a reason for strong consistency but it actually is almost always eventually consistent.

Yes, thank you. I was rushing to get out the door and edited the sentence carelessly! You'll likely see a current value most of the time, even under high contention, but might occasionally see a slightly stale one.

As grogers notes, a stale read is a different phenomenon than the burger problem you describe, which is also called "lost update". We didn't observe any cases of lost updates in YugaByte DB.

As for monetary values, you might find it interesting to ask someone who works in banking or fintech about recency properties of their data systems. Latencies on the order of multiple days are surprisingly common.

correct me if i am wrong here:

1. burger case, i do: count = count-1 if both txs see count=1, we get count=0 at the end.

2. i didnt say banking per se. can involve a simple billing system of a startup. or any critical data where you need to ensure you are reading accurate, uptodate values. maybe a leaderboard.

>You'll likely see a current value most of the time

'most' is not a guarantee :) either the system is designed with seeing uptodate values or not. and if 'some' of the time the value is stale, you have to program with low consistency in mind.

the marketing on yugabyte's page makes it seems it can replace db's like cassandra and give you a consistent view of your data. but if one is seeing stale values, you are back to coding like data is non-consistent

Cassandra is designed for eventual consistency across multiple independent regions.

>correct me if i am wrong here... i do: count = count-1 if both txs see count=1, we get count=0 at the end.

I, er, don't mean to be rude, but... both grogers and I have already explained that this idea is somewhat less than correct. The anomaly you're describing is called lost update, and is explicitly prohibited by both snapshot isolation and serializable isolation, the two isolation levels supported by YugaByte DB. Linearizability is not necessary to prevent lost updates. This is not only a theoretical fact, but supported by experimental evidence: we have extensively tested YugaByte DB for lost updates, and have not (yet) observed any.
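To make the distinction concrete, here is a toy sketch of the burger counter with and without write-write conflict detection. It assumes a "first committer wins" rule, which is one common way snapshot isolation rules out lost updates; `ToyCounter` and `ConflictError` are hypothetical names, and this is not a description of YugaByte DB's internals.

```python
class ConflictError(Exception):
    """Raised when a transaction's snapshot is out of date at commit time."""


class ToyCounter:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def begin(self):
        # A transaction's snapshot: the value and version it started from.
        return (self.value, self.version)

    def commit_unchecked(self, snapshot, new_value):
        # No conflict detection: the last writer silently overwrites.
        self.value = new_value
        self.version += 1

    def commit_checked(self, snapshot, new_value):
        # First committer wins: abort if someone committed since our snapshot.
        _, seen_version = snapshot
        if seen_version != self.version:
            raise ConflictError("write-write conflict; retry the transaction")
        self.value = new_value
        self.version += 1


# Without conflict detection: both orders read count=1 and both "succeed".
unsafe = ToyCounter(1)
a, b = unsafe.begin(), unsafe.begin()
unsafe.commit_unchecked(a, a[0] - 1)
unsafe.commit_unchecked(b, b[0] - 1)
lost_update_final = unsafe.value  # two burgers sold, but only one decrement

# With conflict detection: the second order aborts instead of committing.
safe = ToyCounter(1)
a, b = safe.begin(), safe.begin()
safe.commit_checked(a, a[0] - 1)
try:
    safe.commit_checked(b, b[0] - 1)
    second_order_committed = True
except ConflictError:
    second_order_committed = False

print(lost_update_final, second_order_committed, safe.value)
```

In the checked version the second customer gets an error and can retry, at which point they see a count of 0. No clock or real-time ordering is involved, which is why linearizability is not needed for this guarantee.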

>i didnt say banking per se. can involve a simple billing system of a startup. or any critical data where you need to ensure you are reading accurate, uptodate values. maybe a leaderboard.

While linearizability may be a nice property for these systems to have, it is rarely necessary. Billing systems and leaderboards, like bank ledgers and shopping carts, are often designed as append-only ledgers with eventual consistency, employing sealing windows, compensating transactions, and time-shifting to handle late discovery of events. Others are designed as reports periodically derived from some underlying datastore via, say, an ETL process; data may be not only milliseconds, but even days or weeks out of date. That's not to say this is a universal pattern, but I can think of a few dozen systems off the top of my head.

I know this because I've worked on several systems like this, including in fintech, and consulted for companies and government organizations building others. I can also offer some anecdotal experience here. For example, I have no fewer than three emails in my inbox from AWS's billing system informing me of missing data or other mistakes in the previous month's bill, accompanied by updated reports providing newer data. Last summer, my bank deferred visibility of some transaction data for over six weeks.

This might feel counter-intuitive, but since money is fungible and addition is commutative, it's one of the easiest things to work with in a stale or even eventually-consistent manner. Where you really want linearizability (or at least sequential consistency) is in domains where operations don't commute. Linearizability is particularly important where those non-commutative operations involve side-channels, but that's a longer discussion for another time.
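A small sketch of the commutativity point, using made-up ledger entries and two made-up non-commuting operations (`add_ten` and `double` are hypothetical examples, not anything from the report):

```python
from itertools import permutations

# Append-only ledger of credits and debits: the order in which entries are
# discovered and applied does not change the final balance.
ledger = [+100, -30, +5, -20]
balances = {sum(order) for order in permutations(ledger)}
print(balances)  # a single distinct balance across all 24 orderings


def add_ten(x):
    return x + 10


def double(x):
    return x * 2


def apply_all(start, ops):
    # Apply each operation in sequence to the running value.
    for op in ops:
        start = op(start)
    return start


# Operations that do not commute: the result depends on the order applied,
# so replicas that see them in different orders diverge.
add_then_double = apply_all(5, [add_ten, double])
double_then_add = apply_all(5, [double, add_ten])
print(add_then_double, double_then_add)
```

With the ledger, a replica that learns about an entry late still converges to the same total. With the second pair of operations, ordering has to be agreed on, which is where stronger models such as sequential consistency or linearizability start to matter.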

>'most' is not a guarantee

This is true, but there is a qualitative difference between a system which exhibits, say, a 5ms stale read once per thousand transactions during clock skew, and a 10-day stale read one in every two transactions all the time. I mention these numbers to provide a rough characterization of anomaly frequency and severity.

>the marketing on yugabyte's page makes it seems it can replace db's like cassandra and give you a consistent view of your data. but if one is seeing stale values, you are back to coding like data is non-consistent

Linearizability is not the only form of consistency model, and many models have no realtime properties whatsoever. You might find https://jepsen.io/consistency informative.

thanks for clarifying. appreciate the detailed answer.

seems like yugabyte, like postgres, also spawns a process per connection; thus requiring use of external connection pool

Very astute observation! Two things:

* We implemented a feature to share memory between these connection handling processes so that makes it a bit more efficient

* Longer term we are thinking of switching to a thread based model. This is how our other API YCQL works.

(cto / co-founder)

I think this is a very important thing to prioritize for a DB of this nature. Postgres/RDBMS connections in general are an extreme weak spot for containerized/serverless applications.

DynamoDB/ElasticSearch's approach of not needing to establish/maintain a connection seems like the ideal solution. I think for applications that would utilize your DB, a minor hit to the speed of an individual request is well worth it as compared to being able to scale up the amount of simultaneous requests.

Yes absolutely spot on!

Peeling the onion one more layer, there are two underlying features that are required:

1) Move to a threaded model to be able to scale instantaneously. This is already the case with YCQL, we're planning on doing this for YSQL.

2) The client drivers need to become smarter about how the app connects to the DB. In fact, we're working on enabling this for the Spring (Java) ecosystem by enhancing the JDBC driver (still in the early stages): https://github.com/YugaByte/ybjdbc


what does the JDBC driver have to do with thread per connection in the database? jdbc doesnt care if you use threads or processes for the connection

True for a single node. However, there will be multiple IP address/connections anyway, since YugaByte DB is a distributed DB and it runs across multiple nodes. JDBC drivers connect to only one node - so to provide true connection scaling, the client side would need to become aware of a multi-node cluster and do connection pooling across these. Of course there are other issues also that need to get solved that are slightly related:

* The node given to the JDBC driver could be down, or nodes could get added/removed over time. This makes "discovery of nodes" a problem.

* We could use a random round robin strategy to connect to nodes - where client connects to a random cluster node which internally connects to the appropriate node. This would result in an increased latency, and also an increase in the net number of connections needed.

These may not matter if the load balancer is smart like in the case of Kubernetes (and where the extra hop becomes mandatory as well). But for non-k8s deployments, these help.
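As a rough illustration of what "cluster-aware" could mean on the client side, here is a minimal sketch of a selector that rotates across known nodes and skips ones marked down. Everything in it is an assumption for illustration: `NodeRotator`, its methods, and the `node-a:5433` style addresses are hypothetical and are not part of any YugaByte or JDBC driver API.

```python
class NodeRotator:
    """Round-robin over cluster nodes, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._cursor = 0

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def next_node(self):
        # Try each node at most once per call, starting from the cursor.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._cursor % len(self.nodes)]
            self._cursor += 1
            if node not in self.down:
                return node
        raise RuntimeError("no live nodes available")


# Hypothetical three-node cluster; the second node has failed a health check.
rotator = NodeRotator(["node-a:5433", "node-b:5433", "node-c:5433"])
rotator.mark_down("node-b:5433")

picks = [rotator.next_node() for _ in range(4)]
print(picks)
```

A real driver would also need to refresh the node list as the cluster changes and keep a pool of open connections per node; this sketch covers only the selection step.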

Would async/event based work? Of course with at least one thread per core.

We are working with the R2DBC folks on reactive programming for async/event based use cases as well. This work is just getting kicked off. If interesting, please join our community Slack and give us any feedback/thoughts.

Great. I am not sure about using yugabytedb now.

Keep in mind that this analysis focuses on features which are still in beta; while YugaByte DB advertises SQL features throughout their marketing, the documentation does state that SQL isn't production-ready yet. We should expect bugs at this stage! :-)

All distributed databases (and, frankly, all software) have bugs. At least with Yugabyte you have some idea of what the bugs are because they tested them and published the results.

Would much appreciate some constructive criticism! - CTO/co-founder
