
Jepsen: Crate 0.54.9 version divergence - gmcabrita
https://aphyr.com/posts/332-jepsen-crate-0-54-9-version-divergence
======
ianamartin
I have a general question about database reliability that perhaps someone
could help me look into.

I love these Jepsen posts, and Kyle Kingsbury's work is amazing.

It all shows that distributed systems are really complicated and difficult to
get right. These posts on the open source ones are really great. But how do
you go about evaluating a proprietary one?

Let's say my company is looking for a columnar store. We look at the
evaluation of Cassandra and think, "Nope."

But then someone says, "Hey! Let's go with Redshift!" How do you know that's
any better?

We all know that DB makers claim certain things. Why would Amazon be better at
this tricky subject than the people behind ElasticSearch?

I would really like to know what people think about this.

Is the decision that it's better to go with the unreliable demon that you
know? Or is it better to pass the responsibility off to a company like Amazon,
hope for the best, and lawyer up if unexpected things happen?

~~~
takeda
There's nothing wrong with Cassandra, but its reliability depends on the
settings you use. For example, you should prefer using CRDTs. The article is a
good guide to understanding what's going on.

Redshift is technically a service you pay for, not a database (I have not
used it, but I believe it is Postgres; don't quote me on that), so if it
doesn't provide what's promised, you take it up with Amazon.

Anyway, to summarize databases, we currently have two types:

\- relational databases - old and proven, always consistent, but typically not
distributed

\- so-called NoSQL, of which I think the only ones worth attention are AP
(from CAP). These databases scale, but are eventually consistent. This means
they are consistent most of the time, but not always. They are good for
storing things that, while important, can tolerate some individual entries
being wrong or lost (user sessions, shopping carts, tracked information about
users, etc.)

There are also databases like Mongo which are snake oil.

~~~
010a
Redshift is definitely "based" on Postgres, but its distributed systems
components are probably proprietary enough that the concern is valid.

------
sdegutis
Not to be confused with Rust Crates (which I mention because I was confused).

~~~
cordite
Was my first thought as well.

~~~
caleblloyd
It should be referred to as crate.io for this reason

~~~
sdegutis
Meh. I think a better root solution is picking more unique names. Like,
instead of Crate, why don't they call it: Rustpkg? Or maybe, Rusputin? You
know, a mix of (Grigori) Rasputin and Rust. It avoids all the (many) drawbacks
of giving projects too generic of a name, while also making it more memorable
and fun!

~~~
steveklabnik
"rustpkg" was the name of the thing Cargo replaced.

------
justinsaccount
I think something went wrong here:

    
    
      ; For each [version, reads] pair, discard those with one value
      multis (remove (fn <a href="/data/posts/332/k vs">k vs</a>
        (= 1 (count (set (map :value vs)))))
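
For what it's worth, the comment in that snippet suggests the intent is to group reads by version and keep only the versions that were seen with more than one distinct value. A rough Python sketch of that idea, assuming each read is a map with `version` and `value` keys (the `version` key, the function name `divergent_versions`, and the sample data are my own guesses for illustration, not taken from the post):

```python
from collections import defaultdict


def divergent_versions(reads):
    """Return {version: sorted distinct values} for versions read with >1 value.

    Assumes each read is a dict with hypothetical "version" and "value" keys.
    """
    values_by_version = defaultdict(set)
    for read in reads:
        values_by_version[read["version"]].add(read["value"])
    # Discard versions where every read agreed on a single value.
    return {
        version: sorted(values)
        for version, values in values_by_version.items()
        if len(values) > 1
    }


# Made-up sample reads: version 2 was observed with two different values.
sample_reads = [
    {"version": 1, "value": "a"},
    {"version": 1, "value": "a"},
    {"version": 2, "value": "b"},
    {"version": 2, "value": "c"},
]
print(divergent_versions(sample_reads))
```

Anything left in the result would be a version number that identified more than one value, which is presumably the divergence the post title refers to.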

------
lobster_johnson
Is this bug in Crate, or Elasticsearch? I believe Crate bypasses Elasticsearch
for some things and accesses shards directly, at least to perform query
pushdown operations, but are they also handling the optimistic version
control?

~~~
caleblloyd
Crate.io does skip Elasticsearch and go straight to Lucene in many cases, see
[https://crate.io/a/sql-for-elasticsearch/](https://crate.io/a/sql-for-elasticsearch/)
for some examples. It's still so new that I don't trust it with master data
though. They do have good backup/restore functionality.

