
Rust at speed – building a fast concurrent database [video] - henning
https://www.youtube.com/watch?v=s19G6n0UjsM
======
marknadal
I very much enjoyed this talk and I'm one of those in-the-database-industry
"tell me something I don't know" types.

He put better words to many concepts I've worked on for years, in simpler
terms than what other languages have and/or could express. Has me very excited
for Noria.

As others have commented though, the 25M ops/sec figure is a little confusing.
It is certainly better than modern/legacy systems, but it is subpar for other
recent/similar research and production systems (like ours). Theirs is about 2x
Redis, but with that threading and that type of machine it should be capable
of ~300M+ ops/sec, unless their cached reads were on very large values (but he
says in the talk it is on Reddit-like messages, so it can't be), or the
benchmark was measured at the client and includes roundtrip latency to a local
server.

Either way, the quality of the ideas, work, and presentation certainly strikes
me as a sign that Jon and Noria are an up-and-coming smash success to keep an
eye on.

~~~
Jonhoo
I'm glad to hear that you enjoyed it, and that you found the concepts and
ideas understandable!

It's worth pointing out that this was intended to be a relatively high-level
talk. For the more technical details, I'd recommend reading the [Noria
paper](https://www.usenix.org/conference/osdi18/presentation/gjengset). As for
the performance numbers, I've tried to give some better discussion of that [in
a response](https://news.ycombinator.com/item?id=18838359) to NovaX's comments
above.

------
dijit
While we're on the subject of Rust (and, in particular, /learning/ Rust), I'd
like to ask folks here: is there a developer advocate who does what Francesc
Campoy does for Go but with Rust?

I'm talking about Francesc's "JustForFunc"[0] video series, where he commonly
does code reviews of Go code. I find it a really good way to see examples of
working code and best practices, and I've been scouring YouTube looking for
the same but for Rust.

Ideas?

[0]:
[https://www.youtube.com/channel/UC_BzFbxG2za3bp5NRRRXJSw](https://www.youtube.com/channel/UC_BzFbxG2za3bp5NRRRXJSw)

~~~
f2f
i think steveklabnik was hired very early on in the rust project's life as a
community liaison. not sure if he still does that work but he posts here
regularly and perhaps can point to one, or advocate for such a position if one
does not exist.

~~~
steveklabnik
My job is technically docs; I do the community stuff because I love it.

I don’t think there’s anyone doing this in Rust that I’m aware of.

------
NovaX
I’m confused by the read/write performance numbers. I don’t disagree that
lock-free reads are better, but 25M reads/sec at 16 threads is really bad. I
can do 13.5M with a naive exclusive lock on an LRU cache, and 380M reads / 48M
writes on a concurrent cache. The left/right concurrency isn’t novel, and I
feel bad being so very underwhelmed. What am I missing?

~~~
dpezely
How does your workload and server spec compare to theirs?

Following links to the OSDI'18 paper [1] gives this overview:

"Setup. In all experiments, Noria and other storage backends run on an Amazon
EC2 c5.4xlarge instance with 16 vCPUs; clients run on separate c5.4xlarge
instances unless stated otherwise."

That paragraph goes on to explain the nature of the workload.

Their Figure 2 gives example SQL commands.

[1]
[https://jon.tsp.io/papers/osdi18-noria.pdf](https://jon.tsp.io/papers/osdi18-noria.pdf)

~~~
NovaX
He was showing a microbenchmark of docs.rs/evmap and not of Noria. This is a
load test on the local machine:
[https://github.com/jonhoo/rust-evmap/tree/master/benchmark](https://github.com/jonhoo/rust-evmap/tree/master/benchmark)

His read throughput is linear in the number of threads, which means it must
have been run with at least as many cores as threads. Otherwise, context
switching and sharing of cores would have resulted in sublinear growth (you
can get superlinear growth, but it won't look like that). Assuming his numbers
are not fake, his GitHub charts indicate a 32+ core machine.

My 2015 benchmarks were on an Azure G4, Xeon E5-2698B v3 @ 2.00GHz (16 cores,
hyperthreading disabled), 224 GB, Ubuntu 15.04. They show a slight superlinear
growth for reads (Caffeine).
[https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class](https://github.com/ben-manes/caffeine/wiki/Benchmarks#server-class)

Both benchmarks use a Zipf distribution. I believe he said his only supports
single-writer semantics, whereas mine is roughly per-entry (per hash bin). So
reads should be safe to compare, and we can ignore his slow write rate. His
should be faster, as I do more work to maintain complex eviction policies,
whereas his is reading a pseudo-immutable map with no eviction.

Since he does not perform any cache operations, as it is a concurrent map, we
should be comparing against another unbounded concurrent map. That comparison
is over _1 billion_ reads per second on the above hardware. That is why I
think something is miserably wrong or I'm misunderstanding something
fundamental. Achieving 25M/s is a very disappointing result.
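
For anyone wanting to sanity-check throughput numbers like these themselves, a
toy single-threaded read-loop harness might look like the sketch below. This
is my own illustration, not either project's benchmark; the `reads_per_sec`
helper and its parameters are made up for the example, and a real comparison
would use multiple threads and a Zipf key distribution as both benchmarks do.

```rust
use std::collections::HashMap;
use std::time::Instant;

// Measure raw read throughput on a pre-populated map. `black_box` keeps
// the compiler from optimizing the read loop away entirely.
fn reads_per_sec(map: &HashMap<u64, u64>, iters: u64) -> f64 {
    let n = map.len() as u64;
    let start = Instant::now();
    let mut sum = 0u64;
    for i in 0..iters {
        // Uniform key access; a Zipf distribution would be more realistic.
        if let Some(v) = map.get(&(i % n)) {
            sum = sum.wrapping_add(*v);
        }
    }
    let elapsed = start.elapsed().as_secs_f64().max(1e-9);
    std::hint::black_box(sum);
    iters as f64 / elapsed
}
```

Multiplying the single-threaded figure by the thread count gives the best-case
linear-scaling number a lock-free read path is being compared against.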

~~~
the_mitsuhiko
I'm not the author but you should probably ask there and not on HN.

That said:

> Since he does not perform any cache operations, as it is a concurrent map

Why is it a concurrent map? To me it seems to be a regular Rust hashtable.
Unless you tweak something, most of the cost here is going to be the very slow
default hashing algorithm.
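
To illustrate that point: Rust's `HashMap` defaults to SipHash, which is
DoS-resistant but comparatively slow, and the hasher can be swapped via
`BuildHasherDefault`. The toy FNV-1a hasher below is a stand-in for crates
like fnv or fxhash, which is what a benchmark would typically use.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a hasher; fast but not DoS-resistant.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV-1a offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

// Same HashMap API, different hash function.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;
```

Whether the map is behind evmap or a plain hashtable, the per-read hashing
cost is paid either way, so it matters for any throughput comparison.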

------
yamafaktory
Link to the GitHub project:
[https://github.com/mit-pdos/noria](https://github.com/mit-pdos/noria).

------
Dowwie
PostgreSQL developers have responded to alternatives by adding functionality
that those alternatives offer. What could PostgreSQL learn from Noria's value
proposition?

~~~
jeremysalwen
They need to add support for incremental refreshes of materialized views. It
is something they have thought about for a while:
[https://rhaas.blogspot.com/2010/04/materialized-views-in-postgresql.html](https://rhaas.blogspot.com/2010/04/materialized-views-in-postgresql.html)
and it makes the list of most requested features:
[https://rhaas.blogspot.com/2016/01/postgresql-past-present-and-future.html](https://rhaas.blogspot.com/2016/01/postgresql-past-present-and-future.html)

------
linuxhansl
To observe the "epoch counters" between threads (which can run on different
cores), don't you still need memory fences/barriers to make the epochs visible
to other threads? In my experience that is a significant part of the cost of
locking.

Perhaps that accounts for "only" 25M reads/s, although I'm surprised that
mutexes are sooo much slower.

~~~
BagOfPistchios
Yes, you need both compiler barriers for the ordering and memory fences to
ensure the global visibility of previous operations in order. It is possible
that the mutex implementation he used does no spinning under contention and
just context switches each time. That could explain the poor performance of
mutexes, but I don't know the implementation.
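
To make the fence cost concrete, here is a minimal sketch of the per-reader
epoch-counter pattern (an illustration of the general technique, not evmap's
actual code): readers bump an atomic on entry and exit, and the writer polls
with Acquire ordering to tell when a reader has left its critical section.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One epoch counter per reader: odd = inside a read critical section,
// even = quiescent. The Acquire/Release ordering on these operations is
// exactly the memory-fence cost being discussed; it is paid per read,
// but there is no lock and readers never block each other.
pub struct Epoch(AtomicUsize);

impl Epoch {
    pub fn new() -> Self {
        Epoch(AtomicUsize::new(0))
    }
    // Reader enters a read critical section (counter becomes odd).
    pub fn enter(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }
    // Reader leaves (counter is even again).
    pub fn leave(&self) {
        self.0.fetch_add(1, Ordering::AcqRel);
    }
    // Writer checks whether this reader is out of its critical section,
    // and thus whether the old map version can safely be reused.
    pub fn quiescent(&self) -> bool {
        self.0.load(Ordering::Acquire) % 2 == 0
    }
}
```

An uncontended atomic increment is typically a handful of nanoseconds, versus
a potential context switch for a contended mutex, which is consistent with the
gap shown in the talk.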

~~~
Jonhoo
The mutexes were standard Rust Mutex, which I believe just forwards directly
to pthread locks. I'm not sure what kind of spinning behavior they have
though.

~~~
the_mitsuhiko
Rust mutexes are in the process of being redone. The popular parking_lot
library is about to replace the platform-native ones.

~~~
kibwen
Can you link to the discussion this is referencing? I can't imagine
parking_lot outright _replacing_ the mutex in std, rather than being added as
an optional alternative... what would be the supported way of deliberately
using the platform-native mutex?

~~~
the_mitsuhiko
The discussion is partially here:
[https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439](https://internals.rust-lang.org/t/standard-library-synchronization-primitives-and-undefined-behavior/8439)

The PR for std is here:
[https://github.com/rust-lang/rust/pull/56410](https://github.com/rust-lang/rust/pull/56410)

Mostly it comes down to the platform-native APIs having various soundness
issues.

------
BagOfPistchios
As I understand it, it achieves lock-free reads by having two caches: one for
modifying and one for reading, and it atomically swaps them after
modifications (and applies the changes to what was previously the read cache).
Am I missing something, or doesn't this reduce the overall memory available
for the cache by a factor of 2?
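
The swap described above can be sketched like this. Note this is a
single-threaded illustration of the idea only; the real concurrent version
(e.g. evmap) makes the index an atomic and uses epoch tracking so the writer
knows when old readers have drained before replaying.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Left-right sketch: two copies of the map and an index naming the
// "read" side. Writes go to the hidden map, the index is swapped to
// publish them, and the write is then replayed on the other map.
struct LeftRight<K, V> {
    maps: [HashMap<K, V>; 2],
    read_idx: usize, // an atomic in the concurrent version
}

impl<K: Hash + Eq + Clone, V: Clone> LeftRight<K, V> {
    fn new() -> Self {
        LeftRight {
            maps: [HashMap::new(), HashMap::new()],
            read_idx: 0,
        }
    }

    // Readers only ever touch the map the index currently points at.
    fn read(&self, key: &K) -> Option<&V> {
        self.maps[self.read_idx].get(key)
    }

    fn write(&mut self, key: K, value: V) {
        let w = 1 - self.read_idx;
        // 1. Apply the write to the hidden map.
        self.maps[w].insert(key.clone(), value.clone());
        // 2. Publish: swap which map readers see.
        self.read_idx = w;
        // 3. Replay the write on the now-hidden map.
        self.maps[1 - self.read_idx].insert(key, value);
    }
}
```

With plain maps this does double the key/value storage, which is exactly the
2x concern.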

~~~
dpezely
He explains this at roughly the 40 minute mark. [1]

"There's 7 more lines of `unsafe` that avoids keeping both maps. You still
keep two maps, but you de-duplicate the data between those two maps."

[https://www.youtube.com/watch?v=s19G6n0UjsM&t=40m1s](https://www.youtube.com/watch?v=s19G6n0UjsM&t=40m1s)
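
A hedged sketch of that de-duplication idea, using `Arc` for illustration
(evmap itself uses a small amount of `unsafe` aliasing rather than reference
counting, as the quote notes, to avoid refcount traffic on the read path):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Both maps hold pointers to the same value allocation, so a large
// payload is stored once even though there are two maps; only the
// keys and pointers are duplicated.
fn insert_both(
    left: &mut HashMap<String, Arc<Vec<u8>>>,
    right: &mut HashMap<String, Arc<Vec<u8>>>,
    key: String,
    value: Vec<u8>,
) {
    let shared = Arc::new(value);
    left.insert(key.clone(), Arc::clone(&shared));
    right.insert(key, shared);
}
```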

------
nwah1
Really cool ideas for automatic cache invalidation and leveraging Rust's
ownership to guarantee safety with concurrent reads and writes.

------
StreamBright
I am not sure how this would work in a production database, but the problem of
concurrent reads and writes is pretty well solved in Clojure.
[https://clojure.org/reference/refs](https://clojure.org/reference/refs)

I was also wondering if the Datomic approach is better.

~~~
Scarbutt
Datomic is not a database you want if you have any performance requirements.
Tuning it for production is a nightmare; only Cognitect can do it, so if you
use it for anything serious, make sure to buy a support contract.

They partly solve the 'no one can tune this black box' problem with their
cloud offering by doing the tuning for you, but Cloud is a different product,
is dependent on AWS, and doesn't have an on-premise version.

~~~
dustingetz
If you look at what Reddit [1] and Facebook [2] did in their data layers to
scale past what the relational model can offer, they start to look pretty darn
close to the Datomic architecture. Linearized. Time aware. Immutable. Reads
from cache. Horizontal scaling. Query/writer/storage separation. And obviously
they are cloud architectures, built and operated by teams of distributed
systems engineers capable of tuning performance of such a system.

[1]
[https://news.ycombinator.com/item?id=15726376](https://news.ycombinator.com/item?id=15726376)

[2]
[http://www.dustingetz.com/:datomic-facebook-tao/](http://www.dustingetz.com/:datomic-facebook-tao/)

Neither of these architectures resemble the relational model anymore. They
resemble a half-baked bug-ridden implementation of half of Datomic.

TLDR: Nobody likes their database, and zero teams at scale run a vanilla
out-of-the-box database.

This might shed light on what people mean when they are talking about tuning
Datomic:
[http://www.dustingetz.com/:datomic-performance-gaare/](http://www.dustingetz.com/:datomic-performance-gaare/)
I'm sure there is more, but this is the only thing I have ever come across.

