
An embedded database written in Rust - mountainview
https://github.com/spacejam/sled
======
jsnell
Other people have had trouble wringing competitive performance out of Bw-Trees
despite heroic optimization efforts [0]. Why is this implementation going to
beat other index structures with just a bit of tuning?

[0]
[https://news.ycombinator.com/item?id=17041616](https://news.ycombinator.com/item?id=17041616)

~~~
arthursilva
That paper is pretty good but it's comparing bw-tree with much simpler
in-memory data structures. I think bw-trees might work especially well for
fast-disk storage.

~~~
krenoten
This was my interpretation as well. I'm going to compare a disk-backed bwtree
with a disk-backed ART, both backed by the same pagecache, and maybe end up
with an ART that scatters partial pages on disk, bwtree style. But I need to
measure apples to apples on the metrics that matter for storage first. The
pagecache is where most of the complexity is in my implementation, and it
makes building different kinds of persistent structures on top of it pretty
easy. docs.rs/pagecache

------
samuell
Nice with more developments in embedded databases!

Would be interesting with a comparison with mentat [1]. Do you plan to support
any query language such as datalog, like mentat?

[1] [https://github.com/mozilla/mentat](https://github.com/mozilla/mentat)

~~~
krenoten
Yeah, I'm curious about using sled as a more ssd friendly storage engine for
mentat. I'm just starting to experiment with datalog implementations, but I
think by having harmony between the storage engine, query language, and
hardware properties we can make really compelling stateful systems. If this
is something that interests you, I'd love to work with more people on this.

~~~
scary
Sounds very exciting!

------
jeffdavis
"don't wake up operators. bring reliability techniques from academia into
real-world practice."

What does he mean here? Don't wake up people with operational problems? Or
does "wake up" refer to a scheduling strategy?

Either way, what are these techniques?

~~~
krenoten
It means pay more attention to reliability than pop infrastructure and
internet companies (who can offset poor reliability with human attention or
intentionally deprioritize it to sell more support contracts) tend to put into
these things. Specifically, exhaustive concurrency testing of lock-free
algorithm interleavings via ptrace driven scheduling, model-based testing in
combination with fault injection, ALICE-style file correctness testing, and
for the various distributed modules that sit on top, network simulation
combined with lineage driven fault injection. This is all very much a work in
progress, and I'd love to work with more people on it!
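
As a rough illustration of the "model-based testing in combination with fault injection" idea mentioned above, here is a minimal sketch. Everything in it is hypothetical and is not sled's API or test suite: `ToyStore` is a made-up store whose writes only become durable on `flush`, `crash` is the injected fault, and `check` drives the store and a trivially correct `BTreeMap` model with the same seeded operation stream so that any failing seed replays identically.

```rust
use std::collections::BTreeMap;

/// Toy system under test (hypothetical, not sled's API): writes are buffered
/// and only become durable on `flush`.
#[derive(Default)]
struct ToyStore {
    durable: BTreeMap<u8, u8>,
    pending: Vec<(u8, u8)>,
}

impl ToyStore {
    fn set(&mut self, k: u8, v: u8) {
        self.pending.push((k, v));
    }

    fn flush(&mut self) {
        for (k, v) in self.pending.drain(..) {
            self.durable.insert(k, v);
        }
    }

    /// Injected fault: a crash loses everything that was not flushed.
    fn crash(&mut self) {
        self.pending.clear();
    }

    /// Reads see pending writes first (newest wins), then durable state.
    fn get(&self, k: u8) -> Option<u8> {
        self.pending
            .iter()
            .rev()
            .find(|(pk, _)| *pk == k)
            .map(|(_, v)| *v)
            .or_else(|| self.durable.get(&k).copied())
    }
}

/// Small xorshift PRNG so that a failing seed replays the exact same run.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Drive the store and the model with the same operations and report the
/// first step at which a read disagrees.
fn check(seed: u64, steps: usize) -> Result<(), String> {
    let mut rng = Rng(seed | 1); // xorshift state must be non-zero
    let mut store = ToyStore::default();
    let mut model_live: BTreeMap<u8, u8> = BTreeMap::new();
    let mut model_durable: BTreeMap<u8, u8> = BTreeMap::new();

    for step in 0..steps {
        let r = rng.next_u64();
        let k = (r >> 8) as u8 % 8;
        match r % 4 {
            0 => {
                let v = (r >> 16) as u8;
                store.set(k, v);
                model_live.insert(k, v);
            }
            1 => {
                store.flush();
                model_durable = model_live.clone();
            }
            2 => {
                store.crash();
                model_live = model_durable.clone();
            }
            _ => {
                if store.get(k) != model_live.get(&k).copied() {
                    return Err(format!("seed {} diverged at step {}", seed, step));
                }
            }
        }
    }
    Ok(())
}

fn main() {
    for seed in 0..100 {
        check(seed, 1_000).expect("store and model disagree");
    }
    println!("all seeds passed");
}
```

The point of the seeded generator is that a failure message such as "seed N diverged at step M" is enough to reproduce the exact interleaving of writes, flushes, and crashes.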

------
dmitrygr
How well does that handle the storage disappearing halfway through a write?
How well does it handle power being cut halfway through an update of some
sort? How well does it handle some of the written blocks actually making it to
the disc and others not? How about if the ones that made it were not the first
or the last it issued to be written?

(For predictable answers to these, and many other complex questions, when
dealing with data you care about, use sqlite)

~~~
krenoten
ALICE showed that's not always true with sqlite. Sled is being built with an
extreme bias toward reliability over features, but as the readme says, it has
some time to go before reaching maturity. The tests are quite good at finding
new issues and deterministically replaying them, so you can help bake it in by
mining bugs using the default test suite and help it get there.

~~~
toolslive
Most database designers assume that a power failure will only affect writes
that are pending. Alas, for SSDs and NVMEs that's not always true. A power
failure can cause all kinds of corruption. Long story short: even append-only
strategies will not save you.

[https://www.usenix.org/system/files/conference/fast13/fast13...](https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf)

~~~
krenoten
Indeed. This is why I aggressively checksum everything and pay particular
attention to throwing away all data that was written after any detected
corruption during recovery. This is easier with the log-only architecture.
It's also totally os and filesystem agnostic. I was happily surprised
yesterday when it passed tests on fuchsia :]
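
To make the "checksum everything and throw away all data written after any detected corruption" approach concrete, here is a minimal sketch. The record layout, the FNV-1a checksum, and the `append` and `recover` helpers are all assumptions made for illustration; none of this is sled's actual on-disk format or API.

```rust
// Hypothetical record layout (not sled's actual format):
// [len: u32 LE][checksum: u32 LE][payload: len bytes]

/// FNV-1a, used here only to keep the sketch dependency-free.
/// A real storage engine would more likely use a CRC.
fn checksum(data: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &byte in data {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Read a little-endian u32 at `pos`. The caller guarantees 4 bytes exist.
fn read_u32(log: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([log[pos], log[pos + 1], log[pos + 2], log[pos + 3]])
}

/// Append one framed record to the in-memory log.
fn append(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Scan from the start and return every record up to the first one that is
/// torn or fails its checksum. Everything after that point is discarded,
/// even if later records happen to look intact.
fn recover(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(log, pos) as usize;
        let expected = read_u32(log, pos + 4);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // torn write: header promises more bytes than exist
        };
        let payload = &log[start..end];
        if checksum(payload) != expected {
            break; // corruption detected: stop trusting the log here
        }
        records.push(payload.to_vec());
        pos = end;
    }
    records
}

fn main() {
    let mut log = Vec::new();
    append(&mut log, b"a=1");
    append(&mut log, b"b=2");
    append(&mut log, b"c=3");

    // Flip one bit in the second record's payload
    // (first record is 8 + 3 bytes, then an 8-byte header).
    log[8 + 3 + 8] ^= 0x01;

    let recovered = recover(&log);
    println!("recovered {} record(s)", recovered.len()); // recovered 1 record(s)
}
```

Note that the third record is intact but is still dropped, which is the trade-off raised in the reply below: sequential recovery is simple to reason about, at the cost of losing everything after the first bad record.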

~~~
toolslive
So you might end up throwing away everything. (I know, it's not your fault)

~~~
krenoten
Users can rely on sequential recovery. At some point I'll probably write a
partial recovery tool that gives you all versions of all keys that are at all
present anywhere in the readable file though, which won't be much work.
Typical best practices encourage moving away from single disk reliance for
particularly valuable data, but this library will also work on phones etc...
So it is important to support people when a wide variety of things go wrong.

------
dajonker
I see that MVCC is a planned feature. Why would you need MVCC for an embedded
database? It seems like unnecessary overhead that conflicts with the
performance goals.

~~~
EugeneOZ
In Rust abstractions are free.

~~~
fooker
If that was so, there wouldn't have been a need for the unsafe mode.

~~~
zacmps
Unsafe does not exist for performance.

Quoting from the rust book:

> Unsafe Rust exists because, by nature, static analysis is conservative. When
> the compiler is trying to determine if code upholds the guarantees or not,
> it’s better for it to reject some programs that are valid than accept some
> programs that are invalid. That inevitably means there are some times when
> your code might be okay, but Rust thinks it’s not! In these cases, you can
> use unsafe code to tell the compiler, “trust me, I know what I’m doing.” The
> downside is that you’re on your own; if you get unsafe code wrong, problems
> due to memory unsafety, like null pointer dereferencing, can occur.

> There’s another reason Rust has an unsafe alter ego: the underlying hardware
> of computers is inherently not safe. If Rust didn’t let you do unsafe
> operations, there would be some tasks that you simply could not do. Rust
> needs to allow you to do low-level systems programming like directly
> interacting with your operating system, or even writing your own operating
> system! That’s one of the goals of the language. Let’s see what you can do
> with unsafe Rust, and how to do it.

~~~
steveklabnik
As the person who wrote those paragraphs, yes. The fundamental reason unsafe
exists is not for performance reasons. That being said, the parent is also
true that sometimes, unsafe is _used_ to enhance performance. But that's not
the usual case, and in fact, can even hurt performance in ways. For example,
&mut T can't alias, but *mut T can, and so &mut T can be optimized more
aggressively. (We recently turned these optimizations back on:
[https://github.com/rust-lang/rust/pull/50744](https://github.com/rust-lang/rust/pull/50744))
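
A small sketch of the aliasing difference described above. The function names `bump_both_refs` and `bump_both_raw` are made up for illustration. The two functions have identical bodies, but only the `&mut` version lets the compiler assume the two arguments never point to the same location.

```rust
/// With `&mut`, `a` and `b` are guaranteed not to alias, so the compiler is
/// allowed to assume the write to `*b` cannot change `*a`.
fn bump_both_refs(a: &mut i32, b: &mut i32) -> i32 {
    *a += 1;
    *b += 1;
    *a
}

/// Raw pointers carry no such guarantee: `a` and `b` may be the same pointer,
/// so `*a` has to be re-read after the write through `b`.
///
/// # Safety
/// Both pointers must be valid, aligned, and point to initialized `i32`s.
unsafe fn bump_both_raw(a: *mut i32, b: *mut i32) -> i32 {
    unsafe {
        *a += 1;
        *b += 1;
        *a
    }
}

fn main() {
    let mut x = 0;
    let mut y = 0;
    println!("{}", bump_both_refs(&mut x, &mut y)); // 1

    let mut z = 0;
    let p: *mut i32 = &mut z;
    // Passing the same raw pointer twice is allowed, and both writes land on `z`.
    let r = unsafe { bump_both_raw(p, p) };
    println!("{}", r); // 2

    // The equivalent call with references does not compile:
    // bump_both_refs(&mut z, &mut z); // error: second mutable borrow of `z`
}
```

Whether the optimizer actually exploits the no-alias guarantee depends on the compiler version and flags; the sketch only shows why the guarantee exists for `&mut T` and not for `*mut T`.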

------
stringer
How does it compare to dgraph's Badger library?

~~~
ihsw2
Considering this has not even exited alpha, it stands to reason that Badger is
a fair bit more robust in performance and failure modes.

~~~
krenoten
There is zero reason behind this assumption.

------
dingo_bat
Does rust compile reliably to embedded targets yet? Last time I checked there
were a lot of problems with armv5.

~~~
steveklabnik
I'm not sure about ARMv5, but v7 and v8-A are Tier 1 for Firefox, so we get a
pretty decent smoke test out of them. The thing about embedded is that it's
quite diverse, so speaking about it in broad terms is tough. It goes from "lol
nope" to "barely works" to "pretty decent" to "great", depending on which
thing you're talking about.

We have a whole working group this year working on embedded.

------
jstewartmobile
I'll take Richard Hipp's _C_ code over just about anyone's _Rust_ code any day
of the week. The man is a national treasure!

~~~
tzahola
But C is _very unsafe_! /s

~~~
jstewartmobile
sorry you got strikefrced, but love your other comments

------
tormeh
No clustering :( I feel like good multi-master asynchronous and synchronous
clustering is truly the frontier in DBs.

~~~
jacquesm
You can't cluster an embedded database, this comment makes no sense.

Compare with SQLite, not with Oracle or Postgres.

~~~
krenoten
Currently it's even more basic. The current usable parts are a pagecache
following the llama approach, some great testing utility libraries, and an
index (that you can use as a kv) that follows the bwtree approach. Later it
will have structured access support, but it needs some more db components to
get there. It is a construction kit as well as a kv.

~~~
Hello71
So... SQLite FoundationDB?

