
Show HN: Badger – an embeddable, persistent key-value store written in Go - mrjn
https://github.com/dgraph-io/badger
======
skybrian
Nice project! What's new since the last time we saw this? [1]

[1]
[https://news.ycombinator.com/item?id=14335931](https://news.ycombinator.com/item?id=14335931)

~~~
slackingoff2017
New funding round

------
aphextron
>Keep it simple, stupid. No support for transactions, versioning or snapshots
-- anything that can be done outside of the store should be done outside.

Not sure how I feel about that philosophy. Transaction support is extremely
complex and should be left to the storage engine whenever possible.

What is the benefit of this over something like Berkeley DB?

~~~
mrjn
The reason we want to KISS is to ensure a solid, robust, and performant layer
of storage, that other projects can use to build their transactional versions,
etc.

One can take Badger and build a transactional layer above it, and it would
still be below the application layer.
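
For illustration, a minimal sketch in Go of what "a transactional layer above it" could look like. Everything here is hypothetical: `Store`, `memStore`, `TxnLayer` and `Txn` are made-up names, not Badger's API. The sketch only buffers writes and applies them under one lock at commit; it has no durability, no conflict detection and no isolation between readers, which is where the real complexity lives.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical minimal KV interface (NOT Badger's actual API).
type Store interface {
	Get(key string) (string, bool)
	Set(key, value string)
	Delete(key string)
}

// memStore is an in-memory stand-in for the underlying store.
type memStore struct{ m map[string]string }

func newMemStore() *memStore { return &memStore{m: map[string]string{}} }

func (s *memStore) Get(k string) (string, bool) { v, ok := s.m[k]; return v, ok }
func (s *memStore) Set(k, v string)             { s.m[k] = v }
func (s *memStore) Delete(k string)             { delete(s.m, k) }

// TxnLayer serializes all store access through a single mutex.
type TxnLayer struct {
	mu    sync.Mutex
	store Store
}

// write is one buffered mutation: either a new value or a delete marker.
type write struct {
	value   string
	deleted bool
}

// Txn buffers writes in memory until Commit is called.
type Txn struct {
	layer   *TxnLayer
	pending map[string]write
}

func (l *TxnLayer) Begin() *Txn { return &Txn{layer: l, pending: map[string]write{}} }

func (t *Txn) Set(k, v string) { t.pending[k] = write{value: v} }
func (t *Txn) Delete(k string) { t.pending[k] = write{deleted: true} }

// Get checks the transaction's own buffered writes before the store.
func (t *Txn) Get(k string) (string, bool) {
	if w, ok := t.pending[k]; ok {
		return w.value, !w.deleted
	}
	t.layer.mu.Lock()
	defer t.layer.mu.Unlock()
	return t.layer.store.Get(k)
}

// Commit applies every buffered write while holding the lock, so other
// users of the same TxnLayer never observe a half-applied batch.
func (t *Txn) Commit() {
	t.layer.mu.Lock()
	defer t.layer.mu.Unlock()
	for k, w := range t.pending {
		if w.deleted {
			t.layer.store.Delete(k)
		} else {
			t.layer.store.Set(k, w.value)
		}
	}
	t.pending = map[string]write{}
}

func main() {
	layer := &TxnLayer{store: newMemStore()}
	txn := layer.Begin()
	txn.Set("key", "value")
	_, visible := layer.store.Get("key")
	fmt.Println("visible before commit:", visible)
	txn.Commit()
	v, _ := layer.store.Get("key")
	fmt.Println("after commit:", v)
}
```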

~~~
maxpert
> The reason we want to KISS is to ensure a solid, robust, and performant
> layer of storage, that other projects can use to build their transactional
> versions, etc.

Then comparing it to RocksDB is unfair (in fact, the title of this post is
misleading). Stuff like transactions and MVCC is way more complicated than it
sounds.

~~~
mrjn
It's not. The comparison is using the same APIs as provided by
LevelDB/RocksDB. Not the transactional ones mentioned below.

    TransactionDB* txn_db;
    Status s = TransactionDB::Open(options, path, &txn_db);

    Transaction* txn = txn_db->BeginTransaction(write_options, txn_options);
    s = txn->Put("key", "value");
    s = txn->Delete("key2");
    s = txn->Merge("key3", "value");
    s = txn->Commit();
    delete txn;

------
Veratyr
Very interesting, though a shame (to me) that it's Go, as I'm actually after a
C++ KV store that ideally isn't RocksDB (I want something that's much easier
to embed and RocksDB has a bunch of dependencies).

Have you tried benchmarking on bare metal though? I've been shopping around for
a KV store and came across this benchmark:
[http://www.lmdb.tech/bench/ondisk/#sec7](http://www.lmdb.tech/bench/ondisk/#sec7)

Interestingly, for 768-byte values, LevelDB is the clear winner when it comes
to read scaling on a VM yet on bare metal it's drastically outperformed by
LMDB.

~~~
valarauca1
On bare metal you'll be hard pressed to beat LMDB.

It's really about as simple, and as bare bones as you can make a traditional
KV storage while still preserving MVCC, transactions, and maintaining ACID.

Furthermore, its design is perfectly suited to exploit the Optane/FRAM
non-volatile RAM devices that appear to be coming.

~~~
ntoshev
Apparently the way LMDB does I/O is exercising slower (more complex?) code
paths in the virtualization implementation code compared to LevelDB, and
virtualization overhead is significant in these benchmarks (actually slows
down execution several times).

LMDB does mmap while LevelDB does plain file I/O.

~~~
valarauca1
It depends on the hypervisor. Xen/AWS uses an older, slower path for memory
allocation than the modern KVM/Intel EPT approach allows.

Every `malloc` under Xen (that can't be satisfied with unused memory already
mapped into that process's userspace) requires a hypercall.

This VMM approach was _largely_ deprecated about 6 years ago when Intel
created page table bits that let the bare metal OS give the VM a _range_ of
memory it can virtually re-map, but AWS hasn't adopted this.

------
wolf550e
What do people typically store in KV stores that do not have a database built
on top of them? Terabytes of what? Accessed by what kind of application logic?

~~~
ereyes01
I currently use leveldb in my project as a way to handle an intermediate data
processing step that chews on much-larger-than-RAM datasets. And I didn't feel
like this processing step warranted the overhead of a full-blown database
sitting on another box, or even the same box. Works quite nicely thus far.

~~~
wolf550e
Would you have used SQLite if it had an LSM backend so the write performance
would be as good as leveldb?

~~~
ereyes01
I considered it, but for what I was doing, a K/V store that also has fast key
prefix lookup was perfect. Plus, yeah, what I'm doing is currently super
write-heavy.
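
For anyone wondering why prefix lookup is cheap in a store that keeps keys sorted (as leveldb does): it is a seek to the first key >= the prefix, then a forward scan until the prefix stops matching. A small Go sketch of the idea over a plain sorted slice; `prefixScan` and the sample keys are made up for illustration and this is not leveldb's iterator API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefixScan returns every key that starts with prefix.
// sortedKeys must already be in ascending order.
func prefixScan(sortedKeys []string, prefix string) []string {
	// Binary search for the first key >= prefix (the "seek").
	i := sort.SearchStrings(sortedKeys, prefix)
	var out []string
	// Matching keys are contiguous in sorted order, so scan until one fails.
	for ; i < len(sortedKeys) && strings.HasPrefix(sortedKeys[i], prefix); i++ {
		out = append(out, sortedKeys[i])
	}
	return out
}

func main() {
	keys := []string{"apple", "user:1", "user:2", "video:9"}
	fmt.Println(prefixScan(keys, "user:"))
}
```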

------
makmanalp
Link to WiscKey paper, discussion of tradeoffs made - awesome. DB vendors,
take a page out of their book please.

RocksDB does go to some lengths to reduce amplification on SSDs, though it
seems this design is taking that into consideration from the very beginning.

------
continuations
I remember last time you mentioned range query was very slow and you didn't
quite understand what was causing it. Any news on that end?

~~~
mrjn
Go is slow when doing a lot of random lookups. Each lookup into disk is a
blocking one, which causes a goroutine to block the OS thread. A new OS thread
then gets created after a time delay to schedule goroutines. This happens
over and over again.

You can follow the discussion here:
[http://bit.ly/2tb19eX](http://bit.ly/2tb19eX)

~~~
continuations
> Each lookup into disk is a blocking one, which causes a Goroutine to block
> the OS thread.

Can you have multiple OS threads created at the beginning so whenever an OS
thread is blocked the OS scheduler simply schedules another OS thread? That's
how most databases handle blocking disk IO, right?

Doing blocking disk IO has to be a common task. How do other golang
applications/databases handle this issue?

~~~
mrjn
> so whenever an OS thread is blocked the OS scheduler simply schedules
> another OS thread

Go does that even now. If a thread is blocked, it would create another thread,
but it does it with a small delay. I saw a visible increase in IOPS when
starting Go with more OS threads (using GOMAXPROCS), because Go doesn't need
to wait before having access to these threads, but on a longer running job
with enough initiation for OS threads, that benefit would be nullified.

Most LSM based KV stores are designed to avoid random lookups -- and Go DBs
tend to use RocksDB -- so it probably hasn't been such a big issue for them.

------
mrjn
Why is this marked as dup? This is the first time the Github link is on HN.

~~~
detaro
Because duplicate counting is not per URL, and the announcement blog post
(which has a link to the GitHub repo, and the same kind of information as is
on there) was already on HN.

------
amirouche
How does it compare with WiredTiger, which IIRC also uses LSM trees?

~~~
valarauca1
As it doesn't offer transactions or MVCC, you really shouldn't compare the
two.

~~~
fh973
Thanks for the comparison!

------
dylz
Who drew the logo?

