
Badger: A fast key-value store written purely in Golang (2017) - fanf2
https://blog.dgraph.io/post/badger/
======
mrjn
(Author of Badger here)

First of all, thanks to the growing Go community for using Badger. Within a
short period of time since the original announcement [-1], Badger has become
very popular.

A lot has changed since the original post. Despite being tagged as a non-goal
at the time of release, Badger now runs concurrent ACID transactions providing
serializable snapshot isolation guarantees.
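For the curious, here is roughly what that looks like from the client side. This is a sketch assuming a recent (v1.6+) API where `DefaultOptions` takes a directory; the path is illustrative:

```go
package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	// Open (or create) a Badger database in the given directory.
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Update runs a read-write transaction; conflicting concurrent
	// transactions are detected and aborted (SSI).
	err = db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("answer"), []byte("42"))
	})
	if err != nil {
		log.Fatal(err)
	}

	// View runs a read-only transaction against a consistent snapshot.
	err = db.View(func(txn *badger.Txn) error {
		item, err := txn.Get([]byte("answer"))
		if err != nil {
			return err
		}
		return item.Value(func(val []byte) error {
			fmt.Printf("answer=%s\n", val)
			return nil
		})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```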

In fact, Badger is being used to serve petabytes of data (yes, PB, not TB);
writing a blog post about that is on my long TODO list. It has stood the test
of time, being directly used by the community and by Dgraph [0]. We have put it
through many different crash testing scenarios, including ALICE[1], and run
Jepsen style bank transaction tests every night for 8h [2].

There are some common questions about the benchmarks on our blog posts. They
were done before Badger had transactions and are somewhat old by now, but
Badger's read-write performance is still spectacular. In terms of write
throughput, it outperforms RocksDB (called from Go), LevelDB, and BoltDB; in
terms of read performance, it goes neck and neck with B+ tree based DBs, which
typically perform better than LSM trees. Recently, there was a benchmark by the
Ethereum 2.0 folks, which I ran [3]. You can see the asciinema recording here [4].

There's an open bounty of $1337 on Badger to help us find scenarios where
Badger could lose data [5]. So, if you've got ideas, throw them in!

Thanks again for using Badger!

[-1]: Well, the original link here on HN
[0]: https://dgraph.io
[1]: https://blog.dgraph.io/post/alice/
[2]: https://github.com/dgraph-io/badger/commit/c10276c9d3b0c42744ed41b9177f4a9f021d8e08#diff-a3128b018385dad1c7c20cb322b8267b
[3]: https://github.com/rawfalafel/db-benchmarks/pull/1
[4]: https://asciinema.org/a/Qgyf8C50YRbtH7WpixSaBKDUQ
[5]: https://github.com/dgraph-io/badger/issues/601

------
majidazimi
I've read the WiscKey paper.

1. Wouldn't it be simpler to just store the key and a value pointer (offset)
in RocksDB and put the values in the value log? Then the whole database just
becomes a simple wrapper on top of RocksDB (or whatever K/V database) with a
bunch of files as the value log.

2. Why 16 bytes for the value pointer? 8 bytes is sufficient.

3. If the LSM tree is only around 5GB, screw the LSM tree. I would rather use
a simple immutable hash table, which gives you a consistent snapshot out of
the box for persisting to disk. Every 15 minutes, launch a thread to serialize
the index to disk.
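A minimal stdlib sketch of that suggestion (names and the file path are mine; the copy-then-encode step stands in for the immutability the parent comment relies on, and real code would need locking or a persistent data structure):

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
)

// snapshot serializes a copy of the index to disk. Copying first is a
// stand-in for the "consistent snapshot" an immutable hash table gives
// you for free: the encoder sees a frozen view of the data.
func snapshot(index map[string][]byte, path string) error {
	frozen := make(map[string][]byte, len(index))
	for k, v := range index {
		frozen[k] = v
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(frozen)
}

// load reads a previously written snapshot back into memory.
func load(path string) (map[string][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var m map[string][]byte
	err = gob.NewDecoder(f).Decode(&m)
	return m, err
}

func main() {
	idx := map[string][]byte{"k1": []byte("v1")}
	if err := snapshot(idx, "/tmp/index.gob"); err != nil {
		panic(err)
	}
	m, err := load("/tmp/index.gob")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(m["k1"]))
}
```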

Am I missing something, or is there more detail missing? I mean, the reasoning
for writing a whole database from scratch isn't convincing.

~~~
tyingq
1. They start off with why they don't want to use Cgo. I'm not a Go expert, so
I don't have an opinion on whether that makes sense or not. If the goal is to
avoid Cgo, they can't use RocksDB.

2. It appears to be 12 bytes. ID, length, offset, all as 4-byte uint32 values:
https://github.com/dgraph-io/badger/blob/5242a997f5101ee00c433d5fc96432ec8f111ef5/structs.go#L12

~~~
majidazimi
Then any other database written in pure Go would be sufficient to keep the
index.

~~~
tyingq
Sure. Searching a bit, they did benchmark against BoltDB, but didn't try what
you're suggesting. The benchmark I found is also prior to them adding
transactions to Badger.

Edit: Actually, the benchmarks turn off sync writes, which makes them silly to
me. If you're okay with losing data, you probably wouldn't have chosen a
disk-based KV store:
https://github.com/dgraph-io/badger-bench/blob/master/rw-bench/bench.go#L76

The default install has sync writes turned on, so why benchmark it
differently?
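The gap between the two modes is essentially the cost of an fsync per write. A stdlib-only sketch of what a SyncWrites-style flag toggles, conceptually (paths and record format are illustrative, not Badger's):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// writeN appends n small records to path, calling Sync (fsync) after
// each write when sync is true, roughly what a sync-writes store does.
func writeN(path string, n int, sync bool) (time.Duration, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	start := time.Now()
	rec := []byte("key=value\n")
	for i := 0; i < n; i++ {
		if _, err := f.Write(rec); err != nil {
			return 0, err
		}
		if sync {
			// Durable after every record, but each write now waits
			// on the disk instead of just the page cache.
			if err := f.Sync(); err != nil {
				return 0, err
			}
		}
	}
	return time.Since(start), nil
}

func main() {
	fast, _ := writeN("/tmp/nosync.log", 100, false)
	slow, _ := writeN("/tmp/sync.log", 100, true)
	fmt.Printf("no sync: %v, sync: %v\n", fast, slow)
}
```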

~~~
mrjn
As mentioned in my top comment, we have run the benchmarks again. SyncWrites
on or off, Badger outperforms BoltDB. There's a nice asciinema link you can
watch, which is a recording of me running the Ethereum 2.0 store benchmarks.

------
maxpert
I've personally made contributions to the project once or twice. I think it's
a fairly young project, so far being used as part of a bigger graph database.
I went back to BoltDB for its stability and decent performance. I think
everybody has their own scale for measuring fast vs. stable; it was fast, but
not stable enough for my scale.

~~~
mrjn
Based on the commit you made, it looks like there was a build breakage on a
32-bit system. We've resolved all known issues with 32-bit systems. Let us
know if you encounter more.

------
auxym
Can someone clarify for me what the use case is for these key-value stores
that keep getting posted around HN recently?

~~~
tyingq
They are often used as the base storage layer for more complex things.

CockroachDB has RocksDB at the bottom:
https://www.cockroachlabs.com/docs/stable/architecture/storage-layer.html

Etcd uses a fork of BoltDB: https://github.com/etcd-io/bbolt

~~~
billsmithaustin
Indeed uses an LSM tree based key-value store for its data about job postings:
https://engineering.indeedblog.com/blog/2013/10/serving-over-1-billion-documents-per-day-with-docstore-v2/

------
folex
Is it possible to make Badger completely in-memory? I.e., disable persistence.

~~~
amelius
You might as well use a simple hash-table then.

It seems that most of the design effort of this project was focused on fast
disk access.

~~~
grumpydba
Fast disk access on Linux means async IO + direct IO, which Badger does not
use.

Files are essentially mmap'd.

Regarding the hash table suggestion, you would lose compression/compaction and
indexing/prefix scans.

~~~
mrjn
Mmapping is optional; Badger doesn't require it, it just gives you that
option. Async IO in Go has been debated vigorously over the years, but there's
little benefit in building such a library, because goroutines do an equivalent
job. That's what we do: random reads (for value log access) are spread over
many goroutines.

~~~
grumpydba
My bad, I just glanced at the code. Are you using O_DIRECT?

Goroutines do an equivalent job, but a write is synchronous and thus blocks
one thread of the Go scheduler, as far as I can tell.

Also, the benefit of the async IO API is that it allows sending multiple
requests in one call. Syscalls in Go do have an overhead, which can be
mitigated this way, I guess.

------
herf
Would definitely like to see the GC writeup. Are there patterns where
replacing or expanding values uses lots of storage, or where things slow down
waiting for GC?

~~~
mrjn
Lack of value log GC won't affect performance directly, but every write
appends to the value log, increasing its size. So, GC is needed to deal with
the growing size.
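In practice that means periodically asking Badger to collect the value log yourself. A rough sketch against the v1 API (the path and interval are illustrative):

```go
package main

import (
	"log"
	"time"

	badger "github.com/dgraph-io/badger"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Every 10 minutes, rewrite value-log files whose live data has
	// dropped below 50%. A non-nil error (badger.ErrNoRewrite) just
	// means there was nothing worth collecting this round.
	ticker := time.NewTicker(10 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		for db.RunValueLogGC(0.5) == nil {
			// A file was reclaimed; immediately try for another.
		}
	}
}
```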

