
Dgraph chose Badger over RocksDB - biomcgary
https://blog.dgraph.io/post/badger-over-rocksdb-in-dgraph/
======
mrjn
Author of Badger here. Thanks for posting this!

We have built some pretty unique things into Badger, and this post talks about
those features and how they're being used within Dgraph
([https://dgraph.io](https://dgraph.io)). And as we build more features into
Dgraph, we'll try to offload as much of them as possible into Badger as well.

I'm around if people want to ask any questions about our journey with Badger,
how it works, etc.

------
tapirl
Does GC have a negative impact on the performance of Badger?

Does Badger support distributed storage natively?

~~~
mrjn
When GC is run, it does lookups in the LSM tree. We first do a small number of
lookups to sample the file -- if the sample indicates that the data can be
GCed (based on the discard ratio), only then does the GC go over the entire
file.
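
The sample-then-decide step can be sketched roughly like this (a toy
illustration, not Badger's actual implementation; `shouldGC`, `sampleSize`,
and the `stale` flag are stand-ins for Badger's internal LSM-tree lookups):

```go
package main

import "fmt"

// entry stands in for one record in a value log file.
type entry struct {
	key   string
	stale bool // true if the LSM tree no longer points at this version
}

// shouldGC samples the first sampleSize entries of a value log file and
// returns true only if the sampled fraction of stale data meets the
// discard ratio -- only then would a full pass over the file be worth it.
func shouldGC(file []entry, sampleSize int, discardRatio float64) bool {
	if sampleSize > len(file) {
		sampleSize = len(file)
	}
	if sampleSize == 0 {
		return false
	}
	stale := 0
	for _, e := range file[:sampleSize] {
		if e.stale { // in Badger this check is a lookup in the LSM tree
			stale++
		}
	}
	return float64(stale)/float64(sampleSize) >= discardRatio
}

func main() {
	file := []entry{
		{"a", true}, {"b", true}, {"c", false}, {"d", true}, {"e", false},
	}
	fmt.Println(shouldGC(file, 5, 0.5)) // 3 of 5 sampled entries stale → true
}
```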

In addition, Badger limits the number of key-value pairs per value log file --
by default 1M (ValueLogMaxEntries), which was a sweet spot we found.

You can run GC during periods of low activity. We run it periodically in
Dgraph and don't see any negative impact on performance. Again, this is
because the LSM tree is typically small, and memory mapping does a good job
of serving most lookups from RAM.
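
A common pattern is to keep invoking GC until a pass finds nothing worth
rewriting. Here is a minimal self-contained sketch of that loop (the
`errNoRewrite` sentinel and the `gc` callback are illustrative stubs; real
code would call `RunValueLogGC` on an open Badger DB, typically driven by a
`time.Ticker` during quiet periods):

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRewrite mimics the sentinel error a GC pass returns when no value
// log file had enough discardable data to be worth rewriting.
var errNoRewrite = errors.New("no value log file rewritten")

// gcUntilDone keeps invoking gc until a pass reclaims nothing, and
// returns the number of passes that actually rewrote a file.
func gcUntilDone(gc func(discardRatio float64) error, discardRatio float64) int {
	passes := 0
	for gc(discardRatio) == nil {
		passes++
	}
	return passes
}

func main() {
	// Simulate two value log files' worth of reclaimable garbage.
	filesWithGarbage := 2
	gc := func(float64) error {
		if filesWithGarbage > 0 {
			filesWithGarbage--
			return nil
		}
		return errNoRewrite
	}
	fmt.Println(gcUntilDone(gc, 0.5)) // → 2
}
```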

Badger is an embedded KV DB; it does not support distribution natively. All
distribution of data happens a layer above Badger. You could use either etcd
or TiKV for that.

