
Replacing Redis with BoltDB – A Pure Go Key/Value Store - tjcunliffe
https://www.specto.io/replacing-redis-with-boltdb-a-pure-go-keyvalue-store/
======
benbjohnson
BoltDB author here. I'm glad to see that Bolt is working well for the OP! It
seems like a great use case for Bolt (multi-core, high read volume, local
storage).

There seems to be a lot of focus on the performance improvements by
commenters. Using a local mmapped KV can remove a lot of network and
serialization overhead. However, I would caution others against thinking Bolt is
a silver bullet. There are a TON of use cases where Redis would utterly
destroy Bolt (e.g. high random write volume).

The two data stores have fundamentally different designs and purposes. For
example, Bolt provides MVCC with fully serializable transaction isolation. If
you need that then Redis isn't an option. If you don't need that then Bolt may
be overkill.

As always, YMMV and be sure to test out different options based on your
requirements.

~~~
andrewguenther
This post makes me so happy. The key statement is here:

> The two data stores have fundamentally different designs and purposes

With all the datastore hoopla nowadays, it is really nice to see an author
come out and say "this isn't for everyone" rather than "we're the fastest for
everyone, period" while throwing around a bunch of meaningless benchmarks.

------
pavlov
I have a habit of not including a database at all when starting out a server
app.

Instead I just use the language's native array and map types to store objects,
and write simple functions to persist the data into a transaction log (usually
a "newline-separated JSON objects" style file). On application startup, the
objects are reconstructed by reading back the log file.

This approach can go surprisingly far. Memory is cheap, and not having a
separate database behind a socket eliminates a whole class of potential issues
and bottlenecks. The log file is easy to back up. If it grows too big, you can
always compact it by dumping the in-memory structure.

(I guess I should add a disclaimer about "toy projects" and "serious
production environments", because some people seem to get oddly upset about
the notion of not having a "real" database for a web app...)

~~~
eloff
You can get surprisingly far with that approach, but as a database developer,
I should warn you that something that "works" is not the same as something
that works under rare failure conditions. See SQLite's famous document[1] to
get an idea of how things can go wrong. It's not nearly as simple as it might
seem. [1]
[https://www.sqlite.org/howtocorrupt.html](https://www.sqlite.org/howtocorrupt.html)

~~~
pavlov
Absolutely... But if I go with random database X because people are raving
about it on HN, do I really have a better idea of how it works under rare
failure conditions? Or how to fix those issues?

When it's my code and the persistence layer fits in a few screenfuls in a text
editor, at least I know where to place the blame. Debugging database trouble
is maybe the worst kind of work I can imagine.

~~~
eloff
I never recommend going with random database X. Use something that's been
around a while and has lots of users; then at least you know that you're
statistically unlikely to experience a failure mode that nobody else has.

Rolling your own on-disk persistence and recovery is like rolling your own
crypto. What you don't know can definitely hurt you, you'd better be an
expert, and even then you won't likely get it right the first time around.
It's just a really hard problem involving complex interactions between your
code, the OS, the filesystem, the disk drivers, and your disks, all of which
can cause data loss in very unexpected ways.

Your method can work, but it almost certainly has issues lurking that can
cause data loss. You might be ok with that because it's a toy project, but
it's obviously not a good idea for important data.

------
sylvinus
This is not a useful article for many reasons:

\- Redis has no dependencies either

\- "weird responses" from Redis? Do you change your tools the minute they
don't behave the way you expect, without investigation?

\- The title "Picking the right tool for the job" is almost the opposite of
what's done here. It reads more like "Stumbling on cool Go software and
picking what job it can do for me"

\- 400 (or 850) requests/second are ridiculously low numbers for an in-memory
key/value store. Redis is capable of doing 100x that on small/medium machines.

~~~
justinholmes
Yeah, with redis-benchmark on a small machine I get:

SET: 109051.26 requests per second
GET: 109170.30 requests per second

with 32-byte values.

------
AYBABTME
Redis is a data structure server, BoltDB is a storage engine. They're not
really comparable. If you know you need a storage engine, then using BoltDB
might make sense. But if you're not sure why you'd use BoltDB over Redis (aside
from performance and having heard of it), then maybe you didn't explore the
question enough. You can't just replace Redis with BoltDB and call it a day,
at least not if you want to have some sort of availability and more than 1
server in your service.

Also, using Redis as a key-value store for its persistence isn't really a
great idea.

~~~
diogofranco
"Also, using Redis as a key-value store for its persistence isn't really a
great idea."

Could you please elaborate on this point? I'm thinking of using Redis
precisely in that manner and would love to know about the drawbacks.

~~~
mvitorino
Redis is missing the D in ACID, meaning a successfully executed command is not
guaranteed to persist to disk, since Redis only flushes its data to disk (if
you do have persistence enabled) at configurable intervals.

So when your server dies, you lose the changes since the last flush. I would
not use it as primary storage for data you cannot afford to lose.

Have a look at this:
[http://redis.io/topics/persistence](http://redis.io/topics/persistence)

Edit: you can configure redis to flush on every command that changes
data...but you probably wouldn't want to use redis that way :)

~~~
dgreensp
This is acceptable for a wide range of apps. EtherPad, for example (which used
MySQL), only flushed every second or two. If the database crashed, then yes,
you'd lose the last couple keystrokes, but that was so extremely rare it
didn't even register as a factor in the overall UX.

Edit: Also, you probably have bigger problems if your database goes down or
your cluster falls off the map somehow.

~~~
mvitorino
Absolutely, but people should still be aware of the compromises and not
confuse categories.

Very few apps can make do with only a key-value store and if you have to throw
a relational db in the mix, why introduce more moving parts unless absolutely
necessary? ACID, referential integrity, SQL. Sometimes it feels people are
willing to ditch all this for very little gain.

Let's face it, how many apps out there can't really run a beefy relational DB
with potentially an absurd amount of RAM?

------
nemo1618
We use BoltDB in production and it has been perfect for our use case. The Bolt
source code is clean and is obviously written by people who Know What They're
Doing. We discovered a perfect-storm bug due to some extreme demands placed on
the DB, and benbjohnson helped us track down the problem and quickly provided
a patch. Thanks Ben!

The bug in question:
[https://github.com/boltdb/bolt/pull/452](https://github.com/boltdb/bolt/pull/452)

------
brudgers
_BoltDB can be imported as a library and can persist data in a file so you can
use it as an embedded database – meaning you don’t need a separate data
store._

Serious question: is BoltDB something like a key/value-store equivalent
(congruent?) to SQLite?

~~~
AYBABTME
Yeah that's about right. BoltDB is a storage engine that exposes a sorted key-
value store with ACID properties.

~~~
rakoo
More than just a key-value store, it also has buckets, so you can nest keys in
other keys (and build your own tree), and it has excellent support for
iterating over keys. It's a really nice engine to play with.

------
etaty
As the writer of Rediscala, I can tell you that a good Redis client library
should not be the bottleneck of your app!

The Redis Go client probably had some design issue.

~~~
elithrar
The author was using redigo
([https://github.com/garyburd/redigo](https://github.com/garyburd/redigo)),
which by my analysis is fairly performant.

One likely cause of issues is file descriptor limits or TCP tuning, which you
run into when stress testing two dependent networking apps on the same
server. BoltDB, being embedded, just works out of the box.

------
pbnjay
I hope the improvement isn't too surprising, you're basically dropping all the
intermediate serialization overhead and command parsing.

------
markbnj
If you can live with an in-memory store (with or without a roll-your-own
persistence model) then there are all sorts of lighter-weight alternatives to
redis. It's when you need to share the data with multiple processes that tools
like redis or etcd really shine.

------
tracker1
Related question: while looking at BoltDB, I saw LedisDB[1]. Has anyone used
it, and how does it perform compared to Redis? I mostly work with Node/web
applications, so in-process memory is fairly limited straight away. If the
persistence guarantees of something like Ledis on top of BoltDB are better
than Redis's, it may be an option for a primary application data store.

On another note, really need to find time/excuse to play with go... It's been
on my radar for quite a while now... but the use of JS all around has been
very nice for my productivity.

------
the_mitsuhiko
> Removing the dependency is just one part of the story. Another part is that
> requests per second under heavy load increased from ~400 to ~850! Also, all
> those weird errors were gone – using BoltDB with 10k concurrent users we get
> zero errors.

Not exactly surprising. Redis is single-threaded, so you will need to spawn
multiple instances if you want to use all cores. Which is what I assume is
happening here.

~~~
antirez
I'm not sure what the user did with Redis, but it's unlikely that you can
saturate a core with, like, 1000 requests/sec unless you do complex/slow
operations. But apparently the OP was just doing simple ops, because otherwise
the switch to BoltDB would have been impossible or harder. So I guess there is
something to dig into to understand what was happening. The most likely cause
is the execution of many fast commands in Redis without a pipeline, so that
all the time was burned in RTTs.

EDIT: Anyway, by my programming philosophy, what the OP did, regardless of
performance, totally makes sense. I think being easy to use and deploy is a
_key value_ in software ;-)

~~~
ploxiln
There can often be issues with managing pools of TCP connections in clients:
locking around accessing the pool, corner cases in state/lifetime of a
connection in the pool, etc. In my experience with other databases and
languages anyway.
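A minimal sketch of a client-side pool that sidesteps two of those pitfalls: a buffered channel provides the locking, and a connection whose state is uncertain (say, an error mid-reply) is dropped instead of being returned for reuse. Conn, Pool and the Broken flag are hypothetical stand-ins rather than any real client's types; a real pool would also close discarded sockets and bound the total number of open connections.

```go
package main

import "fmt"

// Conn stands in for a network connection. Broken marks a connection whose
// protocol state is unknown, e.g. after an error while reading a reply.
type Conn struct {
	ID     int
	Broken bool
}

// Pool keeps at most cap(idle) idle connections; the buffered channel
// replaces an explicit mutex around the idle list.
type Pool struct {
	idle chan *Conn
	dial func() (*Conn, error)
}

func NewPool(size int, dial func() (*Conn, error)) *Pool {
	return &Pool{idle: make(chan *Conn, size), dial: dial}
}

// Get reuses an idle connection if one is available, otherwise dials a new one.
func (p *Pool) Get() (*Conn, error) {
	select {
	case c := <-p.idle:
		return c, nil
	default:
		return p.dial()
	}
}

// Put returns c for reuse unless it is broken or the pool is already full.
func (p *Pool) Put(c *Conn) {
	if c.Broken {
		return // never hand out a connection with a half-read reply
	}
	select {
	case p.idle <- c:
	default:
		// pool full: drop the connection
	}
}

func main() {
	dials := 0
	p := NewPool(2, func() (*Conn, error) { dials++; return &Conn{ID: dials}, nil })
	c, _ := p.Get()
	p.Put(c)
	c, _ = p.Get()
	fmt.Println(c.ID, dials) // 1 1: the idle connection was reused
}
```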

------
Lobster101
The OP says about Redis "I was happy with current performance". His problem
with Redis was that it introduced dependencies on a separate process with all
that entails in terms of deployment, distribution, etc.

------
tmaly
I use BoltDB and Redis in my current project. BoltDB made sense for the data
that rarely changed and was more read-heavy.

