Replacing Redis with BoltDB – A Pure Go Key/Value Store (specto.io)
151 points by tjcunliffe on Jan 11, 2016 | 48 comments


BoltDB author here. I'm glad to see that Bolt is working well for the OP! It seems like a great use case for Bolt (multi-core, high read volume, local storage).

There seems to be a lot of focus from commenters on the performance improvements. Using a local mmapped KV store can remove a lot of network and serialization overhead; however, I would caution others against thinking Bolt is a silver bullet. There are a TON of use cases where Redis would utterly destroy Bolt (e.g. high random write volume).

The two data stores have fundamentally different designs and purposes. For example, Bolt provides MVCC with fully serializable transaction isolation. If you need that then Redis isn't an option. If you don't need that then Bolt may be overkill.

As always, YMMV and be sure to test out different options based on your requirements.


This post makes me so happy. The key statement is here:

> The two data stores have fundamentally different designs and purposes

With all the datastore hoopla nowadays, it is really nice to see an author come out and say "this isn't for everyone" rather than claiming "we're the fastest for everyone, period" and throwing around a bunch of meaningless benchmarks.


Hello Ben, thanks for the great tool.

If Bolt supports MVCC using a page log, why does Redis destroy Bolt on random write volume?

Also, I'm seeing wildly different write performance measures. Siddontang on #237 back in 2014 for Ledis suggests 970 Sets per second. This HN article mentions 800 requests per second. Running "bolt bench --count 10000000 --batch-size 10000 --write-mode rnd" gets me 29203 ops per second. Can you shed some light on what random write performance I can expect?


I have a habit of not including a database at all when starting out a server app.

Instead I just use the language's native array and map types to store objects, and write simple functions to persist the data into a transaction log (usually a "newline-separated JSON objects" style file). On application startup, the objects are reconstructed by reading back the log file.

This approach can go surprisingly far. Memory is cheap, and not having a separate database behind a socket eliminates a whole class of potential issues and bottlenecks. The log file is easy to back up. If it grows too big, you can always compact it by dumping the in-memory structure.
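A minimal sketch of what I mean, in Go (names and structure are just illustrative, not from any particular project): an in-memory map guarded by a mutex, every mutation appended as one JSON line to the log, and the map rebuilt by replaying the log at startup.

  package main

  import (
      "bufio"
      "encoding/json"
      "os"
      "sync"
  )

  // entry is one line in the log: a set or delete of a single key.
  type entry struct {
      Op    string `json:"op"` // "set" or "del"
      Key   string `json:"key"`
      Value string `json:"value,omitempty"`
  }

  type store struct {
      mu   sync.Mutex
      data map[string]string
      log  *os.File
  }

  func open(path string) (*store, error) {
      f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_APPEND, 0644)
      if err != nil {
          return nil, err
      }
      s := &store{data: make(map[string]string), log: f}
      // Replay the log to rebuild the in-memory state.
      sc := bufio.NewScanner(f)
      for sc.Scan() {
          var e entry
          if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
              return nil, err // a torn final line needs more care than this
          }
          if e.Op == "del" {
              delete(s.data, e.Key)
          } else {
              s.data[e.Key] = e.Value
          }
      }
      return s, sc.Err()
  }

  func (s *store) Set(key, value string) error {
      s.mu.Lock()
      defer s.mu.Unlock()
      line, _ := json.Marshal(entry{Op: "set", Key: key, Value: value})
      if _, err := s.log.Write(append(line, '\n')); err != nil {
          return err
      }
      s.data[key] = value
      return nil
  }

  func (s *store) Get(key string) (string, bool) {
      s.mu.Lock()
      defer s.mu.Unlock()
      v, ok := s.data[key]
      return v, ok
  }

  func main() {
      s, err := open("data.log")
      if err != nil {
          panic(err)
      }
      s.Set("greeting", "hello")
      v, _ := s.Get("greeting")
      println(v)
  }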

(I guess I should add a disclaimer about "toy projects" and "serious production environments", because some people seem to get oddly upset about the notion of not having a "real" database for a web app...)


You can get surprisingly far with that approach, but as a database developer, I should warn you that something that "works" is not the same as something that works under rare failure conditions. See SQLite's famous document[1] to get an idea of how things can go wrong. It's not nearly as simple as it might seem. [1] https://www.sqlite.org/howtocorrupt.html


Absolutely... But if I go with random database X because people are raving about it on HN, do I really have a better idea of how it works under rare failure conditions? Or how to fix those issues?

When it's my code and the persistence layer fits in a few screenfuls in a text editor, at least I know where to place the blame. Debugging database trouble is maybe the worst kind of work I can imagine.


I never recommend going with random database X. Use something that's been around a while and has lots of users; then at least you know you're statistically unlikely to experience a failure mode that nobody else has hit.

Rolling your own on-disk persistence and recovery is like rolling your own crypto: what you don't know can definitely hurt you, you'd better be an expert, and even then you likely won't get it right the first time around. It's just a really hard problem involving complex interactions between your code, the OS, the filesystem, the disk drivers, and your disks, all of which can cause data loss in very unexpected ways.

Your method can work, but it almost certainly has issues lurking that can cause data loss. You might be ok with that because it's a toy project, but it's obviously not a good idea for important data.


I also used this approach for many of my prototypes - persistence layer as a future feature. Then I found kdb+/q and realized that with the database built into the language, there was no need for separate infrastructure.

Here is an MMO in kdb+/q that fits on a single Vim screen: https://github.com/srpeck/kchess


I like approaches like this too. Largely because it tends to make you think about why you need consistency, more durability, better query abilities etc. for specific subsets of the data if/when it comes up, and what the specific requirements are.

When people throw everything into a database, in my experience the specific requirements for specific subsets of the data are often lost in the fog of history as people often don't document why they need to be in the database.

Then it turns out the database is used for everything from a short term cache which could just as well have been in memory, to logs which are of limited utility over long term and "just" need to be indexed for short term querying, to vital financial transactional data which needs to be kept for X number of years, to per-user data which is never, ever looked up other than when that specific user is logged in, and so would be ideal to shard, and so on.

Grabbing for the RDBMS right away is then the lazy way out, where you can often get away with not seriously evaluating what is best, because it is "good enough". So it's nice to start out with something sufficiently limited that you're forced to take those requirements into consideration sooner rather than later.


I also tend to use a cheap and easy solution to start personal projects and then migrate to something else later. I generally use SQLite as the database, so I can just copy the file directly; with maybe two users on the server, it's not going to make a big difference.


I do this as well. It helps get the API right very early on. When building services at work, we've paired this with Swagger (using compojure-api in Clojure) to get clickable API consoles. The end result is fast iteration on an API contract (between us and our consumers), and once we have locked down that contract, we can focus on rewriting the data persistence namespace however is best suited. Often this involves nothing more than adding a line to our DI system to get a database handle into our HTTP requests and into any background worker threads (plus the obvious writing of db-specific routines).


I do this, but I only load initial data, so every time the app is reloaded it's reset back to its original state.

This really helps with testing, because you have your whole initial dataset available without needing to write it after the fact, and a database connection was never an assumed fact.


This is not a useful article for many reasons:

- Redis has no dependencies either

- "weird responses" from Redis? Do you change your tools the minute they don't behave the way you expect, without investigation?

- The title "Picking the right tool for the job" is almost the opposite of what's done here. It reads more like "Stumbling on cool Go software and picking what job it can do for me"

- 400 (or 850) requests/second are ridiculously low numbers for an in-memory key/value store. Redis is capable of doing 100x that on small/medium machines.


Yeah, here's what I get with redis-benchmark on a small machine, with a 32-byte payload:

  SET: 109051.26 requests per second
  GET: 109170.30 requests per second


Redis is a data structure server, BoltDB is a storage engine. They're not really comparable. If you know you need a storage engine, then using BoltDB might make sense. But if you're not sure why you'd use BoltDB over Redis (aside from performance and having heard of it), then maybe you haven't explored the question enough. You can't just replace Redis with BoltDB and call it a day, at least not if you want some sort of availability and more than one server in your service.

Also, using Redis as a key-value store for its persistence isn't really a great idea.


"Also, using Redis as a key-value store for its persistence isn't really a great idea."

Could you please elaborate on this point? I'm thinking of using Redis precisely in that manner and would love to know about the drawbacks.


Redis is missing the D in ACID, meaning a successfully executed command is not guaranteed to be persisted to disk, since Redis only flushes its data to disk (if you have persistence enabled at all) at configurable intervals.

So when your server dies, you lose the changes since the last flush. I would not use it as primary storage for data you cannot afford to lose.

Have a look at this: http://redis.io/topics/persistence

Edit: you can configure redis to flush on every command that changes data...but you probably wouldn't want to use redis that way :)
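For reference, the relevant knobs live in redis.conf. A rough sketch of the two persistence modes (values are just illustrative; see the persistence doc above for what fits your workload):

  # RDB snapshots: dump the whole dataset if at least 1 key changed
  # in the last 900 seconds (and so on for the other rules)
  save 900 1
  save 300 10
  save 60 10000

  # AOF: append every write command to a log
  appendonly yes
  # fsync policy: "always" (durable but slow), "everysec" (lose at most
  # ~1 second of writes on a crash), or "no" (leave it to the OS)
  appendfsync everysec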


This is acceptable for a wide range of apps. EtherPad, for example (which used MySQL) only flushed every second or two. If the database crashed, then yes, you'd lose the last couple keystrokes, but that was so extremely rare it didn't even register as a factor in the overall UX.

Edit: Also, you probably have bigger problems if your database goes down or your cluster falls off the map somehow.


Absolutely, but people should still be aware of the compromises and not confuse categories.

Very few apps can make do with only a key-value store, and if you have to throw a relational DB into the mix anyway, why introduce more moving parts unless absolutely necessary? ACID, referential integrity, SQL: sometimes it feels like people are willing to ditch all of this for very little gain.

Let's face it, how many apps out there really couldn't run on a beefy relational DB with a potentially absurd amount of RAM?


I'd also add that (unless you have a very good reason not to) it's probably a good idea to always add a TTL (expiration date) to any key you store in Redis to enforce the "not primary storage for data you cannot afford to lose" way of thinking.
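For example, with redigo (the Go client mentioned elsewhere in this thread), that's just a matter of passing an expiry with the write. A rough sketch (key name and TTL are made up):

  package main

  import (
      "log"

      "github.com/garyburd/redigo/redis"
  )

  func main() {
      c, err := redis.Dial("tcp", "localhost:6379")
      if err != nil {
          log.Fatal(err)
      }
      defer c.Close()

      // SET with a 1-hour TTL; EXPIRE works the same way for existing keys.
      if _, err := c.Do("SET", "session:abc123", "some payload", "EX", 3600); err != nil {
          log.Fatal(err)
      }
  }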


A good rule of thumb is to only use Redis as an ephemeral storage for data that you can reconstruct if you lose it, or that you can otherwise afford to lose.


If all you need is GET and SET, you're probably good with either. Not all defining tradeoffs are relevant.


If you don't need your storage layer to be external to your service (Redis) and can do with only local storage (BoltDB), then you should totally do without the external storage. It's one less piece to maintain, monitor, alert on, and deal with when it breaks.


We use BoltDB in production and it has been perfect for our use case. The Bolt source code is clean and is obviously written by people who Know What They're Doing. We discovered a perfect-storm bug due to some extreme demands placed on the DB, and benbjohnson helped us track down the problem and quickly provided a patch. Thanks Ben!

The bug in question: https://github.com/boltdb/bolt/pull/452


BoltDB can be imported as a library and can persist data in a file so you can use it as an embedded database – meaning you don’t need a separate data store.

Serious question: is BoltDB something like a key/value-store equivalent (congruent?) of SQLite?


Yes. Your comparison to SQLite is probably more correct than the original author's comparison to Redis.


Yeah that's about right. BoltDB is a storage engine that exposes a sorted key-value store with ACID properties.
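For illustration, the basic embedded usage looks roughly like this (the file, bucket, and key names are placeholders I made up):

  package main

  import (
      "fmt"
      "log"

      "github.com/boltdb/bolt"
  )

  func main() {
      // The whole database is a single file; there is no server process.
      db, err := bolt.Open("app.db", 0600, nil)
      if err != nil {
          log.Fatal(err)
      }
      defer db.Close()

      // Writes happen inside a read-write transaction.
      err = db.Update(func(tx *bolt.Tx) error {
          b, err := tx.CreateBucketIfNotExists([]byte("settings"))
          if err != nil {
              return err
          }
          return b.Put([]byte("theme"), []byte("dark"))
      })
      if err != nil {
          log.Fatal(err)
      }

      // Reads happen inside a read-only transaction.
      db.View(func(tx *bolt.Tx) error {
          fmt.Printf("theme = %s\n", tx.Bucket([]byte("settings")).Get([]byte("theme")))
          return nil
      })
  }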


More than just key/value, it also has buckets, so you can nest keys under other keys (and build your own tree), and it has excellent support for iterating over keys. It's a really nice engine to play with.
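A rough sketch of the nesting and iteration, assuming a *bolt.DB opened like in the snippet a couple of comments up (bucket and key names are made up):

  err := db.Update(func(tx *bolt.Tx) error {
      users, err := tx.CreateBucketIfNotExists([]byte("users"))
      if err != nil {
          return err
      }
      // Buckets can contain other buckets, so you can build your own tree.
      alice, err := users.CreateBucketIfNotExists([]byte("alice"))
      if err != nil {
          return err
      }
      return alice.Put([]byte("email"), []byte("alice@example.com"))
  })
  if err != nil {
      log.Fatal(err)
  }

  // Keys come back in byte-sorted order, so cursors give cheap ordered scans.
  db.View(func(tx *bolt.Tx) error {
      c := tx.Bucket([]byte("users")).Bucket([]byte("alice")).Cursor()
      for k, v := c.First(); k != nil; k, v = c.Next() {
          fmt.Printf("%s = %s\n", k, v)
      }
      return nil
  })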


yes, it is :)


As the author of Rediscala, I can say that a good Redis client library should not be the bottleneck of your app!

The Go Redis client probably had some design issue.


The author was using redigo (https://github.com/garyburd/redigo), which by my analysis is fairly performant.

One likely cause of issues is file descriptor limits or TCP tuning problems that you run into when stress testing two interdependent networking apps on the same server. BoltDB, being embedded, just works out of the box.


I hope the improvement isn't too surprising: you're basically dropping all the intermediate serialization overhead and command parsing.


If you can live with an in-memory store (with or without a roll-your-own persistence model) then there are all sorts of lighter-weight alternatives to redis. It's when you need to share the data with multiple processes that tools like redis or etcd really shine.


Related question: looking at BoltDB, I saw LedisDB[1]; just curious whether anyone has used it and how it performs compared to Redis? I mostly work with Node/web applications, so in-process memory is fairly limited straight away. If the persistence guarantees of something like Ledis backed by BoltDB are better than Redis's, it may be an option for a primary application data store.

On another note, really need to find time/excuse to play with go... It's been on my radar for quite a while now... but the use of JS all around has been very nice for my productivity.


> Removing the dependency is just one part of the story. Another part is that requests per second under heavy load increased from ~400 to ~850! Also, all those weird errors were gone – using BoltDB with 10k concurrent users we get zero errors.

Not exactly surprising. Redis is single-threaded, so you need to spawn multiple instances if you want to saturate all cores. Which is what I assume is happening here.


I'm not sure what the user did with Redis, but it's unlikely that you can saturate a core with, like, 1000 requests/sec unless you do complex/slow operations. But apparently the OP was just doing simple ops, because otherwise the switch to BoltDB would have been impossible or harder. So I guess there is something to dig into to understand what was happening. The most likely cause is the execution of many fast commands in Redis without a pipeline, so that all the time was burned in RTTs.

EDIT: Anyway, for my programming philosophy, what the OP did totally makes sense regardless of performance. I think easy to use and deploy is a key value in software ;-)
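For reference, pipelining with redigo looks roughly like this (connection setup omitted, key names made up): instead of paying one round trip per command, you queue Sends, Flush once, then read the replies.

  // Without a pipeline: one network round trip per SET.
  for i := 0; i < 1000; i++ {
      if _, err := c.Do("SET", fmt.Sprintf("key:%d", i), "value"); err != nil {
          log.Fatal(err)
      }
  }

  // With a pipeline: queue commands locally, flush once, read all replies.
  for i := 0; i < 1000; i++ {
      c.Send("SET", fmt.Sprintf("key:%d", i), "value")
  }
  if err := c.Flush(); err != nil {
      log.Fatal(err)
  }
  for i := 0; i < 1000; i++ {
      if _, err := c.Receive(); err != nil {
          log.Fatal(err)
      }
  }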


There can often be issues with managing pools of TCP connections in clients: locking around access to the pool, corner cases in the state/lifetime of a connection in the pool, etc. In my experience with other databases and languages, anyway.


I just did some digging in their GitHub repo: for every captured HTTP request there is one Redis/Bolt op, and for every replay there is one Redis/Bolt get op.

You should be able to pipeline the recording, but likely not the get operations.


I am using BoltDB in production. BoltDB is wonderful for mostly-read workloads; if you perform a lot of mixed read/write operations, Redis will end up being better, because of BoltDB's B+tree and the way the pages are handled (the author and the docs do not hide this; heavy read is the use case).

What I take from his post is that his software was nicely designed so the backend is easy to change, which will let him switch back to Redis in the future if he runs into corruption from BoltDB due to intensive read/write, which is not its use case.


BoltDB is based on LMDB, which is supposed to never get corrupted, so if they got the design right they'll be fine.


They share no code at all.


The fundamental design of Bolt is the same as LMDB though. The premise behind why LMDB is "uncorruptable" is that all the dirty data pages for a transaction are written out first followed by a write to a double-buffered meta page that points to the new root of the B+tree.

If any data page is partially written then it doesn't matter because the new meta page hasn't been written to point to it. If the new meta page is partially written then it is detected through a checksum and the previous meta page is used (thereby rolling back the transaction). That's how Bolt works as well.
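A conceptual sketch in Go of the recovery side of that idea (this is not Bolt's or LMDB's actual code, just an illustration): two alternating meta pages, and on open the one with a valid checksum and the highest txid wins.

  package main

  import (
      "fmt"
      "hash/fnv"
  )

  type meta struct {
      txid     uint64 // increments on every committed transaction
      root     uint64 // page ID of the B+tree root written by that transaction
      checksum uint64 // checksum over the other fields
  }

  func (m meta) sum() uint64 {
      h := fnv.New64a()
      fmt.Fprintf(h, "%d:%d", m.txid, m.root)
      return h.Sum64()
  }

  func (m meta) valid() bool { return m.checksum == m.sum() }

  // chooseMeta picks which meta page to trust when the file is opened.
  func chooseMeta(a, b meta) (meta, bool) {
      switch {
      case a.valid() && b.valid():
          if a.txid > b.txid {
              return a, true
          }
          return b, true
      case a.valid():
          return a, true // the other meta was torn mid-write: fall back to this one
      case b.valid():
          return b, true
      default:
          return meta{}, false // both invalid: the file really is broken
      }
  }

  func main() {
      committed := meta{txid: 7, root: 42}
      committed.checksum = committed.sum()
      torn := meta{txid: 8, root: 99, checksum: 12345} // simulated partial write

      m, ok := chooseMeta(committed, torn)
      fmt.Println(ok, m.txid) // true 7: the half-written commit is simply ignored
  }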

That being said, all code has bugs (yes, even LMDB). Bolt has a large amount of test coverage as well as randomized black box testing that it uses to help minimize those bugs.


Just for the record, LMDB uses both random testing as well as explicitly targeted test cases. (I.e., we construct a DB with specific data in specific sequences to trigger splits and rebalances, etc.) Test coverage is over 90%, with the remainder being auxiliary and platform-specific code. (I.e., we cannot get 100% coverage on any single box due to segments of the code that are #ifdef'd for some other platform.)


That's a really good explanation of the db's function, thanks.

Are you guys still single-process? That killed a few pretty important LMDB use cases. I've always been curious why you did that.


The corruption I was talking about is this one: https://github.com/boltdb/bolt/issues/348


In the case of Hoverfly, it never needs mixed read/write operations: it only writes during "capture" mode and then only reads during "virtualize" mode, which is when performance really counts, so it looks like a perfect solution.


The OP says about Redis "I was happy with current performance". His problem with Redis was that it introduced dependencies on a separate process with all that entails in terms of deployment, distribution, etc.


I use BoltDB and Redis in my current project. BoltDB made sense for the data that rarely changed and was more read-heavy.



