Hacker News new | past | comments | ask | show | jobs | submit login
Replicating SQLite using Raft Consensus (philipotoole.com)
97 points by xena on Feb 23, 2015 | hide | past | web | favorite | 21 comments

Hello -- rqlite author here.

It was definitely interesting to code this. It was 50% to meet a need, 50% to better understand Raft consensus in general, and Ben Johnson's go-raft implementation in particular (https://github.com/goraft/raft). As it happens Ben and I now work together, on InfluxDB (http://influxdb.com/), which has a new Raft implementation at its core.

Definitely fun.

This exact problem was the final challenge of last year's Stripe CTF contest. They put together a really cool test harness that would systematically probe for any possible way that an unreliable network could cause the system to fail. In order to complete the challenge, you had to achieve a desired throughput without ever allowing the replicas to become inconsistent.

Unfortunately Stripe doesn't seem to have kept the main site[1] online, but the test harness is still available on Github[2].

[1]: https://stripe-ctf.com/ [2]: https://github.com/stripe-ctf/octopus

> While it is possible to continually copy the SQLite file to a backup server every time it is changed, this file-copy must not take place while the database is being accessed.

What about the `.dump` command?

I've been using it to backup SQLite databases since forever. It produces an SQL script (similar to `mysqldump`) that you can import back into another SQLite database, or even into another RDBMS. It's much more elegant than copying the database file itself.

But come to think of it, I never really checked whether the `.dump` command produces a consistent backup, and the official documentation doesn't seem to answer that, either. If you're familiar with the internals of SQLite, please let me know whether I should keep trusting `.dump`.

You could be right, it might well produce a consistent backup. If fact, I'd be very surprised if it didn't. But doing that misses the point of rqlite, of course. rqlite replicates the database to other machines, after every transaction, without copying the entire sqlite file everytime.

I'm not sure a backup that was never tested qualifies as backup.

The backups do restore properly all the time, I made sure of that.

I simply don't have enough transaction volume to test all the edge cases that might produce an inconsistent dump (and if I did, I wouldn't be using SQLite).

SQLite is amazing. I wonder if it's usable as a replacement for Postgres for web apps, I'm not sure what happens when multiple processes try to write to it.

Regardless, this is a very interesting project, good job.

Multithreaded use is not supported. SQLite is not a replacement for Postgresā€”Postgres supports many advanced data structures (lists!), SQL commands, and is faster. But in certain cases, it is enough, and of course setting up SQLite is much easier.

> Multithreaded use is not supported

To be more precise, it's perfectly legal to use SQLite from multiple threads or processes at the same time. It will use its own mutexes and OS file-locking operations to ensure that at any given time, a database is being accessed by either a single writer, or an arbitrary number of readers.

In my opinion, the bigger issue is that clients can only safely access a SQLite database if it exists on a local filesystem. As soon as you decide you want to run multiple web servers, you're in trouble.

Sorry, yes, that's what I meant. I use it for a mostly-read-only Web app, but wasn't sure if it's a disaster waiting to happen. Good to know it's a viable alternative to postgres in multithreaded or multiprocess scenarios.

I ran it as a replacement for MySQL for 3 years, light usage php web app, around 100 - 200 users I guess.

It saved me quite some hassle and reduced the attack surface compared to MySQL.

FWIW I never bumped into speed issues. My biggest issue against it is that it would let me alter columns directly which effectively sabotaged my use of alembic when I started using Python/flask.

Huh, that sounds like it worked very well. Plus, the reduced overhead of backups etc is definitely worth it for small apps.

Here is more information on the concurrency-related features of SQLite to complement teraflop's sibling comment: http://sqlite.org/threadsafe.html

It's a good database in its own right. Use a write-ahead log, and you can get one writer and as many readers as your local disk can support.

The common thinking is that it's more of a replacement for fopen than it is for Oracle.

This looks like statement based replication, which can come out of sync if you're not careful to make your statements deterministic.

Any use of random(), as a trivial example.

This is pretty cool!

Interestingly, once SQLite 4 comes out you'd probably implement this the opposite way, using Raft consensus (and something like etcd) for block storage, and using the SQL engine over arbitrary pluggable block storage.

sqlite4 seems not to be moving so fast - has it stalled?

ActorDB takes this concept much further. It replicates the WAL file (not sql statements).



I skimmed the source and didn't see anything regarding leader election. Am I missing something or is this implementation encapsulating some of the underlying terminology/analogies traditionally used by Raft?

Wouldn't that be done by the underlying Raft implementation (goraft)?

Yes, that's right. 'go-raft' is imported by rqlite, and uses that package for consensus and leader election.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact