Hacker News new | past | comments | ask | show | jobs | submit login
How SQLite scales read concurrency (fly.io)
240 points by pkilgore on Aug 24, 2022 | hide | past | favorite | 69 comments



Something I found non-obvious about WAL mode in SQLite is that it's actually a property of the database file itself.

When you run "PRAGMA journal_mode=wal;" against a database file the mode is permanently changed for that file - and the .db-wal and .db-shm files for that database will appear in the same directory as it.

Any future connections to that database will use it in WAL mode - until you switch the mode on it back, at which point it will go back to journal mode.

It makes sense when you think about it - of course a database can only be in one or the other modes, not both, so the setting must be at the database file level. But it took me a while to understand.

I wrote some notes on this here: https://til.simonwillison.net/sqlite/enabling-wal-mode


There is a significant warning about the use of WAL mode, very plain in the documentation:

"Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not."

https://sqlite.org/lang_attach.html

Systems designed for DML against multiple database files should eschew WAL mode.


The other odd thing is that the journal_mode is only persistent for WAL, I believe. The DELETE, TRUNCATE, & PERSIST modes are per-connection. It makes sense though since those 3 modes deal with the rollback journal and are compatible with each other while the WAL is totally separate. https://www.sqlite.org/pragma.html#pragma_journal_mode


I was reading up on WAL mode this week and came across a mailing list where you asked for clarification on this same topic. That post helped me understand, thanks for asking the question! The confusing thing here for me is that the majority of other PRAGMAs (that I know of) are per-request, so this one differs from that pattern


A few are not, like application_id, user_version, and auto_vacuum.


Also worth noting that Sqlite released a WAL2 journal mode recently that eliminates the "stop of the world" we had with WAL when checkpointing. Basically it maintains two wal files and switched between them when one needs to be checkpointed. It is quite neat!


WAL2 mode is still only available in a branch - I've been hoping they'll merge it to main at some point, but I've not been following things closely enough to know if there are reasons that they wouldn't eventually do that.

https://www.sqlite.org/cgi/src/doc/wal2/doc/wal2.md


Not only WAL2, but there is yet another branch with BEGIN CONCURRENT which also helps with scaling concurrency.

https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi...

I really hope these two branches get merged into the mainline sometime soon. I'm not sure what's the blocker, since both branches have existed for literally years and are refreshed frequently.

WAL2 + BEGIN CONCURRENT would solve probably 99.9% of developer scaling needs.


SQLite runs critical systems on the A350 in flight.

Stability is vastly more important than new functionality in that context.


> SQLite runs critical systems on the A350 in flight.

Can you document that? I was under the impression that they used SQLite for on ground simulations?


Since the DO-178B certification was obtained by SQLite, it may be used in critical avionics components.

Rockwell-Collins is the organization that prompted Dr. Hipp to obtain DO-178B. I actually worked at Rockwell-Collins in the early 90s.

I do generally prefer airlines flying Airbus, BTW.

"Airbus confirms that SQLite is being used in the flight software for the A350 XWB family of aircraft."

https://www.sqlite.org/famous.html


> Basically it maintains two wal files and switched between them when one needs to be checkpointed. It is quite neat!

See also: eglSwapBuffers, https://katatunix.wordpress.com/2014/09/17/lets-talk-about-e...


It's become popular to talk about how scalable SQLite is lately, but let's not forget the elephant in the room, it only allows a single writer at a time.

It's obviously easier to manage and maintain due to it being an embedded database, but that seems to be a very separate issue from the data structure involved, which definitely has disadvantages compared to a typical SQL system.


> it only allows a single writer at a time.

This is true. However, in my testing with a relatively old NVMe device, I can push well over a gigabyte/second to the database with that single connection. One trick I employ is to only ever maintain 1 connection and to use an inter-thread messaging framework like Disruptor to serialize everything beforehand to a single batched writer. Another, much simpler option, would be to just lean on the fact that SQLite serializes all writes by default and to take a little bit of a performance hit on the lock contention (assuming a small # of writers).

I know for a fact that I can get more write throughput out of SQLite in an in-process setting than I could if I had to go across the network to a hosted SQL instance. In many cases this difference can be measured as more than 1 order of magnitude, both in latency and throughput figures.

Going vertical on hardware in 2022 is an extremely viable strategy.


The main thing to keep in mind is to keep your transactions short and not do unrelated tasks like calling REST APIs while holding a lock, otherwise readers will suffer.


How did you managed to do 1gb/sec? I tried everything I can think of, the most I managed to do 100-150MB/sec. As table grows, it slows down more and more.


It appears that a development branch for a new concurrent write mode has been launched.

These appear to be page-level locks, and row-level locks should not be expected for this effort.

https://www.sqlite.org/cgi/src/doc/begin-concurrent/doc/begi...


While most databases have more granular concurrency control, almost all application parallelism is handled by locking on critical resources, and having others wait for those resources become available. That's exactly what SQLite does, and it's not a big deal. If you have multiple processes, one might occasionally have to wait a few ms for the other to finish.


A long time ago I was on a project that used SQlite in production. At the time WAL mode didn't exist and the writer blocked the reads as well. We worked around the single writer issue by having multiple db's open. In our case it worked out due to the way our data was partitioned, I'm not necessarily advocating using that technique these days.


"The database header has a read & write version at bytes 18 & 19, respectively, that are used to determine the journal mode. They're set to 0x01 for rollback journal and 0x02 for write-ahead log. They're typically both set to the same value."

Are there any useful situations where they wouldn't be set to the same value?


What's with so many SQLite posts on HN these days? Never saw so much love for sqlite before.


Most of them are by the same author, who created the litestream.io project and has since been hired by fly.io to continue working on the project.


Author here. I have had a few posts on HN but there's a lot of other authors doing great work as well. In the two weeks there's been several SQLite posts from other folks on the front page:

- SQLite has pretty limited builtin functions: https://news.ycombinator.com/item?id=32557672

- Turning SQLite into a Distributed Database: https://news.ycombinator.com/item?id=32539360

- SQLite is not a toy database: https://news.ycombinator.com/item?id=32478907

- SQLite: Wal2 Mode Notes: https://news.ycombinator.com/item?id=32435601

- SQLite-HTTP: A SQLite extension for making HTTP requests: https://news.ycombinator.com/item?id=32417410


seems like it's become trendy to talk about how "underrated" sqlite

in same vein as all those articles comparing time to process data on single laptop vs hadoop cluster or whatever


Author here. I hope the article didn't come off as me claiming SQLite is underrated. I find that it's a great piece of software for understanding database internals. There's not a lot of moving parts or abstractions in between the API and the raw bytes on disk -- at least not compared to other databases.

Technologies have waves on HN. A couple years ago there were a ton of Go articles. Lately there's always a Rust article on the front page. Right now, I feel like SQLite is a refreshing escape from many of the complex deployments that have been in vogue lately.


But SQLite is underrated. It’s an absolutely amazing piece of software which is why I am still bitter at the deprecation of WebSQL in favor of the far inferior IndexedDB.


> it's become trendy

We were using SQLite in production way before it was cool on HN. I remember back in 2017-2018 describing how we use SQLite as the principal database engine for our multi-user product, and was basically tarred and feathered by the hosted SQL crowd.

I think the most intoxicating thing about SQLite is that you don't have to install or configure even one goddamn thing. It's a lot more work and unknowns to go down this path, but you can wind up with a far more robust product as a result.

I hope this trend continues aggressively.


> I think the most intoxicating thing about SQLite is that you don't have to install or configure even one goddamn thing.

There are a lot of configuration options when compiling SQLite, and some of its defaults are pretty conservative (e.g. assuming no usleep and default to `sleep` unless `HAVE_USLEEP` specified). Generally I recommend looking into how Apple configures its SQLite (SQLite provides https://www.sqlite.org/c3ref/compileoption_get.html to query) and modify to your needs.

Definitely look into compilation options if you plan to embed SQLite into your application.


Less of an option if you are writing open-source apps and don’t control how SQLite is deployed on the target. My feed reader Temboz uses FTS5 to good effect but I have to supply fallbacks if it isn’t available.


Marketing for newish SQLite based projects/startups.


Someone recently observed that SQLite would be used a lot more in production if it didn't have the word "lite" in its name. I've personally been amazed by what it can do. Really great piece of tech with an unfortunate name.


The name SQLite isn't Sequel-lite. It's S.Q.L.-ite akin to graphite or Levite. It means a thing of SQLy stuff.

The creator says it like Escue Ell-ite emphasis on the Es- and -ite. [0]

https://youtu.be/Jib2AmRb_rk?t=99


That may be so. But just because someone is the creator doesn't mean they have control over how people use the name.

Case in point:

Blue Origin wants people to refer to it by the nickname "Blue". But you kind of have to be in the industry to know that. Everyone else just uses the initialism instead, which is most commonly used to abbreviate "body odor".


TIL, but strange choice.

On a related topic, anyone have an idea of the breakdown of people that say Sequel vs S.Q.L. and are there people that look down on others for their usage? Of course ignoring the opinion of anyone that says Jif.


I have heard that Windows users (or those more influenced by SQL Server) say "sequel" whereas Unix users (or those more influenced by Postgres) say "SQL". Think of how bad "Postgresequel" sounds in comparison to "Postgr-es cue ell". But what about other environments, I don't know.


Yes but you don't get the namer's intent beamed into your brain when you see or hear a word for the first time. To me and a lot of other people it's pretty clearly sql-lite regardless of what the namer wants it to be. Maybe this makes me stupid or whatever but when naming things you also have to account for my stupidity because I share it with a lot of other people.


This is exactly the reason the Nimrod language was renamed to Nim.


I would be funny if they suddenly rebranded to "MegaSQL" :)


SQLheavy


SQLAwesome? :)


SQLit!


sql.io


Nah, this time zoomers win. SQLit > sql.io


SQLitio


newSQLy


$ lsof -u `whoami` | grep '[.]db$' | less

The macbook I'm typing this on has 113 sqlite databases open right now. I'd call that a lot of deployment. The name doesn't seem to have held sqlite back much!


Not so unfortunate according to Socrates: 'the hardest task needs the lightest hand..' ;-)


If it had a better name it might surpass zlib and be the #1 deployed library in existence instead of just #2.


Why dilute your community with people who don't do the reading?


Maybe someday it’ll be rebranded as SQLFile or SQLPhile instead.


To blame unconscious bias over a suffix as a non-trivial force on adoption is silly when there are plenty of valid technical constraints for adoption of SQLite in many projects.

IMO database names are incredibly bland, outside of Cockroachdb, and not a driving factor to adoption given the critical nature of the decision.


Fly.io constantly has good articles about databases, really cool to see it.


SQLite is such an excellent little tool.

I recently ran an experiment with Node and was able to take writes from around 750 / second to around 24K / second just by coordinating them using Node IPC. That is, I had the main thread own the sole write connection and all other threads sent their writes operations to it, thread-local connections were read-only.

It's pretty cool how far you can push SQLite and it just keeps humming right along.


Can you clarify "main thread"? Do you have a code example? Asking as Node is single threaded.

Thanks


Most likely using the cluster module and your favorite IPC module.

https://nodejs.org/api/cluster.html


My go-to method of inter-process communication in Node without using any third party modules is to just use process.send[1] to a send a message to the main thread, and then have it forward messages to the workers in the cluster, which you can listen for using the message event [2].

[1] https://nodejs.org/api/process.html#processsendmessage-sendh...

[2] https://nodejs.org/api/process.html#event-message



Using cluster and worker threads. The code is here, but completely undocumented, as I’ve been side tracked by a few other things[0]. I’m currently looking into porting that project to Bun, so it’s possibly dead in its current form.

I was planning on turning that into a library, but Bun nerd sniped me.

You need the “hey” tool to run the benchmarks[1] the way that I was running them.

[0] https://github.com/chrisdavies/sql3

[1] https://github.com/rakyll/hey


SQLite needs some 'put your money where your mouth is' benchmarks.

EDIT:Lots of devs have (unfounded) doubts about performance regarding SQLite for web apps, a single machine benchmark against postgres (with postgres and the app in the same server) would clear many doubts and create awareness that SQLite is going to be more than enough for many apps.. The app doesn't have to be a web app (we are not testing web servers) but maybe some code with some semi-complex domain models.


As someone who had to replace SQLite with a custom key value store in an app for throughput (needed to be able to write structured data at 4GB/s from multiple threads) I assure you that the claims about it being fast are not false - it wasn't fast enough for me but it's very fast


Yeah but in every thread about SQLite many mention the fear of having to migrate to postgres as they reach some quantity of concurrent users (web apps), a benchmark would calm these fears for some. (edited my original comment). Most won't go through the trouble to see if SQLite is good enough, they'll just go with the safe choice.


4GB/s is a pretty high requirement. What did you end up using?

Redis looks like it would struggle with this requirement also without some serious hardware, tweaks, or configuration. https://redis.io/docs/reference/optimization/benchmarks/


The app was a C# frontend (https://github.com/kg/HeapProfiler) that drove windows OS services to take heap snapshots and capture stack traces, so I ended up writing a custom key/value store in C# to avoid having to do cross-language interop, marshaling, etc (the cost of sending blobs to SQLite and running queries was adding up.). It's hard to beat the best-in-class optimized databases on their own turf but if you can just grab a spot to dump your data into, you end up being a lot faster.

By the end it ran fast enough that it was able to saturate the kernel's paging infrastructure and make my mouse cursor stutter, and I was able to take 1-2 snapshots per second of a full running Firefox process with real webpages in it, so it was satisfactory. SQLite couldn't process the amount of data I was pumping in at that rate (but it still performed pretty well - maybe a few snapshots per minute)

At the time I did investigate other data stores and the only good candidates I ran across used incompatible open source licenses, so I was stuck doing it myself. Fun excuse to learn how to write and optimize btrees for throughput :-)


Yeah, most databases probably will have pathological behaviors against your requirements (especially on tail-latency, which you would care about). Many implement similar tools would put a lightweight compression on top and just dump these snapshots to disk and then run a post-processing for queries. Dumping snapshots is also preferred because you can insert checksums and checkpoints for partial data recovery if there are failures.


Needs for what ? It is the most deployed database ever.

https://www.sqlite.org/mostdeployed.html


The big benchmark that I know is TPC-C.

This benchmark requires concurrent writes in a massive OLTP client-server model.

SQLite shines in SQL92 compliance (without grant/revoke or other permission-based aspects), but it was simply not designed to compete in any of the TPC tests.

The current holder of the top TPC-C score is OceanBase, which defeated Oracle's previous record with 11g/Solaris/SPARC.

https://www.tpc.org/tpcc/results/tpcc_perf_results5.asp?resu...


Dunno if it's still the case, but for a long time, it was a license violation to publish benchmarks of Oracle and (IIRC) Microsoft databases.


That's not suspicious at all. /s


Well, luckily we have another two major RDBMS that allow it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: