Discussion on why SQLite is gaining popularity [audio] (syntax.fm)
168 points by thunderbong 10 months ago | 175 comments



The killer feature for me with SQLite is as an application file format: https://sqlite.org/appfileformat.html

It has SO MANY advantages over every other solution. It can do the "just a bundle of files" thing, but it can also do all this rich data as well. And unlike most "just a bundle of files" formats, it's incrementally updatable, it's so good. We're using this in production right now and couldn't be happier with it, wouldn't dream of using anything else at this point.

One notable application that uses SQLite for this is Audacity: all the stuff you record is streamed into a SQLite database (the project file). This is a huge reason why Audacity is so good for recording long sessions, and why it's so resilient against crashes and failures.


I used SQLite as the file format for a game system, and it was probably one of the most successful technical decisions I've made in 40 years. The fact that the artists and developers and even managers could use standard database tools to lay out changes to the game levels, characters and even physics behaviours, meant I didn't have to write a ton of tooling to allow them access to all of those things - they just opened up their favourite database GUI, added the data to the sqlite database, and shared the single file around.

Nowadays I just can't see any other way to do this, other than by maintaining a well-ordered .zip file or filesystem structure, with strict validation being something that has to be programmed. I'd just prefer to sqlite all the things.


That sounds like a decent solution, but I'd rather have all that content in text files so there is meaningful version control with features like diffs and merges. Some engines will store the data this way but give you an editor to manage it, and there are dedicated level editors as well but every project is different and it could be the case none of that would have been helpful to you.


AFAIK git has pluggable diff and merge engines. I wonder if it would be possible to create a diff/merge git plugin for sqlite files that does an intermediate conversion to .sql dump files, with some kind of serialisation normalisation so that rows are output in a stable order between runs.
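
For what it's worth, here's a minimal sketch of the diff half via git's textconv mechanism; the helper script and the ORDER BY normalisation are my own invention, not an existing plugin:

    #!/usr/bin/env python3
    # sqlite-textconv (hypothetical helper): print a SQLite file as text,
    # with rows in a stable order, so `git diff` can show readable diffs.
    import sqlite3, sys

    conn = sqlite3.connect(sys.argv[1])
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type='table' ORDER BY name"
    ).fetchall()
    for name, sql in tables:
        print(sql.strip() + ";")
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({name})")]
        order = ", ".join(cols) or "rowid"
        for row in conn.execute(f"SELECT * FROM {name} ORDER BY {order}"):
            print(f"-- {name}: {row!r}")

Hook it up with a .gitattributes entry like `*.sqlite diff=sqlite` and `git config diff.sqlite.textconv ./sqlite-textconv` (as noted below, every clone has to repeat the config step). Merge is the harder half, since the merged dump would have to be loaded back into a binary database.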


I haven't tried it with git pluggable diff tools yet, but I wrote a tool that lets me convert a SQLite database into a bunch of plain text files so I can manage them in git: https://github.com/simonw/sqlite-diffable


It is. It's a bit of a hassle, because the git mechanism that allows it is considered sensitive, so all users of your repo will need to separately configure this if they want to use it.

Bigger problem for me is that none of the code review systems I know of support this. At work, we've been wishing to have something like this in Gerrit, due to multiple SQLite DBs and other binary documents living in the repo, but as far as we can tell, we'd have to modify Gerrit sources directly to make it happen, which IT would frown at and nobody has time for anyway.


Not git, but I find it's interesting how Fossil implements version control https://www.fossil-scm.org/home/doc/trunk/www/tech_overview....


We solved this problem by checking full SQL dumps of the content database into source control, in between major changes. Not that we ever had to check and see what went wrong with the content side - using an SQLite database pretty much ensured that things were in the right format before they hit the filesystem assets, and we had practically zero bugs after introducing this methodology to the project.


Thousands of text files sitting on a bare filesystem is a rather slow solution, so an additional step to convert between text files and SQLite is needed. I've never seen a ready-to-use solution for that, only heard of ad-hoc scripts.


We maintain a compromise with Alembic migration files in source control. This does mean people need to do Alembic things to make changes, but if they can do that little bit it's very good, and fully diffable of course.
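
For context, Alembic's batch mode is what makes this workable on SQLite, since it does the copy-and-swap table rebuild behind the scenes. A sketch of what a migration file might look like (revision ids, table, and column are invented):

    # versions/xxxx_add_difficulty.py (illustrative)
    from alembic import op
    import sqlalchemy as sa

    revision = "a1b2c3d4"       # placeholder revision identifiers
    down_revision = None

    def upgrade():
        # batch_alter_table recreates the table under the hood, which is how
        # Alembic works around SQLite's limited ALTER TABLE support.
        with op.batch_alter_table("level") as batch_op:
            batch_op.add_column(sa.Column("difficulty", sa.Integer(), nullable=True))

    def downgrade():
        with op.batch_alter_table("level") as batch_op:
            batch_op.drop_column("difficulty")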


In 2024, not using SQLite as your application file format should really force an engineering team to have a damn good explanation...

https://www.sqlite.org/famous.html


Those journal/wal files create a number of potential issues for user visible application files. E.g. a user could copy a database file into a directory that already has an old journal file, corrupting the database [1].

Or users could simply get confused about what to do with those files and contact support a million times just to ask why these files exist, whether they need to be copied as well, why they are getting large, etc.

These issues can be worked around, but I wonder how many apps that use SQLite actually bother doing that. Clearly not all of them [2].

Avoiding file corruption is non-trivial regardless of file format, but giving users additional ways to corrupt their data is never a good thing.

So I think for small, user visible application files that don't need any database functionality, the onus is still on engineers to justify their choice of SQLite.

[1] https://www.sqlite.org/howtocorrupt.html

[2] https://forum.audacityteam.org/t/aup3-wal-file-remains-and-l...


>Or users could simply get confused about what to do with those files and contact support a million times just to ask why these files exist, whether they need to be copied as well, why they are getting large, etc.

This reminds me of a college roommate who was dissatisfied with his computer, likely due to malware from sketchy sites. He decided to delete large or suspicious files from Windows' System32 folder. By suspicious I mean he didn't like the file name.

He had to get a new computer later in the semester.


Decades ago I troubleshot a Solaris server that wouldn’t boot, and discovered the kernel file was missing. I suspect someone had tried to deal with a too-small root partition by removing large files.


You could simply turn off WAL for a simple application database, if you were really worried. You lose some performance and scale, but it should work fine.


Setting the journal_mode to OFF or MEMORY is a sure-fire way of corrupting the database. It's not fine.

https://sqlite.org/pragma.html#pragma_journal_mode


If the application crashes in the middle of writing a transaction, yes. You can work around this if you care to, and I don't think it's anything specific to SQLite. What happens if your application crashes in the middle of writing out a JSON file?

I had actually meant JOURNAL_MODE=DELETE, though. This would not technically use a WAL file, though there would be a rollback journal file that exists only for the duration of the transaction.


>What happens if your application crashes in the middle writing out a JSON file?

The usual way to deal with this is to write to a temporary file and then atomically replace the original file with that temporary file. You could do the same thing with an SQLite database. This is the workaround I was referring to earlier.
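
A minimal sketch of that pattern in Python (names are illustrative):

    import os, tempfile

    def atomic_write(path, data: bytes):
        # Write to a temp file in the same directory, flush it to disk,
        # then atomically swap it over the original.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)   # atomic on both POSIX and Windows
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

A crash mid-write leaves either the old file or the new one, never a half-written blob. You can do the same with a whole SQLite database, at the cost of rewriting the entire file on every save.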

journal_mode=delete may not be the worst compromise, but I hate the idea of exposing users to potential database corruption, even in a relatively unlikely event.


> 1.4. Mispairing database files and hot journals

I’m surprised it would corrupt the database. The database could include a unique identifier, mirrored in the journal, and refuse to open if the identifiers don't match.


Uniquely identifying a database and its journal/wal files would prevent just one of the five mispairing cases they mention. And it's the least likely case.


> The United States Library of Congress recognizes SQLite as a recommended storage format for preservation of digital content.

That’s one hell of an endorsement.


There's one big obvious one. Git and SyncThing.

If the user could benefit from either of those, atomically updated plain files are great. Otherwise, SQLite is perfect.

Of course there's always workarounds to use SQLite in memory and dump to SQL code and such.


Is SQLite the best solution for on-device logs?


unfortunately in the browser it needs to be downloaded so you’re stuck with indexeddb


It's too bad WebSQL never took off, as that was SQLite built into the browser.

From memory the wasm binary of wa-sqlite is ~1mb, which is certainly not nothing but is an acceptable one-time download size for many web apps. I've seen websites with single image files larger than that. Not advocating for more bloated web apps, but a sqlite download might not be a deal breaker.


> unfortunately in the browser it needs to be downloaded so you’re stuck with indexeddb

That argument implies that we're stuck with the browser's single built-in home page because every other web page has to be downloaded.

How is having to download sqlite3.wasm any different from having to download HTML, CSS, JS, images, etc.?


> How is having to download sqlite3.wasm any different from

'Unnecessary' file size. KBs still matter on the web.

I'm not normally a web guy, but from https://sqlite.org/download.html it looks like it's ~800k extra added to initial page load? Based on average mobile speed in US of 97.09Mbps; that's an extra half second added to initial page load. That's not trivial; maybe an extra 15% bounce rate on initial hits.


Makes me laugh. When I got my first computer in 1987 it took so long to load a small game from cassette that I could go downstairs, boil a kettle, make a cuppa, and bring it back to my room, and the game would be just about finished loading. Now people are moaning about an extra half a second lol

How things change.


No, the narrative here is that things used to be fast and idiot JS devs have made everything bloated and slow.


Computers are thousands to millions of times faster. So yes, that's a true assessment.


> millions of times faster

Unless you had a 3KHz processor (and no processor ever had that), that statement is just not true.


It's not about raw clock speed, or a Zen 4 would be as fast as a Pentium 4 :)


> ~800k extra added to initial page load?

If you're serving it uncompressed, which no production-grade site will (for a given definition of "no"/"none").

> K matter on the web still.

Not, I opine, for the types of apps which want to host client-side databases. These are client-side applications, not "web pages."

Even a bare-bones, database-less google.com is now 3.14mb uncompressed (1.32 compressed), and that's not counting the pieces which uBlock Origin keeps from loading. It loads somewhere around 2MB (uncompressed) of JS.

Last I checked, gdrive downloaded some 14mb to get up and running.

> that's an extra half second added to initial page load. That's not trivial;

We'll have to agree to disagree on whether half a second extra initial-hit-only load time is trivial.


> If you're serving it uncompressed

That 847 KB is actually the size of the (compressed) zip file, but it does contain other files. The compressed WASM + the JS to load it is probably about 500K, depending on the specifics of compression & minification.


It's a matter of use case - for a simple web page or even a lightweight client-side app, maybe too big. But for a tool with heavy data processing, I could see it being quite useful.


Once upon a time, I revamped a binary file format for proprietary industrial-scientific software.

The software was writing vast arrays of numbers into a file, and reading them based on offsets.

As the software (and its data) evolved, versioning became a nightmare.

I ended up writing an ORM serializer which read from/wrote into an SQLite DB with a variation on an EAV schema.

One of my favorite projects, result-wise. Reading and writing was clean and efficient; the format was portable, language-agnostic, compact.

Versions of the new format were backwards- (and, within reason, forward-) compatible with each other (I forced specifying sane defaults for attributes if they're missing).

I'm still thinking of reimplementing a project like that as FOSS one day.

Serializing numeric data to something like JSON is a waste, and the structured binary formats I've looked at aren't nearly as thought out as SQLite.
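
To make the shape concrete, a rough sketch of an EAV-ish layout for numeric data (the table, column, and attribute names here are invented for illustration, not the original format):

    import sqlite3, array

    conn = sqlite3.connect("dataset.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS entity (
            id   INTEGER PRIMARY KEY,
            kind TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS attribute (
            entity_id INTEGER NOT NULL REFERENCES entity(id),
            name      TEXT NOT NULL,
            value     BLOB,            -- packed float64 array, scalar, or text
            PRIMARY KEY (entity_id, name)
        );
    """)

    samples = array.array("d", [1.0, 2.5, 3.75])     # numeric payload, stored raw
    cur = conn.execute("INSERT INTO entity (kind) VALUES ('spectrum')")
    conn.execute("INSERT INTO attribute VALUES (?, 'samples', ?)",
                 (cur.lastrowid, samples.tobytes()))
    conn.commit()

A reader that falls back to sane defaults for missing attributes is what gives the forward/backward compatibility described above.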


For numeric values only, any advantage of using sqlite over let's say parquet with Apache Arrow/duckdb?


SQLite lets you decide when and whether to vacuum after partial row modifications. Whether Arrow allows that I don't know, but it's probably SQLite's best feature: you can open a massive database and change one value really, really fast, without having to rewrite and compact the entire thing every time.


Arrow isn’t great at random insertion and deletion, as each column is a contiguous array. If it’s on disk those arrays are probably packed tightly together, so you can’t even append rows.

(I have an idea to implement an LSM tree where each layer is an arrow file, which should allow for faster mutations while maintaining a lot of the benefits of arrow. But I haven’t got around to it).


Given SQLite is already distributed on basically every OS and is amazingly rock solid, I'd ask instead: is there any advantage to using Apache Arrow/DuckDB? I don't have great experience with either.


DuckDB can have a massive performance advantage over SQLite for things like aggregates on wide tables or filters on unindexed columns, with the advantage growing with the size of the table. We tried a few head-to-heads between the two and while we ultimately went with SQLite, we may go back and double-write everything to DuckDB to get better performance for those kinds of queries on the same data set.

Duckdb can also read / import SQLite natively so we could ingest lazily from sqlite disk -> duckdb.

Biggest issue with duckdb is some memory leaks and crashes for the nodejs driver. Seems not production ready.


They did just release DuckDB version 1.0, so it should be getting much more reliable and stable.


They said 1.0 means "we aren't going to break the disk format backwards compatibility anymore". Other than that, it's just a number and doesn't have any power to make the software more reliable. It really depends on their priorities how much they'll actually stabilize my use-case. I'll probably check back in a year and see if things improved.


SQLite encodes integers in a variable-length format, so it can be very space-efficient compared to always storing integers in 4 or 8 bytes.


Strangely enough I'm doing the exact same thing. My case seems a bit more complicated since we're dumping not only long arrays of floats, but also a bunch of miscellaneous data (datetimes, unique datapoints, ids) right before/after those numbers. So it's even more of a nightmare... Some of these files have version numbers in the 30's (oh yeah, and file versions are based on file size.)

And did I mention that the file handles are kept open for the whole program duration (might be multiple hours) ? That's fun, data loss is almost a daily occurrence. It's the main thing I'm refactoring right now.


Well, with an ORM you put everything in data structures, and write them.

Most data fits into hierarchical data structures that don't form a complicated memory graph with references.

Once you know how to write primitive types and arrays of them, you can write objects, and it means you can write anything (without having to worry about writing "before" or "after").

And yes, switching to SQLite absolutely addresses the "perpetual fopen" problem.


I agree. And the best feature for me is "ATTACH DATABASE". Then one can just join together tables from multiple files etc.
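
For anyone who hasn't used it, a minimal sketch (file and table names are made up):

    import sqlite3

    conn = sqlite3.connect("levels.db")
    conn.execute("ATTACH DATABASE ? AS chars", ("characters.db",))

    # Join a table from the main file against one from the attached file.
    rows = conn.execute("""
        SELECT l.name, c.display_name
        FROM levels AS l
        JOIN chars.characters AS c ON c.level_id = l.id
    """).fetchall()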


Also, the in-memory feature (‘:memory:’) allows you to start working “on the fly” and saving the work to disk later.

Really convenient for the user, zero cost for the developer (same API for in-memory/fs db and smooth transition between them).


If you're careful about it, you can store the canonical runtime state for e.g. a videogame in a :memory: database; you then get saving/loading for free with the SQLite backup API.
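
In Python's sqlite3 module that's only a few lines; a sketch (the schema and file name are arbitrary):

    import sqlite3

    state = sqlite3.connect(":memory:")       # canonical runtime state
    state.execute("CREATE TABLE player (id INTEGER PRIMARY KEY, x REAL, y REAL)")

    def save_game(path="save1.db"):
        disk = sqlite3.connect(path)
        state.backup(disk)                    # SQLite online backup API
        disk.close()

    def load_game(path="save1.db"):
        disk = sqlite3.connect(path)
        disk.backup(state)                    # copy the saved file back into memory
        disk.close()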


Can you explain what you mean about needing to be careful about it?


You need to store all the data that define the runtime state, so it can be dumped and recovered, without accidentally capturing information or references to things that live outside the DB and won't survive application restart. It's a more generic case of the "don't dump pointers to files, if you can't guarantee they'll be valid after reload".


Thanks!


In addition to paying attention to references, perhaps it makes sense to always keep e.g. player inventory state in there, but something that can change every frame, like player position data, should perhaps only be synced just prior to the snapshot if it would otherwise degrade performance too much. This is of course another layer of complexity (volatile vs non-volatile game state) to think about.


This. In my testing some years ago, trying to make a basic Roguelike game with the state stored entirely in SQLite, the in-memory db could support reading the visible portion of the map and positions of entities at 60 FPS with time to spare, on a modest machine, and with a slight FFI penalty as the code was written in Common Lisp (SBCL). So while you may not want to do this for a first-person shooter, there are many types of games that could in fact live entirely in SQLite, save for some transient visual FX.


I'm pretty far removed from game dev but curious: is the in-memory SQLite DB the only representation of game state, or is it just 'mirroring' the 'real' values in RAM? Like if there's a Score on the HUD, are you doing a SQL select to read the value on every frame?

Or is this just a way to serialize and deserialize the game state, to automatically save the game so it could be reloaded if it closed/crashed without explicitly running a 'save game' function?


> if there's a Score on the HUD, are you doing a SQL select to read the value on every frame?

Yes. That was the point of my experiments, after I realized that good chunks of the data structures I set up for my game look suspiciously similar to indices and materialized views. So I figured, why waste time reinventing them poorly, when I could use a proper database for it, and do something silly in the process?

In a way, it's also coming back to the origin of the ECS pattern. The way it was originally designed, in MMO space, the entity was a primary key into component tables, and systems did a multiple JOIN query.
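
A toy sketch of that shape using Python's sqlite3 (the component tables and the movement rule are invented for illustration):

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- the entity is just a primary key; components are rows keyed by it
        CREATE TABLE entity   (id INTEGER PRIMARY KEY);
        CREATE TABLE position (entity_id INTEGER PRIMARY KEY REFERENCES entity(id),
                               x REAL, y REAL);
        CREATE TABLE velocity (entity_id INTEGER PRIMARY KEY REFERENCES entity(id),
                               dx REAL, dy REAL);
    """)

    # A "movement system" is a join over the component tables it cares about,
    # expressed as one SQL statement instead of a per-entity loop.
    db.execute("""
        UPDATE position
        SET x = x + (SELECT dx FROM velocity v WHERE v.entity_id = position.entity_id),
            y = y + (SELECT dy FROM velocity v WHERE v.entity_id = position.entity_id)
        WHERE entity_id IN (SELECT entity_id FROM velocity)
    """)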


I followed this line of thinking last week due to curiosity w.r.t ECS & SQLite. I found that the bottleneck was not on the reads, but on the complexity of the writes (iterating each entity and subsequently you have num_systems X num_components writes). You can actually avoid this entirely if you write your systems in SQL.

Since I had already thrown out logic & reason for the sake of curiosity, I took it a step further and learned that the Bun JS runtime actually has SQLite baked in, which allows you to create a :memory: db that can be accessed _synchronously_ avoiding modification of most ECS implementations. (I'm not familiar with the larger SQLite ecosystem, but being a largely TS developer this was very foreign to me)


I don't use Audacity often, maybe once or twice a month at most but stability and snappiness aren't really things I associate with Audacity.

None of the crashes I ever encountered with Audacity has been due to SQLite of course. If anything this article made me want to use SQLite in more of my projects where I need to store local data.


The resiliency against crashes offered by its SQLite file format is much needed. Because it crashes a lot.


Theres is "Pack" archive format that does exactly this for general-purpose archives (sqlite + zstd for compression). https://news.ycombinator.com/item?id=39793805 https://pack.ac/

It's faster AND more compact than .tar.zstd and .7z.zstd (patched 7z with zstd compression).


It doesn't list obvious cases where you shouldn't take this approach, so here's one off the top of my head where SQLite doesn't seem like a good fit; I'm interested in whether I'm wrong on this.

The article discusses storing files as BLOBs. But it looks to me like this isn't a good fit for sufficiently memory-constrained environments, given a task involving processing a very large file in a sequential manner, like with media reencoding, or video playback. It seems like the traditional file access patterns are better suited for those tasks.


It's very easy to design a schema that breaks up blobs into chunks. SQLite is excellent for memory constrained environments, with even mathematical proofs of maximum memory usage: https://www.sqlite.org/malloc.html


You can incrementally read or write the value in a BLOB column of a row. You can read/write etc using a file descriptor like API: https://www.sqlite.org/c3ref/blob_open.html
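
Since Python 3.11 the stdlib sqlite3 module exposes this too; a sketch (table and column names are made up):

    import sqlite3

    conn = sqlite3.connect("media.db")
    conn.execute("CREATE TABLE IF NOT EXISTS asset (id INTEGER PRIMARY KEY, data BLOB)")
    # Reserve 100 MB of blob space without materialising it in memory.
    conn.execute("INSERT INTO asset (id, data) VALUES (1, zeroblob(?))", (100 * 2**20,))
    conn.commit()

    # Read the blob back in 1 MB chunks, like a file descriptor, without
    # ever loading the whole thing into memory.
    with conn.blobopen("asset", "data", 1, readonly=True) as blob:
        while chunk := blob.read(2**20):
            pass   # hand each chunk to a decoder, hasher, etc.

Blob.write() works the same way for streaming data in, as long as the space was reserved up front with zeroblob().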


Thanks, I did a quick google for something like this before writing my comment, but your link did not come up.


When I start a project, my priority will be SQLite>PostgreSQL>Others.

If the application only runs on one server, then definitely SQLite. That means you do not need fail over, replication or other fancy features. Why not simple SQLite?

A complex system is more likely to have problems. Network, TLS, runtime: your application has so many reasons it could crash. But for SQLite, just check the IO. If that does not have a problem, then check your application.


> If the application only runs on one server, then definitely SQLite. That means you do not need fail over, replication or other fancy features. Why not simple SQLite?

I agree. So many usecases require a bounded context, and even ephemeral data stores. Memory caches are all the rage but the cost of doing network calls to fetch data from a memory cache can eclipse the cost of simply fetching the data from a local disk.


I once saw an application randomly throw exceptions when trying to access SQL Server on the same PC. We finally found that it was caused by broken firewall software.

A traditional DBMS is a quite complex application; sometimes using one may introduce more bugs than your application itself.


Well, since the SQLite database is one simple file, isn't SQLite a hassle when you have some concurrency/parallelism? And what if you have separate services that need to access the same DB?


> isn't SQLite a hassle when you have some concurrency/parallelism

If you decide to put your application on a single server, that means you do not care about the single point of failure, and all your workload can be handled by that single server. So you can just run one instance of your application, and then you will not have that kind of problem.

Even if you want to have multiple instances of your application running on the same machine, SQLite can also handle that.

> what if you have separate services that need to access the same DB?

That means one single server cannot handle the workload. If the bottleneck is in the database module, a cluster is required to process the data. Clearly SQLite is not a good option in this case.

If that is not the case, you can separate the database module and provide a lightweight wrapper around SQLite to create a database service. Then use multiple instances for calculation, and call the single database service instance for persistence.

I think that compared to a single instance of MySQL/Postgres/SQL Server, the performance of SQLite is not too bad. So we should keep the architecture simple, if possible.


Well, running multiple instances of the same app is not only about "points of failure", it's also about using all the cores of your machine. Typically, using PM2 (Node.js), Docker or Uvicorn (Python), you can spin up multiple instances of your backend to handle more requests concurrently.


SQLite connections can be used concurrently inside a single process. You can spin up more threads instead, and spare yourself the filesystem locking overhead.
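
A rough sketch of that shape, assuming WAL mode so readers don't block the writer (the schema is illustrative):

    import sqlite3, threading

    setup = sqlite3.connect("app.db")
    setup.execute("PRAGMA journal_mode=WAL")
    setup.execute("CREATE TABLE IF NOT EXISTS events (thread INTEGER, payload TEXT)")
    setup.close()

    def worker(n):
        # One connection per thread; they all hit the same file in-process.
        conn = sqlite3.connect("app.db", timeout=5.0)
        conn.execute("INSERT INTO events (thread, payload) VALUES (?, ?)", (n, "hi"))
        conn.commit()
        conn.close()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads: t.start()
    for t in threads: t.join()

(With CPython the GIL still limits CPU-bound Python work to one core, which is what the replies below get at.)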


Threads in Python are locked to a single core, and in NodeJS it's really annoying to pull off. That's why people use PM2, Uvicorn and Gunicorn.


That would be a problem, then. I'm always surprised when people try to write high-performance code in Python or JS.


Nobody talked about "high performance". I'm just talking about using all the cores on a machine. People write stuff in Python and JS for obvious reasons that I don't need to detail here.


Performance is best judged on a logarithmic scale. Most software barely needs any, and there are many orders of magnitude between "barely uses the CPU" and "uses 100% of a core".

But once you're at 100% of a core, then there aren't many orders of magnitude between that and "uses every core". In fact, the impact from not using a scripting language is on the same order as the threading would be! So, if it's worth spending effort squeezing out those last 1-2 orders of magnitude from the CPU, then it's probably worth thinking about the language as well.

If you've already blown through 4-5 OOMs going to a full core, then chances are you'll need it.


This is why I don't write serious applications in Python and Node.


SQLite handles concurrency "fine"; if you have long-lived locks, it can be a hassle, because writing is still fundamentally single-threaded, but you can have as many processes accessing the same database as you want, so long as their locks don't overlap. If you're mostly reading, it's great.


Nope! That’s what the write-ahead log (WAL) is for: https://www.sqlite.org/wal.html

Of course, if you experience scaling issues there (you probably won’t if it’s anything less than enterprise-level usage), you can always just add a second db file!


I see. Wasn't aware of this. I considered taking SQLite for my current stack, but dismissed it because I've read a lot of bad stuff about concurrency. However, it says that it doesn't work over networks, so basically you can't do WAL with a containerized stack?


As hruk pointed out, you can use Docker volumes to solve for this. However, you can also ship multiple DBs depending on your use-case. If it's a shared DB, it's probably not a great idea for micro-services. But if you're building a majestic monolith, there are few reasons NOT to go with SQLite. Especially paired with Litestream[0].

0: https://litestream.io/


You can have multiple containers on the same host use the same db via Docker volumes - we have a number of production services set up this way.


SQLite has been the most used DB for quite a while, by orders of magnitude. It's in every browser, on every phone, lots of desktop apps, ....

Now there are a bunch of new companies that also want it to take over in the more traditional domains.

So far I don't see a lot of uptake. It's still vastly inferior to the more entrenched databases. Both in terms of performance - especially under concurrency, and in available featureset.


I think the point is that there are surprisingly many applications where the advantages outweigh the issues with concurrency and featureset.


Indeed. I've found that as long as you store your database on a modern NVMe drive, you can easily push 5K+ write transactions per second with SQLite. I've worked at very large companies that received an order of magnitude fewer writes than that, and only during peak business hours.


Over the years I have seen an extraordinary number of database instances doing exactly nothing, to within a very small margin of error.


I wonder if the Intel Management Engine, which famously runs MINIX on just about every Intel motherboard since 2008, uses a database.


But that would be merely one instance per machine. The number of truly distinct usages of SQLite is probably an order of magnitude greater than the number of machines in use.


> It's still vastly inferior to the more entrenched databases. Both in terms of performance - especially under concurrency, and in available featureset

It's inferior at a use case it wasn't designed for. It's definitely superior at the use cases it was designed for.


> available featureset

What features are missing from SQLite compared to "more entrenched databases"?

Also, as I understand, it is the most widely deployed database in the world. They have a whole page about it.


Stored Procedures I guess?


Fair point! I have been in technology for long enough that I have seen the stored proc pendulum swing both ways: "all stored procs" -> "no stored procs". A long time ago, a big problem with stored procs: They were hard / impossible to debug. However, I heard that most DBs have a debug interface now that regular devs can use to debug stored procs. I am curious if they are making a comeback!


Interesting, I've found SPs very elegant to handle, and debugging them has been trivial for me so far with the help of a few unit/integration tests.

And yea they're probably the only feature I really miss in SQLite so far.
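
The closest SQLite itself gets is application-defined functions registered on the connection, which cover some of the same ground. A sketch using Python's sqlite3 (the function is just an example):

    import sqlite3

    conn = sqlite3.connect("app.db")

    def normalise_email(s):
        return s.strip().lower() if s else None

    # Registered per-connection; callable from SQL like a built-in function.
    conn.create_function("normalise_email", 1, normalise_email, deterministic=True)
    conn.execute("SELECT normalise_email('  Foo@Example.COM ')").fetchone()
    # -> ('foo@example.com',)

Unlike a real stored procedure it lives in the application rather than in the database file, so every client has to register it.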


I wonder how much of a factor the "One DB, one file" concept is in SQLite's success.

I always loved that an SQLite DB is just a single file. People say "Who cares how many files the DB is?". But in practice, I always found it to be very convenient. It makes the concept of a project easy to grasp when the data store is simply a file.

I recently realized that a Django project can also be done in a single file. That made me like Django even more.

I have the feeling that this type of logic and simplicity is indeed a factor in the long-term success of a software project.


I think it's the same reason Windows feels easier than Linux to manage software in, or why Docker has the extremely fast and wide adoption it enjoys, or why so many of us hate working with modern web stack.

Windows software is often enough "portable executables" that can run off the folder they were unzipped into, and historically, even installed apps could often enough be copy-pasted from their installation folder into another machine and Just Work (it was helpful, in particular, that all the DLL files lived next to the executable instead of being installed system-wide).

Docker, obviously, because it eliminates all the bullshit of installing software - again, especially painful on Linux systems - and gives you a single file that drives it all. The platform itself may be system-wide, but the thing you care about: individual software - is contained (literally) so it doesn't spill out. Easy to reason about.

Modern web stack is a mess of tooling, some of which system-wide (so people will reach for Docker to contain it all!), with huge version churn and lots of moving pieces. Source code of your app doesn't feel like a major component of it; it's useless and unparseable without chains of finicky tooling. That's in contrast to "vanilla JS", old-school experience, where source files were the only thing that mattered. No transpilation, no build chains. You could fire a site up straight from your hard drive, or FTP/SCP it to a file host, and It Would Just Work.

Same with SQLite. To this day I don't like RDBMSes, because they're a system-level platform designed for admins, while all I care about is the RDB part. My database. SQLite maps conceptually to how I think about it - one database, one file (+/- WAL). One well-defined place my data lives, that I can move around and between machines using regular file operation tools.

Etc.

Files over apps, always.

EDIT:

This also makes me very eager to try doing something with RedBean[0] - a webserver in a single Actually Portable Executable[1] that's also a ZIP file into which you put all the data you want it to serve. A turn-key website in a single file!

--

[0] - https://redbean.dev/

[1] - https://justine.lol/ape.html


> Windows software is often enough "portable executables" that can run off the folder they were unzipped into, and historically, even installed apps could often enough be copy-pasted from their installation folder into another machine and Just Work (it was helpful, in particular, that all the DLL files lived next to the executable instead of being installed system-wide).

This hasn't always been true. The portable executables are a nice user convenience for certain apps, but by no means all, and Windows is the platform for which "DLL hell" was coined. They even got briefly worse with the "GAC" for .NET Framework, although it rapidly became apparent what the problem with that was.

These days they really want you to use .appx and ship via the Microsoft Store, so of course nobody does that and all Windows software is (a) nice single executables (b) sprawling DLL monsters (c) javascript in a box (Teams!) or (d) games.

Now I'm wondering if SQLite could be a viable executable format to unify PE and COFF.


> Now I'm wondering if SQLite could be a viable executable format to unify PE and COFF.

Isn't this already done better by jart's APE I mentioned earlier[0]? Though I imagine you could statically link SQLite to your APE and stuff a .sqlite DB into the ZIP file at the end, to use that for static storage. Or read-write storage if you also link zlib (though I'm not sure if you can portably overwrite your own executable on the hard drive).

--

[0] - https://justine.lol/ape.html


> I'm not sure if you can portably overwrite your own executable on the hard drive

You definitely cannot do this on Windows.


No, but you can

1. Write the new executable.

2. Rename the running executable.

3. Rename the new executable to the original name of the running executable.

4. Launch the new executable.

5. Exit the original executable.

6. Arrange for the new executable to delete the renamed original executable once the original has exited.

This is admittedly a bit fiddly, requires care if you need to preserve attributes and ACLs of the original file, and has obvious issues in scenarios where multiple instances of the original executable may be running concurrently, but it does work.
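
A bare-bones sketch of steps 1-5 in Python (it ignores the ACL, crash-safety, and multi-instance caveats discussed below, and the --cleanup flag is invented):

    import os, subprocess, sys

    exe = sys.executable               # path of the running (frozen) executable
    old, new = exe + ".old", exe + ".new"
    # step 1: assume the updater has already written the new build to `new`

    os.rename(exe, old)                # step 2: renaming a running image is allowed
    os.rename(new, exe)                # step 3
    subprocess.Popen([exe, "--cleanup", old])   # step 4: new copy deletes `old` later
    sys.exit(0)                        # step 5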

You also have to be careful to consider the ramifications of a system crash or snapshot backup in the middle of this process: ensuring the new executable is safely flushed to disk before it replaces the original is, under normal circumstances[1], easy (FlushFileBuffers); ensuring that one of the two executables always exists under the original name is hard (the Win32 ReplaceFile API sounds like it should help here, but it doesn't).

The cleanest solution to the latter problem I've come up with is to use the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations registry key as a backstop to ensure the move (3) happens, if necessary, as soon as the crashed/restored system is booted, but this requires local admin rights (or wanton disregard for system security, i.e., changing the ACL on this registry key to allow ordinary users to write to it, and therefore any file on the system on every reboot).

[1] Here "normal circumstances" exclude cases where hardware lies[2] or users explicitly disable buffer flushing[3].

[2] https://devblogs.microsoft.com/oldnewthing/20100909-00/?p=12...

https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95...

[3] https://devblogs.microsoft.com/oldnewthing/20130416-00/?p=46...


This is one of the reasons I love Go: it can not only build to a single statically linked binary, but you can embed files your app needs, like HTML templates, into that same binary. Deployment is copying that binary to where it needs to go.


Django in one file? How?


You just need a wsgi.py file:

    import os
    import django
    from django.core.wsgi import get_wsgi_application

    # Point DJANGO_SETTINGS_MODULE at this very module, so the settings
    # below (ROOT_URLCONF, SECRET_KEY) live in the same file.
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.wsgi')
    application = get_wsgi_application()

    ROOT_URLCONF = 'mysite.wsgi'
    SECRET_KEY = 'hello'

    def index(request):
        return django.http.HttpResponse('This is the homepage')

    def cats(request):
        return django.http.HttpResponse('This is the cats page')

    urlpatterns = [
        django.urls.path('', index),
        django.urls.path('cats', cats),
    ]
Put that file in /var/www/mysite/mysite/wsgi.py, point your webserver to that file and you are good to go. In Apache you do it like this:

    ServerName mysite.local
    WSGIPythonPath /var/www/mysite
    <VirtualHost *:80>
        WSGIScriptAlias / /var/www/mysite/mysite/wsgi.py
        <Directory /var/www/mysite/mysite>
            <Files wsgi.py>
                Require all granted
            </Files>
        </Directory>
    </VirtualHost>


> import django

Not really in one file, is it?


One file in your project.

'apt install sqlite3' will install more than one file too. But the DB that is in your project is just one file.


If you somehow managed to shove Python and Django into https://redbean.dev/2.0.html, then it might just work as a bona fide single file.



SQLite in its current form is not going to "take over". But it's the right tool for so many instances where Postgres/MySQL/... were a massive overkill but were the only thing available in common use.


SQLite is not competing with RDBMSes. SQLite is competing with fopen().

There are of course solutions which wrap this fopen() replacement in network/cluster-aware tools, e.g. https://github.com/rqlite/rqlite - these are competing with Postgres.


rqlite isn't competing with Postgres, since Postgres does not provide any mechanism for highly available + serializable consistency, and that's what rqlite gives you.

Postgres does have replication out of the box but it has a bunch of limitations meaning you have to manage it yourself (ddl isn't replicated I think). And the replication will not get you strict serializability.


It's absolutely competing in that it doesn't fall apart by design when there are many concurrent writers.


Exactly. I have a website that is visited by a couple of people a day, and I just want to store some data on every action in one table, plus some settings in the DB. Perfect for SQLite; anything else is overkill. Also, if I ever want to rewrite it in the future it's not a big thing, it's just a database.


I have a website that's visited by a couple thousand people a day, but since it's not a bloated mess that writes to the DB for every page hit, it's still faster than any competition site in that particular niche, and those deploy to expensive DB clusters to handle their spurious writes. :)


Are you able to disclose the URL of the website? How often are you writing to the database? Every few seconds or minutes?


It's a fan website for an obscure mobile game (that I don't really want to associate with my professional profiles, it is really just a hobby); the database sees a bunch of write transactions per week, whenever an editor writes new content.

If it wasn't for non-technical editors requiring an interactive WYSIWYG backend, it could've been made with a static site generator, but as it is, Django needs a couple dozen megabytes of RAM at worst for logged in users.


Not OP but just checked the logs for one that I'm running.

As a first approximation I grepped for PATCH in the web server logs and it seems to service about 30 of those per second on a $40/month vps. I haven't load tested it so not sure how high it could go. That's not including any POSTs etc which may also generate DB writes.

Bear in mind this isn't particularly performance tuned either as the webserver uses go (slow in general) and the portable go sqlite3 driver (also slow)


> webserver uses go (slow in general)

Go isn't slow in general, it's pretty damn fast without getting into the crazy handcraft artisanal stuff with e.g. C/C++. It's, IMO, the perfect balance between speed, features, language complexity, standard library richness.

For reference, a random benchmark shows that even the optimised-to-hell Twisted Python web server is slower than the out-of-the-box, in-the-standard-library `net/http` in Go.


I just run django on nginx-passenger in debug mode, but I don't vomit dozens of megabytes of Javascript into the output so I still hit sub-second render targets. I think there's maybe two dozen lines of JS for some interactive calculator widget.


It has taken over already. SQLite has been the most popular database for some time now.


Isn’t that a bit like saying that the iPhone is the most popular computer?

To stretch the analogy: if you need a database server, neither SQLite nor the iPhone are a good fit.


I think SQLite is an option in many cases where people want to use Postgres. I believe there is more overlap between SQLite and Postgres than between an iPhone and a general-purpose computer.


SQLite is pretty great but I’ve spent my career building multi-user apps, and this is a category for which SQLite is simply a bad fit - as they point out themselves at https://www.sqlite.org/whentouse.html

While there is significant overlap in the basic DML operations due to them both being SQL based, getting SQLite working for my use cases would be like using an iPhone to serve my app over the internet: a whole lot of work, and many compromises, for no good reason.


I like the idea of SQLite, even more with tools like Litestream [0] when the database is replicated - in this case, just continuously back up to S3 (and restore from S3 during startup). I've often found that for my and my friends' hobby projects, single server instances can be optimized easily enough to handle small and medium traffic. In such cases, I don't even bother with a fully managed database because SQLite covers all the needs, and when it's located on the same machine as the server it's extremely fast [1].

[0] https://litestream.io/ [1] https://www.sqlite.org/speed.html


There are some configurations that can allegedly allow SQLite to be more suited to higher and more concurrent load. I have never had to use them but people have written about it like at: https://kerkour.com/sqlite-for-servers
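
The usual knobs from write-ups like that one boil down to a handful of pragmas set on each connection; a sketch with commonly suggested values (tune for your own workload):

    import sqlite3

    conn = sqlite3.connect("app.db", timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")     # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL")   # fewer fsyncs; no corruption risk in WAL,
                                                # though the last commit may be lost on power failure
    conn.execute("PRAGMA busy_timeout=5000")    # wait instead of erroring on lock contention
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("PRAGMA cache_size=-64000")    # ~64 MB page cache (negative means KiB)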


I love SQLite, so please don't think that I'm a SQLite hater, but:

I've never seen SQLite used in a setup which multiple machines connect to the same database over a network.

For example: A web application with a web server, a worker/job server, and a database server.

In these instances MySQL or PostgreSQL seem to be much better choices.

So will SQLite ever be able to "take over" in these scenarios?


Yes PostgreSQL (and others) are a better choice in these scenarios. I think the point is that a lot of applications/systems might not even need this separation since a single server would be able to handle the load. In this case a local SQLite database could be a serious performance enhancement.

A lot of factors play into this and it certainly does not work in every case. But I recently got to re-write an application at work in that way and was baffled how simple the application could be if I did not outright overengineer it from the start.

That is just anecdata. But my guess is that this applies to a lot of applications out there. The choice is not between PostgreSQL/MySQL and SQLite, but between choosing a single node to host your application or splitting them between multiple servers for load balancing or other reasons. So the choice is architectural in nature.


> I've never seen SQLite used in a setup which multiple machines connect to the same database over a network.

Noting that the sqlite developers recommend against such usage:

https://sqlite.org/whentouse.html

Section 3 says:

3. Checklist For Choosing The Right Database Engine

- Is the data separated from the application by a network? → choose client/server


> I've never seen SQLite used in a setup which multiple machines connect to the same database over a network.

If you actually mean "database server", i.e., SQL is going over the wire, I don't see why you'd ever structure things that way. You lose both SQLite's advantages (same address space, no network round-trip, no need to manage a "database server") and also lose traditional RDBMS advantages (decades of experience doing multiple users, authentication, efficient wire transfer, stored procedures, efficient multiple-writer transactions, etc).

Assuming that it's the worker / job server which is primarily issuing SQL queries, what you'd do is move the data to the appropriate server and integrate SQLite into those processes. (ETA: Or to think about it differently, you'd move anything that needs to issue SQL queries onto the "database server" and have them access the data directly.) You'd lose efficient multiple-writer transactions, but potentially get much lower latency and much simpler deployment and testing.


If you need replication on the app level, SQLite doesn't make sense, because it's not built for networks.

But...

Many projects won't ever need that. A bare metal machine can give you dozens of cores handling thousands of requests per second, for a fraction of the cost of cloud servers. And a fraction of the operational complexity.

Single point of failure is really problematic if your system is mission critical. If not, most apps can live with the possibility of a few minutes of downtime to spin up a fail over machine.


Turso (https://turso.tech/) offers a solution in that scenario. The advantage with SQLite being that each machine has a local copy of the database (optionally I think) for reads so it’ll be extremely fast, and writes happen to one primary database but are abstracted and replicated.


> I've never seen SQLite used in a setup which multiple machines connect to the same database over a network.

Cloudflare D1 https://developers.cloudflare.com/d1 offers cloud SQLite databases.

> For example: A web application with a web server, a worker/job server, and a database server.

I've been giving it a run on a blogging service https://lmno.lol. Here's my blog on it https://lmno.lol/alvaro.


In a nutshell, if you have a database that will have multiple instances talking to it - you are better off with a client-server database, like Postgres, MariaDB, SQL Server, Oracle, etc.

SQlite, generally speaking, is a FANTASTIC local database solution, like for use in an application.


If you split DBs so each has only one writer then it probably is possible even in vanilla SQLite...


You can have a backend talking to sqlite and everything else interacting with backend apis


What's the point of this? If you have multiple applications on multiple systems accessing the same DB, it seems to make more sense to just use PostgreSQL, since it's specifically designed for concurrent operation like this, instead of trying to handle this in your own custom backend code.


If you have multiple applications on different servers communicating with the same database, then yes you would need to run a database such as PostgreSQL.

If you run a single application on a server that needs a database you might want to consider SQLite, regardless of your needs for concurrency/concurrent writes.


Having an API in front of a database for full control of what is available is extremely common.

> trying to handle this in your own custom backend code

It's not writing some extra custom code, it's simply locating all of the code which interacts directly with your database on one host. You're splitting up where your code lives, so that what would otherwise be function calls (if everything interacted with the database directly) are instead API calls. This kind of organizational decision is not at all unusual.

And if you're using SQLite it's probably because your application is simple and you should have some pushback anyway on people trying to mAkE iT WEBScaLE!! (can I still make this joke or has everybody forgotten?)

A lot of premature optimizers get very worried about concurrency and scalability on systems which will never ever have concurrent queries or need to be scaled at all. I remember making fun of developers running "scalable" Hadoop nonsense on their enormous clusters which cost more than my yearly salary to run by reimplementing their code with cut and grep on my laptop at a 100x speedup.

I've worked places where a third of our cloud budget was running a bunch of database instances which were not even 5% utilized because folks insisted on all of these database benefits which weren't ever going to be actually needed.


It's not a particularly unusual situation: it's very common for a database to effectively be entirely owned by an application which manages its own constraints on top of that database. In that circumstance sqlite is pretty interchangable with other databases.


It’s not unusual but is never performant. Adding an api layer and network hops on what should be a database shard or view is why enterprise software sucks so much ass.

Why does the api take 3s to respond? Well, it needs to call 6 other apis, all of which manage their own data. The problem compounds over time. APIs are not the way to solve cross-organization data concerns.


Using SQLite inside an API doesn't add network hops.


“An application controls its database.”

You’ve fully misunderstood what I said. When you have 500 applications, the graph of calls for how any one api resolves will go deep. Api1 calls 2 calls 3 and so on.

Vs creating an organization wide proper way to share and manage data.


The number of applications doesn't need to create depth in the API layer. They're not related. If I have a service that sends emails, whether I have one or a thousand applications calling it doesn't matter.


MySQL is better than PostgreSQL.


There are absolutely production web apps running sqlite as the datastore.

The "one writer at a time and the rest queue" caveat is fine for most web applications when writes happen in single digit / low 10s of ms


Is there documentation on how to configure SQLite in the manner you’re describing?


From the performance angle, there's one quoted down thread at https://news.ycombinator.com/item?id=40656043

From the webapp angle check out, for example, pocketbase.io which is an open source Go supabase style backend which wraps sqlite. They have benchmarks etc available.


So many people have multiple machines when they would be better served by just having the one.

I mean look at the prices from Hetzner.com.

  Shared vps:    16 core cpu,  32 GB ram, 320 GB disk for €  38.56
  Dedicated vps: 48 core cpu, 192 GB ram, 960 GB disk for € 343.30
The amount of stuff you can run for peanuts. It's amazing.


All the recent attempts to monetize SQLite seem wrong to me, since its very core is about how to handle data without any third parties, embedded in your app and mostly on local storage.

Any cloud offering feels to me like offering a custom STL as a SaaS: if you're going over the network, why not use a database designed for this?

The only usecase I can see would be something like mini-DBs in an S3, which again would just be a library to abstract away.


I mean, it's not by the SQLite authors. That's like opposing someone "monetizing" any open source library by using it.


I am a big user of SQLite, we use it everywhere in our software. But I think I would like something in between Postgres and SQLite.

A few issues I have with it:

- Migrations are a real PITA; you often need to copy whole tables over since the DDL available is so limited. Even basic things that are not breaking changes, like dropping a foreign key, are not possible (the usual table-rebuild workaround is sketched at the end of this comment).

- Single file doesn't scale that well, you will see select performance degradation, vacuuming will be extremely long/take a lot of disk space. We started splitting our databases in multiple files and using ATTACH but you lose a lot of nice things like FK support.

- No parallelism except on sorting; this one is annoying when you have a compute-heavy select (like searching on compressed data). It will max out a core but it can't spawn workers for the select. It can for the ordering part, so it's not really a "technical" limitation.

Still grateful it exists though.
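
For reference, the workaround for unsupported ALTERs is the table-rebuild dance from the SQLite documentation; a condensed sketch (table and column names invented):

    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.executescript("""
        PRAGMA foreign_keys=OFF;
        BEGIN;
        -- 1. new table with the desired schema (here: a foreign key dropped)
        CREATE TABLE item_new (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- 2. copy the data over
        INSERT INTO item_new (id, name) SELECT id, name FROM item;
        -- 3. swap the tables
        DROP TABLE item;
        ALTER TABLE item_new RENAME TO item;
        COMMIT;
        PRAGMA foreign_keys=ON;
    """)

The full procedure in the docs also recreates indexes, triggers, and views, and runs PRAGMA foreign_key_check before committing.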


Firebird database can run both as an embedded database and as a server. I’d like to see something like Postgres adopt this model.


I would love to read this in transcription. I find my attention span for videos is less than for text. Does anybody have any other info on this topic?



That transcription seems to be really bad. It's full of strange errors. :(


I wonder what they used to generate that transcription?


I was kinda wondering too, and did a (very shallow) dive into the JavaScript on that page. I'm almost positive they are using Deepgram(dot com)'s speech-to-text service. I ran whisper.cpp on that audio file on my laptop, and it does a reasonably good job too.


They discuss their transcription process at 23:15 in episode #621

https://syntax.fm/621?t=0:23:15


This episode is more of an ad for their cloud offering, rather than talking about SQLite and the reasons for the hype.


And at that point, if you are already talking to the network over TCP, then why not talk to Postgres or MySQL?

The beauty of SQLite is single-process, in-process simplicity, and there are plenty of cases where SQLite is more than enough.

Basecamp is using SQLite in their on premise offering.


SQLite is nice and simple, but I don't think I'd ever use it for a larger project I knew was going to have a sizable database.


SQLite can reasonably handle databases as large as you can fit onto your machine. The biggest limitation is needing to serialize your writes.


True but it lacks the tools you’d want to have to deal with larger databases, for example everything is one file and no table-level locking. You can split into different databases and use ATTACH but at that point you might as well install Postgres.

SQLite really shines when you know that a database can only get so large, let’s say you have a paid product that is only ever going to have a moderate number of users.


For me the biggest limitation is replicating the SQLite across machines. If my app is running on multiple nodes, then we need to write/use some tooling to replicate the database file across nodes. With that comes the problem of figuring out how we want to handle error scenarios like failed replication, partial replication and such other things.

And these are all hairy problems. At that point it might be just simpler to use a centralized Postgres or a proper distributed database.


I have just refactored a phone app to use SQLite from a previous in memory model and it is so much faster than I need. I could use a few more features, but already so amazing.


Not strictly related to the podcast, but if you'd like to have a local-first browser application running sqlite check out wa-sqlite. It can use both IndexedDB and OPFS as a backend filesystem.

[1]: https://github.com/rhashimoto/wa-sqlite


Having worked with SQLite in an enterprise production environment for vehicles many years ago, I can say it's gotten much better over the years. It's also gotten a performance boost as everyone now uses SSDs. See the FAQ for details, where they still mention rotational speeds and inserts etc.

With WAL mode Sqlite is much much better, in the past we had all kinds of problems with trying to read/write at the same time. At that point in time every table was a separate database as a workaround.

https://stackoverflow.com/questions/1005206/does-sqlite-lock...

https://www.sqlite.org/faq.html#q19


SQLite is taking over?

SQLite has already taken over. <10 years ago.


SQLite has been the "get X done, quickly and effectively" tool for a very long while now, while far too many people keep chasing shiny new fads with less solid footing, just because they're less boring. There's still that lingering webscale effect on its reputation, unfortunately. Not to say it's appropriate in every case, and many of its functions are incredibly unwieldy compared to Postgres', to mention one, but that's something you can live with if any of its great upsides apply to you. What seems to be taking over now are services that bolt stuff on top of it, some of which end up being valuable FOSS additions, while others may as well just be proprietary hosted instances.


I have become fond of Sqlite in the last few years.

I use it for client applications that need to store local data -- normally temporary data to send to the server, etc.

I am now using it for a Message Queue middle-man (broker) program I wrote, which uses SQLite to store the messages. They tell us the queue they belong to, whether there is a delay before handing out to a worker, the current status, etc. It has worked very well. The other reason I was not concerned about using SQLite was that the program does one thing at a time, so there would not be multiple requests trying to insert/update data at the same time. To me it was a win-win.

Designed to be fast and light.

Thumbs up to sqlite team!
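
A stripped-down sketch of that kind of broker table (columns and status values are just illustrative):

    import sqlite3, time

    conn = sqlite3.connect("queue.db")
    conn.execute("""CREATE TABLE IF NOT EXISTS message (
        id         INTEGER PRIMARY KEY,
        queue      TEXT NOT NULL,
        body       TEXT NOT NULL,
        not_before REAL NOT NULL DEFAULT 0,    -- optional delivery delay
        status     TEXT NOT NULL DEFAULT 'pending'
    )""")

    def enqueue(queue, body, delay=0.0):
        conn.execute("INSERT INTO message (queue, body, not_before) VALUES (?, ?, ?)",
                     (queue, body, time.time() + delay))
        conn.commit()

    def dequeue(queue):
        # Single-threaded broker: claim the oldest message that is ready.
        row = conn.execute("""SELECT id, body FROM message
                              WHERE queue = ? AND status = 'pending' AND not_before <= ?
                              ORDER BY id LIMIT 1""", (queue, time.time())).fetchone()
        if row:
            conn.execute("UPDATE message SET status = 'delivered' WHERE id = ?", (row[0],))
            conn.commit()
        return row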


How do people use it on bigger deployments? Do they manage to make a single "instance" work for enormous volumes? Do they make one database per customer shenanigans?


I prefer a dedicated database server over some 1 file solution.

You people have got to be kidding, or aren't professionals.

Well HN has never been one to make a whole lot of sense.

What's next, plain text file for saving data? Ridiculous


Some people are using a database for a single user, on a single machine. They don't want or need 1000+QPS, live failover, or massive concurrency. They just want something simple that works.

The Chrome browser stores many pieces of the user profile (bookmarks, browsing history, etc) in SQLite databases. I'm not sure if you consider that "professional".


Well, if you're not the one paying for it, then sure, why stop at a dedicated server? Ask for a redundant multi-region multi-node cluster for your database. Who cares how much or how little value it delivers. Ask to hire a whole db admin team plus a devops team. It's not coming out of your paycheck.


Another big advantage: super easy to back up


If you want a good head start with SQLite, Aaron Francis and Stephen Margheim are 2 big voices in the space that have helped me learn a ton!


Honestly, I think it's just hype and people jumping on the bandwagon who don't have a lot of experience yet? It has its places, and for that it's awesome, but an app DB? Come on.


SQLite took over many years ago. It's one of the greatest pieces of software out there.

Is there something new?





