Hacker News new | past | comments | ask | show | jobs | submit login
Exciting SQLite Improvements Since 2020 (airsequel.com)
172 points by thunderbong on April 28, 2023 | hide | past | favorite | 65 comments



> Support RETURNING clause on DELETE, INSERT, and UPDATE statements.

I really enjoyed this one due to the elegance. We converted a bunch of normal methods into expressions because we could write them like:

  public long CreateCustomerRecord(string name, string email) =>
    sql.ExecuteScalar<long>(@"INSERT INTO Customers (...) 
      VALUES (...) 
      RETURNING Id", new 
    {
      //Param bindings
    });


“there is a common misconception that SQLite is a stagnant or outdated technology” this seems like a strawman argument. there seem few software projects with as consistent high levels of respect by everyone from seasoned neckbeard to serverless hipsters and everyone in between. even in the nosql peak when these arguments were maybe heard for mysql or postgres, sqlite was mostly spared as it was used as storage engine for more than one of the new kids.


It took me 13 years to move from "wow SQLite! yeah totally underrated!" to "Yet another article where the first thing is SQLite underrated/unappreciated/unknown...& its underpinned every document format for years and years and years..."


The clustered primary key (NO ROWID) setup that came in with SQLite version 3.8.2 (2013-12-06) makes it very fast indeed when used for a large persistent key-value store.

And later improvements have just continue to accrue.

This from a guy who just implemented a persistent object cache with it, and was blown away by how well it works. And my requirement was all SQLite versions 3.7 and later, so there's conditional code. (ROWID or not, UPSERT or not).

Not to mention there are probably ten or more of these databases in your mobile phone. We haven't heard, at least I haven't, about any monstrous day-1 vulnerabilities in this code.

A really good design choice, SQLite is, if your application can live with its local file system requirement.


I wonder how this compares to using redis for key value caching purposes?

Definitely reduces the need for another dependency if that’s your thing and it fits your needs


I'm working on comparisons between redis and SQLite for object caching. So far things look favorable.

I did this project for WordPress, because object caching helps performance (and therefore carbon footprint) a lot and cheezy hosting services don't offer redis. But most of them have SQLite.

Oddly enough, SQLite is really slow on the one BSD hosting service I've tried.


Why use redis at all if you don't need shared cache?


I still find it funny that if I write something that's supposed to support MySQL/PostgreSQL/SQLite it's almost always MySQL that needs tweaking or outright doesn't support something, while between PostgreSQL and SQLite it's mostly same SQL


I believe SQLite generally uses Postgres as a reference

https://changelog.com/podcast/201

> JEROD SANTO

> ...Postgres, which you say you use as kind of a reference implementation of at least the SQL stuff.

> RICHARD HIPP

> Yes.


In fairness, Postgres seems to be the most "SQL" of the variants out there... at least in terms of my understanding of the common SQL dialect(s). So probably not a bad first choice.

By comparison, every single time I've touched MySQL/MariaDB I find at least one new thing that irks me to no end. UTF8 vs UTFMB4 or whatever it is for starters. Indexes on Binary data fields not being case insensitive (binary) by default, even if the default index for text is different is another. Not sure if it's still an issue, but that you can use ANSI quotes for everything but foreign key definitions was another I seem to remember. The fact that magic quotes could escape out of quoted strings another still.

And that's just off the top of my head, and doesn't get into some of the default data handling that's just bad. SQLite has some similar issues there, but imo it's far more forgivable given SQLite's footprint and embedded nature. I also really like(d) Firebird when I'd used it in the past, but it's not nearly as popular.


Postgres also appears to aim for very close compatibility with oracle; for a long time that was what you migrated your oracle workloads to when they got too expensive.


Older than 2020 but another indispensable feature SQLite added was window functions in 2018.


It is interesting to read about it proposed in 2014 https://www.mail-archive.com/sqlite-users@mailinglists.sqlit...

Hipp:

> You used the word "immense" which I like - it is an apt description of the knowledge and effort needed to add windowing functions to SQLite (and probably any other database engine for that matter).


why is windowing hard? isnt it kind of a rolling filter?


It allows for recursion in the query (fetching a tree of a category structure for instance where it's just a id,parent_id,priority tuple)

Handling that efficiently without conditioning the data first using something like nested set or materialized paths is going to be a challenge when the depth is unknown.


Exactly because it is so. You have to compute as little as possible and make it O(1) for a row added and deleted, not O(window_size).


This is my favorite feature.


" Often overshadowed by newer and flashier database management systems, many overlook the continued innovation and evolution of SQLite."

Absolutely, the last improvements of sqlite are just incredible.


One of the best baas out there is Pocketbase built on top of sqlite. I tried all the other popular ones out there like Supabase, NHost, appwrite but Pocketbase just blows them out of the water.


I worked on a system a few years ago that has timeboxed data... In other words, the bulk of the data was for a singular event at a time. I seriously pushed at the time to just use SQLite then the backup/recovery/archiving of data would be much easier. Instead, what we got was a rube-Goldberg contraption of MS-SQL with functions and stored procedures. Supporting multiple deployed versions became that much harder over time. The clients were different govt jurisdictions, so actually being able to deploy was complicated to say the least, and often required a month of planning.

In the end, I still think that SQLite would have been a much better approach for this particular use case.


> compiling SQLite to WASM

Is there support for "compiling" WASM to java so we can use Sqlite from java without using any JNI library?


There might be a route if using GraalVM which supports WASM: https://www.graalvm.org/latest/reference-manual/wasm/


> can use Sqlite from java without using any JNI library

Ah is that a mac thing? On linux, you just set up gradle with a dependency for the jdbc driver, and you get sqlite available to use without needing to install anything on the OS. It's pretty magic.


You’re literally doing what parent comment is arguing against.


He's asking why would you do it the hard way.


I hope we see more of this sort of use of WASM (creating universal libraries). Same argument applies for Go/cgo.


If you do not insist on the on-disk format then H2 is pretty much the Java world equivalent.


If you're a SQLite fan who does stream processing, we (Estuary) recently introduced a capability to write transformations as event-driven SQLite [0].

Basically you get a provisioned SQLite DB to which you apply whatever migrations you wish, and write SQL lambdas that are run with each input document, where your lambdas update your tables an/or publish outputs via SELECT.

[0]: https://docs.estuary.dev/concepts/derivations/#sqlite


I really love SQLite as a technology. My apps don’t require much data so it’s always been my main choice.


I like improvements, but I like speed and light demands for RAM or computation even more.


I briefly looked to see how its footprint has increased, but couldn't find anything compelling.

The binary has remained tiny, with most OS's under 2MiB, but that won't really indicate memory usage.

Do you have any articles to show its resource usage over time?


Of of curiosity, how configurable are sqlite builds?

Can one disable json support, for example? I'm not sure what other "categories" of features three might be. Certainly there's a lot of builtin functions; how configurable is sqlite in picking buultins to omit?


> Can one disable json support ...

Yeah, it's pretty configurable, including the JSON support.

Being able to include/exclude things can be done at compile time:

https://www.sqlite.org/compile.html ← lists the options for including/excluding stuff

And there's a function for loading 3rd party extensions at run-time too, which itself can be turned off. :)

There's a fair amount of 3rd party extensions too.

eg: https://github.com/nalgeon/sqlean

GIS stuff, encryption support (several varieties), Excel/ODS support, and tonnes of other things


Excel/ODS support ? I only see csv support in provided links?



Thanks. Shame it's read only, althought I guess using odbc driver in excel to read sqlite is the way to go


There's also a fairly popular JDBC driver for SQLite as well:

https://github.com/xerial/sqlite-jdbc

Mentioning that because from (very) rough memory, Excel can work with JDBC too.

So if the ODBC approach doesn't work for someone, there's potentially another thing they can try. :)


Neat use of Haskell for scripting with Stack: https://docs.haskellstack.org


How does SQLite compare to newer in-process db's like DuckDB ?


SQLite is primarily a transactional, row-oriented (OLTP) database, whereas DuckDB is an analytical, column oriented (OLAP) database.


Can you expand on that a little?


More info here and in the linked paper: https://simonwillison.net/2022/Sep/1/sqlite-duckdb-paper/


Has SQLite ever attained multiuser functionality or is that still the main thing it lacks?


Multiuser in what fashion?

If you absolutely need high concurrency, then by their own admission pick something else. [1]

But for most things you can survive queuing up writes.

1. https://www.sqlite.org/whentouse.html


The SQLite developers seem to be experimenting with a solution to that:

https://sqlite.org/hctree

The announcement blog post:

https://sqlite.org/forum/forumpost/d9b3605d7ff40cf4

HN thread about it from a few months ago:

https://news.ycombinator.com/item?id=34434025


I assume you’re asking about concurrent writes and yes, if you use WAL mode you can have that too.

https://www.sqlite.org/wal.html


SQLite does have an optional "user authentication" extension, though I've not personally tried it out:

https://www.sqlite.org/src/doc/trunk/ext/userauth/user-auth....

The widely used Go SQLite library by mattn says it supports it, if that's useful:

https://github.com/mattn/go-sqlite3#user-authentication


A real exciting improvement would be support for NFS, instead of hiding behind "some server implementations don't implement locking properly" in 2023...


If you need your SQLite database to be available over a network I think you'd do a lot better layering a dedicated network protocol on top of it as opposed to trying to get something like NFS working, which is evidently a poor platform for files that need transactional updates made to them by multiple users at once.


> evidently a poor platform for files that need transactional updates made to them by multiple users at once

What makes this the case?


https://access.redhat.com/solutions/120733 explains it. The critical bit is the root cause at the end. Which is that NFS sees access to any part of the file as access to all of the file. So any access locks it for anyone. Therefore shared access to the database will cause random hangs due to client behavior.

If you turn off locking, then there is no way to avoid data corruption.

And this is with NFS working correctly. Which is not a safe assumption given that widely used platforms like OS X implement it wrong.

In short, there is a reason that we've joked since the last millennium that NFS stands for "No File System". And the joke is still relevant today.


Thanks for posting that! I've always wondered what the deal is.

How do other file systems do it differently that allows for concurrent access to the file?


https://www.sqlite.org/lockingv3.html explains what it wants to happen.

Note in particular that multiple processes can read at a time, and only slowly escalate into a write lock which is held as short a time as you can before going back to the normal state. While NFS assumes that if you read, you may write, and may not take care to make sure you have the most recent version WHEN you write. (These are all important assumptions to make for random programs written by random programmers. Few programmers can be assumed to take the care that databases do around getting locking logic correct.)


Thanks. Now I finally get it, but am still no less annoyed that this doesn't "just work."


Also, you cannot use WAL mode as all processes cannot see the shared memory.

Below are well-known limitations of WAL mode.

https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf

“To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine [and OS instance]. Thus, WAL mode does not work on a network filesystem.”

“It is not possible to change the page size after entering WAL mode.”

“In addition, WAL mode comes with the added complexity of checkpoint operations and additional files to store the WAL and the WAL index.”

https://www.sqlite.org/lang_attach.html

“SQLite does not guarantee ACID consistency with ATTACH DATABASE in WAL mode. “Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not.”


If it was a solid platform for this I imagine SQLite would work already!


It does work... Sort of. From what I've read, the NFSv4 server properly implements locking (and I think most of those bits are in the Linux kernel now anyway), but sqlite won't support it anyway. And I am able to run sqlite on NFSv4 with minimal problems. Every once in a while I do get a hiccup, but it's not clear why.


Eh, I can understand not wanting to deal with NFS fuckery


Is there something SQLite would need to do to support an NFS-mounted filesystem that already supported proper locking (fcntl) and sync'ing (fsync)?


It does work but it also rarely results in hanging locks, that's with NFSv3 with NLM. Actual data corruption only happened once or twice.

So it's not a super-reliable thing, but when you can't have a real database server or you can only make a network share accessible to the right groups, say, due e.g. organizational dysfunction, then it works, most of the time.


I've had the same experience. I have a NAS VM with nfs4 on Debian, but my services are running on other VMs (since they're in a DMZ vlan). A lot of selfhosted stuff runs on sqlite since its just easier. So there is only one user per db, and 95% of the time I have no issues. Every once in a while I get some sqlite i/o error, and most of the time it's fine but every once in a while I'll have to restart a container.


Are you running it in WAL mode?

I've had applications (not on NFS, but with multiple processes accessing the same SQLite database file at once) which throw occasional I/O errors in default journal mode but didn't throw errors at all once I switched into WAL mode.

https://til.simonwillison.net/sqlite/enabling-wal-mode


WAL mode requires shared memory (unless you disable concurrency through exclusive locking) and therefore doesn't work on network shares.


Great point.


I think most of the apps use WAL nowadays (at least from what I remember when I looked into this a while back) but I'll have to check again! Thanks.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: