SQLite Release 3.39.0 (sqlite.org)
190 points by sqlsite on June 25, 2022 | 20 comments



> "Add (long overdue) support for RIGHT and FULL OUTER JOIN."

This is a big update and makes it much nicer for ETL work.
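For anyone who wants to kick the tires, here's a minimal sketch in C (table and column names made up; needs SQLite >= 3.39.0) showing unmatched rows surviving from both sides:

    #include <stdio.h>
    #include <sqlite3.h>

    static int print_row(void *unused, int argc, char **argv, char **col) {
        (void)unused;
        for (int i = 0; i < argc; i++)
            printf("%s=%s ", col[i], argv[i] ? argv[i] : "NULL");
        printf("\n");
        return 0;
    }

    int main(void) {
        sqlite3 *db;
        sqlite3_open(":memory:", &db);
        /* FULL OUTER JOIN keeps unmatched rows from BOTH sides,
           padding the missing side with NULLs. */
        sqlite3_exec(db,
            "CREATE TABLE src(id INTEGER, name TEXT);"
            "CREATE TABLE dst(id INTEGER, name TEXT);"
            "INSERT INTO src VALUES (1,'a'), (2,'b');"
            "INSERT INTO dst VALUES (2,'b'), (3,'c');"
            "SELECT src.id AS src_id, dst.id AS dst_id"
            "  FROM src FULL OUTER JOIN dst ON src.id = dst.id;",
            print_row, NULL, NULL);
        sqlite3_close(db);
        return 0;
    }

This should print three rows: (1, NULL), (2, 2), and (NULL, 3).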


Hear, hear!


SQLite bucks the trend of modern software by becoming faster and better over time rather than slower and worse like pretty much everything else.


Lots of modern software is getting faster and better over time as time is spent optimising it. Sure, there is also lots of software that gets more bloated as time goes on, but the big problem is functional software (or parts thereof) being replaced by a more bloated rewrite because of broken incentives (e.g. when performance of managers is evaluated based on number of product launches).


Thought experiment:

If I take an O(n^2) algorithm and I modify it using an "upper triangle" optimization, I cut the time in half, but it's still O(n^2).
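Concretely, a sketch with a hypothetical pairwise compare():

    #include <stdio.h>

    static long comparisons = 0;
    static void compare(int x, int y) { (void)x; (void)y; comparisons++; }

    int main(void) {
        enum { N = 1000 };
        static int a[N];

        /* Naive: visits all N*N ordered pairs. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                compare(a[i], a[j]);
        printf("naive:    %ld\n", comparisons);   /* 1000000 */

        /* "Upper triangle": only pairs with j > i. Half the
           work (minus the diagonal), but still O(n^2). */
        comparisons = 0;
        for (int i = 0; i < N; i++)
            for (int j = i + 1; j < N; j++)
                compare(a[i], a[j]);
        printf("triangle: %ld\n", comparisons);   /* 499500 */
        return 0;
    }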

Similarly, if my code had m modules that were each getting slower by a factor of log n per release, then the running time of my application could reasonably be O(m log n), and if only half of them are getting slower, it's still O(m log n). Ideally I would like to live in a world where 0 of my dependencies or tools get slower over time, but I am never going to see that world. If I could live in a world where log m of my dependencies get slower over time, or even sqrt(m), then I'll see some real progress (where 'progress' means breaking the regression cycle). That requires most tools to aim for that goal, not some, not half.


> If I take an O(n^2) algorithm and I modify it using an "upper triangle" optimization, I cut the time in half, but it's still O(n^2).

That is why the Landau symbols are a strong simplification: they only give an estimate of how fast the worst-case runtime grows as n -> ∞.

When two algorithms are in the same Θ(...) class, you must also consider the constant that the Landau symbol "hides" to get a more fine-grained measure of the algorithm's worst-case performance (and even if two algorithms are not in the same Θ(...) class, such a fine-grained measure still makes sense, because "for sufficiently large n" does not imply "for the n that are relevant for the application").


In the old days it would have been pretty typical for me to build an application by gluing a bunch of independent libraries together, in which case any slowdowns they might experience are purely additive and so would not affect the complexity (although linear slowdowns are still not just annoying but potential company killers, so what an academic tells you to worry about doesn't mean all that much in the crucible of production).

But that hasn't been the case for decades now, and so you can't think of it that way. All of my libraries are themselves gluing together as many libraries as I'm gluing together, and so I can include 20 libraries directly in a large application and end up with 300. There are no guarantees that these effects will be cumulative instead of multiplicative. Assuming they are is not just dangerous, it's negligent.

If you can't assume a lower bound, or an upper bound, then you don't know Ω or O unless you code review the entire code base, which almost nobody does. Best you're going to do on that is empiricism, which is the bare minimum amount of responsibility (which is still skipped by many people).

If you don't know Ω and you don't know O, then any claims you're making about Θ are bullshit.


The only thing I see getting worse is cloud-reliant things.

As far as I'm concerned, basically all commodities, from software to hardware to non-computer stuff like tools, food, and even clothes, are way better than they were 10 or 20 years ago.

For some reason people's general happiness, the arts, etc, don't seem to have the same monotonic improvement though.


> Add new binary comparison operators IS NOT DISTINCT FROM and IS DISTINCT FROM that are equivalent to IS and IS NOT, respectively, for compatibility with PostgreSQL and SQL standards.

In PG `IS` is not exactly the same as `IS NOT DISTINCT FROM`. I forget the details, but with row values these can have different meanings.
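For the scalar case at least, a quick sketch of why these operators matter (NULL-safe comparison; needs SQLite >= 3.39.0):

    #include <stdio.h>
    #include <sqlite3.h>

    static int print_row(void *unused, int argc, char **argv, char **col) {
        (void)unused;
        for (int i = 0; i < argc; i++)
            printf("%s=%s ", col[i], argv[i] ? argv[i] : "NULL");
        printf("\n");
        return 0;
    }

    int main(void) {
        sqlite3 *db;
        sqlite3_open(":memory:", &db);
        /* '=' propagates NULL, while IS NOT DISTINCT FROM
           treats two NULLs as equal, like SQLite's IS. */
        sqlite3_exec(db,
            "SELECT NULL = NULL                    AS eq,"         /* NULL */
            "       NULL IS NULL                   AS is_op,"      /* 1 */
            "       NULL IS NOT DISTINCT FROM NULL AS not_dist;",  /* 1 */
            print_row, NULL, NULL);
        sqlite3_close(db);
        return 0;
    }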


Below is some interesting speed benchmarking.

"A series of tests were run to measure the relative performance of SQLite 2.7.6, PostgreSQL 7.1.3, and MySQL 3.23.41. The following are general conclusions drawn from these experiments:

SQLite 2.7.6 is significantly faster (sometimes as much as 10 or 20 times faster) than the default PostgreSQL 7.1.3 installation on RedHat 7.2 for most common operations.

SQLite 2.7.6 is often faster (sometimes more than twice as fast) than MySQL 3.23.41 for most common operations.

SQLite does not execute CREATE INDEX or DROP TABLE as fast as the other databases. But this is not seen as a problem because those are infrequent operations.

SQLite works best if you group multiple operations together into a single transaction."

https://www.sqlite.org/speed.html
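That last conclusion still holds today; each implicit transaction pays for its own journal write and fsync. A minimal sketch of the difference (file name made up):

    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_open("batch.db", &db);  /* hypothetical file */
        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS t(x INTEGER);", 0, 0, 0);

        /* Slow: each INSERT is its own implicit transaction. */
        for (int i = 0; i < 1000; i++)
            sqlite3_exec(db, "INSERT INTO t VALUES (1);", 0, 0, 0);

        /* Fast: one explicit transaction amortizes the
           journal/fsync cost across all 1000 inserts. */
        sqlite3_exec(db, "BEGIN;", 0, 0, 0);
        for (int i = 0; i < 1000; i++)
            sqlite3_exec(db, "INSERT INTO t VALUES (1);", 0, 0, 0);
        sqlite3_exec(db, "COMMIT;", 0, 0, 0);

        sqlite3_close(db);
        return 0;
    }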


Interesting, but note the caveat on that very page:

> Note: This document is very very old. It describes a speed comparison between archaic versions of SQLite, MySQL and PostgreSQL.

> The numbers here have become meaningless. This page has been retained only as an historical artifact.

For some context, this page represents the state of things nearly 20 years ago.


For single-machine workloads, it's probably still largely true, though so sensitive to details that it's not a particularly useful benchmark. SQLite is often literally unbelievably fast for many operations, and having it operate in the same process gives it a substantially lower latency floor that separate dedicated processes can't compete with.

But it's not really designed to do things asynchronously (e.g. no background optimizing while idle), or as-efficiently for very large (or strange) operations. Simpler caches and indexes, more straightforward behavior, etc.


The sqlite3_db_name() docs say:

"Applications that need to remember the string long-term should make their own copy. Applications that are accessing the same database connection simultaneously on multiple threads should mutex-protect calls to this API and should make their own private copy of the result prior to releasing the mutex."

But it sounds like they should advise to mutex-protect all operations that can change the schema then, not just calls to that particular function.
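For reference, the pattern the docs seem to be describing is something like this (pthreads flavor; copy_db_name is a hypothetical helper):

    #include <pthread.h>
    #include <string.h>
    #include <sqlite3.h>

    /* One mutex guarding a connection shared across threads. */
    static pthread_mutex_t db_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Hypothetical helper: take a private copy of the schema
       name of database N before releasing the mutex. */
    char *copy_db_name(sqlite3 *db, int n) {
        pthread_mutex_lock(&db_mutex);
        const char *name = sqlite3_db_name(db, n); /* borrowed pointer */
        char *copy = name ? strdup(name) : NULL;   /* private copy */
        pthread_mutex_unlock(&db_mutex);
        return copy; /* caller frees */
    }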


> But it sounds like they should advise to mutex-protect all operations that can change the schema then, not just calls to that particular function.

No?

Do note the problem condition:

> Applications that are accessing the same database connection simultaneously on multiple threads

So the issue is likely that the value returned by the function is borrowed from a per-connection data structure (a mapping of some sort). The consequence is obvious when said connection is shared between threads: while the schema update itself is thread-safe, retaining a pointer to data owned by the connection is not.

And do note that connection thread-safety is already shaky: while sqlite compiles with thread-safety enabled by default, lots of functions from the C API can return utter nonsense for shared connections, e.g.:

> If a separate thread performs a new INSERT on the same database connection while the sqlite3_last_insert_rowid() function is running and thus changes the last insert rowid, then the value returned by sqlite3_last_insert_rowid() is unpredictable and might not equal either the old or the new last insert rowid.

One of the reasons why the addition of RETURNING was a great boon.
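e.g. the id now comes back on the same statement, so there's no separate racy call (sketch, made-up table):

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *stmt;
        sqlite3_open(":memory:", &db);
        sqlite3_exec(db, "CREATE TABLE t(id INTEGER PRIMARY KEY, x TEXT);",
                     0, 0, 0);

        /* The id arrives with the INSERT itself; no window for
           another thread to disturb last_insert_rowid. */
        sqlite3_prepare_v2(db,
            "INSERT INTO t(x) VALUES ('hello') RETURNING id;",
            -1, &stmt, NULL);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("inserted id=%lld\n",
                   (long long)sqlite3_column_int64(stmt, 0));
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }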

Personally, I'd strongly recommend not sharing sqlite connections between threads anyway for those reasons.

Either create one connection per thread, or use a pool to move connections between threads as needed. And downgrade thread safety to SQLITE_CONFIG_MULTITHREAD if you have the opportunity.
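i.e. roughly this shape (error handling elided; file name made up):

    #include <pthread.h>
    #include <sqlite3.h>

    /* Each worker opens its own private connection instead of
       sharing one across threads. */
    static void *worker(void *path) {
        sqlite3 *db;
        sqlite3_open((const char *)path, &db);
        /* ... this thread's queries go through its own db ... */
        sqlite3_close(db);
        return NULL;
    }

    int main(void) {
        /* Safe only because no connection is ever shared;
           must be called before sqlite3_initialize(). */
        sqlite3_config(SQLITE_CONFIG_MULTITHREAD);
        sqlite3_initialize();

        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "app.db");
        pthread_create(&t2, NULL, worker, "app.db");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }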


Performing a separate request for the last-inserted row has always struck me as a literally insane idea. Of course it's incredibly error prone! How could it be anything else?

How the heck did it come into existence?


> How the heck did it come into existence?

Probably implementation simplicity: sqlite_last_insert_rowid() dates back to the "dark ages" of sqlite (it was added in 2.2.3, released 2002-01-16). The alternative is to add RETURNING, which for that sole purpose is rather complicated. sqlite only added RETURNING in 3.35.0 (2021-03-12), and even that was also motivated by the addition of upserts (ON CONFLICT clauses), for which last_insert_rowid really doesn't work.
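For instance (sketch, made-up schema): when the ON CONFLICT branch fires, nothing was inserted, so last_insert_rowid tells you nothing useful, while RETURNING still hands back the affected row:

    #include <stdio.h>
    #include <sqlite3.h>

    int main(void) {
        sqlite3 *db;
        sqlite3_stmt *stmt;
        sqlite3_open(":memory:", &db);
        sqlite3_exec(db,
            "CREATE TABLE kv(k TEXT PRIMARY KEY, v INTEGER);"
            "INSERT INTO kv VALUES ('a', 1);", 0, 0, 0);

        /* 'a' already exists, so this UPDATEs rather than
           INSERTs; RETURNING reports the row either way. */
        sqlite3_prepare_v2(db,
            "INSERT INTO kv(k, v) VALUES ('a', 1) "
            "ON CONFLICT(k) DO UPDATE SET v = v + 1 "
            "RETURNING rowid, v;", -1, &stmt, NULL);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("rowid=%lld v=%d\n",
                   (long long)sqlite3_column_int64(stmt, 0),
                   sqlite3_column_int(stmt, 1));
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }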

Furthermore, sqlite really only added multithreading support reluctantly, per its FAQ:

> (6) Is SQLite threadsafe?

> Threads are evil. Avoid them.

Multithreading support was added in 2.2.3 (2002-01-16), but it wasn't even possible to move connections between threads until 3.3.3 (2006-01-31), and using the same connection from multiple threads concurrently was not supported until 3.5.1 (2007-10-04).


Yes, that was always kinda nonsense. You really need RETURNING.


> Personally, I'd strongly recommend not sharing sqlite connections between threads anyway for those reasons.

I have to admit I am surprised this would work at all without blowing up in your face in surprising and horrifying ways.





