Hacker News new | past | comments | ask | show | jobs | submit login
The Design of SQLite4 (sqlite.org)
120 points by harporoeder on June 17, 2022 | hide | past | favorite | 19 comments



>SQLite4 was an experimental rewrite of SQLite that was active from 2012 through 2014. All development work on SQLite4 has ended. Lessons learned from SQLite4 have been folded into the main SQLite3 product. SQLite4 was never released. There are no plans to revive it. You should be using SQLite3.

https://sqlite.org/src4/doc/trunk/www/index.wiki


I love the complete clarity of this paragraph. There is no way to misinterpret it.


I'm kind of glad this didn't happen . It might have been nice if there was a performance boost, but it would have taken years for 4 to get to the level of trust 3 has, and possibly caused confusion for devs.

At this point sqlite3 is basically like UTF-8 or even ASCII. It will be there in basically any decent language. It's the default choice of embedded database.

Having 2 versions might de facto deprecate sqlite3 in some environments. There might be transition issues. And people would wonder if there will be a sqlite5.

The one thing I would have liked to see, is a sqlite-like database that operates on folders instead of files, and is optimized for being used in a modern multidevice environment, where files can't be efficiently modified, just overwritten, you don't have locking, you might want to version control things, and you want to reuse filenames if possible to avoid cluttering sync indexes with tombstones.

Other than that, sqlite3 seems to be good enough.


Thanks for the context. I totally thought SQLite4 is a new thing.


Yes, it's unfortunate the page doesn't indicate any useful dates for the content.


> Lessons learned from SQLite4 have been folded into the main SQLite3 product.

Did decimal arithmetic make it?


Mostly no. There is an extension that provides a core of it (operators aren't overloaded, it's all function calls, and “all” is 3 operators total).



> This Is Probably Not The Project You Are Looking For

> SQLite4 was an experimental rewrite of SQLite that was active from 2012 through 2014. All development work on SQLite4 has ended.

You got me really excited for a moment there.


Adding a [discontinued] tag to the title would be nice.

The given link points into the documentation of it instead of the landing page (https://sqlite.org/src4/doc/trunk/www/index.wiki) which means there is no warning that it was discontinued.

Got me all hyped up for nothing.


Looks like it's basically the same as sqlite3 except it fixes a couple odd things that sqlite3 has for legacy reasons (namely, in sqlite 3 foreign key support is disabled by default and needs to be explicitly enabled, and the actual primary key in sqlite3 is an automatically generated value called rowid, not the specified primary key).


For a moment I thought we might be getting new field types in sqlite. Then I red that this is actually quite old and discontinued :(


On the explicit index cover mentioned in the end, generic database question. Why do indices point to the primary key (which then needs to be looked up), rather than the address of the page where the data is stored directly? Wouldn’t you want to save a primary key lookup? What’s the benefit?


In some db storage engines like sqlite and innodb there is no way to lookup the storage location in a heap because the table itself is an index using the primary key always. This is normally called an index organized table or clustered index. Some db engines like MS SQL Server and Oracle allow you to choose if a table is index organized or just a heap. Postgres give you no choice, tables are always heaps.

If the table is index organized then the very common operation of looking up a row by primary key is a single operation and there is no duplication of primary key columns in a separate index which can save significant space on narrow tables with only pk indexes. However secondary index must store the columns in the pk to identify the row and must do two index transversals to find the row vs storing a physical row location and jumping there.

I like having the choice of table layout and use clustered indexes more than not so would prefer them if I had no choice, vs PG having no option for a true IOT.


I’ll bite. And I apologize in advance for how uninformed I am about the specifics as they apply to sqlite’s actual implementation.

I’ve reproduced below the phrasing I think you’re referencing[1].

As mentioned in the next section regarding “COVER ALL”, there is a cost tradeoff between storage space and indirections that can and must be played here. However, your question is about why not indirect to the physical storage location. The answer is _probably_ more to do with how databases maintain support for concurrent transactions. In Postgres, the MVCC (multi-version concurrency control) mechanism is to avoid overwriting tuples on disk when updates or deletes occur and instead write new values to a new disk region and make them only visible for transactions starting after point in time X. This allows read only transactions to run concurrently and correctly with an uncommitted update/delete without synchronizing on a mutex or otherwise locking each other out.

Other systems choose instead for any updating transaction affecting a given row to acquire a lock during the course of a transactional modification that might occur “in place” (and let’s note: variable width columns participating in updates are a real problem for “in place” updates even being possible).

In either case, the transactional semantics are about the “logical” operation. At no point should two rows with the same primary key be visible. However, you can store the same logical row at two physical storage locations if you otherwise ensure mutually exclusive visibility of those tuples. There is a lot of room for nuance and special case handling of any engines implementation to choose how exactly storage relates to logical in this regard.

Having an engine traverse an index to resolve the correct tuple to read via primary key (logical) instead of storage location (physical) is a very pragmatic approach for an engine like SQLite. First, the space overhead is about the same, but more importantly- the flexibility to add enhancements and specialization is wide open in the future. You aren’t initially coupled to any specific logical/physical tuple lookup scheme that primary-key lookups aren’t already subject to, and if you want to add a special kind of index later you always can. Let’s remember that most SQLite databases are small, and the extra indirection is very cheap. What’s remarkable to me is just how far it can go. It’s really the the little engine that could.

[1] SQLite4 adds an optional clause to CREATE TABLE statements that defines extra columns of information that are duplicated in the index. This allows the application developer to explicitly create covering indices in SQLite4 without resorting to trickery and subterfuge. For example:

CREATE INDEX cover1 ON table1(a,b) COVERING(c,d); The cover1 index above allows queries of the form "SELECT c,d FROM table1 WHERE a=?1 AND b=?2" to be executed with only a single lookup into the storage engine. Without the extra COVERING clause on the index, SQLite4 would have to do two lookups in the storage engine; one to find the primary key based on the values of a and b and a second to find the values of c and d based on the primary key. The COVERING clause makes the values of c and d available in the index, which means that they can be extracted directly from the index without a second lookup.


This got me hoping for an sqlite where adding foreign keys didn’t require making a whole new table.


You don't need to create a new table to add foreign keys - you can do it by setting PRAGMA writable_schema = 1, then updating the table definition directly in the sqlite_master table to include the foreign key constraints.

This sounds terrifyingly dangerous, so you should absolutely take a backup first - but I've found it to work just fine in practice.

Here's where I implement that in my sqlite-utils Python library and CLI tool:

https://github.com/simonw/sqlite-utils/blob/2d84577202e22767...


Doesn't this work?

  cp db.sqlite3 backup.sqlite3
  echo .dump tablename |sqlite3 db.sqlite3 >old.sql
  vi old.sql # make changes, ensure pragma FOREIGN_KEYS=off; drop table before insert
  sqlite3 db.sqlite3 <old.sql


DuckDB seems to be the real "modern SQLite".




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: