- Foreign key checking (and cascading deletion, for that matter) is turned off by default. You need to enable it with `PRAGMA foreign_keys = ON;`, and since the setting is per-connection, you need it on every connection.
- There are practically no downsides (and a number of upsides) to using the WAL journalling mode, at least for "use a local database and store it on the disk" use cases. The main upside: without it, reads conflict with writes, which is a problem in a multi-threaded environment. Unfortunately, that's another feature you must remember to enable: `PRAGMA journal_mode = WAL;` (for obvious reasons, this one stays enabled after you turn it on).
- Full auto-vacuum cannot be enabled after you start writing to the database unless you enabled incremental auto-vacuum first. If you're unsure, it's a good idea to enable incremental auto-vacuum to keep that option open. But here again, that's not the default: you need `PRAGMA auto_vacuum = INCREMENTAL;`.
This tends to be especially problematic if you have to perform one or more ad-hoc queries using the SQLite command line on an app's database and forget to apply the relevant PRAGMAs! I wish there were a way to set "default" PRAGMAs on an SQLite database file to avoid this.
(note: some of these defaults are configurable when compiling sqlite from scratch, but if you're dynamically linking with an OS-provided instance of the library, you can't really do that).
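A minimal sketch of how an app can apply these PRAGMAs consistently, using Python's built-in sqlite3 module (the function name and database path are illustrative):

```python
import sqlite3

def connect(path):
    """Open an SQLite database with the non-default PRAGMAs applied."""
    conn = sqlite3.connect(path)
    # Foreign-key enforcement is per-connection: it must be set every time.
    conn.execute("PRAGMA foreign_keys = ON")
    # WAL mode is persistent: once set, the database file remembers it, so
    # later connections (including the sqlite3 CLI) find it already enabled.
    conn.execute("PRAGMA journal_mode = WAL")
    return conn
```

Funneling all connections through one helper like this is the usual workaround for the lack of per-file defaults.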
I mean Postgres is a fine piece of software but if your website gets 500 visits a day, you don't need Postgres; just use SQLite.
SQLite favours a very permissive approach, while PostgreSQL favours locking everything down with strict types and strong integrity checks, and it's very easy to verify lots of details of your data before it's accepted into the database. That takes work to set up properly and maintain, but it catches bugs early and reduces the number of headaches you get when trying to use your data in new code.
Scalability is a very different problem that happens to favour PostgreSQL, but most projects indeed never get far enough that this matters.
But from a developer experience point of view, the main difference between dynamic and static typing is the following:
Dynamic typing requires a lot of extra checks to ensure everything is what you expect it to be, and a lot of annotations to document what everything is supposed to be.
In other words, dynamic typing has a high cognitive tax.
With this in mind, I find that PostgreSQL imposes a cognitive tax in terms of maintaining a separate process (potentially on a different machine) that a lot of times is outside your direct control. Not to mention all the strange ways in which it can break.
SQLite just works. There's nothing to configure. So there's a lot less cognitive tax.
There is still of course the tax of having to ensure that all your sql statements are valid and that they return what you expect them to return, but this is also the case with PostgreSQL.
RE static vs dynamic typing, I'd say that it's dynamic typing that has the higher cognitive tax - you as a programmer are fully responsible for ensuring types agree everywhere, whereas with static typing, all of that job is done by the compiler. It's easier to change stuff in the code when you know the compiler will catch your dumb mistakes. And I say that as someone who loves writing in Common Lisp.
But the point about SQLite vs a typical RDBMS is spot-on, IMO. SQLite is a library. It operates on files. That's it. It's entirely local. You can use it without having to become a sysadmin, without having to set up a whole service on the OS on which you might not even have root privileges anyway. You don't have to make global changes to the system just for your program. Moreover, if your product is of the distributable kind (desktop apps, self-hosting web stuff) you don't need to make your users become sysadmins, manage a system-wide service, acquire root rights they might not have, etc.
SQLite is local. Its data is local. That's, IMO, its strongest benefit.
Yes, that's exactly what I said.
If you actually need any of these things you can enforce them in code, by convention, but you're better off installing Postgres.
Also writes are serialized. That one really can't be worked around.
This is true, and explicitly acknowledged by the author:
> SQLite does not compete with client/server databases. SQLite competes with fopen().
If you're deciding between Postgres and SQLite, in most cases you're doing it wrong (there are a few exceptions, such as serving a low-traffic almost-entirely-read dynamic website can be accomplished easily with either). Usually you want to choose between SQLite and "anything else that involves writing directly to the local filesystem".
Think about requirements first, then tech/implementation approach.
But yeah, like everyone else, I often just start with SQLite anyway!
SQLite has supported CHECK() for a while. So, for example, you can do:
CHECK(LENGTH(col) BETWEEN 1 AND 16)
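A quick sketch of what that enforcement looks like in practice, using Python's built-in sqlite3 module (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        -- Reject empty names and names longer than 16 characters.
        name TEXT CHECK(LENGTH(name) BETWEEN 1 AND 16)
    )
""")

conn.execute("INSERT INTO users VALUES ('alice')")  # accepted

try:
    conn.execute("INSERT INTO users VALUES ('')")   # violates the CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The violating row never makes it into the table; the insert fails with an IntegrityError.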
Across multiple processes, but there is a shared cache and read uncommitted mode available otherwise.
None of the apps I've written have hit any kind of crazy traffic peak, so I started wondering why the hell I was using postgres or other database types after reading https://www.sqlite.org/whentouse.html.
Seeing how simple SQLite has been for me to use has inspired me to write a bunch of dead-simple good-enough approximations of other tools that exist. High up on my list is a Graylog competitor that just uses SQLite FTS (https://sqlite.org/fts3.html, https://sqlite.org/fts5.html): something super easy to start that handles the micro-to-mid-size case before you really need something like Graylog or ELK.
Blog post if anyone's interested:
I think for most small projects SQLite + FTS offers a pretty simple, easy-to-set-up solution that doesn't require much compute power either. The simplicity is also a bonus.
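As a rough sketch of what SQLite FTS looks like in practice (this assumes an SQLite build with FTS5 compiled in, which is the case for most modern distributions; the table and data are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes its text columns for full-text search.
conn.execute("CREATE VIRTUAL TABLE logs USING fts5(message)")
conn.executemany(
    "INSERT INTO logs VALUES (?)",
    [("disk full on node 3",), ("user login failed",), ("disk replaced",)],
)

# MATCH queries the full-text index; ORDER BY rank sorts by relevance.
rows = conn.execute(
    "SELECT message FROM logs WHERE logs MATCH 'disk' ORDER BY rank"
).fetchall()
print(rows)
```

That's about all the setup a small log-search tool needs before it outgrows this approach.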
The concurrency model just doesn’t match up well with multiple web processes doing work.
If you are writing on every request, Murphy’s law says that too many of those 500 visitors will be loading pages at exactly the same time...
If you're logging user visits for example, you can queue those up in a list (in the code) and flush it to the database every few seconds instead of flushing it on every visit.
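A minimal sketch of that batching pattern (in Python; the class, table name, and schema are all arbitrary):

```python
import sqlite3
import threading
import time

class VisitLogger:
    """Buffer visit records in memory; flush them to SQLite in one batch."""

    def __init__(self, path):
        self.path = path
        self.buffer = []
        self.lock = threading.Lock()
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE IF NOT EXISTS visits (url TEXT, ts REAL)")
        conn.commit()
        conn.close()

    def log(self, url):
        # Called on every request: just an in-memory append, no disk I/O.
        with self.lock:
            self.buffer.append((url, time.time()))

    def flush(self):
        # Called every few seconds (e.g. from a timer thread): one
        # transaction, hence one write lock, per batch.
        with self.lock:
            batch, self.buffer = self.buffer, []
        if not batch:
            return
        conn = sqlite3.connect(self.path)
        with conn:
            conn.executemany("INSERT INTO visits VALUES (?, ?)", batch)
        conn.close()
```

Instead of one write transaction per visit, you get one per flush interval, which sidesteps most of the write-contention worry.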
That won't work in the language that most server-side website code is written in. PHP processes are created and destroyed per HTTP request, and even the FastCGI implementation won't let one share data across requests. PHP does have a sessions feature, but that data is written to either the filesystem (by default) or a database on every request!
I just don't understand why people can't see the logic behind using something operationally simple like SQLite.
On the flip side, setting up a PostgreSQL instance is so easy now with docker that you could go ahead and use it with a fraction of the admin overhead that it used to require:
I just find it weird and confusing.
Also, how do you set up a database in Docker? Docker does not support persisting data as far as I know. So if you need to restart the thing, all the data will be wiped. At least that's how it seems to be.
So, if you want to keep it simple:
docker run --detach --volume /save/my/data/in/this/path:/var/lib/postgresql/data postgres
Even if you don't specify a path for the data, the container would save its data in a Docker volume (a directory managed by Docker), which doesn't get deleted until you explicitly ask it to. If you stop the container and restart it, it'll reattach to the same volume and find the data. If you delete the container without explicitly deleting its volumes as well, you'll be able to create another postgres container and attach the old volume to it.
In some systems, such as Kubernetes, data is not saved on a persistent disk by default, though plain Docker containers can retain data. Either way, this is generally solved by using volumes, which you can read about here: https://docs.docker.com/engine/admin/volumes/volumes/
Why do you ask?
There are a variety of opinions on the goodness of using docker for running a database.
Install docker and docker-compose, write a simple docker-compose.yml file, and then `docker-compose up -d` in the same directory as the docker-compose.yml file. You now have a full Postgres instance running on your machine at localhost:5432. Stop it with `docker-compose down`.
DB files are stored in the `pg` folder in the same directory as docker-compose.yml. You can specify this on the `volumes` property.
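A minimal docker-compose.yml along those lines might look like this (the image tag, password, and port mapping are placeholders to adapt):

```yaml
version: "3"
services:
  db:
    image: postgres:16
    restart: always
    environment:
      POSTGRES_PASSWORD: changeme
    ports:
      - "5432:5432"
    volumes:
      # DB files land in ./pg next to this docker-compose.yml
      - ./pg:/var/lib/postgresql/data
```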
/usr/lib/postgresql/9.5/bin/postgres -D .mydatadir
This is especially helpful if some developers use OS X or Windows.
I don't understand the "all of that" phrasing, as if what I just described was a large amount of work.
Docker is useful for running components of your own code or for isolation, not for managing versions of core applications like a database or a mail server for most people deploying ready-made apps.
All our "system software" (Postgres, Redis, Memcached, RabbitMQ, Elasticsearch, etc.) runs as containers on Kubernetes. Doing so allows us to start new versions without considering the host OS. For example, we can upgrade Postgres to a new release the moment it's out, without having to wait for Debian or Ubuntu or whatever.
It also allows us to run different versions (e.g. a new one for testing in our "staging" environment) concurrently without any concerns about which node is running what. In fact, almost nothing needs to be configured with information about "nodes"; everything is routed to the right place, and the client only needs a name to connect to.
Sometimes we start temporary instances of software, e.g. to experiment with. Bring the container up, do the testing, bring container down; nothing ever got installed on the host OS.
I've tried several different dev environment setups and this approach yields the fewest frustrations and inconsistencies.
Production is a different story but that's not what I'm discussing here.
I can see an app relying on features not available in older releases, but databases - of all software - have rock-solid backwards compatibility.
To me, the value is in reducing the number of potential issues a developer can have in their environment and making the build and runtime as consistent as possible. And when the solution is so easy to use, I don't see much of a downside to the docker approach.
My team switched from Vagrant to docker compose for our development environments and since then we've had much more stability and have wasted much less time tending to them.
My whole point is that I want to simplify my setup, so that I don't need tons of third party servers.
I have written fairly large web apps taking fairly decent traffic, relying solely on SQLite unless really pressing circumstances required otherwise. It's a healthy exercise too; it sharpens your focus: "Do I absolutely need to do a write here?"
Not so much for quite a while, though.
How easy is it, in practice, to switch from one SQL database to another?
Although to be fair, I have no idea if this is any less common with other DB's.
Anyone know more details?
Edit: ok, I believe it now: https://www.sqlite.org/th3.html
SELECT json_extract(RawData, '$.sourceId') SourceId
, json_extract(RawData, '$.messageId') MessageId
, json_extract(RawData, '$.message') Message
, json_extract(RawData, '$.message.parties.name') Party
, json_extract(RawData, '$.timestamp') Timestamp
JOIN Source ON SourceId = Source.Id
WHERE Source.Name = 'Foo Company'
ORDER BY Timestamp
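The json1 functions used above are easy to try from Python's built-in sqlite3 module (JSON support is compiled into most modern SQLite builds; the table and data here are made up):

```python
import sqlite3
import json

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (RawData TEXT)")
conn.execute(
    "INSERT INTO events VALUES (?)",
    (json.dumps({"sourceId": 7, "message": "hello", "timestamp": 1234}),),
)

# json_extract pulls typed values out of the stored JSON text.
row = conn.execute("""
    SELECT json_extract(RawData, '$.sourceId')  AS SourceId,
           json_extract(RawData, '$.message')   AS Message,
           json_extract(RawData, '$.timestamp') AS Timestamp
    FROM events
""").fetchone()
print(row)  # (7, 'hello', 1234)
```

Note that numbers come back as Python ints and strings as str, so the JSON blob behaves much like ordinary columns.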
Plus, you know, the whole SQL thing.
Even as an experienced programmer, I find SQL to almost always more elegant for expressing logic than Pandas or R. My workflow is usually having a script execute a SQL statement to do the heavy data work, and then importing the results of that SQL statement into another programming environment (e.g. R/ggplot2 for visualizations).
The flat files method may lead to partial or empty files in certain failure cases (crash during writes) that may or may not be more gracefully handled by SQLite.
I had an interest rate app where the backend data collection process consisted of a shell script that ran curl to download rates from various places as CSV, imported the CSV into SQLite, and ran a few SQL statements to clean up/merge/fix defaults/dedup the data. The process is idempotent and can add new data to existing data. At the end I got a long history of clean, up-to-date interest rate data in an SQLite table.
Of course, SQLite can also emulate a key-value store quite well, with a table with 'key' and 'value' columns.
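A sketch of that key-value pattern (Python; the helper names and schema are arbitrary, and the upsert syntax needs SQLite 3.24+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")

def put(k, v):
    # Upsert: replace the value if the key already exists.
    conn.execute(
        "INSERT INTO kv VALUES (?, ?) "
        "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
        (k, v),
    )

def get(k, default=None):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (k,)).fetchone()
    return row[0] if row else default

put("greeting", "hello")
put("greeting", "hi")         # overwrites the old value
print(get("greeting"))        # hi
print(get("missing", "n/a"))  # n/a
```

You get dict-like semantics with the durability and atomicity of a real database file.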
(thought it was a typo, now I’m not sure)
Then you can "SELECT person.name, array_agg(hobbies.name) FROM person JOIN hobbies ON (person.id = hobbies.person_id) GROUP BY person.name" to get your json representation back (at least with PostgreSQL; "array_agg" isn't in standard SQL)
That's a pretty simple join, any real database layout I've seen goes much more complicated and much deeper than that.
If you needed it all in one flat table for some reason, SQLite supports views.
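For example (a sketch using group_concat, SQLite's rough analogue of array_agg; the schema is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hobbies (person_id INTEGER REFERENCES person(id), name TEXT);
    INSERT INTO person VALUES (1, 'alice');
    INSERT INTO hobbies VALUES (1, 'chess'), (1, 'running');

    -- A view presenting the join as one flat table.
    CREATE VIEW person_hobbies AS
        SELECT person.name AS person, group_concat(hobbies.name) AS hobbies
        FROM person JOIN hobbies ON person.id = hobbies.person_id
        GROUP BY person.id;
""")
print(conn.execute("SELECT * FROM person_hobbies").fetchall())
```

Queries against the view see one flat row per person, while the underlying data stays normalized.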
It makes it possible to expose a custom, application specific database format as a sqlite table, and suddenly you can run SQL queries on your data!
And it goes beyond single tables, e.g. PG-Strom uses FDW to implement GPGPU scans, joins, aggregations & projections (and provides PL/CUDA while at it)
Once you're ready get yourself a cup of the SQLite koolaid
The DB Browser (they changed it from sqlitebrowser at the request of sqlite IIRC) also has in my experience, a very helpful and responsive dev team: https://github.com/sqlitebrowser/sqlitebrowser/issues
Thanks. We definitely try. :D
(I'm one of the devs on that project)
But I really disliked that my database teacher decided to use MySQL to teach about databases (that was about 10 years ago).
Several times he would talk about a specific concept, then mention that MySQL doesn't have it (like CHECK constraints, enforcing foreign key constraints, aborting statements through triggers (emulated by throwing an exception), etc.). If we are going to learn universal database concepts, why not use a database that supports them? I personally learn things better when I can try them out and see how they work.
Sorry for my rant, it just reminded me about that.
So you are advocating that people who have issues with a simple `apt-get` or `yum install mysql` (or <app>) should instead install Docker and deal with its complexity before they can use the app?
I stumbled upon SQLite around 2001 and have been using and admiring this gem of a software since that time, but hearing that it is aviation-grade is definitely a surprise to me...
My understanding (although I can no longer find the page in the sqlite3 docs) was that because of caching, using :memory: is unlikely to make much difference in practice.
A few excerpts from this page:
Get to know Firebird in 2 minutes:
Firebird is derived from Borland InterBase 6.0 source code. It is open source and has no dual license. Whether you need it for commercial or open source applications, it is totally FREE!
Firebird technology has been in use for 20 years, which makes it a very mature and stable product.
Don’t be fooled by the installer size! Firebird is a fully featured and powerful RDBMS. It can handle databases from just a few KB to many Gigabytes with good performance and almost free of maintenance!
Below is a list of some of the Firebird’s major features:
Full support of Stored Procedures and Triggers
Full ACID compliant transactions
Multi Generational Architecture
Very small footprint
Fully featured internal language for Stored Procedures and Triggers
Support for External Functions (UDFs)
Little or no need for specialized DBAs
Almost no configuration needed - just install and start using!
Big community and lots of places where you can get free and good support
Optional single file embedded version - great to create CDROM catalogs, single user or evaluation versions of applications
Dozens of third party tools, including GUI administrative tools, replication tools, etc.
Careful writes - fast recovery, no need for transaction logs!
Many ways to access your database: native/API, dbExpress drivers, ODBC, OLEDB, .Net provider, JDBC native type 4 driver, Python module, PHP, Perl, etc.
Native support for all major operating systems, including Windows, Linux, Solaris, MacOS, HP-UX and FreeBSD.
64bits builds available
Full cursor implementation in PSQL
Connection and Transaction Triggers
TraceAPI - know what happens in your server
Disclaimer: I'm not associated with them in any way. Just have used the product a bit, and think it is somewhat good, based on that.
Firebird is free for commercial and educational usage: no license fees, installation or activation restrictions. No double licensing - Firebird license is based on Mozilla Public License.
The mix of high performance, small footprint, supreme scalability, silent and simple installation and 100% royalty-free deployment make Firebird a highly attractive choice for all types of software developers and vendors.
It is used by approximately 1 million software developers worldwide.