> What I do think would be awesome would be an embeddable Postgres library/binary that could use a single state file on your local filesystem for development
I wired Postgres up for our local dev. I don't believe in mocking the database, so all our tests and local dev run against a real Postgres instance.
The main tricks:
- Store the Postgres installation in a known location.
- Set the dynamic loader search path to the installation's lib dir, e.g., LD_LIBRARY_PATH (DYLD_LIBRARY_PATH on macOS).
- Don't run CI as root (or patch out Postgres's check for root).
- Create and cache the data directory based on the source code migrations.
- Use clonefile to duplicate the cached data directory to hand to individual tests.
One thing I'd like to pursue is to store the Postgres data dir in SQLite [1]. Then, I can reset the "file system" using SQL after each test instead of copying the entire datadir.
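The reset-by-SQL idea can be illustrated with plain sqlite3: keep the "files" as rows, snapshot the pristine state once, and restore it with two SQL statements instead of a directory copy. This is a toy sketch of the concept, not libsqlfs's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)")
con.execute("CREATE TABLE template (path TEXT PRIMARY KEY, data BLOB)")

# Build the pristine "datadir" once, then snapshot it.
con.execute("INSERT INTO files VALUES ('base/1/1259', x'00')")
con.execute("INSERT INTO template SELECT * FROM files")

# A test scribbles over the files...
con.execute("UPDATE files SET data = x'ff'")
con.execute("INSERT INTO files VALUES ('base/1/9999', x'aa')")

# ...and resetting is two statements, not a tree copy.
con.execute("DELETE FROM files")
con.execute("INSERT INTO files SELECT * FROM template")
```

The appeal is that the reset cost no longer scales with the number of files in the datadir the way per-file unlink/clone syscalls do.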
Yep, I used that trick as well. The evolution we went through:
1. Each test runs all migrations on a fresh database. Each test spends 2.1 seconds on db setup.
2. Each test suite runs all migrations. Each test copies the template database from the test suite. Each suite takes 2.1 seconds to run migrations, but cloning a template database takes 300 ms.
3. Bazel caches the data dir and rebuilds it once for all test suites. Reduces the initial test suite setup from 2.1 seconds to 160 ms (to copy the datadir).
4. Each test in a suite uses clonefile to copy the datadir. Reduces db setup overhead from 300 ms (to copy a template database) to 80 ms (to clonefile a datadir).
Currently, most of our testing overhead is clonefile and cleaning up the datadir. I'm interested in a single-file SQLite FS because I could be clever and use LD_PRELOAD to replace Postgres's file-system operations with SQLite calls, avoiding most syscalls altogether.
Nice, any chance you'll package all this up one day? I appear to be getting better numbers than these right now, but I suspect it's because the DB is simpler and fairly empty to begin with, so presumably we could be going even faster.
I've tried; macOS doesn't make it easy. It's also not a clear performance win: with fsync=off, Postgres's writes mostly sit in the page cache anyway, so tmpfs adds little.
Another problem is that parallel tests can exhaust memory.
The slowest part of our tests is syscall overhead. A mostly empty Postgres data dir for a medium-sized database with a few hundred tables consists of about 3k files. On my M1 Mac, it takes 120 ms to delete the entire data dir; copying is cheaper at 80 ms using clonefile.
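For a rough feel of that per-file syscall cost, a micro-benchmark like this reproduces the shape of the measurement. It's synthetic: 3,000 tiny files stand in for a real datadir, and the absolute numbers will differ by machine and filesystem.

```python
import os
import shutil
import tempfile
import time

# Create a directory with ~3k small files, roughly the file count of a
# mostly empty datadir for a few hundred tables.
root = tempfile.mkdtemp()
for i in range(3000):
    with open(os.path.join(root, f"relfile_{i}"), "wb") as f:
        f.write(b"\x00")

start = time.perf_counter()
shutil.rmtree(root)  # roughly one unlink() syscall per file, plus directory scans
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"deleted 3000 files in {elapsed_ms:.0f} ms")
```

The deletion time is dominated by syscall count rather than bytes, which is why shrinking the file count (or hiding it behind a single SQLite file) matters more than shrinking the data.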
[1]: https://github.com/guardianproject/libsqlfs