I was hoping from the title that it aims for Postgres SQL compatibility, but I can't find that explicitly mentioned in the docs. It makes me realize what I really want is something like sqlite://memory that completely disregards speed and even persistence. Instead you could say, for example, "open an in-memory database that behaves like Postgres 9" and run your tests against it. With typical fixtures of 10 or so rows, you wouldn't even need query plans - just a naive filter+copy over whole tables for most operations.
No problem, I was curious as well and went hunting for it. This project does seem interesting - I'm a huge fan of database systems like SQLite and H2. There's something wonderful about being able to copy a simple file and share it with other developers.
One of the authors of DuckDB here: we use the PostgreSQL parser, and try to be generally compatible with the syntax used by SQLite, Postgres and MySQL. In some cases those are unfortunately mutually exclusive (e.g. null ordering - we have a PRAGMA to change this system-wide). Making the SQL dialect “as compatible as possible” with these systems is very much one of our goals, and if you find any incompatibilities feel free to open an issue :)
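For concreteness, here's what that pragma looks like from the Python client - a minimal sketch; the pragma name `default_null_order` and its accepted values are taken from the DuckDB docs, so double-check them against your version:

    import duckdb

    con = duckdb.connect()  # in-process, in-memory database

    # Switch the null-ordering convention system-wide (pragma name and
    # value per DuckDB docs; treat as an assumption for older releases).
    con.execute("PRAGMA default_null_order='NULLS_LAST'")

    con.execute("CREATE TABLE t (x INTEGER)")
    con.execute("INSERT INTO t VALUES (1), (NULL), (2)")
    print(con.execute("SELECT x FROM t ORDER BY x").fetchall())
    # With NULLS_LAST: [(1,), (2,), (None,)]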
I'd note that when moving from pg to $other, the thing that really trips me up isn't the syntax changes, it's the lack of ARRAY and ROWTYPE.
I'm not sure whether those are in scope for you but it'd be nice if the docs said "to be implemented" or "out of scope" somewhere ... and my apologies in advance if they do and I somehow missed it.
We already have support for LIST and STRUCT (which I think are equivalent to ARRAY and ROW, respectively). There is still some functionality missing there (notably storage, and several functions) - but the base functionality is there and the remainder definitely will be added.
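To give a feel for those types, a small sketch via the Python client (the literal syntaxes below are from current DuckDB docs and may postdate this comment):

    import duckdb

    con = duckdb.connect()

    # LIST - roughly DuckDB's counterpart to Postgres ARRAY.
    print(con.execute("SELECT [1, 2, 3] AS xs").fetchall())

    # STRUCT - roughly DuckDB's counterpart to a Postgres ROW type.
    print(con.execute("SELECT {'name': 'duck', 'legs': 2} AS s").fetchall())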
I would just create a per-test-file database (and delete it the next time the test runs). The overhead is very small (compared to booting Postgres) and it works exactly like production because it is exactly production.
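A minimal sketch of that setup as a pytest fixture, assuming a local Postgres superuser and psycopg2 (the database name and DSNs are illustrative):

    import psycopg2
    import pytest

    ADMIN_DSN = "dbname=postgres user=postgres"  # assumed local server
    TEST_DB = "myapp_test"  # illustrative: one such name per test file

    @pytest.fixture(scope="module")
    def db():
        admin = psycopg2.connect(ADMIN_DSN)
        admin.autocommit = True  # CREATE/DROP DATABASE can't run in a transaction
        with admin.cursor() as cur:
            # Drop the previous run's leftovers, then start fresh.
            cur.execute(f"DROP DATABASE IF EXISTS {TEST_DB}")
            cur.execute(f"CREATE DATABASE {TEST_DB}")
        admin.close()

        conn = psycopg2.connect(f"dbname={TEST_DB} user=postgres")
        yield conn
        conn.close()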
In general, I am not comfortable using an "almost equivalent" database for tests. Every database engine has its quirks, and if you don't know what they are, your program that passes its tests only "almost" works in production, which is annoying.
Oh that looks great, and very close to what I had in mind for the next upgrade of our in-house testing framework. Definitely going to give that a try soon. Thanks for posting it!
I would also love this. In the past I’ve used the H2 in-memory database for tests with Spring Boot applications.
The Hibernate ORM handles configuring the H2 schema, but H2 and Postgres aren't fully compatible, so I have to be careful not to rely on Postgres-specific features. I'm generally not testing database logic, just need a working persistence layer to get the tests running, but an in-memory Postgres implementation would be amazing.
Although it's something I'd love to write, supporting it over the long term as pure FOSS would probably be a sad experience. I wonder what model could be used here. Maybe something like Patreon, or "sponsor a feature" on an open-source project? Or a FOSS core plus proprietary Java/.NET bindings, since corps can pay?
Yeah, for sure this is the type of niche, high-effort work that would be used extensively by the closed-source world and yet struggle to find adequate support.
Same here, and I run the same set of migrations that run in production. To be clear, this is only done once per test session, not for individual tests, and the tests are written in such a way that they don't interfere with each other.
The overhead is actually pretty small, less than 10s. I'd say that's too much for unit tests, but well within the tolerable range for integration/functional tests. Compared with the time I'd spend hacking together some brittle and unrealistic in-memory alternative, I much prefer to use a real database.
Those solutions still have a high overhead. There's ACID compliance, serialisation in memory, maintaining indices, and many other layers. Compare that to an ideal testing solution with no initialisation cost, where an insert is literally: parse the query, add a new entry to a list, done.
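As a toy illustration of that "no layers" ideal (purely hypothetical - not a real engine, just the append-to-a-list insert described above):

    # A deliberately naive "table": insert is appending a dict to a list.
    class ToyTable:
        def __init__(self, columns):
            self.columns = columns
            self.rows = []

        def insert(self, **values):
            self.rows.append(dict(values))

        def select(self, predicate=lambda row: True):
            # Full scan: fine for fixtures of ~10 rows, no planner needed.
            return [row for row in self.rows if predicate(row)]

    users = ToyTable(["id", "name"])
    users.insert(id=1, name="alice")
    print(users.select(lambda r: r["name"] == "alice"))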
But you want to be testing against something that is as close as possible to the deployment environment. So if that means ACID, indices, etc., then that's what it is.
You can still implement those in a trivial way that behaves like production. For example, if some column has a unique index, just look at all rows and compare the new value - you don't need an actual index for it (and definitely not a fancy concurrent-access B-tree). For transactions/MVCC you can literally make a copy of everything.
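Continuing the toy sketch from above (hypothetical code, not a real engine): a unique constraint becomes a full scan, and a "transaction" becomes a deep copy:

    import copy

    class ToyDB:
        def __init__(self):
            self.tables = {}        # table name -> list of row dicts
            self._snapshot = None   # backup powering the "transaction"

        def insert(self, table, row, unique=()):
            rows = self.tables.setdefault(table, [])
            # "Unique index" without an index: scan every existing row.
            for col in unique:
                if any(r[col] == row[col] for r in rows):
                    raise ValueError(f"duplicate value for unique column {col!r}")
            rows.append(row)

        def begin(self):
            # "MVCC" by brute force: copy everything.
            self._snapshot = copy.deepcopy(self.tables)

        def rollback(self):
            self.tables = self._snapshot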
You shouldn't rely on any "just like X" system for tests; you should use X exactly, with the same config and version as production.
What you want is a Postgres server, but optimized for tests. Here's what I've done (a sketch follows the list):
1. One instance per pipeline
2. Parallelize by having separate databases per process
3. Build schema once
4. Before each test, purge the database (or run each test in a transaction so you can just roll back at the end)
5. Run in Docker, use tmpfs volumes for writes (no disk writes)
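Here's a sketch of steps 2-4 with pytest and psycopg2 (the worker-ID lookup assumes pytest-xdist, and all names are illustrative; step 5 lives in your Docker invocation, e.g. mounting the data directory on tmpfs):

    import os
    import psycopg2
    import pytest

    @pytest.fixture(scope="session")
    def db_conn():
        # Step 2: one database per worker process (pytest-xdist sets this variable).
        worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
        dbname = f"test_{worker}"

        admin = psycopg2.connect("dbname=postgres user=postgres")
        admin.autocommit = True
        with admin.cursor() as cur:
            cur.execute(f"DROP DATABASE IF EXISTS {dbname}")
            cur.execute(f"CREATE DATABASE {dbname}")
        admin.close()

        conn = psycopg2.connect(f"dbname={dbname} user=postgres")
        # Step 3: build the schema once per session (illustrative schema).
        with conn, conn.cursor() as cur:
            cur.execute("CREATE TABLE users (id serial PRIMARY KEY, name text)")
        yield conn
        conn.close()

    @pytest.fixture
    def db(db_conn):
        # Step 4: run each test inside a transaction, roll it back afterwards.
        yield db_conn
        db_conn.rollback()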
Currently I'm using https://pypi.org/project/testing.postgresql/ which spawns a temporary server for each test suite; that has quite a lot of overhead.
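For reference, the basic pattern with that library looks like this; its README also documents a PostgresqlFactory with cache_initialized_db=True, which caches the initdb result and cuts part of the per-suite cost:

    import psycopg2
    import testing.postgresql

    # Each instance runs initdb and boots a throwaway server -
    # that startup is the overhead in question.
    with testing.postgresql.Postgresql() as postgresql:
        conn = psycopg2.connect(**postgresql.dsn())
        # ... run the suite against conn ...
        conn.close()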