I was hoping from the title that it aims for Postgres SQL compatibility, but I can't find that explicitly mentioned in the docs. It makes me realize what I really want is something like sqlite://memory that completely disregards speed and even persistence. Instead you could say, for example, "open an in-memory database that behaves like Postgres 9" and run your tests against it. With typical fixtures of 10 or so rows, you wouldn't even need query plans - just a naive filter+copy over whole tables for most operations.
No problem, I was curious as well and went hunting for it. This project does seem interesting - I'm a huge fan of database systems like SQLite and H2. There's something wonderful about being able to copy a simple file and share it with other developers.
One of the authors of DuckDB here: we use the PostgreSQL parser, and try to be generally compatible with the syntax used by SQLite, Postgres and MySQL. In some cases those are unfortunately mutually exclusive (e.g. null ordering - we have a PRAGMA to change this system-wide). Making the SQL dialect “as compatible as possible” with these systems is very much one of our goals, and if you find any incompatibilities feel free to open an issue :)
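For concreteness, here's what that pragma looks like from the Python client - a minimal sketch; the pragma name `default_null_order` and its accepted values are taken from the DuckDB docs, so double-check them against your version:

    import duckdb

    con = duckdb.connect()  # in-process, in-memory database

    # Switch the null-ordering convention system-wide (pragma name and
    # value per DuckDB docs; treat as an assumption for older releases).
    con.execute("PRAGMA default_null_order='NULLS_LAST'")

    con.execute("CREATE TABLE t (x INTEGER)")
    con.execute("INSERT INTO t VALUES (1), (NULL), (2)")
    print(con.execute("SELECT x FROM t ORDER BY x").fetchall())
    # With NULLS_LAST: [(1,), (2,), (None,)]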
I'd note that when moving from pg to $other, the thing that really trips me up isn't the syntax changes, it's the lack of ARRAY and ROWTYPE.
I'm not sure whether those are in scope for you but it'd be nice if the docs said "to be implemented" or "out of scope" somewhere ... and my apologies in advance if they do and I somehow missed it.
We already have support for LIST and STRUCT (which I think are equivalent to ARRAY and ROW, respectively). There is still some functionality missing there (notably storage, and several functions) - but the base functionality is there and the remainder definitely will be added.
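To give a feel for those types, a small sketch via the Python client (the literal syntaxes below are from current DuckDB docs and may postdate this comment):

    import duckdb

    con = duckdb.connect()

    # LIST - roughly DuckDB's counterpart to Postgres ARRAY.
    print(con.execute("SELECT [1, 2, 3] AS xs").fetchall())

    # STRUCT - roughly DuckDB's counterpart to a Postgres ROW type.
    print(con.execute("SELECT {'name': 'duck', 'legs': 2} AS s").fetchall())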
I would just create a per-test-file database (and delete it the next time the test runs). The overhead is very small (compared to booting Postgres) and it works exactly like production because it is exactly production.
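A minimal sketch of that setup as a pytest fixture, assuming a local Postgres superuser and psycopg2 (the database name and DSNs are illustrative):

    import psycopg2
    import pytest

    ADMIN_DSN = "dbname=postgres user=postgres"  # assumed local server
    TEST_DB = "myapp_test"  # illustrative: one such name per test file

    @pytest.fixture(scope="module")
    def db():
        admin = psycopg2.connect(ADMIN_DSN)
        admin.autocommit = True  # CREATE/DROP DATABASE can't run in a transaction
        with admin.cursor() as cur:
            # Drop the previous run's leftovers, then start fresh.
            cur.execute(f"DROP DATABASE IF EXISTS {TEST_DB}")
            cur.execute(f"CREATE DATABASE {TEST_DB}")
        admin.close()

        conn = psycopg2.connect(f"dbname={TEST_DB} user=postgres")
        yield conn
        conn.close()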
In general, I am not comfortable using an "almost equivalent" database for tests. Every database engine has its quirks, and if you don't know what they are, your program that passes its tests only "almost" works in production, which is annoying.
Oh that looks great, and very close to what I had in mind for the next upgrade of our in-house testing framework. Definitely going to give that a try soon. Thanks for posting it!
I would also love this. In the past I’ve used the H2 in-memory database for tests with Spring Boot applications.
The Hibernate ORM handles configuring the H2 schema, but H2 and Postgres aren't fully compatible, so I have to be careful not to rely on Postgres-specific features. I'm generally not testing database logic, just need a working persistence layer to get the tests running, but an in-memory Postgres implementation would be amazing.
Although it's something I'd love to write, supporting it over the long term as pure FOSS would probably be a sad experience. I wonder what model could be used here. Maybe something like Patreon, or "sponsor a feature" on an open-source project? Or a FOSS core plus proprietary Java/.NET bindings, since corps can pay?
Yeah, for sure this is the type of niche, high-effort work that would be used extensively by the closed-source world and yet struggle to find adequate support.
Same here, and I run the same set of migrations that run in production. To be clear, this is only done once per test session, not for individual tests, and the tests are written in such a way that they don't interfere with each other.
The overhead is actually pretty small, less than 10s. I'd say that's too much for unit tests, but well within the tolerable range for integration/functional tests. Compared with the time I'd spend hacking together some brittle and unrealistic in-memory alternative, I much prefer to use a real database.
Those solutions still have a high overhead. There's ACID compliance, serialisation in memory, maintaining indices, and many other layers. Compare that to an ideal testing solution with no initialisation cost, where an insert is literally: parse the query, add a new entry to a list, done.
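As a toy illustration of that "no layers" ideal (purely hypothetical - not a real engine, just the append-to-a-list insert described above):

    # A deliberately naive "table": insert is appending a dict to a list.
    class ToyTable:
        def __init__(self, columns):
            self.columns = columns
            self.rows = []

        def insert(self, **values):
            self.rows.append(dict(values))

        def select(self, predicate=lambda row: True):
            # Full scan: fine for fixtures of ~10 rows, no planner needed.
            return [row for row in self.rows if predicate(row)]

    users = ToyTable(["id", "name"])
    users.insert(id=1, name="alice")
    print(users.select(lambda r: r["name"] == "alice"))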
But you want to be testing against something that is as close as possible to the deployment environment. So if that means ACID, indices, etc., then that's what it is.
You can still implement those in a trivial way that behaves like production. For example, if some column has a unique index, just look at all rows and compare the new value - you don't need an actual index for it (and definitely not a fancy concurrent-access B-tree). For transactions/MVCC you can literally make a copy of everything.
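Continuing the toy sketch from above (hypothetical code, not a real engine): a unique constraint becomes a full scan, and a "transaction" becomes a deep copy:

    import copy

    class ToyDB:
        def __init__(self):
            self.tables = {}        # table name -> list of row dicts
            self._snapshot = None   # backup powering the "transaction"

        def insert(self, table, row, unique=()):
            rows = self.tables.setdefault(table, [])
            # "Unique index" without an index: scan every existing row.
            for col in unique:
                if any(r[col] == row[col] for r in rows):
                    raise ValueError(f"duplicate value for unique column {col!r}")
            rows.append(row)

        def begin(self):
            # "MVCC" by brute force: copy everything.
            self._snapshot = copy.deepcopy(self.tables)

        def rollback(self):
            self.tables = self._snapshot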
You shouldn't rely on any "just like X" system for tests; you should use X exactly, with the same config and version as production.
What you want is a Postgres server, but optimized for tests. Here's what I've done (a sketch follows the list):
1. One instance per pipeline
2. Parallelize by having separate databases per process
3. Build schema once
4. Before each test, purge the database (or run each test in a transaction so you can just roll back at the end)
5. Run in Docker, use tmpfs volumes for writes (no disk writes)
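Here's a sketch of steps 2-4 with pytest and psycopg2 (the worker-ID lookup assumes pytest-xdist, and all names are illustrative; step 5 lives in your Docker invocation, e.g. mounting the data directory on tmpfs):

    import os
    import psycopg2
    import pytest

    @pytest.fixture(scope="session")
    def db_conn():
        # Step 2: one database per worker process (pytest-xdist sets this variable).
        worker = os.environ.get("PYTEST_XDIST_WORKER", "gw0")
        dbname = f"test_{worker}"

        admin = psycopg2.connect("dbname=postgres user=postgres")
        admin.autocommit = True
        with admin.cursor() as cur:
            cur.execute(f"DROP DATABASE IF EXISTS {dbname}")
            cur.execute(f"CREATE DATABASE {dbname}")
        admin.close()

        conn = psycopg2.connect(f"dbname={dbname} user=postgres")
        # Step 3: build the schema once per session (illustrative schema).
        with conn, conn.cursor() as cur:
            cur.execute("CREATE TABLE users (id serial PRIMARY KEY, name text)")
        yield conn
        conn.close()

    @pytest.fixture
    def db(db_conn):
        # Step 4: run each test inside a transaction, roll it back afterwards.
        yield db_conn
        db_conn.rollback()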
Currently I'm using https://pypi.org/project/testing.postgresql/ which spawns a temporary server for each test suite; that has quite a lot of overhead.
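For reference, the basic pattern with that library looks like this; its README also documents a PostgresqlFactory with cache_initialized_db=True, which caches the initdb result and cuts part of the per-suite cost:

    import psycopg2
    import testing.postgresql

    # Each instance runs initdb and boots a throwaway server -
    # that startup is the overhead in question.
    with testing.postgresql.Postgresql() as postgresql:
        conn = psycopg2.connect(**postgresql.dsn())
        # ... run the suite against conn ...
        conn.close()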