
PostgreSQL Templates - kiwicopple
https://supabase.io/blog/2020/07/09/postgresql-templates/
======
majodev
We are using PostgreSQL templates to speed up our integration tests. This
allows us to have full-blown (and non-mocked) databases that are isolated for
each parallel-running test.

We have recently open-sourced our core server managing this (Go):
[https://github.com/allaboutapps/integresql](https://github.com/allaboutapps/integresql)

Projects already utilizing Go and PostgreSQL may easily adapt this concept for
their testing via [https://github.com/allaboutapps/integresql-client-go](https://github.com/allaboutapps/integresql-client-go)
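
For reference, the underlying mechanism is plain SQL and easy to try by hand.
Roughly (just a sketch with placeholder names; the tool automates and pools
this for you):

    -- one-time: build a template database by running all migrations/seeds into it
    CREATE DATABASE app_template;
    -- ... run migrations/seeds against app_template ...
    ALTER DATABASE app_template IS_TEMPLATE true ALLOW_CONNECTIONS false;

    -- per test (or per worker): clone it, run the test, throw it away
    CREATE DATABASE app_test_1 TEMPLATE app_template;
    -- ... run the test against app_test_1 ...
    DROP DATABASE app_test_1;

The clone is done at the file level inside the server, so no migrations or
index builds are repeated.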

~~~
randombytes6869
We have found H2 in Postgres mode to be the fastest way to test DB stuff, at
least with Postgres and Java. In Go you might lose most of the speed advantage
since it's not running in the JVM. It also doesn't support many of pg's
advanced features.

~~~
majodev
I'm generally not a fan of emulating database behavior through a mocking layer
while testing/implementing. Even minor version changes of PostgreSQL plus its
extensions (e.g. PostGIS) may introduce slight differences, e.g. in how indices
are used, function deprecations, the query planner, etc. It might not even be
an erroneous result, just performance regressions or slight sorting differences
in the returned query result.

We try to keep local/test and live as close as possible; therefore, using the
same database, with the same extensions in exactly the same versions, is a hard
requirement for us while implementing/testing locally.

However, I have no experience with H2 (or the modern Java ecosystem in
general), so I cannot speak to how closely its emulation resembles PostgreSQL
behavior.

------
KingOfCoders
We're using PostgreSQL templates to create a database for every integration
test worker (Jest).

This way every worker works on its own database. The speedup was 5x-7x, as
tests now run in parallel.
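
In practice that is little more than the following (a sketch; the template name
is made up, JEST_WORKER_ID is set by Jest for each worker):

    # once, before the test run: build the template from migrations/seeds
    createdb app_template
    # ... apply migrations/seeds to app_template ...

    # in each worker's setup: clone the template into a private database
    createdb -T app_template "app_test_${JEST_WORKER_ID}"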

~~~
jayd16
Do you store test data in the template?

My team uses a migration framework (in our case Flyway) so that we can easily
create completely fresh DBs on a factory-fresh (plus user roles) instance of
Postgres. We do this for our testing as well, which means the schema tooling is
always tested too. Is there a benefit to using templates? Are they faster,
perhaps?

~~~
tln
I'm using templates as well, for the speedup in testing. We build a PostgreSQL
Docker image with the migrations + seed data, then create a few databases for
parallel testing.

When migration or seed files have changed, build rules rebuild the image. That
way the price of running migrations (about 2 minutes for our Django app) isn't
paid on every test run.
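
The rebuild rule can be as simple as hashing the migration/seed files (a
sketch; the image name and paths are made up):

    # rebuild the test-db image only when migrations or seeds change
    new_hash=$(cat migrations/* seeds/* | sha256sum | cut -d' ' -f1)
    old_hash=$(cat .testdb.hash 2>/dev/null || true)
    if [ "$new_hash" != "$old_hash" ]; then
        docker build -t myapp-test-db -f Dockerfile.testdb .
        echo "$new_hash" > .testdb.hash
    fi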

Creating a DB from the template takes about 2 seconds.

------
thinkingkong
I've used this in the past for testing, depending on how much seeding was
required prior to the test and how much parallelism was needed. Other than
that I haven't been able to find any useful real-world scenarios for this
functionality. I suppose if you wanted to have a source database and a bunch
of children, and could prevent connections to the source, it might be
helpful... somehow?

~~~
Squid_Tamer
I worked on a SaaS project who's customers are companies in an old-fashioned
industry. A common requirement was that their data shouldn't be stored in the
same database tables as any other customer. They were fine with it being co-
located on the same cloud servers, using the same credentials... as long as
their data wasn't on the same tables. So I used postgres templates to make a
database per customer.

Didn't make much sense to me either but whatever pays the bills I guess.

~~~
outworlder
Just don't fall into the trap of creating a database schema per customer.
Having different databases is better (as TXID is not shared)

~~~
chrismorgan
Can you elucidate or provide some reference about this? I’m not familiar with
any problems in this space (it’s not clear to me what problems a shared TXID
could cause), and schemas seemed neat.

~~~
outworlder
That all depends on how many customers you expect to have in a single
database, and how many tables each one needs. EDIT: also, transaction volume.

Ok, so let's imagine your application requires 100 database tables.

If you have 1 customer, you have 100 tables. Every transaction performed in
this database (including updates) will increase the age(datfrozenxid) by one.
You now have 100 tables you need to vacuum.

If you only had one schema, that's no problem at all. Even with the defaults
and a high number of writes, the default 3 autovacuum workers will quickly
deal with it, even though they can only vacuum one table at a time.

Now your business is growing. 1 customer becomes 10, 100... 10000?

At 10k customers, you now have 1 million tables (maybe more, as TOAST tables
are counted). Every single transaction increases the 'age' of every table by
one. It doesn't matter that 99% of all tables have low TXID ages; the overall
DB "age" is that of the table with the highest count.

The default 3 autovacuum workers will definitely NOT cut it at this point
(even with PG launching anti-wraparound autovacuums). You can augment by
"manually" running vacuum, but in that case the overall database age will only
decrease once it's done vacuuming all 1 million tables, so start early (before
you reach the ~2.1 billion mark).
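
(For reference, you can watch how close each database is to that limit with
something along these lines:)

    -- distance of each database from the ~2.1 billion wraparound limit
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;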

Alternatively, you could write a script that sorts tables by age and vacuums
them individually. This may not help as much as it seems, as the distribution
will depend on how much work your autovacuum workers have been able to do so
far. Not to mention, even getting the list of tables will be slow at that
point.
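
(Something along these lines, run per database:)

    -- the oldest tables in the current database; vacuum these first
    SELECT relname, age(relfrozenxid) AS xid_age
    FROM pg_class
    WHERE relkind IN ('r', 't', 'm')
    ORDER BY xid_age DESC
    LIMIT 100;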

So now you have to drastically increase the number of workers (and other
settings, like work_mem), which may also mean higher costs, and you'll still
have to watch this like a hawk (or rather, your monitoring system has to).
There's no clear indication that the workers are falling behind; you can only
get an approximation by trending TXID age.

You can make this work, but it is a pain, even more so if you haven't prepared
for it. For a handful of customers, a small number of tables, or a small
number of transactions, this may not matter. Our production systems (~200
tables, excluding TOAST) started to fall behind after a few thousand
customers. We had at least one outage because we didn't even track this at the
time and the database shut down. Hindsight is 20/20, but nobody remembered to
add this to the monitoring system.

Another, unrelated problem with multiple schemas: database migrations become
much more complex, since you have to upgrade every schema. This is both a
blessing and a curse. Database dumps also tend to take forever.

------
thinkingemote
I use templates to create a "quick save" backup of a database during
development:

    createdb -T old_db new_db

~~~
silviogutierrez
Neat trick. Is this any faster than pg_dump and pg_restore? Note: there's
plenty of optimization you can apply to pg_dump and pg_restore; not all of it
is applied below.

Like so:

        function save_database() {
            mkdir -p /tmp/database-snapshots
            DATABASE_NAME=$PROJECT_NAME
            pg_dump -Fc "$DATABASE_NAME" >"/tmp/database-snapshots/$DATABASE_NAME.dump"
        }
    
        function restore_database() {
            DATABASE_NAME=$PROJECT_NAME
            DATABASE_DUMP="/tmp/database-snapshots/$DATABASE_NAME.dump"
    
            if [ ! -f "$DATABASE_DUMP" ]; then
                echo "No dump to restore"
                return
            else
                psql postgres -c 'SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE pid <> pg_backend_pid()'
                psql postgres -c "DROP DATABASE IF EXISTS \"$DATABASE_NAME\""
                psql postgres -c "CREATE DATABASE \"$DATABASE_NAME\""
                pg_restore -Fc -j 8 -d "$DATABASE_NAME" "$DATABASE_DUMP"
            fi
        }

~~~
gkop
Yes, it's significantly faster because it doesn't require rebuilding indexes
(among other reasons, probably). The downside is that it consumes more disk
space.

Stellar is a tool that wraps the template mechanism, and has some benchmarks:
[https://github.com/fastmonkeys/stellar](https://github.com/fastmonkeys/stellar)

~~~
jpitz
My understanding has always been that creating a database is essentially a
series of file copy operations from the template.

~~~
gkop
That may be true! The point is that the template already has the indexes built
and ready to copy, whereas a dump file does not.

------
stansler
Another approach is to use thin clones of Postgres to create disposable
databases for fast iterations.

We've built Database Lab, a tool that works on top of CoW file systems (ZFS,
LVM) and can provision multi-terabyte Postgres instances in seconds. The goal
is to use such clones to verify database migrations and optimize SQL queries
against production-size data. See our repo:
[https://gitlab.com/postgres-ai/database-lab](https://gitlab.com/postgres-ai/database-lab).

------
FanaHOVA
Is this similar to doing something like `rails db:schema:load` and then `rails
db:seed`? i.e. setting up a brand-new DB with a certain schema and preloading a
set of fixtures into it?

~~~
mccolin
The effect will be relatively similar -- you arrive at a new database with the
proper schema/structure in place, but the difference is in execution. Doing
this _within_ the database engine is likely 10x faster than loading the schema
through a client library that issues a series of CREATE and ALTER statements
(a la the schema:load rake task).
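
A common hybrid is to pay the client-side cost once and then clone inside the
engine, along these lines (a rough sketch; database names are made up, and you
may need to adapt how Rails is pointed at the template):

    # once: build the schema/fixtures the slow way, then freeze it as a template
    createdb myapp_template
    DATABASE_URL=postgres:///myapp_template bin/rails db:schema:load db:seed
    psql postgres -c "ALTER DATABASE myapp_template IS_TEMPLATE true ALLOW_CONNECTIONS false"

    # per test database: near-instant copy inside the engine
    createdb -T myapp_template myapp_test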

------
spapas82
Does anybody know why template1 isn't UTF-8 by default? Because of that, I
always need to create new databases from template0 to get UTF-8, e.g. like
this: `createdb -T template0 -E UTF8 newdb`

