
Show HN: PgRebase, codebase management for Postgres - _pctq
https://github.com/oelmekki/pgrebase
======
oelmekki
Hi everyone,

I've been playing a lot with postgresql lately, writing custom functions,
views, types, etc.

I find it cool, especially performance-wise. But one thing was annoying me:
migrations.

When I wanted to edit an existing function, changing its parameters, I would
have to retrieve the function in my generated structure.sql file, generate two
migration files (up and down), paste the function into the down file, paste it
again into the up file, copy the old signature into a drop function statement,
edit the function, then copy the new signature into a drop statement in the
down file. This was painful.
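
To make that concrete, here's roughly what such a pair of migration files
looks like (function, table and file names are made up):

    -- 20160901_change_user_count.up.sql
    -- drop using the *old* signature, then create the new one
    DROP FUNCTION IF EXISTS user_count(min_age int);
    CREATE FUNCTION user_count(min_age int, max_age int) RETURNS bigint AS $$
      SELECT count(*) FROM users WHERE age BETWEEN min_age AND max_age;
    $$ LANGUAGE sql;

    -- 20160901_change_user_count.down.sql
    -- drop using the *new* signature, then recreate the old one
    DROP FUNCTION IF EXISTS user_count(min_age int, max_age int);
    CREATE FUNCTION user_count(min_age int) RETURNS bigint AS $$
      SELECT count(*) FROM users WHERE age >= min_age;
    $$ LANGUAGE sql;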

With pgrebase, you can just write your pg codebase as plain files and have
pgrebase watch them. Any change is instantly loaded into the database. A
non-watch mode can be called on deployment, after your usual migration steps,
to sync your codebase.
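
For example, with a layout along those lines (directory names here are
illustrative, see the README for the exact conventions):

    sql/
    ├── types/
    │   └── user_role.sql
    ├── views/
    │   └── active_users.sql
    ├── functions/
    │   └── user_count.sql
    └── triggers/
        └── users_updated_at.sql

Each file holds the `CREATE` statement for its object, and pgrebase reloads
it whenever the file changes.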

Let me know what you think!

~~~
ddorian43
Can't you just, like, replace the function? And postgresql has the best
migrations: most of them are transactional.

~~~
_pctq
You can't use `CREATE OR REPLACE FUNCTION` if the function's signature changed.
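
A quick illustration with a throwaway function (the signature being the
function name plus its argument types):

    CREATE FUNCTION add_one(i int) RETURNS int
      AS $$ SELECT i + 1 $$ LANGUAGE sql;

    -- changing the return type fails outright:
    CREATE OR REPLACE FUNCTION add_one(i int) RETURNS bigint
      AS $$ SELECT (i + 1)::bigint $$ LANGUAGE sql;
    -- ERROR:  cannot change return type of existing function

    -- and changing the argument types doesn't replace anything: it
    -- silently creates a second overload next to the old function
    CREATE OR REPLACE FUNCTION add_one(i bigint) RETURNS bigint
      AS $$ SELECT i + 1 $$ LANGUAGE sql;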

~~~
pritambaral
Would `DROP FUNCTION IF EXISTS ...; CREATE FUNCTION ...;` be satisfactory?

~~~
oelmekki
It is, provided you pass `DROP FUNCTION` the exact previous signature of your
function.

Then, if you want to be able to revert your change, you also need to write a
down migration, with a `DROP FUNCTION` matching the exact signature of your
_new_ function and a `CREATE FUNCTION` that recreates the old one.

This is exactly the pain point my root comment describes and which pgrebase
solves :)

~~~
oelmekki
It's clear from this thread that the point of pgrebase may not be obvious if
you don't work with functions often, so let me provide a bit of context about
what comes _before_ pgrebase.

Why do we use migration tools?

Data is the most important thing for a business. You can lose your code and
recreate it; if you lose your data, your business is basically done. Yet your
code often needs changes to your database structure. So you write SQL files
describing those changes, because we can't just drop all tables and recreate
them on each deploy, we have to preserve data.

It is not really efficient to "just drop sql files in your codebase", because
the database does not live in that codebase, and you can't be sure the
databases on all your installations, local and servers, are up to date. Plus,
you can't just load those files in any order; there is an order to be
respected. For that reason, we use timestamped up migration files.

But then, it means that the full code related to a given table does not live
in a single dedicated file. Should you want to revert a change (especially
common in a dev env), how would you do it? For that reason, next to up
migration files, we have down migration files.

All of this works just fine, but it ultimately has a single purpose: syncing
schema without losing data.

If you want to manage functions in that context, it will be painful.
Postgresql won't allow you to replace a function if its signature doesn't
exactly match. That is, you can't change a function's signature; you have to
drop it first, then recreate it. With migrations, this means that when you're
migrating up, you need to drop the old function and create the new one; and
when you're migrating down, you have to drop the new function and recreate the
old one.

Remember how the purpose of all this is to not lose data? Well, you can drop
all your functions and recreate them, and you won't lose any data. This means
you can store them in dedicated files, which means you can revert files
individually. So this is what pgrebase is doing: managing all your
non-data-altering pg code so that it's not needlessly painful to use.

------
sbuttgereit
I think this is interesting and starting down a better path than migrations;
there are other tools which similarly keep schema files organized in a
structural way rather than as a timeline... a much superior approach.

Having said that, the dependency management here will have to get better
before I could consider it.

 _You can specify dependencies for files using require statement, provided
those files are of the same kind. That is, function files can specify
dependencies on other function files, type files can define dependencies on
other type files, etc._

This alone is a show stopper for me.

Say I have a table (ignore formatting and other formalities):

    CREATE TABLE test_table (
      id           uuid PRIMARY KEY,
      test_val     text,
      test_val_num numeric
    );

Now I need a PL/pgSQL function like so....

    CREATE FUNCTION testfunc() RETURNS void AS $BODY$
    DECLARE
      myTestRec test_table;
    BEGIN
      [do stuff]
    END;
    $BODY$ LANGUAGE plpgsql;

My function declares the myTestRec variable to be of type test_table, which
means that for the create function statement to work, the test_table table
must exist first. But according to the documentation, I cannot express that
dependency, or rely on it being recognized and enforced. Lack of more fluid
dependency management would be a show stopper for me.

I like that this tool is implemented in Go; the tool I use which takes this
sort of approach is Sqitch (http://sqitch.org/), and I hate the Perl
dependencies that need to be fulfilled to get it to install. But I couldn't
leave it for something like this until a greater number of scenarios are
supported.

[Edit for clarity]

~~~
oelmekki
Hi sbuttgereit,

You're totally right that the dependency graph handler is extremely crude, I
wrote it in one evening because I had functions depending on other functions
and this caused my import to fail. The dependency statements only exist to fix
that, for now, and that's indeed where I think there'll be the most work to
do.

You're also right that you still need a migration tool to manage table
schemas. I did not intend to replace migration tools; actually, I'm satisfied
with them as far as dropping/creating tables and adding/removing columns are
concerned. The focus of PgRebase is to manage pure, non-data-altering code,
which you may edit a lot, and this is what migration tools are not good at.
The managed codebase is flushed and recreated at will; I didn't want to manage
anything that would risk data loss. During deployment, pgrebase is expected to
be called after migrations: first you migrate your data, then pgrebase
regenerates the non-data code.

Something I probably should add to the documentation: the fact that files can
only require files of the same kind should not be a problem, because of the
order in which kinds are processed. First, pgrebase processes all custom
types, then views, then functions, then triggers. Usually, triggers depend on
functions (thus, they're already there), functions can use views or types
(already there), etc.
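
For instance, a trigger file can safely reference a function file, since
functions are loaded first (names below are made up; the users table itself
would come from your regular migrations):

    -- functions/set_updated_at.sql, processed in the functions pass
    CREATE FUNCTION set_updated_at() RETURNS trigger AS $$
    BEGIN
      NEW.updated_at := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- triggers/users_updated_at.sql, processed afterwards, so the
    -- function it references already exists
    CREATE TRIGGER users_updated_at
      BEFORE UPDATE ON users
      FOR EACH ROW EXECUTE PROCEDURE set_updated_at();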

The rule of thumb is: if it's something you edit a lot and that can be dropped
and recreated without data loss, then its place is in pgrebase.

~~~
sbuttgereit
Check out the Sqitch tool I mentioned: it's _not_ a migrations tool, it's very
much the sort of approach you're taking, but it is more complete and mature;
it has solved some problems you haven't yet. Yes, tables require special
handling, but they're still doable in an idempotent fashion in a single
"table" script using anonymous blocks.
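
Roughly like this, reusing the test_table from above (the extra column is
just for illustration):

    DO $$
    BEGIN
      -- create the table only on first run; later runs fall through
      -- to idempotent ALTERs instead of failing on CREATE
      IF NOT EXISTS (SELECT 1 FROM information_schema.tables
                     WHERE table_name = 'test_table') THEN
        CREATE TABLE test_table (
          id           uuid PRIMARY KEY,
          test_val     text,
          test_val_num numeric
        );
      END IF;

      IF NOT EXISTS (SELECT 1 FROM information_schema.columns
                     WHERE table_name = 'test_table'
                       AND column_name = 'test_val_extra') THEN
        ALTER TABLE test_table ADD COLUMN test_val_extra text;
      END IF;
    END
    $$;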

Also remember that in PostgreSQL, views and tables are de facto custom types,
created implicitly at table/view creation time, not at custom type creation
time. If PgRebase processes object kinds strictly in the order you say, what
you are really saying is that PgRebase can only support a certain subset of
PostgreSQL database designs. For example, if I have a table where a column is
DEFAULT myfunction('default_cust_type'), I need the function before the
table... if I have a view which retrieves the output of a function (maybe at
the top, in a CTE), I now need the function before the view... and if that
function also relies on table_a as a type, I need table_a -> function ->
view... you can't just assume the object kinds always come in a certain order
and everything will be OK. As a design decision... that won't work for a
number of databases.
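
Spelled out, that chain looks something like this (names are made up):

    -- 1. table_a must exist first: the function uses it as a row type
    CREATE TABLE table_a (id int PRIMARY KEY, cust_type text);

    -- 2. the function depends on table_a...
    CREATE FUNCTION myfunction(kind text) RETURNS text AS $$
    DECLARE
      rec table_a;
    BEGIN
      SELECT * INTO rec FROM table_a WHERE cust_type = kind LIMIT 1;
      RETURN coalesce(rec.cust_type, kind);
    END;
    $$ LANGUAGE plpgsql;

    -- 3. ...and both a table default and a view depend on the function
    CREATE TABLE customers (
      id        int PRIMARY KEY,
      cust_type text DEFAULT myfunction('default_cust_type')
    );

    CREATE VIEW customer_kinds AS
      WITH kinds AS (SELECT myfunction('default_cust_type') AS kind)
      SELECT kind FROM kinds;

Here the view needs the function, so any scheme that loads all views before
all functions already breaks.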

I would strongly advise you to build your dependency model on file references
rather than on PostgreSQL object kinds. This way you don't need to worry about
which objects are permitted to depend on which (since this can be quite fluid)
and can rely on the user of your tool to manage the dependencies. With that
approach you don't lose the structural file organization you are aiming for
now, while allowing database developers to be in control of the deployment
process.

~~~
oelmekki
Yes, having a better dependency resolver is something that will happen,
eventually. Edge cases will have to wait for that :) (the workaround,
obviously, is to use classic migration tools for those, since you still need a
migration tool anyway).

------
chridal
This is really cool! Having to manage functions manually is actually the
reason I avoid using them for the most part. Exciting!

~~~
oelmekki
Thanks! Same here, I would always hesitate before writing a function just
because of that pain. I went totally wild this week with pg after building
this :D

------
MarHoff
Hi, nice idea! I'm mostly a DB guy and I also face this kind of challenge, as
I'll soon need to share my pg codebase over different servers.

Until now I was more investigating the native extension mechanism of
PostgreSQL:
https://www.postgresql.org/docs/current/static/extend-extensions.html. But I
totally get that a more ad-hoc and versatile solution can be a better fit.

Don't be too rude, as it's really minimalist; I actually just published my
first tiny test on GitHub this afternoon:
https://github.com/MarHoff/Test

So I definitely had to say hi there ;)

~~~
oskari
Packaging your functions in an extension has the advantage that it allows you
to call remote functions when using the postgres_fdw foreign data wrapper.

PostgreSQL 9.6 added extension function pushdown as a new feature, allowing
functions to be executed on the remote side as long as the extension is
registered with the foreign server. This avoids having to fetch all the data
from the remote side just to execute a function locally; see
https://www.postgresql.org/docs/current/static/postgres-fdw.html#AEN181695

~~~
thejosh
Downside is that every developer who works locally has to have the up-to-date
extensions; or, if you use docker-compose or something and keep the extensions
in your repo/sub-repo, you could include them that way.

------
petepete
This looks great. I wrote something similar last year (almost identical
directory layout), but it wasn't anywhere near as clever; each database object
directory had an `up.sql` and a `down.sql` file, and depending on the 'mode'
in which the script was invoked (up, down, or refresh, which was down then
up), it ran the SQL scripts in order.

