
PostgreSQL example of self-contained stored procedures - weinzierl
https://sivers.org/pg2
======
sivers
I'm the author. Thanks to Ludwig for posting this here.

I've been doing all of my database work like this for 5+ years now, and love
it. Many more examples here:

[https://code.sivers.org/db-api/](https://code.sivers.org/db-api/)

It works great when you want to write simple shell scripts, too. They look as
simple as this:

[https://github.com/sivers/store/blob/master/getdb-example.rb?ts=2](https://github.com/sivers/store/blob/master/getdb-example.rb?ts=2)

People have been asking how I do unit testing. The main thing is to have a
fixtures.sql file, which is a dump of data to test against:
[https://github.com/sivers/store/blob/master/store/fixtures.sql](https://github.com/sivers/store/blob/master/store/fixtures.sql)

Then in the setup before each unit test, just drop the database schema, and
reload it:

[https://github.com/sivers/store/blob/master/test_tools.rb?ts=2](https://github.com/sivers/store/blob/master/test_tools.rb?ts=2)
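
In plain SQL terms, that setup step amounts to something like this (a minimal sketch, assuming the schema is named `store` like the repo's directory; the \i paths are illustrative):

    -- run before each test, from psql or the test runner's connection
    DROP SCHEMA IF EXISTS store CASCADE;  -- wipe tables, functions, triggers
    \i store/schema.sql                   -- rebuild the schema
    \i store/fixtures.sql                 -- reload the known test data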

Examples of my API unit tests here:

[https://github.com/sivers/store/blob/master/store/test-api.rb?ts=2](https://github.com/sivers/store/blob/master/store/test-api.rb?ts=2)

And also testing things like database triggers, and private functions:

[https://github.com/sivers/store/blob/master/store/test-db.rb?ts=2](https://github.com/sivers/store/blob/master/store/test-db.rb?ts=2)

Feel free to ask me anything here, or email me:
[https://sivers.org/contact](https://sivers.org/contact)

— Derek

~~~
1996
I read this a few days ago. I was waiting for it to show on HN because I have
a few questions.

I want to do the same, but with big databases (the TXID wraparound is a
problem every few months). For now, the queries are stored in the code. They
rarely evolve.

How do you manage versioning? Queries in code mean you can ensure everything
matches.

How easily do other people interact with that? Not many developers know SQL
well, and poor SQL skills can slow down a server for other users.

What did you gain in practice? It is nice, like CI/CD or version control, but
I see little gain for something that does not change very often.

Currently, most of the queries are materialized views that can be refreshed
separately. It seems simpler to me.

In practice, how is this different from refreshing materialized views?

~~~
sivers
Versioning: sorry, I'm not sure what you mean. Same as if you had queries in an
ORM and views using the results, you'd need to branch and make changes, then
merge the branch in all at once. I just do the same. If the database
structure changes, I run the ALTER TABLE commands when switching to that
branch.

Other people: I guess like anyone choosing a language, you're excluding those
who don't know it. I'm assuming more people know SQL than Rust or Elixir or
whatever. And more should.

Gain in practice: The two best improvements were:

(1) Having all data and data-logic (which might also be business-logic) in the
same place. No worrying that some external code somewhere might have old
data/business logic in it. One definitive source. Like Rich Hickey's classic
"Simple/Complex Hard/Easy" talk -
[https://www.youtube.com/watch?v=rI8tNMsozo0](https://www.youtube.com/watch?v=rI8tNMsozo0)
\- I like that this is simple, un-complex. The data and external code don't
need to be braided/complected together to work.

(2) The freedom to switch my external code from Ruby to Go or Elixir or
whatever, and not have to rewrite all that functionality. It's all just simple
API calls.

Sorry I haven't looked into materialized views yet, so I don't know how this
compares.

~~~
1996
Thanks a lot for your reply!

It seems the main gain for you is to move between languages and to allow
initial iterations/bugfixes without touching the application code.

The database has more maintenance issues (like rolling txids) than the
application code. I am not sure I want to add more complexity and potential
issues to the database.

FYI, a materialized view is a potentially long or complex query whose results
are cached, so you can say 'select * from complex_query_result' to get them,
and refresh the complex_query_result whenever you feel like it. You can also
update the query that generates complex_query_result.

In practice, MV can give you speed (as you can refresh the MV when you
need/want, while keeping the results) and also put the data-logic inside the
database (as the MV is defined initially, and can later be updated) if you
don't need super fresh results. If you do, use a regular view.

In either case, you can use the view approach when parameters are needed, iff
you can reduce your query to a WHERE parameter = something on the view.
Otherwise, you need a language like PL/pgSQL.
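
For readers following along, a minimal sketch of the mechanics (table and view names invented):

    -- cache the results of an expensive query
    CREATE MATERIALIZED VIEW complex_query_result AS
      SELECT customer_id, sum(total) AS lifetime_value
      FROM orders GROUP BY customer_id;

    -- cheap reads, with the "parameter" reduced to a WHERE clause
    SELECT * FROM complex_query_result WHERE customer_id = 42;

    -- recompute whenever you want fresher results
    REFRESH MATERIALIZED VIEW complex_query_result;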

As the materialized view queries just return results to be processed, and I
have very little extra to do, your approach seems like overkill for my use
case.

~~~
Twisell
Unless it changed very recently, be aware that you can't update the SQL
definition of a materialized view as easily as you would update a regular
view. There is currently no "CREATE OR REPLACE" option, so any dependency
built on a materialized view can quickly become a real pain (been there).
Sometimes it might be easier to stick to the old trick of a table updated by a
refresher function (possibly called as a trigger).
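
A minimal sketch of that old trick (names invented): a plain table plus a refresher function, which, unlike a materialized view's definition, can be swapped out at any time with CREATE OR REPLACE:

    -- a plain table standing in for the materialized view
    CREATE TABLE order_summary (customer_id int, lifetime_value numeric);

    CREATE OR REPLACE FUNCTION refresh_order_summary() RETURNS void AS $$
    BEGIN
      TRUNCATE order_summary;  -- rebuild from scratch on each refresh
      INSERT INTO order_summary
        SELECT customer_id, sum(total) FROM orders GROUP BY customer_id;
    END;
    $$ LANGUAGE plpgsql;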

~~~
mhd
Dependencies are a general problem, like changing a column type on a table
that has dependent views. It's a good idea if your database update
scripts/migration software can handle something like this. Other databases
don't tend to be as strict as Postgres here (I only found out after migrating
from Oracle that some rarely-accessed views never quite worked).

I've had good luck with these functions:
[https://gist.github.com/mateuszwenus/11187288](https://gist.github.com/mateuszwenus/11187288)

They let you save and drop dependent views (materialized or regular) and then
restore them after your updates.
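
If I read the gist right, usage is roughly this (the function names come from the gist; the table and column are illustrative):

    -- stash and drop every view that depends on public.my_table
    SELECT deps_save_and_drop_dependencies('public', 'my_table');

    -- now the change Postgres would otherwise refuse
    ALTER TABLE public.my_table ALTER COLUMN amount TYPE numeric(12,2);

    -- recreate the saved views in dependency order
    SELECT deps_restore_dependencies('public', 'my_table');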

~~~
Twisell
Deeply agree about dependencies being a general problem.

Thanks for the script, which looks pretty clever #bookmarking. I particularly
like the approach of "drop what you saved, no more no less". DROP CASCADE is
simpler but can have undetected side effects; if this script fails to back up
every dependency, you will logically get an error when attempting to delete
the target, and that rocks.

------
tashian
One unmentioned reason this is powerful: it avoids concurrency bugs and
inconsistencies introduced by write skew, in cases where the application layer
(e.g. Ruby/Python) makes business logic decisions based on data that has
become stale.

These are hardest to detect and avoid when business logic depends on no rows
matching (count(*) == 0), e.g. meeting room scheduling systems, because there
is nothing for a transaction to lock!

Stored procedures can avoid this entirely, even under the weaker transaction
isolation levels, by keeping all the logic in the DB and running it
atomically. At Zipcar we took this approach to writing the web reservation
system in the early 2000s. Later, it allowed us to easily add a telephone
reservation system without duplicating business logic.
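
One way to realize this in Postgres, as a hedged sketch (schema invented): lock the room row itself so the "no overlapping rows" check has something concrete to serialize on, then check and insert inside one atomic function:

    CREATE OR REPLACE FUNCTION book_room(p_room int, p_from timestamptz,
                                         p_to timestamptz) RETURNS boolean AS $$
    BEGIN
      -- the parent row gives concurrent bookings something to queue on,
      -- even under READ COMMITTED
      PERFORM 1 FROM rooms WHERE id = p_room FOR UPDATE;
      IF EXISTS (SELECT 1 FROM bookings
                 WHERE room_id = p_room
                   AND tstzrange(starts, ends) && tstzrange(p_from, p_to)) THEN
        RETURN false;  -- conflict: the count(*) == 0 condition failed
      END IF;
      INSERT INTO bookings (room_id, starts, ends)
        VALUES (p_room, p_from, p_to);
      RETURN true;
    END;
    $$ LANGUAGE plpgsql;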

It worked great back then and it still works great, if you have good ways of
managing the tradeoffs (version control, unit testing, etc).

~~~
Merad
I've worked on several apps that tried this approach, and each of them was a
mess. I've seen two fundamental problems with building your app on top of
stored procs:

* Tooling around SQL is generally inferior to what's available for <pick your favorite language>. I've yet to see a company with effective automated tests around their database... it's far more common to have _no_ tests around the database. Even if you're the unicorn that does have all of that figured out, it still tends to be more difficult for your devs to write and test code.

* Most devs are poor to mediocre when it comes to SQL. You're either going to have to hire specifically for SQL skills, or force your devs to do complex work (your business logic) in a toolset that they aren't that good at.

IMO this is a case where a few applications may have significant concurrency
issues that warrant the database business logic approach... but for your
average app, it’s unnecessary and makes life more difficult for your team.

~~~
ramraj07
This has always struck me as odd: how can _any_ competent developer suck at
SQL, especially after spending maybe a week or two trying to write logic in
it? As "languages" go, it can't get much simpler; you're directly forced to
think about data in a model that's extremely close to the data itself. It
seems to me that a dev who can't think well in SQL after minimal training is
not a good dev at all, and this could actually be a nice filter.

------
llimllib
I'm curious if anybody has practical experience developing a reasonably
trafficked website in a similar manner.

I wrote apps consisting mostly of Oracle stored procedures when I first got
out of college, and it was a pretty awful experience, but it was also at a
place where development knowledge was minimal.

I've since been getting closer and closer to writing postgres in this way. My
current app (which I architected) sticks close to postgres and uses many of
its features like custom types, enums and json, but we don't rely on it for
stored procedures at all. Instead, all application logic lives in the go API
app.

I've been considering moving some things directly into the database, but I'm
nervous about it because the app runs well and I like the data/logic division.
Also, Go seems easier for new developers to learn than SQL, and it's a lot
easier to deploy a new app server than it is to reconfigure a database server.

~~~
thijsvandien
Every once in a while I encounter somebody who heavily promotes that way of
working. There are a couple of arguments I've heard in favor of it. Often it's
a mix of performance, ultimate data integrity, and "simplicity". I just don't
buy it.

If performance is so critical, whatever you gain by moving everything to the
database is lost to the fact that databases are harder to scale. You may be
able to get by with a single instance a little longer, but when you eventually
can't anymore, you've worked yourself into a corner. It's an 80/20 game
anyway, so just use the occasional stored procedure when it's really worth it.

For data integrity, I've always found databases not expressive enough. Sanity
checks, okay, but you can't capture a non-trivial domain in some custom data
types, check constraints and foreign keys. Even if you introduce stored
procedures that will be responsible for keeping everything proper, you need to
go crazy with permissions to block circumventing those. Might as well build
your own API then. (I do still find it difficult to decide where to draw the
line on what to enforce in the database.)

"But you don't need another API! Just use the database as one!" Then I ask how
they do testing, and the answer basically is: "We don't make mistakes or we
find out about them (in production) soon enough." That pretty much ends the
discussion for me. Surely there must be ways to devise a basic testing
framework for stored procedures, but why bother? I don't want to spin up a
database just to test some logic. Never mind testing, what about refactoring?

From a theoretical point of view, I can see the potential, but practically...?
The tooling isn't there, and perhaps that's for a reason. Maybe databases are
just not supposed to be used that way. Despite the extra functionality they
offer, I treat them mostly as a data _store_.

What am I missing? I would love to be proven wrong.

~~~
dagss
We write most code in the backend (Go), but we DEFINITELY spin up a new, real
SQL database for every test in our automated test suite. ("CREATE DATABASE" is
rather cheap; use a fresh docker DB instance for each suite run, or run it
permanently while developing.)
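
A minimal sketch of the pattern (names invented), using Postgres template databases so each test starts from a known state:

    -- once: build a template with the schema and fixtures loaded
    CREATE DATABASE myapp_template;
    -- ...load schema.sql and fixtures.sql into myapp_template...

    -- per test (or per suite): cloning a template is cheap
    CREATE DATABASE myapp_test_42 TEMPLATE myapp_template;

    -- teardown
    DROP DATABASE myapp_test_42;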

Mocking away the DB is both an inefficient way to work and means the tests
cover less. I will never again write software without running tests against a
real DB.

Our setup totally would let you test anything in stored procedures (although
we do not use them much).

~~~
marcus_holmes
Agreed on mocking the database - I wrote an entire test suite based on
mocking the database responses, and then found a ton of live problems caused
by the database-backend interaction (things like structs not aligning with
the actual records returned from functions, the input format for arrays of
ids differing between the mock and the actual driver, and hstores).

Now I have a test database, which I create from a version-controlled
schema.sql, and each test creates its own fixtures in the test db. Works way
better.

------
bfung
This is an old pattern (like early 2000s MS SQL app development) and works
wonderfully for the majority of applications.

It does become complex and awkward in two cases: 1) variable option selection
and 2) at a certain "scale" / quantity of inserts/deletes.

Variable option selection

Take for example a search interface of dependent items and item options, like
cars. The query for car options for different car models generally returns a
variable list of options. Further filtering results based on the variable list
of options, when the list of options is all in one table, is most efficient
with an "in clause". Trying to represent the "in clause" using
functions/stored procedures is a little awkward as there is list construction
and deconstruction involved, and sql is not great at list/string parsing.
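
For what it's worth, array parameters take some of that pain away in Postgres - a hedged sketch (names invented) where ANY(...) plays the role of the "in clause" with no list construction or parsing:

    CREATE OR REPLACE FUNCTION cars_with_options(p_option_ids int[])
    RETURNS SETOF cars AS $$
      SELECT * FROM cars
      WHERE id IN (SELECT car_id FROM car_options
                   WHERE option_id = ANY(p_option_ids));
    $$ LANGUAGE sql STABLE;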

Number of inserts/deletes

At some point, a successful app will reach a point where it's more efficient
to batch inserts & deletes -- a couple hundred rows at a time. This goes back
again to using "in clauses", and the stored procedure approach makes things a
bit awkward, as the stored procedure will then need to parse and check its
arguments.
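
Arrays help here too; a minimal sketch (names invented) that batches a few hundred rows per call by unnesting parallel arrays:

    CREATE OR REPLACE FUNCTION add_items(p_ids int[], p_names text[])
    RETURNS void AS $$
      INSERT INTO items (id, name)
      SELECT * FROM unnest(p_ids, p_names);  -- zips the arrays row-wise
    $$ LANGUAGE sql;

    -- batch delete, same idea
    DELETE FROM items WHERE id = ANY(ARRAY[1, 2, 3]);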

Not saying what's done here is bad - it's actually really good for many, many
reasons. I'd actually recommend the approach and avoid what I've described
above unless it makes the desired functionality hard and complex.

------
grizzles
This is how I do relational databases in 2019. My database is a self contained
module more akin to an rpc api than a database. My code has virtually no
database logic in it at all. I don't know but it feels like functional
databases could be a thing. We should call Martin Fowler and ask.

~~~
BubRoss
How is that not how databases are typically used?

~~~
pmontra
Returning JSON to a client is relatively recent. The standard for web apps was
to get data from the db and generate HTML on the server. A JSON layer would be
a useless overhead.

Furthermore, just about everybody knows how to use an ORM. Not many people can
write stored procedures. Maybe the question is why that is so. I've heard "I
know [language], I don't know SQL, so I write queries in [language]" so many
times, even from people I would have expected to have much broader knowledge.
(The same applies to networking technology: they know HTTP and nothing more.)

By the way, the code doesn't address authentication and authorization; those
go in the layer between the client and the database.

~~~
ramraj07
Do you think it's an acceptable excuse for a dev to say they don't know SQL?

~~~
pmontra
No, it's not acceptable. Still it's evident that it doesn't prevent getting a
good job.

------
mslot
Stored procedures can also be a great way to scale out transactional
workloads.

In Citus (a sharding extension for Postgres by Microsoft) we recently
introduced a stored procedure call delegation feature.

If your tables are distributed by person_id and you have a stored procedure
that takes person_id as a parameter, you can run e.g. "SELECT
create_distributed_function('lineitem_add', 'person_id')". The procedure calls
for a given person_id value will then be fully delegated to the PostgreSQL
server that stores the data for that person_id without additional round-trips
between statements or distributed planning overhead. It'll "magically" take
care of procedures that access data from other servers as well.
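
Roughly the full flow, if I have the API right (the function signature here is illustrative; check the Citus docs for the exact form):

    -- shard the table by person_id across the cluster
    SELECT create_distributed_table('lineitem', 'person_id');

    -- delegate calls to the shard that owns the given person_id
    SELECT create_distributed_function('lineitem_add(int, int)', 'person_id',
                                       colocate_with := 'lineitem');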

I realize we're basically going full circle to PL/Proxy, except in Citus
almost everything else (distributed queries, DML, DDL, transactions) works as
well.

------
chatmasta
I’ve been using this pattern recently. Very early stages but I like it so far.

I wrote some boilerplate tools for easily loading schemata and extensions.
This is a must-have for being able to write isolated tests.

I keep all data definitions in schemata, which can be managed by standard
migration scripts. I keep function definitions in extensions, which do not
create any data stores but expose functions with well defined interfaces for
querying the underlying data. I rely on standard Postgres extension loading to
manage migrations.

I really like this pattern, but there is a degree of unapproachability to it.
It takes a lot of effort to make sure tests work well - but it’s not so bad.
Since your DB functions do everything, the app code is just a shell around
those functions (few lines of code to call each function), so by testing those
functions you are exercising their Postgres counterparts.

With tools like postgraphile and Postgrest available, this is becoming a more
appealing option. What we need is more tooling for managing deployment,
installation, migration, verification of and communication with the stored
procedures from within application code. Combined with Postgres PL libraries
like plv8, it should be possible to write JavaScript code within your
application toolchain that you can test locally, but deploy to run in Postgres
context.

------
jimktrains2
I've been working on some projects and offloading as much as I can to the
database.
[https://github.com/jimktrains/jskerp](https://github.com/jimktrains/jskerp)
and
[https://github.com/jimktrains/jskplan](https://github.com/jimktrains/jskplan)
being the two I'd like to get further along with.

Working in the DB like this is both very nice and ... very odd? It doesn't
feel like most development environments; it feels like going back to C or
pre-object Ada (I believe PL/SQL is based on Ada syntax-wise) or Pascal, where
you're writing lots of global functions that operate on structs.

Once I gain a little more experience developing like this, I'd love to think
about that and see if there'd be a way to build a language that feels more
"modern" and that can be transpiled to SQL (including building diffs from a
current database).

------
dillonmckay
How would somebody implement this for a team with versioning, source code
view, and deployment?

~~~
msvalkon
One can create the stored procedure and store it in a .sql file, which can be
version-controlled like any piece of code. These can then be deployed in a
number of ways.
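
For instance, a file like functions/get_user.sql in the repo (illustrative), written so the deploy script can re-apply it on every release:

    -- functions/get_user.sql -- idempotent, safe to re-run on deploy
    CREATE OR REPLACE FUNCTION get_user(p_id int) RETURNS json AS $$
      SELECT row_to_json(u) FROM users u WHERE u.id = p_id;
    $$ LANGUAGE sql STABLE;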

------
i_s
This kind of approach can be good for getting data into the database,
especially for non-trivial updates, but is not good for getting data out. The
problem is if you want things like custom group-bys, sorts, joins, windowing,
etc, you end up reinventing a lot of concepts, or copy+pasting procedures with
slight variations. It basically takes SQL's biggest issue, query
composability, and makes it even worse than it already is.

~~~
EB66
> The problem is if you want things like custom group-bys, sorts, joins,
> windowing, etc, you end up reinventing a lot of concepts, or copy+pasting
> procedures with slight variations.

I could see how you could get that impression if you've worked on a project
with poorly planned functions, but our experience has actually been the
opposite of that. On the contrary, using PostgreSQL functions has allowed our
company to be more consistent and declarative with our return data and has
helped us eliminate repetitive SQL code. Particularly so when used in
conjunction with our RESTful APIs.

Don't forget that in Postgres you can define custom composite return types
(i.e., a return type that includes another return type). You can have a
Postgres function that returns nested objects like this (I'm using JSON for
clarity):

[ "name":"Bob" ,"age":13 ,"address":"123 Main St" ,"cars":[ { "make":"Toyota"
"model":"Camry" "license":"NF98549" }, { "make":"Toyota" "model":"Corolla"
"license":"NF5649" }, ] ]

The ability to enforce that kind of return type for a given function is
incredibly helpful.
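
That shape is straightforward to produce with the built-in JSON aggregates - a hedged sketch (table names invented):

    CREATE OR REPLACE FUNCTION person_with_cars(p_id int) RETURNS json AS $$
      SELECT json_build_object(
        'name', p.name, 'age', p.age, 'address', p.address,
        'cars', (SELECT coalesce(json_agg(json_build_object(
                          'make', c.make, 'model', c.model,
                          'license', c.license)), '[]'::json)
                 FROM cars c WHERE c.person_id = p.id))
      FROM people p WHERE p.id = p_id;
    $$ LANGUAGE sql STABLE;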

> It basically takes SQL's biggest issue, query composability, and makes it
> even worse than it already is.

I think the bigger problem is ORMs like SQLAlchemy, which not only introduce a
new layer of abstraction but also empower engineers to request data from the
database in a completely ad hoc, "unregulated" manner. In large projects
without a strong lead or code review process, that can quickly turn into a
spaghetti mess that seriously complicates database refactors.

When you use functions, you have clearly defined inputs and outputs, better
re-usability and much more consistent interactions between the application
layer and database layer.

------
nneonneo
I’m reminded of the fully-functional HTTP web server (!!) implemented in pure
PostgreSQL for the PlaidCTF competition this year:
[https://cr0wn.uk/2019/plaid-triggered/](https://cr0wn.uk/2019/plaid-triggered/)

It does everything in SQL - header parsing, HTTP routing, and even includes a
micro templating engine for generating pages.

------
ris
As neat as this is, I'm not sure I'd recommend it - it's unclear what the
plan would be when it comes to scaling.

~~~
segmondy
Tell us about your scaling problem. Do you really have a scaling problem?

~~~
ris
Ok, where does this go when it needs to grow beyond a single machine? Is it
time to set up replication? Partitioning?

~~~
segmondy
How many orgs do you know that need to grow beyond a single machine, let alone
an indie developer? This post was from an indie developer, not some big corp
engineering blog. I don't know what you mean by partitioning; I'm assuming
sharding and not partitioning/splitting of tables, because this approach is
not mutually exclusive with that. MySQL and PostgreSQL can scale vertically
very far if designed well. 99.99% or even more never need to scale
horizontally. Nevertheless, this can be carefully scaled horizontally a bit.
You grow your replicas. Your main DB will be your transactional DB. All
procedures doing inserts/updates/deletes and transactional selects will call
that DB. All non-transactional SELECTs will call the read-only replicas,
treating those as your analytical tier. You can easily have something like one
main DB and 3-5 read-only DBs. Add caching, add queues, and you can have a
system that serves 1+ million users daily.

------
dandigangi
Why are stored procs still used? My experience with them being a black hole
(or really, I should say: not in version control) pushed me away from them.

~~~
detaro
Why not put them in version control, and have your deployment process pull
them out of the repo and apply them to the database?

~~~
ben509
It's a bit more complicated than just "upload all my stored procedures" in
practice. I think we need a version control / deployment system designed for
databases to make this work.

We want it to regard the database as the source of truth, and understand that
while you can blow away a development database, you can't do that to prod.

We want it to be able to track which clients are using which stored procedures
so it can prohibit dropping stored procedures that are in use. Ideally, when
those clients upgrade to newer versions, it's then able to drop them
automatically.

Stored procedures can bind to tables, views, types, etc. so it needs to be
able to recommend migration plans.

~~~
cryptonector
I don't get it. You have a source file. It has your stored procedures and
whatnot. You deploy it. You have the server startup load it. Done.

What's the problem?

Well, you might complain that some schema changes are hard to make this way,
and that's true, but for stored procedures it's straightforward, and even for
other things you can manage and make the update atomic (though it'd be nicer
to have better SQL language support for schema changes).

As with everything, you need to mind backwards compatibility. This is true for
C, C++, Rust, Java, and SQL/PlPgSQL. There's nothing special about this case.

~~~
ben509
> You have the server startup load it.

We have to take down a production database to do this?

Or do you mean the app server is going to do it, in which case how do you
handle different app servers with different versions of the stored procedure?
Especially if we're doing phased deployments?

~~~
cryptonector
No, you can load it at any time. Just write (and test) SQL that is safe to
load for upgrading a schema.
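
That is, write the upgrade file so re-running it is harmless - a minimal sketch (names invented):

    -- safe to load at any time, against any version of the schema
    CREATE OR REPLACE FUNCTION latest_login(p_user int)
    RETURNS timestamptz AS $$
      SELECT max(at) FROM logins WHERE user_id = p_user;
    $$ LANGUAGE sql STABLE;

    -- guarded schema change: a no-op on re-run
    ALTER TABLE users ADD COLUMN IF NOT EXISTS nickname text;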

------
omani
PostgREST, for those who don't know.

