
Ask HN: Have you ever used PL/pgSQL – the PostgreSQL programming language? - vanilla-almond
If you have, what was your experience using it? Was it positive or negative? When do you think it is suitable to use it?

Here is an overview of the language:

https://www.postgresql.org/docs/10/static/plpgsql-overview.html
======
tomlock
When I didn't know any better I wrote a finite state machine in it.

In my opinion, try to use as much plain SQL as you can first. Functions are
often an optimization barrier for the PostgreSQL query planner. The PostgreSQL
implementation of SQL has a bunch of neat functionality that replaces things
that often happen in apps, like window functions, and the docs are so great.

They are hard to debug. Try to keep their functionality small.

My experience, overall, is positive. Consider using it if the amount of data
transferred to the app and evaluated there is getting absurd. But try to
creatively solve the problem with pure SQL first before moving to functions.

~~~
arisb
Functions don't have to be optimization barriers. If you mark them as stable or
immutable, I believe Postgres will inline them wherever they are used, so that
they can be optimized.
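For illustration, a minimal sketch of an inlinable function (the table and column names are made up): a plain SQL function marked stable, whose body is a single SELECT, is a candidate for inlining, so filters from the calling query can still reach the underlying table's indexes.

    create function active_users()
    returns setof users as $$
        select * from users where active
    $$ language sql stable;

    -- The planner can inline the function body into the calling
    -- query, so this filter can still use an index on users.email:
    select * from active_users() where email = 'a@example.com';

Note that this inlining applies to LANGUAGE sql functions; PL/pgSQL bodies are not inlined.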

~~~
davidgould
Optimization barrier means something a bit different, it means that the
optimizer can't push down filters or change the join order, or flatten
subqueries into joins. That is, it splits the query into parts that are
separately considered for rewriting and plan selection.

In postgres, CTEs are optimization barriers, that is, the implementation plans
and executes the CTE separately and the main query consumes the result of the
CTE. Pure SQL functions can sometimes be inlined, that is, made part of the
containing query, but functions in the other languages cannot. In particular,
if a function runs queries itself, those happen in a new execution context
(portal) and don't share anything with the caller's plan.

~~~
tomlock
These are both great observations.

VOLATILE/STABLE/IMMUTABLE is definitely a choice worth considering and making
in an educated way when writing a function.

Also, using WITH to create CTEs makes SQL easy to read, but sometimes will
slow down a query considerably. I have a feeling that new releases will make
attempts to mitigate this issue in some cases.
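As a rough sketch of the CTE issue (hypothetical table and columns; behavior as of Postgres 10, where CTEs are always materialized):

    -- The CTE is planned and executed on its own, so the outer
    -- filter on user_id is not pushed down into the scan:
    with recent as (
        select * from big_table
        where created > now() - interval '1 day'
    )
    select * from recent where user_id = 42;

    -- Rewritten as a plain subquery, the planner can combine both
    -- filters and use an index on (user_id, created):
    select *
    from (select * from big_table
          where created > now() - interval '1 day') recent
    where user_id = 42;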

------
dizzystar
I've used it. There are times when milliseconds count, and nothing else will
get you to that speed. I've taken things that were running for 30 seconds in
app logic and reduced them to 500 ms. It's great for solving these little
hot spots. I also think the JSON processing is excellent.

I think it is pretty simple to use, but I've also had to fix a lot of poorly-
written PL/pgSQL. There are a LOT of nuances to the language that can blow your
feet off, and the language doesn't really scale out to multiple people working
on it. You just have to find one person who is good. You also need an editor
with snippets because the language is very verbose.

Personally, I'd rather use PL/pgSQL than any off-the-shelf ORM, but as I said,
it wouldn't scale to more employees very well. The language solves one of the
main use-cases of ORMs via the use of entities.

~~~
Something1234
PL/pgSQL supports entities? This is kind of amazing. Do you have a resource to
pick this skill up?

~~~
dizzystar
I guess I should be more specific about what I mean by entities. I probably
used the wrong word here:

[https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN](https://www.postgresql.org/docs/10/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN)

PL/pgSQL's format() function gives you %I and %L, which let you safely
interpolate identifiers (table and column names) and literals into dynamic
SQL. Basically, if you want to `select * from %I`, you have the option to
target a specific table at run time.
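As a minimal sketch of dynamic SQL with format() (the function and table names here are made up): %I quotes its argument as an identifier and %L as a literal, which protects EXECUTE from SQL injection.

    create function count_rows(tbl text) returns bigint as $$
    declare
        n bigint;
    begin
        -- %I safely quotes tbl as an identifier
        execute format('select count(*) from %I', tbl) into n;
        return n;
    end;
    $$ language plpgsql;

    select count_rows('orders');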

~~~
Something1234
So what's the best way to leverage this feature in CRUD applications?

------
mr_toad
A lot of data scientists and analysts favour PostgreSQL for ETL and heavy duty
data manipulation. PL/pgSQL lets you perform fairly complex tasks on datasets
that may be too large to handle in memory (e.g. too big for R and pandas).

And unlike a lot of ETL and reporting tools it doesn’t involve a lot of
pointing and clicking. You can put PL/pgSQL in version control along with your
R and Python code. That’s very handy for repeatability.

------
kennytilton
I wrote an entire BPM run-time automation in PL/pgSQL. The BPs were stored as
PG rows, the deciding data was in the PG DB, and PG triggers automated quite a
bit, so the fit seemed perfect. The strong static typing was annoying (I am a
Lisper) but did have the usual payoff: if it runs at all it is probably right.
Debugging over a socket was rough, but I could take the failed query, drop it
straight into a pgAdmin query, and get back a very clear error. The guy
building the BP front-end picked up what I had done in a few days, and he had
never done PL/pgSQL before.

It took a week to code up, another 2-3 for expressing the initial huge BP and
a random BP test driver. Worked great.

My only question is whether I would do it again the same way. I liked that
there was not some external system that had to be running for the BP to run,
but it is not hard to arrange that, and then one gets a much more accessible
programming environment (such as Lisp). I guess I will not know until I get to
do it again.

~~~
mooreds
How did you handle deployment and version control?

~~~
kennytilton
I left before the system got much use, so grain of salt and all that. It was a
Rails shop so deployment would have been by a migration. Not sure what you
mean by version control. The source was in git. Had we refactored
significantly we would have had to write a migration. But the system was in
the end quite simple. I refer to it as "BPM" but real BPMN is a bloated
monstrosity -- I just captured the essence, and that was dead simple. One fun
thing I did not mention was implementing "funcall" in pgsql. That meant that
edge cases could be covered by writing a custom pgsql function that would do
exactly what was needed. That meant the rest of BPM could stay simple.

------
mifreewil
I wrote a basic bitcoin exchange using triggers and PL/pgSQL back in 2012. You
would insert into an orders table and an after-insert trigger would run the
order matching algo written in a PL/pgSQL function. It wouldn't have scaled up
too well, I suspect, but it worked really well for the initial product that
allowed us to acquire initial customers and raise a Series A.
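A rough sketch of that shape (the table, function, and matching logic here are hypothetical, not the actual exchange code):

    create function match_order() returns trigger as $$
    begin
        -- the matching algorithm would go here, pairing NEW
        -- against opposing rows in the orders table
        update orders
           set matched = true
         where id = NEW.id;  -- placeholder for the real matching logic
        return NEW;
    end;
    $$ language plpgsql;

    create trigger orders_after_insert
        after insert on orders
        for each row execute procedure match_order();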

I would say my experience was pretty positive; I would consider doing
something like this again for the right use case. It saved a lot of back-and-
forth queries between the app and database and allowed the logic in the app to
be much simpler than it otherwise would have been, but at the cost of insert
throughput into the triggered tables.

If you are at all considering this route, I would recommend the excellent
pgTAP project for testing: [https://pgtap.org/](https://pgtap.org/)

I used pgTAP, along with the CLI utility pg_prove, to run tests.
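A minimal pgTAP test file might look like this (the table and check here are made up; plan, has_table, is, and finish are real pgTAP functions):

    begin;
    select plan(2);

    select has_table('orders');
    select is(
        (select count(*)::int from orders where qty <= 0),
        0,
        'no orders with non-positive quantity'
    );

    select * from finish();
    rollback;

Run with something like `pg_prove -d mydb tests/orders.sql`.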

------
mtmail
We use it extensively for [http://nominatim.org/](http://nominatim.org/) It's
superior when writing loops or even recursive queries that usually require
follow-up queries (and thus multiple back-and-forths between your app and the
database). Debugging and test-driven development, on the other hand, are hard.
So is versioning functions and updating them. I'd only recommend it if all
other options for improving the performance of database queries have been
tried.

[https://github.com/openstreetmap/Nominatim/blob/master/sql/functions.sql](https://github.com/openstreetmap/Nominatim/blob/master/sql/functions.sql)

~~~
xaranke
Very cool. Out of curiosity, what would be the best way to get involved in
Nominatim?

~~~
mtmail
The email list
[https://lists.openstreetmap.org/listinfo/geocoding](https://lists.openstreetmap.org/listinfo/geocoding)

A list of small
[https://github.com/openstreetmap/Nominatim/issues?q=is%3Aissue+is%3Aopen+label%3Asimple](https://github.com/openstreetmap/Nominatim/issues?q=is%3Aissue+is%3Aopen+label%3Asimple)
and larger projects
[https://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2018/Project_Ideas#Nominatim](https://wiki.openstreetmap.org/wiki/Google_Summer_of_Code/2018/Project_Ideas#Nominatim)

~~~
xaranke
Cool, will take a look. Thanks!

------
kumarvvr
To those who have extensive experience using PL/pgSQL in production: for a
REST API application, do you recommend putting business logic in PL/pgSQL?

~~~
combatentropy
I like to put all intelligence in the database instead of the middle language
(PHP, Python, Node, whatever). People argue about whether to put all your
intelligence and business logic in your database, and I think you should
google it and read about it for a long time. I say the moment you make a
column unique or add a foreign key, you have already begun.

What most people do is keep part in their database and part in their middle
scripts. And it's not so much by design as by how well they knew SQL at the
time. People typically learn the middle language first, be it Python or PHP or
what have you. They know how to write the business logic in that language, and
they only have so much time and energy, and I don't blame them. Learning SQL
in general and Postgres in particular takes a long time, but the more I've
learned, the more I move.

Now that doesn't mean PL/pgSQL, because I like to use the simplest tool that
does the job. From simplest to most complex, you can:

1\. Put it in the table definition, things like: data type, null or not null,
unique, foreign key, and check constraints:

    
    
        create table orders (
            id serial PRIMARY KEY,
            item int NOT NULL REFERENCES items,
            qty int NOT NULL CHECK (qty > 0),
            ...
        );
    

2\. Put it in a database view definition. If your SQL is getting long and
complex, save it to a view, so your application doesn't have a tower of SQL in
its source code. Views can also warn you when you're changing tables in a way
that will break something ("You can't drop this column because this view
depends on it, you can't change this column's type to that because a view
depends on it, etc."). Of course you can drop and recreate the view after
changing the table, but at least it warns you. Other times, like if you rename
a column, I think the view updates automatically. None of this would happen
automatically if the SQL is hardcoded in your Python script.

3\. Put it in an SQL function (SQL, not PL/pgSQL).

4\. Put it in a PL/pgSQL function.

That's the order I take, from smallest hammer to largest.
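To illustrate the jump from step 3 to step 4 (the order_lines table and its columns are invented for the example): the two definitions below compute the same thing, but the plain SQL version can be inlined by the planner, while the PL/pgSQL version buys you variables, branching, and loops at the cost of being opaque to it.

    -- 3. plain SQL function:
    create function order_total(oid int) returns numeric as $$
        select sum(qty * price)
        from order_lines where order_id = oid
    $$ language sql stable;

    -- 4. the same thing in PL/pgSQL (only worth it once you
    -- actually need procedural control flow):
    create function order_total2(oid int) returns numeric as $$
    declare
        total numeric;
    begin
        select sum(qty * price) into total
        from order_lines where order_id = oid;
        return total;
    end;
    $$ language plpgsql stable;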

~~~
kumarvvr
Thanks for the detailed reply.

What is your observation on the performance of such a system, when compared to
systems where the middle language has all the logic and the database is used
just as a dumb datastore?

Also, what are your observations on the scalability of such a system you have
mentioned above?

~~~
combatentropy
> performance

Performance is usually better in the database, because the data is local.
However, like any language, there is more than one way to say it in SQL. When
the query is complicated, with joins and subselects and aggregates, I have
made it literally a thousand times faster by rearranging and rewriting things.
I'm not even talking about adding indexes yet.

> scalability

I think 99% of applications could scale just fine with Postgres.

------
krueger71
Positive, I would say. Postgres in combination with PostgREST, using database
views as resources with a sprinkle of PL/pgSQL in INSTEAD OF triggers, can get
you a performant REST API real quickly (and still keep your relational data
model private).

------
Tostino
Yeah, I've written a ton of PL/pgSQL for the application I've been developing
for the past 5 years. In some cases it is much, much faster than any other
solution I've found, due to data locality. Even if the language itself is not
fast in many respects, being right next to the data can do wonders in some
cases. As others have said, debugging leaves a ton to be desired, as does
testing. It's a great tool for the right problems, though.

------
bjjbj
I've used Oracle PL/SQL and PL/pgSQL pretty extensively, as well as T-SQL (the
MS equivalent). Mostly for HR, finance, and trading systems. Overall positive,
but like many languages, it's easy to misuse and can quickly become unwieldy.
DBMSs don't really have much support for organizing objects. In Oracle you
have packages, but overall it's up to you to maintain and document why certain
objects exist.

------
torgian
My current project uses PostgreSQL along with PipelineDB. It's great for
handling a lot of data, but I don't pretend to fully understand it. I'm
focused on the front-end, but my job naturally requires me to learn at least
the basics of data manipulation and so on.

My company likes using it simply because it's great for that heavy-duty data
manipulation.

------
thorin
I've used Oracle PL/SQL (and sometimes SQL Server T-SQL) extensively for the
last 20 years. It's way easier to write data manipulation in this specific
type of language than in Java/.NET/C etc. LINQ to SQL has improved things
within the Microsoft ecosystem, but then you have the big overhead of the ORM
etc. I know the Postgres version is heavily based on the Oracle language,
which has been very successful and endures in big business: banks etc. It's a
great language for getting stuff done, but the ease of use can cause problems
if you don't engineer your solution properly.

For data it's great. However, my current site uses it for many other things,
such as heavy XML transformations, web service calls, etc., and then the gaps
in the language start to show through and you'd be better off moving to
something general-purpose. But you can do almost anything with it if you
want...

------
slipwalker
(Me and my team) wrote an entire private-label system in this (and Perl) in
the early 2000s, on PostgreSQL 6, if I recall correctly. It was (at that time)
a very good option to avoid Oracle's PL/SQL (version 8.1.5), which would have
been needed for the reporting tasks. Haven't written much PL/pgSQL since...

------
cellis
Yes, and I love it; it's amazing. I have been using it to write PL/pgSQL
functions for PostGraphile. Find out more at
[https://github.com/graphile/postgraphile](https://github.com/graphile/postgraphile)

------
pvaldes
I played a little with it in the past. It was positive, but needed some work
to upgrade. Now I just put a layer of Perl in front of Postgres to solve the
same problem. I find that easier.

------
grizzles
I have. I'd rate it neutral. If I was starting from scratch I'd probably try
out PL/Python or PL/V8.

------
tcbasche
Having used Oracle's PL/SQL, I can't really see a use case for this kind of
language. Your application logic should be kept within your application, not
in your database.

The exception to this, of course, being when your application is already on
your database.

~~~
iracic
There is. When you need to process your data without network delays. When
performance is essential. When you can't express it efficiently with non-
procedural SQL, or need access from several technologies (think of it as a
service at the DB level). Depending on the environment, deploying changes can
be much easier than patching the app side.

~~~
tcbasche
That is a very good point :) I definitely did not consider the performance
benefits. You learn something new every day!

