
Are triggers really that slow in Postgres? - turrini
https://www.cybertec-postgresql.com/en/are-triggers-really-that-slow-in-postgres/
======
doh
Triggers are one of the best parts of Postgres.

We built a large ETL and machine learning operation in PG through triggers,
from simple algorithms calculating rate of change for updated datasets, to
identifying trends and connecting seemingly unrelated data.

Even at our scale [0][1] the performance tradeoff is absolutely worth it. The
best part is the consistency of the data. If the trigger dies, the whole
transaction is rolled back and you have to re-run it. This way we never end up
with an inconsistent data state that has to be fixed after the fact.
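
For a sense of the shape, here's a minimal sketch (made-up names, nothing
like our actual schema; EXECUTE FUNCTION is Postgres 11+ syntax, older
versions use EXECUTE PROCEDURE). A BEFORE INSERT trigger derives a rate of
change from the previous row of the series; if anything in it raises, the
inserting transaction rolls back with it:

    CREATE OR REPLACE FUNCTION compute_rate_of_change() RETURNS trigger AS $$
    BEGIN
      SELECT (NEW.value - m.value)
             / NULLIF(EXTRACT(EPOCH FROM (NEW.recorded_at - m.recorded_at)), 0)
        INTO NEW.rate_of_change
        FROM measurements m
       WHERE m.series_id = NEW.series_id
       ORDER BY m.recorded_at DESC
       LIMIT 1;
      RETURN NEW;  -- an error anywhere above aborts the whole transaction
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_rate_of_change
      BEFORE INSERT ON measurements
      FOR EACH ROW EXECUTE FUNCTION compute_rate_of_change();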

There are some downsides. Triggers are very hard to debug, optimize and
monitor. Versioning is kind of a nightmare, and there is no direct performance
information that you can capture. In our case it's even harder, as we are
running on Citus and have to deal with data distribution, colocation and
other issues.

But it's been two years now and we're only adding to it.

[0] https://www.citusdata.com/customers/pex

[1] https://cloud.google.com/customers/pex/

~~~
weavie
> There are some downsides. Triggers are very hard to debug, optimize and
> monitor.

That's enough for me. My heart always sinks when, in response to a problem,
someone suggests "let's just stick a few triggers in".

It's happened too many times in my career: some weird problem that no one
can work out turns out to be caused by a trigger that no one realised was
there.

~~~
doh
I do understand. Triggers are ripe for improvement. We have some internal
tools built around them, but nothing significant. Most helpful are protocols
for humans to follow when dealing with a problem: always start with the
trigger.

At the same time, triggers saved us hundreds of thousands of lines of code and
extremely complicated logic that would be required were we to replace them.

------
wyc
The biggest concern that engineering teams seem to have with stored procs is
maintainability. The consistency guarantees are indeed very attractive, but
perhaps not at the expense of keeping business logic in separate places, weak
version control/deploy solutions, and a new programming language to the stack.

If there were a compiler that could take business logic in the project's
programming language and manage mappings to stored procs (reminiscent of
LINQ's mapping to queries), maybe we could have the best of both worlds.

~~~
stuartaxelowen
Interesting. I think a good CI/CD story is probably the biggest thing
holding back broader use of stored procedures.

~~~
hinkley
This sort of sentence is uttered by someone who is so far down a hole that all
they can see is stars. It’s a big hole that a lot of people are in, but it’s
the consequence of a nasty trade-off.

All of this is _trivial_ if your team decides that a shared database for dev
work is bad for repeatability and thus bad for scaling the team.

You should be able to spool up a local database with good sample data in it.
To do that, your schema, indexes and triggers need to be under version
control, and a data dump needs to be somewhere people can get it.

Once you have this, your CI system runs Postgres locally or in a container,
runs the drop/create scripts, and then runs your integration and end-to-end
tests on the canned data.
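
Concretely, the drop/create step can be a versioned psql script that the CI
job runs against a throwaway Postgres container (file and schema names here
are illustrative):

    -- reset.sql, kept in version control next to the application code
    DROP SCHEMA IF EXISTS app CASCADE;  -- drops tables, triggers, functions
    CREATE SCHEMA app;
    \i schema/tables.sql        -- tables and indexes
    \i schema/triggers.sql      -- trigger functions and triggers
    \i seed/sample_data.sql     -- the canned data the tests run against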

In addition to getting a CI solution for next to free, you get rid of the
concurrent-access Wild West and this particularly painful conversation:

“Why did this break, and why didn’t you notice it before you pushed?” “Oh, I
saw that problem the other day, but I thought someone else was changing data
(and not my code being broken).”

If the data is on your machine and it gets broken, then it is only your
machine that could have broken it. You can’t delude yourself into thinking it
was someone else mucking around. The problem is either in your code or in your
latest pull from master. _You are responsible for determining the cause_, not
me, not the release manager, not QA. You.

~~~
manquer
Not all bugs are logical. Many times a CI workflow does not catch poorly
written but logically correct queries, because the sample DB has few records
in it; emulating large data sets with production-like data patterns, to get
the same kind of query plans, is hard.
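
For example, the same query can plan completely differently depending on
table statistics (illustrative names):

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
    -- on a 100-row sample DB: likely a Seq Scan
    -- on a 100M-row production table: likely an Index Scan
    -- CI proves the query is correct, not that its plan is sane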

~~~
hobs
I would say this is the first objection in this thread that is on point. Not
that figuring out deployment isn't hard, but the blog posts on how to solve
it for your platform are multitudinous.

Getting useful data loads to test queries is much more difficult but you have
a few options (at least coming from a SQL Server approach):

* Query hints to emulate larger sets of data (so you can see what type of IO you would get; of course, multiplication is your friend; see the sketch after this list)

* Check if your database offers something along the lines of DBCC CLONEDATABASE (https://docs.microsoft.com/en-us/sql/t-sql/database-console-commands/dbcc-clonedatabase-transact-sql?view=sql-server-2017) (gets you a copy of statistics from prod without the data)

* Building a data masking process so that the data is somewhat representative (in volume, mocking it such that the histogram of values is the same in your database is WAY harder)

* Building an isolated load test environment with something like distributed replay (https://docs.microsoft.com/en-us/sql/tools/distributed-replay/sql-server-distributed-replay?view=sql-server-2017) (record workload, replay, measure, make your change, replay, measure; yes, it is tedious, but that's a performance regression test for you)
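
To make the multiplication trick concrete in Postgres terms (hypothetical
table; this inflates volume, not the value distribution, per the caveat
above):

    -- blow a small sample table up 1000x to approximate production row counts
    CREATE TABLE orders_big AS
    SELECT o.*
      FROM orders o
     CROSS JOIN generate_series(1, 1000);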

------
dizzystar
The best way to discuss any speed thing is with anecdotes, right? /sarcasm

I had a system that required the parsing of large json chunks. The system
pulled the json from an API, pushed the data into a json-type column, then
sorted the data into normal form.

I originally tried using straight Python to pull the data, but decided that I
ought to keep the original data for record keeping, plus testing was a lot
faster without constantly calling the API.

When all was said and done, the whole operation took about a minute to
complete. I decided to try a trigger, which caused the entire process, from
call to printing "done," to take less than a second.

In this case, the trigger was significantly faster.
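
The shape of it was roughly this (names made up, details simplified,
assuming a jsonb column): a trigger on the raw table that fans each
document out into normalized tables on insert.

    CREATE OR REPLACE FUNCTION normalize_payload() RETURNS trigger AS $$
    BEGIN
      INSERT INTO items (document_id, name, price)
      SELECT NEW.id,
             elem->>'name',
             (elem->>'price')::numeric
        FROM jsonb_array_elements(NEW.payload->'items') AS elem;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_normalize
      AFTER INSERT ON raw_documents
      FOR EACH ROW EXECUTE FUNCTION normalize_payload();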

The danger of this anecdote, and all stories with databases, is that all
things have to be posted with "in this case." Any time you read that
triggers, aggregation, CTEs, etc. are fast or slow, consider that this is
almost always told in a vacuum. There are so many variables that the term
"fast" is wholly useless.

~~~
yeukhon
I have no experience using the json column in postgres (well, I sort of did,
but it was just a json type, before the json column was a real thing).

For me, I try to do as much as possible on the database side, like sorting,
which does require good schema and table design. This is why folks often
criticize MongoDB. One of the reasons it caught on was the convenience of
“schemaless”.

When Mongo was first introduced, I think a lot of developers, including me,
saw Mongo as an excuse to move away from relational databases. So we began
dumping all kinds of shit into it. Doing fancy stuff on the Mongo side is not
possible without a good design either.

What people probably did was just pull data from multiple collections and do
filtering and “joins” on the server (client) side. I would find myself
writing a for loop to do a bunch of stuff. Yikes.

Of course there are other criticisms against MongoDB, but ultimately
developers like myself did not (and probably still don't) have any decent
clue how to use databases well. Learning to use databases right is something
I really want to be good at.

~~~
philliphaydon
Sounds to me like your documents were poorly designed.

------
jerrysievert
short answer: no, not slow.

longer answer: you can do some really cool things with triggers in postgres.
My favorite is what I like to refer to as "writeable views":
https://legitimatesounding.com/blog/stupid_postgresql_tricks:_writable_views.html
(2010)
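
These days you can get the same effect with an INSTEAD OF trigger, which
landed in 9.1 after that post was written (so the post's own trickery
differs). A minimal sketch with made-up names:

    CREATE VIEW active_users AS
      SELECT id, name FROM users WHERE active;

    CREATE OR REPLACE FUNCTION active_users_ins() RETURNS trigger AS $$
    BEGIN
      INSERT INTO users (name, active) VALUES (NEW.name, true)
      RETURNING id INTO NEW.id;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER active_users_ins
      INSTEAD OF INSERT ON active_users
      FOR EACH ROW EXECUTE FUNCTION active_users_ins();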

~~~
mustardo
Interesting...

I can imagine uttering some four-letter words while trying to figure out what
was going on if I were to take over maintaining a system that used this
(principle of least surprise)

~~~
gaius
Writing through a view is totally unremarkable and has been for a long time.
It’s quite common in Oracle and SQL Server lands. You can update via a CTE
too, very easy, very elegant.
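
In Postgres, the CTE version looks like this (illustrative names):

    WITH stale AS (
      SELECT id
        FROM accounts
       WHERE last_seen < now() - interval '1 year'
    )
    UPDATE accounts a
       SET status = 'dormant'
      FROM stale s
     WHERE a.id = s.id;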

------
stuartaxelowen
Triggers in postgres are amazing - I've got a fully automated
phrase-extraction system built on postgres triggers, such that salient
phrases are automatically detected in marked text fields.
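
The skeleton of that kind of setup looks something like this (heavily
simplified; to_tsvector here is just a stand-in for the real extraction
logic):

    CREATE OR REPLACE FUNCTION extract_phrases() RETURNS trigger AS $$
    BEGIN
      NEW.phrases := to_tsvector('english', NEW.marked_text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER trg_extract_phrases
      BEFORE INSERT OR UPDATE ON documents
      FOR EACH ROW EXECUTE FUNCTION extract_phrases();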

------
skibz
I've been maintaining a large (and quite old) in-play gambling system for the
past six months that uses Postgres triggers and pg_notify [1] as the core
mechanism for broadcasting websocket frames to SPA clients.

Hilariously, it turned out the system was violating a recommendation in the
Postgres official documentation for pg_notify:

> The "payload" string to be communicated along with the notification. This
> must be specified as a simple string literal. In the default configuration
> it must be shorter than 8000 bytes. (If binary data or large amounts of
> information need to be communicated, it's best to put it in a database table
> and send the key of the record.)

The original triggers were invoking pg_notify with row_to_json(NEW), often
causing the pg_notify calls to fail (which also caused the underlying
insert/update that fired the trigger to fail too, meaning we DROPPED new or
updated records) due to the JSON text being far too large.
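
The pattern the docs recommend instead is to send just the key and let the
listener fetch the row itself, e.g. (simplified):

    CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
    BEGIN
      -- send only the key; listeners SELECT the row themselves,
      -- keeping the payload far below the ~8000-byte limit
      PERFORM pg_notify('record_changed', NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;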

[1]: https://www.postgresql.org/docs/9.0/static/sql-notify.html

------
Someone
1) I don’t know _pgbench_, but given the NULL checks in the trigger, I wonder
how often they actually change the records in this benchmark. If you want to
performance-test triggers, wouldn’t it be better to unconditionally change
those fields?

2) as the blog post indicates, these are extremely simple triggers that work
locally. The IMO interesting cases are moving those _last modified_ columns to
a separate table, requiring an index lookup, or inserting rows into an audit
log.
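
i.e., roughly the difference between these two trigger bodies (a sketch of
the usual last-modified shape, not the article's exact code):

    -- conditional: may skip the assignment depending on the benchmark's data
    CREATE OR REPLACE FUNCTION set_modified_conditional() RETURNS trigger AS $$
    BEGIN
      IF NEW.last_modified IS NULL THEN
        NEW.last_modified := now();
      END IF;
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- unconditional: every invocation pays for the write
    CREATE OR REPLACE FUNCTION set_modified_unconditional() RETURNS trigger AS $$
    BEGIN
      NEW.last_modified := now();
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;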

~~~
fauigerzigerk
_> 2) as the blog post indicates, these are extremely simple triggers that
work locally. The IMO interesting cases are moving those last modified columns
to a separate table, requiring an index lookup, or inserting rows into an
audit log._

Depends on the question you're asking. Benchmarking extremely simple triggers
is the only way to find out whether the trigger invocations themselves cause
significant overhead compared to what you would otherwise have to do from
application code (such as inserting a row into a separate table or populating
two additional columns in the same table).

I think triggers (and stored procedures in general) are more attractive the
more roundtrips they help avoid.
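
For example (hypothetical audit setup, assuming a serial id): with an audit
trigger on the table the application issues one statement; without it, it
pays a second roundtrip for the bookkeeping:

    -- with an audit trigger on orders: one statement, one roundtrip
    INSERT INTO orders (customer_id, total) VALUES (42, 99.50);

    -- without the trigger: the application does the bookkeeping itself
    INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
    INSERT INTO order_audit (order_id, changed_at)
    VALUES (currval('orders_id_seq'), now());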

