
One word in PostgreSQL unlocked a 9x performance improvement - vinnyglennon
https://jlongster.com/how-one-word-postgresql-performance
======
oefrha
Long as hell post with a clickbait title documenting how the author found out
(1) one can insert multiple rows with a single INSERT statement; (2) the
RETURNING clause. Spend your time elsewhere if you’re already aware; maybe
pick up a SQL textbook if you aren’t...

This is not an uncharitable read btw. Actual quotes:

> The first thing I learned was how to insert multiple rows with a single
> INSERT statement: ...

> Scouring the docs I discovered the RETURNING clause of an INSERT statement.

~~~
aaomidi
What SQL Textbook should I read?

~~~
wiml
Once you have a basic understanding of SQL, I found it really informative to
go closer to the source, e.g. C. J. Date and Hugh Darwen's "A Guide to the SQL
Standard", or perhaps other books by them.

There's a lot of incomplete and misunderstood information floating around in
blog tutorials or potboiler textbooks.

~~~
aaomidi
This sounds great!

------
malisper
I've personally experienced cases where making a slight change to a query in a
non-obvious way resulted in a 10x or more speedup.

My personal favorite: I was generating a query that pulled out a dozen
different fields from a large JSONb column. Naturally, you would think
Postgres would read the JSONb field once, then pull out the individual fields
from it. Instead, Postgres was reading the JSONb column once for each field. I
figured out this was the case because the number of blocks read from EXPLAIN
(ANALYZE, BUFFERS) went up proportionally to the number of fields I extracted
from the JSONb column. Reading the JSONb column was especially expensive
because Postgres needed to deTOAST[0] the JSONb column.

The obvious fix is to write a subquery to read the entire JSONb column and
then have the outer query extract the individual fields, but that doesn't
work! Postgres will inline the subquery, basically undoing your attempt to
prevent the unnecessary accesses. In the end, the solution wound up being to
add OFFSET 0 to the end of the subquery. That doesn't change the semantics of
the query, but it does prevent Postgres from inlining the subquery.
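
Roughly, the fix had this shape (an illustrative sketch with made-up
table/column names, not the actual query):

    -- Without the OFFSET 0, Postgres inlines the subquery and ends up
    -- deTOASTing the jsonb column once per extracted field.
    SELECT t.data->>'field_a' AS field_a,
           t.data->>'field_b' AS field_b,
           t.data->>'field_c' AS field_c
    FROM (
        SELECT data
        FROM events
        OFFSET 0  -- acts as an optimization fence against inlining
    ) t;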

[0] [https://malisper.me/postgres-toast/](https://malisper.me/postgres-toast/)

~~~
ris
Something you'll discover when you do enough query optimization is that
postgres' query planner isn't _clause-order_ invariant. i.e. a AND b won't
necessarily give you the same query plan as b AND a.
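
A contrived illustration (hypothetical tables): the two queries below are
semantically identical, but per the above they may not get the same plan:

    -- Same result set, possibly different plans.
    SELECT *
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.status = 'shipped' AND c.region = 'EU';

    SELECT *
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EU' AND o.status = 'shipped';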

My first reaction was that this was awful, but the more I thought about it,
the more I was grateful that postgres (accidentally) gave me this knob to play
with. Optimization of complex queries is a very tricky thing, and most engines
won't do a good job of it 100% of the time. What I realized would have been a
_worse_ situation is if postgres sometimes picked a bad plan and there was
little I could do to avoid it without reforming the query (a pain when you're
generating the query with a query compiler already). Reordering the clauses
effectively allowed me to ask the planner to roll the dice again.

Following this, I may or may not have gone on to build a cache for the system
that timed queries and kept notes of "good" clause orders for common queries,
resorting to random ones otherwise....

~~~
shoo
A variation of this is that the query planner may produce a better plan if
you add additional redundant constraints to the query that are logically
implied by the constraints that are already there.
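
A toy example of the idea (hypothetical schema): the extra bound on the joined
table below is already implied by the join condition plus the existing filter,
but spelling it out can let the planner prune partitions or pick an index it
would otherwise ignore:

    -- Original: the time bound is only written on orders.
    SELECT *
    FROM orders o
    JOIN order_items i ON i.order_id = o.id AND i.ts = o.ts
    WHERE o.ts >= '2020-01-01';

    -- With the logically redundant bound repeated on order_items.
    SELECT *
    FROM orders o
    JOIN order_items i ON i.order_id = o.id AND i.ts = o.ts
    WHERE o.ts >= '2020-01-01'
      AND i.ts >= '2020-01-01';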

Personally, it feels like a bit of a fragile mess trying to trick a sometimes-
clever-sometimes-dumb optimiser into doing what you want by subtle indirect
hacks - because the interface doesn't give you a way to directly override bad
automated optimiser decisions.

~~~
ris
> a fragile mess

Oh, I'm not saying it wasn't a fragile mess...

Seriously, I actually considered the fragility of it to be a positive,
precisely because it wasn't a hard "optimizer decision" that I was forcing. If
you _force_ something like that, you've got to take full responsibility for it
- the planner loses all intelligence over that decision and no longer does the
clever thing query planners do: take the current distribution of the
database's _data_ into account to make better decisions. When you upgrade the
database and the query planner gets smarter (or dumber), you've got to
re-evaluate whether that's still the right decision. I could imagine an app
with a bunch of out-of-date hard-forced decisions being crippled
performance-wise.

A look-aside of automatically deduced hints with a validity of less than a
week seemed like a much lighter touch.

~~~
shoo
Totally get what you're saying. Ideally there'd be a way for the user to
supply additional information or constraints to the optimisation process,
which would be used to influence the results without turning the optimisation
process off completely. Although any scheme for doing that could, as you say,
produce poor results if the distribution of data changes over time or the db
planner code is changed.

I used to work on a non-database decision-support tool that incorporated a
custom optimiser used to spit out crude engineering designs for a particular
kind of construction problem. The optimisation problem was difficult and the
implementation that solved it was not state of the art: a few preprocessing
stages locked in some early decisions using heuristics -- which helped
massively reduce the search space -- and then a global optimisation approach
was run on the remaining subproblem. The result was that the overall algorithm
would locally optimise after perhaps locking in a bad early decision. Once the
software was delivered to the client and in use by a small team of users, I
discovered that the users had figured out that by running the software
repeatedly with very small adjustments to input parameters (adjustments that
should not obviously matter) they could bump the optimiser into outputting
wildly different designs. It was a little bit like repeatedly pulling the
handle on a one-armed bandit until it eventually gave you a decent output.

It was clever of our users to figure this out, but the overall UI/UX was
appalling: they had to click and re-run a somewhat slow batch process until it
produced a reasonable result. It would have been much better to give the users
an interface where they could directly override or constrain parts of the
engineering design problem in an ergonomic way, and then let the optimisation
algorithm loose on the remainder of the decisions.

------
mey
To the author: if the timestamp is only a timestamp, that will not guarantee
uniqueness. It may be sufficient for your use case, but unless there are other
constraints in your system design you may lose data (since colliding rows will
be treated as duplicates and dropped).

If the timestamps are just wall-clock timestamps, you should be looking at the
SQL type TIMESTAMP (without time zone), as putting a primary key around text
fields is not optimal. Not 100% sure on the PostgreSQL specifics, but I would
look at moving timestamp to TIMESTAMP and group to varchar.
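
Something like this is what I mean (a sketch only - the column names are
guessed from the article's schema, and the author's actual requirements may
differ):

    -- Hypothetical revised schema: real timestamp type instead of TEXT,
    -- varchar for the group identifier, same composite primary key.
    CREATE TABLE messages (
        "timestamp"  timestamp  NOT NULL,
        group_id     varchar    NOT NULL,
        -- ... other columns ...
        PRIMARY KEY ("timestamp", group_id)
    );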

~~~
raihansaputra
The author has a conference talk about CRDTs and the timestamp technique he
uses. Really useful if anyone else is wondering what the problem with
wall-clock timestamps is.

dotJS 2019 - James Long - CRDTs for Mortals (20:44):
[https://www.youtube.com/watch?v=DEcwa68f-jY](https://www.youtube.com/watch?v=DEcwa68f-jY)

EDIT: btw, out of all the comments on the timestamp here, it seems like yours
is the only one not complaining about the format, and it even points out that
the article doesn't yet go into depth on the actual timestamp format he uses.

------
ohgodplsno
"Inserting in a loop is a bad idea" would have been a more real title.

The moment your insertions can be done in a loop, _batch them_. Batch them,
batch them, batch them. I can't stress this enough. Batch them.

Do keep in mind that your DBMS of choice may have a maximum number of bound
variables (SQLite's default limit is 999 if I remember correctly; it can
quickly become an issue). If you fear that this might happen, split into
multiple batches, and batch those batches.
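
In the simplest case, batching just means one multi-row INSERT instead of N
single-row ones (sketch, hypothetical table):

    -- One round trip, one statement, many rows.
    INSERT INTO messages (id, body)
    VALUES (1, 'a'),
           (2, 'b'),
           (3, 'c');
    -- With bound parameters this becomes VALUES ($1, $2), ($3, $4), ...
    -- which is where the per-statement limit on bound variables bites.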

> timestamp TEXT

> PRIMARY KEY(timestamp, ...)

Dear god no. Make it an actual `timestamp` type, or an int if you can't.

~~~
nesarkvechnep
I'm 100% with you on everything you said. It's too bad I've seen many
JavaScript devs do queries in loops -- queries that can easily be batched.
It's interesting to me what causes this low level of relational database
understanding.

~~~
breatheoften
This is not a JavaScript-only phenomenon...

I regularly encounter code running off performance cliffs like this in the
Rails application I'm working on -- and it's incredibly difficult to climb the
application back out of these cliffs...

~~~
nesarkvechnep
Yes, it's a problem in Rails too. My current project is a rewrite (in Node)
of a big Rails monolith that is full of synchronous HTTP requests and SQL
queries in loops. The sad thing is that the senior Node developers just port
the code, seeing no problems with any of the above.

------
mapgrep
This (or some of it) seems like “shot in the dark” engineering where the
author finds an error condition and instead of doing the work of finding root
causes simply twiddles some values until things work.

For example, a 21MB upload with a 72MB INSERT statement (100k messages
inserted) fails but a 5MB upload with a 30MB INSERT statement (40k messages
inserted) works, so the author just limits to the smaller values and calls it
a day. But clearly the new limits are still near the edge of the performance
envelope, and without knowing the root cause of the error how do they know it
won't resurface under different conditions (more load on the server, less RAM
allocated to the db, a congested network, a fuller disk)? How do they know a
config change from an upgrade or from a fix to another problem won't lead to
another failure?

It saddens me to know in my gut that a lot of what passes for engineering
happens this way. This is not proper debugging.

~~~
alexbanks
Training and culture are probably the biggest failures here, I think. Most
devs are only equipped to debug by sprinkling print statements around, and
most companies/dev leads/managers are very comfortable allowing arbitrary
half-checked solutions to bugs or performance issues instead of actually
understanding the problem.

~~~
slimsag
> and most companies/dev leads/managers are very comfortable allowing
> arbitrary half-checked solutions

Not really true - they just don't have _any_ grasp on the problem aside from
"Yeah I thought you were working on that last week, are you stuck on it? Why
is it taking so long? - Oh so you do have a fix that works, cool so how soon
can that be merged?"

Almost all of these types of situations lead back to miscommunication or
someone in the discussion just not caring.

~~~
alexbanks
Yeah I disagree with your statement a lot, but YMMV.

------
IgorPartola
What is the purpose of storing the messages in a Merkle trie that is kept in
the database alongside each individual message? It seems to me that the trie
here is the bottleneck, not Postgres. And am I reading it correctly that the
entire trie is stored as a single blob and has to be modified and written back
in its entirety? The whole architecture seems suspect, and its performance
looks like it is going to be terrible.

And yes, multi-row inserts are better than individual ones, and yes, getting
the IDs back is good. Different RDBMSs will handle this differently, but
batching means higher throughput at the cost of some latency, which is often
a worthwhile tradeoff.

------
gfody
You might unlock another 9x and fix that crash by using COPY to load your
messages instead of generating arbitrarily large INSERT statements, and maybe
a little more still by using appropriate data types instead of text for
identifiers and dates.

~~~
murkt
It is much faster than INSERT indeed and I’m using it in places where I need
to insert a lot of data into Postgres. Can you get RETURNING from COPY,
though?

~~~
Smaug123
You can't RETURNING from a copy, but you can `CREATE TEMP TABLE foo ON COMMIT
DROP`, `COPY` into foo, then `INSERT INTO bar RETURNING` from the temp table.

------
dpratt
The usage of a timestamp (stored as a TEXT column!) as a primary key sets off
every single alarm bell I have in my head. This is a shockingly poor decision,
and a terrible way to enforce semantic uniqueness.

~~~
jacques_chester
You needn't worry, friend, you can just ON CONFLICT DO NOTHING. That's how
financial records are meant to be kept, I believe.

------
tobyhinloopen
Wait until he learns about using joins for 30x better performance compared to
n+1 queries

------
granitepail
I wonder how the naive timestamp approach works when multiple users are
syncing at the same time?

------
trashburger
Is "merkle trie" the correct spelling? I always thought "Merkle tree" was how
it was spelled.

~~~
WJW
There are both kinds. A trie is a prefix tree, which has benefits for space
efficiency when you want to store values with a lot of common prefixes.
Ethereum uses a mix of a Patricia trie and a Merkle tree to get a combination
of both: [https://medium.com/codechain/modified-merkle-patricia-
trie-h...](https://medium.com/codechain/modified-merkle-patricia-trie-how-
ethereum-saves-a-state-e6d7555078dd)

OTOH, the wikipedia article they link to seems to imply a regular Merkle tree.
I guess it's not possible to really tell without knowing which Merkle
tree/trie library they use.

------
Smaug123
The correct way to do this, I believe, is to `CREATE TEMP TABLE to_insert ON
COMMIT DROP`, then `COPY INTO to_insert` and `INSERT INTO target SELECT * FROM
to_insert RETURNING`. This gives you all the perf benefit of `COPY` while
still giving you the ability to use `RETURNING`.
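
Roughly, with placeholder table and column names:

    BEGIN;

    -- Staging table that disappears when the transaction ends.
    CREATE TEMP TABLE to_insert (LIKE target INCLUDING DEFAULTS) ON COMMIT DROP;

    -- Bulk-load into the staging table; from application code you'd use the
    -- driver's COPY ... FROM STDIN support rather than a server-side file.
    COPY to_insert (col_a, col_b) FROM STDIN;

    -- Move the rows into the real table and get the generated keys back.
    INSERT INTO target (col_a, col_b)
    SELECT col_a, col_b FROM to_insert
    RETURNING id;

    COMMIT;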

~~~
chinhodado
Can you explain more about COPY? I never used it before and would like to know
what its benefits are.

~~~
swaranga
The COPY API in Postgres avoids a lot of the query parsing and other work
that takes up CPU time. I have used it in production and have seen great
results.

There are cases where the COPY API is not sufficient. For instance, it cannot
do the ON CONFLICT DO NOTHING thing and instead fails the call. In that case
we fall back to INSERT ... ON CONFLICT DO NOTHING; since in our application
we do not expect too many duplicates, this works well for us.

------
furstenheim
There are nicer ways to do batch inserts in Postgres.

With 'json_to_recordset' there is only one parameter, an array, and it gets
queried as a table. That has the advantage of avoiding different queries for
different inputs (it can be prepared once, it's harder to make mistakes...).
Also it doesn't have the overhead of building a huge string.

Another option is COPY. Using node-pg-copy-streams
([https://github.com/brianc/node-pg-copy-streams](https://github.com/brianc/node-pg-copy-streams))
you can insert/query as much as you want from/to Node streams without a
memory footprint. I've used it to pipe millions of records straight to an
HTTP response (to download a CSV).

~~~
chinhodado
Can you elaborate please? I would like to learn more about this.

~~~
furstenheim
Hey, I just saw your comment.

For the jsonb approach:

    INSERT INTO some_table (a, b)
    SELECT x.a, x.b
    FROM json_to_recordset('[{"a":1,"b":"foo"},{"a":2,"b":"bar"}]') AS x(a int, b text);

That one is super flexible and it can handle thousands of records. Where the
array literal is, it would be $1, and you pass the array as a parameter.

With streams it's trickier: you probably need to create a temporary table to
pipe everything into and then another query to insert where you want. Also
you need to deal with async errors, which is a pain. But you have zero memory
load. I've used it to insert/select millions of records:

    pool.connect(function (err, client, done) {
      var stream = client.query(copyFrom('COPY my_table FROM STDIN'));
      var fileStream = fs.createReadStream('some_file.tsv');
      fileStream.on('error', done);
      stream.on('error', done);
      stream.on('end', done);
      fileStream.pipe(stream);
    });

------
Southland
Why is the timestamp field text...

~~~
knorker
Why is group_id?

------
baq
The number of times people I work with would greatly benefit from reading the
_table of contents_ of our DBMS manual (...me included) is actually
staggering.

Read the fine manual of your database. Then read it again. Chances are you’ll
be very surprised.

------
akresic
Full disclosure - Timescale person here - but your use case seems like an
example of what a time-series database is for, and given that TimescaleDB[0]
is implemented as an extension to Postgres, it's a really obvious fit.

It's designed around the idea of time-series data. It should give you better
insert and query performance since it's geared towards that workload. Aside
from that, you could even reap the storage benefits using native compression
that it supports.

[0] [https://www.timescale.com/](https://www.timescale.com/)

~~~
jacques_chester
The data is apparently for bookkeeping of some kind. Not really a time series
in the usual sense. In fact relying on timestamps for uniqueness in an OLTP
situation is optimistic bordering on problematic.

------
smt88
It's astonishing how poorly the author understands his tools, both relational
databases in general and Postgres specifically. There are some "discoveries"
here (both ones that he made and others that he needs to be told about, like
how bad an idea it is to use a timestamp as a primary key) that would be in
the first few chapters of a Postgres book.

------
ptrwis
> A much bigger problem is #2. We need the result of the insert to know if a
> row was inserted or not.

You can write a trigger on insert and do the second insert there, or wrap the
first insert in a CTE and have the second insert select from it.
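
The CTE version looks roughly like this (illustrative table names):

    -- The first insert runs inside the CTE; its RETURNING rows (only the
    -- rows actually inserted) feed the second insert.
    WITH inserted AS (
        INSERT INTO messages (id, body)
        VALUES (1, 'hello')
        ON CONFLICT DO NOTHING
        RETURNING id
    )
    INSERT INTO audit_log (message_id)
    SELECT id FROM inserted;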

------
orangepanda
What is the order of rows returned by INSERT ... RETURNING? Going over the
docs, it doesn't seem to be mentioned.

This would be nice for retrieving auto-generated primary keys when doing bulk
inserts. Can this be done deterministically?

~~~
murkt
I would be very surprised to learn that RETURNING order is different from the
order of rows in INSERT statement.

Some ORMs (at least SQLAlchemy) do that already, they can fill primary keys in
your objects during single inserts, and in dicts during bulk insert.

~~~
grzm
Don't rely on the order: it's implementation-dependent. Results are sets of
rows; unordered.

~~~
murkt
What do you mean by “it’s implementation dependent”? RETURNING is a Postgres-
specific clause, so of course its order is implementation dependent. There is
only one Postgres, as far as I know.

~~~
tpxl
There are many versions of one Postgres :) If it's not defined in the
docs/spec, it may change in the future (without notice).

------
jlgaddis
The blog post doesn't give me much confidence in _Actual_ (whatever that may
be).

------
lmilcin
Think of it the other way: one small mistake caused 10x performance
degradation.

How is that sounding now?

------
ccleve
Yes, inserting multiple values into Postgres really does work, but beware of
inserting too many at once.

I started generating insert statements of about 200k in length and it caused
Amazon Aurora to start dropping connections.

If you insert 10 or 100 values, you'll get a 10x or 100x speedup, which is
usually good enough.

------
meigwilym
But do DBAs hate him?

------
rory_isAdonk
Dear Lord why do we upvote this thinly veiled corporate sales pitch.

------
the-dude
I wonder if wrapping it in a transaction would have resulted in the same
improvement.

~~~
malisper
They do wrap it in a transaction, they just omit the code for it:

> This is mostly the real code, the only difference is we also rollback the
> transaction on failure. It's extremely important that this happens in a
> transaction and both the messages and merkle trie are updated atomically.

------
icheishvili
I know Ethereum uses Merkle trees to store state, but this app already has
Postgres, and I imagine it would be better architecturally not to write a big
blob data structure back and forth. WITH RECURSIVE queries and the right
table layout could make this entire thing way simpler to operate.
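
For example, if the trie were kept as one row per node in a hypothetical
trie_nodes(id, parent_id, hash) table instead of one big blob, a recursive
CTE could walk a subtree in place (sketch only):

    -- Walk the subtree under a given node without loading the whole trie.
    WITH RECURSIVE subtree AS (
        SELECT id, parent_id, hash
        FROM trie_nodes
        WHERE id = 1              -- root of the subtree of interest
      UNION ALL
        SELECT n.id, n.parent_id, n.hash
        FROM trie_nodes n
        JOIN subtree s ON n.parent_id = s.id
    )
    SELECT * FROM subtree;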

