Making a Postgres query 1k times faster (mattermost.com)
228 points by d0mine 6 months ago | 60 comments



OR queries are performance killers because they often force a full table scan. Another alternative to the OR is a UNION.

A UNION allows you to use two separate indexes and can speed up queries.
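For instance, a minimal sketch with made-up table and column names, assuming an index exists on each column:

    -- the OR form often forces a scan of the whole table:
    SELECT * FROM users WHERE email = 'a@example.com' OR username = 'alice';

    -- rewritten so each branch can use its own index
    -- (UNION also deduplicates the combined result):
    SELECT * FROM users WHERE email = 'a@example.com'
    UNION
    SELECT * FROM users WHERE username = 'alice';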


If at all possible, use a UNION ALL rather than a plain UNION to avoid an extra sort / unique node.

I've used that OR > UNION ALL trick a number of times to vastly improve performance on specific queries. It's crazy how much of an effect it can have.

I wish Postgres would implement a planner optimization to automatically run queries with an OR more efficiently (e.g. use the same plan as with a UNION [ALL] where possible).


I have a vague memory of this exact optimisation being one of the things Hibernate did automagically, about 10 years ago. I remember doing a fair bit of reading to figure out how & why it was so much faster.


I was able to implement this in the ORM layer of Django through a function that would inspect the query and split it if it had an OR under certain conditions.


Is this a Django extension? Can you send a link if this is public code? I'd like to see if it speeds up anything on my app.


What do you suppose prevents the query planner from doing the same thing?


I remember reading a thread on the hackers mailing list about this, starting probably 6 or 7 years ago... Nothing ever came of it, if I remember right.

Found it: https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05...

I just did some more digging, and it looks like there has been some more recent discussion in a different thread about the same topic though: https://www.postgresql.org/message-id/flat/567ED6CA.2040504%...


While ORs can be performance killers, in other cases they work perfectly fine, sometimes even better than the equivalent UNION ALL query. In this case the problem is that they are applying a LIMIT with ORDER BY, so it can't do the usual BitmapOr optimization, and thus the index ends up being used purely for sorting purposes.


+1. If you really understand your queries (which, if your queries are complicated enough that you don't already use an ORM, you should), then my favorite set of "tricks" is CTEs to collect info and convert/translate as necessary, then unions to merge/filter said data from the CTEs. Add aggregate/window functions to this data, and you are good to go (rough sketch below).

You can really do some complicated pipelining of data cleaning and filtering this way, much faster than resorting to a whole bunch of complicated inline SQL work in the clauses.

[edit] I should add: learn how to read an explain plan; this is vital for understanding and improving your queries.
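A rough sketch of that CTE-then-UNION pattern (all table and column names are hypothetical):

    -- CTEs collect and clean the data...
    WITH recent AS (
        SELECT id, created_at, status FROM orders
        WHERE created_at > now() - interval '30 days'
    ),
    archived AS (
        SELECT id, created_at, status FROM orders_archive
        WHERE created_at > now() - interval '30 days'
    ),
    -- ...a UNION ALL merges them...
    merged AS (
        SELECT * FROM recent
        UNION ALL
        SELECT * FROM archived
    )
    -- ...and a window function is applied to the combined set
    SELECT id, status,
           row_number() OVER (ORDER BY created_at) AS rn
    FROM merged;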


This seems like the type of thing that a sophisticated query planner in 2024 would be able to figure out on its own? Maybe as a non database expert I'm expecting too much?


There were attempts to do this automatically ~7 years ago (https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05...). It didn't actually make it in due to worries about the approach and properly de-duplicating the final resultset, as well as that approach to the optimization not playing well with duplicate inner-join removal optimizations (which were also being discussed at the time).

There have been more recent attempts to improve OR clause performance in another thread: https://www.postgresql.org/message-id/flat/567ED6CA.2040504%...


This can be mitigated somewhat by not having ORs at top level.

Of course UNION [ALL] does often yield better performance, yes.


FTA: “Why the query planner doesn’t automatically convert a condition like x > a OR (x == a AND y > b) to (x, y) > (a, b) is something I still don’t understand to this day, though.”

There’s a difference between the two. https://www.postgresql.org/docs/current/sql-expressions.html...:

“The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.

Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all. For instance, if one wrote:

    SELECT true OR somefunc();
then somefunc() would (probably) not be called at all. The same would be the case if one wrote:

    SELECT somefunc() OR true;
Note that this is not the same as the left-to-right “short-circuiting” of Boolean operators that is found in some programming languages.”

https://www.postgresql.org/docs/current/functions-comparison...:

“For the <, <=, > and >= cases, the row elements are compared left-to-right, stopping as soon as an unequal or null pair of elements is found.”

So, this is using something akin to short-circuiting.
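A quick illustration of that left-to-right stopping rule (toy values of my own, not from the article):

    SELECT (1, NULL::int) < (2, 3);  -- true: the first pair decides, the NULL is never reached
    SELECT (1, NULL::int) < (1, 3);  -- NULL: the first pair ties, the null pair ends the comparison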

I think that can mean the two give different results in the presence of null values.

⇒ I would guess the optimizer isn’t smart enough to detect when the second, stricter query will be equivalent and faster.


I'm gonna save you 15 minutes.

The author filtered with:

    CreateAt > $1 OR (CreateAt = $1 AND Id > $2)
That can't use one index for both conditions. Instead, make it faster and simpler:

    (CreateAt, Id) > ($1, $2)
End of post.
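For context, the rewritten predicate lets a single composite index satisfy both the filter and the sort; a minimal sketch (the index name is assumed, not verbatim from TFA):

    CREATE INDEX idx_posts_create_at_id ON Posts (CreateAt, Id);

    SELECT * FROM Posts
    WHERE (CreateAt, Id) > ($1, $2)
    ORDER BY CreateAt ASC, Id ASC
    LIMIT 100;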


I think a nicer takeaway is what the author summarized with:

1. Always use BUFFERS when running an EXPLAIN (see the example after this list). It gives some data that may be crucial for the investigation.

2. Always, always try to get an Index Cond (called Index range scan in MySQL) instead of a Filter.

3. Always, always, always assume PostgreSQL and MySQL will behave differently. Because they do.
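For reference, BUFFERS is just an extra option passed to EXPLAIN (the literal values below are placeholders):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM Posts
    WHERE (CreateAt, Id) > (1557752415221, 'tpomh9yu1tffmdp6dopobwuc9h')
    ORDER BY CreateAt ASC, Id ASC
    LIMIT 100;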


Agree it's well-known, in a slightly snobbish sort of way, but it could do with being more widely known for sure; it is a bit of a gotcha. IMO TFA is quite a nice tutorial on how they went about it, investigating the issue and determining the problem, perhaps especially if the answer as written above isn't already somewhat familiar; the EXPLAIN ANALYZE explainer is quite nice. Not reference material, but as an intro: hey, this is how you might be able to debug a similar problem you encounter.


Is that logically the same filter syntax? I'm not familiar with the second construct. If CreateAt is equal to $1, does it compare the second argument?


Yes


Thanks. Too late though, I already read it.

The article felt like it was fumbling around when the initial explain pointed right at the issue. I didn’t know this specific trick, so I did learn something I guess.


That looks like a trivial optimization / transformation. I am still quite puzzled: so much work has gone into compilers, static code analysis, and profile-guided compilation (which comes naturally with JITs, too), yet the query optimizers of pretty much any database look rudimentary and unreliable. Sometimes I think they got stuck in the '90s and that's it.


Same with debugging and error messages. At least it feels that way to me.


That is a bit wrong, and I'm pointing it out because your summary logically didn't make sense to me, so I did have to read the blog post after reading your comment.

The query should be CreateAt > $1 OR (CreateAt = $1 (NOT $2 like in your sample) AND Id > $2)

So the idea here is paginating through posts that might be constantly changing, so you can't use a simple offset, as that would give you duplicates along the way. So you try to use CreateAt, but it's possible that one CreateAt is equal to another, so you fall back to the Id.

But here I stopped reading the blog post, because I now wonder: why not use Id in the first place? It also seems to be auto-increment, since otherwise you couldn't rely on it as a fallback like this. I don't have time to investigate further, but tl;dr, that still left me confused: why not use Id in the first place?


It seems like the code is used both for indexing from scratch and for maintaining an existing index. In the case of maintaining an existing index, you need to know where to start back up, and Postgres doesn't guarantee that SERIAL primary keys are always in chronological order when there are multiple simultaneous sessions.

https://dba.stackexchange.com/questions/266405/does-ordering...


But if you are doing indexing, why do you necessarily care that it's specifically chronological?

Or, besides that, if there are odds of a CreateAt collision and you are falling back to the ID, you are still possibly not getting them chronologically?

And also, if a CreateAt does happen to equal another record's, that is exactly the case where Postgres is most likely not to have the auto-increment in chronological order.

So it still seems like the edge case it tries to prevent would still happen, at a similar order of magnitude of odds.


> But if you are doing indexing, why do you necessarily care that it's specifically chronological?

edit: on second review, if live insertions were occurring then this code would have an edge case; however, the indexing job has an end time, presumably chosen so they can be sure no more inserts will occur. Given that, the choice to use a timestamp probably has to do with the fact that there are 4 different tables being indexed, and you would otherwise have to track their IDs individually.

original:

    id, timestamp
    1 , 1000
    2 , 1001
    4 , 1002
    --------- limit stops here
    5 , 1002
    3 , 1003
ID only: If we use ID > 4 as our start point for next time and ID 3 was not inserted yet then we will have missed it

createdAt only: If we use createdAt > 1002 then we will skip ID 5 next batch

OP's strategy: Even if we use createdAt > 1002 and it skips ID 5, it will be caught by createdAt = 1002 AND ID > 4. The ORDER BY createdAt ASC, id ASC guarantees that if the limit splits 4 and 5, we see 4 first and thus don't miss 5. I think this does still miss the case where ID 4 is inserted after ID 5 however. (A concrete version of the resume query is sketched below.)
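Concretely, the batch query that description implies would look something like this (column list and named placeholders are illustrative; LIMIT 3 matches the table above):

    SELECT Id, CreateAt
    FROM Posts
    WHERE CreateAt > :lastCreateAt
       OR (CreateAt = :lastCreateAt AND Id > :lastId)
    ORDER BY CreateAt ASC, Id ASC
    LIMIT 3;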


> I think this does still miss the case where ID 4 is inserted after ID 5 however.

Yeah, and I would think that is very likely to happen given it would be the same timestamp.

So the whole thing still seems like a flawed and unnecessarily complex solution to me, which should just use one simple, unique, sortable field to do all of it.

My intuitive guess is that the solution saves maybe 20%-40% of occurrences of the same edge case, which doesn't seem like a good solution. It is not going to solve the problem; it's just adding complexity that can cause other problems.

So if Postgres does the type of caching where it allocates 10 auto-increment IDs to each process, which sometimes causes IDs to be out of order, then normally it would be enough to wait until those allocations have completed and only index then; you are not going to miss any rows.

I would assume these processes have some form of timeout for the case where they couldn't assign one of those IDs, and then that ID would simply never exist, or a mechanism to reallocate would work too. But nonetheless, I think some form of Postgres sortable unique ID would have to work by itself.


Ha, that's a good point. Devil's advocate though (and perhaps they changed it slightly for the example in the blog post): it could be that 'CreateAt' is more like 'PublishedAt', i.e. it doesn't include a possible draft period, but then id (which does correspond to the actual record's initial creation time) is as good as anything to disambiguate them, because the tie-break is actually arbitrary and only needs to be consistent at that point.


This is supposed to be a solution to a rare edge case though, and it seems to me that there would still be some rare edge cases where the timestamp order would mismatch the ID order, and so still cause the very edge case it was supposed to avoid?

Because it's ">" it might be missing that one record during what I think is pagination.


They could be using something like application-generated, time-sortable UUIDv7s or some other sortable key.

https://uuid7.com/

[edit] The CreateAt timestamp could be something set by the client when the post is submitted, or tagged by the ingest server, and not when the posts actually get processed and hit the database.


For sure, but still why not use that in the first place?

And I agree I shouldn't have said "auto increment".


It depends on the resolution of the timestamp. With 1000s of posts a second coming in, the fallback for rows at the same "timestamp" is the greater identifier. My guess is that the resolution of the timestamp allows for more efficient Postgres index usage than the Id index, which I'd guess is larger and a string, so not as efficiently searchable as the underlying identifier.

(My guess is that a time-based index offers faster search performance or lower overhead than a string-based search index, which doesn't understand that it is representing encoded time data.)


I'm only speculating here, but it doesn't seem right to me at all to end up with a solution like that. I don't think the Id would perform so badly as an index that it would be worth what I consider odd reasoning and tech debt, if I'm understanding things correctly. And clearly the query is tech debt, because it did cause such an issue. But the current solution also seems like tech debt to me, with high odds of causing issues in the future if the DB driver were changed, or something about the way the DB resolves indexes were to change, etc.

And with fallback they would end up reusing that index anyway.


This could also come up easily if generating queries to back a paginated UI, where the user can choose a sort order by user-meaningful metadata. So, the user might ask to sort by CreatedAt and then the app/service would need to add the Id tie-breaker.


That is possible, although I would say there are other potential challenges there in this case, and arguably there should be a specific index for that case, and a lot would depend on what exactly is being sorted, how frequently new rows are being inserted, etc.

But here the case should be batch indexing/processing, so it seems like auto-increment, with a timeout on assignment in case those auto-increment ranges are cached, would still be suitable.

Like, as I understand the problem, there is one service (Elasticsearch) doing the indexing, and it's getting batched rows from Postgres to index while making sure not to miss any rows in those batches. And it's fine if the indexing doesn't happen this very second or minute, so it should be fine to wait for the IDs to have been allocated.


Thanks, I fixed that typo.


Not at all surprised that Laurenz Albe played a key role in figuring this out -- I've learned so much about postgres from his Stack Overflow answers over the years.


My question after a brief look - why not just use ID instead of CreateAt?


That was my initial thought as well. I'm hoping the original author reads this and can explain, because surely there must be a reason...?


I think because ID is random and not monotonically increasing.

From the article, one of them is: 'tpomh9yu1tffmdp6dopobwuc9h'

But the query is trying to page through the most recent posts first.

So a newer post doesn't necessarily have a higher ID or lower ID.

The ID is just there to have a stable tie break between the same timestamps.


One branch of the OR is comparing IDs as though they're monotonically increasing, so either they are or the query is broken.


I think it's a reasonable implementation. It is only used to separate rows with the same created timestamp. This query is run some time after the insert has happened, so it is unlikely that there have been more rows with the same timestamp at query time.

If this assumption is correct, then sorting first by the timestamp and then by the id alphabetically will ensure that the pagination is deterministic.

I guess we could consider some edge cases where lots of rows with the same timestamp are inserted after the ingestion query is run, due to latency, but that might be acceptable for this use case.


In this case they should switch to UUIDv7 or similar to have this guarantee.


Switching IDs is going to be a breaking change across their whole product and customer base.

That's a way bigger undertaking and decision than just optimizing a single query.

Like killing an ant with a Tsar Bomba.


One alternative could be to add a separate column that can be used to keep track of insert order. But they would need to consider the costs before they do this, as it could impact the insert performance.

The true "modern" cool kids solution would of course be to create a service that listens to WAL, inserts it into a Kafka cluster that is connected to a pipe for inserting into Elastic search. Much more fun and resumé friendly than optimizing a query. I bet the author of this blog would get a much larger audience.

Bonus points if the ingestion service was written in Rust and run on a serverless platform


Actually it's not a breaking change.

You can switch the UUID generation, wait a little bit, and add an 'archive' switch which will use the old and slower query when the date is old.

Should definitely be helpful resource-wise for a lot of people.


Yes and sooner this gets started the better.

Otherwise it will cost more and more.

Especially for a chat app!


> WHERE (Posts.CreateAt, Posts.Id) > (?1, ?2)

Did not know you could do "tuple" filtering like this
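It's standard row-constructor comparison; a quick demo with toy values of my own:

    SELECT (1, 'a') < (1, 'b');  -- true: first elements tie, the second pair decides
    SELECT (2, 'a') < (1, 'z');  -- false: the first pair dominates even though 'a' < 'z'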


Given Post.ID is monotonically increasing and this is a cursor-based pagination query, can this WHERE clause be effectively rewritten as:

WHERE Posts.CreateAt >= ?1 AND Posts.Id > ?2 (where ?1 and ?2 are the current cursor values)


I generate queries like this in a paging library I wrote at work. I did not use the tuple comparison because sometimes the sort order differs between fields, and then each field in the tuple comparison would need a different operator, such as:

    WHERE Posts.CreateAt > ?1 OR (Posts.CreateAt = ?1 AND Posts.Id < ?2)
    ORDER BY Posts.CreateAt ASC, Posts.Id DESC

I now wonder if it might be possible to use a negation operator to invert the sorting order, like:

    (Posts.CreateAt, -Posts.Id) > (?1, -?2) ORDER BY Posts.CreateAt ASC, Posts.Id DESC

and keep the performance improvement.
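If the secondary key were numeric, you might get there with an expression index that stores the negated value. An untested sketch, with all names hypothetical (and it wouldn't apply to string ids like Mattermost's):

    -- index matches (CreateAt ASC, Id DESC) via negation
    CREATE INDEX idx_posts_createat_negid ON Posts (CreateAt, (Id * -1));

    SELECT * FROM Posts
    WHERE (CreateAt, Id * -1) > (?1, ?2)    -- pass the negated id as ?2
    ORDER BY CreateAt ASC, (Id * -1) ASC    -- i.e. Id DESC
    LIMIT 100;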


Hunting and solving bugs is good, but there's a better alternative: pipelines that run performance regression tests before commits are merged into the master branch.


I don't have a Postgres at hand: would adding an AND condition, WHERE CreateAt >= $1 AND (CreateAt > $1 OR (CreateAt = $1 AND Id > $2)), work around the query planner limitations as well?


But

> CreateAt > ?1 OR (CreateAt = ?1 AND Id > ?2)

is not the same as:

> (CreateAt, Id) > (?1, ?2)

because the latter is like:

> CreateAt > ?1 OR (CreateAt <= ?1 AND Id > ?2)

Or am I wrong?!


Great post, thanks for sharing this "from the trenches" piece. It's always a joy to read these deep dives!


Maybe I'm sleepy, but what is Elasticsearch doing in that post?


consuming the data returned by the query


This post is from a few days back.

Does mm not have the stats table active, and check it occasionally for this kind of issue?


[edit: holy downvotes folk, the referenced 'olliej' user is me - I'm not shitting on others]

Based on the last time I did anything involving databases, I think the easy way to do this is "find a query written by someone like olliej, and then fix the clear and obviously stupid choices" :D


To your edit:

Sure, but if you are going to be self-deprecating, at least provide some example content of the hilariously bad queries you have written and how you have been chastised for them.


TL;DR:

Step 1: use the index

Step 2: see step 1


I wonder if using an ORM wouldn't actually be better than writing queries.



