
Recursive Common Table Expressions in Postgres - jesperht
https://www.citusdata.com/blog/2018/05/15/fun-with-sql-recursive-ctes/
======
emilsedgh
The only problem I have with them is that it takes my brain 30 minutes to
digest the peace of SQL in front of me that uses WITH RECURSIVE clause.

I do whatever I'm supposed to do. 3 months later, a bugfix required, I find
myself at this piece of SQL that I know works but I cannot digest it for
another 30 minutes.

And my case is really simple.

But yeah they are amazing.

~~~
Klathmon
People need to comment SQL more!

We have a few ~200 line long queries in one of our applications. With comments
one of them is 460 lines.

When things get hairy, don't be afraid to explain why you needed this CTE,
what it's doing from a high level, point out that little gotcha that you ran
into a few times while developing the query, why you needed to do X instead of
the simpler Y, etc...

While there are downsides to excessive commenting (the big one is that the
comments can get "out of sync" with the code and just cause confusion), when
it comes to big/complicated SQL, I've found that more is better for the most
part.

~~~
segmondy
You can say that again! I started a DB review team at my company. Not only
queries, but Postgres allows for tables to be commented on, you can place a
comment on table, columns, constraints, domain, triggers, view, pretty much
anything.

Initially a few folks rolled their eyes at that, but now everyone loves it.
When you have thousands of tables, comments help, likewise when you have
massive amount of queries. I'm also in the camp of let the DB do the work if
it can do it faster, and likewise have 100-400 line queries that will turn
into 5x LOC programs which take 20x time to execute. Writing tons of comments
is important. Usually the query is right and when it works, no one remembers
it for a year or two till business rules change.

My approach to commenting such queries after modification is to delete every
single comment and start afresh. From memory, when I'm done. If I can't then I
have no idea what the query was doing which is very bad. Sometimes folks
modify query and get a happy result. If I can document it without reference to
old one, great! After that, I look at the old one to make sure that I'm not
missing anything.

~~~
Klathmon
I tend to take the "rubber duck debugging" approach to writing comments for
big hairy SQL statements.

After i'm done (I tend not to comment these behemoths until at or near the
end), I go and explain what the query is doing, sometimes almost line-by-line,
to a fake "rubber duck" as if it were a junior developer. Even sometimes
commenting what I didn't do, and why I didn't do it. Because as it turns out,
future me is a bit of a cocky asshole that always thinks past-me was some kind
of idiot that must have never thought to try that before...

~~~
rcates
I can totally relate to the future me being a cocky asshole part. I just had
this experience coding the other day. I got about 15 minutes in thinking "this
will be way better", git mv'ing files and committing etc until I ran into the
exact same roadblock and it all came back to me. Now there is a sheepish trail
of commits with messages explaining why I was wrong and it wasn't way better.
I could have force pushed the commit before I started the changes but I wanted
to shame myself so maybe I'll remember next time...

------
shrikant
Fun fact: Recursive CTEs are one of those things where, if you come in
relatively new to them, would make you go "who would ever want to use this,
and why?!"

Funner fact: Once you come across a problem to which the solution is to use a
recursive CTE (handling train movement graphs in an RDBMS? yes please!),
you'll find yourself having to explain your reasoning for it in a manner that
will let you truly grok recursive CTEs.

Funnest fact: Both the above will almost certainly happen more than once in
your data-wrangling lifetime.

(Or maybe that was just me...)

~~~
Amezarak
One common use case I've found for recursive CTEs is generating tables of
dates.

Very frequently, I have a question along the lines of "how many x have
occurred per day/week/month?", where x has a timestamp column, where the data
could be sparse.

One neat way to do this is to create a recursive CTE that begins by selecting
the first relevant date, then unions that with the period (day/week/month)
plus one up to the last date you're interested in.

Once done, you can left join your date CTE against a COUNT aggregation grouped
by the period you care about on your data table.

~~~
barrkel
> _One common use case I 've found for recursive CTEs is generating tables of
> dates._

If you don't have table-valued functions, just create a table with all the
dates for the next hundred years or so. The number of distinct dates is pretty
manageable. This is the approach I use when calculating stats data via MySQL.

~~~
shrikant
I'd second this for dates. Every data warehouse I've been involved in building
has always had a date table for ~100 years depending on the industry use case.

~~~
gmfawcett
This is especially common in data warehouses, as the dimension tables need to
be populated with all of the attributes that a user will be searching/drilling
on (including dates). I don't think I've ever come across a _transactional_
system with a dedicated date table -- you usually have access to a good suite
of date-manipulation operations in SQL -- but maybe there's a case for it.

------
dspillett
Be careful of performance when using CTEs in Postgres: unlike in other DBMSs
they are optimisation fences with regard to predicate pushdown so for some
queries will result in extra scans for every level of call needed.

Doesn't affect all queries of course, and where is does the difference may not
be significant compared to what else is going on (i.e. querying a small
tree/graph structure to pull out some large/complex data), but it is something
to watch out for when working with data of any appreciable size.

~~~
brlewis
It's possible that this note at the bottom of the article was not there when
you made your comment:

 _As a note CTEs at this time are an optimization fence in PostgreSQL, though
there are hopes of that changing in the future. Common Table Expressions are
an incredibly useful tool for reporting. At times the readability of CTEs
outweighs the performance impact, but consider the trade-offs when using them_

~~~
dspillett
It is also possible that I skimmed over that part!

------
shiado
I once built a Reddit clone and used recursive ctes for the comment trees.
Imagine each comment has a parent comment foreign key which might be null if
at top level. If you use a recursive cte you can simply start with top level
comments with null parents and recursively get all of the subtrees and end up
with a nice flat list of all your comments in the order you want. The other
way to do this is to just fetch all the comments and assemble the tree in
memory and then flatten it, which is what I believe many sites do.

~~~
btilly
A third way to do this is described in
[https://en.wikipedia.org/wiki/Nested_set_model](https://en.wikipedia.org/wiki/Nested_set_model).
Its performance for the common case of heavy reads and few writes is better
than either of the two that you describe.

------
whitten
Common Table Expressions (CTEs) are sometimes referred to as WITH clauses.
This article focuses on WITH RECURSIVE. They are useful ways to deal with tree
oriented data.

------
xpil
Why use the word RECURSIVE at all? The engine should recognize the
"recursiveness" of the CTE automatically, at the compile stage. Some other DB
engines have had this for years.

~~~
brlewis
I think it's for humans. Often humans looking through SQL code are doing so
for performance reasons, and a recursive query is fundamentally different from
a non-recursive query. It's something you might want to know while skimming,
before diving deep.

~~~
xpil
I see where you are coming from but the idea does not resonate with me. A CTE
is a CTE, recursive or not. Sometimes you do not care as long as it generates
correct data.

I can see an analogue though: some languages differ Sub from Function although
they are basically the same idea.

On the other hand, Postgres seems to be the only RDBMS that makes the
RECURSIVE keyword mandatory for recursive CTEs.

------
skissane
It annoys me they have made backward incompatible changes to this in Postgres
10, such that recursive CTEs that worked perfectly fine in 9.6 suddenly get
rejected in 10. See in particular the function in this StackOverflow answer –
[https://stackoverflow.com/a/46761197/2147204](https://stackoverflow.com/a/46761197/2147204)
– it works fine on 9.6, 10 rejects it with an error.

~~~
trinitry3
Nothing to do with CTEs. The queries were badly written to begin with and
should have used LATERAL.

~~~
skissane
Yes, I forgot the actual issue was not CTEs directly.

I agree the query could have written better (I am still getting my head around
how to use LATERAL), but it worked fine in 9.6 and stopped working in 10. From
a backward compatibility viewpoint, code working in one version should still
work in the next (even if it isn't the best code.) Or at least, start issuing
deprecation warnings one version before making it not work.

Anyway, posting this to HN has triggered someone to go rewrite my code for me
(thanks Ants Aasma, whoever you are), so now my Postgres 10 upgrade blocker is
solved :)

~~~
ants_a
You're welcome. Wanted to see how hard it is to port a query over.

Postgres generally tries its best to not break users code. However sometimes
it is necessary for making forward progress. In this case the undocumented
behavior of set returning functions within select list had some pretty funky,
mostly accidental, semantics that were getting in the way of executor
improvements. For example try to figure out how to explain the output of these
two queries on 9.6:

    
    
        select generate_series(1,2), generate_series(1,4);
        select generate_series(1,3), generate_series(1,4);
    

That is one example of a silent behavior change between versions that was
justified that applications that are seeing that behavior are probably broken
anyway. Set returning functions within case expressions had more reasonable
behavior so to avoid silent breakage they were made to result in an error.

Deprecation warnings are nice in theory, but in practice they would require an
unreasonable amount of effort to properly implement, not seeing any warnings
still wouldn't be a guarantee that your application works on new version. And
it seems most users ignore deprecation warnings anyway. Besides, it's not like
you can avoid making the changes, you just have slightly less schedule
flexibility on when to implement them.

------
lr4444lr
I never cease to be amazed at how well Postgres maintains decades-old ACID SQL
standards while churning out these innovative advances to the spec.

~~~
Amezarak
Postgres is great, but I don't think there is a serious rdms that doesn't have
recursive ctes.

~~~
twic
Does Oracle have them now? There was a time when it had some janky CONNECT BY
syntax of its own.

~~~
Zardoz84
Oracle >= 10 have it. CTEs are inside of SQL standard.

------
amelius
How do you pass parameters to the recursive function?

~~~
kqr
I wouldn't call it a function as much as a temporary table defined in terms of
itself. It is parametrised the way all CTEs are, by giving arguments in
parentheses postfix.

