
Recursive SQL Queries with PostgreSQL - MartinHeinz
https://towardsdatascience.com/recursive-sql-queries-with-postgresql-87e2a453f1b
======
seanhunter
It's worth throwing into the mix that CTEs can be an elegant solution to more
than just recursive queries. I've used them to "refactor" out common parts of
queries where the same subtable is referenced multiple times in the same
query.

So instead of

    
    
       select something from x join y on x.foo = y.foo and blah = 'cheese' where some_complicated_join_conditions ...
       union
       select something from x join y on x.foo = y.foo and blah <> 'cheese' where -ome_other_complicated_join_conditions
    

...you can have

    
    
       with nerk as (
         select something from x join y on x.foo = y.foo
         where
           hopefully you can find some common part of the complicated where clauses
       )
       select something from nerk where blah = 'cheese' and something
       union
       select something from nerk where blah <> 'cheese' and something_else
    

...obviously a trivial example but you get the idea. When I tested this on
postgres (some time ago) this was very significantly faster for my use case.

It also helps in use cases where you want to allow user-defined filtering and
ordering logic controlled by a gui. You can put your basic query as the CTE
and then have all the user-controlled bits and pieces in the final select and
they don't get mixed up together.

~~~
cbcoutinho
I do this as well and consider CTE's just a macro that is expanded when
executing the query - similar to an inline view.

I've run into issues before with systems where the result of a CTE is so large
that it is actually advantageous to instead place it into a temp table to
avoid re-fetching the data at each invocation. Sometimes that can be better
than a CTE for this use case - YMMV.

~~~
jimktrains2
While the constraint was recently relaxed somewhat in postgres 11 or 12, it is
worth noting that a cte can be an optimization barrier. This means that a cte
isn't just a cookie cutter replace operation. Usually this is ok, especially
if the cte result set is on the smaller side, but it's just something to keep
in mind.

~~~
seanhunter
Optimization in postgres (like all databases I've used) is super counter-
intuitive and baffling at times and the only real way to be sure of anything
is to try it yourself on your own data and see (and repeat the test whenever
you do a major db upgrade). As an example which may well have changed but
definitely was the case the last time I tested (which is circa pg9) it used to
be the case that using ANSI join syntax would make a query 5-10% slower. ie

    
    
       select a.foo, b.bar
       from a, b
       where a.b_id = b.id 
       and a.something ='blah'
    

would be reliably 5-10% faster than

    
    
       select a.foo, b.bar
       from a
       join b on a.b_id = b.id 
       where a.something ='blah'
    

even if everything was indexed etc.

Conversely, projecting off unnecessary columns used to result in a big
speedup. ie

    
    
       select a1.foo, b.bar
       from 
          (select foo, b_id from a where something='blah') a1
       join b on a1.b_id = b.id 
    

used to be significantly faster than either of those queries on wide tables.

edit: few typos and additional example added

~~~
jimktrains2
I'm not disagreeing, but it also helps to look at the EXPLAIN output
(including things like width and rows) for queries to better understand what's
actually happening. That helps build intuition, but like you said, major
version releases can change that behavior. Postgres is usually pretty good at
documenting those changes, though.

~~~
seanhunter
100% agree.

------
Wicher
Recursive CTEs are sublime, not only for orthodox trees: combined with
XMLTABLE you can even use it for recursively descending XML, normalizing it
into tuples of ([array, of, tag, names], value). Then you can join on that
table/view using simple and easily composable autogenerated SQL expressions
instead of Xpath. And you can string-join the element path with '/' to get an
Xpath expression that would get you a value from the source XML, if you need
to! I use this a lot to work with XML, even exposing it through an ORM
(Django's in my case). You can do a similar recursive descend on JSON.

------
st_goliath
SQLite also supports recursion:
[https://www.sqlite.org/lang_with.html](https://www.sqlite.org/lang_with.html)

I'm not sure if MySQL does _now_ , but about a decade ago when I did my
bachelors, in the database course we switched from doing exercises in MySQL to
PostgreSQL when recursion came up. Our local university has a database
research group headed by a professor who has a fetish for recursion, so we
spent a good chunk of time shoehorning graph and tree problems into SQL. As
the story goes, thanks to his constant pestering, DB2 supports quadratic
recursion.

~~~
Izkata
Definitely measure the performance with sqlite though. I recently found one of
my recursive queries had really bad performance compared with using a
temporary table, because the CTE didn't create an automatic covering index.
Had to separate it out into two queries (which changed it from something like
.3s to .009s + .001s).

~~~
SQLite
If you will kindly send your problem query to the SQLite developers, we will
do what we can to improve the query planner to make it run faster.

~~~
Izkata
Posted (and reran the numbers with some unnecessary transformations removed):
[https://sqlite.org/forum/forumpost/e4091b433a](https://sqlite.org/forum/forumpost/e4091b433a)

I may have misremembered since I did go a bit wild on this query (done while
learning about recursive queries), but pulling the subquery into a CTE doesn't
affect the query plan.

------
maweki
An important limitation is that the reference to the recursive table is only a
reference to the newly added tuples and you're only allowed to reference the
single recursive table once. This forbids, for example, having a NOT IN of the
recursive table besides the join you wanted to do anyways and it has some
unexpected behavior (as you may think you reference the whole recursive table
but you don't).

Limiting to linear recursion makes it less powerful and harder to program than
Datalog, but the algorithm is sufficiently fast. Which is probably necessary
as the recursive table is not indexed.

Also you're not only allowed UNION but also UNION ALL without a duplicate
check. There every new result has to be strictly smaller than the last or you
lose termination. Together with the strange referencing method to only the new
tuples, this is quite a footgun.

But not to be a spoilsport, we still teach it at our University for some
problems that would have required external programs to calculate the fixpoint.
You just have to know the limitations which the article does not explore.

~~~
radiowave
It's been a few years since I ran into this, but if memory serves you can work
around the limitation of only referring to the recursive table once by joining
a lateral subquery into the recursive part of the query. Inside the subquery
you're then free of this limitation.

~~~
maweki
Thank you for this tip. This sounds horrendous in terms of maintainability but
nice to see it's possible.

~~~
radiowave
I just found it in my notes. I don't think it looks too bad - everything you
were going to do in the recursive part, you just do inside the subquery
instead.

    
    
      WITH RECURSIVE my_cte AS (
        SELECT foo, bar FROM whatever
        UNION ALL
        SELECT (s).*
        FROM my_cte, LATERAL (
          SELECT ... -- refer to my_cte multiple times here
        ) s
      )

------
PaulMest
My team just started working on a project that will work with hierarchical
data. Have you looked into ltree[0] at all? Any thoughts on when that would or
would not be beneficial?

[0]
[https://www.postgresql.org/docs/current/ltree.html](https://www.postgresql.org/docs/current/ltree.html)

~~~
maweki
Use it. I think you get stronger guarantees on correctness and performance.

Not every of those path queries are easily implemented using recursive CTEs.

------
cyborgx7
With all this talk about nested trees of managers and subordinates, I'm
curious if it would be possible to trick a HR department into introducing a
loop into the structure, effectively creating a group of people that is no
longer subordinate to the CEO/Head of the Company.

~~~
lowdose
Thats exactly what HR did in many companies.

~~~
cyborgx7
You got a link to any tales about this happening? Would love to read about it.

------
pgt
Datalog Rules are better suited to recursive queries than SQL:
[http://www.learndatalogtoday.org/chapter/8](http://www.learndatalogtoday.org/chapter/8)

------
bionhoward
FOREIGN KEY (manager_id) REFERENCES employees (id) ON DELETE CASCADE

sounds dangerous for job security

