Hacker News new | past | comments | ask | show | jobs | submit login
Recursion with SQL (desple.com)
56 points by thebillkidy on Jan 23, 2015 | hide | past | favorite | 25 comments



For those who prefer standard syntax in an open-source system, see:

http://www.postgresql.org/docs/9.4/static/queries-with.html


I recommend this link to anyone contemplating storing/querying hierarchical data in sql/rdbms. The "LR Method" is the best solution I've found. Does not depend on vendor specific funcationality and/or keywords:

http://kamfonas.com/id3.html


We started writing a threaded discussion system using these, but it ended up being much easier to use Postgres's Recursive CTEs. Highly recommend!


this is also called nested sets.. it's nice but has some tradeoffs as well


...and no mention at all of recursive CTEs.

My favorite ridiculous use-case for them: https://wiki.postgresql.org/wiki/Mandelbrot_set

(For reference, that ran in 252ms on my low-end Linode.)


I didn't think Oracle SQL server has CTEs, but maybe that's changed.

Postgres CTEs are amazing, though. Surprisingly efficient for our particular usage case (threaded discussions that don't ever get too deep).


Oracle SQL Indeed has CTE's :) I mentioned those together with other special use cases in previous articles on my blog.


Oracle has recursive CTEs since, I think, 11gR2, and non-recursive CTEs since 9.2.


Beware!

If you're running one-off queries, you probably won't have issues using recursion, but there are a lot of instances where you really don't want to run a complex SQL statement for each of the 1M users reading data from your site. Selecting the proper database table structure can help dramatically!

For example, take any data structured as a tree. If there are more writes than reads, it's pretty easy to put new data in place and your read becomes recursive. When you have vastly more reads than writes it's almost always better to "pre-order" the data. I've had good success with using the modified pre-ordered lists for tree structures but other table formats and processing can be used for other data types. Sometimes it's just careful but non-obvious indexing.


So essentially this is Oracle SQL - does anyone know if there are analogous similar START WITH and CONNECT BY keywords for other platforms? (author mentions MSSQL has the them, but no mention of MySQL/MariaDB or PGSQL).


Recursive CTEs are strictly more powerful than Oracle's START WITH/CONNECT BY and are supported by most databases (pretty much everything other than the MySQL/MariaDB family -- SQLServer, Oracle, Postgres, even SQlite.)

To be fair to Oracle, START WITH/CONNECT BY has been available in Oracle a lot longer than CTEs as part of the SQL standard have existed.


No support in MySQL/MariaDB.

There is support in Postgres, sqlite, MSSQL, HSQLDB, Oracle, etc... for CTEs.

You can sort of emulate some recursive SQL properties using a temporary table:

http://brianv0.github.io/blog/2013/6/recursive.html


Teradata has the WITH RECURSIVE statement which is pretty easy to use.


They're nice to have but you can do some or all the same things using recursive CTE in PostgreSQL, for one.


Nice. A few tricks with dual I haven't come across before.

When you get into examples using tables it would be nice if you provided the table definitions and some sample data.

PS: psoug's connect by page (http://psoug.org/reference/connectby.html) is my favorite reference.


I prefer the syntax of recursive CTEs (as many others have mentioned).

A simple use case that I have in my work is generating a date table, and a recursive CTE sits at the core of this script:

TSQL Syntax:

WITH dateCTE (FullDate) AS (SELECT '20100101' UNION ALL SELECT DATEADD(DAY, 1, FullDate) FROM dateCTE WHERE FullDate < 20101231 )

In the example above, we generate a table of all dates in 2010.

I don't get much chance to play with Postgres, but from their documentation[0] it seems the concept transfers over readily by using WITH RECURSIVE.

[0]http://www.postgresql.org/docs/9.4/static/queries-with.html


In postgres you would use generate_series for that.

http://www.postgresql.org/docs/current/static/functions-srf....


Neat.

It seems we can simulate this behavior in TSQL[0] by using recursive CTEs in a function.

[0]https://developer42.wordpress.com/2014/09/17/t-sql-generate-...


There are, of course, many very large databases that would need something like this. However, I've found that most small sites I've worked on, a simple "lineage" system in a completely flat single table is more than adequate.

(The following is a link to my own website where I explain it using a Codeigniter library I built. It can be done in any type of code, however) http://codebyjeff.com/blog/2012/10/nested-data-with-mahana-h...


I like your final solution for a calendar. I've implemented a calendar like this in a jasper report (on Oracle), with hyperlinks to actions on that date . Your SQL is cleaner to read. However on my environment the week numbers don't line up correctly. I had to change the line

    , TO_CHAR(x +1, 'iw') weeknumber

    , TO_CHAR(x , 'iw') weeknumber
I guess this might be environmental as some regions may consider the start/end of the week on different days.


This seems to be taking advantage of a specific Oracle feature whose performance I have no idea about, there are suggestions that Common Table Expressions could do the same sort of thing but since CTE performance is generally horrible I'd be concerned that you get nice syntax and horrible performance. Does anyone have a suggestion as to how you could do something similar across different DBs without basically killing performance or am I missing something here?


I recommend actually trying it (e.g. PostgreSQL CTE) on a problem you are trying to solve, if you haven't already.

People tend to report the bad cases, and stay silent for the good cases, which can contribute to a sense of fear of the feature. It could have started from a real issue, or it could just be someone trying to solve the traveling salesman problem with fake data and then blogging when it's slow.


Recursive CTEs work quite well in PostgreSQL, and the performance isn't at all what I'd call "horrible".


In the read-mostly case, you can use an adjacency list, use a query with a CTE to generate a heirarchy in a materialized view which is updated when you write to the main table, and query against the materialized view. You still pay the performance cost for the CTE, but only on writes, not every read.

This doesn't help in a write-heavy use case.


I have not included CTE's because I wanted to show pure recursion queries without the use of this. I think CTE's will be faster though but I have yet to verify this.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: