Hacker News
SQLite now allows multiple recursive SELECT statements in a single recursive CTE (fossil-scm.org)
398 points by thunderbong 39 days ago | 75 comments



This means you can use SQLite as a graph database. The draft documentation for this feature talks about that in more detail here: https://sqlite.org/draft/lang_with.html#rcex3

Also relevant: the draft changelog for the next version (due out in December): https://sqlite.org/draft/changes.html
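To make the graph-database angle concrete, here's a toy reachability query using Python's bundled sqlite3 module. This uses a single recursive SELECT, which already works in released SQLite; the new feature adds multiple recursive SELECTs in one CTE (e.g. to follow more than one edge type at once). Table and column names here are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE edge(src INTEGER, dst INTEGER);
    INSERT INTO edge VALUES (1,2),(2,3),(3,4),(5,6);
""")

# Every node reachable from node 1. UNION (not UNION ALL)
# deduplicates rows, which also stops cycles from looping forever.
reachable = db.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 1
        UNION
        SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
""").fetchall()
print([n for (n,) in reachable])  # -> [1, 2, 3, 4]
```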


While you can use recursive common table expressions to perform graph queries, certain queries like breadth-first traversal are not very efficient for anything but trees (not even DAGs). See my post on the SQLite forum from the other day for more details:

https://sqlite.org/forum/forumpost/cd5ff0a1d4?t=h


IMHO fixing this is more a job for the query planner/optimizer than for the SQL language itself. I wonder if there's a more general optimization that would help with this and other similar graph-related tasks...


It isn't a query planner/optimizer problem.

Generalized graph traversals require each query to have additional data structures that grow with table size. This becomes an expensive proposition as tables get large, especially if you have many concurrent queries. Some databases designed for graph processing have internals and storage models that make it more efficient to resource manage this extra state. A SQL database, which has other priorities, would not implement this.


Still cannot use window functions in recursive queries, e.g. `select row_number() over ()` to enumerate and distinguish paths along the edges of a graph. This works in PostgreSQL, but is an error in SQLite, just tested with preview 3.34.0 2020-10-20. This greatly diminishes the utility of graph querying, because all but the most trivial queries end up dragging window functions in.
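For the record, a minimal reproduction with Python's bundled sqlite3 (behavior may vary by SQLite version; SQLite's docs say aggregate and window functions are disallowed in the recursive SELECT, while PostgreSQL accepts the equivalent query):

```python
import sqlite3

db = sqlite3.connect(":memory:")
try:
    # row_number() inside the *recursive* SELECT of a recursive CTE.
    db.execute("""
        WITH RECURSIVE t(n, rn) AS (
            SELECT 1, 1
            UNION ALL
            SELECT n + 1, row_number() OVER () FROM t WHERE n < 3
        )
        SELECT * FROM t
    """).fetchall()
    window_in_recursive_ok = True
except sqlite3.OperationalError:
    # SQLite rejects this at prepare time.
    window_in_recursive_ok = False
print(window_in_recursive_ok)  # -> False
```

Note that a window function in the *outer* SELECT over the CTE's result is fine; only the recursive part is restricted.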


Have you suggested this feature (along with a use case) on the SQLite forum?


It's not my responsibility. Hipp knows he's out of compliance with the standard.


I don't know anything about graph databases, but ... why do we want to use SQLite as one? Doesn't the graph database community have something better to offer?

SQL has a knack for doing really easy things well, and moderately complicated things badly. I would assume by default that anything involving graphs should not involve SQL. Recursive CTEs to me scream "you're about to spend hours debugging something trivial".


> I would assume by default that anything involving graphs should not involve SQL.

That would be a questionable assumption. SQL databases are a widely tested tool, and SQL itself can allow you to augment your "graph" with constraints and semantics that many graph-focused systems have trouble with. CTE's, while not entirely trivial, are not overly complex; they're not something you'd spend "hours" debugging.


I don't feel that way at all. I think a lot of that feeling comes from the fact that everyone's so dependent on ORMs they just rarely write SQL. Once you start doing it more you realize how powerful and elegant it can be.


> Doesn't the graph database community have something better to offer?

Probably not in the same kind of resource-constrained way, no. Whilst it might be a suboptimal graph database, it's also not going to require 2GB of RAM to run...


An interesting upcoming graph database is oxigraph. It's written in Rust and might be able to cope well in 2GB.

https://github.com/oxigraph/oxigraph/


Ooh, ta, that looks handy, especially with supporting SPARQL. If it can load my 1.1GB TTL, I'll be beyond happy.


I’d love to learn more about your use case... I think SPARQL and RDF are pretty interesting technologies, but haven’t seen many real applications using them.


My current use case is that I'm working on something which is basically things in groups (which may also be in groups) and seeing if things like "items in group A which are linked to group C via linkages to items in group B" are easier with SPARQL + graph database than they are with Postgres.

(Previous use case was converting music industry data into RDF and providing query interfaces on top. That 1.1GB TTL is a CWR file converted to RDF as a stress tester.)


> "items in group A which are linked to group C via linkages to items in group B"

Relational databases are actually quite convenient for such cases because you can model each "group" of items via a database table, including potentially an associative table which provides "linkage" between items in the database. Graph-based models are generally more limited than that, e.g. RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity (both of which are essentially non-typed) according to a fixed "predicate". You can sort of materialize triples and endow them with extra information, but it gets clunky.
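A toy sketch of the associative-table modeling described above, in Python's sqlite3 (all table and column names are invented): groups, membership, and item-to-item links as plain tables, and the "A linked to C via B" question as a join.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE item(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grp(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE membership(item_id INTEGER REFERENCES item(id),
                            grp_id  INTEGER REFERENCES grp(id));
    CREATE TABLE link(src_item INTEGER REFERENCES item(id),
                      dst_item INTEGER REFERENCES item(id));
    INSERT INTO item VALUES (1,'a1'),(2,'b1'),(3,'c1');
    INSERT INTO grp  VALUES (1,'A'),(2,'B'),(3,'C');
    INSERT INTO membership VALUES (1,1),(2,2),(3,3);
    INSERT INTO link VALUES (1,2),(2,3);   -- a1 -> b1 -> c1
""")

# "Items in group A which are linked to group C via an item in group B":
rows = db.execute("""
    SELECT ia.name
      FROM item ia
      JOIN membership ma ON ma.item_id = ia.id AND ma.grp_id = 1       -- A
      JOIN link l1       ON l1.src_item = ia.id
      JOIN membership mb ON mb.item_id = l1.dst_item AND mb.grp_id = 2 -- B
      JOIN link l2       ON l2.src_item = l1.dst_item
      JOIN membership mc ON mc.item_id = l2.dst_item AND mc.grp_id = 3 -- C
""").fetchall()
print(rows)  # -> [('a1',)]
```

The associative tables can carry extra columns (timestamps, link types, weights), which is the "endow triples with extra information" part that gets clunky in plain RDF.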


> Relational databases are actually quite convenient for such cases

I believe that someone may have done this in a nice way but every time I've encountered it (3 times thus far), it's always ended with complex SQL and tables being bent out of shape to try and keep performance.

> RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity according to a fixed "predicate"

But can also infer transitive relations based on those predicates - `A canSee B`, `B canSee C` => `A canSee C` - which is handy when you're trying to discover those relationships in your data.


> But can also infer transitive relations based on those predicates - `A canSee B`, `B canSee C` => `A canSee C` - which is handy when you're trying to discover those relationships in your data.

You can do this sort of inference in a view if you use relational databases. (A view is a sort of "virtual" table based on the result of some database query. Many databases can also materialize views for improved performance, though this can make it a bit challenging to manage updates.)
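A sketch of the view idea using the canSee example from above, in SQLite via Python (SQLite allows a recursive CTE inside an ordinary, non-materialized view; names are invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE can_see(who TEXT, whom TEXT);
    INSERT INTO can_see VALUES ('A','B'),('B','C');

    -- The transitive closure of can_see, exposed as a view.
    CREATE VIEW can_see_closure AS
    WITH RECURSIVE closure(who, whom) AS (
        SELECT who, whom FROM can_see
        UNION
        SELECT c.who, e.whom
          FROM closure c JOIN can_see e ON c.whom = e.who
    )
    SELECT who, whom FROM closure;
""")

rows = db.execute(
    "SELECT who, whom FROM can_see_closure ORDER BY who, whom").fetchall()
print(rows)  # -> [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

The inferred `A canSee C` row appears without being stored anywhere; the view re-derives it on each query.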


> You can do this sort of inference in a view if you use relational databases.

You'd need to use recursive queries though, I think, and there be dragons.

> [materialized views] can make it a bit challenging to manage updates

We're not using them with Postgres because refreshing materialized views is very much a blunt hammer and it'd cause more hassle than it would solve. Which is annoying because they've been great when I've used them previously.


I'd describe SPARQL and RDF as general interfaces focused on interop, not "technologies" per se. You could use any technology to provide a SPARQL/RDF endpoint, including a relational database.


If we are just going by resource usage, I'm sure that RedisGraph (as it's "just" a Redis module) would also fit the bill.


Isn't that keeping everything in memory?


It seems it can persist to disk just like normal Redis stuff, but it only works with in-memory graphs - tricky if the graph is too big to fit, I suppose.

cf https://github.com/RedisGraph/RedisGraph/issues/152


It's keeping everything in memory (index and entities) with disk persistence, but doesn't have a big memory overhead when running with little data.


> why do we want to use SQLite as one

Because SQLite is ubiquitous. It's simply everywhere. And it doesn't need a dedicated server to run.


I would also add another good reason: it's probably going to be the most reliable, best-tested part of any stack you decide to use.


My experience with graph databases so far has been that they are tuned more for nice schemas and queries than for graph analysis.

I've actually been considering migrating away from neo4j to sql with recursive queries for performance reasons.


> SQL has a knack for doing really easy things well, and moderately complicated things badly. I would assume by default that anything involving graphs should not involve SQL

A schema may have only one, or at any rate comparatively very few, instances of recursion.

In this case, using two separate data stores would be too much overhead; if the CTE doesn't do anything particularly fancy, that is, if it just retrieves records recursively associated via keys, it's not inherently hard to debug (actually, there isn't much to debug).

A case in point is GitLab's user management, which dropped MySQL partly because of its lack of CTEs (at the time).


Some of the early RDF Graph Stores were indeed based on SQL databases (Jena RDB/SDB) but you're correct in that they didn't scale particularly well.

There are pure graph stores out there, AllegroGraph, OpenLink Virtuoso (although this is a strange hybrid of SQL + Graph technologies) and others - and for more advanced graph query constructs like path finding there are optimisations that are difficult/not well supported in SQL.


Yes why not? If you have a data model where parts of the data is graph-like, and other classic relational, why would you not want to store the entire thing in the same database? If your domain is completely graph-oriented, and you have complex requirements on your database to support graph-oriented operations, it would be one thing. But evaluating use of a well known and mature SQL database is no bad starting point.


> Yes why not? If you have a data model where parts of the data is graph-like, and other classic relational, why would you not want to store the entire thing in the same database?

SQL is an implementation of relational algebra, and databases that implement it are relational databases for storing relational data.

Pointing at the name of a thing isn't an argument, I admit it, but asking "what could go wrong if we want to use a system tailored to relational data to deal with non-relational data?" strikes me as the sort of question that might turn out to have a compelling answer 6 months in when it is too late to easily change the database.

If I were involved in a project, I would be pushing very hard for the data to come out of the relational DB using "SELECT * FROM table WHERE conditions", with the clever graph things starting after that. If the data needs to be read from disk using clever graph-based algorithms, then use a clever, graph-based database. There are a bunch out there according to Wikipedia.


Graph data is not necessarily "non-relational". A lot of real-world data just can't be described as "purely graph-like" (no complex semantics or constraints) or "purely relational" (no recursive queries, etc.); often it's convenient to merge both feature sets. This is exactly what modern SQL with CTE's allows.


> Pointing at the name of a thing isn't an argument, I admit it, but asking "what could go wrong if we want to use a system tailored to relational data to deal with non-relational data?" strikes me as the sort of question that might turn out to have a compelling answer 6 months in when it is too late to easily change the database.

This would suggest that pretty much every existing RDBMS has made some very bad decisions since JSON/XML types, arrays, and all sorts of other non-relational features have long been supported.

Given the nature of SQLite you probably aren't dealing with petabytes of data.


If the project involves spending 6 months writing code involving navigating graphs, then SQLite probably isn't the right choice. If it's one or two queries, that's a different situation


Sure, but (and I am apologetic to be labouring on this point) if you have a small one-off blob of data that needs to be processed as a graph, SQL is the worst language to do the processing and a database is the worst place to be doing it. Get it into memory, use a programming language that supports recursive functions.

This is the basic argument here. If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite? If it isn't, why would we pick SQL over Python?

I don't think it is bad that SQLite supports this, just ... what are the circumstances where this is a good idea? Is there a reason to do graph-style algorithms in SQL?


The problem that prompted me to make this extension to SQLite was a web-page request, so it needs to be processed with low latency. I initially tried loading the graph into memory and processing it that way. But there are approx 100K nodes in the graph, only a few dozen of which are relevant to the answer. It took a lot of time to pull in 100K nodes from disk. Running the query entirely in SQL is not only much less code to write, debug, and maintain, it is also much faster.


Thanks for adding it. I learned window functions using SQLite, looking forward to learn recursive queries next.


> SQL is the worst language to do the processing and a database is the worst place to be doing it. ... Is there a reason to do graph-style algorithms in SQL?

I don't think it's such a far-off fit from the relational database problem. Any time you have a table with a relation to itself, there's the potential to do graph-style algorithms on that data.

> If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite?

Perhaps it has nothing to do with the situation being "delicate" and it is just a simple matter of it being less work and fewer lines of code to use a graph-style query in SQL, rather than re-implementing the graph algorithms in your application code or having to bring in an entirely new database system just to process one query.


I agree. There is nothing wrong relationally with self-referencing tables. Support for traversing such data models may have been poor in the past but with the support added, what is the problem?


I write SQL for a living and am a part-time woodworker (not good at all). Sometimes I get pieces of wood cut by industrial machines at the shop and sometimes I use my manual saw. I don't have the money or space to set up a professional wood-cutting machine in my workshop. A hand-held saw is not the best for cutting wood, but sometimes it just makes sense. You are right, graph database things need to be done in a graph database, but speaking for myself I have never set up or used one. It is easier (not "right") for me to learn recursive statements than to set up and learn to use a graph database.


It's an unlikely comparison in the first place; who is considering, on one hand, using SQLite right on the client device, and on the other, maybe spinning up a graph database that will almost certainly not just run on the client device? I submit that it's next to nobody. You can run SQLite on a server, but nearly nobody does that either (and not without reason). The idea that recursive CTEs are some sort of arcane technique that's impossible to debug is also not my experience with them at all.


Very nice. Graph following for the win. I wrote an inventory control system about 10 years ago that achieved this in a very hacky way (read: code outside SQLite). The requirement was to be able to track items, track users of items, track how much time an individual user had used an item, and to track maintenance and repairs of items. This would let you "charge back" wear and tear on things to the most responsible party, and provide lifetime information in order to buy replacements in a timely way. All that hackiness would be a single statement with this change.


I loved everything about that thread.

Slightly off-topic, but I want to point out how well Richard deals with a slightly rude commenter [1]. Keeps it classy and productive while still calling out the bad behavior.

[1] https://fossil-scm.org/forum/forumpost/7817eb709d?t=h


The commenter's comment [1] is absolutely fine, though.

Hipp's response is also fine! The only thing that wouldn't be fine is if someone inappropriately reacted to that comment as an insult.

[1] https://fossil-scm.org/forum/forumpost/7840137009?t=h


It's possible I'm misreading the tone of "has ways to go" and "niche volunteer-based SCM".


If anyone comes across as rude in that exchange it's Richard. Maybe it's just his style (I don't have any experience to judge by except this comment), but he comes off as very abrupt here.


I don't like SQLite because of this: https://www.sqlite.org/datatype3.html

> The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data.

I simply don't want weird rows (that violate the _expressly declared_ type of a SQL table) to be merrily allowed into persistent storage.


> Why on earth would I want weird rows (that violate the _expressly declared_ type of a table) to be merrily allowed into persistent storage?

Some problems are easy to solve given a column of type VARIANT: 1. simple sparse columns instead of a multi-type EAV table, 2. a BIG Table style schema that is easily range partitioned.

The main reason in SQLite is that most SQL dialects just work. SQLite becomes a universal desktop workbench for SQL. This flexibility also applies to things like delimited identifiers [1].

The problem is not that SQLite uses VARIANT datatypes, the problem is that once a datatype is specified it is not enforced by the engine. SQLite could/should add a PRAGMA to enable Domain Integrity much like it does with Foreign Keys (PRAGMA foreign_keys = ON;).

[1] https://sqlite.org/lang_keywords.html
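A small demonstration of both halves of that point in Python's sqlite3: the declared type really is only an affinity, and a `CHECK (typeof(...))` constraint is one per-column way to enforce it today, in lieu of the hypothetical domain-integrity pragma suggested above (the pragma itself does not exist).

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Affinity is a recommendation: a non-numeric TEXT value is stored
# as text even in a column declared INTEGER.
db.execute("CREATE TABLE loose(n INTEGER)")
db.execute("INSERT INTO loose VALUES ('not a number')")
stored = db.execute("SELECT n, typeof(n) FROM loose").fetchone()
print(stored)  # -> ('not a number', 'text')

# A CHECK on typeof() rejects the same value.
db.execute("CREATE TABLE enforced(n INTEGER CHECK (typeof(n) = 'integer'))")
try:
    db.execute("INSERT INTO enforced VALUES ('not a number')")
    check_enforced = False
except sqlite3.IntegrityError:
    check_enforced = True
print(check_enforced)  # -> True
```

Note the CHECK runs after affinity conversion, so `INSERT INTO enforced VALUES ('123')` would still pass (the string converts cleanly to an integer first).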


I remember hearing something along the lines of the odd typing in SQLite being vestigial from its early days when it was closely aligned with Tcl.


I'm not sure of the original inspiration, but SQLite's core storage types with variable-length encoding are the right way to implement a row store or a data serialization format, in my opinion.

Fixed length data types in SQL were a premature optimization that led to verbose syntax like LONG VARCHAR FOR BIT DATA [1] and incompatibilities tied to internal implementation details. There is no reason why more specific constraints can't be specified on top of the core storage types.

One can argue that SQLite is sorely missing an exact NUMERIC type but that holds true for most/all language runtimes as well. Datetime is a bit weird too but the format of the DB file is a thing of beauty.

[1] https://db.apache.org/derby/docs/10.1/ref/rrefsqlj30118.html


Anyone have a recommended resource for understanding the practical application of recursive CTEs / Graph DBs?


Look for a good reference about handling trees in SQL. There are a number of books on the whole area, and many smaller more focused articles online.

The most common place I've encountered recursive CTEs being massively useful in real world work is working with company responsibility & reporting hierarchies (line management, regulatory supervision, "spans of control" reporting, ...) where there is not a fixed number of levels.

When there are a small and fixed number of levels (for instance every drone has a supervisor and senior supervisor and you don't monitor connections above that) there are often other solutions that people find easier to understand and/or perform faster but for deeper trees or those where the level structure is any less static, recursive CTEs are a godsend.

Other examples include nested taxonomies (nature, book or other publication filing, nested tags for categorisation), properly threaded discussion records, family trees (though these are graphs rather than trees unless you impose some strict limits on what you count), representing file-system like structures in the DB, ...

Graphs get trickier (trees are a subset of graphs), as you may need to consider multiple paths to the same node, infinite loops, and other complications, so I suggest looking into trees first and then expanding your research.
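A minimal sketch of the variable-depth hierarchy case in Python's sqlite3 (schema invented): everyone under a given manager, with their depth below the starting point, regardless of how many levels the tree has.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE employee(id INTEGER PRIMARY KEY, name TEXT,
                          manager_id INTEGER REFERENCES employee(id));
    INSERT INTO employee VALUES
        (1,'ceo',NULL),(2,'vp',1),(3,'mgr',2),(4,'dev',3),(5,'dev2',3);
""")

# Everyone under 'vp', however deep the reporting chain goes.
rows = db.execute("""
    WITH RECURSIVE under(id, name, depth) AS (
        SELECT id, name, 0 FROM employee WHERE name = 'vp'
        UNION ALL
        SELECT e.id, e.name, u.depth + 1
          FROM employee e JOIN under u ON e.manager_id = u.id
    )
    SELECT name, depth FROM under ORDER BY depth, name
""").fetchall()
print(rows)  # -> [('vp', 0), ('mgr', 1), ('dev', 2), ('dev2', 2)]
```

UNION ALL is fine for a tree; for graph-shaped data, swapping in UNION deduplicates visited rows, which is one simple guard against the infinite loops mentioned above.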


Going out on a limb here because your question can be interpreted in multiple ways and I only feel qualified to answer one of them:

The most common example for graphs in DBs would be a graph representing the friends/contacts of users. It's one of these "you'll know when you need it" solutions.

Sometimes your data structure is a graph, and sometimes you want that graph to live in a database.

And in even rarer cases you want to search that graph in your database in some recursive manner.


What about representing states? Not sure about how graph dbs could perform on that scenario. There is a recent post talking about the topic and I'm still thinking about the best solution to store flows or similar. https://news.ycombinator.com/item?id=24764303


Recursive Common Table Expressions (CTE) are hard to understand until you've worked through at least one real-world example. The SQLite documentation on the WITH Clause has a couple of simple examples [1] for an Org Chart and a Family; both represent hierarchical data structures.

The Fossil discussion involves the SCM's representation of a File System, probably the most common hierarchical data structure we regularly encounter. Hierarchical data structures are notoriously hard in SQL. Recursive CTEs are not easy but they are useful and efficient for hierarchical data.

[1] https://www.sqlite.org/draft/lang_with.html#hierarchical_que...


Another good writeup is included in the SQL Anywhere documentation [1] which has a good example of a parts explosion problem [2]. I don't know if the syntax works in SQLite but it might be a good exercise to get it to work.

[1] http://dcx.sap.com/1200/en/dbusage/ug-commontblexpr.html

[2] http://dcx.sap.com/1200/en/dbusage/parts-explosion-cte-sqlug...


I've used them (in PostgreSQL) to evaluate the merging of "parent" and "child" values - e.g. one row can be the parent, and if the child has NULL for a value it takes it from the parent (though I don't remember much of the details). This can go deeper than just the first parent - to the parent of the parent too.

I was able to make a query like this in PostgreSQL, but forgot the details of how I did it (though once you sit down, you get it eventually).
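A sketch of that inheritance pattern (here in SQLite via Python rather than PostgreSQL, but the CTE is the same shape; schema and names invented): for each row, climb toward the root until a non-NULL value is found.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node(id INTEGER PRIMARY KEY,
                      parent_id INTEGER REFERENCES node(id),
                      val TEXT);
    -- Rows 2 and 3 have no val of their own and inherit from row 1.
    INSERT INTO node VALUES (1,NULL,'root-val'),(2,1,NULL),
                            (3,2,NULL),(4,1,'own-val');
""")

rows = db.execute("""
    WITH RECURSIVE lookup(start_id, cur_id, val) AS (
        SELECT id, id, val FROM node
        UNION ALL
        -- Keep climbing only while the value is still NULL.
        SELECT l.start_id, p.id, p.val
          FROM lookup l
          JOIN node c ON c.id = l.cur_id
          JOIN node p ON p.id = c.parent_id
         WHERE l.val IS NULL
    )
    SELECT start_id, val FROM lookup
     WHERE val IS NOT NULL
     ORDER BY start_id
""").fetchall()
print(rows)
# -> [(1, 'root-val'), (2, 'root-val'), (3, 'root-val'), (4, 'own-val')]
```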


In the past I used recursive CTEs in Postgres to calculate the BOM of a product. Products consisted of "parts", and each "part" had a cost and could consist of any number of other "parts" (subassemblies). This was for a homemade ERP resembling software.
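A toy version of that parts-explosion idea in Python's sqlite3 (schema invented): multiply quantities down the assembly tree and sum up the cost of every part.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE part(id INTEGER PRIMARY KEY, name TEXT, cost REAL);
    CREATE TABLE bom(parent INTEGER, child INTEGER, qty INTEGER);
    INSERT INTO part VALUES (1,'bike',0),(2,'wheel',0),
                            (3,'spoke',1.5),(4,'rim',20);
    -- A bike has 2 wheels; each wheel has 32 spokes and 1 rim.
    INSERT INTO bom VALUES (1,2,2),(2,3,32),(2,4,1);
""")

# Explode part 1 (the bike), carrying the multiplied quantity down,
# then roll the per-part costs back up.
total = db.execute("""
    WITH RECURSIVE explode(part_id, qty) AS (
        SELECT 1, 1
        UNION ALL
        SELECT b.child, e.qty * b.qty
          FROM bom b JOIN explode e ON b.parent = e.part_id
    )
    SELECT sum(e.qty * p.cost)
      FROM explode e JOIN part p ON p.id = e.part_id
""").fetchone()[0]
print(total)  # -> 136.0  (64 spokes * 1.5 + 2 rims * 20)
```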


While the end result looks like a graph, I think that still qualifies as a standard relational DB use case though?

A graph database usually means you’re storing the graph nodes and edges as entities, so you can traverse the graph easily from any direction, and express different kinds of relationships.


Yeah, wasn't trying to say this was an example of graph. Maybe it is slightly outside what would be considered "standard" relational DB use case? My impression of recursive CTE's in RDBMS is that if you need speed or any more advanced graph-like features, you should move to an actual graph db. So far I haven't taken on any projects that warranted that, though it does interest me.

Either way my intention here was to simply provide an example of a real-world recursive CTE use case.


One application not mentioned in this thread is traversing trees, like in the old forums where you had many levels of categories/subcategories. AKA hierarchical queries. SQLite already has support for this[0]. It seems they are supporting multiple recursive SELECTs now.

[0] https://sqlite.org/lang_with.html


Yes, chapter 2 in Martin Kleppmann's Designing Data-Intensive Applications.



If I'm not mistaken, this brings SQLite up to parity with Datalog.


Can you please expand on this. Thanks


Unless you're using Clojure engines with Datomic pull syntax, SQL engines (most? all?) can't pull out nested results without serialization nastiness


Does this mean it’ll be easy to do graph transitive closure computations in sqlite now?


Do you mean iterating on an adjacency matrix dimension-many times to get the full path length between any two nodes?

I have had to run my recursive queries external to the database (MySQL, SQLite), this change looks to bring it into the database. Excellent news!


Transitive closure will tell you if a path between two nodes exists, not the length of the shortest path between two nodes.
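The difference is just what you keep per row, though: if the recursive CTE carries a path length, a GROUP BY at the end yields shortest-path lengths alongside reachability. A sketch in Python's sqlite3 (names invented; the `len < 10` bound is a crude guard against cycles):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE edge(src INTEGER, dst INTEGER);
    -- Two routes from 1 to 4: 1-2-3-4 (length 3) and 1-4 (length 1).
    INSERT INTO edge VALUES (1,2),(2,3),(3,4),(1,4);
""")

rows = db.execute("""
    WITH RECURSIVE walk(node, len) AS (
        SELECT 1, 0
        UNION
        SELECT e.dst, w.len + 1
          FROM edge e JOIN walk w ON e.src = w.node
         WHERE w.len < 10          -- crude depth bound for cyclic data
    )
    SELECT node, min(len) FROM walk GROUP BY node ORDER BY node
""").fetchall()
print(rows)  # -> [(1, 0), (2, 1), (3, 2), (4, 1)]
```

The set of nodes returned is the transitive closure from node 1; the `min(len)` column is the shortest-path length to each.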


You're technically correct, but afaik there isn't really a way to compute transitive closure without also computing shortest paths along the way, modulo constant factors?


You'll compute paths along the way but won't know whether they're the shortest. Odds are good, though, that you'll have found something good enough for most purposes.


If all you do is iterate over the adjacency matrix, sure, but wouldn't Floyd-Warshall be the same time complexity? All I'm saying is that you get both pieces of information for essentially the same price, modulo constant factors.


Even if it is, I imagine it's not going to be that efficient if it isn't optimized for that use case.


I solved a task involving transitive closures in last year's Advent of Code using SQLite:

https://github.com/nikeee/advent-of-code-2019/blob/master/06...

How could this be improved with the new feature of SQLite?


Nice, I was about to pull RedisGraph in to a small project using SQLite to do two graph queries.



