Also relevant: the draft changelog for the next version (due out in December): https://sqlite.org/draft/changes.html
Generalized graph traversals require each query to maintain additional data structures that grow with table size. This becomes an expensive proposition as tables get large, especially if you have many concurrent queries. Some databases designed for graph processing have internals and storage models that make it more efficient to manage the resources for this extra state. A SQL database, which has other priorities, would not implement this.
SQL has a knack for doing really easy things well, and moderately complicated things badly. I would assume by default that anything involving graphs should not involve SQL. Recursive CTEs to me scream "you're about to spend hours debugging something trivial".
That would be a questionable assumption. SQL databases are a widely tested tool, and SQL itself lets you augment your "graph" with constraints and semantics that many graph-focused systems have trouble with. CTEs, while not entirely trivial, are not overly complex; they're not something you'd spend "hours" debugging.
Probably not in the same kind of resource-constrained way, no. Whilst it might be a suboptimal graph database, it's also not going to require 2GB of RAM to run...
(Previous use case was converting music industry data into RDF and providing query interfaces on top. That 1.1GB TTL is a CWR file converted to RDF as a stress tester.)
Relational databases are actually quite convenient for such cases because you can model each "group" of items via a database table, including potentially an associative table which provides "linkage" between items in the database. Graph-based models are generally more limited than that, e.g. RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity (both of which are essentially non-typed) according to a fixed "predicate". You can sort of materialize triples and endow them with extra information, but it gets clunky.
I believe that someone may have done this in a nice way but every time I've encountered it (3 times thus far), it's always ended with complex SQL and tables being bent out of shape to try and keep performance.
> RDF and SPARQL are limited to simple triples which link a "source" and a "target" entity according to a fixed "predicate"
But can also infer transitive relations based on those predicates - `A canSee B`, `B canSee C` => `A canSee C` - which is handy when you're trying to discover those relationships in your data.
You can do this sort of inference in a view if you use relational databases. (A view is a sort of "virtual" table based on the result of some database query. Many databases can also materialize views for improved performance, though this can make it a bit challenging to manage updates.)
You'd need to use recursive queries though, I think, and there be dragons.
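For what it's worth, the recursive version isn't too scary. A sketch (using Python's stdlib `sqlite3`, with an invented `can_see` table) of the `A canSee B, B canSee C => A canSee C` inference behind a view:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE can_see (source TEXT, target TEXT)")
con.executemany("INSERT INTO can_see VALUES (?, ?)",
                [("A", "B"), ("B", "C"), ("C", "D")])

# The view infers A canSee C from A canSee B and B canSee C.
# UNION (rather than UNION ALL) deduplicates, so it terminates
# even if the data contains a cycle.
con.execute("""
    CREATE VIEW can_see_transitive AS
    WITH RECURSIVE closure(source, target) AS (
        SELECT source, target FROM can_see
        UNION
        SELECT closure.source, can_see.target
        FROM closure JOIN can_see ON closure.target = can_see.source
    )
    SELECT source, target FROM closure
""")

rows = con.execute(
    "SELECT target FROM can_see_transitive WHERE source = 'A' ORDER BY target"
).fetchall()
print(rows)  # → [('B',), ('C',), ('D',)]
```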
> [materialized views] can make it a bit challenging to manage updates
We're not using them with Postgres because refreshing materialized views is very much a blunt hammer and it'd cause more hassle than it would solve. Which is annoying because they've been great when I've used them previously.
Because SQLite is ubiquitous. It's simply everywhere. And it doesn't need a dedicated server to run.
I've actually been considering migrating away from neo4j to sql with recursive queries for performance reasons.
A schema may have only one, or at any rate comparatively few, instances of recursion.
In this case, using two separate data stores would be too much overhead; if the CTE doesn't do anything particularly fancy, that is, if it just retrieves records recursively associated via keys, it's not inherently hard to debug (actually, there isn't much to debug).
A case in point is GitLab's user management, which dropped MySQL in part because of its lack of CTEs (at the time).
There are pure graph stores out there, AllegroGraph, OpenLink Virtuoso (although this is a strange hybrid of SQL + Graph technologies) and others - and for more advanced graph query constructs like path finding there are optimisations that are difficult/not well supported in SQL.
SQL is an implementation of relational algebra, and databases that implement it are relational databases for storing relational data.
Pointing at the name of a thing isn't an argument, I admit it, but asking "what could go wrong if we want to use a system tailored to relational data to deal with non-relational data?" strikes me as the sort of question that might turn out to have a compelling answer 6 months in when it is too late to easily change the database.
If I were involved in a project, I would be arguing very hard that the data comes out of the relational DB using "SELECT * FROM table WHERE conditions" and then the clever graph things start happening. If the data needs to be read from disk using clever graph-based algorithms, then use a clever, graph-based database. There are a bunch out there according to Wikipedia.
This would suggest that pretty much every existing RDBMS has made some very bad decisions since JSON/XML types, arrays, and all sorts of other non-relational features have long been supported.
Given the nature of SQLite you probably aren't dealing with petabytes of data.
This is the basic argument here. If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite? If it isn't, why would we pick SQL over Python?
I don't think it is bad that SQLite supports this, just ... what are the circumstances where this is a good idea? Is there a reason to do graph-style algorithms in SQL?
I don't think it's such a far-off fit from the relational database problem. Any time you have a table with a relation to itself, there's the potential to do graph-style algorithms on that data.
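A quick illustration of that point (invented parts/sub-parts table, Python's stdlib `sqlite3`): any self-referencing table is implicitly a graph, and a recursive CTE can walk it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, parent_id INTEGER)")
con.executemany("INSERT INTO part VALUES (?, ?)",
                [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)])

# All parts contained (directly or indirectly) in part 1.
rows = con.execute("""
    WITH RECURSIVE contained(id) AS (
        SELECT id FROM part WHERE parent_id = 1
        UNION ALL
        SELECT part.id
        FROM part JOIN contained ON part.parent_id = contained.id
    )
    SELECT id FROM contained ORDER BY id
""").fetchall()
print(rows)  # → [(2,), (3,), (4,), (5,)]
```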
> If the situation is delicate enough we need the database to be doing graph operations, why would we pick SQLite?
Perhaps it has nothing to do with the situation being "delicate" and it is just a simple matter of it being less work and less lines of code to use a graph-style query in SQL, rather than re-implementing the graph algorithms in your application code or having to bring in an entirely new database system just to process one query.
Slightly off-topic, but I want to point out how well Richard deals with a slightly rude commenter. Keeps it classy and productive while still calling out the bad behavior.
Hipp's response is also fine! The only thing that wouldn't be fine is if someone inappropriately reacted to that comment as an insult.
> The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data.
I simply don't want weird rows (that violate the _expressly declared_ type of a SQL table) to be merrily allowed into persistent storage.
Some problems are easy to solve given a column of type VARIANT: 1. simple sparse columns instead of a multi-type EAV table, 2. a Bigtable-style schema that is easily range-partitioned.
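To make the first case concrete (illustrative names, Python's stdlib `sqlite3`): declaring a column with no type gives it BLOB affinity, so it happily holds values of different types side by side, which covers a lot of what an EAV table's multiple value columns are usually for.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "value" has no declared type, so no conversion is applied on insert.
con.execute("CREATE TABLE attr (item TEXT, name TEXT, value)")
con.executemany("INSERT INTO attr VALUES (?, ?, ?)",
                [("widget", "weight", 1.5),
                 ("widget", "colour", "red"),
                 ("widget", "count", 7)])

rows = con.execute(
    "SELECT name, typeof(value) FROM attr ORDER BY name"
).fetchall()
print(rows)  # → [('colour', 'text'), ('count', 'integer'), ('weight', 'real')]
```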
The main reason this is useful in SQLite is that most SQL dialects just work. SQLite becomes a universal desktop workbench for SQL. This flexibility also applies to things like delimited identifiers.
The problem is not that SQLite uses VARIANT datatypes, the problem is that once a datatype is specified it is not enforced by the engine. SQLite could/should add a PRAGMA to enable Domain Integrity much like it does with Foreign Keys (PRAGMA foreign_keys = ON;).
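Pending any such pragma, domain integrity can be approximated today with CHECK constraints on typeof() — a common workaround, not a built-in enforcement mode of the engine (sketch via Python's stdlib `sqlite3`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance REAL CHECK (typeof(balance) = 'real')
    )
""")
con.execute("INSERT INTO account VALUES (1, 10.0)")  # passes the CHECK

# 'ten' can't be coerced to a number, so it would be stored as TEXT,
# and the CHECK constraint rejects the row.
try:
    con.execute("INSERT INTO account VALUES (2, 'ten')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("rejected:", rejected)  # → rejected: True
```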
Fixed-length data types in SQL were a premature optimization that led to verbose syntax like LONG VARCHAR FOR BIT DATA and incompatibilities tied to internal implementation details. There is no reason why more specific constraints can't be specified on top of the core storage types.
One can argue that SQLite is sorely missing an exact NUMERIC type but that holds true for most/all language runtimes as well. Datetime is a bit weird too but the format of the DB file is a thing of beauty.
The most common place I've encountered recursive CTEs being massively useful in real-world work is company responsibility & reporting hierarchies (line management, regulatory supervision, "spans of control" reporting, ...) where there is not a fixed number of levels.
When there are a small and fixed number of levels (for instance every drone has a supervisor and senior supervisor and you don't monitor connections above that) there are often other solutions that people find easier to understand and/or perform faster but for deeper trees or those where the level structure is any less static, recursive CTEs are a godsend.
Other examples include nested taxonomies (nature, book or other publication filing, nested tags for categorisation), properly threaded discussion records, family trees (though these are graphs rather than trees unless you impose some strict limits on what you count), representing file-system like structures in the DB, ...
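A sketch of the line-management case (names and schema invented for illustration, using Python's stdlib `sqlite3`): each employee points at a manager, the depth is unbounded, and one recursive CTE walks the whole chain.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        manager_id INTEGER
    )
""")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, "Ada", None), (2, "Ben", 1),
                 (3, "Cat", 1), (4, "Dee", 2)])

# Everyone in the reporting hierarchy, with depth below the top.
rows = con.execute("""
    WITH RECURSIVE reports(id, name, level) AS (
        SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, r.level + 1
        FROM employee e JOIN reports r ON e.manager_id = r.id
    )
    SELECT name, level FROM reports ORDER BY level, name
""").fetchall()
print(rows)  # → [('Ada', 0), ('Ben', 1), ('Cat', 1), ('Dee', 2)]
```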
Graphs get trickier (trees are a subset of graphs) as you may need to consider multiple paths to the same node, infinite loops, and other complications, so I suggest looking into trees first, then expanding your research.
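The infinite-loop problem in particular has a cheap mitigation in SQLite: use UNION instead of UNION ALL in the recursive CTE, so already-seen rows are discarded and the traversal terminates even on a cyclic graph. A sketch (invented edge table, Python's stdlib `sqlite3`) with a deliberate A→B→C→A cycle:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
con.executemany("INSERT INTO edge VALUES (?, ?)",
                [("A", "B"), ("B", "C"), ("C", "A")])  # cycle!

rows = con.execute("""
    WITH RECURSIVE reachable(node) AS (
        VALUES ('A')
        UNION   -- UNION, not UNION ALL: dedup breaks the cycle
        SELECT edge.dst
        FROM edge JOIN reachable ON edge.src = reachable.node
    )
    SELECT node FROM reachable ORDER BY node
""").fetchall()
print(rows)  # → [('A',), ('B',), ('C',)]
```

With UNION ALL this query would recurse forever (or until a row limit); UNION is the standard trick for reachability over graphs that may contain cycles.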
The most common example for graphs in DBs would be a graph representing the friends/contacts of users. It's one of these "you'll know when you need it" solutions.
Sometimes your data structure is a graph, and sometimes you want that graph to live in a database.
And in even rarer cases you want to search that graph in your database in some recursive manner.
The Fossil discussion involves the SCM's representation of a File System, probably the most common hierarchical data structure we regularly encounter. Hierarchical data structures are notoriously hard in SQL. Recursive CTEs are not easy but they are useful and efficient for hierarchical data.
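For a flavour of the file-system case, here is a sketch (schema invented here, not Fossil's actual one; Python's stdlib `sqlite3`) that rebuilds full paths from a parent-pointer table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE node (
        id INTEGER PRIMARY KEY,
        parent INTEGER,
        name TEXT
    )
""")
con.executemany("INSERT INTO node VALUES (?, ?, ?)",
                [(1, None, "src"), (2, 1, "lib"), (3, 2, "util.c")])

# Concatenate names down each branch to recover the full path.
rows = con.execute("""
    WITH RECURSIVE path(id, full) AS (
        SELECT id, name FROM node WHERE parent IS NULL
        UNION ALL
        SELECT node.id, path.full || '/' || node.name
        FROM node JOIN path ON node.parent = path.id
    )
    SELECT full FROM path ORDER BY full
""").fetchall()
print(rows)  # → [('src',), ('src/lib',), ('src/lib/util.c',)]
```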
I was able to make a query like this in PostgreSQL, but forgot the details of how I did it (though once you sit down, you get it eventually).
A graph database usually means you’re storing the graph nodes and edges as entities, so you can traverse the graph easily from any direction, and express different kinds of relationships.
Either way my intention here was to simply provide an example of a real-world recursive CTE use case.
I have had to run my recursive queries external to the database (MySQL, SQLite), this change looks to bring it into the database. Excellent news!
How could this be improved with the new feature of SQLite?