I read that yesterday and felt a bit tricked. You said there was a one line diff, which to me suggested you had made a query optimiser and added that to trustfall so that consumer applications could transparently benefit without writing any new code. But really, as it turned out, you just added APIs for indexing a table and using those indexes, and then used those APIs to make selected manual optimisations in a “fast path” in the trustfall-rustdoc-adapter crates.
What does it matter that some crates are called trustfall adapters and some are not? You still had to optimise the execution of the query manually. I can see how it’s cool you didn’t need to change the text of the query, but people like SQL because the execution engines are smart enough to optimise for you. They will build a little hash table index on the fly if it saves a lot of full table scanning. The expectations re smart optimisation in the market you’re competing in are very high. If you say it was a one line upgrade to the trustfall executor then people will believe you.
The net result is better than what most GraphQL infrastructure gives you. GraphQL doesn’t give you any tools to avoid full table scans, it just tells you “here’s the query someone wants, I will do the recursive queries with as many useless round trips as possible unless you implement all these things from scratch”. At least your API has the concept of an index now. But I think you’re trying to sell it as being as optimisable as SQL while trying to avoid telling users the bad news that it’s going to be them who has to write the optimiser, not you.
Trustfall isn't trying to compete with SQL; doing so would be suicidal and pointless. If the dataset being queried already has a statistics-gathering query optimizer, it's simply the wrong call not to use that optimizer. If you write a Trustfall query that is partially or fully served by data in a SQL database, the correct answer is to have Trustfall compile that piece of the query to SQL and run it as SQL (perhaps adding some auto-pagination/parallelization before compiling to SQL, but that's beside the point).
Most data sources don't have anything like SQL or any kind of query language, let alone an optimizer. No other tool I know of gives you optimization levers (automatic or human-in-the-loop) over access to a REST API, a local JSON file, or a database -- all separately from how you write the query.
With Trustfall, I'm not promising "magical system that will optimize your queries for you without you having to lift a finger" -- at least not for a good long while. But I can promise, and deliver, "you can write queries over any combo of data sources, and if need be optimize them without rewriting them from scratch." This means that you can have product-facing engineers write queries, and infra-facing engineers optimize them, with each side isolated from the other: product doesn't care if there's a cache or an index or a bulk API endpoint vs item-at-a-time endpoint, and infra has strong guarantees on execution performance and optimizability so they aren't that worried about a product query getting out of hand and wrecking the system. Trustfall buys operational freedom and leverage across your entire data footprint.
You can see this effect in play in cargo-semver-checks. We use lint-writing as an onboarding tool, because anyone can write a query, and we know we can optimize them later if need be. Both Trustfall and the adapters will get better over time, so queries get faster "naturally". We get efficient execution over many different rustdoc JSON formats simultaneously, without version-specific query logic. And while the hashtable indexing optimizations required some manual work that I didn't time exactly, it was limited to ~1-2h tops and made all queries in the repo faster automatically with no query changes. Rolling out the optimization would be operationally very simple: it's trivial to test, and thanks to the Trustfall engine, I wouldn't have to test it with every combination of filters and edge operations -- if the edge fetch logic is correct, the engine guarantees the rest. Put simply, nobody else needed to know that I made the optimization -- the only observable impact to any other dev on the project is that queries run faster now.
I know all that. I just thought you might like to do another pass editing your piece. It is your marketing material at this stage. It would be nice if it gave people a clearer impression of what the capabilities are and where trustfall is positioned in relation to SQL, GraphQL, and other stuff. I came away a bit suspicious of your claims because I didn’t understand them when I first read it.
My only question about the actual code is whether you can write these indices to do hash lookups across data sources. Can I avoid table scans when joining two data sets from different adapters?
I appreciate it! Writing is hard (especially not in my native language) and I'm always looking to improve, so feedback like this is valuable.
To be honest, that blog post was targeted at cargo-semver-checks users and r/rust readers, to give them a sense of how cargo-semver-checks is designed and why, with a motivating example of speeding up queries while supporting multiple rustdoc versions. It wasn't really meant to be "Trustfall's entrance on the world stage" even though it kind of ended up being that...
I plan to write more blog posts (and code!) about Trustfall's specific capabilities (and things it can't/shouldn't do) in the future, so hopefully those will come up first in people's searches and give folks the right impression.
Re: multiple adapters, yes, that's the plan. I have some prototype code for turning multiple adapters into a single adapter over the union of the datasets + any cross-dataset edges, and it supports the new optimizations API so the same kind of trick should work in the same way. In general, Trustfall is designed to be highly composable like this: you should be able to put Trustfall over any data source, including another instance of Trustfall, and have things keep working reasonably throughout.