I read that yesterday and felt a bit tricked. You said there was a one-line diff...

obi1kenobi · on Feb 9, 2023

Trustfall isn't trying to compete with SQL; doing so would be suicidal and pointless. If the dataset being queried already has a statistics-gathering query optimizer, it's just the wrong call to not use that optimizer. If one wrote a Trustfall query that is partially or fully served by data in a SQL database, the correct answer is to use Trustfall to compile that piece of the query to SQL and run it as SQL (perhaps by adding some auto-pagination/parallelization before compiling to SQL, but that's beside the point).

Most uses of data don't have anything like SQL or any kind of query language, let alone an optimizer. No tool I know of other than Trustfall gives you optimization levers (automatic or human-in-the-loop) for optimizing access to a REST API, a local JSON file, or a database -- all separately from how you write the query.

With Trustfall, I'm not promising "a magical system that will optimize your queries for you without you having to lift a finger" -- at least not for a good long while. But I can promise, and deliver, "you can write queries over any combo of data sources, and if need be optimize them without rewriting them from scratch." This means that you can have product-facing engineers write queries and infra-facing engineers optimize them, with each side isolated from the other: product doesn't care if there's a cache or an index or a bulk API endpoint vs an item-at-a-time endpoint, and infra has strong guarantees on execution performance and optimizability, so they aren't that worried about a product query getting out of hand and wrecking the system. Trustfall buys operational freedom and leverage across your entire data footprint.

You can see this effect in play in cargo-semver-checks. We use lint-writing as an onboarding tool, because anyone can write a query, and we know we can optimize them later if need be. Both Trustfall and the adapters will get better over time, so queries get faster "naturally". We get efficient execution over many different rustdoc JSON formats simultaneously, without version-specific query logic. And while the hashtable indexing optimizations required some manual work that I didn't time exactly, it was limited to ~1-2h tops and made all queries in the repo faster automatically with no query changes. Rolling out the optimization would be operationally very simple: it's trivial to test, and thanks to the Trustfall engine, I wouldn't have to test it with every combination of filters and edge operations -- if the edge fetch logic is correct, the engine guarantees the rest. Put simply, nobody else needed to know that I made the optimization -- the only observable impact to any other dev on the project is that queries run faster now.
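To make the "no query changes" point concrete, here is a minimal sketch in Python. It is not Trustfall's API or the actual cargo-semver-checks code: `ScanAdapter`, `IndexedAdapter`, `item_by_name`, `run_query`, and the toy `ITEMS` data are all hypothetical names invented for illustration. The idea it tries to show is that the query is written once against an edge, and the adapter is free to switch from a scan to a hash index behind that edge.

```python
from collections import defaultdict

# Hypothetical toy data standing in for indexed items; not real rustdoc JSON.
ITEMS = [
    {"id": 1, "name": "Foo", "kind": "struct"},
    {"id": 2, "name": "bar", "kind": "function"},
    {"id": 3, "name": "Foo", "kind": "enum"},
]


class ScanAdapter:
    """Resolves the hypothetical `item_by_name` edge with a full scan."""

    def __init__(self, items):
        self.items = items
        self.rows_touched = 0  # counts how many items each lookup inspects

    def item_by_name(self, name):
        for item in self.items:
            self.rows_touched += 1
            if item["name"] == name:
                yield item


class IndexedAdapter(ScanAdapter):
    """Same edge, same results, but served from a hash index built once."""

    def __init__(self, items):
        super().__init__(items)
        self.by_name = defaultdict(list)
        for item in items:
            self.by_name[item["name"]].append(item)

    def item_by_name(self, name):
        matches = self.by_name.get(name, [])
        self.rows_touched += len(matches)
        yield from matches


def run_query(adapter, names):
    """The 'query': written once, unaware of how the adapter resolves edges."""
    return [item["id"] for name in names for item in adapter.item_by_name(name)]


scan, indexed = ScanAdapter(ITEMS), IndexedAdapter(ITEMS)
scan_result = run_query(scan, ["Foo", "bar"])
indexed_result = run_query(indexed, ["Foo", "bar"])
```

Both adapters return the same ids for the same `run_query` call; only `rows_touched` differs. That is the property being relied on above: if the edge fetch logic is correct, whoever wrote the query never needs to know the index exists.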

cormacrelf · on Feb 9, 2023

I know all that. I just thought you might like to do another pass editing your piece. It is your marketing material at this stage. It would be nice if it gave people a clearer impression of what the capabilities are and where Trustfall is positioned in relation to SQL, GraphQL, and other stuff. I came away a bit suspicious of your claims because I didn’t understand them when I first read it.

My only question about the actual code is whether you can write these indices to do hash lookups across data sources. Can I avoid table scans when joining two data sets from different adapters?

obi1kenobi · on Feb 9, 2023

I appreciate it! Writing is hard (especially not in my native language) and I'm always looking to improve, so feedback like this is valuable.

To be honest, that blog post was targeted at cargo-semver-checks users and r/rust readers, to give them a sense of how cargo-semver-checks is designed and why, with a motivating example of speeding up queries while supporting multiple rustdoc versions. It wasn't really meant to be "Trustfall's entrance on the world stage" even though it kind of ended up being that...

I plan to write more blog posts (and code!) about Trustfall's specific capabilities (and things it can't/shouldn't do) in the future, so hopefully those will come up first in people's searches and give folks the right impression.

Re: multiple adapters, yes, that's the plan. I have some prototype code for turning multiple adapters into a single adapter over the union of the datasets + any cross-dataset edges, and it supports the new optimizations API so the same kind of trick should work in the same way. In general, Trustfall is designed to be highly composable like this: you should be able to put Trustfall over any data source, including another instance of Trustfall, and have things keep working reasonably throughout.
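As a rough illustration of the shape this could take (not the prototype code mentioned above, and not Trustfall's API), here is a small Python sketch. Everything in it is an assumption made for the example: the two sources, the `CombinedAdapter` class, the `repository_of` cross-dataset edge, and the made-up data. The point is only that the cross-dataset edge can be answered with a hash lookup into the second source instead of scanning it.

```python
class PackagesSource:
    """Hypothetical data source #1: a list of packages (made-up data)."""

    def __init__(self):
        self._packages = [
            {"name": "alpha", "repo": "org/alpha"},
            {"name": "beta", "repo": "org/beta"},
            {"name": "gamma", "repo": "org/missing"},  # no matching repo
        ]

    def packages(self):
        return iter(self._packages)


class ReposSource:
    """Hypothetical data source #2: repositories, hash-indexed by slug."""

    def __init__(self):
        repos = [
            {"slug": "org/alpha", "stars": 10},
            {"slug": "org/beta", "stars": 20},
        ]
        self._by_slug = {repo["slug"]: repo for repo in repos}
        self.lookups = 0  # counts index lookups; no scan ever happens

    def repo_by_slug(self, slug):
        self.lookups += 1
        return self._by_slug.get(slug)


class CombinedAdapter:
    """One adapter over both datasets plus a cross-dataset edge."""

    def __init__(self, packages_source, repos_source):
        self._packages = packages_source
        self._repos = repos_source

    def packages(self):
        return self._packages.packages()

    def repository_of(self, package):
        # Cross-dataset edge: resolved by a hash lookup in the other source.
        repo = self._repos.repo_by_slug(package["repo"])
        return [repo] if repo is not None else []


def stars_per_package(adapter):
    """A query spanning both datasets, written against the combined adapter."""
    return {
        package["name"]: repo["stars"]
        for package in adapter.packages()
        for repo in adapter.repository_of(package)
    }


repos_source = ReposSource()
combined = CombinedAdapter(PackagesSource(), repos_source)
result = stars_per_package(combined)
```

Here the join costs one index lookup per package rather than a scan of the repositories for each package, and the query function only ever sees the combined adapter.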