There are some solid ideas here that would definitely apply to the IVM engine we're building. I'm curious whether some of these effects could also lead to faster Rust compilation times (e.g., nopanic)?
Some folks pointed out that no one should design a SQL schema like this, and I agree. But we deal with large enterprise customers and don't control the schemas that come our way. Trust me, we often ask customers if they have any leeway to change their SQL, and their hands are usually tied. We're a query engine, so we have to ingest data from existing sources (warehouse, lakehouse, Kafka, etc.), which means working with existing schemas.
That's a big part of the value we add: take your hideous SQL schema and queries, warts and all, run them on Feldera, and you get fully incremental execution at low latency and low cost.
700 isn't even the worst number that's come our way. A hyperscale prospect asked about supporting 4000-column schemas. I don't know what's in that table either. :)
This site is underweighted on OLAP. Columnstores were invented for precisely this use case; nobody in the field wants to normalize everything.
Which brings me to the question: why a rowstore? Are Z-sets hard to manage otherwise?
Another aspect of wide tables is that they tend to have a lot of dependencies, i.e., different columns come from different aggregations, and the whole table gets held up if one of them is late. IVM seems like a good solution for that problem.
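To make the IVM idea concrete: instead of recomputing a view from scratch when a late row shows up, the engine folds each change (delta) into the maintained result as it arrives. The snippet below is a hand-rolled toy in Python, not Feldera's API; the weighted-row scheme loosely mirrors how Z-sets represent inserts and retractions.

```python
# Toy illustration of incremental view maintenance (IVM):
# maintain SUM(amount) GROUP BY key by applying deltas as they
# arrive, rather than recomputing over the whole table.
from collections import defaultdict

view = defaultdict(int)  # the maintained view

def apply_delta(key, amount, weight=1):
    """weight=+1 inserts a row, weight=-1 retracts one (Z-set style)."""
    view[key] += weight * amount

# Rows arrive out of order, possibly late -- each is folded in
# immediately; nothing waits on the rest of the table.
apply_delta("us", 10)
apply_delta("eu", 5)
apply_delta("us", 7)
apply_delta("us", 10, weight=-1)  # late correction: retract a row

print(dict(view))  # {'us': 7, 'eu': 5}
```

A real engine generalizes this to arbitrary SQL operators, but the core move is the same: late data becomes just another delta.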
Feldera tries to be row- and column-oriented in the parts where each matters. E.g., our LSM trees store only the set of columns that are needed, and we need to be able to pick individual rows out of those columns for the different operators.
I don't think we've converged on the best design yet here though. We're constantly experimenting with different layouts to see what performs best based on customer workloads.
Start with Postgres and scale later once you have a better idea of your access patterns. You will likely model your graph as entities and walk it recursively (most often from your application).
If the goal is to maintain views over graphs and performance/scale matters, consider Feldera. We see folks use it for its ability to incrementally maintain recursive SQL views (disclaimer: I work there).
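The "walk the graph recursively" approach above can also be done in SQL with a recursive CTE. A minimal sketch, using SQLite so it's self-contained (the `WITH RECURSIVE` syntax is the same in Postgres; the `edges` table and node names are made up for illustration):

```python
import sqlite3

# Hypothetical edge table: a tiny directed graph stored as rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('a', 'b'), ('b', 'c'), ('c', 'd'), ('a', 'e');
""")

# Find everything reachable from 'a' with a recursive CTE.
reachable = conn.execute("""
    WITH RECURSIVE walk(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN walk w ON e.src = w.node
    )
    SELECT node FROM walk ORDER BY node;
""").fetchall()

print([n for (n,) in reachable])  # ['a', 'b', 'c', 'd', 'e']
```

A plain database re-runs this query from scratch each time; an IVM engine maintaining it as a view only processes the edges that changed.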
I currently run Firefox Nightly with cross-site cookies disabled and all trackers/scripts blocked. I also run uBlock Origin. Any idea if Privacy Badger is redundant with this setup?