>*You already made a faulty assumption — that we're interested in "classifying t...

derefr · 2024-03-07T21:11:31

Data teams in companies often aim to make it possible to answer future questions nobody has asked yet, by creating denormalizations of their data that offer maximum flexibility in the classes of questions they can answer. Maximum "power."

Lately, that means they're often spending a lot of resources (and even novel R&D time!) getting various kinds of ML models trained on the data.

My point is that this is often pointless, because, given the type of data they're working with (tabular, quantitative line-of-business data), they won't actually see "arbitrary questions"; they'll see the strict subset of arbitrary questions that could have been solved just as well — if not much better! — with a SQL query. And for much less capital expenditure — because the LOB data usually already lives in an RDBMS in the first place.
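
To make that concrete, here's a minimal sketch of the kind of "question nobody asked yet" I mean, answered ad hoc against tabular LOB data. Everything in it is made up for illustration: the `orders` table, its columns, the rows, and the `revenue_by_region` helper are all hypothetical, and it uses an in-memory SQLite database via Python's standard `sqlite3` module as a stand-in for whatever RDBMS the data already lives in.

```python
import sqlite3

# Hypothetical line-of-business table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL, placed_on TEXT)"
)
conn.executemany(
    "INSERT INTO orders (region, product, amount, placed_on) VALUES (?, ?, ?, ?)",
    [
        ("EMEA", "widget", 120.0, "2024-01-15"),
        ("EMEA", "gadget", 80.0, "2024-02-03"),
        ("APAC", "widget", 200.0, "2024-02-10"),
        ("APAC", "widget", 50.0, "2024-03-01"),
    ],
)


def revenue_by_region(conn, product):
    """Answer an ad hoc question: total revenue per region for one product."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "WHERE product = ? GROUP BY region ORDER BY region",
        (product,),
    ).fetchall()
    # Each row is a (region, total) pair, so it converts directly to a dict.
    return dict(rows)


result = revenue_by_region(conn, "widget")
print(result)
```

Nothing had to be trained or denormalized ahead of time here; the question is just a `WHERE` plus a `GROUP BY` over data that was already sitting in a table.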