
>You already made a faulty assumption — that we're interested in "classifying the data" in the first place.

It's not clear what your point is. If you're not interested in the predictions that tree-based models provide, do not use tree-based models on your tabular data. A predictive model and a SQL query are not the same thing.




Data teams in companies often aim to make it possible to answer future questions nobody has asked yet, by building denormalizations of their data that offer maximum flexibility in the classes of questions they can answer. Maximum "power."

Lately, that means they're often spending a lot of resources (and even novel R&D time!) getting various kinds of ML models trained on the data.

My point is that this is often pointless. Given the type of data they're working with (tabular, quantitative line-of-business data), they won't actually see "arbitrary questions"; they'll see the strict subset of arbitrary questions that could have been solved just as well, if not much better, with a SQL query. And for much less capital expenditure, because the LOB data usually already lives in an RDBMS in the first place.
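To make that concrete, here's a rough sketch of the kind of question I mean, answered with nothing but a GROUP BY. Everything in it is made up for illustration: the `orders` table, its columns, the sample rows, and the helper names `build_demo_db` and `monthly_revenue_by_region` are all hypothetical, and I'm using an in-memory SQLite database only as a stand-in for whatever RDBMS the LOB data already lives in.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory stand-in for a line-of-business orders table.

    The schema and rows are invented for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (region TEXT, ordered_at TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("EMEA", "2024-01-05", 120.0),
            ("EMEA", "2024-01-20", 80.0),
            ("APAC", "2024-01-11", 150.0),
            ("APAC", "2024-02-02", 60.0),
            ("EMEA", "2024-02-14", 90.0),
        ],
    )
    return conn


def monthly_revenue_by_region(conn):
    """Answer 'how much did each region earn per month?' with one query."""
    query = """
        SELECT region,
               strftime('%Y-%m', ordered_at) AS month,
               SUM(amount) AS revenue
        FROM orders
        GROUP BY region, month
        ORDER BY month, revenue DESC
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Print one (region, month, revenue) tuple per line.
    for row in monthly_revenue_by_region(build_demo_db()):
        print(row)
```

No training run, no feature pipeline: the question is an aggregate over rows that are already sitting in a table, so the query is the whole answer.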



