
That "notable criticism" is terrible. His whole argument is that because this algorithm does not outperform sklearn's HistGradientBoostingClassifier the paper is useless and a waste of his "very precious time"



The tweets / Medium article are a bit incomplete, but they make at least these points:

- The authors didn't make a decent effort to build a robust baseline. Worse, it looks like they purposely built a weak baseline to make their solution look better (a rough sketch of what a stronger baseline involves is below).

- The authors didn't really disclose full performance. Statistical performance is one thing, but time complexity is another: Hopular needs 10 minutes on a 500-row dataset. That's a no-no for any serious application.

- The authors didn't provide an easy-to-use interface. You can't really claim SoTA on small tabular data with something that isn't testable by everyone.
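For what it's worth, a "robust baseline" on small tabular data usually means at least a tuned gradient-boosting model evaluated with cross-validation, not library defaults on a single split. A rough sketch of what that looks like with scikit-learn (the dataset and the parameter grid here are stand-ins for illustration, not the paper's actual benchmarks):

    import time
    from sklearn.datasets import load_breast_cancer  # stand-in small tabular dataset (~569 rows)
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_breast_cancer(return_X_y=True)

    # Tune the baseline instead of taking defaults; these grid values are illustrative only.
    param_grid = {
        "learning_rate": [0.03, 0.1, 0.3],
        "max_leaf_nodes": [15, 31, 63],
        "l2_regularization": [0.0, 1.0],
    }
    search = GridSearchCV(
        HistGradientBoostingClassifier(random_state=0),
        param_grid,
        cv=5,
        scoring="roc_auc",
        n_jobs=-1,
    )

    start = time.perf_counter()
    search.fit(X, y)
    elapsed = time.perf_counter() - start

    print(f"best 5-fold AUC: {search.best_score_:.4f}")
    print(f"best params:     {search.best_params_}")
    print(f"tuning time:     {elapsed:.1f}s")  # typically well under a minute on a dataset this size

The wall-clock comparison is the point of the timing: the whole grid search over a few hundred rows finishes in the time Hopular reportedly needs for a single fit.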


I think it is notable because he's an expert in that area and would know how to assess it. I'm not a fan of the tone either though...

But the authors claim "Hopular surpasses Gradient Boosting (e.g. XGBoost), Random Forests, and SVMs on tabular data." So if a simple sklearn model works better, it's worth knowing the result might not be all it's cracked up to be before deciding to spend hours understanding another paper.
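That sanity check costs almost nothing: a stock HistGradientBoostingClassifier with defaults gives you a reference score in a few lines (the dataset below is a placeholder; you'd swap in whichever benchmark the paper reports):

    from sklearn.datasets import load_breast_cancer  # placeholder small tabular dataset
    from sklearn.ensemble import HistGradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    clf = HistGradientBoostingClassifier(random_state=0)  # stock defaults, no tuning
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")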


https://twitter.com/tunguz/status/1532480966836510753

The fact that SVM was the #2 algo in these results, and that linear methods beat out the other methods, means something is going awry.



