Our classifier outperforms CatBoost, XGBoost, LightGBM on 5 benchmark datasets (github.com/linearboost)
6 points by hamid9 22 days ago | hide | past | favorite | 5 comments



Hi All!

We're happy to share LinearBoost, our latest development in machine learning classification algorithms. LinearBoost boosts a linear classifier to significantly enhance performance, and in our testing it outperforms traditional GBDT algorithms in both accuracy and response time across five well-known datasets. The key to LinearBoost's performance lies in its base estimator: unlike the decision trees used in GBDTs, which select features sequentially through splits, LinearBoost uses a linear classifier as its building block, so every boosting step considers all available features simultaneously. This comprehensive feature integration allows for more robust decisions at every step.
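To make the idea concrete, here is a minimal sketch of "boosting a linear base learner" using scikit-learn's AdaBoost with a logistic regression in place of the usual decision stump. This is an illustration of the general technique only, not the actual LinearBoost/SEFR implementation; the dataset, base estimator, and hyperparameters are all assumptions for the example.

```python
# Hypothetical sketch: boosting a linear classifier instead of decision
# stumps. NOT the LinearBoost implementation -- just the general idea.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification problem (placeholder data).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each boosting round fits a linear model on reweighted samples, so every
# round sees all features at once, unlike a tree that splits on one
# feature at a time.
clf = AdaBoostClassifier(
    LogisticRegression(max_iter=1000),  # linear base learner
    n_estimators=50,
    random_state=0,
)
clf.fit(X_tr, y_tr)

f1 = f1_score(y_te, clf.predict(X_te))
print(f"F1 on held-out split: {f1:.3f}")
```

Any base estimator that supports `sample_weight` in `fit` can be boosted this way, which is what makes swapping a linear model (like SEFR) into a boosting loop straightforward.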

We believe LinearBoost can be a valuable tool for both academic research and real-world applications. Check out our results and code in our GitHub repo: https://github.com/LinearBoost/linearboost-classifier

We'd love to get your feedback and suggestions for further improvements!


So, this looks really interesting and I look forward to delving into the methodology to understand the algorithm better. However, what I immediately noticed from the paper linked in the documentation was that LinearBoost has a worse F1 score on average than the mentioned classifiers. Where it shines is energy consumption. Would it be possible to edit the title to reflect this? It's a huge gain in energy efficiency for a relatively small F1 loss, so kudos for that, but I think people might be expecting something a bit different from the title.


Thank you for your message and for pointing it out! I think it needs some clarification (I will update the documentation as well). The classification algorithm that you mentioned is SEFR, which is energy-efficient but not as accurate as other algorithms. LinearBoost is the boosted version of SEFR, and it has superior F1 on 5 benchmark datasets over GBDTs. So, SEFR is to LinearBoost somewhat as a decision tree is to CatBoost: SEFR is fast, and by boosting SEFR we get LinearBoost, which is slower but more accurate. The results will be provided in a paper, but for now they are in the GitHub repository's README file.


Ahhh I see, that makes sense. Thank you for clarifying, I appreciate it. I made the mistake of assuming that the paper in the documentation was the paper of interest. I will take the time to properly delve in further once the paper is released; do you have any idea when that might be? In the meantime I look forward to testing the method on some toy examples I have.


Thank you for bringing up this issue! Our plan is to release the paper in a month, but let's see how it goes. Feel free to reach out to me if you have any questions!



