
LightGBM – A fast, distributed, gradient boosting framework - gwulf
https://github.com/Microsoft/LightGBM
======
tadkar
This looks like an interesting project. I'd take the accuracy results with a
pinch of salt, because growing deeper trees often improves accuracy and in the
test scenario xgboost is handicapped by a depth limit. As the author says on
reddit, it's difficult to do an apples-to-apples comparison of the two methods
directly, because their approaches to growing trees are very different [a bit
like DFS vs BFS].
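
To make the DFS-vs-BFS analogy concrete, here is a toy sketch of the two
growth strategies (my own illustration with random stand-in gains, not
LightGBM's actual code): level-wise growth splits every leaf at the current
depth, while leaf-wise (best-first) growth always splits the single leaf with
the highest estimated gain, so the tree can end up deep and lopsided.

    # Toy contrast of level-wise (BFS-like) vs leaf-wise (best-first,
    # DFS-like) growth. Gains are random stand-ins for the gradient
    # statistics a real booster would compute from the training data.
    import heapq
    import random

    random.seed(0)

    def gain():
        return random.random()  # stand-in for a leaf's best split gain

    def grow_level_wise(max_depth):
        # Split every leaf at each depth (xgboost's default strategy).
        leaves = ["root"]
        for _ in range(max_depth):
            leaves = [leaf + side for leaf in leaves for side in "LR"]
        return leaves

    def grow_leaf_wise(max_leaves):
        # Always split the highest-gain leaf (LightGBM's strategy).
        heap = [(-gain(), "root")]  # max-heap via negated gain
        while len(heap) < max_leaves:
            _, leaf = heapq.heappop(heap)
            for side in "LR":
                heapq.heappush(heap, (-gain(), leaf + side))
        return [leaf for _, leaf in heap]

    print(sorted(grow_level_wise(3)))  # balanced: all 8 leaves at depth 3
    print(sorted(grow_leaf_wise(8)))   # same leaf count, possibly lopsided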

What is more relevant for 'real-world' data is whether this library supports
categorical features at all. The answer seems to be that it doesn't (then
again, neither does xgboost).

The text in the parallel experiment section [1] suggests that the result on
the Criteo dataset was achieved by replacing the categorical features with
their CTR and count.

[1] From
[https://github.com/Microsoft/LightGBM/wiki/Experiments#paral...](https://github.com/Microsoft/LightGBM/wiki/Experiments#parallel-experiment):
"This data contains 13 integer features and 26 category features
of 24 days click log. We statistic the CTR and count for these 26 category
features from first ten days, then use next ten days’ data, which had been
replaced the category features by the corresponding CTR and count, as training
data. The processed training data has total 1.7 billions records and 67
features."
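
In case it's useful, here is a rough pandas sketch of that kind of encoding.
The column names (`day`, `cat_0`, `clicked`) are my own stand-ins; the wiki
doesn't publish the actual preprocessing script.

    import pandas as pd

    # Replace each categorical column with the click-through rate (CTR)
    # and the count of that category, estimated on an earlier time window.
    def ctr_count_encode(history, future, cat_cols, label="clicked"):
        out = future.copy()
        for col in cat_cols:
            stats = history.groupby(col)[label].agg(["mean", "size"])
            out[col + "_ctr"] = future[col].map(stats["mean"])    # unseen -> NaN
            out[col + "_count"] = future[col].map(stats["size"])
            out = out.drop(columns=col)
        return out

    # Tiny synthetic click log: early days as history, later days as training.
    log = pd.DataFrame({
        "day":     [1, 1, 2, 11, 12, 12],
        "cat_0":   ["a", "b", "a", "a", "b", "c"],
        "clicked": [1, 0, 0, 1, 1, 0],
    })
    history = log[log["day"] <= 10]  # "first ten days": estimate stats here
    future = log[log["day"] > 10]    # later days become the training data
    print(ctr_count_encode(history, future, cat_cols=["cat_0"]))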

~~~
nerdponx
At the risk of outing myself as behind the cutting-edge, what is CTR?

~~~
tadkar
Sorry, CTR = click-through rate. The Criteo dataset is a real-world ad-click
prediction task.

------
TheGuyWhoCodes
Looks fantastic!

I'd love to have a Python interface for this: just drop in a pandas frame,
maybe a scikit-learn interface with fit/predict, plus saving/loading models...
That would definitely boost adoption.
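
Something like the sketch below, with scikit-learn's own
GradientBoostingClassifier standing in for a future LightGBM estimator (none
of this is LightGBM's actual API, since the Python bindings don't exist yet):

    import pandas as pd
    import joblib
    from sklearn.ensemble import GradientBoostingClassifier

    # A toy frame: two features and a binary label.
    df = pd.DataFrame({"x1": [0, 1, 2, 3, 4, 5],
                       "x2": [5, 4, 3, 2, 1, 0],
                       "y":  [0, 0, 0, 1, 1, 1]})

    model = GradientBoostingClassifier(n_estimators=50)
    model.fit(df[["x1", "x2"]], df["y"])       # drop in a pandas frame

    joblib.dump(model, "model.joblib")         # saving...
    restored = joblib.load("model.joblib")     # ...and loading
    print(restored.predict(df[["x1", "x2"]]))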

~~~
minimaxir
That is explicitly in the future plans:
[https://github.com/Microsoft/LightGBM/wiki/Features](https://github.com/Microsoft/LightGBM/wiki/Features)

------
nl
[https://github.com/Microsoft/LightGBB/wiki/Experiments#compa...](https://github.com/Microsoft/LightGBB/wiki/Experiments#comparison-experiment)

At least 3 times faster than XGBoost _AND_ more accurate. Wow.

I'm off to Kaggle now.

~~~
dswalter
I'm guessing you meant to link this instead:
[https://github.com/Microsoft/LightGBM/wiki/Experiments#compa...](https://github.com/Microsoft/LightGBM/wiki/Experiments#comparison-experiment)

~~~
nl
Indeed. Pretty bad when I can't even cut and paste. No idea how I managed
that, but too late to edit it now.

------
nerdponx
Very interesting. Growing trees "leaf-wise" is more intuitive in my opinion.

That said, I don't see a single equation on that page. Is there an arXiv
paper or something behind this?
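
For context, the split criterion this family of boosters typically maximizes
is the second-order gain from the xgboost paper; leaf-wise growth just means
always splitting the leaf whose best candidate split maximizes it. This is the
standard formulation, not necessarily LightGBM's exact objective:

    % Second-order split gain (xgboost-style); G and H are the sums of
    % first- and second-order loss gradients over each child's examples.
    \mathrm{Gain} = \frac{1}{2}\left[
        \frac{G_L^2}{H_L + \lambda}
      + \frac{G_R^2}{H_R + \lambda}
      - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
    \right] - \gamma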

------
itschekkers
Looks nice, especially the reduced memory use. It would have been great if
they had built in k-fold cross-validation by default too.
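
In the meantime it's easy enough to bolt k-fold CV on from the outside with
scikit-learn; a minimal sketch, with GradientBoostingClassifier as a stand-in
since there's no Python binding yet:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic data; any fit/predict estimator slots in the same way.
    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    scores = cross_val_score(GradientBoostingClassifier(), X, y, cv=5)
    print("5-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))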

------
botexpert
10-to-20-year-old methods implemented properly. Amazing.

