
Random Forest vs. Neural Network - pplonski86
https://mljar.com/blog/random-forest-vs-neural-network-classification/
======
eden_h
This is a very thinly disguised advert for the author's product, and doesn't
really expand on the benefits of either approach, as it doesn't go into any
depth on why Random Forests/NNs are applicable to each type of data provided.

They're both generalised solvers, but default Random Forests aren't the most
common forest these days - LightGBM/XGBoost both use Gradient Boosted Trees by
default, which would make for a much more interesting comparison against a NN.
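
For what it's worth, a minimal sketch of that comparison (assuming the
standard scikit-learn and lightgbm APIs, with a synthetic dataset standing in
for whatever tabular problem you have):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from lightgbm import LGBMClassifier

    # toy data in place of a real tabular dataset
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
    gbm = LGBMClassifier(n_estimators=500, learning_rate=0.05, random_state=0)

    print("RF :", cross_val_score(rf, X, y, cv=5).mean())
    print("GBM:", cross_val_score(gbm, X, y, cv=5).mean())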

~~~
turingbike
I don't know why it isn't as popular, but CatBoost should be on the list too
[https://catboost.ai/](https://catboost.ai/)

~~~
eden_h
I tried Catboost when it came out. It _should_ be very popular, as working
with categories is where a lot of people seem to fall down in Random Forests.

The 'typical' response is either to make them into numeric variables, so 1-3
for 3 categories, or to make an individual column for each one. The first
approach makes sense for ordinals, but not so much for actual categories, and
the latter makes it difficult to group categories when two categories together
have more predictive capability than any single one. I know that LightGBM did
a lot of work on this to optimise testing groups of variables, as testing
every possible grouping of a large set is very intensive.
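
To make those two 'typical' encodings concrete, a sketch with a made-up
'colour' column (using pandas):

    import pandas as pd

    df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

    # approach 1: map each category to an integer - fine for ordinals,
    # but imposes an arbitrary order on true categories
    df["colour_code"] = df["colour"].astype("category").cat.codes

    # approach 2: one 0/1 column per category (one-hot) - no fake order,
    # but a tree now struggles to treat e.g. {red, blue} as one group
    one_hot = pd.get_dummies(df["colour"], prefix="colour")
    print(df.join(one_hot))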

When I tried Catboost in R, I remember it downloading a large binary to work
with, which put me off considerably, and predicting with it was pretty
fragile, even for R. I trust Yandex about as much as I'd trust Google, but it
seemed 'odd'.

~~~
ScoutOrgo
I think it is actually preferable to start by converting categorical variables
to numeric most of the time, even if they are not ordinal. The RF algo can
separate off individual classes with 2 splits (e.g. <=7 then >=7) if a single
class is very important. The "pool" of features for RF sampling also doesn't
get diluted with one hot encoded classes from the one feature.

I am pretty sure I've seen this done successfully on Kaggle a bunch of times
before, but don't have any sources on hand for evidence that this method is
"better".
It does however make it much easier to just throw the data into the RF and
check the feature importances to see which features are helping the most.
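
A rough sketch of that workflow - integer-code the categoricals, fit the RF,
then read off the importances (the file and column names here are made up):

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    df = pd.read_csv("train.csv")            # hypothetical dataset
    for col in df.select_dtypes("object"):   # integer-code every categorical
        df[col] = df[col].astype("category").cat.codes

    X, y = df.drop(columns="target"), df["target"]
    rf = RandomForestClassifier(n_estimators=300, n_jobs=-1).fit(X, y)

    # which features are helping the most
    importances = pd.Series(rf.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(10))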

~~~
eden_h
The only case it struggles with is when the grouping is difficult to achieve
in a small number of splits, such as 1,3,5 against 2,4,6,7, especially when
each split needs to show more predictive capability than any of the other
column options.

------
turingbike
The article recommends RF for tabular data because it is easier. In general I
agree, but newer tools are making NNs for tabular data as easy as can be...
see, for example, fastai
[https://docs.fast.ai/tabular.html](https://docs.fast.ai/tabular.html)
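
Roughly what that looks like with fastai's tabular API (a sketch based on the
adult-census sample used in the fastai docs; exact calls and column names may
differ between fastai versions):

    from fastai.tabular.all import *

    path = untar_data(URLs.ADULT_SAMPLE)
    dls = TabularDataLoaders.from_csv(
        path/'adult.csv', path=path, y_names='salary',
        cat_names=['workclass', 'education', 'marital-status',
                   'occupation', 'relationship', 'race'],
        cont_names=['age', 'fnlwgt', 'education-num'],
        procs=[Categorify, FillMissing, Normalize])

    learn = tabular_learner(dls, metrics=accuracy)
    learn.fit_one_cycle(3)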

~~~
ramraj07
The biggest reason to use RFs is that with sufficient trees it's basically
impossible to overfit your data. You also don't need to spend days optimizing
your hyperparameters. Hence, if you need a quick model where time is
paramount, and you want to err on the side of caution, I feel like an RF is
the best choice.

~~~
yters
I would have assumed the opposite: with enough trees you are guaranteed to
overfit your data? Boosting increases the VC dimension of the aggregate model,
which makes it more prone to overfitting.

~~~
pplonski86
Random Forest doesn't increase overfit error when adding more trees. I ran an
experiment on a toy dataset to check it:
[https://mljar.com/blog/random-forest-overfitting/](https://mljar.com/blog/random-forest-overfitting/)

What is more, Leo Breiman wrote on his website: "Random forests does not
overfit. You can run as many trees as you want"
[https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home...](https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#remarks)
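
A quick way to rerun that kind of experiment yourself (a sketch on a synthetic
dataset, using scikit-learn; the point is that the train/test gap stays
roughly flat as trees are added):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=30, flip_y=0.1,
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for n in (10, 50, 100, 500, 1000):
        rf = RandomForestClassifier(n_estimators=n, n_jobs=-1,
                                    random_state=0).fit(X_tr, y_tr)
        print(n, rf.score(X_tr, y_tr), rf.score(X_te, y_te))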

~~~
rq1
These are (false) claims, therefore not proofs.

Deep trees will fortunately overfit your dataset.

Any binary tree of depth log2(P) can completely separate your P points.
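
This is easy to see with a single unconstrained tree (a sketch; even on pure
noise labels it reaches 100% training accuracy by growing deep enough):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 5))
    y = rng.integers(0, 2, size=1024)   # pure noise labels

    tree = DecisionTreeClassifier().fit(X, y)
    print(tree.score(X, y))    # 1.0 - every point memorised
    print(tree.get_depth())    # around log2(1024) = 10 or more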

~~~
ramraj07
But we are merging hundreds of trees, each of which has been handicapped by
the removal of multiple features and trained on only a fraction of the data.
Sounds to me like overfitting is not easy (no single data point or feature
contributes to every tree, so it can't be represented all the time).
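
In scikit-learn terms those handicaps are just constructor options on the
ensemble (a sketch, with the sub-sampling made explicit):

    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(
        n_estimators=500,
        bootstrap=True,       # each tree sees a bootstrap resample of the rows
        max_samples=0.5,      # ...here only half of them
        max_features="sqrt",  # each split considers a random subset of features
        n_jobs=-1)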

False claims as they may be, these are claims I've seen in at least two of
the most commonly studied statistical learning textbooks, so given that it
makes sense and that it's in the textbooks, it seems reasonably not false to
me. Someone else posted that if too many features or data points are very
similar then it will overfit, and that totally makes sense. Whatever you say
doesn't. Clarification would be useful.

~~~
yters
Adding bunches of trees will overfit the accidental patterns in your data.

I have an explanation here why reducing variance is not the same as reducing
overfitting:
[https://news.ycombinator.com/item?id=20089890](https://news.ycombinator.com/item?id=20089890)

------
vbarrielle
Random forests can be applied to images. The RF algorithm only needs to be
tweaked to split its trees by comparing the difference of two pixels to a
threshold.
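
A toy version of that idea (a sketch on sklearn's 8x8 digits, using
differences of randomly chosen pixel pairs as the features; the trees'
split thresholds then act as the pixel-difference test):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)      # 64 pixel intensities per image
    rng = np.random.default_rng(0)
    pairs = rng.integers(0, X.shape[1], size=(200, 2))

    # feature k = intensity(pixel i_k) - intensity(pixel j_k)
    X_diff = X[:, pairs[:, 0]] - X[:, pairs[:, 1]]

    rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)
    print(cross_val_score(rf, X_diff, y, cv=5).mean())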

------
meesterdude
Been using the annoy library lately
([https://github.com/spotify/annoy](https://github.com/spotify/annoy)) after
becoming frustrated with TF not behaving as I'd expect. There's another lib in
python, nmslib, but I can't seem to get it to work right - and the docs are
crap.
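
For reference, the basic annoy workflow is only a few lines (a sketch with
random vectors; the 40 dimensions and 10 trees are arbitrary):

    import random
    from annoy import AnnoyIndex

    f = 40                               # vector dimensionality
    index = AnnoyIndex(f, 'angular')     # also 'euclidean', 'dot', ...
    for i in range(1000):
        index.add_item(i, [random.gauss(0, 1) for _ in range(f)])

    index.build(10)                      # 10 trees; more = better recall
    print(index.get_nns_by_item(0, 5))   # 5 nearest neighbours of item 0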

Anyway, would love to pick someone's brain about this stuff to help fill in my
gaps.

------
RocketSyntax
No mention of backpropagation

