

How to Evaluate Machine Learning Models: Hyperparameter Tuning - mirceasoaica
http://blog.dato.com/how-to-evaluate-machine-learning-models-part-4-hyperparameter-tuning

======
mbq
It is sad that the post fails to acknowledge that hyperparameter tuning may
also be a source of overfitting (and very often is) -- one should always treat
it as a part of training and validate it as such, for instance by using nested
CV. And never, ever report accuracy straight from tuning as a final result.

~~~
alicez
Don't be sad. I'm happy to update the blog post as needed. By overfitting, do
you mean over-optimizing the results on the validation set? Based on what I
understand about nested CV, it is only necessary if (1) the hold-out
validation set is way too small and not representative of the overall data
distribution, or (2) the model training procedure itself is unstable and
produces models with wildly varying results on the same dataset.

To prevent overfitting to the training data, one performs hold-out validation
or CV or early stopping in the training process.

To prevent overfitting of hyperparameters to a _small_ validation dataset, or
to mitigate the variance of the model training outcome, one can use nested CV.
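
Something like this in scikit-learn (a rough sketch; the estimator, dataset,
and grid are made up just for illustration):

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Inner loop: tune C on each training fold.
    inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10, 100]}, cv=3)

    # Outer loop: score the whole tuning procedure on held-out folds,
    # so the reported accuracy was never seen during tuning.
    scores = cross_val_score(inner, X, y, cv=5)
    print(scores.mean(), scores.std())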

Is that along the lines of what you were looking for?

~~~
mbq
It is a common misconception and a huge source of disappointment with ML --
without proper validation of the whole model-building procedure (method
selection + parameter tuning + feature selection + fitting), no amount of data
and magic tricks will guarantee that there is no overfitting. Even a single
hold-out test is risky because it gives you no idea about the expected
accuracy variance.

~~~
alicez
Well, you can use the bootstrap to calculate the variance. It costs
computation. But it works. Cosma Shalizi wrote a really nice introduction to
it: [http://www.americanscientist.org/issues/pub/2010/3/the-bootstrap/1](http://www.americanscientist.org/issues/pub/2010/3/the-bootstrap/1)
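
Rough sketch of the idea in plain numpy (the labels and predictions here are
toy data, just for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=500)        # toy test labels
    flip = rng.random(500) < 0.15                # ~15% error rate
    y_pred = np.where(flip, 1 - y_true, y_true)

    # Bootstrap: resample the test set with replacement and recompute the
    # accuracy each time; the spread of the replicates estimates the
    # variance of the hold-out accuracy.
    accs = []
    for _ in range(2000):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        accs.append((y_true[idx] == y_pred[idx]).mean())
    print(np.mean(accs), np.std(accs))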

------
murbard2
A very simple improvement over random search is to use a low-discrepancy
sequence. They were designed for almost exactly this purpose (avoiding the
problems caused by irrelevant dimensions). I don't know why I never see it
suggested... it's not as good as Gaussian process modeling, but it's very easy
to implement and it clearly dominates random search for this application.
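
A minimal sketch with SciPy's quasi-Monte Carlo module (needs scipy >= 1.7;
the two-dimensional search space is made up for illustration):

    from scipy.stats import qmc

    # Two hyperparameters, e.g. log10(learning rate) and log10(C).
    sampler = qmc.Sobol(d=2, scramble=True, seed=0)
    unit_points = sampler.random_base2(m=6)      # 2**6 = 64 trials

    # Map from the unit cube to the actual search ranges.
    trials = qmc.scale(unit_points, l_bounds=[-5, -3], u_bounds=[-1, 3])
    for lr_exp, c_exp in trials:
        # train_and_score is a hypothetical stand-in for your evaluation:
        # train_and_score(lr=10 ** lr_exp, C=10 ** c_exp)
        pass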

~~~
alicez
Great idea. Will run some experiments to see how it performs. It sounds
analogous to kmeans++ initialization. Sobol sequences ring a bell. Some of the
Bayesian optimization software libraries may in fact use a Sobol sequence of
initial evaluations. But it may not be well documented.

~~~
doobwa
Spearmint, for example:
[https://github.com/JasperSnoek/spearmint/blob/master/spearmint-lite/ExperimentGrid.py#L188](https://github.com/JasperSnoek/spearmint/blob/master/spearmint-lite/ExperimentGrid.py#L188)

------
vii
In a happy situation the sensitivity of the optimization to hyperparameter
changes is low. That's why the 'random' approach provides reasonable results.
If the optimization quality were heavily dependent on the hyperparameter --
for an exaggerated example, only providing good results for exactly one value
of the hyperparameter -- then guessing 60 times and getting within 5% of the
best value of the hyperparameter would not guarantee a good model
optimization.
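
To make that 60-trial figure concrete, the usual back-of-the-envelope
argument runs: if the 'good enough' region covers 5% of the search space,
each uniform draw misses it with probability 0.95, so

    p_hit = 1 - 0.95 ** 60
    print(p_hit)  # ~0.954: at least one of 60 trials lands in the top 5%

which is exactly why this only helps when landing in the top 5% of the space
actually implies a good model.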

The main difficulty with hyperparameters is that one often does not actually
know a priori a reasonable range to search in. Suppose you have a
regularisation constant C - without some calculation based on your data, how
can you pick that constant? By picking the range of the hyperparameter, the
problem is just punted to a hyperhyperparameter.

More interesting than blindly guessing values is measuring the sensitivity of
recall, precision, and cross-validation performance to changes in the
hyperparameters. Make sure that the sensitivity is low!
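
In scikit-learn terms, a sketch of that sensitivity check (the estimator and
parameter range are made up just for illustration):

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import validation_curve
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    C_range = np.logspace(-3, 3, 7)

    # Cross-validated score as a function of C: a flat curve means the
    # model is insensitive to the hyperparameter over this range.
    train_scores, val_scores = validation_curve(
        SVC(), X, y, param_name="C", param_range=C_range, cv=5)
    for C, s in zip(C_range, val_scores.mean(axis=1)):
        print("C=%g: cv accuracy %.3f" % (C, s))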

------
skadamat
There's research showing that in practice, grid search / random search beat
most of the alternatives. They're also easier to parallelize, thankfully!
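
A sketch of why random search parallelizes so trivially (using joblib;
evaluate() is a hypothetical stand-in for training and scoring a model):

    import numpy as np
    from joblib import Parallel, delayed

    rng = np.random.default_rng(0)
    configs = [{"C": 10 ** rng.uniform(-3, 3)} for _ in range(60)]

    def evaluate(cfg):
        # Stand-in: train a model with cfg and return a validation score.
        return -abs(np.log10(cfg["C"]))  # toy objective for illustration

    # Every trial is independent, so they can all run at once.
    scores = Parallel(n_jobs=-1)(delayed(evaluate)(c) for c in configs)
    best = configs[int(np.argmax(scores))]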

~~~
elliott34
Check out whetstone labs. Their Bayesian grid search tech is awesome

~~~
doobwa
I think you mean [https://www.whetlab.com/](https://www.whetlab.com/)

