
Nevergrad: A Python library for performing derivative-free ML optimization - jimarcey
https://code.fb.com/ai-research/nevergrad/
======
breckuh
Are there any practical Pytorch examples? Say my network's training time is
12 hours; how beneficial would this be for hyperparameter tuning compared to
simple grid/random search? Or should I instrument my network so it iterates
over hyperparameters faster than once per epoch/run?

~~~
oteytaud
We have not yet released examples of interfacing with Pytorch. With a
moderate number of hyperparameters the benefit over random search may be
moderate, whereas with a high number of hyperparameters it will be very
significant. It also depends on how parallel your setup is. In all cases we
have a wide range of algorithms with a common interface, so you can compare
them.

We also use it for direct training of the weights of a network in
reinforcement learning, not only hyperparameters.
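
For illustration, a minimal sketch of tuning two hyperparameters through that
common interface; this assumes a recent Nevergrad release (the API has
changed across versions), and train_and_score just stands in for your own
training routine:

    import nevergrad as ng

    def train_and_score(learning_rate, dropout):
        # Placeholder: train the network and return a validation loss.
        return (learning_rate - 1e-3) ** 2 + (dropout - 0.3) ** 2

    params = ng.p.Instrumentation(
        learning_rate=ng.p.Log(lower=1e-5, upper=1e-1),  # log-scaled range
        dropout=ng.p.Scalar(lower=0.0, upper=0.9),
    )

    # Swapping OnePlusOne for CMA, TBPSA, etc. is a one-line change, since
    # every optimizer exposes the same interface.
    optimizer = ng.optimizers.OnePlusOne(parametrization=params, budget=50)
    recommendation = optimizer.minimize(train_and_score)
    print(recommendation.kwargs)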

~~~
geedy
Can you elaborate on the benefit for a high number of hyperparameters?

~~~
sliem
A fundamental problem is that as the number of parameters increases, the
probability of sampling from the edge of the hypercube increases, so you will
not explore the parameter space effectively. This might be somewhat
alleviated by a concentrated multivariate normal, but I guess that has its
own caveats.

If you instead have a sampling algorithm informed by the loss function, you
avoid this problem. (You might instead have to worry about local minima.)
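
To make the hypercube effect concrete: if each coordinate is drawn uniformly
on [0, 1], the chance that a sample avoids the outer 10% band on every
coordinate is 0.8^d, which collapses as the dimension d grows:

    for d in (2, 10, 50, 100):
        print(d, 0.8 ** d)
    # 2 -> 0.64, 10 -> ~0.11, 50 -> ~1.4e-5, 100 -> ~2.0e-9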

------
snthpy
How does this compare to hyperopt?

~~~
oteytaud
To the best of my knowledge, Hyperopt is limited to random search and Parzen
variants. We have more algorithms, include test functions, and deal with
noise. On the other hand, Hyperopt handles conditional variables naturally,
whereas for the moment Nevergrad requires manual work from the user for this.
Both frameworks are asynchronous.
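
As for the manual work on conditional variables, one hypothetical way to
handle them by hand (names and bounds here are illustrative, assuming a
recent Nevergrad release) is to sample every parameter and let the objective
ignore the ones that are inactive under the chosen branch:

    import nevergrad as ng

    def run_training(optimizer_name, momentum):
        # Placeholder for a real training run returning a validation loss.
        return 0.5 if optimizer_name == "adam" else abs(momentum - 0.9)

    def objective(optimizer_name, momentum):
        if optimizer_name == "adam":
            momentum = None  # inactive branch: the sampled value is ignored
        return run_training(optimizer_name, momentum)

    params = ng.p.Instrumentation(
        optimizer_name=ng.p.Choice(["sgd", "adam"]),
        momentum=ng.p.Scalar(lower=0.0, upper=0.99),  # only used by "sgd"
    )
    optimizer = ng.optimizers.OnePlusOne(parametrization=params, budget=40)
    recommendation = optimizer.minimize(objective)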

------
torgian
Nevergrad?

I wonder if the maker never graduated.

;-)

~~~
keypusher
> Nevergrad offers an extensive collection of algorithms that do not require
> gradient computation

~~~
torgian
I like my idea better

~~~
ngcc_hk
You are not alone

------
fulafel
Would this type of thing be suited for program synthesis or property based
testing?

~~~
oteytaud
For property-based testing I would say yes, with an objective function equal
to the margin by which the properties are satisfied.

Program synthesis only in some particular cases, such as parametrizing
programs for speed or another criterion, but not in the general case of
program synthesis.
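
As a toy illustration of the margin idea (assuming a recent Nevergrad
release; the property here is deliberately false so a counterexample exists):

    import math
    import nevergrad as ng

    # Property under test: abs(x * sin(x)) stays below 3 on [-10, 10].
    # The margin is positive while the property holds and negative at a
    # counterexample, so minimizing it searches for violations.
    def margin(x):
        return 3.0 - abs(x * math.sin(x))

    param = ng.p.Scalar(lower=-10.0, upper=10.0)
    optimizer = ng.optimizers.OnePlusOne(parametrization=param, budget=200)
    recommendation = optimizer.minimize(margin)

    x = recommendation.value
    if margin(x) < 0:
        print("counterexample:", x)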

------
mikejulietbravo
Has anyone tried this? Interested to know if results were in line with the
benchmarks.

~~~
oteytaud
We have a wide range of experiments on plenty of objective functions in
games, reinforcement learning, real-world design, and machine learning
hyperparameter tuning; these reports will come soon.

------
brokensegue
isn't this the same thing as blackbox learning?

~~~
oteytaud
It's black-box optimization. This means that we just have an objective
function, without access to derivatives or any other internal information.
This is not relevant for training weights in deep learning for image
classification, or other settings where the gradient works well.

~~~
lostmsu
There was a recent paper from Uber showing that GAs work well for weights, so
I wouldn't drop that area right away.

~~~
oteytaud
Sure, GAs can be great for weights as well, but mainly when the gradient is
unreliable. I would not use Nevergrad for training the weights of a
convolutional network for image classification, for example; whereas I do use
Nevergrad for WorldModels.

~~~
lostmsu
Doesn't the model Uber used begin with a bunch of convolutional layer sets,
since it processes raw images?

