

Scalable Bayesian Optimization Using Deep Neural Networks - groar
http://arxiv.org/abs/1502.05700

======
cs702
In short, these guys are using deep neural nets to find good hyperparameters
for training other deep neural nets. It works about as well as a Gaussian
process[1] but scales better and parallelizes easily, allowing for faster
hyperparameter optimization.

--

[1] For example, Spearmint: https://github.com/JasperSnoek/spearmint
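
Not their code, but here's a rough numpy sketch of the core trick as I
understand it (all names and constants are mine, and I use random tanh
features where the paper trains a real net and learns the basis): put a
Bayesian linear regression on top of the last hidden layer, so the fit costs
scale with the number of basis functions M rather than the number of
observations N, and you can score huge candidate batches in parallel:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # Toy objective: validation loss as a function of one hyperparameter.
    def objective(x):
        return float(np.sin(3 * x).sum() + 0.1 * rng.standard_normal())

    # Stand-in for a trained net's last hidden layer: random tanh features.
    D, M = 1, 50
    W, b = rng.standard_normal((D, M)), rng.standard_normal(M)
    phi = lambda X: np.tanh(X @ W + b)            # (n, D) -> (n, M)

    # Observations gathered so far.
    X = rng.uniform(-1, 1, (10, D))
    y = np.array([objective(x) for x in X])

    # Bayesian linear regression on the features: M x M solve, not N x N.
    alpha, beta = 1.0, 25.0                       # prior / noise precision
    P = phi(X)
    A = beta * P.T @ P + alpha * np.eye(M)
    w_mean = beta * np.linalg.solve(A, P.T @ y)

    def predict(Xs):
        Ps = phi(Xs)
        mu = Ps @ w_mean
        var = 1.0 / beta + np.einsum('ij,ij->i', Ps, np.linalg.solve(A, Ps.T).T)
        return mu, np.sqrt(var)

    # Score a big batch of candidates at once (embarrassingly parallel).
    cand = rng.uniform(-1, 1, (10000, D))
    mu, sd = predict(cand)
    z = (y.min() - mu) / sd
    ei = sd * (z * norm.cdf(z) + norm.pdf(z))     # expected improvement
    print("next hyperparameter to try:", cand[ei.argmax()])

The actual paper also retrains the net as new observations come in and
marginalizes over the regression's own hyperparameters; this just shows why
the per-prediction cost stays cheap.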

------
abeppu
I haven't read the paper, just skimmed through it, but isn't this a sort of
unreasonable comparison? Full GPs are O(N^3) because they invert a covariance
matrix that basically lets every datapoint relate to every other datapoint.
There's a sizable literature on sparse approximate Gaussian processes, which
iirc are roughly O(N*M^2), where N is the number of datapoints and M is the
size of some (tunable) set of active points. That seems like the natural point
of comparison for their neural-net approach. Broadly, in their deep net
they've chosen some architecture, which fixes the number of parameters. So
either the neural net or the sparse GP approach can claim runtime linear in
the data size, and each is doing less work than the full GP for basically the
same reason.
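
To make that concrete, here's a toy numpy comparison (mine, not from the
paper; subset-of-regressors is just one of several sparse approximations):
the full GP mean needs an N x N solve, while the sparse version with M active
points only ever factors M x M matrices, with O(N*M^2) work to form them:

    import numpy as np

    rng = np.random.default_rng(1)
    N, M, noise = 2000, 50, 0.1

    X = rng.uniform(-3, 3, (N, 1))
    y = np.sin(X[:, 0]) + noise * rng.standard_normal(N)
    Xs = np.linspace(-3, 3, 5)[:, None]           # test points
    Z = X[rng.choice(N, M, replace=False)]        # active/inducing points

    def k(P, Q):                                  # squared-exponential kernel
        return np.exp(-0.5 * (P[:, None, 0] - Q[None, :, 0]) ** 2)

    # Full GP mean: one O(N^3) solve of an N x N system.
    mu_full = k(Xs, X) @ np.linalg.solve(k(X, X) + noise**2 * np.eye(N), y)

    # Subset of regressors: O(N*M^2) to form the M x M system, then solve it.
    Kmn = k(Z, X)                                 # (M, N)
    A = Kmn @ Kmn.T + noise**2 * k(Z, Z)          # (M, M)
    mu_sor = k(Xs, Z) @ np.linalg.solve(A, Kmn @ y)

    print(np.c_[mu_full, mu_sor])                 # the two means roughly agree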

------
cschmidt
As a completely trivial point, one of the co-authors has only one name,
Prabhat. Google, to its credit, returns his work page as the first result.

