
Building Auto-Tuners with Structured Bayesian Optimization [pdf] - blacksmythe
https://www.cl.cam.ac.uk/~mks40/pubs/www_2017.pdf
======
orasis
This works if you already understand ahead of time how the tuning parameters
affect performance.

If your model is weak, then it looks like this falls back to normal Bayesian
Optimization based on a Gaussian Process. Exact GP inference is O(N^3) in the
number of observations (each fit factorizes an N x N kernel matrix), which
means in practice you can't do more than a couple thousand samples.
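To see why the cubic cost bites, here's a rough timing sketch (a toy stand-in
problem, assuming scikit-learn is available); each doubling of N should
multiply the fit time by roughly 8:

    # Exact GP regression scales ~O(N^3) in the number of observations,
    # since fitting factorizes an N x N kernel matrix.
    import time
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)
    for n in [250, 500, 1000, 2000]:
        X = rng.uniform(size=(n, 5))   # n samples of 5 tuning parameters
        y = np.sin(X).sum(axis=1)      # stand-in objective
        gp = GaussianProcessRegressor(kernel=RBF(), optimizer=None, alpha=1e-6)
        t0 = time.perf_counter()
        gp.fit(X, y)                   # one Cholesky of the n x n kernel matrix
        print(n, f"{time.perf_counter() - t0:.3f}s")  # ~8x per doubling of n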

~~~
steviedc
In a standard Bayesian Optimisation approach, you don't need in-depth
knowledge of the relationship between the individual parameters and the
overall performance: BO algorithms can be shown to converge on the global
maximum of an application's configuration space (N > 10 parameters) in
significantly fewer than 1000 iterations, while also yielding large
performance gains. This is, of course, provided the choice of kernel and
hyperparameters is a true representation of both the continuous and
categorical parts of the configuration space.
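For illustration, a bare-bones standard BO loop looks roughly like this (a toy
sketch assuming scikit-learn and SciPy; the objective, dimensionality and
random candidate sampling are stand-ins, not a real benchmark):

    # GP surrogate plus expected-improvement acquisition over random candidates.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):                  # stand-in for an expensive benchmark run
        return -np.sum((x - 0.6) ** 2)

    rng = np.random.default_rng(1)
    dim = 12                           # an N > 10 configuration space
    X = rng.uniform(size=(5, dim))     # a few random initial evaluations
    y = np.array([objective(x) for x in X])

    for _ in range(50):                # far fewer than 1000 iterations
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)
        cand = rng.uniform(size=(2048, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        z = (mu - y.max()) / np.maximum(sigma, 1e-9)
        ei = (mu - y.max()) * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = cand[np.argmax(ei)]   # evaluate where improvement looks likely
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))

    print("best found:", y.max())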

~~~
elcritch
It appears that while the specific relationship doesn't need to be known,
modeling the general relationship between some of the parameters helps speed
up the search. Another way of thinking about it: this seems to be a somewhat
straightforward approach to encoding prior information into the BO approach.
How is that different from what you mention as the choice of kernel? Or do
the two relate?

~~~
steviedc
I don't think they directly relate. The premise of the paper seems quite an
interesting one, and anything that can encode prior knowledge about the
objective function while improving the convergence rate is always welcome,
particularly when the function is expensive to evaluate. In my original
comment I was referring to standard Bayesian Opt (GP) and my experience with
convergence rates when optimising applications. In these algorithms the
kernel underlies the accuracy of future predictions, so it is important that
it reflects the configuration space. For the majority of applications (if not
all), the parameter space will consist of both continuous and categorical
parameters, and it is vital that the kernel, or the choice of kernels over
the various dimensions, encodes the structure of those variables.
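Concretely, one common way to encode that structure is a product kernel: e.g.
an RBF over the continuous dimensions multiplied by an overlap kernel over
the categorical ones, so no spurious ordering is imposed on the category
labels. A bare-bones sketch (the names and the dimension split are
illustrative):

    # Product kernel over a mixed continuous/categorical configuration space.
    import numpy as np

    def mixed_kernel(a, b, cont_idx, cat_idx, lengthscale=1.0):
        # RBF part: smooth similarity over the continuous parameters.
        d2 = np.sum((a[cont_idx] - b[cont_idx]) ** 2)
        k_cont = np.exp(-0.5 * d2 / lengthscale ** 2)
        # Overlap part: fraction of categorical parameters that match exactly.
        k_cat = np.mean(a[cat_idx] == b[cat_idx])
        return k_cont * k_cat

    x1 = np.array([0.3, 1.7, 0, 2])    # two continuous dims, two categorical
    x2 = np.array([0.4, 1.5, 0, 1])
    print(mixed_kernel(x1, x2, cont_idx=[0, 1], cat_idx=[2, 3]))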

~~~
elcritch
Ok, thanks for the feedback! It's good to know to keep an eye on the choice
of kernel (esp. continuous vs categorical) with respect to the structure of
the encoded variables. I've been tossing around ideas in particular for how
to encode PID feedback control set points, but I'm still befuddled as to what
kernels would give good convergence rates. It feels like it should be
something straightforward, but I just haven't cracked it yet. Granted, that's
only after haphazardly reviewing the literature.
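The closest I've come is something generic like the sketch below (assuming
scikit-learn; all the bounds and data are made up): rescale each set point to
[0, 1] and use an anisotropic (ARD) RBF so each dimension gets its own
learned lengthscale. No idea yet whether it actually converges well:

    # Normalise set points to [0, 1], then fit a GP with per-dimension
    # lengthscales (ARD). Bounds and data here are purely illustrative.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    bounds = np.array([[0.0, 100.0],   # e.g. temperature set point
                       [0.0, 10.0],    # e.g. flow set point
                       [0.0, 1.0]])    # e.g. valve-position set point

    def normalise(x):
        return (x - bounds[:, 0]) / (bounds[:, 1] - bounds[:, 0])

    kernel = RBF(length_scale=np.ones(3), length_scale_bounds=(1e-2, 1e2))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

    X = np.array([[50.0, 5.0, 0.5],    # made-up evaluated set points
                  [80.0, 2.0, 0.9],
                  [20.0, 8.0, 0.1],
                  [65.0, 4.0, 0.7]])
    y = np.array([1.2, 0.7, 0.4, 1.0]) # made-up loop-performance scores
    gp.fit(normalise(X), y)
    print(gp.kernel_)                  # fitted per-dimension lengthscales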

