We use a solution inspired by Jasper Snoek's Spearmint, which allows Bayesian search to be partially parallelized through a concept called "fantasies". If you've never read his paper, I encourage you to find it on arXiv; it's the most gentle and thorough introduction to this topic that I've seen.
Unfortunately I have quite a backlog of other projects that I need to clear out, though Simple is AGPL so anyone is free to fork it. However I am also looking for employment opportunities.
Simple is already highly parallelizable because it relies on dynamic programming. Adding more threads of execution is just a matter of popping another candidate test point off the priority queue while the objective function is being evaluated at a different point. I was planning to include this feature in a more mature release.
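To make the idea concrete, here is a minimal sketch of that pattern (this is not Simple's actual API; the `objective` function, candidate scores, and queue contents are all illustrative). Each worker pops the next most promising candidate off the priority queue and evaluates it, so one point can be dispatched while another is still being evaluated:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def objective(x):
    # stand-in for an expensive objective function
    return (x - 0.3) ** 2

# priority queue of (negated acquisition score, candidate point);
# heapq is a min-heap, so the best-scoring candidate pops first
candidates = [(-0.9, 0.1), (-0.8, 0.5), (-0.7, 0.3), (-0.6, 0.7)]
heapq.heapify(candidates)

results = {}
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {}
    while candidates:
        _, point = heapq.heappop(candidates)  # next most promising point
        # with two workers, this point can start evaluating while a
        # previously popped point is still in flight
        futures[pool.submit(objective, point)] = point
    for fut, point in futures.items():
        results[point] = fut.result()

print(min(results, key=results.get))  # -> 0.3
```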
If you take a simplex of side length d as the starting simplex, so that it contains the unit cube, does it still beat Bayesian Optimization?
Or maybe map the simplex to the cube, though that would distort things a lot.
What is a typical value for the dimension of the space in which your algorithm should perform?
You're right that it's tricky to make a direct comparison between differently shaped optimization domains; however, in the comparison animation each algorithm was given an optimization domain of equal area. For comparisons in higher dimensions it is up to the user to specify optimization domains of equal content.
Simple's runtime performance is not affected by the size of the optimization domain; however, any global optimization algorithm will necessarily need to take more samples to characterize a higher-dimensional space, because these spaces inherently have more content to explore. Essentially, the runtime performance of the algorithm and its sample efficiency are two orthogonal concepts. Simple promises to be at least as sample-efficient as Bayesian Optimization while offering significantly lower computational overhead. Its main selling point isn't improved sample efficiency so much as better runtime performance.
Does that answer your questions?
EDIT: Regarding the appropriate problem dimensionality, that will tend to depend quite a bit on the specific optimization problem. Simple is a global optimizer, so for extremely high-dimensional problems like training a deep neural net it would not be appropriate, because the optimization domain would be too massive. The algorithm needs more testing, but as a rule of thumb I would say that anything under 500 dimensions is feasible.
That's a fair point about the time complexity, though the algorithm does at least demonstrate a real-world speedup of several orders of magnitude over fmfn's Bayesian Optimization implementation. I am not sure which performance optimizations he has implemented, however. Simple boils down to a handful of constant-size matrix multiplies and a heap insertion at each iteration, so its per-iteration performance is hard to beat.
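As an illustration of why that per-iteration cost stays flat (this is not Simple's code; the matrix sizes and acquisition computation are placeholders), each iteration does a fixed amount of linear algebra plus one O(log n) heap push, independent of how many samples have already been taken:

```python
import heapq
import numpy as np

def iterate(queue, rng):
    # a handful of constant-size matrix multiplies (sizes are illustrative)
    A = rng.standard_normal((3, 3))
    b = rng.standard_normal(3)
    score = float(A @ b @ b)          # placeholder acquisition computation
    heapq.heappush(queue, (score, rng.random()))  # O(log n) heap insertion

queue, rng = [], np.random.default_rng(0)
for _ in range(100):
    iterate(queue, rng)  # cost per call does not grow with len(queue)
print(len(queue))  # -> 100
```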
I will update the first paragraph of the readme.
Generally speaking, the heuristic for applying BO to an optimization problem is that the objective function must take at least as long to evaluate as one iteration of BO. With an equally sample-efficient algorithm that has negligible overhead, you can take twice as many samples in the same time and find better optima for a large class of problems.
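The arithmetic behind that "twice as many samples" claim, with illustrative numbers (all timings here are made up for the sake of the example):

```python
# If one objective evaluation takes about as long as one BO iteration's
# overhead, removing the overhead roughly doubles the samples per budget.
eval_time = 1.0    # seconds per objective evaluation (illustrative)
bo_overhead = 1.0  # seconds of optimizer overhead per BO iteration
budget = 60.0      # total wall-clock seconds

bo_samples = budget / (eval_time + bo_overhead)  # overhead halves throughput
low_overhead_samples = budget / eval_time        # negligible-overhead case
print(int(bo_samples), int(low_overhead_samples))  # -> 30 60
```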
The examples shown are all 2D because plotting high-dimensional spaces is rather difficult. Also, this is intended as an early proof of concept; a more formal description of the algorithm is coming, along with better examples.
While it shares a lot of similarities with conventional Bayesian Optimization, I was reluctant to call it BO since it doesn't have a concept of explicit error bounds.
EDIT: I see now that this is not an implementation of the algorithm in the paper but its own algorithm entirely. Even so, it appears to me that the same technique could be used in this case.
Yes, I hope to add that feature soon-ish. The reason I left it out of this first release is that computing Delaunay triangulations scales very poorly with dimensionality, which is why TRIOPT was restricted to low-dimensional spaces. I will have to implement a more efficient way to compute nonoverlapping triangulations. In the meantime, you can simply make the optimization domain bigger to encapsulate the shape of your problem domain.
I'm currently away from my computer but I can produce a comparison animation in a few hours. While I am biased, I strongly suspect that Simple will have superior sample efficiency and at least similar performance.
Simple is a constrained optimizer while CMA-ES is unconstrained, so I ended up choosing a sigma value such that the three-sigma cutoff could be considered its faux "optimization domain." I gave the two algorithms equal areas to operate on; however, CMA-ES is very sensitive to where it's initialized. If I had started it near the global optimum, it would have instantly converged without performing any optimization at all, so there was also the issue of deciding on a "fair" starting location. To get around this, I briefly tried the comparison on the Griewank function on [-5, 5], but CMA-ES showed nonconvergent behavior.
CMA-ES seemed to have a higher startup cost than Simple, but very similar per-iteration runtime performance. Sample efficiency ranged from totally nonconvergent to instant convergence without global search, depending on where it was started.
If you have any suggestions for a better testing methodology I would really appreciate them, because I'm still not quite satisfied with this comparison.
When I need constraints on CMA-ES for real-world problems, I've been using a simple equation to map the parameters into bounds before using them:
y = a + (b - a)*(1 - cos(pi * x / 10)) / 2
If I get a chance today I'll try and run your code and play around a bit. Really cool stuff!
Very loosely, but not particularly. Simple makes an effort to interpolate between points rather than using the Lipschitz constant to estimate an upper bound. This matters in regions of the search space that are "objective deserts": LIPO will continue to sample a large but poorly performing region even after it has been demonstrated not to contain good candidates, simply because its estimated upper bound varies linearly with distance from a sample. Simple combines a separate measure of how much it has already explored a region of space with the interpolated objective value to handle these cases better.
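To illustrate the upper bound in question, here is a minimal sketch of the Lipschitz bound LIPO uses (the Lipschitz constant k and the sample set are illustrative, not from any real run). Because the bound grows linearly with distance from every sample, a large region full of poor observations still carries a high upper bound and keeps attracting samples:

```python
def lipo_upper_bound(x, samples, k):
    # samples: list of (x_i, f(x_i)); k: assumed Lipschitz constant.
    # The bound is the tightest of f(x_i) + k * |x - x_i| over all samples.
    return min(f_i + k * abs(x - x_i) for x_i, f_i in samples)

samples = [(0.0, 0.1), (1.0, 0.2)]  # two low-valued observations
# Far from both samples the bound is still large, inviting more sampling
# of the desert even though nothing good has been found there:
print(lipo_upper_bound(5.0, samples, k=1.0))  # -> 4.2
```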