
NeurIPS 2020 Optimization Competition - mccourt
http://bbochallenge.com/
======
wenc
Interesting. There have been decades of research on Derivative-Free Optimization
(DFO) and stochastic/evolutionary algorithms (most of which are derivative-
free). They're used in practical applications, but have been hard to benchmark
reliably because solution paths depend so heavily on the initial guess and
random chance.

This one focuses on maximizing sample efficiency. That's an interesting (and
important) metric to benchmark, especially for functions that are
computationally expensive to evaluate, like full-on simulations. Sounds like
the algorithm would need to be able to efficiently come up with an accurate
surrogate model for the expensive function -- which is hard to do in the
general case, but if something is known about the underlying function, some
specialization is possible.

~~~
RandomWorker
One way to benchmark in general is to run several optimization runs with
random initializations, though this has its limitations.

A great optimizer is RBFOpt (Python-based, free, fast, accurate), which does
very well and builds a surrogate model while it optimizes. It's my go-to
optimizer at this point for engineering projects. If anyone knows a better
piece of software, let me know.
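For anyone who wants to try it, a minimal run looks roughly like this (adapted
from the RBFOpt README; note it needs a MINLP solver binary such as Bonmin on
the path, and the objective here is just a toy):

    import numpy as np
    import rbfopt

    def obj_funct(x):
        # Toy objective; x[1] is declared integer below.
        return x[0] * x[1] - x[2]

    bb = rbfopt.RbfoptUserBlackBox(
        3,                          # dimension
        np.array([0.0] * 3),        # box lower bounds
        np.array([10.0] * 3),       # box upper bounds
        np.array(['R', 'I', 'R']),  # real / integer variable types (MINLP)
        obj_funct)
    settings = rbfopt.RbfoptSettings(max_evaluations=50)
    alg = rbfopt.RbfoptAlgorithm(settings, bb)
    val, x, itercount, evalcount, fast_evalcount = alg.optimize()
    print(val, x)

The 'I' variable type is what makes it mixed-integer, and the bounds are the
box constraints.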

~~~
wenc
Fascinating! I skimmed the details in the paper here [1] and my impression is
that it looks pretty solid. It uses Gutmann's RBFs as surrogate structures. It
also benchmarks pretty well against stochastic/metaheuristic algorithms [2].

However most examples in the paper were somewhat small and the problem type is
restricted to MINLPs with box constraints (which is still tremendously useful,
especially for hyperparameter optimization in simulations).

Just curious, what's the largest problem size you've managed to solve? (no. of
constraints, binaries/continuous vars)

[1] http://www.optimization-online.org/DB_FILE/2014/09/4538.pdf

[2] https://www.sciencedirect.com/science/article/pii/S2288430018300721

~~~
RandomWorker
Hi, the paper is in the works. We designed a bilevel optimization with RBFOpt
on the upper level and a linear problem on the lower level. The number of
constraints is in the 1000s for the linear problem. The lower level takes 17
inputs and returns a single objective value. We find that with about 400
evaluations of the lower-level model, the most important objective converges
to a global minimum. I tested GAs, particle swarm, and a couple of others;
they didn't even get near the solution RBFOpt found, even after 800
evaluations.
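A rough sketch of that structure, with made-up data and coupling (the actual
model isn't public; only the 17 inputs and the LP lower level come from the
description above):

    import numpy as np
    import rbfopt
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    dim = 17                           # upper-level inputs, per the comment
    c = rng.uniform(size=10)           # toy LP objective
    A_ub = rng.uniform(size=(30, 10))  # toy constraints; the real LP has 1000s
    link = rng.normal(size=(30, dim))  # how upper-level decisions enter the LP

    def lower_level(u):
        # u shifts the LP right-hand side (an illustrative coupling);
        # the LP maximizes c@x, and the upper level minimizes that value.
        res = linprog(-c, A_ub=A_ub, b_ub=1.0 + link @ u,
                      bounds=(0, None), method="highs")
        return -res.fun if res.success else 1e9  # penalize infeasibility

    bb = rbfopt.RbfoptUserBlackBox(dim, np.zeros(dim), np.ones(dim),
                                   np.array(['R'] * dim), lower_level)
    alg = rbfopt.RbfoptAlgorithm(rbfopt.RbfoptSettings(max_evaluations=400), bb)
    val, x, *_ = alg.optimize()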

------
cs702
NeurIPS, 2020: "We need more sample-efficient algorithms for finding better
hyperparameters that specify how to train computationally expensive deep
learning models."

Rich Sutton, 2019: "The biggest lesson that can be read from 70 years of AI
research is that general methods that leverage computation are ultimately the
most effective, and by a large margin."
(https://news.ycombinator.com/item?id=23781400)

I wonder if in the end simply throwing more and more computation at the
problem of finding good hyperparameters will end up working better as
computation continues to get cheaper and cheaper.

~~~
wenc
It certainly does look that way for certain classes of problems, as witnessed
by the evolution of GPT language models, where the model gets better through
sheer use of compute resources.

For many combinatorial problems, however, improvements in algorithms often
produce bigger strides than just throwing brute-force compute at the problem.
Take Mixed Integer Programs (MIPs) -- roughly the optimization equivalent of
SAT -- used for airline scheduling, optimal assignment problems, and the like.
On slide 12 of [1] (other sources corroborate this), the author notes that MIP
solver performance improved 2,527,768,000x between 1988 and 2017.

Of that, 17,120x came from machine improvements (single core) and 147,650x
from algorithmic improvements (the two factors multiply out to the combined
figure). Multiple cores can provide a further boost up to a point, before
saturating due to coordination costs. The author notes that "A typical MIP
that would have taken 124 years to solve in 1988 will solve in 1 second now".

The biggest improvements in MIP algorithm performance have been due to
improvements in solver heuristics (!), because the fastest computations are
those that don't have to be performed at all -- i.e. that are eliminated via
heuristics.
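To make the problem class concrete, here's a toy assignment problem written as
a MIP (using SciPy's HiGHS-backed `milp`, available since SciPy 1.9; the cost
matrix is made up, and real airline-scale models are vastly larger):

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    # cost[i, j] = cost of assigning worker i to task j (illustrative numbers)
    cost = np.array([[4., 1., 3.],
                     [2., 0., 5.],
                     [3., 2., 2.]])
    n = cost.shape[0]
    c = cost.ravel()                  # binary vars x_ij, flattened row-major

    # Each worker takes exactly one task; each task gets exactly one worker.
    A = np.zeros((2 * n, n * n))
    for i in range(n):
        A[i, i * n:(i + 1) * n] = 1   # sum over j of x_ij = 1
        A[n + i, i::n] = 1            # sum over i of x_ij = 1
    res = milp(c=c,
               constraints=LinearConstraint(A, lb=1, ub=1),
               integrality=np.ones(n * n),   # every variable is integral
               bounds=Bounds(0, 1))          # ...and binary
    print(res.x.reshape(n, n), res.fun)

Under the hood, HiGHS runs exactly the branch-and-cut machinery (with
heuristics) that those speedup numbers are about.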

[1] http://www.focapo-cpc.org/pdf/Linderoth.pdf

~~~
cs702
_> 17,120x was due to machine improvements (single core). 147,650x was due to
algorithmic improvements._

That is just... insane. I knew only vaguely that performance had improved
significantly for many NP-hard/complete problems in practice, but I did not
realize the magnitude of improvement, especially due to better algorithms.

 _> The biggest improvements in MIP algorithm performance have been due to
improvements in solver heuristics (!), because the fastest computations are
those that don't have to be performed at all -- i.e. that are eliminated via
heuristics._

That is also... remarkable. Thank you for sharing.

I can't help but agree with you :-)

EDIT: Given that most of the "algorithmic" improvements have been due to
better solver heuristics, I imagine it should be possible to train meta DL/RL
models that learn how to find good heuristics for training DL models with high
sample efficiency. Come to think of it, this competition seems to be asking
precisely for such "black-box heuristic-guessing" models, so clearly there are
people working on it.

~~~
wenc
Yes, when Bob Bixby (founder of CPLEX and Gurobi, companies that made and
still make the fastest MIP solvers in the world) revealed similar numbers a
few years ago, many of us practitioners were astounded too.

But those improvements were also a product of lots of smart people, funded by
cash-rich industries (e.g. oil & gas and airlines; MIPs are big bucks
commercially, and I recall at one point a commercial CPLEX license was $100k
list), poking at the problem for over 30 years. Many Ph.D.s in operations
research and mathematical optimization were generated on this topic alone.

~~~
cs702
Makes sense.

Note that now we have lots of smart people funded by new cash-rich industries
(search, social networks, SaaS, etc.) poking at the problem with "black-box"
approaches. It will be interesting to see what comes out of it.

------
mpfundstein
If anyone wants to do this: I have a Threadripper build with two 2080 Tis.
Would be cool to do a group project. Write a PM if you want. I am located in
Amsterdam, Europe.

~~~
mkl
I don't want to participate, but you're not going to find anyone as it stands.
There are no PMs on HN, and you have no contact information in your profile
(accounts' email addresses are not public).

~~~
mpfundstein
oops. thx

------
reedwolf
Search is the problem that solves all other problems.

~~~
mpfundstein
back to square 1 :-)

