
Unsupervised Learning with Even Less Supervision Using Bayesian Optimization - Zephyr314
http://blog.sigopt.com/post/140871698423/sigopt-for-ml-unsupervised-learning-with-even
======
Zephyr314
One of the co-founders of SigOpt (YC W15) here. I'm happy to answer any
questions about this post or the methods used. More info on the Bayesian
methods behind this can be found at sigopt.com/research as well!

~~~
bearzoo
Well, just my two cents: the title feels inaccurate. You all are tuning
hyperparameters with respect to the performance of the classification task;
the Bayesian optimization really optimizes the unsupervised -> supervised
pipeline. I was expecting Bayesian optimization of strictly unsupervised
representation learning (e.g., we have an autoencoder and use Bayesian
optimization to tune its hyperparameters to minimize a reconstruction error).
This is really just supervised learning with even less supervision (which is
quite typical).
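
For concreteness, here is a rough sketch of what I mean, with sklearn's NMF
standing in for the autoencoder and hyperopt standing in for the Bayesian
optimizer (both are just stand-ins); the objective is held-out reconstruction
error, with no labels anywhere in the loop:

    # Sketch only: tune an unsupervised model purely on held-out
    # reconstruction error; no labels are involved. NMF stands in for
    # the autoencoder, hyperopt for the Bayesian optimizer.
    import numpy as np
    from hyperopt import fmin, tpe, hp
    from sklearn.datasets import load_digits
    from sklearn.decomposition import NMF
    from sklearn.model_selection import train_test_split

    X = load_digits().data  # any non-negative feature matrix works
    X_train, X_test = train_test_split(X, random_state=0)

    def objective(params):
        model = NMF(n_components=int(params['n_components']),
                    solver=params['solver'], max_iter=400)
        model.fit(X_train)
        # score on held-out data so bigger models don't win for free
        W_test = model.transform(X_test)
        return np.linalg.norm(X_test - W_test @ model.components_)

    space = {'n_components': hp.quniform('n_components', 4, 48, 1),
             'solver': hp.choice('solver', ['cd', 'mu'])}

    best = fmin(objective, space, algo=tpe.suggest, max_evals=30)
    print(best)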

~~~
Zephyr314
Thanks for the note!

We're using Bayesian optimization to tune the hyperparameters of both the
unsupervised model and the supervised model, but you are correct that they are
tuned in unison, with overall accuracy as the target. The lift you get from
adding the unsupervised step (and tuning it) is quite substantial (and
statistically significant).

The idea of tuning just the unsupervised part (or doing it independently) is
great though. All the code for the post is available at
[https://github.com/sigopt/sigopt-examples/tree/master/unsupervised-model](https://github.com/sigopt/sigopt-examples/tree/master/unsupervised-model).
It would be interesting to see whether doing that yields better overall
accuracy.
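
Roughly, the joint setup looks like the sketch below, with PCA and logistic
regression as stand-ins for the models in the post and hyperopt as a stand-in
for SigOpt: a single search space spans both stages, and the score is the
final classification accuracy.

    # Sketch only: one search space covers both the unsupervised and
    # supervised hyperparameters, scored on end-to-end accuracy.
    from hyperopt import fmin, tpe, hp
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    X, y = load_digits(return_X_y=True)

    def objective(params):
        pipe = make_pipeline(
            PCA(n_components=int(params['n_components'])),    # unsupervised stage
            LogisticRegression(C=params['C'], max_iter=1000)  # supervised stage
        )
        # hyperopt minimizes, so negate the cross-validated accuracy
        return -cross_val_score(pipe, X, y, cv=3).mean()

    space = {'n_components': hp.quniform('n_components', 2, 40, 1),
             'C': hp.loguniform('C', -4, 2)}

    best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
    print(best)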

------
lqdc13
Is this the first OHAAS (Optimize Hyperparameters As A Service)?

~~~
IanCal
There is/was Whetlab, which got bought out by Twitter if I remember rightly.
It's a shame, as I was using them and wanted to do so more.

~~~
bearzoo
[https://github.com/JasperSnoek/spearmint](https://github.com/JasperSnoek/spearmint)

~~~
Zephyr314
We've found that SigOpt compares very well to spearmint, as well as to MOE
[1], which I wrote and open sourced around the same time as spearmint. We have
a paper coming out soon that rigorously compares SigOpt to standard methods
like random and grid search, as well as to other open source Bayesian methods
like MOE [1], spearmint, HyperOpt [2], and SMAC [3], with good results.

[1]: [https://github.com/Yelp/MOE](https://github.com/Yelp/MOE)

[2]: [https://github.com/hyperopt/hyperopt](https://github.com/hyperopt/hyperopt)

[3]: [http://www.cs.ubc.ca/labs/beta/Projects/SMAC/](http://www.cs.ubc.ca/labs/beta/Projects/SMAC/)

~~~
bearzoo
With spearmint I had the ability to modify the parameters of the MCMC sampling
(e.g. the number of burn-in iterations). Will SigOpt expose those parameters
for those of us who want to manipulate them? Will there be options to use
different types of function estimators for the mapping between hyperparameters
and performance (i.e. what if I would like to use a neural network or a
decision tree instead of Gaussian processes)?

I ask because, as someone active in machine learning, I often want to optimize
hyperparameters. The people who are serious about optimizing hyperparameters
for a model (i.e. people who may not want to settle for grid or random search)
are usually somewhat technical. Your product seems to cater to those who may
not be too technical (very simple interface, etc.). How will you balance what
you expose in the future without giving away too much of your underlying
algorithms?
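
To make the surrogate-model point concrete, here is a toy sketch of the kind
of thing I mean: swap the Gaussian process for a random forest (roughly what
SMAC does) and use the disagreement between trees as the uncertainty estimate
in the acquisition function.

    # Toy sketch: Bayesian-style optimization with a random forest
    # surrogate instead of a Gaussian process (roughly the SMAC idea).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def f(x):  # toy expensive black-box function to minimize
        return np.sin(3 * x) + 0.5 * x

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(5, 1))  # a few initial random evaluations
    y = f(X).ravel()

    for _ in range(20):
        forest = RandomForestRegressor(n_estimators=100).fit(X, y)
        cand = rng.uniform(-2, 2, size=(500, 1))  # candidate points
        # per-tree predictions give a mean and a spread (the
        # "uncertainty" the acquisition function needs)
        preds = np.stack([t.predict(cand) for t in forest.estimators_])
        mu, sigma = preds.mean(axis=0), preds.std(axis=0)
        # lower-confidence-bound acquisition: low mean or high variance
        x_next = cand[np.argmin(mu - sigma)].reshape(1, -1)
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next).item())

    print("best x:", X[np.argmin(y)], "best f(x):", y.min())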

~~~
Zephyr314
As you pointed out, it is all about a balance, and every feature has different
tradeoffs.

SigOpt was designed to unlock the power of Bayesian optimization for anyone
doing machine learning. We believe you shouldn't need to be an expert, or
spend countless hours on administration, to achieve great results for every
model. We're wrapping an ensemble of the best Bayesian methods behind a simple
interface [0] and constantly making improvements, so that people can focus on
feature design and their own domain expertise instead of needing to build and
maintain their own hyperparameter optimization tools to see the benefit.

For experts who want to spend a lot of time and effort customizing,
administering, updating, and maintaining a hyperparameter tuning solution, I
would recommend forking one of the open source packages out there, like
spearmint [1] or MOE [2] (disclaimer: I wrote MOE while working at Yelp).

[0]: [https://sigopt.com/docs](https://sigopt.com/docs)

[1]: [https://github.com/JasperSnoek/spearmint](https://github.com/JasperSnoek/spearmint)

[2]: [https://github.com/Yelp/MOE](https://github.com/Yelp/MOE)

~~~
bearzoo
Thanks for all the great responses!

------
pizza
Now just throw some compressive sensing at the problem ;)

