
AutoML toolkit for neural architecture search and hyper-parameter tuning - msalvaris
https://github.com/Microsoft/nni
======
mark_l_watson
I manage a machine learning team for a large financial services company and
AutoML tools, Microsoft’s NNI included, are on our radar.

I think the `future of work` for machine learning practitioners will quickly
separate into two groups: a very small and elite group that performs research
and a much larger group that uses AutoML but whose jobs also deal more with
data preparation (which also gets automated) and ML devops, supporting models
in production.

~~~
mlthoughts2018
This sounds like parody to me. There are so many problems in applied
statistics, and neural networks are not helpful for most of them. Consider
Bayesian analysis for very small data sets as an example (just the tip of the
iceberg).

In financial services in particular, there are tons of time series and
regression problems on small data such that a neural network (beyond perhaps
some super small MLP) would be a ridiculous thing to try.

I think the breakdown of workload you described will only happen in business
departments where there is a need for large scale embedding models, enhanced
multi-modal search indices, computer vision and natural language applications,
and maybe a handful of things that eventually productize reinforcement
learning. I could also see this happening in businesses that can benefit from
synthetically generated content, like stock photography, essays / news
summaries / some fiction, website generators, probably more.

What I described above is a tiny drop in the ocean of applied statistics
problems that businesses have to solve.

~~~
byebyetech
Deep learning also works on very small data sets by means of embeddings: a
large model trained on a large data set can be used as a feature extraction
tool when training on a small data set.
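
A rough sketch of that idea (assuming PyTorch/torchvision; `X_small` and
`y_small` are hypothetical stand-ins for your own small labeled set):

    # Use a pretrained ImageNet model as a frozen feature extractor,
    # then fit a simple classifier on the extracted embeddings.
    import torch
    import torchvision.models as models
    from sklearn.linear_model import LogisticRegression

    backbone = models.resnet18(pretrained=True)  # ImageNet weights
    backbone.fc = torch.nn.Identity()  # drop the classification head
    backbone.eval()

    @torch.no_grad()
    def embed(images):  # (N, 3, 224, 224) tensor, ImageNet-normalized
        return backbone(images).numpy()

    features = embed(X_small)  # embeddings from the big model
    clf = LogisticRegression(max_iter=1000).fit(features, y_small)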

~~~
paperwork
I’ve seen this mentioned before, including a blog post by the fast.ai folks.
Any idea where I can get details? If my tabular data set is small, what kind
of embedding can I get out of it? Or is the idea that a larger data set is
used for embeddings of categorical data?

~~~
yorwba
Pre-trained embeddings are only helpful if they are trained on a different
(ideally larger) dataset or even a different task, but with the same kind of
input data. So you would need to find out where else something similar to the
data in your tables appears. If some of the data is text, word embeddings may
be applicable. Or if you're trying to analyze user activity by time and
location, you might try to transfer what can be learned about the influence of
holidays from publicly observable activity e.g. on Twitter (just a random idea
that popped into my head, no guarantee that it can actually work).

Of course if all you have are numbers without context, there isn't a lot you
can do to improve the situation.
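
For the text case, the transfer can be as simple as averaging pre-trained
word vectors per row (a sketch; the GloVe file path is an assumption):

    # Featurize a free-text column of a small tabular dataset by
    # averaging pre-trained GloVe word vectors per row.
    import numpy as np

    def load_vectors(path):
        vecs = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *nums = line.rstrip().split(" ")
                vecs[word] = np.asarray(nums, dtype=np.float32)
        return vecs

    vectors = load_vectors("glove.6B.50d.txt")  # assumed local copy
    dim = len(next(iter(vectors.values())))

    def embed_text(text):
        words = [vectors[w] for w in text.lower().split() if w in vectors]
        return np.mean(words, axis=0) if words else np.zeros(dim)

    # The resulting dense features can be concatenated with the
    # numeric columns before fitting whatever model you like.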

------
williamsmj
I'd be interested in the creator's thoughts on this paper, "Random Search and
Reproducibility for Neural Architecture Search",
[https://arxiv.org/abs/1902.07638](https://arxiv.org/abs/1902.07638), posted
on the arxiv last week. Among other conclusions, they find:

"Our results show that random search with early-stopping is a competitive NAS
baseline, e.g., it performs at least as well as ENAS, a leading NAS method, on
both benchmarks"

ENAS, the specific algorithm that they find does no better than random
search, is in this library. My understanding is that the results are pretty
generic though, i.e. NAS is very far from a solved problem. (Hyperparameter
tuning for "classical" models is another matter. That's commoditized and
available as a service at this point, see tpot, DataRobot, etc., etc.)

------
wongarsu
> We support Linux (Ubuntu 16.04 or higher), MacOS (10.14.1) in our current
> stage.

No Windows support in a Microsoft product. Curious.

This looks very useful for tuning hyper-parameters, and the fact that the
tuned algorithm is treated as a black box makes this very flexible.
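
You can see the black-box contract in NNI's trial API: the trial script only
pulls a parameter set and reports a score back, so NNI never looks inside
your model (a sketch; `train_and_evaluate` is a placeholder for your own
training code):

    import nni

    # Ask the tuner for the next hyper-parameter set to try,
    # e.g. {"lr": 0.01, "batch_size": 64}.
    params = nni.get_next_parameter()
    accuracy = train_and_evaluate(**params)  # hypothetical user function
    nni.report_final_result(accuracy)  # tell the tuner how the trial went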

~~~
yeahhhhh
Actually, they will support Windows later. Because many developers usually
train their deep learning models on Linux, they support Linux and macOS first.

------
perturbation
Their example with LightGBM
([https://nni.readthedocs.io/en/latest/gbdt_example.html](https://nni.readthedocs.io/en/latest/gbdt_example.html))
is very cool - I wanted to put together a custom script with mlflow + catboost
+ mlrMBO to do something similar, but this puts everything together in one
package.

I think this does everything MLflow does and more (besides maybe helping with
deployment?).

------
yzh
I'm working on automatic hyper-parameter tuning and network optimization. I've
always thought that people put too much focus on NAS, which aims to create a
whole new network from scratch, and not nearly enough on hyper-parameter
tuning and local structural optimization of an existing network, which I think
is in more demand, at least in industry. It looks less cool than NAS though;
maybe that's the reason.

------
sgt101
I don't understand - isn't this model fishing? How is it different?

~~~
thanatropism
With training, test and validation sets.

In good old fashioned statistics there's the idea of the jackknife: for the
i-th sample, run a regression on all the data except i, and store the
statistics of interest (coefficients, predictions, etc.). This gives you an
empirical sampling distribution for the statistics of interest.

Similar, and more common in econometrics, is the bootstrap: run your model on
something like 1999 subsamples of the data (drawn with replacement) and get
sampling distributions that way.

With said sampling distributions, whether from the jackknife or the bootstrap,
you're able to test whether your model is valid -- what's the probability that
it'll have significant coefficients or an R2/MAE/MAPE score indicating
predictive capacity?

Cross-validation (and even scikit-learn is starting to default to five folds,
not three) is a "lazy" version of this. You don't get a sampling distribution,
but at least you're able to tell when a given model only appears good because
it grips the data with all its might, and doesn't work out-of-sample.

sklearn even offers the jackknife under an ML-y name: leave-one-out
cross-validation (`LeaveOneOut`).

------
sandGorgon
Interesting - there's no scikit support, which has long been the mainstay for
data scientists everywhere.

Are people migrating from scikit to tensorflow in production for non-deep
learning use cases?

~~~
mmq
I think it should be able to support scikit as well as any other library,
since it's only suggesting hyper-parameters based on recorded/historical
observations or random evaluations.

At least that's the behaviour of the platform[1] I am working on.

[1]: [https://github.com/polyaxon/polyaxon#hyperparameters-
tuning](https://github.com/polyaxon/polyaxon#hyperparameters-tuning)
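
A minimal, library-agnostic sketch of that loop (random suggestions over an
sklearn model, chosen arbitrarily; the tuner only ever sees (params, score)
pairs):

    import random
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    history = []  # what a smarter tuner would model instead of sampling
    for _ in range(20):
        params = {"C": 10 ** random.uniform(-2, 2),
                  "gamma": 10 ** random.uniform(-3, 1)}
        score = cross_val_score(SVC(**params), X, y, cv=5).mean()
        history.append((params, score))

    best_params, best_score = max(history, key=lambda t: t[1])
    print(best_params, best_score)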

~~~
pplonski86
I think it all depends on the purpose of the library and who the target user
is. NNI is a package for tuning neural network models, so it will mostly be
used in cases that require deep neural networks, like image classification or
voice recognition.

BTW, I think all autoML solutions forget about end users. They all require too
much engineering knowledge from the user. It would be nice to have an autoML
solution that can be used by a citizen data scientist.

~~~
minimaxir
> BTW, I think all autoML solutions forget about end users. They all require
> too much engineering knowledge from the user. It would be nice to have an
> autoML solution that can be used by a citizen data scientist.

This is the approach of a project I am currently working on (and one I am now
making explicit in the README!).

~~~
pplonski86
Could you provide some link to the project?

------
nurettin
Do we need a hyper-parameter tuner tuner for this?

~~~
mlthoughts2018
Stuart Geman (one of the inventors of Gibbs Sampling) always used to say,
“Parameters are the death of an algorithm.”

~~~
nurettin
Environmental constraints (like width and height) are not bad, I would have
argued to Mr. Geman.

------
angel_j
Does it test against and prevent over-fitting?

------
hestefisk
This is very cool.

