
How to use Deep Learning when you have Limited Data - sarthakjain
https://medium.com/nanonets/nanonets-how-to-use-deep-learning-when-you-have-limited-data-f68c0b512cab#.sv478elrs
======
sarthakjain
Machine Learning and AI seem to be in vogue, but they are tough to implement
unless you have boatloads of data. We've personally had multiple frustrating
experiences over the last ~7 years of trying to solve problems using ML. In
almost all cases we failed to ship due to lack of data. Transfer Learning
is a major breakthrough in ML that lets companies with little data build
state-of-the-art models too. Unfortunately, not enough people know about it.
We are trying to do our part to make Transfer Learning easier to use, as well
as to increase awareness of it.

Using Transfer Learning, we can build a model that identifies cats and dogs in
images with only a few (<100) images, compared to the few thousand it would
have taken before.
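
To make that concrete, here's a minimal sketch of the standard fine-tuning
recipe (illustrative, not our production pipeline): freeze an
ImageNet-pretrained feature extractor and train only a small classifier head
on top of it. The model choice, folder layout, and hyperparameters are all
placeholders.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

# Frozen pretrained base: its features were learned on ~1.3M ImageNet
# images, which is what lets <100 examples per class suffice here.
base = MobileNetV2(input_shape=(224, 224, 3), include_top=False,
                   weights="imagenet", pooling="avg")
base.trainable = False

model = models.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1,
                     input_shape=(224, 224, 3)),  # MobileNetV2 expects [-1, 1]
    base,
    layers.Dropout(0.2),
    layers.Dense(2, activation="softmax"),        # cats vs. dogs
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# A folder with one subdirectory per class and ~50 images each.
train = tf.keras.utils.image_dataset_from_directory(
    "cats_vs_dogs/", image_size=(224, 224), batch_size=16)
model.fit(train, epochs=10)
```

Only the final Dense layer's weights are learned from your images, so there
are few enough free parameters for a tiny dataset to constrain them.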

To make Transfer Learning easy we are building
[https://nanonets.ai](https://nanonets.ai), which has multiple pretrained
models that can be augmented with your data to create state-of-the-art models.
We are currently in the process of building our first few models: Image
Labeling and Object Detection (in images) already work, with a few text-based
models coming up in the next few weeks.

~~~
Ayyar
This is Deep Learning 101 level material, rather than an advanced insider
technique.

It's even listed in the TensorFlow tutorials:
[https://codelabs.developers.google.com/codelabs/tensorflow-f...](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/)

~~~
sarthakjain
While I agree that you could build a demo in Deep Learning 101 that could work
for some small set of examples, I disagree that this is 101 level material.

Facebook just released: [https://techcrunch.com/2017/02/02/facebooks-ai-unlocks-the-a...](https://techcrunch.com/2017/02/02/facebooks-ai-unlocks-the-ability-to-search-photos-by-whats-in-them/)

You could also call this Deep Learning 101. But it really isn't, because
building a usable platform that works at scale, actually delivers performance,
and solves problems is a lot tougher than what can be taught in an intro
Deep Learning 101 course.

~~~
ChaitanyaSai
Just poked around a bit with your API, and the learning with just 25 samples
is impressive! And getting training samples from the web is a great touch.
But doesn't that 25-sample number seem too low for classes that are "closer"
together? How do you quantify whether you have done a good job on training, or
whether you need more varied samples?

~~~
sarthakjain
Thanks for trying us out! Internally we have validation metrics that produce a
number saying how good a model is at class separation. One naive way to do
this is entropy (-p log p). We are planning on exposing this to users soon, so
once you create a model you'll receive feedback on how "good" we think it is.
In case it's not working well, we might ask you for more data (we hope we
don't need to do this too frequently).
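
For the curious, the textbook version of that entropy score over a model's
predicted class probabilities is tiny; our internal metric may differ, so
treat this as a sketch:

```python
import numpy as np

def prediction_entropy(probs):
    """Shannon entropy -sum(p * log p) of a predicted class distribution.
    Low entropy = a confident, well-separated prediction; entropy near
    log(num_classes) = the model can't tell the classes apart."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    return float(-np.sum(p * np.log(p)))

print(prediction_entropy([0.98, 0.01, 0.01]))  # ~0.11: confident
print(prediction_entropy([0.34, 0.33, 0.33]))  # ~1.10 (~log 3): confused
```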

------
sytelus
Another lesser-known but really promising approach is program synthesis (also
called "program generation"). One can build a fairly robust model using just
2-5 examples, and in just seconds. An implementation of this approach has
already shipped in Excel, where you just enter a few examples of formatting
and "Flash Fill" learns what to do: [http://research.microsoft.com/en-us/um/people/sumitg/flashfi...](http://research.microsoft.com/en-us/um/people/sumitg/flashfill.html)

Paper: [http://research.microsoft.com/en-us/um/people/sumitg/pubs/ca...](http://research.microsoft.com/en-us/um/people/sumitg/pubs/cacm12-synthesis.pdf)
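
To give a feel for the idea, here is a toy programming-by-example loop:
enumerate a deliberately tiny, made-up DSL of string programs and return the
first one consistent with every example. Flash Fill's real DSL of
substring/concatenation operators (described in the paper) is far richer.

```python
# A made-up, five-program "DSL"; each entry is a candidate hypothesis.
PROGRAMS = {
    "upper":      lambda s: s.upper(),
    "lower":      lambda s: s.lower(),
    "first_word": lambda s: s.split()[0],
    "last_word":  lambda s: s.split()[-1],
    "initials":   lambda s: ".".join(w[0].upper() for w in s.split()) + ".",
}

def synthesize(examples):
    """Return the name of a program matching every (input, output) pair."""
    for name, prog in PROGRAMS.items():
        if all(prog(inp) == out for inp, out in examples):
            return name
    return None  # no program in the DSL explains the examples

# Two examples pin down the intended transformation:
print(synthesize([("Ada Lovelace", "A.L."), ("Alan Turing", "A.T.")]))
# -> "initials"
```

The search is instant because the hypothesis space is tiny and discrete, which
is also why 2-5 examples suffice: each example eliminates most of the
remaining candidate programs.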

~~~
prats226
This is cool. From what I understand from the paper, its DSL has a set of
algorithms as building blocks that learn the input/output function. Deep
learning algos are trying to do the same but with more generic blocks, where
the assumption is that a lot of these blocks will be able to learn algorithms
too. Deep learning takes the more generic approach, in which transfer learning
helps reduce the number of examples needed by reusing algorithms already
learned.

------
amelius
I think Deep Learning is very frustrating to work with at the moment. First,
there is the problem of overfitting, which typically shows up after you've
already been training for hours. So you have to tweak things (basically just
guessing) and start from scratch. If your network has too many neurons,
overfitting occurs more easily, which is a weakness in the theory: how can
more neurons cause more problems?

Then there is the problem that if your data differs from your training data in
what humans would call an insignificant way, your network may easily start to
fail. For example, when doing image classification, if your images contain
e.g. a watermark in the lower-left corner, suddenly your recognition may start
failing.

I've been able to use DL successfully for some projects, but for others it has
been an outright failure despite many invested hours of training and tweaking.

~~~
dfan
> which is a weakness in the theory because how can more neurons cause more
> problems?

In exactly the same way that adding more terms to a polynomial fit causes more
problems. This is one of the most fundamental results in the theory of
statistical learning in general; don't blame Deep Learning for it.
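
The analogy is easy to check numerically; a quick illustrative sketch: fit the
same ten noisy points with a degree-3 and a degree-9 polynomial and compare
errors on held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.15, x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test)  # noise-free ground truth

for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-9 fit passes (nearly) through every training point but
    # oscillates between them: more parameters, more problems.
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```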

~~~
amelius
Yes, I know; it was a rhetorical question. Imho, if having more parameters
causes problems, then the system should simply not use those extra parameters.
But the theory is not there yet.

~~~
dfan
That's what regularization is for. You probably know that too, so pretend that
was just for the benefit of the onlookers.
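
For the onlookers, then: ridge (L2) regularization is the simplest version of
"don't use the extra parameters unless the data justifies them". A sketch on
the same degree-9 polynomial fit as above; the penalty strength `lam` is
itself a hyperparameter you still have to tune by hand.

```python
import numpy as np

def ridge_polyfit(x, y, degree, lam):
    """Polynomial least squares with an L2 penalty on the coefficients:
    minimizes ||Xw - y||^2 + lam * ||w||^2 (closed-form ridge solution)."""
    X = np.vander(x, degree + 1)  # columns x^degree, ..., x^1, 1
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.15, x.size)

w_plain = np.polyfit(x, y, 9)               # unpenalized degree-9 fit
w_ridge = ridge_polyfit(x, y, 9, lam=1e-3)  # penalized fit
# The penalty shrinks coefficients the data can't justify, so the extra
# parameters end up effectively unused.
print("largest |coef|, plain:", np.abs(w_plain).max())
print("largest |coef|, ridge:", np.abs(w_ridge).max())
```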

~~~
cityhall
I think his point is no one can tell you from theory which regularization
methods to apply to a particular problem to get the best results. You need
expert knowledge, experience, and hyperparameter tuning.

~~~
prats226
Transfer learning helps with overfitting too. It has been shown that you get a
more generalized model if you use transfer learning than if you train a model
on your own with the same data (even with large datasets). You need expertise
in deep learning, but the good thing is that you don't need a lot of expertise
in the domain of the problem.

------
brandonb
Another idea is one-shot learning using deep generative models. DeepMind had a
paper on this last year:
[https://arxiv.org/abs/1603.05106](https://arxiv.org/abs/1603.05106)

~~~
bglazer
Not to be a pedant, but I think the DeepMind paper is actually an example of
one-shot generalization, not one-shot learning. From the paper:

> Another important consideration is that, while our models can perform one-
> shot generalization, they do not perform one-shot learning. One-shot
> learning requires that a model is updated after the presentation of each new
> input, e.g., like the non-parametric models used by Lake et al. (2015) or
> Salakhutdinov et al. (2013). Parametric models such as ours require a
> gradient update of the parameters, which we do not do. Instead, our model
> performs a type of one-shot inference that during test time can perform
> inferential tasks on new data points, such as missing data completion, new
> exemplar generation, or analogical sampling, but does not learn from these
> points. This distinction between one-shot learning and inference is
> important and affects how such models can be used.

------
iraphael
It should be noted that transfer learning is an umbrella term for many ideas
that revolve around transferring what one model has learnt into another model.
The method described here is a type of transfer learning called fine tuning.

~~~
sarthakjain
Yes, transfer learning is an umbrella term encompassing a lot of different
approaches. We tried to give an example of the one most commonly used with
NNs, almost exclusively for feature extraction. Do you have a resource that
lists a variety of transfer learning approaches? Happy to work with you on
creating an aggregated list.

~~~
iraphael
Well for starters, fine-tuning can be done in a variety of different ways. You
can pretrain your model with a larger, different dataset, or you can train an
autoencoder that learns some useful representation of that larger dataset and
use the encoder as a base for fine tuning.
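
A minimal sketch of that autoencoder variant in Keras; the layer sizes and the
dataset names (`big_unlabeled_x`, `small_labeled_x`, `small_labeled_y`) are
placeholders:

```python
from tensorflow.keras import layers, models

# Encoder learns a compressed representation; decoder reconstructs input.
encoder = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dense(32, activation="relu"),     # the learned representation
])
decoder = models.Sequential([
    layers.Dense(128, activation="relu", input_shape=(32,)),
    layers.Dense(784, activation="sigmoid"),
])

autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# Step 1: pretrain by reconstruction on the large unlabeled dataset.
# autoencoder.fit(big_unlabeled_x, big_unlabeled_x, epochs=20)

# Step 2: reuse the (now pretrained) encoder as the base of a classifier
# and fine-tune it on the small labeled dataset.
classifier = models.Sequential([encoder,
                                layers.Dense(10, activation="softmax")])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# classifier.fit(small_labeled_x, small_labeled_y, epochs=10)
```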

Another approach I've seen that was really cool is Model Distillation [0],
which is basically training a smaller NN on the inputs and outputs of a larger
NN (where the outputs are slightly modified to increase gradients and make
training faster).

[0] [https://arxiv.org/pdf/1503.02531v1.pdf](https://arxiv.org/pdf/1503.02531v1.pdf)
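
The "slightly modified" outputs in [0] are the teacher's logits softened with
a temperature T. A plain-numpy sketch of the resulting loss (the function
names and toy logits are mine):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Cross-entropy against the teacher's temperature-softened outputs,
    mixed with ordinary hard-label cross-entropy, as in [0]."""
    soft_t = softmax(teacher_logits, T)  # softened teacher targets
    soft_s = softmax(student_logits, T)
    soft_ce = -np.sum(soft_t * np.log(soft_s + 1e-12), axis=-1).mean()
    hard_s = softmax(student_logits)
    hard_ce = -np.log(hard_s[np.arange(len(labels)), labels] + 1e-12).mean()
    # The T^2 factor keeps the soft term's gradient scale comparable to
    # the hard term's as T varies.
    return alpha * (T ** 2) * soft_ce + (1 - alpha) * hard_ce

teacher = np.array([[5.0, 1.0, 0.5]])  # confident teacher, class 0
student = np.array([[2.0, 1.5, 1.0]])  # student still learning
print(distillation_loss(student, teacher, labels=np.array([0])))
```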

------
jph00
This is good advice - in fact, at
[http://course.fast.ai](http://course.fast.ai) our main teaching focus is
transfer learning.

------
salimmadjd
Thanks for this write-up. I'm not in the ML field, but I do follow it a bit,
and I didn't know anything about transfer learning.

I also really like your business model. I had argued with potential
entrepreneurs and friends that AI is becoming a commodity. However, your
business model has the potential to build a network effect of data on top of
AI, presumably becoming more valuable with time. I do think, though, that
you'd probably best solve this for a specific vertical first as your
go-to-market strategy.

------
NicoJuicy
For anyone interested, i have a whole bunch of links on Deep-Learning and much
more: [http://tagly.azurewebsites.net/Home/ByTag?Name=deep-
learning](http://tagly.azurewebsites.net/Home/ByTag?Name=deep-learning)

------
jeffreysean
Here is a machine learning app able to detect unprofessional posts (texts &
images) on Facebook, Instagram, and Twitter:

[https://www.repnup.com/](https://www.repnup.com/)

