Using Transfer Learning, we can build a model to identify cats and dogs in images with only a few (<100) images, compared to the few thousand it would take otherwise.
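For anyone curious what that looks like in practice, here's a rough sketch with Keras; the model choice, image size, and directory layout are just placeholders, not a description of how nanonets does it:

```python
import tensorflow as tf

# Pretrained ImageNet features, without the original classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pretrained weights

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# A few dozen labelled images per class is often enough from here.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(160, 160), batch_size=16, label_mode="binary")
model.fit(train_ds, epochs=10)
```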
To make Transfer Learning easy, we are building https://nanonets.ai, which has multiple pretrained models that can be augmented with your data to create state-of-the-art models. We are currently in the process of building our first few models: Image Labeling and Object Detection (in images) work, with a few text-based models coming in the next few weeks.
More seriously though: others have pointed out that finetuning is pretty popular in some subfields, but it's just one hammer in a whole toolbox of techniques which are necessary to make neural nets train (even when you have a tonne of data). Standardisation, choice of initialisation, and choice of learning rate schedule all come to mind as other factors which seem simple, but which can have a huge impact in practice.
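For concreteness, here's roughly what two of those knobs might look like in Keras; the specific values are illustrative rather than recommendations, and `x_train` is assumed to be your training inputs:

```python
import tensorflow as tf

# Standardisation: scale inputs to zero mean / unit variance.
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(x_train)  # x_train assumed to exist

# Learning rate schedule: decay the step size as training progresses.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```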
Of course, each tool has its limitations. The most obvious limitation of finetuning is that you need a network that's already been trained on vaguely similar data. Pretraining on ImageNet is probably not going to help you solve problems where the size of objects matters, for example, because models trained on ImageNet tend to benefit from scale invariance.
I wish you luck with nanonets.ai, but I think it's irresponsible to market this as the "1 weird trick" to bring data efficiency to neural nets.
Personally, I'm a hobbyist and I don't want to know about these shortcuts until I start to need them - which is a stage I might never reach. People who've progressed far enough to need them are probably far fewer than those who are just curious what these words mean.
Another possibility is that the phrase "transfer learning" is more generally meaningful outside the ML field than the other search terms on the graph, so most of the searches for it are really from schoolteachers or something else.
It's even listed in the Tensorflow tutorials:
Facebook just released: https://techcrunch.com/2017/02/02/facebooks-ai-unlocks-the-a...
You could also call this Deep Learning 101. But it really isn't, because building a usable platform that works at scale, actually delivers performance, and solves problems is a lot tougher than what can be taught in an intro Deep Learning 101 course.
The OpenFace face recognition library also offers this technique. You take advantage of their large pre-trained network for face embedding: transforming a face into features distinct enough for classification. You then train another few layers to recognize your own samples.
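The general pattern looks something like the sketch below; the embeddings here are faked with random vectors just so the snippet runs end to end, but in practice they would come from the pretrained OpenFace network (a 128-dimensional vector per aligned face):

```python
import numpy as np
from sklearn.svm import SVC

# Stand-in for embeddings produced by a pretrained network such as OpenFace.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(40, 128))              # 40 faces, 128-d each
labels = np.repeat(["alice", "bob", "carol", "dave"], 10)

# The only thing you train yourself is a small classifier on top.
clf = SVC(kernel="linear", probability=True)
clf.fit(embeddings, labels)

# At prediction time: embed the new face, then classify the embedding.
new_face_embedding = rng.normal(size=(1, 128))       # embed(new_face) in practice
print(clf.predict(new_face_embedding))
```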
And do you just arbitrarily select the "cut-off" output layer of the pretrained model when retraining new layers on your own data?
Some other areas are much more challenging. For example, in natural language processing tasks you will sometimes see some benefit from using pretrained embeddings, but it is very task- and model-specific. There's some exciting work going on in this area, though.
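As a rough illustration of the embedding case in Keras (the tiny vocabulary and the random stand-in for GloVe-style vectors are obviously just placeholders):

```python
import numpy as np
import tensorflow as tf

vocab = ["<pad>", "the", "cat", "sat", "on", "mat"]
embedding_dim = 100
# Stand-in for real pretrained vectors (e.g. GloVe): word -> 100-d vector.
pretrained = {w: np.random.rand(embedding_dim) for w in vocab}

# Build the embedding matrix in vocabulary order.
matrix = np.zeros((len(vocab), embedding_dim))
for i, word in enumerate(vocab):
    if word in pretrained:
        matrix[i] = pretrained[word]

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        len(vocab), embedding_dim,
        embeddings_initializer=tf.keras.initializers.Constant(matrix),
        trainable=False),  # freeze, or set True to fine-tune
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```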
1) Our target audience is someone who hasn't taken Deep Learning 101 but wants to solve a problem
2) We are focusing on users who don't want to set up their own deep learning machines, don't want to learn how to use TensorFlow/Keras/Caffe/Theano, and don't want to spend time maintaining their own boxes or engineering effort ensuring SLAs, uptime, and scalability
3) We have made improvements in both the model we use for our product and the way it is retrained. It's not the same as the TensorFlow example
4) The model has a different dataset than ImageNet and provides additional value in being better suited to certain tasks.
Worst case, we have something nobody wants, and that's valuable insight in itself. In the best case, we have made something that people learn in Deep Learning 101 usable by anybody, without spending time on setup, so they can get straight to solving problems.
OTOH, your webpage makes it pretty clear what you actually do, so props to you for that!
In exactly the same way that adding more terms to a polynomial fit causes more problems. This is one of the most fundamental results in the theory of statistical learning in general; don't blame Deep Learning for it.
Tradeoffs, tradeoffs everywhere. It's almost like traditional mathematical statistics has something to offer them fancy machine learners. (Breiman was a professor of statistics, after all... ahead of his time, but no less a statistician.)
More neurons mean more parameters to adjust to your data, so overfitting is more likely to happen. It is like interpolating a function: the more parameters you use, the more you overfit.
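A toy illustration of the same point with polynomial fits, purely for intuition: the high-degree fit nails the training points but does worse on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)
x_train, x_test = rng.uniform(0, 1, 20), rng.uniform(0, 1, 20)
y_train = f(x_train) + rng.normal(scale=0.2, size=20)
y_test = f(x_test) + rng.normal(scale=0.2, size=20)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```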
> if your data is somewhat different from your training data in what humans would call an insignificant way, your network may easily start to fail
Humans call it insignificant because we have deep knowledge of a lot of domains, while a network has been trained for a specific domain. So if you train the network on one distribution and then test it on another distribution, it is not going to work. That seems quite obvious, I think.
Deep learning works incredibly well. It works so well that it outperforms humans in some domains. So you may want to rethink what you are doing, because I think (but I may be wrong) the reason you are failing at applying deep learning is something related to your process and not to deep learning itself.
Yes, that's a problem right there, because often only one solution actually works, and you don't know which one unless you spend hours on training.
> Another important consideration is that, while our models can perform one-shot generalization, they do not perform one-shot learning. One-shot learning requires that a model is updated after the presentation of each new input, e.g., like the non-parametric models used by Lake et al. (2015) or Salakhutdinov et al. (2013). Parametric models such as ours require a gradient update of the parameters, which we do not do. Instead, our model performs a type of one-shot inference that during test time can perform inferential tasks on new data points, such as missing data completion, new exemplar generation, or analogical sampling, but does not learn from these points. This distinction between one-shot learning and inference is important and affects how such models can be used.
Another approach I've seen that was really cool is Model Distillation, which is basically training a smaller NN on the inputs and outputs of a larger NN (where the larger network's outputs are slightly modified, e.g., softened with a temperature, to give a richer training signal and make training faster).
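The core of that idea is often implemented as a loss along these lines; a minimal PyTorch sketch, assuming `teacher` and `student` are existing modules and with an illustrative temperature value:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Soften both distributions with a temperature, then match the student's
    # log-probabilities to the teacher's soft targets via KL divergence.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

# Inside a training loop (sketch):
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits)
# loss.backward(); optimizer.step()
```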
I also really like your business model. I had argued with potential entrepreneurs and friends that AI is becoming a commodity. However, your business model has the potential to build a network effect of data on top of AI, presumably becoming more valuable with time.
I do think, though, that you'd be best off solving this for a specific vertical first as your go-to-market strategy.