
What is Transfer Learning? - pranoy
https://medium.com/@pranoyradhakrishnan/what-is-transfer-learning-8b1a0fa42b4
======
hatmatrix
> Transfer learning makes use of the knowledge gained while solving one
> problem and applies it to a different but related problem

> the new dataset should be similar to the original dataset and the new
> dataset should be much smaller.

I don't think this represents the capabilities of transfer learning very well.

~~~
glass_of_water
How would you characterize the capabilities of transfer learning? Not
disagreeing, simply curious.

~~~
jszymborski
Not OP, but the assertion that the two datasets must be similar is too
incomplete to be correct; the topic is rather nuanced.

As the article states, the majority of the hidden layers, particularly the
early ones, learn very general features (edges, colours, etc.) that
generalise very well.

Because you pop off the last fully-connected layer, a network that has been
pre-trained on something like ImageNet still transfers quite well to
something completely different, like medical imaging of tissue samples,
although the distance between the two datasets will still be your main
limitation.
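
To make that concrete, here is a minimal sketch of popping off and replacing
the final layer, assuming PyTorch/torchvision; the resnet18 backbone and
NUM_CLASSES value are placeholders I picked for illustration, not anything
from the article:

    import torch
    import torch.nn as nn
    from torchvision import models

    # Load a backbone pre-trained on ImageNet (resnet18 is just an example).
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Freeze the early, general-purpose layers; only the new head will train.
    for param in model.parameters():
        param.requires_grad = False

    # "Pop off" the final fully-connected layer and replace it with one sized
    # for the new task. NUM_CLASSES is a placeholder (e.g. tissue categories).
    NUM_CLASSES = 4
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

    # Only the new head's parameters are handed to the optimizer.
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)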

Before I get ahead of myself, I'll refer you to a paper that tests the
degree to which transferred features generalise [0]. The authors conclude
that the difference between the two datasets is the main limitation, but
that transferred features still outperform random features, which I think
shows the approach is sufficiently general for most tasks.

[0] http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf

------
samwalrus
I like the notion of 'context' for transfer learning, where the context can
be parameterized.

The idea is that you learn a general model from your available data and can
then specialise it to perform well in different contexts.

A simple example is when the different contexts are different cost matrices or
different expected ratios of positives and negatives.

So instead of learning a classification model from your training data, you
learn a ranking model. You can then derive different classification models
(thresholds on the ranking) depending on the context in which each model
will be deployed.

So, for example, you learn a ranking model from pictures that ranks women
above men. When you want a classifier that labels pictures as men or women,
you choose the threshold on your ranking model according to the
misclassification costs of the context where the classifier is being
deployed.
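
A minimal sketch of that thresholding step, assuming NumPy; the scores,
labels, and cost values below are made-up placeholders:

    import numpy as np

    def best_threshold(scores, labels, cost_fp, cost_fn):
        """Pick the score threshold that minimises expected cost for a
        given deployment context (cost_fp and cost_fn are the per-error
        costs taken from that context's cost matrix)."""
        best_t, best_cost = None, np.inf
        for t in np.unique(scores):
            preds = scores >= t
            fp = np.sum(preds & (labels == 0))   # false positives
            fn = np.sum(~preds & (labels == 1))  # false negatives
            cost = cost_fp * fp + cost_fn * fn
            if cost < best_cost:
                best_t, best_cost = t, cost
        return best_t

    # One ranking model, two contexts: the scores stay fixed and only
    # the threshold is re-chosen per deployment context.
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65])  # placeholder scores
    labels = np.array([0, 0, 1, 1, 1])              # placeholder labels
    t_a = best_threshold(scores, labels, cost_fp=1, cost_fn=5)
    t_b = best_threshold(scores, labels, cost_fp=5, cost_fn=1)

The point is that the ranking model is learned once; adapting to a new
context only means re-choosing the threshold against that context's costs.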

I think a cool research theme is to think of similar tools for other aspects
of transfer learning.

