Transfer Learning and Fine-Tuning Deep Convolutional Neural Networks (revolutionanalytics.com)
88 points by rasmi on Dec 23, 2016 | 4 comments


What I have not seen is an explanation of how to transfer the data normalization parameters. Say you apply contrast normalization to the images you use to train the first network.

Is there anything better you can do than apply the same parameters to the second training set?


If you had enough data, I imagine you could just re-learn the parameters on the new dataset (and also fine-tune the network)?
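
A minimal sketch of the two options, assuming plain per-channel mean/std normalization as a stand-in for whatever contrast normalization the first network actually used (the file names and shapes are placeholders):

    import numpy as np

    def normalize(images, mean, std):
        """Normalize a batch of images (N, H, W, C) with per-channel statistics."""
        return (images - mean) / std

    # Statistics computed once on the original training set and saved alongside
    # the pre-trained model (hypothetical file names).
    source_mean = np.load("source_mean.npy")
    source_std = np.load("source_std.npy")

    new_images = np.load("new_dataset.npy")  # hypothetical new training set

    # Option 1: reuse the source parameters, so the inputs stay on the scale
    # the pre-trained weights were learned on.
    x_reused = normalize(new_images, source_mean, source_std)

    # Option 2 (given enough data): re-estimate the parameters on the new
    # dataset and fine-tune the network to adapt to the new input statistics.
    new_mean = new_images.mean(axis=(0, 1, 2))
    new_std = new_images.std(axis=(0, 1, 2))
    x_relearned = normalize(new_images, new_mean, new_std)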


This is the interesting part:

> New dataset is smaller in size and similar in content compared to original dataset: If the data is small, it is not a good idea to fine-tune the DCNN due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the DCNN to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN-features.

> New dataset is relatively large in size and similar in content compared to the original dataset: Since we have more data, we can have more confidence that we would not over fit if we were to try to fine-tune through the full network.

> New dataset is smaller in size but very different in content compared to the original dataset: Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train a classifier from activations somewhere earlier in the network.

> New dataset is relatively large in size and very different in content compared to the original dataset: Since the dataset is very large, we may expect that we can afford to train a DCNN from scratch. However, in practice it is very often still beneficial to initialize with weights from a pre-trained model. In this case, we would have enough data and confidence to fine-tune through the entire network.
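
For anyone who wants to see what those four cases look like in code, here is a minimal sketch. It assumes tf.keras with VGG16 as a stand-in pre-trained DCNN, a made-up class count, and an arbitrary choice of "earlier" layer; the article's own framework and model may differ.

    import tensorflow as tf
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import VGG16

    NUM_CLASSES = 10              # hypothetical class count for the new dataset
    INPUT_SHAPE = (224, 224, 3)

    def pretrained_base():
        """Pre-trained DCNN with its ImageNet classification head removed."""
        return VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)

    # Case 1: small, similar dataset -> freeze the whole base and train only a
    # linear classifier on the top-level CNN features.
    base = pretrained_base()
    base.trainable = False
    linear_on_top = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    linear_on_top.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])

    # Case 3: small, very different dataset -> still only a linear classifier,
    # but fed from activations earlier in the network, which are less
    # dataset-specific than the top-level features.
    base = pretrained_base()
    early_features = models.Model(inputs=base.input,
                                  outputs=base.get_layer("block3_pool").output)
    early_features.trainable = False
    linear_on_early = models.Sequential([
        early_features,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    linear_on_early.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                            loss="sparse_categorical_crossentropy",
                            metrics=["accuracy"])

    # Cases 2 and 4: larger dataset -> initialize from the pre-trained weights
    # and fine-tune through the full network, typically with a small learning rate.
    base = pretrained_base()
    base.trainable = True
    finetuned = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    finetuned.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])

    # Each model would then be trained on the new dataset with model.fit(...).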


This is one of the most concise and accessible explanations of fine-tuning CNNs that I've come across. I hope someone finds it as helpful as I did.



