There's at least one existing paper about this idea (https://arxiv.org/abs/1511.05641). Also, it is possible to initialize a convolutional layer so that it passes through its input, but initializing the weights properly takes a little more work than a call to tf.keras.initializers.
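For concreteness, here's a minimal numpy sketch of that identity initialization (the helper names and the naive convolution are mine, just for illustration): a k×k kernel that is zero everywhere except a 1 at the spatial center mapping each channel to itself, so a 'same'-padded convolution reproduces its input exactly.

```python
import numpy as np

def identity_kernel(kh, kw, channels):
    # Zero everywhere except a 1 at the spatial center, mapping channel i -> i.
    k = np.zeros((kh, kw, channels, channels))
    k[kh // 2, kw // 2, np.arange(channels), np.arange(channels)] = 1.0
    return k

def conv2d_same(x, k):
    # Naive 'same'-padded convolution: x is (H, W, C_in), k is (kh, kw, C_in, C_out).
    kh, kw, cin, cout = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros((h, w, cout))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + kh, j:j + kw, :]           # (kh, kw, C_in)
            out[i, j] = np.tensordot(patch, k, axes=3)  # sum over kh, kw, C_in
    return out
```

The same weight tensor could be passed to a real framework layer (e.g. as a constant initializer in Keras); the extra work the comment alludes to is that the built-in identity initializers don't produce this 4-D center-tap layout for you.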
It would be interesting to see neural networks that can grow layers as needed using this technique. Perhaps you could have the network occasionally monitor its own progress on a tune set somehow and adjust the model capacity based on the amount of over/underfitting observed.
Exactly my thoughts on this: if you could detect when the network was getting saturated or overfitted, spawn new layers. Maybe this would allow starting with much simpler networks operating on lower-resolution inputs, and then scaling them up as you feed them more data, instead of burning cycles slowly training a large network from scratch.
Author jumps straight into how to implement this in code before giving any clues as to what the strategy or purpose is.
The idea is fun, but it would be nice if there were some arguments as to why it might be useful, and better still an example of a model actually being improved by this.
>These posts (along with the last) are my first two in a series where I will attempt to increase the size of OpenAI's GPT-2 model while taking advantage of the training the model has already gotten. Their model is a great candidate for this experiment, as OpenAI have already demonstrated great results with what is basically a bigger version of it.
Not that a bigger intro would be a bad thing - you are right to point out that there is little reasoning given.
The Hug of Death seems to have killed the page so I can't really tell what it's about, but here's an example of inserting new trainable layers in BERT: https://arxiv.org/abs/1902.00751
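For reference, the linked paper inserts small "adapter" bottleneck layers with a residual connection, initialized so each adapter starts out as a near-identity function. A rough numpy sketch under my own naming (not code from the paper): the up-projection starts at zero, so at initialization the residual path dominates and the block passes its input through unchanged.

```python
import numpy as np

def init_adapter(d_model, d_bottleneck, rng):
    # Down-projection gets a small random init; the up-projection starts at
    # zero, so the whole adapter initially computes the identity.
    w_down = rng.normal(0.0, 1e-3, size=(d_model, d_bottleneck))
    w_up = np.zeros((d_bottleneck, d_model))
    return w_down, w_up

def adapter(x, w_down, w_up):
    # Bottleneck MLP with a residual connection: x + relu(x @ W_down) @ W_up.
    return x + np.maximum(x @ w_down, 0.0) @ w_up
```

Because the pretrained weights around the adapter are frozen and the adapter is identity at init, training can only move the model away from its pretrained behavior gradually - which is what makes it a relevant comparison for growing an already-trained network.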
I don't think it's even the Hug of Death, since it's not actually that much traffic and it's not down for me. Cloudflare + GitHub Pages just seems to be a much less reliable combination than I believed before I moved to it a year ago (it sometimes goes down even with almost no traffic, and then is up a second later).
I don't have a strong background in AI/ML, but aren't quite a few advances in this field driven by a "throw it at the wall and see what sticks" attitude? I was under the impression that the "why" behind some mechanisms in advanced neural nets was poorly understood and they just happened to work well. Please enlighten me if that isn't the case.