There's at least one existing paper about this idea (https://arxiv.org/abs/1511.05641). Also, it is possible to initialize a convolutional layer so that it passes through its input, but initializing the weights properly takes a little more work than a call to tf.keras.initializers.
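For concreteness, here's a minimal numpy sketch of that identity initialization (the helper names and the naive convolution are mine, just for illustration): a k×k kernel that is zero everywhere except a 1 at the spatial center mapping each channel to itself, so a 'same'-padded convolution reproduces its input exactly.

```python
import numpy as np

def identity_kernel(kh, kw, channels):
    # Zero everywhere except a 1 at the spatial center, mapping channel i -> i.
    k = np.zeros((kh, kw, channels, channels))
    k[kh // 2, kw // 2, np.arange(channels), np.arange(channels)] = 1.0
    return k

def conv2d_same(x, k):
    # Naive 'same'-padded convolution: x is (H, W, C_in), k is (kh, kw, C_in, C_out).
    kh, kw, cin, cout = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    h, w, _ = x.shape
    out = np.zeros((h, w, cout))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + kh, j:j + kw, :]           # (kh, kw, C_in)
            out[i, j] = np.tensordot(patch, k, axes=3)  # sum over kh, kw, C_in
    return out
```

The same weight tensor could be passed to a real framework layer (e.g. as a constant initializer in Keras); the extra work the comment alludes to is that the built-in identity initializers don't produce this 4-D center-tap layout for you.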
It would be interesting to see neural networks that can grow layers as needed using this technique. Perhaps you could have the network occasionally monitor its own progress on a tune set somehow and adjust the model capacity based on the amount of over/underfitting observed.
Exactly my thoughts on this: if you could detect when the network was getting saturated or overfitted, spawn new layers. Maybe this would allow starting with much simpler networks operating on lower-resolution inputs, and then scaling them up as you feed them more data, instead of burning cycles slowly training a large network from scratch.
Author jumps straight into how to implement this in code before giving any clues as to what the strategy or purpose is.
The idea is fun, but it would be nice if there were some arguments as to why it might be useful, and better still an example of a model actually being improved by this.
>These posts (along with the last) are my first two in a series where I will attempt to increase the size of OpenAI's GPT-2 model while taking advantage of the training the model has already gotten. Their model is a great candidate for this experiment, as OpenAI have already demonstrated great results with what is basically a bigger version of it.
Not that a bigger intro would be a bad thing - you are right to point out that there is little reasoning given.
The Hug of Death seems to have killed the page so I can't really tell what it's about, but here's an example of inserting new trainable layers in BERT: https://arxiv.org/abs/1902.00751
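For reference, the linked paper inserts small "adapter" bottleneck layers with a residual connection, initialized so each adapter starts out as a near-identity function. A rough numpy sketch under my own naming (not code from the paper): the up-projection starts at zero, so at initialization the residual path dominates and the block passes its input through unchanged.

```python
import numpy as np

def init_adapter(d_model, d_bottleneck, rng):
    # Down-projection gets a small random init; the up-projection starts at
    # zero, so the whole adapter initially computes the identity.
    w_down = rng.normal(0.0, 1e-3, size=(d_model, d_bottleneck))
    w_up = np.zeros((d_bottleneck, d_model))
    return w_down, w_up

def adapter(x, w_down, w_up):
    # Bottleneck MLP with a residual connection: x + relu(x @ W_down) @ W_up.
    return x + np.maximum(x @ w_down, 0.0) @ w_up
```

Because the pretrained weights around the adapter are frozen and the adapter is identity at init, training can only move the model away from its pretrained behavior gradually - which is what makes it a relevant comparison for growing an already-trained network.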
I don't think it's even the Hug of Death, since it's not actually that much traffic and it's not down for me. Cloudflare + GitHub Pages just seems to be a much less reliable combination than I believed before I moved to it a year ago (it sometimes goes down even with almost no traffic, and then is up a second later).
I don't have a strong background in AI/ML, but aren't quite a few advances in this field driven by a "throw it at the wall and see what sticks" attitude? I was under the impression that the "why" behind some mechanisms in advanced neural nets was poorly understood and they just happened to work well. Please enlighten me if that isn't the case.