It'd be really interesting if a research group could calculate an entropic calcu...

simiones · on June 11, 2020

I think the problem you'd find is that "bit trained" is probably highly non-trivial.

For example, I expect that the training required to go from 7-year-old child to Go grandmaster requires a completely different number of bits of information than the training required to go from blank-slate NN to NN Go grandmaster. I also suspect that the difference in what is being learned may well dominate the difference in training efficiency. Both the prior knowledge and the mechanism of learning are so different that I doubt you could get a meaningful comparison based on current understanding.

You should remember that we have basically no idea how human beings actually learn things, and no idea how much prior knowledge we have encoded. Just as an example, I once saw a documentary that claimed chess grandmasters seem to recognize valid chess positions using the parts of the brain that usually recognize faces. Assuming that was true (I'm not claiming it is), perhaps part of their chess learning consisted of taking a built-in face-recognizing NN and training it to recognize chess boards. How much did the built-in knowledge of recognizing faces help? I don't think it would be possible to calculate.

elcritch · on June 11, 2020

Agreed. After writing that I realized that "bits" of training is a pretty poor metric, especially in lossy NNs as compared to normal computing. Researchers will likely be busy for decades defining and narrowing down the concepts in the field before useful values can be determined in terms of information theory.

A huge issue I hadn't even realized is that bits don't relate very directly to an NN's ability to perform a task.

tasuki · on June 11, 2020

Can't help but remark that "7-year-old child" is not a valid go rank. Some 7-year-olds are surprisingly good at playing go :)

simiones · on June 11, 2020

True, I should probably have said something safer, like 1-year-old child :)