It would be really interesting if a research group worked out, from entropy arguments, how efficient training any given neural network could be. That is, what is the thermodynamic limit on optimal NN training, in terms of energy (joules) per bit trained? My hunch is that human brains operate close to this limit, at least under our standard environmental conditions. Given how near-optimal biomaterials are in terms of strength-to-weight ratio, it wouldn't surprise me much.
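For a rough sense of scale, here is a minimal sketch using Landauer's principle, which puts a floor of k_B * T * ln(2) joules on irreversibly erasing one bit. It is not a measure of a "bit trained", only a lower bound on bit erasure. The body temperature (310 K) and the brain power figure (20 W) are assumptions picked for illustration, and the function names are made up for this example.

```python
import math

# Boltzmann constant in J/K (exact value in the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23

# Assumptions for illustration only, not measured values from this thread.
BODY_TEMPERATURE_K = 310.0  # roughly human body temperature
BRAIN_POWER_W = 20.0        # commonly quoted ballpark for brain power draw


def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to irreversibly erase one bit at the given temperature."""
    return BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)


def max_bit_erasures_per_second(power_watts: float, temperature_kelvin: float) -> float:
    """Upper bound on bit erasures per second for a given power budget."""
    return power_watts / landauer_limit_joules(temperature_kelvin)


if __name__ == "__main__":
    energy = landauer_limit_joules(BODY_TEMPERATURE_K)
    rate = max_bit_erasures_per_second(BRAIN_POWER_W, BODY_TEMPERATURE_K)
    print(f"Landauer limit at {BODY_TEMPERATURE_K} K: {energy:.3e} J per bit")
    print(f"Erasure budget at {BRAIN_POWER_W} W: {rate:.3e} bits per second")
```

This only bounds the energy cost of erasing bits; turning it into an efficiency figure for training would still need some agreed definition of how many bits a given amount of learning represents.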
I think the problem you'd find is that "bit trained" is probably highly non-trivial to define.
For example, I expect that the training required to go from 7-year-old child to Go grandmaster requires a completely different number of bits of information than the training required to go from blank-slate NN to NN Go grandmaster. I also suspect that the difference in what is being learned may well dominate the difference in training efficiency. Both the prior knowledge and the mechanism of learning are so different that I doubt you could get a meaningful comparison based on current understanding.
You should remember that we have basically no idea how human beings actually learn things, and no idea how much prior knowledge we have encoded. Just as an example, I once saw a documentary that claimed chess grandmasters seem to recognize valid chess positions using the parts of the brain that usually recognize faces. Assuming that was true (I'm not claiming it is), perhaps part of their chess learning consisted of taking a built-in face-recognizing NN and training it to recognize chess boards. How much did the built-in knowledge of recognizing faces help? I don't think it would be possible to calculate.
Agreed. After writing that, I realized that "bits" of training is a pretty poor metric, especially for a lossy NN as compared to normal computing. Researchers will likely be busy for decades defining and narrowing down the concepts in the field before useful values can be determined in terms of information theory.
A huge issue I hadn't even considered is that bits don't relate very directly to a NN's ability to perform a task.