I have a question about the information-theoretic view of backprop
2 points by britcruise on Oct 11, 2019 | 2 comments
I'm trying to explain the highest-level reason why (modern) backpropagation works better than earlier training methods (such as Rosenblatt's perceptron learning rule from 1958 - yes, it's that old). Without going into the calculus, I want to just look at the information side of things.

I want to say something like the following:

Back in the 1960s we tried training networks of binary (threshold) neurons. So when we tuned the parameters backwards (from output to input), no magnitude information passed back through the net, only DIRECTION (i.e. turn this knob left or right). This is similar to how we can't reverse an operation in modular arithmetic. So it was a very 'coarse' training process due to a lack of information (it took forever).

When we moved to continuous, differentiable units (such as ReLU), there was now a direct relationship (an analog) between output and input magnitude, and so when we passed backwards through the net we had MAGNITUDE and DIRECTION information (i.e. turn this knob to the right by x). That allowed us to train the net much faster, because information from every neuron reached every other neuron. Put simply, 'we knew how much to turn them, and in what direction' during training.
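
To make that concrete, here's a toy one-knob sketch in Python (my own illustration with made-up numbers, not any historical algorithm): fitting a single weight by squared error, comparing an update that only uses the sign (direction) of the gradient against one that uses the full gradient (direction and magnitude).

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100)
    y = 2.37 * x  # the "true" knob setting we want to recover

    def grad(w):
        # derivative of the squared error 0.5 * mean((w*x - y)**2) with respect to w
        return np.mean((w * x - y) * x)

    def train(use_magnitude, lr=0.5, steps=40):
        w = 0.0
        for _ in range(steps):
            g = grad(w)
            step = g if use_magnitude else np.sign(g)  # full gradient vs direction only
            w -= lr * step
        return w

    print("direction only:        w =", train(False))  # bounces between 2.0 and 2.5, never lands on 2.37
    print("direction + magnitude: w =", train(True))   # settles at ~2.37

The direction-only rule can only take fixed-size steps, so it either creeps or overshoots; with the magnitude it knows how far to turn the knob.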

Thoughts? What am I glossing over?

Thank you.




Don't underestimate the 50Mx improvement in computer performance since 1970.

Also, ReLUs help with the vanishing/exploding gradient problem, which allows the information to propagate without sending it into la la land.

CNNs helped because they don't have to calculate across a fully connected network.


Thanks for the comment.

Yes, the performance boost in training is critical.

So what I'm saying is that this performance boost is thanks to the switch to non-binary neurons (which make the operation reversible enough to pass a magnitude back) - that's MOST important.

And separately, ReLUs are just better at this: because they are linear on the active side, they don't have saturating edges where the signal vanishes (which helps prevent vanishing/exploding gradients).
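
Here's a rough back-of-the-envelope in Python (toy numbers I picked: a 30-unit-deep chain where every unit happens to see a pre-activation of 2.0, with sigmoid standing in for the older saturating units). The backward signal gets multiplied by the activation derivative at every layer, so saturating units squash the magnitude while ReLU passes it through.

    import numpy as np

    depth = 30
    z = np.full(depth, 2.0)  # assume every unit's pre-activation is 2.0

    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    d_sigmoid = lambda t: sigmoid(t) * (1.0 - sigmoid(t))   # never more than 0.25
    d_relu = lambda t: (t > 0).astype(float)                # exactly 1 wherever the unit is active

    print("sigmoid chain of derivatives:", np.prod(d_sigmoid(z)))  # ~4e-30, the magnitude has vanished
    print("relu chain of derivatives:   ", np.prod(d_relu(z)))     # 1.0, the magnitude survives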

Separately, I'm glad you brought up CNNs, because the local-connectivity idea is old and goes back to Rosenblatt (1958): his perceptron had a first layer of local connections in it, based on findings in biological visual systems.

And of course nature settled on that because it's more efficient, so the efficiency gain is huge.

But the point is: CNN = fewer knobs to train.

And there are lots of simple ways to help reduce knobs: fix (share) some connections, drop some connections.
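
As a back-of-the-envelope in Python (the sizes are just examples I made up): compare a fully connected layer of 100 units on a 28x28 input against 100 convolutional filters with shared 5x5 kernels.

    image_h, image_w = 28, 28      # example input size
    hidden_units = 100             # fully connected layer width
    kernel_size, num_filters = 5, 100

    fully_connected_knobs = image_h * image_w * hidden_units  # every pixel wired to every unit
    conv_knobs = kernel_size * kernel_size * num_filters      # one small kernel shared across the whole image

    print("fully connected:", fully_connected_knobs)  # 78,400 weights (ignoring biases)
    print("convolutional:  ", conv_knobs)             # 2,500 weights (ignoring biases)

Over thirty times fewer knobs for the same input, just from sharing connections.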



