Consider a single-neuron model that just pools all the pixels in an image together. It's possible for this neuron's average activation to be exactly the same on faces and non-faces, but given the huge range of possible images that's extremely unlikely. So in aggregate this neuron can distinguish faces from non-faces, even though, when you apply it to classify a particular image, it will be better than random only by an extremely tiny amount.
As the number of neurons increases, the best face/non-face distinguisher neuron gets better and better, but there's never a size where the model cannot recognize faces at all and then you add just a single neuron that recognizes them perfectly.
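To make that concrete, here is a minimal sketch with purely synthetic data (random pixel values whose class means differ by a tiny amount, standing in for real face/non-face images): the single pooling "neuron" is only trivially better than chance on any one image, yet the gap between the two classes is visible in aggregate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_pixels = 10_000, 32 * 32

# Two synthetic "image" classes whose per-pixel mean differs only slightly.
faces     = rng.normal(loc=0.501, scale=1.0, size=(n_images, n_pixels))
non_faces = rng.normal(loc=0.500, scale=1.0, size=(n_images, n_pixels))

def pool(images):
    """The single neuron: just average all pixels of each image."""
    return images.mean(axis=1)

threshold = 0.5005  # midpoint of the two class means

accuracy = 0.5 * ((pool(faces) > threshold).mean()
                  + (pool(non_faces) <= threshold).mean())
print(f"per-image accuracy: {accuracy:.3f}")   # barely above chance (~0.51)
print(f"aggregate activation gap: {pool(faces).mean() - pool(non_faces).mean():.4f}")
```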
> there's never a size where the model cannot recognize faces at all
True
> then you add just a single neuron that recognizes them perfectly
Not true.
Don't think in terms of neurons, think in terms of features. A feature can be spread out over multiple neurons (polysemanticity); I just used a single neuron as a simplified example. But if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.
The Universal Approximation Theorem implies that a large enough network exists that achieves that goal to any desired accuracy (let's call it size n or larger), so somewhere between 0 and n neurons you'd get what you want.
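A quick way to see the "no hard threshold" part, under toy assumptions (a 1-D target function and a single hidden layer with random weights plus a least-squares output layer, which is only a stand-in for full training): the approximation error typically shrinks gradually as width grows, rather than jumping from "can't do it" to "perfect" at some magic size.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 500)[:, None]
y = np.sin(3 * x).ravel()                      # toy target function

def approximation_error(width):
    # One hidden layer with random weights; only the output layer is fitted.
    W = rng.normal(size=(1, width))
    b = rng.uniform(-np.pi, np.pi, size=width)
    H = np.tanh(x @ W + b)                     # hidden activations, shape (500, width)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.sqrt(np.mean((H @ coef - y) ** 2))

for width in (1, 2, 4, 8, 16, 32, 64):
    print(f"width {width:3d}: RMSE {approximation_error(width):.4f}")
```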
> if those multiple neurons perfectly describe the feature, then all of them are important to describe the feature.
You could remove any one of those neurons and retrain the model from scratch: polysemanticity would increase slightly and performance would decrease slightly, but really only slightly. There are no hard size thresholds, just a spectrum of more or less accurate approximations.
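As a rough illustration of "only slightly" (toy tabular data, and single-unit ablation without retraining, which is a cheap proxy rather than the retrain-from-scratch experiment described above): zeroing the outgoing weights of any one hidden unit in a small trained classifier usually costs only a little accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
baseline = clf.score(X_te, y_te)

drops = []
for unit in range(32):
    saved = clf.coefs_[1][unit].copy()
    clf.coefs_[1][unit] = 0.0            # ablate one hidden unit's output weights
    drops.append(baseline - clf.score(X_te, y_te))
    clf.coefs_[1][unit] = saved          # put the unit back

print(f"baseline accuracy:      {baseline:.3f}")
print(f"worst single-unit drop: {max(drops):.3f}")   # typically a small dip, not a cliff
```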