
I'm pretty sure linearity was also part of the picture -- no matter how many layers of neurons you have, as long as they perform linear combinations, they could all be replaced with a single one (ignoring numeric errors).


I mentioned this in my post ("given nonlinear activations"), but you're right to point out its importance. The composition of linear functions is always linear. Full stop. You need at least one non-linear layer to get reasonable results.
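A quick NumPy sketch of the point above (the layer sizes and random weights are arbitrary, chosen only for illustration): stacking two weight matrices with no activation in between produces exactly the same map as a single pre-multiplied matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" that only perform linear combinations (no activation).
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))

x = rng.standard_normal(3)

# Applying the layers one after the other...
two_layer = W2 @ (W1 @ x)

# ...is identical to applying a single collapsed layer W2 @ W1.
one_layer = (W2 @ W1) @ x

assert np.allclose(two_layer, one_layer)
```

The same collapse happens with biases: `W2 @ (W1 @ x + b1) + b2` is just `(W2 @ W1) @ x + (W2 @ b1 + b2)`, still a single affine map.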




