Interpreting neural networks through the polytope lens (2022) (lesswrong.com)
80 points by sva_ 10 months ago | 7 comments



Very interesting read and a rather "obvious" one. I can't believe I didn't see this before. Obviously... A perceptron layer is a bunch of dot products followed by a comparison. Every graphics programmer knows that's a check of which side of a plane you're on.

Of course, the ReLU unit also only passes information on when the result is on one side of the plane, making the whole thing a piecewise-linear spline.
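A minimal numpy sketch of that picture (the weights and the input here are arbitrary, just for illustration): each row of W defines a hyperplane w·x + b = 0, ReLU keeps the value only on one side of it, and the on/off pattern across units names the polytope the input falls in.

  import numpy as np

  rng = np.random.default_rng(0)
  W = rng.normal(size=(4, 2))   # 4 hyperplanes in a 2-D input space
  b = rng.normal(size=4)

  def relu_layer(x):
      pre = W @ x + b              # dot products: signed distance to each plane (up to scale)
      return np.maximum(pre, 0.0)  # ReLU: pass information only on the positive side

  def activation_pattern(x):
      return tuple((W @ x + b) > 0)  # which side of each plane -> identifies the polytope

  x = np.array([0.3, -1.2])
  print(relu_layer(x), activation_pattern(x))

Within any one such polytope the layer is purely affine; the nonlinearity only shows up when you cross one of the planes.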

As others have said: can we learn the separating planes without backward gradient propagation? I don't know, but seeing it in this new way may help.


I'm very much exploring this idea. Hopf algebras provide a really nice wrapper around it. I have a Discord to discuss this:

https://discord.cofunctional.ai


I wonder how this connects with the HN post below this one:

https://www.nature.com/articles/d41586-024-00288-1

I would love to see a cross section of these two ideas....

I have been surprised that in the past few weeks I have seen several posts on HN that, while separate and unrelated, share related characteristics. If you look at them for a second, and have an AI like GPT read both studies/papers, immediate connections worth looking at further are revealed.

Even if just for the sake of a more informed tapestry of knowledge in a particular area...

It's really enjoyable reading and TIL'ing.


These ideas are too far apart...


> if we scale the activations in a particular layer in a non-linear network, some neurons in later layers may ‘activate’ or ‘deactivate’.

Normalization removes this problem. Magnitude information can still be encoded separately, e.g. in log form, so the network can still differentiate when scale matters, but scaling doesn't have much impact by default (given small initial weights on the magnitude element).
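A small illustration (not from the post, just a sketch with a hand-rolled layer norm and made-up numbers) of why normalization removes the scaling issue: the normalized output is invariant to multiplying its input by a positive constant, so downstream neurons see the same thing and none of them 'activate' or 'deactivate'.

  import numpy as np

  def layer_norm(x, eps=1e-5):
      # standardize to zero mean, unit variance
      return (x - x.mean()) / (x.std() + eps)

  x = np.array([0.5, -1.0, 2.0, 0.1])
  print(np.allclose(layer_norm(x), layer_norm(10.0 * x), atol=1e-4))  # True (up to eps)

  # Magnitude can still be carried as a separate feature, e.g. the log of the norm:
  log_mag = np.log(np.linalg.norm(x) + 1e-8)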


I wonder if there's a way to learn polytopes directly with a non-MLP formulation.


There are many interesting efforts toward this goal, going back quite a few years, many of them in the PAC setting (which automatically rules out MLPs if you want theoretical guarantees). E.g. [0] and its related references come to mind as an interesting place to start looking.

[0]: https://proceedings.neurips.cc/paper/2018/file/22b1f2e098316...

(Edited for some clarity)



