Very interesting read and a rather "obvious" one. I can't believe I didn't see this before. Obviously... A perceptron layer is a bunch of dot products followed by a comparison. Every graphics programmer knows this is a check of which side of a plane you're on.
Of course, the ReLU unit also only passes information through when the result lands on one side of the plane, which is what makes the whole thing a spline.
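To make that concrete, here is a minimal sketch (my own illustration, not from the article; all names are made up): each neuron in a layer performs a half-space test, and ReLU only lets the value through on one side of its plane.

```python
import numpy as np

# Toy sketch: one layer = a batch of "which side of the plane am I on?" tests.
# Rows of W are plane normals, b holds the offsets (hypothetical example values).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))    # 4 neurons looking at a 3-dimensional input
b = rng.normal(size=4)
x = rng.normal(size=3)

pre = W @ x + b                # signed (scaled) distance to each neuron's plane
side = pre > 0                 # the graphics-style half-space check
out = np.maximum(pre, 0)       # ReLU: pass the value along only on one side

print(side)
print(out)
```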
As others have said... Can we learn the separating planes without backward gradient propagation? I don't know, but seeing it in this new way may help.
I would love to see a cross section of these two ideas....
I have been surprised that in the past few weeks I have seen several posts on HN that, while separate and unrelated, share related characteristics - and if you look at them for a sec, you can see how having an AI like GPT go over both studies/papers immediately reveals connections worth looking at further.
If only for the sake of a more informed tapestry of knowledge in a particular area...
> if we scale the activations in a particular layer in a non-linear network, some neurons in later layers may ‘activate’ or ‘deactivate’.
Normalization removes this problem. Magnitude information can still be encoded separately in log form, so differentiation can still happen when scale matters, but by default scaling doesn't have much impact (with small initial weights on the magnitude element).
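A quick toy demonstration of the quoted point (my own sketch, not from the article): because the next layer has a bias, scaling this layer's activations can flip which downstream ReLU units activate, whereas normalizing the activations makes the pattern scale-invariant.

```python
import numpy as np

# Hypothetical next-layer weights and bias for illustration only.
W2 = np.array([[1.0, -1.0],
               [0.5,  0.5]])
b2 = np.array([-3.0, 1.0])

def pattern(h):
    return (W2 @ h + b2) > 0   # which later-layer units would 'activate'

h = np.array([2.0, 1.0])       # some ReLU outputs from the previous layer

print(pattern(h))        # [False  True]  -> unit 0 off, unit 1 on
print(pattern(10 * h))   # [ True  True]  -> scaling switched unit 0 on

def normalize(h):
    return h / (np.linalg.norm(h) + 1e-8)   # unit-norm the layer output

print(pattern(normalize(h)))       # same pattern either way,
print(pattern(normalize(10 * h)))  # since the scale is divided out
```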
There are many interesting efforts, going back quite a few years, toward this goal, many of which are in the PAC setting (which automatically rules out the MLP, as far as theoretical guarantees go). E.g. [0] and its related references come to mind as an interesting place to look into it!