
An idea from physics helps AI see in higher dimensions - theafh
https://www.quantamagazine.org/an-idea-from-physics-helps-ai-see-in-higher-dimensions-20200109/
======
empath75
Pretty amazing work -- a couple of thoughts:

1) The article doesn't say this, but dimensions don't _always_ have to do with
locations in space and time; you can treat any continuously varying value as a
dimension -- a person might have dimensions for personality type, age, hair
color, and so on. It seems like this technique could be used to better train
CNNs to recognize patterns in lots of data besides imagery -- fraud detection
based on credit card transactions, for example.

2) There are a lot of local and global symmetries in physics -- I wonder what
new capabilities adding them to a CNN would enable?

~~~
mywittyname
The article is talking specifically about performing convolutions on higher-
dimensional manifolds. This is different from the broader concept of data
dimensionality typically associated with AI/ML.

Without repeating the article too much, this is important because it can be
used to learn very complex systems from a series of lower-dimensional
projections -- for example, creating a 3D map of a dog from a collection of 2D
images of dogs. The resulting system can better detect a dog in a position
it's never seen, because the CNN has the relationship between 3D space and the
2D representation of that space built into it.
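
To make the projection idea a bit more concrete, here's a rough toy sketch in
plain numpy (the pinhole setup and function names are just made up for
illustration, not anything from the paper): the same 3D shape in different
poses produces different 2D views, but all of them come from one shared
projection operator, which is the relationship that gets built in.

    import numpy as np

    def rotate_z(points, theta):
        # Rotate an (N, 3) point cloud about the z axis by theta radians.
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
        return points @ R.T

    def project(points, focal=1.0):
        # Pinhole projection of (N, 3) points onto a 2D image plane (assumes z > 0).
        return focal * points[:, :2] / points[:, 2:3]

    # The same 3D shape in two different poses gives two different 2D "images",
    # but both come from the one shared projection operator.
    cloud = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])   # keep z positive
    view_a = project(cloud)
    view_b = project(rotate_z(cloud, np.pi / 6))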

~~~
empath75
This is a serious question because I don’t know: what’s the difference between
high dimensional data and high dimensional manifolds?

~~~
mywittyname
Disclaimer: This is reaching the limits of my math abilities.

The difference comes from the fact that a higher-dimensional tuple may contain
data that is independent of the other fields. Say you have a tuple that's
(name, dob, address) and a projection function that accepts such a 3-tuple and
returns a 2-tuple of (name, dob). For that function, the address dimension has
no relationship at all to the other fields, meaning there is no unproject
function a person could create such that 3_tuple ==
unproject(project(3_tuple)).

With manifolds, the higher dimensions can have a relationship with the lower-
dimensional data, and such a relationship can be encoded into a function. What
the researchers appear to have designed is a system that, given a priori
knowledge of the task and enough n-tuples for learning, can produce an
approximation of such an unproject function.

Thus, after learning, they have a system where unproject(project(n_tuple)) ~=
n_tuple, because they were able to inform the learning system about the nature
of the relationship between the dimensions.
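
A toy way to see the difference (just a sketch I made up, with a hemisphere
standing in for "manifold"): for an arbitrary tuple the dropped field is
simply gone, but for points constrained to a known surface the dropped
coordinate can be reconstructed from the constraint.

    import numpy as np

    # Arbitrary tuple: the dropped field is independent of the rest,
    # so no unproject function can exist.
    def project_tuple(t):            # (name, dob, address) -> (name, dob)
        return t[:2]

    # Manifold: points constrained to the upper unit hemisphere
    # x^2 + y^2 + z^2 = 1, z >= 0, so the constraint itself gives the unproject.
    def project_sphere(p):           # (x, y, z) -> (x, y)
        return p[:2]

    def unproject_sphere(q):         # (x, y) -> (x, y, z)
        x, y = q
        return np.array([x, y, np.sqrt(max(0.0, 1.0 - x * x - y * y))])

    p = np.array([0.6, 0.0, 0.8])    # a point on the hemisphere
    assert np.allclose(unproject_sphere(project_sphere(p)), p)

Here the unproject exists only because the constraint was written in by hand;
the learned version would have to approximate it from examples.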

~~~
_wzsf
s/tuple/coordinate

------
peter_d_sherman
Excerpt:

"Now, researchers have delivered, with a new theoretical framework for
building neural networks that can learn patterns on any kind of geometric
surface. These _“gauge-equivariant convolutional neural networks,” or gauge
CNNs_, developed at the University of Amsterdam and Qualcomm AI Research by
Taco Cohen, Maurice Weiler, Berkay Kicanaoglu and Max Welling, can detect
patterns not only in 2D arrays of pixels, but also on spheres and
asymmetrically curved objects. “This framework is a fairly definitive answer
to this problem of deep learning on curved surfaces,” Welling said."

------
gambler
Can someone explain to me why advances in actual model performance come from
using analogies from physics when there are papers that supposedly provide a
mathematical explanation of convolution?

"A Mathematical Theory of Deep ConvolutionalNeural Networks for Feature
Extraction":

[https://arxiv.org/pdf/1512.06293.pdf](https://arxiv.org/pdf/1512.06293.pdf)

"Understanding Convolutional Neural Networks with A Mathematical Model":

[https://arxiv.org/pdf/1609.04112.pdf](https://arxiv.org/pdf/1609.04112.pdf)

~~~
conjectures
Because it's not a standard convolutional net by the description. The
difference is:

A) Studying an existing technique with math.

B) Coming up with a new technique.

You could get a modern engineering consultancy to review your steam engine,
but it would still be a steam engine.

------
hanniabu
Is multidimensional AI the new 2020 buzzword?

~~~
z3c0
I sure hope so. Telling people I build multidimensional data structures for a
living has only yielded glazed-over eyes thus far.

~~~
77544cec
- I'm a developer.

- Huh?

- I build websites _types on an air keyboard_.

~~~
catalogia
"Developer" is a word that requires some context. A stranger might not know if
you worked in construction or on computers.

~~~
toper-centage
On a professional context I'll say I'm a software developer. If I want to brag
a bit I'll say I'm a software engineer. If I'm with friends I'll say I'm a
programmer. If I'm with family or older people I'll dip my toes with "I work
with computers" and maybe further explain if prompted.

~~~
giancarlostoro
I just say Software Engineer; it's what my employer calls me, and it settles
what I want to call myself.

~~~
SkyBelow
This is much more entertaining when you have family who are PEs and their eye
twitches every time.

~~~
giancarlostoro
Which is funny because there's also backlash against Computer Science that it
is not in fact a science, which is why they added math courses to some
degrees, at least here in Florida. I think Computer Programming is just
probably best described as Computer Programming but I do like saying Software
Engineer every time someone asks just because it sounds good enough to me.
Until they standardize our title into one single thing, I'll just go by SE.

~~~
_0ffh
Funny, where I come from the earliest computer science departments at
universities were basically joint ventures of the maths and electrical
engineering departments. Consequently they were quite maths-heavy, and it
shows to this day.

------
etaioinshrdlu
This is super cool and I'm pretty sure this is basically topology. The article
was pretty hard to read though. It reminds me a little bit of the
[https://en.m.wikipedia.org/wiki/Hairy_ball_theorem](https://en.m.wikipedia.org/wiki/Hairy_ball_theorem)

~~~
improbable22
Not really topology, it's more like group theory, and representations.

Ordinary convnets are a way of building in translational symmetry, which is
the group R^2 (in the plane). The work being described extends this to larger
symmetry groups, such as rotations of a molecule in 3D (which is SO(3)).

For either of these, you can work in Fourier space instead of real space,
where convolutions become products. For ordinary convnets this means an
ordinary FFT, but nobody does that, since translating to neighbouring pixels
is simple enough. Rotations aren't so simple, so working in Fourier space can
be an efficient way to do things. And the connection to physics is really just
that the representation theory of SO(3) is a bread-and-butter exercise there,
the basis of atomic theory.
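
For the plain translation case you can check the "convolutions become products
in Fourier space" statement directly in numpy (this is only the ordinary-FFT
case nobody bothers with, not the SO(3) machinery):

    import numpy as np

    N = 64
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(N)
    kernel = rng.standard_normal(N)

    # Circular convolution computed directly from the definition...
    direct = np.array([sum(signal[m] * kernel[(n - m) % N] for m in range(N))
                       for n in range(N)])

    # ...and via the Fourier domain, where it becomes a pointwise product.
    via_fft = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)).real

    assert np.allclose(direct, via_fft)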

------
MrQuincle
What happens if there are multiple 3D or 4D objects? Do we then need an
attention mechanism as well? Or is there some topology where a "where vs what"
pathway emerges naturally?

~~~
numlock86
Isn't a 3D object basically the surface of a 4D object?

------
jhisiow9839
Is this very different from a graph convolutional network (GCN)? It seems like
a GCN would have a lot of the same equivariance properties (e.g. orientation,
units of measure, etc.)?

------
beefield
_I_ would like to have a VR experience in higher dimensions. It should not be
completely impossible to build some kind of actuators that I could attach to
my body to sense my orientation and acceleration in the fourth dimension.

~~~
uj8efdkjfdshf
May I recommend 4D Toys? It's made by the same guy who's developing Miegakure
and it has a VR version. The controls are a bit limited though in that user-
initiated rotations are limited to 3D.

~~~
beefield
I thought that was me being stuck in 3D and the toys around me moving in 4D?
Whereas I would like to move myself in 4D...

~~~
6nf
There's a slider to move yourself in the extra dimension.

------
ganzuul
Something that could go by the same title is the use of tensor networks for
ML. I think it works like a pre-optimization step by dimension reduction of
the solution space, but if someone could give the right intuitive explanation
I'd be much obliged.

It seems to be a way to lessen the inductive bias that comes from committing
to particular ML algorithms. That is, it vastly increases the solution space
but remains effective by omitting unlikely solutions.

------
openasocket
This is a really interesting application of differential geometry in machine
learning! And the allusion at the end to having the system eventually learn
the symmetries of the problem and make use of them is really interesting. All
the examples they gave were very physical, like climate models, but I imagine
you could find symmetries in much more abstract problems that may not be
intuitive.

------
mojomark
Related G-CNN video:
[https://youtu.be/wZWn7Hm8osA](https://youtu.be/wZWn7Hm8osA)

------
cosmic_ape
Hype pipeline: 1) Take a banal piece of feature-engineering work. 2) Add an
Albert Einstein reference. 3) Profit.

------
sillysaurusx
One of the strangest things in AI, to me, is that you can average the weights
of multiple different models to create a single model that’s better than any
of the individuals.

This is how distributed training often works, for example. Data parallelism.

I don’t understand why it still works in higher dimensions, but it seems to.

The intuition is that the multiple models are “spinning” around the true
solution, so averaging gives the final result more quickly. But it works even
early in the training process.

~~~
rprenger
When we use data parallelism, we're summing/averaging the gradients induced by
different parts of the data, not the weights of the model itself. When using
multiple models for ensemble methods, we're summing/averaging the output of
the models for a sample. We're not summing/averaging the weights of the models
themselves. While averaging the weights of several models might work on a
given problem, it definitely doesn't work in general.

w2 * tanh(w1 * x) = -1 * w2 * tanh(-w1 * x)

But if you average the weights in those two equivalent models you get 0.
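
You can check that identity numerically with scalar weights (a deliberately
tiny toy, not how real training is set up):

    import numpy as np

    def net(w1, w2, x):
        return w2 * np.tanh(w1 * x)

    x = np.linspace(-2.0, 2.0, 5)
    w1, w2 = 1.3, 0.7

    a = net(w1, w2, x)        # original model
    b = net(-w1, -w2, x)      # sign-flipped model: exactly the same function
    assert np.allclose(a, b)

    # Averaging the weights of the two equivalent models gives the zero function.
    averaged = net((w1 + (-w1)) / 2, (w2 + (-w2)) / 2, x)
    assert np.allclose(averaged, 0.0)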

If you're talking about asynchronous data parallelism, then there can be some
averaging of weights, but all the replicas start with the same weights and are
re-synched often enough that the weights never drift far enough apart to break
it.

~~~
sillysaurusx
[https://www.docdroid.net/faDq8Bu/swarm-
training-v01a.pdf](https://www.docdroid.net/faDq8Bu/swarm-training-v01a.pdf)

We average the weights themselves, and the efficiency seems to be similar to
gradient gathering.

It’s also averaging in slices, not the full model. There’s never a full
resync.

SWA (stochastic weight averaging) is the theoretical basis for why it works, I
think.

Another way of thinking about it: If the gradients can be averaged, then so
can the weights.

~~~
uoaei
If you are averaging weights often enough, then it's basically the same as
averaging gradients. If you average the weights of a bunch of independently-
trained models, you're going to have a rough time. Even if the function
computes the exact same thing, the order of rows and columns in the
intermediate matrices will totally ruin your averaging strategy.
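
The row/column-ordering point is easy to demonstrate with a tiny two-layer toy
(made-up shapes, nothing to do with any real model): permuting the hidden
units leaves the function unchanged, but averaging the permuted weights with
the originals does not.

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.standard_normal((4, 3))        # hidden x input
    W2 = rng.standard_normal((1, 4))        # output x hidden
    x = rng.standard_normal(3)

    def mlp(W1, W2, x):
        return W2 @ np.tanh(W1 @ x)

    perm = np.array([1, 2, 3, 0])           # reorder the hidden units
    W1p, W2p = W1[perm], W2[:, perm]

    assert np.allclose(mlp(W1, W2, x), mlp(W1p, W2p, x))      # same function
    averaged = mlp((W1 + W1p) / 2, (W2 + W2p) / 2, x)
    print(mlp(W1, W2, x), averaged)          # generally very different outputs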

------
BubRoss
This article is all over the place, with almost no substance. It starts by
talking about Einstein's theory of relativity, claims this definitively solves
deep learning on curved surfaces, and gives almost no insight into what is
actually different. It is so bad it makes me want to avoid this site
altogether.

~~~
gojomo
The article includes multiple links to the related underlying research papers,
for people who need more substance.

~~~
BubRoss
My point is that the article is worse than just having a link to the paper. It
just says the work might lead to cures for diseases, better self-driving cars,
and lots of other abstract, sensational nonsense. It could be used as a
template, and any new machine learning paper could just be linked to it.

------
cliqueiq
I thought quantamagazine was above publishing click-bait, but I guess not.

Neural networks already see in "higher dimensions" (whatever that means).
Anyone who's ever used neural networks knows that each neuron's branch
(i.e. dendrite) of an N-sized vector can already be thought of as a
"dimension" of a data set. CNNs (convolutions) flatten that data (reduce it,
seeing the same pattern over fewer "dendrites", much like PCA, etc.).

CNNs only make sense when working with image data anyways.

~~~
jhj
> CNNs only make sense when working with image data anyways

Not true: N-dimensional convnets, 1-D convnets (for NLP and time-series
analysis), spatially sparse convnets, graph and non-Euclidean-space convnets,
... all exist and are used.

CNNs are akin to multiscale wavelet transforms. They can be applied on
different spaces (just as graph wavelet transforms exist).
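
For the 1-D case, a single "conv layer" over a time series is just the same
sliding-window operation as in an image CNN, only on a 1-D grid (a trivial
numpy sketch, with a hand-picked kernel rather than a learned one):

    import numpy as np

    series = np.sin(np.linspace(0.0, 6.0 * np.pi, 200))   # a toy time series
    kernel = np.array([1.0, 0.0, -1.0])                    # crude derivative/edge detector

    # np.convolve flips its second argument, so flip it back to get the
    # cross-correlation that ML frameworks call "convolution".
    feature_map = np.convolve(series, kernel[::-1], mode="valid")
    activated = np.maximum(feature_map, 0.0)               # ReLU, as a conv layer would apply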

------
throwaway_tech
Wow this is actually fairly close to my prediction for the HN Next Decade
Prediction post.

Edit: I should say it is a big step in the direction of my prediction.

~~~
gojomo
What prediction, what post?

~~~
throwaway_tech
Post: Ask HN: A New Decade. Any Predictions?

[https://news.ycombinator.com/item?id=21941278](https://news.ycombinator.com/item?id=21941278)

~~~
rckoepke
yeah, which comment?

~~~
pseudosudoer
I think OP realized that linking the comment will reveal their real HN account
name lol.

~~~
throwaway_tech
Nope, I just didn't realize linking to the post wasn't enough to find my
comment.

Figured once there it would be pretty trivial to search for my name.

Anyway, it's linked above you, so now you have something else to lol about.

~~~
pseudosudoer
With "throwaway" as the prefix to your account name, it should be fair to
assume that's not your main account... Also, I don't really see how your
prediction has anything to do with gauge theory applied to CNNs, unless I'm
missing something much deeper here?

