
Deep Quaternion Networks (2017) [pdf] - adamnemecek
https://arxiv.org/abs/1712.04604
======
adamnemecek
There's a lot of space to be explored at the intersection of ML and
hypercomplex numbers. For example, there's a Clifford SVM that, unlike a
regular SVM which learns a separating hyperplane, can learn more general
manifolds.

~~~
MrQuincle
That's also what I thought. This is a dissertation on multilayer perceptrons
using backprop over Clifford type neurons by Sven Buchholz:
[https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...](https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertationen/Sven_Buchholz/diss.pdf)

------
thesz
The fine article generalizes complex numbers to quaternions. Okay.

But quaternions are themselves generalized by Geometric Algebra (GA), and
there is plenty of information about the use of GA in the field of neural
computing: [https://arxiv.org/pdf/1305.5663.pdf](https://arxiv.org/pdf/1305.5663.pdf)
(page 3). For example, a universal approximation theorem for GA is presented
in [https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertati...](https://www.informatik.uni-kiel.de/inf/Sommer/doc/Dissertationen/Sven_Buchholz/diss.pdf)

I think the fine article is a step back.

~~~
cgearhart
Thanks for sharing this. There's a lot to digest in there, but there were a
few highlights that stood out as possibly relevant to the OP paper.

> Theorem 6.4 ([2]) Complex FCMLPs having (6.9) as activation function are
> only universal approximators in L∞ for the class of analytic functions, but
> not for the class of complex continuous functions.

> ... the complex numbers (C0,1) are a subalgebra of the quaternions (C0,2).
> Hence the quaternionic logistic function is also unbounded. Neither could it
> give rise to universal approximation (w.r.t. L∞) since this does not hold
> for the complex case. One may argue that such things become more and more
> less important when proceeding to higher dimensional algebras since less and
> less components are affected. This is somehow true, but it hardly justify
> the efforts.

> ... Summarising all of the above the case of Complex FCMLPs looks settled
> down in a negative way. ... Hence Complex FCMLPs remain not very promising.

Unless I'm misreading, it seems already known that you _can_ use complex
numbers (or quaternions) in neural networks...but you don't really gain
anything from doing it.

~~~
gaudetcj
One of the authors here. One thing about quaternion convolution is that you
can write a color image into quaternion space by treating each channel as an
imaginary axis. This lets the convolution act on the entire color space in a
different way than real-valued networks do, which may make it do better for
things like segmentation, where you need to be more sensitive to changes in
the color space.
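To make the encoding concrete, here is a minimal sketch (my own illustration, not code from the paper) of mapping an RGB image into quaternion components, with the real part left at zero and R, G, B placed on the i, j, k axes:

```python
import numpy as np

# Encode an H x W RGB image as one quaternion per pixel.
# Components are ordered (real, i, j, k); the real part stays zero.
def rgb_to_quaternion(img):          # img: (H, W, 3) float array
    h, w, _ = img.shape
    q = np.zeros((h, w, 4))
    q[..., 1:] = img                 # i <- R, j <- G, k <- B
    return q

img = np.random.rand(8, 8, 3)
q = rgb_to_quaternion(img)
print(q.shape)  # (8, 8, 4)
```

A quaternion convolution then mixes all three color channels through the Hamilton product instead of treating them as independent real channels.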

------
Jeff_Brown
I skipped to the table at the end. The gains don't seem enormous. Is there a
kind of problem where we would expect quaternions to perform dramatically
better than other kinds of numbers?

~~~
highd
Not to mention that it appears they're comparing against networks of the same
architecture. If you build your quaternion layers with the same number of
units as your real-valued ones, you effectively have 4 times the number of
parameters, which could account for most of the benefit. They should also
benchmark against similar architectures with equivalent parameter counts.

~~~
gaudetcj
Hi, I'm one of the authors of this paper. Sorry if it's unclear, but we reduce
the number of filters per layer to account for this. The quaternion networks
actually have fewer parameters.

~~~
highd
Can you provide some technical details on what you do? Do you divide the
number of channels in each real-valued network by 4? I don't see anything
describing this in the paper.

~~~
gaudetcj
Yes that is exactly what we do.
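A back-of-envelope count (my own sketch, assuming each quaternion weight is stored as 4 real numbers and channels are grouped into quaternions of 4 components) shows why dividing the channel count by 4 leaves the quaternion network with fewer real parameters:

```python
# Parameter count for a k x k convolutional layer.
def real_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def quaternion_conv_params(k, c_in, c_out):
    # c_in and c_out real channels grouped into quaternions of 4
    # components; each quaternion weight stores 4 real numbers.
    return k * k * (c_in // 4) * (c_out // 4) * 4

print(real_conv_params(3, 64, 64))        # 36864
print(quaternion_conv_params(3, 64, 64))  # 9216, i.e. 4x fewer
```

The weight sharing induced by the Hamilton product is what gives the factor-of-4 reduction at equal channel counts.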

------
doyoulikeworms
I’m able to follow neither the article nor the discussion. What would I have
to learn in order to be able to?

Even if it was, like, years of studying. I’m just curious how deep this rabbit
hole is.

~~~
theoh
Quaternions are an extension of the idea of complex numbers. Complex numbers
have a real and an imaginary part, while quaternions have a real part and
three imaginary parts. So the basic idea is that these richer types of
number, when used to build a network instead of plain real numbers, have
benefits.

So to get started with reading this paper you just need to learn about deep
learning, and then also the very basics of quaternions, which would be taught
in, for example, a first course on abstract algebra.
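The algebra involved really is small. A quaternion is a 4-tuple (w, x, y, z) with basis units satisfying i² = j² = k² = ijk = -1, and multiplication (the Hamilton product) follows directly from those rules. A minimal sketch:

```python
# Hamilton product of two quaternions given as 4-tuples (w, x, y, z).
def qmul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i, j = (0, 1, 0, 0), (0, 0, 1, 0)
print(qmul(i, j))  # (0, 0, 0, 1):  i*j = k
print(qmul(j, i))  # (0, 0, 0, -1): multiplication is not commutative
```

The non-commutativity (ij = k but ji = -k) is the main way quaternions differ from complex numbers.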

~~~
smadge
I don’t think quaternions would be taught in a typical first course of
abstract algebra. Do you know of a textbook where they are featured
prominently?

~~~
dsacco
I don't know of any book which features them "prominently", but I also don't
think you'd really need one. They are taught in various abstract algebra
books, they're just taught in the fashion of, "Here's an exercise that
introduces a peripheral topic it's useful to know about." For example, groups
and rings of quaternions show up in MacLane & Birkhoff's _Algebra_ (62, 426;
282) and Lang's _Algebra_ (9, 545, 723, 758).

 _Edit:_ In an effort to find more applied information I put down my math
books and picked up the information theoretic ones. You can find more
information about the use of quaternions in the two volume _Handbook of
Digital Signal Processing_ and Salomon's _Data Compression_. More generally,
when quaternions aren't explicitly referred to it's helpful to look up the
coverage of complex rotations, especially with respect to the Discrete Fourier
Transform.

For a discussion of rotations with quaternions in the context of animation,
this is a reasonably short paper:
[http://www.cs.cmu.edu/~kiranb/animation/p245-shoemake.pdf](http://www.cs.cmu.edu/~kiranb/animation/p245-shoemake.pdf).

------
freethemullet
Turns out we can approximate many-body wavefunctions using networks with
complex weights:
[https://arxiv.org/abs/1606.02318](https://arxiv.org/abs/1606.02318).

------
loxias
Does the use of complex numbers really provide an improvement? How does this
work? (Other than cramming 2 numbers into 1... which itself is suspect... the
complex plane has the same cardinality as the reals...)

~~~
danharaj
Cardinality is a complete red herring here: we don't care about the set-
theoretic structure, and ultimately we're taking finite approximations anyway.
The structures we care about are the metric, which tells us which solutions
(neural nets in this case) are near each other, and the algebra, which tells
us how to compose solutions.

The algebra of real numbers is simply less structured than that of the
complex numbers. One of the key properties of the complex numbers is that
they naturally have both a magnitude and a phase. This lets them capture
phenomena that have a notion of superposition and interference.

As you correctly pointed out, you can simulate a complex number with two real
numbers. The key is to exploit the particular geometric and algebraic
properties of the complexes. One example in neural networks is the phenomenon
of synchronization, where the outputs of neurons depending on the presence of
a particular stimulus all have the same phase. This can be exploited for
applications such as object segmentation.
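A toy illustration of the interference point (my own sketch, nothing from a specific paper): summing unit-magnitude complex activations reinforces when they share a phase and mostly cancels when the phases are random.

```python
import numpy as np

# Sum n unit-magnitude complex activations.
rng = np.random.default_rng(0)
n = 1000
in_phase = np.exp(1j * 0.3) * np.ones(n)              # all share one phase
random_phase = np.exp(1j * rng.uniform(0, 2*np.pi, n))

print(abs(in_phase.sum()))      # 1000.0 -- constructive interference
print(abs(random_phase.sum()))  # small, on the order of sqrt(n)
```

Synchronized neurons are the constructive case: their combined signal survives pooling, while unsynchronized background activity washes out.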

So the widest possible view of this line of research is that putting more
algebraic structure on your parameters can improve the behavior of your
learning algorithms. My extremely hot take on how far this can go is a full
fledged integration of harmonic analysis and representation theory into the
theory of deep learning.

~~~
loxias
Ah, fantastic explanation, thanks. =)

I'm coming from a signal processing background, so thinking in terms of
magnitude and phase is comfortable to me. Does synchronization, in the sense
you're describing, really happen in deep learning (ANN) systems? I'd love a
link or reference.

~~~
danharaj
[https://arxiv.org/pdf/1312.6115.pdf](https://arxiv.org/pdf/1312.6115.pdf)

------
kuwze
I remember being introduced to quaternions recently by this post[0] which
recommended this book[1].

[0]: [https://www.haroldserrano.com/blog/best-books-to-develop-a-g...](https://www.haroldserrano.com/blog/best-books-to-develop-a-game-engine)

[1]: [https://www.amazon.com/Quaternions-Computer-Graphics-John-Vi...](https://www.amazon.com/Quaternions-Computer-Graphics-John-Vince/dp/0857297597/)

------
eleitl
This assumes numerical computation is free at very large scale, which is not
a reasonable assumption if you want to create efficient, biologically
inspired AI.

------
naveen99
Mathoma videos on geometric algebra:
[https://youtu.be/ERpcSJzX448](https://youtu.be/ERpcSJzX448)

------
godelmachine
May I ask its applications?

------
mike_n
if quaternions, why not octonions?

~~~
danharaj
The lack of associativity might suck.

~~~
wespiser_2018
Quaternions are associative! Anyway, you can cleverly encode all the
algebraic operations of quaternions into matrices and go with that!
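A minimal sketch of that encoding: the quaternion w + xi + yj + zk maps to a 4x4 real matrix, and ordinary matrix multiplication then reproduces the Hamilton product.

```python
import numpy as np

# Left-multiplication matrix of the quaternion q = (w, x, y, z).
def qmat(w, x, y, z):
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])

i, j, k = qmat(0, 1, 0, 0), qmat(0, 0, 1, 0), qmat(0, 0, 0, 1)
print(np.array_equal(i @ j, k))  # True: i*j = k under matrix product
```

This is the trick quaternion network implementations use to run on standard real-valued linear-algebra kernels.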

~~~
gugagore
Yeah, but octonions are not associative, which is what the poster was saying.
(By the way, that also means there isn't a matrix representation for
octonions, since matrix multiplication is associative.)

------
dschuetz
You had me on "The field of deep learning...". Sounds seriously scientific to
me. What's the next big flashy field? "Deep thought"? Oh, nope, Douglas Adams
already covered that one.

