
Einsum Is All You Need – Einstein Summation in Deep Learning (2018) - aleyan
https://rockt.github.io/2018/04/30/einsum
======
yorwba
The problem with einsum is that you have to explicitly specify the mapping
between dimensions and indices every time, without any way to enforce
consistency. It would be more ergonomic if each tensor had labeled dimensions.
That would prevent the kind of silly mistake where you mix up the ordering of
dimensions and only notice it when you later change the shape of the tensors
so the different dimensions no longer match up.
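
For example, with square tensors a transposed index spec in np.einsum runs silently, and the mistake only surfaces once the dimensions stop being equal (a minimal NumPy illustration of the failure mode, not from the comment):

    import numpy as np

    A = np.random.randn(4, 4)
    B = np.random.randn(4, 4)
    C_ok  = np.einsum('ij,jk->ik', A, B)   # intended: A @ B
    C_bug = np.einsum('ji,jk->ik', A, B)   # mixed-up mapping: A.T @ B, same shape, no error

    A = np.random.randn(4, 5)
    B = np.random.randn(5, 6)
    np.einsum('ij,jk->ik', A, B)           # fine
    # np.einsum('ji,jk->ik', A, B)         # now raises a shape-mismatch error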

~~~
claudius
ITensor [1] has somewhat pioneered this concept in tensor networks for
condensed matter physics. If I understand correctly, Uni10 [2], a similar
project, is even working on a graphical interface for such networks so that
you can "draw" the network and have the computer figure out the best
contraction order.

In my own code I’ve recently also implemented such named indices (only had
"einstein summation"-like contraction specifiers before) and they make tensor
contractions so much simpler to write, especially since you can simply
overload operator * and have it figure out which legs need to go together.

As a side effect, it also forces your algorithms to make sense, because you can
no longer add two tensors living on different (but equal-dimensional) vector
spaces.

[1] [https://itensor.org](https://itensor.org)

[2] [https://uni10.gitlab.io/](https://uni10.gitlab.io/)
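
A minimal sketch of what such named-index contraction could look like, in NumPy (my own toy illustration; the Labeled class below is hypothetical, not ITensor's or Uni10's actual API):

    import numpy as np

    class Labeled:
        """Toy tensor with named indices; * contracts over all shared names."""
        def __init__(self, data, names):
            self.data = np.asarray(data)
            self.names = list(names)
            assert self.data.ndim == len(self.names), "one name per dimension"

        def __mul__(self, other):
            shared = [n for n in self.names if n in other.names]
            ax_a = [self.names.index(n) for n in shared]
            ax_b = [other.names.index(n) for n in shared]
            out = np.tensordot(self.data, other.data, axes=(ax_a, ax_b))
            keep = ([n for n in self.names if n not in shared]
                    + [n for n in other.names if n not in shared])
            return Labeled(out, keep)

    # contraction over the shared leg 'p' happens by name, independent of axis order
    a = Labeled(np.random.randn(3, 2), ['tl', 'p'])
    b = Labeled(np.random.randn(2, 4), ['p', 'tr'])
    c = a * b
    print(c.names, c.data.shape)   # ['tl', 'tr'] (3, 4)

A corresponding __add__ could refuse to add tensors whose name sets differ, which is the "different vector spaces" safety net mentioned above.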

~~~
mcabbott
I made a thing to audit that you are consistent with your indices [1] as an
alternative to explicit named-tensor objects. And found, but have not used,
another package in a similar spirit [2].

Although to be honest, simply writing A[n,μ,ν,c] etc. (using different letters
for different spaces) makes it pretty easy to visually check that you are
getting this right. This is one of the attractions of the notation, even on
paper. Unfortunately np.einsum's string notation makes this harder to see, as
the indices aren't adjacent to the variable name.

[1]
[https://github.com/mcabbott/TensorCast.jl#checking](https://github.com/mcabbott/TensorCast.jl#checking)
(Julia)

[2] [https://github.com/ofnote/tsalib](https://github.com/ofnote/tsalib)
(Python)

~~~
claudius
> I made a thing to audit that you are consistent with your indices [1] as an
> alternative to explicit named-tensor objects. And found, but have not used,
> another package in a similar spirit [2].

Cool, I will have to check this out!

> Although to be honest, simply writing A[n,μ,ν,c] etc. (using different
> letters for different spaces) makes it pretty easy to visually check that
> you are getting this right. This is one of the attractions of the notation,
> even on paper. Unfortunately np.einsum's string notation makes this harder
> to see, as the indices aren't adjacent to the variable name.

The problem is not so much the letter-space association as handling the
ordering of the spaces inside the tensor. For example, in my code, to do a
contraction over two indices you could write prod<2>(a, b,
"tlx,tr,p1,p2|tlx,p1,tl|tl,p2,tr"), where the result would then have index
order tl,p2,tr. The problem was that changing the index order in one place
(e.g. for performance reasons) meant having to re-check every other place
where it is used. If you want to contract a tensor network like (d) in [1],
this quickly gets complicated. With named indices, the above becomes a * b,
and if I change the index order anywhere, it is picked up automatically
everywhere else.

[1]
[https://journals.aps.org/prb/article/10.1103/PhysRevB.81.165...](https://journals.aps.org/prb/article/10.1103/PhysRevB.81.165104/figures/8/medium)

~~~
mcabbott
Right, changing the order would be a pain. Although if the reason for doing so
is memory layout (for speed), then a lazy permutedims(A) would decouple index
order from this. Perhaps when creating a tensor for the first time there ought
to be a way to specify the layout? I haven't thought about it much, but
something like A[n^4, μ,ν, c^1] := .... would not be hard to do.

In your prod<2>(a, b, ... example, if p1 and p2 are in the same space, how
would a*b know which one to contract? Or do they have different names from
when a was created?

~~~
claudius
> In your prod<2>(a, b, ... example, if p1 and p2 are in the same space, how
> would a*b know which one to contract? Or do they have different names from
> when a was created?

ITensor introduced this concept and I mostly just followed their lead: spaces
have unique names, and tensor legs carry a name label plus a "prime level". So,
for example, an operator O: A → A would have one leg labelled a[uuid]-prime0
and another labelled a[uuid]-prime1, similar to how one might write on paper
that O: A → A has elements O_{a a’}.
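
Roughly, contraction then only pairs legs whose (space, prime level) labels match exactly. In terms of the toy Labeled sketch further up (again just an illustration, not ITensor's real API):

    # legs are (space, prime) pairs; only identical pairs are contracted,
    # so the operator's primed "output" leg survives
    O   = Labeled(np.random.randn(5, 5), [('a', 1), ('a', 0)])   # O_{a' a}
    psi = Labeled(np.random.randn(5),    [('a', 0)])             # psi_a
    phi = O * psi          # contracts over ('a', 0)
    print(phi.names)       # [('a', 1)]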

------
Xcelerate
I remember when I was learning matrix calculus and realized at some point that
it was _much_ simpler to convert everything to index notation, perform all
operations, then convert everything back to standard notation at the end. It
became almost comically simple, because you're "just" working with labeled
scalars at that point. To be fair, it's convenient to memorize some of the
more commonly used expressions (like ∂tr(AB)/∂B) rather than rederive them
from scratch.
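
As a concrete instance of that workflow (a worked example, not from the comment): to get ∂tr(AB)/∂B, write everything out in indices, differentiate the labeled scalars, and translate back:

    tr(AB) = A_ij B_ji
    ∂ tr(AB) / ∂ B_kl = A_ij δ_jk δ_il = A_lk
    so  ∂ tr(AB) / ∂ B = Aᵀ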

~~~
krastanov
Roger Penrose (the famous mathematician) has been saying the same thing for a
couple of decades (going even further than that, check out Penrose diagrams).

~~~
jesuslop
And Penrose diagrams (tensor networks) became a prominent example of string
diagrams in the monoidal category of vector spaces over a field.

------
stared
Right now I'm working on tensor diagram notation for deep learning (for the
project "thinking in tensors, writing in PyTorch").

To read more about it, see: [https://medium.com/@pmigdal/in-the-topic-of-
diagrams-i-did-w...](https://medium.com/@pmigdal/in-the-topic-of-diagrams-i-
did-write-a-review-simple-diagrams-of-convoluted-neural-networks-6418a63f9281)
(obviously, I refer to the post).

And if you want to create some, here is a short demo:
[https://jsfiddle.net/stared/8huz5gy7/](https://jsfiddle.net/stared/8huz5gy7/)

In general, I want to extend it to tensor structure (e.g. n, channel, x, y)
and translate it to the Einstein summation convention.

------
joppy
I've never really understood the point of Einstein notation, as a piece of
mathematical notation. Is writing something like A[i, j] * B[j, k] really that
much faster than writing something like Sum[j](A[i, j] * B[j, k])? Especially
when you have to check the left-hand side of the equals sign just to know
which indices are summed over, it seems to make things less clear for a
minuscule saving of ink.

~~~
mhh__
Absolutely yes.

Doing tensor magic without Einstein notation will make you shoot yourself, and
even if you don't, someone else will if you publish it.

The only real problem I have with it personally is the abstraction of upper
and lower indices; I constantly forget the convention for which is which.

~~~
krastanov
This is interesting... I love the notation, but mainly because of how upper and
lower indices make it easy to distinguish vectors and forms (not that it
matters in ML where everything is "euclidean").

~~~
antidesitter
> not that it matters in ML where everything is "euclidean"

Non-euclidean spaces are actually quite common in ML, but many people don’t
realize the spaces they’re working in are non-euclidean!

~~~
improbable22
Would be curious to hear what you have in mind here -- could you expand?

------
thecleaner
Awesome examples. However, einsum fails to express convolutions, as well as
functions that are applied element-wise, such as the sigmoid and softmax. All
three are crucial in deep learning.

------
TTPrograms
How do you einsum convolution? Arguably the single most important linear
operation in deep learning?

~~~
ssivark
Convolution is a linear operator (with a certain special structure). So
convolutions are represented just like any matrix-vector multiplication. Given
the special structure of the convolution operator, it can be represented in a
sparse manner, and what's more, the actual computation can also be implemented
in an efficient manner, compared to a naive matrix-vector multiplication.

TL;DR: It can be represented easily using Einstein notation. Einstein notation
just does not capture the sparsity properties we like; it represents the
_transformation_ properties quite nicely.
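
For instance (my own NumPy sketch, not from the article): a 1-D "valid" convolution, i.e. the cross-correlation used in deep learning, becomes an ordinary einsum once the sliding windows are made explicit:

    import numpy as np

    x = np.random.randn(10)                                   # signal
    w = np.random.randn(3)                                    # kernel
    windows = np.lib.stride_tricks.sliding_window_view(x, 3)  # (8, 3), windows[n, k] = x[n + k]
    y = np.einsum('nk,k->n', windows, w)                      # y[n] = sum_k x[n + k] * w[k]

    # matches the usual flipped-kernel convolution in "valid" mode
    assert np.allclose(y, np.convolve(x, w[::-1], mode='valid'))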

------
mlthoughts2018
Looks like this is a re-post of the same link from this somewhat recent post:
[https://news.ycombinator.com/item?id=16986759](https://news.ycombinator.com/item?id=16986759)

