A Gentle Introduction to Tensors (2014) [pdf] (wustl.edu)
134 points by michaelsbradley 33 days ago | 51 comments



I have always hated when tensors are defined as something whose coordinates are transformed in a certain way. I just find it inherently unfriendly and un-geometric. No matter how much talk is given about simpler cases such as scalars, vectors, covectors, etc., the final defining formula would still look to me as daunting as always. (There is nothing “gentle” about the formulas on page 14.) Surprisingly or not, the whole thing clicked for me when I eventually learned about tensors in a more abstract algebraic setting, where they are defined as multilinear forms. The coordinate transformation laws were very easy to understand and remember after that.

But if you want to learn about tensors from how their coordinates transform, here’s a treat:

https://www.youtube.com/watch?v=CliW7kSxxWU


I taught a module on tensors in my undergrad Mathematical Methods for Physics course last semester. The students had a weak mathematical background, so I had to explain better than any of the books and videos do.

The main problem with most introductions to the topic is that they deal with coordinate systems where the basis vectors are orthogonal, so the covariant and contravariant components are the same. You need to deal with non-orthogonal basis vectors, because then you realize that naturally there are two ways of defining basis vectors at a given point in space. And naturally these two different sets of basis vectors have different transformation properties under geometric transformations. This book [https://danfleisch.com/sgvt/] does a decent job of getting you started on this approach - of course, it has no rigor.
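
To make the non-orthogonal case concrete, here is a minimal numerical sketch. It assumes NumPy, and the basis vectors and the vector `v` are made-up example values, not taken from the book. It computes the contravariant components (coefficients of the basis vectors) and the covariant components (dot products with the basis vectors) and shows they differ.

```python
import numpy as np

# Two non-orthogonal basis vectors in the plane (hypothetical example values).
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 1.0])
E = np.column_stack([e1, e2])   # columns are the basis vectors

# Rows of D are the dual (reciprocal) basis vectors: D[i] . e_j = delta_ij.
D = np.linalg.inv(E)

v = np.array([3.0, 2.0])        # a vector given in Cartesian coordinates

contravariant = D @ v           # coefficients c^i in v = sum_i c^i e_i
covariant = E.T @ v             # projections c_i = v . e_i

# The metric g_ij = e_i . e_j converts one set of components into the other.
g = E.T @ E
lowered = g @ contravariant

print(contravariant, covariant, lowered)
```

With an orthonormal basis `E` would be an orthogonal matrix, `g` would be the identity, and the two sets of components would coincide, which is why that case hides the distinction.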


Yes! As a former Physics major who was puzzled by tensors for years (despite being relatively comfortable using and manipulating them), I actually had a similar breakthrough reading Fleisch's book and finally "getting" covariant and contravariant components via thinking about non-orthogonal basis vectors. Definitely one of those "why couldn't they have just explained it this way back when I was an undergrad?" moments. I got so excited when I was reading it that I literally got up, ran to the living room and subjected my partner (not a math or physics person at all) to an impromptu fifteen minute lecture with diagrams in multiple colors of ink. She wasn't as impressed.

(Fleisch's "Student's Guide to Waves" is also highly recommended as a book I wish I'd had when I was a student).


> The main problem with most introductions to the topic is that they deal with coordinate systems where the basis vectors are orthogonal, so the covariant and contravariant components are the same. You need to deal with non-orthogonal basis vectors, because then you realize that naturally there are two ways of defining basis vectors at a given point in space.

Or, preferably, not with basis vectors at all, but that notion seems to do violence to a physicist's way of thinking about linear algebra—I've never quite understood why the physical approach to the subject is so bound up in co-ordinates, when it seems like physicists would be one of the groups most likely to benefit from fully grokking a co-ordinate-free approach.


Because most physicists are ultimately interested in predicting the results of experiments with actual numbers. And that demands the use of coordinates.


> Because most physicists are ultimately interested in predicting the results of experiments with actual numbers. And that demands the use of coordinates.

Of course, at some point, you need numbers! But there's no reason that those numbers need to infest the whole computation; you can de-coordinatise as soon as possible and re-coordinatise as late as possible, and in between you not only can but must think in a coordinate-free fashion.

(I am a mathematician, not a physicist, and am not presuming to tell physicists how to do their job—this isn't an argument about whether they should do things this way. I'm just pointing out that they could, and that it seems not only possible but advantageous. But of course long-established knowledge of what actually works in physics beats this tyro's guess at what could.)


I can also recommend this great series on Tensor Calculus: https://www.youtube.com/watch?v=kGXr1SF3WmA&list=PLJHszsWbB6...


What is also confusing is that in physics, for something to be a tensor/vector, only the transformation behavior under the orthogonal group is considered, so acceleration is a vector. Math and GR (ok, this is also physics) then use the coordinate transforms on a manifold, so these are different things.


Could you recommend a text/paper/blog article which develops tensor theory from an algebraic setting?



Hm, that’s a good question! Pretty much any modern book on, or with a chapter on, multilinear algebra, or even a text with a modern treatment of differential geometry (smooth manifolds) would, I imagine, do a decent job. Also, searching the web for “tensors as multilinear forms” turns up quite a few promising links. For a video lecture that could give you a flavor of what tensors are from the linear-algebraic perspective, try

https://www.youtube.com/watch?v=4l-qzZOZt50

(You might get more mileage if you also watch the preceding lecture(s) of this excellent series.)


I think one of the issues that makes tensors difficult to understand is that different disciplines use the word in different ways. I'm going to generalize here a bit about how each group uses the term, but I suspect not everyone will agree with me. Apologies in advance!

For machine-learning types, a tensor is a multidimensional array of numbers that admits certain operations (addition, subtraction, Hadamard product, etc.) [0]

For mathematicians, a tensor is a function F. We pass the function an ordered tuple of N vectors and M covectors (or, to keep things simple, N column vectors and M row vectors), and the function returns a scalar. The function F needs to be linear in each of the N vectors and M covectors. In this view, a matrix is a tensor with N=1, M=1. The operations used by machine learning types arise naturally once you crank the mathematical handle a little bit.

For physicists, a tensor is what mathematicians would refer to as a tensor field. So they take a space X (think R^N), and for each point in the space they associate a mathematician's tensor F as defined above. The properties of the function F are permitted to change from point to point. The Riemannian metric tensor is a classic example of 'tensor' according to this usage.

[0] https://machinelearningmastery.com/introduction-to-tensors-f...
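
A small sketch of the "mathematician's tensor" with N=1, M=1, assuming NumPy. The matrix `A`, the function name `F`, and the test vectors are all hypothetical example values. It only checks numerically that the function is linear in each slot separately.

```python
import numpy as np

# A hypothetical 2x2 matrix, viewed as a function of one row vector
# and one column vector that returns a scalar.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

def F(row, col):
    """Bilinear map: one covector (row) and one vector (column) in, scalar out."""
    return float(row @ A @ col)

u = np.array([1.0, -1.0])
v = np.array([2.0, 0.5])
w = np.array([0.0, 3.0])

# Linearity in the vector slot: F(u, 2v + w) == 2 F(u, v) + F(u, w)
vector_slot_ok = np.isclose(F(u, 2.0 * v + w), 2.0 * F(u, v) + F(u, w))

# Linearity in the covector slot: F(2u + w, v) == 2 F(u, v) + F(w, v)
covector_slot_ok = np.isclose(F(2.0 * u + w, v), 2.0 * F(u, v) + F(w, v))

print(vector_slot_ok, covector_slot_ok)
```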


This is so true!

For machine learning and data science, tensors are largely what is called an n-dimensional array. NumPy is a popular Python library built basically around this data type.

Mathematical tensors in a particular basis (sorry for the sloppy language, theoretical physicist here) can be displayed with such a data type, even though the n-dimensional array data type (as available in NumPy) lacks the (co)vector properties.

Funnily enough, in theoretical physics there are two groups of people: the ones who prefer to write out complicated objects (i.e. tensors) with indices and the ones who do not (typically called abstract tensor notation in general relativity). The latter are much closer to mathematical physics and prefer to understand tensors as mappings from (co)vector spaces to scalars, similar to what you already wrote.

Most general relativity computer codes are written by the index people, not by the mathematical physicists ;-)


I think that tensors have a much broader meaning to mathematicians. At least in pure mathematics and algebra, there is much more of a focus on the tensor product operation, a way of taking two (or more) vector spaces and producing a new vector space: their tensor product. Tensors are elements of this new space. In particular they don’t need to be multilinear functions on a product of vector spaces (for finite dimensional spaces there is not much difference, but for infinite-dimensional spaces there is a real difference between these two things).

Tensor products in this generality don’t need to have an (n, m) rank in the sense you’re describing, since they might be put together from completely different spaces. For example it’s perfectly fine to form the tensor product of a 2-dimensional space with a 5-dimensional space, yielding a 10-dimensional space.
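
As a rough illustration of the 2 x 5 = 10 count, assuming NumPy and using `np.kron` as one concrete model of the tensor product of coordinate vectors (the standard bases here are just a convenient choice): the products of basis vectors give 10 linearly independent vectors.

```python
import numpy as np

V_basis = np.eye(2)   # standard basis of a 2-dimensional space
W_basis = np.eye(5)   # standard basis of a 5-dimensional space

# One product vector e_i (x) f_j for every pair of basis vectors.
products = np.array([np.kron(e, f) for e in V_basis for f in W_basis])

# The number of linearly independent products is the dimension of the product space.
dim = np.linalg.matrix_rank(products)

print(products.shape, dim)
```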


That's a fair point about the tensor product being more central in algebra. But I think the 'friendliest' introduction to tensors is the multilinear function viewpoint. IMO it motivates the tensor product.

I'm not familiar with the infinite-dimensional case. Do you have an example of a tensor that isn't a multilinear function on a product of vector spaces? I'd be interested to refine my understanding here.


The easiest way to see it is cardinality: the space of linear functions on a countably-infinite dimensional vector space is uncountably-infinite dimensional. This is the reason why you can’t necessarily swap out vectors with linear functions on an infinite-dimensional vector space (if you restrict to functions with finite support or something, it’s ok).

A familiar example of a tensor product of countably infinite dimensional vector spaces would be polynomials in multiple variables. Say F[x] is the vector space of polynomials in x with coefficients in the field F, and F[x,y] is the space of polynomials in the variables x, y. For example x^2 - x is an element of F[x], and yx^2 - y^2 is an element of F[x,y]. Then it’s not hard to see that as vector spaces, F[x,y] is isomorphic to (F[x] tensor F[y]). A tensor in this new space is precisely a polynomial in two variables.

In the above it’s important to note that a polynomial has finitely many terms, so a power series like 1 + x + x^2 + … is not a polynomial. The space of power series is isomorphic to the space of linear functions on F[x], and is uncountably-infinite dimensional.
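
A sketch of the polynomial example, assuming NumPy and working over the reals with polynomials stored as coefficient arrays (lowest degree first). The evaluation point is arbitrary. The product p(x) (x) q(y) becomes the outer product of the coefficient arrays, and a general element such as yx^2 - y^2 is a finite sum of such products.

```python
import numpy as np
from numpy.polynomial import polynomial as P

p = np.array([0.0, -1.0, 1.0])   # x^2 - x
q = np.array([0.0, 1.0])         # y

# c[i, j] is the coefficient of x^i y^j in p(x) * q(y).
c = np.outer(p, q)

x0, y0 = 2.0, 3.0                # arbitrary evaluation point
two_variable_value = P.polyval2d(x0, y0, c)
product_of_values = P.polyval(x0, p) * P.polyval(y0, q)

# y x^2 - y^2 written as a sum of two pure tensors: x^2 (x) y  -  1 (x) y^2
x_squared = np.array([0.0, 0.0, 1.0])
one = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
y_squared = np.array([0.0, 0.0, 1.0])
mixed = np.outer(x_squared, y) - np.outer(one, y_squared)
mixed_value = P.polyval2d(x0, y0, mixed)

print(two_variable_value, product_of_values, mixed_value)
```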


> Say F[x] is the vector space of polynomials in x with coefficients in the field F, and F[x,y] is the space of polynomials in the variables x, y. For example x^2 - x is an element of F[x], and yx^2 - y^2 is an element of F[x,y]. Then it’s not hard to see that as vector spaces, F[x,y] is isomorphic to (F[x] tensor F[y]). A tensor in this new space is precisely a polynomial in two variables.

I think I'm missing part of the argument. So we have a polynomial p in F[x,y]. The claim in my previous post basically says that there's always a way to associate p with a linear map that takes two elements of the dual space of F[x] and produces a real number. I don't see why that's impossible here.


That’s definitely possible (the function should take two elements in F[x], not in its dual space), the problem is that there are strictly more linear functions on the set of polynomials than there are polynomials. For example you will have trouble finding a polynomial representing the “evaluate at x=1” linear function F[x] -> F, since such a polynomial would have to have infinitely many terms.

So every polynomial could be represented as a linear function on polynomials, but not every linear function on polynomials is itself a polynomial.
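
A toy version of the "evaluate at x=1" obstruction, assuming NumPy and the obvious pairing of coefficient sequences (the helper names `pair` and `eval_at_one` are made up for this sketch). Evaluating at 1 sums all coefficients, so it would need the all-ones sequence 1 + x + x^2 + ..., and any finite truncation misses a polynomial of high enough degree.

```python
import numpy as np

def pair(a, f):
    """Pair two coefficient sequences: sum_i a_i f_i (the shorter one is zero-padded)."""
    n = min(len(a), len(f))
    return float(np.dot(a[:n], f[:n]))

def eval_at_one(f):
    """The linear function f -> f(1), which is the sum of the coefficients."""
    return float(np.sum(f))

n = 5
truncated = np.ones(n)                  # 1 + x + ... + x^4, a genuine polynomial

low = np.array([2.0, -1.0, 3.0])        # 2 - x + 3x^2, degree below n
high = np.zeros(n + 1)
high[n] = 1.0                           # x^5, degree n

agrees_on_low = np.isclose(pair(truncated, low), eval_at_one(low))
agrees_on_high = np.isclose(pair(truncated, high), eval_at_one(high))

print(agrees_on_low, agrees_on_high)
```

Whatever `n` you pick, the monomial x^n breaks the truncated candidate, which is the finite shadow of the statement that evaluation at 1 is not represented by any polynomial.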


I guess the problem is that saying "tensor products are spaces of multilinear functions on vector spaces" is tantamount to saying "vector spaces are spaces of multilinear functions on vector spaces", which is simply not true: the second set is strictly smaller than the first. For example, there is no space of linear functions on a vector space which is countably-infinite dimensional: they are all either finite-dimensional or uncountably-infinite dimensional. Said another way, if we're talking about 1-fold tensor products, it is not right to say "a 1-fold tensor product of V is V*", since V itself is a perfectly fine 1-fold tensor product.

In order to have V* = V for an infinite-dimensional vector space V, you need to redefine V^* to some kind of restricted dual, rather than defining it as the set of all linear functions. In the polynomial example, if we take the space of all linear maps g: F[x] -> F such that g(x^n) = g(x^(n+1)) = ... = 0 for some n >> 0, then this restricted dual is isomorphic to F[x] again. But the evaluation map g(f) = f(1) is not in this restricted dual.

There are more reasons why confusing a vector space with its dual is a bad idea. For example you cannot cook up a map V -> V* without extra knowledge, for example a choice of basis of V or something. There are many examples in abstract algebra where there is a perfectly good vector space V, and absolutely no good choice of basis for V, so trying to identify elements of V with V* is unnatural. We may still be able to speak perfectly well of vectors in V or V*, but trying to identify V with V* is still unnatural. A good example is V = (functions R -> R). I can speak easily of elements of V (for example, x + sin(x)), and of elements of V* (for example, f -> integral of xf(x)), but trying to figure out which element in the dual either of these corresponds to is hopeless. We're better off just accepting at some point that there is a real difference between a vector space and its dual.


Got you. The countability argument is interesting. Thanks for the discussion, I feel like I learned something!


Not trying to be stubborn here - I just don't understand.

So we're talking about the case when tensors don’t need to be multilinear functions on a product of vector spaces.

Each element of V* is (trivially) a tensor and a linear function on V. Each element of V is (trivially) a tensor and a linear function on V*. However, not all linear functions on V* are in V. So V** is bigger than V. No problem so far.

But all elements of V** are functions on (trivial) products of vector spaces, and by definition all functions in V** are linear. So how have we misunderstood each other here?


> For mathematicians, a tensor is a function F. We pass the function an ordered tuple of N vectors and M covectors (or, to keep things simple, N column vectors and M row vectors), and the function returns a scalar. The function F needs to be linear in each of the N vectors and M covectors. In this view, a matrix is a tensor with N=1, M=1. The operations used by machine learning types arise naturally once you crank the mathematical handle a little bit.

As with almost all non-technical statements one can make about math (including my amendment—recursive nerd-sniping away), this is neither quite true nor quite false.

A tensor is an element of a tensor product—that's it. (But what tensor product? There are a lot of notions. I'll use the bare tensor product of non-topologised vector spaces.)

One kind of thing you can tensor is copies of a single vector space, and/or its dual. This is the sort of tensor product you likely have in mind. Tensoring copies of the dual allows you to feed in elements of the original vector space. Tensoring copies of the vector space allows you to feed in elements of the dual vector space (there is mathematically no intrinsic difference between a vector and a co-vector; having a favourite vector space in mind, you can talk about the elements of that vector space or of its dual, and—which is where the vector vs. co-vector terminology in physics comes from—about how coefficient vectors change when you change the basis of V), although, importantly, if your original vector space is infinite dimensional then the natural embedding into its double dual is not an isomorphism.

Which is to say, yes, the objects you describe all arise as mathematical tensors; but mathematical tensors can also describe much more.


> if your original vector space is infinite dimensional then the natural embedding into its double dual is not an isomorphism

Ok, so this feels like the crux of the matter. So how does the fact that V** is not isomorphic to V make the tensor product construction a more general concept than the linear function construction?


> Ok, so this feels like the crux of the matter. So how does the fact that V** is not isomorphic to V make the tensor product construction a more general concept than the linear function construction?

This doesn't really have anything to do with tensor products per se; it can already be seen with tensor products involving only a single factor, which are just vector spaces. Thinking of tensors only as functions means that, for example, one can never think of the original vector space V itself, only of its image in the double dual V^{**} (a fancy way of saying that you can evaluate a vector v \in V on an element v^* of the dual vector space V^* by evaluating v^* at v: in confusing but suggestive notation, v(v^*) = v^*(v)).

It's certainly true that you can do this, and, given the axiom of choice, you don't lose any information; you know everything about a vector v \in V by knowing its value on elements of V^* (which is to say, by knowing the values of elements of V^* on V). However, if V is infinite dimensional, then you are forcing yourself to carry around extra, possibly unwanted information: if you are taking bare algebraic duals, not topological duals, then V^{**} is inconceivably larger than V, which is to say that there are way more linear functionals on V^* than just those coming from evaluation at a fixed element of V.

You can fix some of this inconceivable largeness by knowing a little more structure carried by V, and by forcing your dual to reflect that structure—usually you know the topology, and ask that the dual consist of continuous functionals; and once there's topology, you start asking things of the tensor product, too. (For example, you probably don't want to take the vector-space tensor product of Hilbert spaces, but rather its completion in some suitable sense.) But, even with a more refined notion of duality, it's still only the nice spaces V that are identified with their double duals via the canonical map V \to V^{**}; the terminology is 'reflexive'.

(I think I caught all the asterisks that the Markdown parser ate the first time through.)


I am a math major and I am learning GR by myself right now. For GR, you need tensors (among many other things), and as a result I have gone through SEVERAL tutorials, books, videos etc.

There is ONE main thing I find lacking in all of these sources: computational examples/exercises.

My idea of a "gentle introduction to tensors" would be: motivation, definition and then immediately followed by computational problems (lots of them). Only then I would be comfortable with abstract definitions and proofs (which is my goal ultimately).

Edit: https://www.youtube.com/watch?v=5oeWX3NUhMA&list=PLFeEvEPtX_... is the most amazing lecture on Tensors I have watched so far, by far!


best of luck in your studies! Are you learning GR for a certain task or just for your own enjoyment?


Thank you!

For my own enjoyment, for now.


An easy to read (minimal theorems) but authoritative and modern introduction to tensors can be found in the book “An Introduction to Tensors and Group Theory for Physicists” by Jeevanjee (https://www.springer.com/gp/book/9783319147932).

He takes the more geometrical perspective of a tensor as a multilinear function of vectors, from which all other statements about tensors (e.g. how the components transform) follow straightforwardly. Lots of other great material in this book and best of all there are loads of examples.


Argh! People make this so much more complicated than it has to be.

A tensor of rank N is a vector in a space whose basis is a set of N-tuples of ordinary vectors. That's it. So a rank-1 tensor is just a regular vector. A rank-2 tensor is a vector in a space whose basis is ordered pairs of regular vectors, a rank-3 tensor is a vector in a space whose basis is ordered triples of regular vectors, and so on.


Here's where this is confusing (to me at least).

The dimensionality of a vector space is the number of scalars needed to build a vector in that space. That's easy.

I also think of "rank" as synonymous with "dimensionality" but it's not. Or at least it implies a different connotation of the word "dimensionality." It's the number of dimensions in the notation rather than in the space. Or something. And now I'm off the rails. And this is before I start trying to think about a basis being "ordered pairs of vectors" or tensor fields or about how tensors transform.


That's right. The dimensionality of the tensor space depends on both the rank of the tensor and the dimensionality of the underlying vector space. A rank-2 tensor based on 3-D physical space has 9 dimensions because there are 9 distinct pairs of the 3 underlying basis vectors. In general, a rank-N tensor on an M-dimensional space will have M^N dimensions.
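
The M^N count can be checked by brute force. This is just a counting sketch using the standard library; the function name is made up.

```python
from itertools import product

def tensor_space_dimension(m, n):
    """Count ordered n-tuples of basis indices for an m-dimensional space."""
    return sum(1 for _ in product(range(m), repeat=n))

# Ordered pairs of the 3 basis vectors of physical space, as index pairs.
pairs = list(product(range(3), repeat=2))

print(len(pairs), tensor_space_dimension(3, 2), tensor_space_dimension(4, 3))
```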


Take a look at my definition of the "mathematician's tensor" in the top-level comments. The rank of a tensor is the number of row- and column-vectors you need to feed the tensor to get it to return a real number. AFAIK it's not related to 'rank' in the sense of the rank-nullity theorem.

To move from the mathematician's definition to the ML definition, pick a basis for your row and column vectors. Now if you want the (i, j, ...) element of the multidimensional array, gather the vectors with one in the i-th position and zero elsewhere, the j-th position and zero elsewhere, etc. Then feed them to the tensor in that order.

It's easiest to see how it works with row vectors (rank (0,1)), column vectors (rank (1, 0)) and matrices (rank (1, 1)) and work from there.
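
Here is a sketch of that recipe for a rank-3 case, assuming NumPy. The trilinear function chosen (the scalar triple product, taking three ordinary vectors) and the test vectors are my own example, not something from the paper. Feeding basis vectors in every order fills in the array, and contracting the array with arbitrary vectors reproduces the function.

```python
import numpy as np
from itertools import product

def T(u, v, w):
    """A trilinear function of three vectors: the scalar triple product u . (v x w)."""
    return float(np.dot(u, np.cross(v, w)))

basis = np.eye(3)   # standard basis: one in the i-th position, zero elsewhere

# components[i, j, k] is T evaluated on the i-th, j-th and k-th basis vectors.
components = np.zeros((3, 3, 3))
for i, j, k in product(range(3), repeat=3):
    components[i, j, k] = T(basis[i], basis[j], basis[k])

# By multilinearity, the array determines T on any vectors.
u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, -1.0])
w = np.array([3.0, 0.0, 1.0])
from_array = float(np.einsum("ijk,i,j,k->", components, u, v, w))

print(from_array, T(u, v, w))
```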


There is in fact another definition of “tensor rank” which has everything to do with the rank of a matrix. For a tensor t in a tensor product of vector spaces VxW, define the rank of t to be the least number of summands possible in an expression t = v1xw1 + … + vnxwn.

If t is zero, then its rank is zero. If the tensor product is Vx(dual V), i.e. of type (n,m)=(1,1), then a tensor t can be considered as a matrix, and its tensor rank is the same thing as its matrix rank. You’re basically looking for the smallest way of writing the matrix as a sum of outer products of row and column vectors.

If you’re into quantum physics, then tensors of rank 0 or 1 are non-entangled, and tensors of rank 2 or more are entangled states.
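
For two qubits this can be checked with plain matrix rank. A minimal sketch, assuming NumPy and writing a two-qubit state as its 2x2 matrix of coefficients (the variable names are mine):

```python
import numpy as np

up = np.array([1.0, 0.0])     # |0>
down = np.array([0.0, 1.0])   # |1>

# A product state |0>|1>: a single outer product, so tensor rank 1.
product_state = np.outer(up, down)

# A Bell state (|0>|0> + |1>|1>) / sqrt(2): needs two outer products.
bell_state = (np.outer(up, up) + np.outer(down, down)) / np.sqrt(2)

rank_product = np.linalg.matrix_rank(product_state)
rank_bell = np.linalg.matrix_rank(bell_state)

print(rank_product, rank_bell)
```

For more than two factors the tensor rank is no longer a matrix rank, so this shortcut only covers the bipartite case.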


Nice. Makes sense, thanks for the explanation


I never studied tensors, but for a very first ultra basic understanding of tensors I really liked Dan Fleisch's video:

https://www.youtube.com/watch?v=f5liqUk0ZTw&t=3s

He also has some great "Student's Guide" books about tensors and other subjects.


What is a tensor part 1: https://youtu.be/MkYEh0UJKcE

What is a tensor part 2: https://youtu.be/Lkpmd5-mpHY

Paper: https://arxiv.org/abs/2106.08090


"A Gentle Introduction to Tensors" which starts with not one word about what a tensor is, or why you might wish to be introduced to one. Good stuff.


A Gentle Introduction to Tensors: just learn Clifford/Geometric Algebra


The author is clearly more comfortable with deploying beautifully laid out mathematics than prose. The introduction or "Opening Remarks" is lovely and well written but is nearly a wall of text. There is one long first paragraph followed by some staccato afterthoughts.

The author wavers between I and we. Am I your friend, guiding you through the maze or are we lecturing you on something? The author needs to settle on either one identity or spell out when they become one or another.

Sometimes, you might wish to appear as a friend hovering over the shoulder and provide hints as to the right direction to follow and at other times you might deploy something that will give LaTeX a headache and pull out the Vox Dei stop on the 250 ton Organ and destroy nearby eardrums.

I love the paper and it is saved locally.


I've always thought of it as a journey that we're going on together, where we may do something, expect something, find something, or conclude something - together!

Perhaps I'm finding camaraderie where there is none. How disappointing.


Yet to see a "Gentle Introduction to" that didn't freely and accurately translate to poorly written, impenetrable prose whose purpose is to gratify the ego of the author in demonstrating how they are unbelievably more clever than any reader who could conceivably require any kind of "introduction"

Is that bad luck on my part? (I've given up on thinking I'm particularly or especially dense - while making no claims to be some species of super-genius) Or is that the generally accepted genre?


Many of the scientific treatises of today are formulated in a half-mystical language, as though to impress the reader with the uncomfortable feeling that he is in the permanent presence of a superman.

Cornelius Lanczos


After having had difficult topics explained remarkably well by a few math teachers, I'm convinced that the "gentle introductions" you're talking about are meant to be confusing. Or perhaps, in some cases, they don't grasp the material as well as they let on.

edit: For anyone who wants examples of good introductions, check out 3Blue1Brown's YouTube channel.


Or, the infamous Curse of Knowledge?

https://en.wikipedia.org/wiki/Curse_of_knowledge


How should it have been done? Tensors are pretty complicated. It's like advanced calc plus advanced linear algebra. Maybe watch some YouTube videos on general relativity. Those are sometimes better than texts because clarity is essential.


There are great tutorials, textbooks, and explanations; a sibling post mentions Grant Sanderson's 3B1B YouTube channel, and I agree with that example wholeheartedly.

I haven't seen or read any that have the words in their title "A Gentle introduction..." It's those words I'm talking about as an indicator of being anything but what they claim in the general case. I'm very open to counter examples.


There's a flagged dead reply to this that attacks me. I think the author has missed the point and the point isn't very important anyway. I agree totally that the author of this or any other freely available material owes me nothing. Nor do I owe them.

I mention this only because I don't think the comment should be flagged or dead. Ymmv.


>gratify the ego of the author

I'm sorry that you're not mathematically literate but that is no reason to cast aspersions on expositors that have

1) zero obligation to cater to your needs

2) receive zero compensation for producing such expository pieces (since they don't count towards publication records)

>the generally accepted genre?

Indeed "gentle introduction" does not mean spoon food, it means exposition that includes niceties like motivation, examples, and diagrams. It is gentle relative to a research monograph (at which I'm sure your indignation would be astronomical).

Here's my recommendation to you on how to learn mathematics if you're serious about it but don't have the patience to struggle through "gentle introductions" like absolutely every single other practicing mathematician did: head down to your local university math department and ask about grad students willing to tutor. The going rate is ~$100/hr for the kind of spoon feeding you seem to be looking for. Quite steep I know but it's highly skilled labor after all (I'm sure you make about that as a software dev for whom this is above their skillset). But you'll be interested to know that it's an inversely sliding scale for just how much spoon feeding is necessary (I personally go as low as $25 for very highly motivated students).


As you know, you can't attack other users like this on HN, regardless of how knowledgeable you are or feel you are. Therefore I've banned this account. If you don't want to keep getting banned on HN, please follow the site guidelines.

https://news.ycombinator.com/newsguidelines.html


Your shop, your rules, obviously. I think that's a shame fwiw and write now against any bans based on a cursory read of their last 3 months of comments or so. As I also did against the disappearing of the comment itself.

You may know better for many reasons. I note our concerns and goals are not necessarily aligned. YMMV.


When someone has a long history of abusing HN, the standards are different. Normally I'd just post a warning in cases like this. Actually, even if this were the first time this particular account had posted abusively, I'd have used a warning rather than a ban. But there are enough abusive comments in this account's history also, and this comment was a particularly clear case of reverting to a long-established pattern.



