Eigenvectors from eigenvalues (terrytao.wordpress.com)
529 points by bigpumpkin 9 months ago | 86 comments



There's a backstory to this paper from the r/math subreddit [1]. One of the authors initially posted the question there, and after stumbling upon a similar question on Math Overflow, emailed Terry Tao. Surprisingly, he responded.

[1] https://www.reddit.com/r/math/comments/ci665j/linear_algebra...


Someone replied:

> Apparently this relationship was discovered in 2014 by a Dutch mathematician or physicist. https://arxiv.org/abs/1401.4580

https://old.reddit.com/r/math/comments/ci665j/linear_algebra...

I don't know anything about the subject, I'm curious what they mean by that.


Discovered, sort of. His method was nearly the same as the one brought to Tao, but he didn't make the connection to the full correlation between eigenvalues and eigenvectors when dealing with real numbers. Tao used the method to explicitly prove this correlation once it was rediscovered by Denton, Parke and Zhang.


He is a professor of telecommunication networks at the Delft University of Technology. See https://www.tudelft.nl/staff/p.f.a.vanmieghem/

edit: They apparently renamed the group Network Architecture & Services after I finished my education.


Not only responded, but responded in under 2 hours, which is amazing for a busy professor like Tao.


Not only responded in under 2 hours, but "Tao’s reply also included three independent proofs of the identity."

From the article we discussed earlier, https://www.quantamagazine.org/neutrinos-lead-to-unexpected-... https://news.ycombinator.com/item?id=21528425


> responded in under 2 hours

and gave 3 different proofs!


it indicates how important this result is!


Wow. This is so great.


That's great, but any good bachelor's student should be able to give you a similar answer.



For those curious about applications:

Suppose you have a learning algorithm that requires the eigenvectors/eigenvalues of a large matrix. Suppose new data is streamed in that increases the size of the matrix. Well, now you have an algorithm for updating your existing solution instead of needing to compute the entire thing from scratch (consider updating principal components over time, for example).

Suppose you want to accelerate the process of computing eigenvectors for a large matrix. Well, you can now do parallel reduction relatively trivially given the theorem.
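If you want to play with the identity itself, here is a rough numpy sketch (just my own illustration of the formula, not code from the paper, and it assumes the eigenvalues of A are simple):

    import numpy as np

    def eigvec_magnitudes_from_eigvals(A):
        # Returns |v_{i,j}|^2 for Hermitian A using only eigenvalues of A
        # and of its principal minors M_j (A with row/column j deleted).
        n = A.shape[0]
        lam = np.linalg.eigvalsh(A)                 # eigenvalues of A, ascending
        out = np.empty((n, n))
        for j in range(n):
            keep = [k for k in range(n) if k != j]
            mu = np.linalg.eigvalsh(A[np.ix_(keep, keep)])   # eigenvalues of M_j
            for i in range(n):
                num = np.prod(lam[i] - mu)                   # prod_k (lam_i - mu_k)
                den = np.prod(lam[i] - np.delete(lam, i))    # prod_{k != i} (lam_i - lam_k)
                out[i, j] = num / den
        return out

    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
    A = (X + X.conj().T) / 2                        # a random Hermitian matrix
    vals, vecs = np.linalg.eigh(A)
    print(np.allclose(eigvec_magnitudes_from_eigvals(A), np.abs(vecs.T) ** 2))

Note that this only recovers the squared magnitudes of the eigenvector components, and it still needs the eigenvalues of every minor, so treat it as a toy demonstration rather than a fast algorithm.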

*edit: wording.


Not to downplay the implications for pure and applied mathematics, but the immediate discovery was made in the context of calculating neutrino oscillation probabilities, yielding insight into the matter/antimatter imbalance in the known universe ;)

Eigenvalues: the Rosetta Stone for Neutrino Oscillations in Matter

https://arxiv.org/abs/1907.02534

Deep Underground Neutrino Experiment

https://en.wikipedia.org/wiki/Deep_Underground_Neutrino_Expe...


Don't get me wrong, I like your cites - but at the end of the day, math is math. Either it works or it doesn't. If it works, apply it where you can! I've got a sneaking suspicion a PCA-based analyzer for lidar-generated point clouds is going to be a lot faster...


Can you go into more detail? I don’t see how that would work. In this algorithm, you need the eigenvalues of all of the minors. So if I’m adding rows/columns to the matrix, yes I have one of the minors from my last solution, but I have to compute N more. And I also need the eigenvalues of the new full matrix.

I think this result has some intriguing opportunities, and have been toying with them since yesterday’s post, but I don’t see something so straightforward.


I obviously haven't worked it out myself, but I think you can memoize the results from the smaller minors to accelerate the computation of the big one. So you would need to decide for yourself the speed/memory tradeoff.


I think what the parent comment meant is that it can be recursive. If you update one row, a lot of these recursively computed minors of minors won't change, and you can probably do the computation quicker.

I haven't fully worked out the math yet either.


If the matrix is symmetric (or Hermitian in the complex case).


Won't you need a way to figure out the sign of the vector, though? Am I wrong that this method only outputs the magnitudes?


Yes but whether such conditioning is important will depend on your domain. In many cases, just knowing the axis and two sample points is enough to fully determine the orientation.


But you would still need to know the eigenvalue λi(A) of the full matrix, no? How would you do it in this streaming setting?


So it's been *checks notes* 9 years since I've had to deal with eigenvectors and eigenvalues. So the post went way over my head, but folks on Twitter were making quite a big deal over the post last night when it went up on arXiv. Could someone explain this like I have a computer engineering bachelor's degree? :)


So, you have some infinite space, and now you transform it with a linear transformation A - that could include mirroring, rotation, stretching (but not translation - one of the properties of linear transformations is that the origin, zero, stays at the origin).

Now, there could be lines through the origin that map to themselves under this transformation. These lines are called eigenvectors of A, and they characterise the transformation A. (If you mirror something, a line in the mirror plane maps to itself. If you rotate something, the rotation axis maps to itself. If you stretch something, a line in the direction you stretch in might map to itself. And so on.)

Now, these lines map onto themselves, but they might be stretched or shrunk in the process - that's given by the Eigenvalue.

The famous equation is A x = lambda x: You transform eigenvector x by A, and you get back x itself, but stretched by a factor lambda (the Eigenvalue).

So, if you have some mapping of 5d space to itself, for example, it can be characterised by 5 eigenvectors (lines that map onto themselves), and 5 eigenvalues (the stretch factor for each of these lines).

Now, "traditionally", you compute the eigenvector by solving a 5d linear system for each of the eigenvalues. (5 eigenvalues, 5 slightly modified linear systems, each gives you an eigenvector of 5 coefficients, for a grand total of 25 eigenvector coefficients).

Here, it is shown that you can compute the eigenvectors of the full system without even solving that traditional linear system, but instead by taking the eigenvalues of slightly modified smaller systems: 5 smaller systems M (each of which you get from A by striking out the i'th column and row). Each of those has 4 eigenvalues, so together with the 5 eigenvalues of A you have a total of 5 + 4x5 = 25 eigenvalues. Now, you compute some differences and products and ratios of those numbers, and hey presto, you get your 25 eigenvector coefficients (thus the title, "Eigenvectors from Eigenvalues").
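For reference, the identity those differences, products and ratios are computing can be written as follows (with v_{i,j} the j-th coordinate of the i-th unit-norm eigenvector of A, and M_j the matrix A with the j-th row and column deleted):

    |v_{i,j}|^2 \prod_{k=1,\, k \neq i}^{n} \bigl(\lambda_i(A) - \lambda_k(A)\bigr)
      \;=\; \prod_{k=1}^{n-1} \bigl(\lambda_i(A) - \lambda_k(M_j)\bigr)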

Pretty neat, though it's unclear to me how the computational complexity really compares.

EDIT: formatting, reference to title


Where were you in my intro linear algebra courses?


It's something like an economist finding a $20 dollar bill on the ground and it's actually a real $20 bill that no one has picked up yet.


Finding a $20 bill on Times Square


More like a gold sovereign in Piccadilly. I only skimmed, but it's a sweet result, and something an insightful undergrad could have come up with, but never did.

Linear algebra for large scale "muh big data" applications has had a mini renaissance in recent years. Haven't seen any this cute and basic though.


Don’t know how simple you want me to go, so apologies if required.

You’ve got a matrix A, ie a 2D array of numbers.

You’ve got a vector v, that is a 1D array of numbers.

You can multiply A by v, written Av, to get a new vector.

Matrices can therefore be viewed as devices for taking a vector and giving a new one, ie as linear transformations. You can design a matrix to rotate vectors, stretch them etc.

An eigenvector of a matrix is a vector that changes only by a constant factor k when you apply (multiply by) the matrix. So Av = kv, where k is a number. For example, if k=2 then applying A just doubles every number in v. Eigenvectors are stretched or squashed by the matrix, no funny business like shearing or rotating.

An eigenvalue is the “k” mentioned above, ie there’s at least one eigenvector such that Av=kv. There could be multiple eigenvectors all with the same k, ie all changed in the same simple way by applying the matrix.

Ok? Slight complication: matrix A may contain complex numbers, not just your common garden variety real numbers. But the above all still applies.

“Symmetric matrices” are matrices that have a symmetry across the top left to bottom right diagonal. So the number at A[i][j] is the same number as that at A[j][i].

“Hermitian” matrices are the equivalent of symmetric matrices when we’re dealing in complex numbers. Rather than A[i][j] being equal to A[j][i], if one entry is a complex number then the other entry is the “complex conjugate” of the first entry.

(In case you’ve forgotten complex numbers: we extend the real numbers to include numbers of the form r + zi, where r is the “real part” (just a real number) and “zi” is a real number z multiplied by i, where i = sqrt(-1). Imaginary numbers were introduced to allow us to solve equations we couldn’t otherwise solve, but have turned out to be super useful everywhere. The “complex conjugate” of a complex number is obtained by flipping the sign of z, eg (2 + 3i) -> (2 - 3i).)

Anyway, so Hermitian matrices are “symmetric” in this way and turn out to be super useful in physics. They also have the property that the eigenvalues of a Hermitian matrix H are always real numbers, despite the fact that H may contain complex numbers.
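If a concrete example helps, here is a tiny numpy check (my own toy matrix, nothing from the paper) that a Hermitian matrix has real eigenvalues and that Av = kv holds for its eigenvectors:

    import numpy as np

    # A 3x3 Hermitian matrix: it equals its own conjugate transpose.
    H = np.array([[2.0, 1 - 1j, 0.0],
                  [1 + 1j, 3.0, 1j],
                  [0.0, -1j, 1.0]])
    assert np.allclose(H, H.conj().T)

    vals, vecs = np.linalg.eigh(H)
    print(vals)                                # three real eigenvalues, despite complex entries
    v = vecs[:, 0]                             # one eigenvector
    print(np.allclose(H @ v, vals[0] * v))     # H v = k v, with k = vals[0]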

What the paper states is a relationship between the eigenvalues and eigenvectors of a Hermitian matrix H and the eigenvalues of a smaller “submatrix” H’, where H’ is just the same as H but with a row and column removed, where the index of the row and column is the same. This is neat, because it can be used to speed up the calculation of eigenvectors (as important as, say, array sorting in CS) in some specific situations. Commenting further is beyond me.


Thanks! It's been 15 years since I did any linear algebra or had any use for eigenvectors or matrices, and I could follow this. Much appreciated.


ELI(Bach of Comp Sci) went very well


You might prefer the earlier post, https://news.ycombinator.com/item?id=21528425

Basically, this allows you to compute eigenvectors from eigenvalues of minor matrices. It is an interesting identity, but speculating on its applications is beyond my expertise.


Somewhat related to some of the questions that have been raised here:

- for the set of matrices that possess them ("transformation matrices that only perform stretch/contract"), eigenvectors (with their associated eigenvalues) play a role quite analogous to the role primes play in the integer set. They provide a unique, identifying "spectrum" for said matrix. This is made explicit by eigendecomposition (spectral decomposition).

- with the extension via singular value decomposition (SVD) to any square matrix (e.g. "transformation matrices that might also shear or rotate"), certain operations, such as exponentiation of the square matrix, can be performed very quickly once the eigenvectors/eigenvalues have been obtained.
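As an illustration of that last point, here is a small numpy sketch (my own example, using a real symmetric matrix so the eigendecomposition is straightforward) of computing A^k by exponentiating the eigenvalues:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((4, 4))
    A = (X + X.T) / 2                          # real symmetric, so A = V diag(w) V^T

    w, V = np.linalg.eigh(A)
    k = 10
    A_pow = V @ np.diag(w ** k) @ V.T          # A^k via the eigendecomposition
    print(np.allclose(A_pow, np.linalg.matrix_power(A, k)))

Once the decomposition is in hand, each additional power only costs exponentiating the eigenvalues and two matrix multiplications.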


Wow... your comment is in fact a more useful pearl of wisdom for getting the intuition behind eigenvalues/eigenvectors. Can you suggest some book/reference that best conveys this intuition? Somehow, Gilbert Strang's Linear Algebra puts me to sleep :/


Linear Algebra Done Right by Sheldon Axler

Linear algebra that focuses on matrices of numbers is missing the point, and a proper numerical linear algebra class will teach them better anyway.


I would recommend Down with Determinants! instead. Same author, and it was the inspiration for that textbook.

https://www.maa.org/sites/default/files/pdf/awards/Axler-For...

As a graduate student at the time, it opened my eyes about a whole bunch of theorems that I thought I already knew. And it is a lot more approachable than a textbook.

It won an award for the best piece of expository writing on mathematics in 1996. It was a deserved win. :-)


I recommend the Linear Algebra series of YouTube videos by 3blue1brown.


>for the set of matrices that possess them ("transformation matrices that only perform stretch/contract"), eigenvectors

Minor nitpick: this should end with “real eigenvectors.” Rotation matrices certainly have eigenvectors, they’re just complex.


Before you excitedly jump to coding a new algorithm to produce eigenvectors from eigenvalues, it's worth noting that:

    . This only applies to Hermitian matrices and is therefore of limited scope.

    . The identity only produces the magnitudes of the coefficients of the eigenvectors. The signs of the coefficients are still unknown.

    . Even if knowing only the magnitudes of the coefficients is somehow useful, this is still very expensive to compute - just like Cramer's rule, which is of great theoretical interest but has very little practical use when actually computing a matrix inverse or a determinant.
None of the above diminishes the importance of the result, btw.


Came here to pretty much write these same notes. The fact that you need to know the spectrum of all the minors really limits the usefulness for solving general eigen* problems. But I can imagine that this could make a great starting point for all kinds of approximate calculations/asymptotic analyses.

Regarding the first caveat that you bring up, though: whereas the problem statement says you need a Hermitian matrix, I think the results should generalize to non-Hermitian matrices. In particular, take a look at the third proof of the theorem at the end of the article. The only assumption required there is that the eigenvalues are simple (which does not preclude them being complex / coming in complex pairs).

Protip: I had to read the second step in the proof a few times before I could see what was going on. Explicitly, what you do here is to (i) multiply from the right by the elementary unit vector e_j, (ii) set up the matrix-vector inverse using Cramer's rule, (iii) notice that this matrix can be permuted to have one block-diagonal element equal to the minor M_i, with the other block-diagonal entry equal to 1 and the corresponding column filled with zeros, (iv) write the determinant in the block form, which simplifies to the expression involving M_i by itself, and finally (v) multiply by e_j from the left to get a scalar equation.
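For what it's worth, here is how I ended up writing out the determinant step for myself, via the adjugate (my own paraphrase, not a quote from the post, and it assumes lambda_i is a simple eigenvalue):

    \operatorname{adj}(\lambda_i I - A) \;=\; \prod_{k \neq i} \bigl(\lambda_i(A) - \lambda_k(A)\bigr)\, v_i v_i^*

    e_j^* \,\operatorname{adj}(\lambda_i I - A)\, e_j \;=\; \det\bigl(\lambda_i I_{n-1} - M_j\bigr)
      \;=\; \prod_{k=1}^{n-1} \bigl(\lambda_i(A) - \lambda_k(M_j)\bigr)

Taking the (j, j) entry of the first equation and comparing with the second gives the identity, since e_j^* v_i v_i^* e_j = |v_{i,j}|^2.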


Despite the limited scope, this information is "out there" now, and could potentially be crucial for some future "useful" application. Not that I think you're trying to downplay it or anything, I'm just seeing this as another example of how the internet is still a good thing.


Since I’ve always been terrible at abstract linear algebra, I would request that someone do a simple problem, say finding the eigenvalues and eigenvectors of a 3 x 3 Hermitian matrix with three distinct eigenvalues. I’d also be curious under what circumstances the above method leads to a computational advantage.


Anybody?


For a second I thought this was going to be HN's darling "Eigenvectors AND Eigenvalues" explained visually http://setosa.io/ev/eigenvectors-and-eigenvalues/


I have a sneaking suspicion this finding can be used to speed up expert system matching (e.g. Rete networks), which would affect rendering and compilation times for lots of software.


The entries of a matrix are eigenvalues of 1x1 minors, and clearly eigenvectors are a function of the entries.

That the result is a function of (n-1) minors is first-order information and would help clarify the default assumption that the result computes eigenvectors from the full Hermitian matrix's eigenvalues.


"The real question is whether the discovery is important or interesting. And whether it is surprising. [...] OK, why does Wolchover claim that it is interesting or surprising? Because they gave that formula to Terence Tao, a Fields Medal winner, for him to say something and he didn't know the formula and even expressed doubts. [...] So that's how it works. If you want to promote a new chocolate candybar, send a formula – or the nutritional values from the package – to someone like Terence Tao, to a darling of the media. The popular science writers will do the rest."

https://motls.blogspot.com/2019/11/hype-about-formula-for-ei...


I believe this is equivalent to the solution given by Gene Golub in 1973 for the constraint vector required to produce a specific set of stationary values in a linearly-constrained quadratic program where the constraint vector is unitary.

https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9B... (eq. 3.6)

I haven't looked at Tao's proofs to see if this is covered, nor do I have a clear formulation of the equivalence, but it certainly is striking.


When I see people who look similar I think of them as having similar Eigenfaces!


Eigenfaces are a thing! https://en.wikipedia.org/wiki/Eigenface


Reading German words, still don't understand anything...


This can probably be used to speed up Transfer Learning.


Does this mean you can test large neural networks with smaller ones?


Are your neural networks parameterized as Hermitian matrices? (No.)


Will it boost FPS in 3d games?


No, it doesn't speed anything relevant up. I'm curious as to what made you ask, though.


I suspect he was joking.


This is a good example of math being presented in an unnecessarily confusing way.

Values lambda_i(A) are introduced, but then in the formula also values lambda_i(M) are referenced for some M which is not A.

Basically, we have no idea what lambda_i(M) might mean from the statement of the theorem. Is the order lambda_1(M), lambda_2(M), ... related to the order lambda_1(A), lambda_2(A), ...? How many lambda_i(M) exist? And so on.

Maybe this is all clear if you deal with minor matrices all the time, but when I see a statement like that I am not even interested in its proof anymore.


Terry is being abundantly clear. Why are you trying to correlate indices in two separate product series?


He is not being "abundantly clear". He doesn't give a definition of lambda_i(M) for the various M appearing in the theorem that are not A.


I think you know that lambda_i(M_j) is the i-th eigenvalue of M_j. Order them however you like; it won't change the identity.


Doesn't he say it here:

where {M_j} is the {n-1 \times n-1} Hermitian matrix formed by deleting the {j^{th}} row and column from {A}.


Yes, he says what M_j is, but he does not say what lambda_i(M_j) is.


Okay, Donald Knuth might be upset that Terry failed to introduce the notion of a spectrum space in a Matrix Analysis paper to an audience that should know about it already.

If I tell you f(x) = y, do you get upset that I don't define the domain, range, codomain, notion of a function, and so forth? Or if I tell you that https:// prepends a good portion of URLs on the world-wide web, would you subject it to the same scrutiny? I hope not -- to require such would be unnecessarily limiting.

My $0.02.


If you tell me that thing about https://, it's not math, it's just a conversation, so why would I care?

If you tell me that f(5) = 10, and then you tell me later on that now I can compute g(x) = f(x) - 4, for all x, from that knowledge, then I surely would wonder what you are talking about.


Just read your original opening statement. I think what everyone is telling you is that you were being unnecessarily hyperbolic towards an author that writes for a mathematical argument. If it offends you so much as you've indicated that you just "moved on," then so be it.


I still stand by my opening statement. Now, after the discussion here I understand what the theorem statement is saying. That's a lot of work I had to do to understand the theorem statement, and it would have been unnecessary, had the theorem been stated "syntactically" correctly.

Now why wasn't the theorem stated syntactically correctly? If we had to state it in a mechanised theorem prover like Isabelle, then we would extract the definition of the eigenvalues from the statement into a separate definition, like:

For a Hermitian n x n matrix H, let lambda_1(H), ..., lambda_n(H) be the eigenvalues of H.

But of course, here we assume an order of the eigenvalues, and we don't really want that, because it is immaterial to what follows. So we rather have to write it like that:

Definition: For a matrix H, let L(H) be the set of its eigenvalues.

Theorem: If H is an n x n Hermitian matrix, then |L(H)| = n.

But, of course, this is not true, as |L(H)| < n is a possibility. So you would have to account for that somehow as well in your definition / notation.

So it is probably better to go back instead again, and explain the notation as follows:

Notation: For a Hermitian n x n matrix H, let lambda_1(H), ..., lambda_n(H) be its eigenvalues (in an arbitrary order).

Theorem: What Tao said.

The notation could still not be written in Isabelle like that, but at least it has proper scoping now.


And I am pretty sure that every single downvote I get for this discussion here is from somebody who doesn't properly understand the theorem statement.


Point of feedback: this assumption does not hold.


Some people also just downvote because they don't like something, although it is right.


True, but this isn't what you asserted.


Nope, it wasn't, thus my amendment.


Without objection.

Cheers.


He says that the eigenvalues of A are the lambda_i(A), so it's pretty obvious that lambda_i(M_j) is the i-th eigenvalue of M_j.

What might not be obvious is the ordering of the eigenvalues, which must be the same for the formula to make sense.


No, the ordering of the eigenvalues of M doesn't matter, because we take their product (or rather, the product of their difference from a fixed number), and multiplication commutes. It's maybe not obvious, and as such I can understand auggierose's complaint to an extent, though it was maybe unnecessarily harshly phrased.


You're right, my bad.


Well, you see, the ordering of the eigenvalues doesn't really matter. What, that's not obvious to you from the statement?


Yes, it is written for people who deal with minor matrices all the time. Academic papers are incremental materializations -- it is up to the reader to bring or build the requisite base understanding.


Well, I am sure the meat of the theorem is just in those eigenvalues of the minors. Which are not even defined in the statement of the theorem. That's not academic. That's just bad style.


He also didn't define eigenvalue in the theorem statement, but you don't seem to be criticizing that as bad style.

Part of the art of mathematical exposition is to walk the line between under and over explaining your process based on your audience.

If the intended audience is academic mathematicians or even just people with a math degree, then the notation is probably familiar. I imagine that's the target audience here.


I can now say exactly what I don't like about the theorem statement: the definition of the eigenvalue notation was not well-scoped. This made it much harder for me to understand what is going on, and it has nothing to do with the actual semantic meaning of eigenvalues. Caring about proper scoping is also something mathematicians usually don't do, making everything more complicated than it has to be in the process.


Actually...yeah the two orderings are related by the Cauchy Interlace Theorem: https://www.semanticscholar.org/paper/Cauchy%27s-Interlace-T....


In that formula the order of the eigenvalues doesn't matter...


The order of the eigenvalues does not really matter in that formula, no. Thank you for decoding that for me.

Still, how many lambda_i(M_j) are there? From the formula I am led to believe there are n-1 such eigenvalues of M_j.


They're expecting you to know the spectral theorem for Hermitian matrices. Since M_j is obtained by striking out the j-th row and j-th column, it too is Hermitian, so it has n-1 real eigenvalues (possibly with multiplicity). The notation $\lambda_i(M)$ ought to refer to the $i$-th eigenvalue of M in sorted order to be unambiguous, but this does not matter for the proof.


M is Hermitian and an (n-1) x (n-1) matrix, so that would be reasonable.


Come on, if I defined f(x) = x^2, and later used f(y) for y = x/2 or something, do you have a problem with that?



