Visual Explanation of the Conjugate Gradient Algorithm — 83 points by mercurymercury 11 days ago | 12 comments

 I always thought it was simpler to explain by using the only trick linear algebra has: switch to a basis where your problem is easiest, then switch back. Sylvester's law of inertia proves existence; Gram-Schmidt constructs that change of basis.
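As a minimal sketch of the constructive half of that comment, here is classical Gram-Schmidt orthonormalizing the columns of a matrix (hypothetical example values; any linearly independent columns work):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    Q = np.zeros_like(V, dtype=float)
    for j in range(V.shape[1]):
        v = V[:, j].astype(float)
        for i in range(j):
            # subtract the projection of column j onto each earlier basis vector
            v -= (Q[:, i] @ V[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = gram_schmidt(A)
# Q.T @ Q is the identity: the columns of Q form the orthonormal basis
```

The columns of Q span the same space as those of A, but in the new basis inner products become trivial, which is exactly the "switch to a basis where your problem is easiest" move.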
 I was recently re-reading how to solve systems via QR and realized that the key trick was switching basis. But it took me a while to understand that, in between all the extra stuff. Now you are telling me that most of LA is like that, and wow, so many things make sense now! Do you have any additional linear algebra tricks?
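The QR route the comment describes can be sketched in a few lines (hypothetical example system; A = QR, and since Q is orthogonal, Q⁻¹ = Qᵀ, so the change of basis is just a transpose):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

Q, R = np.linalg.qr(A)   # change to the orthonormal basis given by Q's columns
y = Q.T @ b              # express b in that basis (Q^{-1} = Q^T, no inversion needed)
x = np.linalg.solve(R, y)  # R is upper triangular, so this is back-substitution
```

All the actual work happens in the easy basis: inverting Q is free, and R is triangular.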
 Linear Algebra Done Right is IMO the best book on it. But a lot of it is simply exploiting linearity to reduce to working in the most convenient subspaces. Finding and constructing them is a major task.
 I'd go so far as to say that if your LA class did not instill that one value, it was more a matrix algebra and methods class (nothing wrong with that... very useful). The matrix is simply a model of the linear space. Coordinate-freeness lets you get away from coefficient chasing and tedium to focus on the real stuff. Especially important in infinite-dimensional vector spaces, where many theorems cannot rely on a basis existing, since that is equivalent to the axiom of choice.
 If you haven't watched the videos from 3blue1brown on YouTube, you definitely should.
 > any additional linear algebra tricks?
 >> the only trick linear algebra has
 Lol, that trick also covers much of quantum mechanics and relativity. (The gauge in "gauge theory" is a basis, btw.)
 A large amount of mathematics is trying to find the easiest way to express a problem, changing to that view, solving the simpler problem, then going back to your original formulation. Solving indefinite integrals by substitution, for example, justified by the chain rule.
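The substitution example in one worked line (a standard calculus identity, not from the thread): change variables to where the integral is easy, integrate, then change back.

```latex
\int 2x\cos(x^2)\,dx
  \;\overset{u = x^2,\; du = 2x\,dx}{=}\;
  \int \cos u \,du
  \;=\; \sin u + C
  \;=\; \sin(x^2) + C
```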
 CG, most elegant of quadratic methods. Now let's talk about why you wouldn't use it.
 1. In the form given, you need to know the terms of the quadratic.
 2. If you know the terms, you can compute a stable point using QR, which is more efficient.
 3. IIRC CG has better numerical stability than QR, but only under strict quadratic assumptions.
 4. If you don't have the form of the objective, but you know it's quadratic and you can compute a gradient, then it also works, but errors in the gradient compound.
 5. If it's not quadratic, then you have the same issue. You can try to account for this using resets or more complicated methods, but then,
 6. For so many problems, the stability proof becomes too onerous or the evaluations become less efficient than simply doing more steps of gradient descent, which is why that is still the dominant method in neural networks, despite so much effort on 'our' part to find more efficient solvers.
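For reference alongside that list, a minimal sketch of the textbook CG iteration for a symmetric positive-definite system (hypothetical example matrix; this assumes the "form given" case of point 1, where A and b are known):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x                       # residual = negative gradient of the quadratic
    p = r.copy()                        # first search direction is steepest descent
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # next direction, A-conjugate to earlier ones
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

In exact arithmetic this converges in at most n steps; points 4 and 5 above are about what happens when the gradient is noisy or the objective isn't actually quadratic, where the conjugacy of the directions degrades.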
 Here is another resource I came across when learning about the conjugate gradient method for a class on finite elements. I wish I had found this back then!
 Thanks! Are there other great math papers or articles with a similar "without the agonizing pain" approach, for other topics that might be interesting?
 This is actually the basis of the online article; see the acknowledgments.

