Unfortunately I'm one of those people who tends to reject the process until I understand why it works.
If it wasn't for Strang's thoughtful and sometimes even entertaining lectures via OCW, I probably would have failed the course. Instead, as the material became considerably more abstract and actually required understanding, I had my strongest exam scores. I didn't even pay attention in class. I finished with an A. Although my first exam was a 70/100, below the class average, the fact that I got an A overall suggests how poorly the rest of the class must have done on the latter material, where I felt my strongest thanks to the videos.
So anyway, thank you Gilbert Strang.
After reading your comment and ansible's reply, I wanted to pause and comment on this.
The United States Air Force Academy found that cadets who took their first calculus class with a professor who focused on conceptual understanding developed a durable and flexible understanding of the math.
The kicker is that the cadets got worse scores in Calculus I and gave professors who taught in this way worse ratings.
Ansible's anecdotal reply is what a lot of students experience. A feeling of initial success with the material, but they later find that their knowledge of it was fleeting and inflexible.
What the Air Force Academy study found was that professors who taught in the manner ansible described, that resulted in fleeting and inflexible knowledge, were rated higher by their students. Those students got better initial scores in Calculus I, but went on to do worse in later calculus courses and related courses.
I encourage you to read the study. It is about as good a study design and execution as you can get in the social sciences.
David Epstein also discusses the study in Chapter 4 of his book, Range.
The very best students loved it, but most of the people didn't like it at all.
With mathematics, like with gym, you gain when you put in effort. Most people don't enjoy either.
OK hopefully I didn't get too far afield. To me, the analogous concept in learning, particularly in technical fields, is that "learning is ignorance leaving the mind".
In college, particularly math and physics, I /always/ focused on understanding the underlying principles. Initially it was out of fear that if I forgot the formulas, I could re-derive them. But a strange thing happened... through that process, I developed an intuition and an ability to "see" what formulas and concepts to apply when. Once I got to that point in a problem, "seeing it for what it was", finishing to the solution became busywork.
The rock is the incentives, how your performance is measured, and the short time you will have teaching these students.
The hard place is students who have likely spent 13 years in K-12 learning without understanding and are now being asked to engage in practices they have little to no experience with.* They also have incentives to get good grades and a good GPA, which can be at odds with actual learning.
*To get more concrete, the practices have a name--Standards for Mathematical Practice (SMPs). The National Council of Teachers of Mathematics developed them and considers them the "heart and soul" of the Common Core Mathematics Standards. Not only are these practices absent from most classrooms, all too many teachers are not even aware of them! (see my Notch Generation reply to Sriram to understand why)
Apologies if my original reply made it seem like it can't.
Why don't teaching systems in America incorporate both the majority of the time?
Two major reasons:
1. Cultural inertia. Most teachers emulate the pedagogy that they experienced in their schooling. Some are aware that you can try to mix conceptual+procedural and try to. I call them the "notch generation"- trying to teach in a way that is different than they were taught. It's hard to do because...
2. The system is not designed to accommodate it. Incentives and higher order effects all conspire with cultural inertia to thwart it.
I've worked in both K-12 and post-secondary education, studied the history of education reform in the United States, and visited schools/met teachers/students/etc that I've connected with across the U.S.
I'm always interested in hearing someone's story about school, how it did/didn't meet their needs, and how it has impacted them.
I had a similar, though sort of opposite experience.
In high school, I breezed through the material, and started teaching myself calculus during the summer to prepare for university. Other than being a lazy student, I had no problems taking the 2nd-semester advanced calc 2 and 3 courses my freshman year. I totally got what was being taught. There weren't a ton of practical examples, but I could easily see (for example) what the purpose of integration was, and how and why you'd do it in two or more dimensions. I could work the equations, no problem. Everything was great.
Along came sophomore year, and, still thinking I was hot stuff, I took advanced linear algebra and differential equations. More of the same, I thought.
Well... we seemed to spend the entire semester just solving different kinds of equations. No explanations given as to what they are for, where they are used, or what the point of any of it was. I struggled, for the very first time.
I either got a D or F for the mid-term exam, which was shocking to me.
We had one chapter where we were doing something practical. This is where you have a water tank with a hole in the bottom. Because the pressure lessens as the tank empties, the flow rate is not constant. However, you can solve this with diff equations, and I really grokked it. I finally saw the point of some of what we had been doing. But it was just that one chapter; we skipped any other practical aspects of what we were studying.
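For the curious, that draining-tank setup is the classic Torricelli's-law problem: the outflow speed scales with the square root of the remaining height, so the tank drains fast at first and slower as the pressure drops. Here is a minimal sketch (constants are made up for the demo) that integrates the resulting equation with Euler steps and checks it against the closed-form solution:

```python
import math

# Torricelli's law for a tank draining through a hole in the bottom:
#   dh/dt = -k * sqrt(h)
# Closed form (until the tank empties): h(t) = (sqrt(h0) - k*t/2)**2

def drain_tank(h0, k, dt=0.001, t_end=10.0):
    """Integrate dh/dt = -k*sqrt(h) with forward Euler; return the heights."""
    h, heights = h0, [h0]
    for _ in range(int(t_end / dt)):
        h = max(h - k * math.sqrt(h) * dt, 0.0)  # clamp: height can't go negative
        heights.append(h)
    return heights

heights = drain_tank(h0=1.0, k=0.5)
# at t = 2.0 the exact solution gives (1 - 0.5)**2 = 0.25
assert abs(heights[2000] - 0.25) < 0.01
# analytically the tank is empty by t = 4, and it stays empty
assert heights[-1] == 0.0
```

(The constant k here lumps together the hole area, tank cross-section, and gravity; any positive value shows the same qualitative behavior.)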
I did end up pulling out a 'C' with that class, to my relief. Sure, most of the blame for my lousy performance must rest with me, because of my poor study habits. And a little blame can go to the TA, who wasn't a good communicator, so that hour every week was kind of useless. But I also blame the material and how it was presented.
Some places use a rigorous "proof-theoretic" approach in math curricula. It's much harder and takes more time, but it's better than merely grinding on hundreds of easy calc-101/diff-eq problems, because students gain an understanding that doesn't erode as easily once they forget "the tricks".
More CS, engineering and science students, IMHO, should dabble in math department courses beyond the usual "required" sequence for their majors. It can be eye-opening and provide long-lasting benefit to take a hardcore real-analysis course, abstract algebra, or a number of other courses in math.
That was absolutely not allowed at my faculty (admittedly computational linguistics, but I would have massively benefited from math courses). No courses other than the predefined ones, no matter how relevant. Now I have to learn so much afterwards, it's not even funny.
The sad thing is these problems start well before university when high schools pressure students into "advanced" math coursework without demonstrating mastery of previous topics. It builds a shaky foundation and sets the student up for a lot of needless difficulty later on.
Much better to slow down, focus on fundamentals early on and then build breadth in university coursework.
I came in for the first exam, sat there for maybe 15 minutes reading the questions, and realized I had no idea how to solve any of them.
Luckily it was before the drop date! That was a turning point where I decided to only take classes that seemed fun. For me that was discrete math, number theory, abstract algebra, etc.
My only regret is that I took the class as a six week short course. I think my recall would be better if I had taken the full semester. We covered all the material, but missed out on the longer spaced repetition. Linear Algebra was by far my favorite pure math course, I hope to revisit it soon. Maybe Strang's lectures are the way to do that.
I particularly like his videos because he breaks them down into small bites that are easy to work into your day and he's a great teacher.
Second exam was 85/100, the highest across C.S. and Automation Engineering (both lectured by that first professor). While I do agree that a good teacher can pave the way for a good student, I think most of the work you have to do yourself, as if your life depended on it (mine did).
At some point it's like "Wait, is linear algebra really just about heaps of multiplication and addition? Like every dimension gets multiplied by values for every dimension, and values 0 and 1 are way more interesting than I previously appreciated. That funny identity matrix with the diagonal 1s in a sea of 0s, that's just an orthonormal basis where each corresponding dimension's axis is getting 100% of the multiplication like a noop. This is ridiculously simple yet unlocks an entire new world of understanding, why the hell couldn't my textbooks explain it in these terms on page 1? FML"
I'm still a noob when it comes to linear algebra and 3D stuff, but it feels like all the textbooks in the world couldn't have taught me what some hands-on 3D graphics programming impressed upon me rather quickly. Maybe my understanding is all wrong, feel free to correct me, as my understanding on this subject is entirely self-taught.
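For what it's worth, that "heaps of multiplication and addition" picture is easy to poke at directly with numpy. This tiny sketch just checks that the identity matrix really is a no-op and that a general matrix mixes every input dimension into every output dimension:

```python
import numpy as np

# Every entry of the output is a weighted sum over every entry of the input.
# The identity matrix weights each dimension by 1 on itself and 0 elsewhere,
# so it passes the vector through untouched -- the "noop" described above.

v = np.array([3.0, -1.0, 2.0])
I = np.eye(3)                    # diagonal 1s in a sea of 0s

assert np.array_equal(I @ v, v)  # identity: a pure no-op

M = np.array([[2.0, 0.0, 0.0],   # new x = 2x
              [0.0, 0.0, -1.0],  # new y = -z
              [0.0, 1.0, 0.0]])  # new z = y
assert np.array_equal(M @ v, np.array([6.0, -2.0, -1.0]))
```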
I wouldn't say it is all wrong. Just that the stuff you are talking about is a very tiny fraction of LA. I took a graduate class in LA, based on Strang's book. I have the book right here in front of me. So the stuff you allude to, i.e. rotation matrix, reflection matrix & projection matrix, is on p130 of Chapter 2. We got to that in the 1st month of the semester, & it got about 1 hour of classtime total. That's it. An LA class is like 4 months, or 50 hours. Is the point of LA to derive those matrices so one can do 3D computer graphics with scaling, rotation & projection? No, that stuff is too basic. We got 1 homework problem on that, that's it.
The stuff that most of the class struggled with (& still struggles with, because Strang goes over it rather quickly in his book) is function spaces (chapter 3, p182), Gram-Schmidt for functions (p184), FFTs (p195), Fibonacci & Lucas numbers (p255), the whole stability-of-differential-equations chapter (he gives these hard and fast rules, like a differential equation is stable if the trace is negative & the determinant is positive, but it's not too clear why), quadratic forms & minimum principles - that whole 6th chapter glosses over too much material imo.
Overall, Strang's book is a solid A+ on how to get stuff done, but maybe a B- on why stuff works the way it works. Like, why should I compute the Rayleigh quotient if I want to minimize one quadratic divided by another? Strang just says, do it & you'll get the minimum. How to find a quadratic over [-1,1] that is the least distance away from a cubic in that same space? Again, Strang gives a method, but the why of it is quite mysterious.
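On the Rayleigh quotient point, a quick numerical experiment at least shows that "do it & you'll get the minimum" holds, even if it doesn't explain why. This is just a sketch with a random symmetric matrix:

```python
import numpy as np

# Rayleigh quotient R(x) = x^T A x / x^T x for a symmetric matrix A.
# Its minimum over all nonzero x is the smallest eigenvalue of A,
# attained at the corresponding eigenvector.

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = (A + A.T) / 2                # symmetrize

def rayleigh(A, x):
    return x @ A @ x / (x @ x)

w, V = np.linalg.eigh(A)         # eigenvalues in ascending order
lam_min, v_min = w[0], V[:, 0]

# the quotient at the bottom eigenvector equals the bottom eigenvalue...
assert abs(rayleigh(A, v_min) - lam_min) < 1e-10
# ...and random vectors never dip below it
for _ in range(1000):
    x = rng.normal(size=5)
    assert rayleigh(A, x) >= lam_min - 1e-10
```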
So does LA get substantially more involved than just lots of multiplications and additions or is it always at the end of the day still just bags of floats getting multiplied and summed? Is it just a fantastic rabbit hole describing what values you put where in those bags of numbers?
One advantage of linear algebra is that it is, well, linear. Linear is nice. It means you can decompose things into their independent elements, and put them all together again, without loss. The monad interface, as simple as it is, is not linear; specific implementations of it can have levels of complexity more like a Turing machine.
However, things like the vector space of polynomials of degree at most n, the vector space of all homomorphisms between two vector spaces, the dual space of a vector space, etc. are all concepts that belong to linear algebra proper yet are more "abstract" than just "computations with matrices".
a -> M a
M a -> (a -> M b) -> M b
(A nearly-exact parallel can be seen in the Iterator interface. You can describe it as "a thing that walks through a container presenting the items in order"... and yeah, that's the majority use case and where the idea came from... but it's also wrong. What it really is is just "a thing that presents items in some order". It doesn't have to be from "a container". You can have an iterator that produces integers in order, or strings in lexicographic order, or yields bytes from a socket as they come in, or other things that have no "container" anywhere to be found. If you have "from a container" in your mental model then those things are confusing; if you understand it simply as "presenting items in order" then having an iterator that just yields integers makes perfect sense. A lot of the Monad confusion comes from adding extra clauses to what it is. Though by no means all of it.)
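The "iterator that just yields integers" case is trivial to write down in Python, for instance; no container exists anywhere:

```python
# A thing that presents items in some order, with no container in sight:
# each integer is fabricated on demand when the consumer asks for it.

def naturals():
    n = 0
    while True:
        yield n
        n += 1

it = naturals()
first_five = [next(it) for _ in range(5)]
assert first_five == [0, 1, 2, 3, 4]
assert next(it) == 5   # the "walk" continues; nothing was ever stored
```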
The "aha" realization that the "container" can be an ephemeral concept and not resident at run time can come later.
FWIW, I think of IO as a container: it contains the risk of side-effects within. All the examples you gave are containers in their own way.
Abstract computer science doesn't.
Part of why Haskell appears like such an implacable curmudgeon is the predilection of its community to believe that users must grasp type and logic theory to use it.
Just like they don't need to have a mental model of their computer to write software for it.
It's not the 80s anymore.
This has inspired me to try to update my post on the idea in a side window, but it's been sitting on my hard drive for over a year now and probably still has a ways to go yet.
is the same as
a = x;
and the same as
g = f;
That's the monad laws. Whatever craziness you want to put in the semantics, those are properties you probably would like to preserve in your language.
Functional languages are really weird; for instance, it's possible to switch the order of statements and the compiler will still figure out how to stitch them together. I think even JS in parts has, or at least had, that behaviour. (Actually that's useful when you have mathematical formulas that are interdependent and you're too lazy to order them topologically by dependence.)
On the other hand, just executing a sequence of commands in order to do I/O has only recently become a normal thing to do there, as far as I understand. The sweet spot for FP is IMHO something like React, where state is strictly separated from the functions. (Imagine writing Hello World using Normal Maths.)
(Please correct me if I'm wrong, which is probably quite likely ;))
The claim is that in other categories, there might be other natural combinations between two objects, for example a tensor product of Abelian groups combined with the integers Z as unit, or a composition of two endofunctors into a new endofunctor F ∘ F combined with the identity functor.
So the idea is that a monoid is somehow a destroyer of this combination operation; a monoid in sets un-combines the Cartesian product M × M back into the set M, and indeed this is a function (a set-arrow) from the combined objects to the underlying object.
By having an endofunctor combined with a natural transformation from F ∘ F back to F (natural transformations are the arrows in the category of endofunctors) a monad is therefore doing exactly what a monoid does, if you replace the "pre-monoid" combination step of the Cartesian product with instead a new "pre-monoid" combination step of endofunctor composition.
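To make that concrete with the most familiar endofunctor: for lists, the "pre-monoid" combination F ∘ F is "list of lists", the multiplication mu : F ∘ F -> F is flattening, and the unit eta : a -> F a wraps a value in a singleton. A small Python sketch of the monoid-style laws:

```python
# The list functor as a monad: eta is the unit, mu is the multiplication
# that "un-combines" F(F a) -- a list of lists -- back into F a.

def eta(x):                # unit: a -> [a]
    return [x]

def mu(xss):               # multiplication: [[a]] -> [a], i.e. flatten
    return [x for xs in xss for x in xs]

xs = [1, 2, 3]
# left/right identity: flattening after wrapping changes nothing
assert mu([eta(x) for x in xs]) == xs   # mu . F(eta) == id
assert mu(eta(xs)) == xs                # mu . eta == id
# associativity: flattening a [[[a]]] inside-out or outside-in agrees
xsss = [[[1], [2]], [[3]]]
assert mu(mu(xsss)) == mu([mu(xss) for xss in xsss])
```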
You don't even have to go 3D, just starting with the points of a rectangle in 2D and asking, "how do you put the edge points of this rectangle 10px to the left, rotate them 45° and stretch them 200% vertically?" and you've applied a matrix. Even if you're not using the fancy brackets, you're using a matrix, and understanding it.
Consider polynomials in X of degree up to, but not including, N. The powers 1, X, ..., X^(N-1) form a basis. Then the coefficients of the polynomial can be put in a column vector. If D is the derivative operator, D X^n = n X^(n-1), so the derivative operator can be expressed as a sparse matrix with D_(n,n+1) = n. Visually, it's a matrix with the integers 1, 2, ..., N-1 on the superdiagonal.
You can also see that this is a nilpotent matrix for finite N, since repeated multiplication sends the entries further up into the upper right corner.
You can extend this to the infinite case for formal power series in X, too, where you don't worry about convergence.
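Here's a small numerical rendering of that construction (N = 4, so cubics), checking both the derivative action and the nilpotency:

```python
import numpy as np

# Derivative operator as a matrix on polynomial coefficients in the basis
# 1, X, ..., X^(N-1) (lowest power first). With 0-based indices, the rule
# D X^n = n X^(n-1) puts 1, 2, ..., N-1 on the superdiagonal.

N = 4
D = np.diag(np.arange(1, N), k=1)

p = np.array([1, 2, 3, 4])            # p(X) = 1 + 2X + 3X^2 + 4X^3
dp = D @ p
assert dp.tolist() == [2, 6, 12, 0]   # p'(X) = 2 + 6X + 12X^2

# Nilpotency: differentiating a cubic N times annihilates it, so D^N = 0.
assert not np.linalg.matrix_power(D, N).any()
```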
> Google's PageRank is a solution of a matrix equation, what does that matrix represent?
Isn't it just the adjacency matrix of a big graph?
Anyway, I agree with you. Matrices and linear algebra are a really good inspiration for higher-level concepts like vector spaces and Hilbert spaces and so on. That's where the real power lies. But even in such general domains, matrices are often used to do concrete computations, because we have a lot of tools for matrices.
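Roughly: PageRank's matrix is not the raw adjacency matrix but the adjacency matrix with each column normalized by out-degree, blended with a damping term; the ranks are its stationary (dominant eigen-) vector. A toy sketch with a made-up three-page web:

```python
import numpy as np

# A[i, j] = 1 if page j links to page i (a tiny made-up web with no
# dangling pages, so every column has at least one link).
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

M = A / A.sum(axis=0)        # column-stochastic link matrix
d, n = 0.85, 3
G = d * M + (1 - d) / n      # damping: a little chance of jumping anywhere

r = np.ones(n) / n
for _ in range(100):
    r = G @ r                # power iteration -> dominant eigenvector of G

assert abs(r.sum() - 1.0) < 1e-9   # the ranks form a probability distribution
assert r.argmax() == 2             # page 2 collects the most link weight here
```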
That's just one of dozens of things LA is "about"
> why the hell couldn't my textbooks explain it in these terms on page 1
Because you wouldn't have understood terms like
and because it would have been unhelpful to everyone else who wasn't in LA for the exact same reason you were.
Being obvious in retrospect doesn't mean it was obvious in forespect. You had to learn the material first.
If you've never written a standalone software-rendered ray tracer, I found that to be a very useful exercise early on. There are plenty of tutorials for those on the interwebs.
I'm really thankful to MIT OCW for putting his lectures out for free -- in fact, I think I'll go donate to them now.
I hope you’re donating and actively contributing to many non-profit projects and that your comment comes from being tired of the world’s injustices rather than from callous impertinence, although I suspect it does not.
On paper, overhead is used for costs of running the place. In practice, it's used for things like upscale faculty clubs, million-dollar executive salaries, $200 million buildings, etc. MIT has among the highest overheads in the academy. Ironically, MIT claims its ocean yacht makes money rather than losing money (which could very well be true).
If you're okay with the majority of your money going to graft, donate to MIT. With a project like OCW, which has such a huge cost:benefit ratio, accepting the graft with the donation may be a rational decision, if you subscribe to a system of ethics like utilitarianism.
Personally, I almost never donate to a charity where the highest-earner makes more than I do. I think if everyone did that, MIT might lose some of the graft and corruption which has built up there over the years.
Here’s a scatterplot showing lots of research institutions’ actual and calculated rates: https://www.nature.com/news/indirect-costs-keeping-the-light... When this was published, MIT’s rate was slightly higher (54%).
This also applies to federal research grants and is meant to cover costs associated with actually hosting the research (rent, utilities, support staff). Foundations can (and often do) negotiate lower rates. I’m not sure how donations are handled, but I don’t think the same F&A rates apply.
What's a little bit hidden there is that MIT dips into these funds multiple times, in pretty complex ways. For example, a sponsor might pay overhead and graduate student tuition (which just flows into MIT's general coffers). Or graduate student tuition might be waived, and the sponsor pays just overhead. Or a donor might pay overhead when the money comes in, and again on specific purchases. Or capital expenditures might waive overhead. Etc. The level of complexity is high, while the level of transparency is low.
I'm not claiming any of this is unique to MIT by any means. It's just where I have the most visibility.
What's cute is that MIT claims to lose money on everything. In the article you cited: "'We lose money on every piece of research that we do,' says Maria Zuber, vice-president for research at the Massachusetts Institute of Technology (MIT)." You'll find similar statements about educating undergraduates and tuition. And just about everything else. I've worked through the numbers at some point, and MIT only loses money with clever accounting; it's good PR to say MIT subsidizes everything it does, but it's often not the reality.
As the number of students has exploded over the decades around them, they've kept their student numbers the same as before, even as they collect more and more money, which they hoard with no concern for opportunity-cost-of-capital whatsoever. This has created an intensifying zero-sum battle for the admissions slots in these universities. Meanwhile the state systems are increasingly overwhelmed, and sketchy private universities are increasingly scamming students on the edges.
By keeping their numbers so low, the Ivy League retains comfortably small classes, no major change in their overall mission and professorial lifestyle. At the student level they function to perpetuate privilege across the generations, especially via legacy admissions which seems like a gilded age concept I can hardly believe is still a thing in 2020.
I'm saying if the Ivy League were to change its mission to serve America and the world as much as they could instead, it would do a great deal to help the vast mass of striving students who are barely not making the cut, as well as the economy over the long run, and even reduce tensions in American democracy. And that change of mission probably can only happen if they start to ignore what their biggest donors think they should focus on.
While your view makes sense in the theoretical, once again human failings cause it to not work well in reality.
LA and Calculus can be studied independently in any order and then fruitfully combined later.
If you look at the order of topics in his book "An Introduction to Linear Algebra", you will find the topic "Linear Transformations" way back in chapter 8! Even after the chapters on eigenvalue decomposition and singular value decomposition. But understanding that a matrix is just the representation of a linear transformation in a particular basis is probably the most important and first thing you should learn about matrices ...
You are onto something though. Strang is coming from a direction of numerical computations and algorithms for solving real-world problems. Pure mathematics departments for at least the past maybe 80 years often look down on numerical analysis, statistics, engineering, and natural science, and adopt a position that education of students should be optimized in the direction of helping them prove the maximally general results using the most abstract and technical machinery, with an unfortunate emphasis on symbol twiddling vs. examining concrete examples. By contrast, in the 19th century there was much more of a unified vision and more respect for computations and real-world problems. Gauss himself was employed throughout his career as an astronomer / geodesist, rather than as a mathematician, and arguably his most important work was inventing the method of least squares, which he used for interpreting astronomical observations.
With the rise of electronic computers, it is possible that the dominant 2050 vision of linear algebra and the dominant 1900 vision of linear algebra will be closer to each-other than either one is to a 1950 vision from a graduate course in a pure math department.
Take Hilbert spaces for example. They are based on linear algebra. They are quite general and you might argue that there's a lot of symbol twiddling there. However, Hilbert spaces are/were essential in the study of Quantum Mechanics, which we can argue is a very important topic.
And if you only stick with matrices and numerics, you're bound to get stuck in the numbers and details and miss the big picture. A lot of results are much cleaner to obtain once you divorce yourself from the concrete world of matrix representation.
Of course, we should probably have the best of both worlds. I'm not saying applications are unimportant. Take something like signal processing, which relies heavily on both numerics and general theory.
So I'd like to add something to your point. Math departments optimize the education of math students towards the more general, and perhaps students not interested in pursuing pure math should have course-work that reflects that.
I had this view when I took linear algebra as an undergraduate, but I have gradually changed on the subject over time. I took a standard "linear algebra for scientists and engineers" course but I found it too abstract at the time. The instructor rarely concentrated on examples and applications despite the more applied focus in the course title. Later I came to appreciate the abstraction, since it helped me understand more advanced mathematical topics unrelated to the "number-crunching" I originally associated the topic with. I now think the instructor had a more "unified" approach, but I didn't realize it at the time.
YMMV, but I find pure mathematicians treat computers as “push-button technology” much more than applied mathematicians.
Edit: But there are of course big exceptions there as well, for example Thomas Hales.
He mentions this sentiment in a lot of interviews and things too.
Then please do. I took several online linear algebra courses from sources I trust and they were pretty bad. Or let's put it another way: I'm a pretty clever guy and I was still left confused. Strang is excellent in the classroom, and I almost even like videos for learning now thanks to him (x1.5 speed is your friend). His videos should not be your only learning source, but judging his course only by the book might result in a lot of learners skipping what I found to be the best course by far. If you want to learn linear algebra, give Strang a try first and you might save a lot of time.
While this view certainly helps intuition at initial stages of learning, it is not "just" that, and computational methods involving matrices are of much more practical importance (similar to being able to add and multiply numbers which we are taught early in life) which is probably why the stress is on them first and foremost. Someone said, "learn to calculate, understanding will come later."
Even though I also use Linear Algebra mostly computationally today, the origin of it is in the geometry and I think this connection should come first. Also, "number crunching" is a boring way to learn things.
Though, "matrix way" can be good for engineers.
I found going through Linear Algebra Done Right to provide a good counterbalance to Strang’s book+lectures.
The rest was effectively preaching to the choir, so those who already knew linear algebra nodded their heads, and idiots like me were still flummoxed.
Books have pictures that do a pretty good job.
Animations are pretty and interesting, but that isn't the same as teaching all the math.
> You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, this encounter will focus on abstract vector spaces and linear
But, I do foresee some difficulties. One thing that I find really difficult, for example, is that I take undergrads who have had linear algebra and ask "what is the determinant?" and seldom get back the "best" conceptual answer: "the determinant is the product of the eigenvalues." Sure, this is math, the best answer should not be the only one, but it should ideally be the most popular. I would consider it a failure if the most popular explanation of the fundamental theorem of calculus were not some variation of "integrals undo derivatives and vice versa". I don't see this approach solving that.

Furthermore, there is a lot of focus from day one on this CR decomposition, which serves to say that a linear transform from R^m to R^n might map to a subspace of R^n with smaller dimension r < min(m,n). While in some sense this is true, it is itself quite "unphysical": if a matrix contains noisy entries then it will generally only be degenerate in this way with probability zero. (You need perfect noise cancelation to get degeneracies, which amounts to a sort of neglected underlying conserved quantity pushing back on you and demanding to be conserved.) In that sense the CR decomposition is kind of pointless and is just working around some "perfect little counterexamples". So it seems weird to see someone say "hold this up as the most important thing!!"
I found that the "best conceptual" answer depends a lot on taste, and what concepts you are familiar with.
In this case:
- Calculating exact eigenvalues of matrices larger than 4x4 is impractical, since it requires you to solve a polynomial of degree >4.
- The eigenvalues are only guaranteed to exist in an algebraically closed field (the complex numbers), while the determinant itself lives in the base field (rationals, reals).
- [Geometric Determinant] The determinant is the (signed) volume of the parallelepiped spanned by the column vectors of the matrix.
- [Coordinate Free Determinant] The determinant is the map induced between the highest exterior powers of the source and target vector spaces (https://en.wikipedia.org/wiki/Exterior_algebra)
- I think there is also a representation-theoretic version, which obtains the determinant from the sign representation of the symmetric group acting by permutation on the columns/rows of the matrix.
The permanent is the matrix function which is fully symmetric, so permuting any rows or columns leaves it invariant. It emerges from the identity (trivial) representation.
Finally, partially symmetric matrix functions are known as immanants, defined using the other irreps of the symmetric group.
If the volume of the parallelepiped is zero, then there will be directions in the target space that you do not hit.
Hence the matrix cannot be invertible.
So for example if I take (x, y) to T (x, y) = (3x + y, 2x + 4y), that is an example of what we call a linear transformation -- it obeys T(p1 + p2) = T p1 + T p2, it distributes over addition.
Now in addition to noticing that this is linear we may happen to notice that T (-1, 1) = (-3 + 1, -2 + 4) = (-2, 2). So in the direction (-t, t) we are just scaling vectors by a factor of 2, to (-2t, 2t). Similarly we might notice that T (1, 2) = (3 + 2, 2 + 8) = (5, 10). So in the direction (t, 2t) we are just scaling vectors by a factor of 5 to (5t, 10t).
These two scaling factors, 2 and 5, are called the eigenvalues of T. Their product, 10, is called the determinant of T. And in this case their eigenvectors span the entire space -- you can make any other (a, b) as a combination (-t1, t1) + (t2, 2 t2), for some numbers t1, t2. Actually t1 = (-2a + b)/3 and t2 = (a + b)/3, I can work out pretty quickly. And in this t-space this transformation is very easy to think about, it has been "diagonalized."
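(If you want to see this numerically, numpy confirms all the numbers above; this is just the same T written as a matrix:)

```python
import numpy as np

# T(x, y) = (3x + y, 2x + 4y) from the example, as a matrix.
T = np.array([[3.0, 1.0],
              [2.0, 4.0]])

# the claimed eigenvalues 2 and 5, and the determinant 10 (their product)
eigvals, _ = np.linalg.eig(T)
assert sorted(round(ev) for ev in eigvals.real) == [2, 5]
assert round(np.linalg.det(T)) == 10

# the stated eigenvectors behave exactly as claimed:
assert np.allclose(T @ np.array([-1.0, 1.0]), 2 * np.array([-1.0, 1.0]))
assert np.allclose(T @ np.array([1.0, 2.0]), 5 * np.array([1.0, 2.0]))
```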
Sometimes these eigenvalues and eigenvectors don't exist, but we can patch that up with one of two tricks. The first trick is, for example, used for the 2x2 rotation matrices. These rotate every direction into some other direction, so how will I find some direction which "stands still"? The answer here is complex numbers, in this case it turns out that any 2x2 rotation by angle t will map the complex vector (1, i) to (cos t + i sin t, -sin t + i cos t) = (cos t + i sin t) * (1, i), so it has two complex eigenvalues e^(it), e^(-it). So the first trick is complex numbers. There is, it turns out, only one other class of weird transformation. In these weird transformations, it is possible to define chains of "generalized eigenvectors". Each chain starts with one ordinary eigenvector with an ordinary eigenvalue q, T v1 = q v1, and then the next element of the chain is a "generalized eigenvector of rank 2" which has T v2 = v1 + q v2, and then the next element of the chain is a "generalized eigenvector of rank 3" which has T v3 = v2 + q v3, and so on.
So it is a theorem that any NxN complex linear transformation has N linearly independent generalized eigenvectors which span the space, and "usually" these are all just normal eigenvectors and the matrix is "diagonalizable" (and even if they aren't, they come in families which start from one normal eigenvector and the matrix can be put into "Jordan normal form").
If you understood all of that, you are ready for the main result that you asked about. :)
For a linear transformation to be invertible, it needs to map distinct input vectors to distinct output vectors. If it maps two different input vectors to the same output, then invertibility fails.
So we know that invertibility fails when we can find distinct v1 and v2 such that T v1 = T v2.
Put another way, T v1 - T v2 = 0. But by the linearity property, T distributes over additions and subtractions, so this is the same as saying that T (v1 - v2) = 0 for v1 - v2 nonzero. This is enough to establish that v1 - v2 is an eigenvector with eigenvalue zero.
What does this do to the determinant, the product of all the eigenvalues? Well, zero times anything is zero. So if some linear transformation T is not invertible, then you immediately can conclude that det(T) = 0.
Furthermore this argument goes the other way too, with only a little subtlety related to these "generalized eigenvalues" -- basically, that the generalized eigenvalues always exist, that each one has at least one normal eigenvector, and that complex numbers still have the property that a finite product of them can only be zero if one of the factors is zero. If you know all of those things, then you can work your way backwards to conclude that det(T) = 0 implies one of the generalized eigenvalues is zero, which has at least one normal eigenvector v such that T v = 0, which I can then use to find many inputs for any given output: T u = T (u + k v) for any k.
So to say that this product-of-eigenvalues is zero is to say that one of the eigenvalues is zero, and therefore the linear transformation is projecting down to some smaller, flatter subspace in a way that cannot be uniquely undone. If there is no such smaller flatter subspace, then the transformation must have been invertible all along.
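A minimal numpy sketch of the non-invertible case, using a projection onto the x-axis as my own toy example (the vectors u and v are arbitrary):

```python
import numpy as np

# Projection onto the x-axis: it collapses the direction (0, 1) to zero
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

print(np.linalg.det(P))  # 0.0 -- one eigenvalue is zero

# Many inputs map to the same output: T u = T (u + k v) for v in the kernel
u = np.array([2.0, 5.0])
v = np.array([0.0, 1.0])  # eigenvector with eigenvalue zero
for k in (1.0, -3.0, 10.0):
    assert np.allclose(P @ u, P @ (u + k * v))
```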
This view also motivates the concept of vector bundles and vector spaces at a point.
My country curriculum introduces linear algebra through group theory and vector spaces. Matrices come later.
I was also taught linear algebra this way, by an applied mathematician with a background in chemical engineering:
- start by solving Ax=b with row reduction
- develop theorems about linear independence and spanning sets of vectors based on these exercises
- introduce the determinant from the perspective of linear systems (rather than eg geometry or group theory)
- eigenvectors and eigenvalues
Later I switched from physics to math and TAed a more “algebraic” approach involving groups/rings/fields. But the matrix-first approach was more helpful for both my physics coursework and later courses in numerical linear algebra.
I took like 3-4 courses in the US involving the engineering approach, starting in high school and continuing through college as a CS major. That was all that was required.
But I also like algebra, so I happened to take a 400-level course that only math majors take in my senior year of college. And then I got the group theory / vector space view of it. I don't think 95% of CS majors got that.
I don't think one is better than the other, but they should have tried to balance it out more. It helps to understand both viewpoints. (If you haven't seen the latter, then picture a 300-page text on linear algebra that doesn't mention matrices at all. It's all linear transformations and spaces.)
What country were you taught in? Wild guess: France?
A book I enjoyed is Axler's Linear Algebra Done Right, which, if I remember correctly, doesn't contain a single matrix.
It does have plenty of matrices. The main thing it really does is avoid determinants until the very end. The determinant is certainly something I remember learning as a kind of rote operation, without really understanding any intuition behind why you'd multiply and add these numbers in this particular way. I still feel lacking in "feel" here, which is why I suppose I'm going through Axler now.
For example, I remember looking at the linear algebra book my department had used previously. Early on, it introduced the concept of the transpose of a matrix:
Superficially, it looks like a good thing to introduce. It is fodder for easy homework exercises, and it comes with a satisfyingly long list of formal properties.
But why? What does the transpose mean? For what sort of problem would you want to compute it?
There are good answers to these questions (see the "transpose of a linear map" section of the Wikipedia article I linked), but they are not easy for a beginner to the subject to appreciate.
> You are probably about to begin your second exposure to linear algebra. Unlike your first brush with the subject, which probably emphasized Euclidean spaces and matrices, this encounter will focus on abstract vector spaces and linear maps.
It's not universal.
The US is a very big place. I doubt there is an American approach to linear algebra. We really don't have a single approach to anything. Different schools and majors probably approach the topic differently. My college had a linear algebra course specifically crafted for CS majors and engineers. I took that and it did focus on matrices. It was also the only math class that required programming. I believe math majors had their own linear algebra course.
> My country curriculum introduces linear algebra through group theory and vector spaces. Matrices come later.
Different strokes for different folks. If it worked out for you that's all that matters.
It isn't. I don't think you understood what I wrote. What's important is the ends, not the means. As long as the OP got what he needed out of linear algebra, then that's what matters.
> If you don’t think it matters then why are you participating in the discussion?
What is this "it" you are talking about? Be more specific and precise otherwise you don't add anything to the discussion.
The OP is referring to an intellectually totally different approach to Linear Algebra, one coming firmly out of pure mathematics. Whereas Strang's notion that one should teach matrices first is aimed at students who are probably never going to conceptualize linear algebra in terms of abstract vector spaces and linear operations on them. So, I do still take issue with what you wrote. "Different strokes for different folks" really doesn't cut it here, IMO. Being tendentious, I'm tempted to say it's more like "If you're happy getting practical calculations right but never really understanding what mathematicians mean by linear algebra, then Strang's approach might be OK for you." But the point is that the approach to linear algebra that the OP referred to, versus Strang's, are not just equally valid alternatives where whichever one works for you is the best by definition. They are routes to two different places.
> As long as the OP got what he needed out of linear algebra, then that's what matters.
I don't think I even agree with this. I might agree with
> As long as the method of teaching linear algebra improves the mean educational outcome in a statistically significant fashion then that's what matters.
But that's what Strang's getting at -- better and worse ways of teaching Linear Algebra. It's not the case that any method is OK just because it worked for some European dude on the internet with poorly concealed anti-American feelings (I'm from Europe, I'm allowed to say this :) ).
It's your right. No need to apologize.
> It's antithetical to the spirit of engineering / technical discussion, which are predicated on the idea that there are better and worse ways to do the thing in question.
Sure. But that's the point. There are better ways (as in plural). Meaning there is more than one right way.
> It's not the case that any method is OK
Nowhere did I say any method is OK. You are building a strawman here. I didn't just write "Different strokes for different folks." I wrote "Different strokes for different folks. If it worked out for you that's all that matters." The key being "If it worked out for you that's all that matters."
> because it worked for some European dude on the internet with poorly concealed anti-American feelings
Nowhere did I sense any "anti-American" sentiment from OP.
> (I'm from Europe, I'm allowed to say this :)
So maybe it's a language issue. English as a second language is great but leaves open the possibility of misunderstanding. Honestly, I still don't know why you got so upset about such an innocuous comment. I think what we had here is a failure to communicate.
Some of the concepts made sense, especially solving for linear systems of equations.
Recently, I decided to brush up on my math skills via Youtube videos, and came across this series: https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw
It explains Linear Algebra concepts using 2D and 3D vector manipulation, and the animations help me visualize the underlying maths.
In my time I picked up LA from Ben Noble, Halmos, and Axler, and the computational side of things from Golub & van Loan.
Ben Noble's book was my entry to LA. I was an undergraduate and involved in a research activity that demanded a lot of knowledge of the eigenvalue problem. The concrete approach in that book helped a lot.
It was only later on that I took a class based on G&vL (implementing a bunch of basic LA factorizations in Matlab), and in my spare time read Halmos's book. I understand the coordinate-free algebraic approach, but I work on applications and that viewpoint has not stuck with me. The stuff on numerical accuracy in GvL really did stick, OTOH.
From the comments here, and Strang's book's table of contents, I gather that his book (which has a lot of fans) has a concrete geometric approach.
Having said that, he explains many things really well and helps a lot to build intuition. He is always careful to point out when something is computationally inefficient and to suggest alternatives.
The exercises are too hard for me personally. I'd prefer a more laborious set of exercises to help cement the material (as in calculus or ordinary algebra), and then one or two problem-solving puzzles at the end.
I hope it's not just that; that would be very limiting considering what linear algebra is about and capable of.
His books usually expand on the subjects he presents at his online lectures. I see them as advanced lecture notes.
I started doing LA on Khan Academy, and checked out Linear Algebra Done Right. LADR was a little too much into the deep end for me. KA seemed to be good. One nice thing about KA is that when I didn't quite remember something (e.g. how exactly to multiply matrices) I could just go to an earlier pre-LA lesson, pick it up, and then go back to LA where I left off. I'm a few lessons in.
What do you all recommend for someone like me?
I feel like I don't really understand his explanation, because it's kind of vague. But I think that might be because you've seen the equations dozens of times, and I haven't seen them at all, so you were prepared to understand the video.
Maybe I will create a game prototype based on the mechanics but this is just a vague idea.
I also really like the applied linear algebra book by Boyd Vandenberghe:
Free PDF is available on their website.
There are Julia and Python code companions for the book, and lecture slides from both professors' websites. Also check out their other books, many of which have free PDFs available.
I can also recommend Data-Driven science and engineering by Brunton and Kutz.
There used to be a free preprint PDF of the book but I can't find it now. The book is totally worth picking up... MATLAB and Python code available. Steve Brunton's lectures on YouTube are pretty damn good and complement the book well:
Another really cool book is Algorithms for Optimization by Mykel Kochenderfer and Tim Wheeler: https://mitpress.mit.edu/books/algorithms-optimization. Julia code used in book.
I waited after the lecture to personally thank him and have him autograph the textbook; very glad I did in retrospect.
One of the interesting new ways of thinking in these lectures is the A = CR decomposition for any matrix A, where C is a matrix that contains a basis for the column space of A, while R contains the non-zero rows in RREF(A) — in other words a basis for the row space, see https://ocw.mit.edu/resources/res-18-010-a-2020-vision-of-li...
Example you can play with: https://live.sympy.org/?evaluate=C%20%3D%20Matrix(%5B%5B1%2C...
Thinking of A as CR might be a little intense as a first contact with linear algebra, but I think it contains the "essence" of what is going on, and could potentially set the stage for when these concepts are explained (normally much later in a linear algebra course). Also, I think the "A=CR picture" is a nice justification for where RREF(A) comes from... otherwise students always complain that the first few chapters on Gauss-Jordan elimination are "mind-numbing arithmetic" (which is kind of true...), but maybe if we presented the algorithm as "finding the CR decomposition, which will help you understand dozens of other concepts in the remainder of the course," it would motivate more people to learn about RREFs and the G-J algo.
    from sympy import Matrix

    def cr_decomposition(A):
        """Computes the CR decomposition of the matrix A."""
        rrefA, licols = A.rref()   # compute RREF(A) and its pivot columns
        C = A[:, list(licols)]     # linearly indep. cols of A
        r = len(licols)            # = rank(A)
        R = rrefA[0:r, :]          # non-zero rows in RREF(A)
        return C, R
On another note, he is such a nice guy. 10/10.
If you have a vector v=(1,0) that points to the right, you can scale this vector infinitely in that direction by multiplying it by a positive scalar.
5v = (5,0)
62.1v = (62.1,0)
Similarly, you can scale that vector infinitely in the opposite direction (i.e. left) by multiplying it by a negative scalar:
-987v = (-987,0)
If we call this scalar c, the expression cv allows us to represent any point along the X axis simply by varying c, meaning that cv defines a line along that axis.
Similarly, we can do the same for a vector w=(0,1) along the Y axis, scaling it by d.
Now we have a method for moving to any point on the XY plane simply by varying c and d in the linear combination: cv + dw, meaning that we've defined a plane using two vectors.
- this won't work if v and w are parallel; for example, if v = -w (and neither are zero) then we can only move along a line instead of a plane
- it also won't work if either of the vectors are zero, because no matter what you multiply by, a zero vector can only represent a single point
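A quick numpy check of both failure modes; the helper name spans_plane and the sample vectors are my own:

```python
import numpy as np

def spans_plane(v, w):
    """Two 2-D vectors span the plane iff the matrix [v w] has rank 2."""
    return np.linalg.matrix_rank(np.column_stack([v, w])) == 2

print(spans_plane([1, 0], [0, 1]))   # True: independent directions
print(spans_plane([1, 0], [-1, 0]))  # False: v = -w, only a line
print(spans_plane([1, 0], [0, 0]))   # False: a zero vector adds nothing
```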
If you do the same for another 1x3 vector, and it is not parallel to the first, you get all the points on a different line.
These two lines define a plane (and the cross product of the two vectors defines its normal vector).
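A short numpy illustration of that normal vector via the cross product (the sample vectors here are mine):

```python
import numpy as np

# two non-parallel 3-D vectors (arbitrary example values)
v = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])

n = np.cross(v, w)  # normal to the plane spanned by v and w
print(n)                           # [0. 0. 1.]
print(np.dot(n, v), np.dot(n, w))  # 0.0 0.0: n is perpendicular to both
```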
Also, if you have 3-dimensional vectors, you were in 3D all along.
That certainly got our attention. I’ve always found linear algebra to be kind of ... almost soothing.
Well, every time, we can make the so-called "guess" that the solution looks like e^(rt). Why? We know that because professors will only give well-behaved systems on final exams, because it's hard to grade the other kind.
So we know characteristic polynomials look like so (because of course they do, you can just memorize this) ... so now we lift the coefficients out into a nifty thing called a _matrix_, and now follow these easy four steps to get the roots, plug them back in, and incidentally these are "eigenvalues", we'll talk about those later ...
Bam. Done. A-, easy. No sweat."
That book discusses the actual algorithms used for computation. It is a bit more advanced, but amazingly clear.
I am aware of his course on OCW, but wondering is there something more interactive and/or newer than those lectures that has similar quality.
Just to give a taste of the use cases: compression, filters (image filters for de-noising, HP & LP filters for audio), encoding/decoding, computer vision techniques, cryptography, neural nets, computer graphics (this is where most people learn how to use it in real computer programs)
- root-finding algorithms with more than one variable
- graph problems like Google's PageRank
- statistical analysis
- 3d rendering (projecting a 3d scene onto a 2d image)
- solving systems of equations (also see linear programming)
Linear algebra is very basic and fundamental to physics and math.
From the course description:
> These six brief videos, recorded in 2020, contain ideas and suggestions from Professor Strang about the recommended order of topics in teaching and learning linear algebra.