Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares (stanford.edu)
938 points by yarapavan on Dec 14, 2018 | hide | past | favorite | 123 comments

I'm e-learning Linear Algebra right now to have a good math foundation for Machine Learning.

I was a History and Sociology major in college - so I didn't take any math.

If you are like me, and working off an initial base of high school math, I would recommend the following (all free):

Linear Algebra Foundations to Frontiers (UT Austin) Course: https://www.edx.org/course/linear-algebra-foundations-to-fro... Comments: This was a great starting place for me. Good interactive HW exercises, very clear instruction and time-efficient.

Linear Algebra (MIT OpenCourseware) Course: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra... Comments: This course is apparently the holy grail course for Intro Linear Algebra. One of my colleagues, who did an MS in EE at MIT, said Gilbert Strang was the best teacher he had. I started off with this but had to rewind to the UT class because I didn't have some of the fundamentals (e.g. how to calc a dot product). I'm personally 15% through this, but enjoying it.

Linear Algebra Review PDF (Stanford CS229) Link: http://cs229.stanford.edu/section/cs229-linalg.pdf Comments: This is the set of Linear Algebra review materials they go over at the beginning of Stanford's machine learning class (CS229). This is my workback to know I'm tracking to the right set of knowledge, and thus far, the courses have done a great job of doing so.

Don't forget to review calculus as well. Khan Academy is a good start for learning about single variable calculus (http://www.khanacademy.org), but their content on multivariable calculus is a bit lacking (neural networks / deep learning use the concept of the derivatives and the gradient a lot). A good supplement for multivariable calculus would be Terence Parr and Jeremy Howard's article on "All the matrix calculus you need for deep learning": https://explained.ai/matrix-calculus/index.html

Thanks - I am doing that as well! I've been using MIT OpenCourseware for single variable calculus (and will do the same for multivariable). I scoped the parent post to Linear Algebra so as not to stray too far from the OP.

I will certainly check out the Terence Parr / Jeremy Howard site, and am super familiar with Khan Academy.

I came across Calculus Made Easy by Silvanus Thompson on someone's Twitter feed. Published in 1910, it is far less scary and far more interesting to read than a lot of math textbooks.


“Considering how many fools can calculate, it is surprising that it should be thought either a difficult or a tedious task for any other fool to learn how to master the same tricks.”

I really wish technical books were still written like this. Though if Thompson posted this on HN as a comment he probably would have been downvoted.

There is also a good-looking web version, discussed previously on HN: https://news.ycombinator.com/item?id=18250034

Similarly, Stroud's "Engineering Mathematics" takes you right from addition all the way up to Fourier transforms... a fantastic book.

This should be your first calculus book if you are learning from scratch. Much better than those 1000+ page behemoths colleges use.

I'm going to plug Calculus: Single Variable from the University of Pennsylvania on Coursera (https://www.coursera.org/learn/single-variable-calculus).

This was the best Calculus course I've taken online.

When you say you’re going through the MIT OCW calculus courses are you watching the videos or also doing practice problems?

What other calculus resources are you using?

Watching videos, reading the indicated portions of the text, doing practice problems, eventually exams - relying on the resources provided on the OCW site.

To be transparent - I just started the calculus class. I finished the UT Austin Linear Algebra class two weeks ago, and am 7 lectures + readings + 2 problem sets in on the MIT Linear Algebra class and 3 lectures in on the Calculus class.

I'm coming to the end of my first year of a (6-year, part-time) Comp Sci course and have seen that we have options for AI and Machine Learning modules in future years. Where should I go to find something like a list of what I should be brushing up on, or learning completely from scratch, in order not to fall flat on my face during those types of modules?

I understand there are very set starting points in math subjects because concepts build on one another but I don't know what I should be starting with and where to go afterwards.

> This course [Strang] is apparently the holy grail course for Intro Linear Algebra.

I haven't watched his lectures, but I TA'd a linear algebra course that used his text book, and strongly disliked his presentation. I've heard that's a fairly common reaction actually - it's one of those love it or hate it books. I'm bringing it up because if you (or someone else reading this) turn out to be in the group that doesn't love it, you should not give up on loving linear algebra! You are definitely still allowed to have a different 'holy grail course'!

Awesome input! Learning isn't linear (tee-hee...)

Gilbert Strang is probably the best teacher on videos, up there with Dan Boneh.

Where’s the love for Lax?

Page after page of mathematical insights and delights! I've never had the opportunity to work through it systematically, but have frequently read excerpts and have never been let down. I would expect nothing less from a figure so great as Lax!

It's worth pointing out in the context of this discussion that the book is, by the author's own design, not an introduction to linear algebra. It is a second course that Lax used to teach his advanced undergraduates and beginning graduate students at the Courant Institute. For example, OP, with a high school math background, will surely be very puzzled by page two, when a linear space is defined as a field 'acting on' a group. Which is, I think, the 'right' way of thinking about the algebraic structure, in the sense that it greatly simplifies all the intricate moving parts of linear algebra. Anyhow, I second your recommendation!

That's a pretty good list; here are some things I'd add.

Amazing js visualizations/manipulatives for many LA concepts: http://immersivemath.com/ila/index.html

LA Concept map: https://minireference.com/static/tutorials/conceptmap.pdf#pa... (so you'll know what there is to learn)

Condensed 4-page tutorial: https://minireference.com/static/tutorials/linear_algebra_in... (in case you're short on time)

And here is an excerpt from my book: https://minireference.com/static/excerpts/noBSguide2LA_previ... (won't post a link to it here, but check on amazon if interested)

Wow - nice stuff. I'm a fan of the reference map.

Good recommendations. In addition to the UT, MIT and Stanford courses you recommend above, for developing your visual intuition, 3Blue1Brown's Essence of Linear Algebra video series is second to none. [0]

Another good one is MathTheBeautiful [1] by MIT alum Pavel Grinfeld [2]. He approaches Linear Algebra from a geometric perspective as well, but with more emphasis on the mechanics of solving equations. He has a ton of videos organized into several courses, ranging from in-depth Intro to Linear Algebra courses to more advanced courses on PDEs and Tensor Calculus.

Especially note his video on Legendre polynomials [3] and Why {1,x,x²} Is a Terrible Basis: https://www.youtube.com/watch?v=pYoGYQOXqTk&index=14&list=PL....

Gilbert Strang was Grinfeld's PhD advisor: https://dspace.mit.edu/handle/1721.1/29345. Pavel has a clear and precise teaching style like Strang, and he makes reference to Prof. Strang and his MIT course from time to time.

NB: Prof Strang has a new book, Linear Algebra and Learning from Data, that just went to press and will be available in print by mid-Jan 2019. A few chapters are available online now, and the video lectures from the new MIT course should be on YouTube in a few weeks. [4]

[0] Essence of Linear Video Series (3Blue1Brown): https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/pla...

[1] MathTheBeautiful https://www.youtube.com/watch?v=pYoGYQOXqTk&index=14&list=PL

[2] https://en.wikipedia.org/wiki/Pavel_Grinfeld

[3] https://en.wikipedia.org/wiki/Legendre_polynomials

[4] MIT Linear Algebra and Learning from Data (2018) http://math.mit.edu/~gs/learningfromdata/

A bit weird to add a negative review, but here goes:


Is _not_ a good introduction. The instructors are all over the damn place, and you will spend much of your time finding better explanations from other sources. Wish I hadn't started with this. On the plus side, you will get a certificate at the end.

Indeed weird, because the course you mentioned is actually excellent. However, it was designed for people who had (somehow) already seen the subjects in an abstract, unapplied setting (such as a math class at uni). They refresh or refocus the subjects with geometric intuition and with some concrete applications in mind, which I found quite useful and beautiful. This class is like a more developed version of the 3b1b videos on LA.

It's funny, because as I was reading your comment, I was thinking of 3b1b. He's doing great work by visualizing abstract concepts, but I think what he does mainly helps people who have already gone through the material. If it's your first time encountering the topic, you'll likely feel lost or not see the point.

What 3b1b does still brings a lot of value, so I don't want to take away anything from his work.

3blue1brown has a great youtube series on both Calculus and Linear Algebra that provide excellent intuitive backing for the concepts in both areas.

Out of curiosity, do you feel you can compete with people who have advanced degrees in more quantitative sciences?

Although I'm in the process of plugging several holes in my own math education, I don't believe I'd be able to get any interesting, ML related jobs. I also don't see myself able to perform well in comparison, given that I lack the mathematical intuition one builds after several years of (almost) daily practice.

(I hope I don't sound discouraging; relearning math has been quite fun so far and has made me able to understand more of everything.)

The direct answer is - not at the research level but yes, in terms of application as the field matures and abstracts over time.

There are also adjacent jobs to ML engineer (product management and so on...).

I agree, Strang's Linear Algebra course is excellent. I worked through the entire course in March/April this year.

I just completed my final exam at CMU in their graduate intro to ML class (10-601). Having gone through the LA course was essential to my success. But equally important (if not more) to ML is a solid foundation in probability.

If you care about anything that runs on a computer, linear algebra is one of the best maths.

Would be interested to hear why you’re studying machine learning. Do you see important problems you think it can solve, are you looking to make more $$ as an ML data scientist, or just generally interested in stats/data?

Part of a broader effort - I committed to learning to code about 3 years ago. At that point in time, I didn't really know why I was doing it... really out of curiosity. I kept going because it was addicting, and really an antithesis to my day job at the time (investment banking) - which I felt was corporate / bureaucratic and unintellectual.

That said, eventually, I want to start a startup. I'm building out small side projects now. I'm generally comfy with web + mobile dev, and I wanted to upskill in a "newer" technology that was more "mathy".

I bet you could flow-chart your entire firm into a process that's mostly automated. Something like wealthfront, maybe?

Thanks for sharing, good luck on your journey

What's the difference between learning and e-learning? Is the latter faster?

Orders of magnitude cheaper.

I just love the MIT linear algebra course with Gilbert Strang. Awesome teacher

Did you find LAFF too mathy? I got turned off by the mathiness of it and quit after 2 weeks. Does it get any better? All the math notation and lines got so dry that I vaporized trying to understand.

I didn't think about it during the time. It's a fair comment, and probably true.

What it did really well (for me) was integrate HW with each lecture video, and start at a really basic foundation. It took me from 0 -> something.

This is a beautiful book and a great intro to the basics of linear algebra. All the figures in the book are generated in Julia and there’s a companion book with Julia code for computational examples:


The 3B1B series on Linear Algebra is by far the most welcoming and informative introduction to the topic I've ever seen: https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQ...

On any linear algebra post, I come here to comment the same thing (if someone else hasn't done it already). Seriously, this series is absolutely wonderful.

After reading this book (or during), take a look at the author's (Boyd) video lectures on linear dynamical systems: https://see.stanford.edu/Course/EE263/

There is a lot of overlap between the book and the course.

Also take a look at Boyd and Vandenberghe's book on convex optimization: https://web.stanford.edu/~boyd/cvxbook/

You might be interested in the Open Textbook Initiative by the American Institute of Mathematics:


Somehow I found linear algebra easier than calculus, but I don't know why.

I did both at the same time in university, but failed calculus 3 times and aced linear algebra at the first try.

I'd expect someone to be either good or bad at math, not both at the same time.

Math professor here ---

Quality of teaching might have something to do with it.

But, also, calculus is much harder to understand at a rigorous, formal level than at an informal level.

On one level you can try to understand what the main concepts are about, be able to compute derivatives and integrals, solve optimization and related rates problems, and so on. I'd recommend Silvanus Thompson's Calculus Made Easy over any mainstream calculus book for this. In my opinion, the book succeeds amazingly at fulfilling the promise of its title.

But suppose you really try to read any mainstream calculus book, and understand everything. For example:

- Why are limits defined the way they are (with epsilons and deltas)?

- The book will probably touch lightly upon the Mean Value Theorem -- why is this important? What's the point?

- Why is the chain rule true? It reads dy/dx = (dy/du) (du/dx). Yay! This is just cancelling fractions, right? Any "respectable" calculus book will insist that it's not, but most students will cheerfully ignore this, still get correct answers to the homework problems, and sleep fine at night.

- Consider the function e^x. How is it defined? The informal way is to say e = 2.71828... and we define exponents "as usual". Most students are perfectly happy with this. But does this really make sense if x is irrational? Your calculus book might bend over backwards to define everything properly (e^x is the inverse to ln(x), which is defined as a definite integral), and it takes a lot of work to appreciate why.
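For the curious, here's a sketch (in my own wording, not any particular textbook's) of the "backwards" construction mentioned above:

```latex
% Define ln first, as a definite integral:
\ln x \;=\; \int_1^x \frac{dt}{t}, \qquad x > 0.
% ln is continuous and strictly increasing, so it has an inverse;
% *define* the exponential as that inverse, and e as exp(1):
\exp \;:=\; \ln^{-1}, \qquad e \;:=\; \exp(1), \qquad e^x \;:=\; \exp(x).
% One then checks this agrees with repeated multiplication when x is an
% integer, with roots when x is rational, and so on -- which is exactly
% the "lot of work" referred to above.
```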

In my experience, these sorts of issues mostly don't pop up in linear algebra, where the proofs tend to parallel the handwavy heuristics. I wonder if this had anything to do with your experience?

One difficulty students had that I encountered as a TA was some garbled prerequisites. All of the epsilon-delta definitions are written in predicate logic, and are often '2nd-order statements' (that is, they have nested quantifiers). This is an entirely new formal language, and its usage is very different from English. It needs to be carefully explained, but standard texts like Stewart just dress it up to look kind of like English and carry on.

In fact the mathematics curriculum DOES acknowledge that you have to teach most students this if you want them to understand it: at my university it lived in the discrete math course, which used Rosen. He devotes an entire chapter to logic, and spends literally 60 pages gradually building up the complexity to arbitrary nested quantifiers; the definition of the limit appears at the end of this.

Unfortunately, discrete math also makes heavy use of sequences and series, so that Calc 2 is a prerequisite of the course... thus my program's student-victims would spend a year taking calculus and not understanding much of the formalism before they were even allowed to take the course that explained the language the basic definitions of calculus were written in! Ugh.

I think Stewart and Rosen are pretty mainstream textbooks, so I suspect this problem is very common. Perhaps you could point it out at your next faculty meeting and shuffle some prerequisites around; we'll start a math revolution! :)

> - Why are limits defined the way they are (with epsilons and deltas)?

> - The book will probably touch lightly upon the Mean Value Theorem -- why is this important? What's the point?

> - Why is the chain rule true? It reads dy/dx = (dy/du) (du/dx). Yay! This is just cancelling fractions, right? Any "respectable" calculus book will insist that it's not, but most students will cheerfully ignore this, still get correct answers to the homework problems, and sleep fine at night.

Which "respectable" book(s) would you recommend for those who want to dive into these details? Is Tom Apostol's Mathematical Analysis good for learning these kinds of details? (They say this book is "respectable", but I would like to hear your thoughts about it. Thanks.)

I don't know anything about Apostol's Mathematical Analysis. My guess would be that it demands a fairly sophisticated background of the reader, and does an excellent job of covering calculus from an extremely rigorous point of view.

I have heard that Apostol's Calculus is an excellent choice, probably somewhat more accessible to beginners, but still offering a rigorous, highbrow perspective. I've also heard the same of Spivak. I'd probably opt for one or both of these.

Calculus by Spivak is good. Abbott's real analysis textbook is also quite popular.

I was really bad at rearranging terms/formulas, and this came up in calculus exams all the time, but not so much in linear algebra exams.

That's another thing I observed teaching calculus: a lot of students who have problems are having their difficulties with the algebra they supposedly know, and not so much with the "new" material. In theory, we expect students to already be fluent in this sort of algebraic manipulation; in reality, we recognize that a calculus class provides our students an opportunity to improve at this.

I'm not sure I have any suggestion other than lots of practice. And know that you're not alone, this is a completely natural difficulty.

> That's another thing I observed teaching calculus: a lot of students who have problems, are having their difficulties with the algebra they supposedly know

A. You're not alone. In the first video of his Calc I series, Professor Leonard cracks that "Calculus is the class you take to finally fail Algebra".

B. This is hardly unexpected, especially if there's any gap at all between taking Algebra and taking Calc. The simple truth is, you forget material you don't use. And most people don't use a lot of algebra in their daily lives. If even a semester or two has passed since you took algebra, you're almost certainly going to have forgotten a lot of it, unless you made a very pointed effort to keep practicing that stuff and focus on retention.


I successfully finished my CS degree.

But I often have the feeling that my math problems are getting in the way of getting on the next level, hehe.

> But I often have the feeling that my math problems are getting in the way of getting on the next level, hehe.

Same here. I hate finding amazing CS academic papers and trying to implement them, only to get stuck at some high level math formulas.

Some of the best software engineers I know went from a math background to software. I guess I'm going to have to catch up.

> Somehow I found linear algebra easier than calculus, but I don't know why.

I suspect the answer is that your calculus course was a lot heavier on crank-grinding: having to readily apply integration and differentiation to a wide panoply of functions, some of which you're not really familiar with (such as arccos). If you're weak on trigonometry or certain algebraic manipulations, that's going to shut out your ability to do a lot of the crank-grinding without really impacting your ability to understand the concepts.

By contrast, the crank-grinding in linear algebra is a lot less involved. The most complex algebra is going to be solving polynomial equations to find the eigenvalues of a matrix, but those are generally going to be quadratic equations, since asking anyone to solve more complex equations by hand is asking for trouble. Otherwise, it's largely plugging and chugging numbers into stock formulas. Gram-Schmidt orthonormalization? Pick a vector, normalize it, project the other vectors and cancel out the projections, and repeat until you've done all of them.
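To make that Gram-Schmidt recipe concrete, here's a minimal sketch in Python/NumPy (classical Gram-Schmidt; the function name and example vectors are mine):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize a list of vectors.

    Pick a vector, normalize it, subtract its projection from each
    later vector, and repeat -- the plug-and-chug recipe above.
    """
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for q in basis:
            w = w - np.dot(q, w) * q   # remove the component along q
        norm = np.linalg.norm(w)
        if norm > 1e-12:               # skip linearly dependent vectors
            basis.append(w / norm)
    return basis

q = gram_schmidt([[3.0, 1.0], [2.0, 2.0]])
# q[0] and q[1] are orthonormal: unit length, zero dot product
```

In real numerical code you'd reach for a QR factorization (np.linalg.qr), which does the same job more stably, but the hand-cranked version above is the one you'd do on an exam.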

Linear algebra should be easier than calculus shouldn't it? The whole program of differential calculus is basically that we already know how to solve problems in linear algebra, so let's solve other problems by reducing them to questions of linear algebra in the tangent space.

Great comment, this big picture strategy of calculus is not emphasized enough.

Sorry, I don't understand.

The fundamental strategy of calculus is to replace a nonlinear function with a tangent line approximation to that function. This greatly simplifies calculations, and the approximation is often accurate enough to be useful.
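A tiny numeric illustration of that strategy (the function and evaluation point are arbitrary choices of mine):

```python
import math

def tangent_approx(f, dfdx, a, x):
    """Approximate f(x) by the tangent line to f at a:
    f(x) ~ f(a) + f'(a) * (x - a)."""
    return f(a) + dfdx(a) * (x - a)

# Approximate sqrt(104) via the tangent line to sqrt at a = 100,
# where sqrt and its derivative are easy to evaluate by hand:
approx = tangent_approx(math.sqrt, lambda t: 0.5 / math.sqrt(t), 100.0, 104.0)
exact = math.sqrt(104.0)
# approx is 10.2; exact is about 10.198 -- the linear stand-in is close.
```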

Linear algebra is probably my favorite part of math from a practicality standpoint. I'm not in a math heavy field, but knowing how to use matrices to solve optimization problems has been very helpful.

Sort of off topic - but I was a math major in college and I always had a broad, casual categorization of the types of classes I took.

Linear Algebra - felt like it required you to be able to hold really long trains of thought in your head

Probability - felt like you had to be clever

Analysis - felt like you just had to think critically and approach things from all angles

I always preferred Algebra - felt like I was writing essays not doing math

I'm going to complain about this every chance I get.

A 2D vector, we generally store as [x, y, 0]. What's the extra 0? The homogeneous coordinate.

A 2D point, we generally store as [x, y, 1]. That extra 1 is the homogeneous coordinate, and since it's there, it means "and apply translations!"

If I have a 2D transform, I put the translation component in the last row or column, depending on whether you pre-multiply or post-multiply (I can never remember which).

When I transform a vector by that matrix, the 0 in the homogeneous coordinate means translation doesn't apply.
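A small NumPy sketch of the convention being described (homogeneous coordinate last, column vectors, with a made-up translation of (5, 7)):

```python
import numpy as np

# 2D translation by (5, 7), homogeneous coordinate last,
# column-vector convention: transformed = T @ original
T = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, 7.0],
              [0.0, 0.0, 1.0]])

point  = np.array([2.0, 3.0, 1.0])   # w = 1: translation applies
vector = np.array([2.0, 3.0, 0.0])   # w = 0: translation is ignored

moved_point  = T @ point    # -> [7., 10., 1.]
moved_vector = T @ vector   # -> [2.,  3., 0.]
```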


But what if I have a 3D vector? Well... I end up with [x, y, z, 0], right?


If instead, we stored the homogeneous coordinate in the FIRST position, [0, x, y] for 2D, and [0, x, y, z] for 3D, etc. then it's just a sparse vector! Set the values you want to! [0] is the 0-vector in any number of dimensions!

[1] is the origin point in any number of dimensions!

Why did we put the homogeneous coordinate last in all our internal representations? It was so dumb!

I don't follow. What do we gain by moving the homogeneous coordinate from last position to first?

I don't understand this:

> then it's just a sparse vector! Set the variables you want to!

Or this:

> [1] is the origin point in any number of dimensions.

Could you clarify?

Also, I don't think this book even discusses homogeneous coordinates. It would be sort of unusual for this type of general text and the only mention of "homogeneous" in the index is "homogeneous equation."

I think what's meant is that by putting it first, you can always treat any point in P^n as a point in P^m by just ignoring the extra numbers if m < n or by treating the missing numbers as all being 0 if m > n. That is the point in P^2, [1 x y], can also be regarded as the point [1 x] in P^1, the point [1 x y 0] in P^3, the point [1 x y 0 0] in P^4, etc. This is in contrast to putting it last, where if you have [x y 1] in P^2 and you want the point in P^1 you need to allocate a new list [x 1], etc.

The vector is sparse in the sense that you can regard a point as being an infinitely long list of numbers of which we are sparsely giving only that prefix that is non-zero (like how you can regard a decimal numeral as being an infinitely long list of digits, all the ones that are missing being 0).

[1] is the origin point in any dimension because it is [1] in P^0, [1 0] in P^1, [1 0 0] in P^2, etc.
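In code, the homogeneous-coordinate-first layout being proposed might look like this (the helper name is made up):

```python
def lift(p, n):
    """Pad a homogeneous-first coordinate list out to P^n.

    Missing trailing coordinates are treated as 0, so [1] is the
    origin in any dimension and [0] is the zero vector.
    """
    return p + [0] * (n + 1 - len(p))

lift([1, 3, 4], 4)   # the point (3, 4) regarded as a point of P^4
lift([1], 2)         # the origin of P^2
```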

I hope you are just trolling. Homogeneous coordinates are for projection, not for selecting translation.

Homogeneous coordinates allow for affine transformations to be represented and performed with a matrix multiply. This includes perspective projection and translation.

One of the use cases for homogeneous coordinates is certainly to be able to achieve translation.

With due respect, who is "we" and what are you talking about? The book does not as much as mention a homogeneous coordinate and uses 2-arrays for 2D vectors.

I'm off on a tangent unrelated to the book.

"we" is computer programmers who deal with vectors and points. Mostly graphics, physics, games, etc.

Technically the projective coordinate (3,2,1) should be exactly the same as (6,4,2), and every nonzero multiple thereof. So it’s not really correct to say that (x, y, 0) represents a vector, or that adding these projective coordinates is vector addition. Vector addition is represented by an identity matrix with the x, y coordinates in the rightmost column.

Is theoretical linear algebra at the level of axler helpful for machine learning? If so, in what ways?

I'm a self taught programmer with a very weak maths background. What's the best learning path for me if I want to be able to understand and create ML based applications?

There's a practical course for this http://www.datasciencecourse.org/lectures/ anything you don't know, like linear algebra, look up the topics here for a 1-2hr crash course https://www.youtube.com/playlist?list=PLm3J0oaFux3aafQm568bl...

There's a playlist for a math background in ML for anybody who wants to try a more rigorous ML course https://www.youtube.com/playlist?list=PL7y-1rk2cCsA339crwXMW... More information, including recommended texts https://canvas.cmu.edu/courses/603/assignments/syllabus but don't let that list of prereqs discourage you, can easily look them up directly. You don't have to understand all of Linear Algebra to do matrix multiplication. There's plenty of ML books, papers and playlists on youtube for a full course in ML from dozens of universities https://www.cs.cmu.edu/~roni/10601/ (click on 2017 lectures)

Note: never trust YouTube or any other resource to be around forever. Make sure you archive everything before you start, as lectures tend to disappear (then seed them for others ^^).

If you have a really weak background go through this free book, refuse to not be able to complete it https://infinitedescent.xyz/

There are no answers because the author gives thanks to a grad course in evidence-based teaching where he claims the only way to really know something and remember it is to figure it out for yourself. Math Stack Exchange can help too.

> There are no answers because the author gives thanks to a grad course in evidence-based teaching where he claims the only way to really know something and remember it is to figure it out for yourself. Math Stack Exchange can help too.

This is a cop-out; of course, to really know something and remember it, you have to figure it out for yourself. But answers allow you to check whether your work was right and, if not, give you the opportunity to debug your work.

My best performance came in organic chemistry, where I looked for question banks (with answer keys) and solved problems extensively, perhaps bordering on obsessively. If I hadn't had an indicator that my final result was wrong, I would have missed out on many learning opportunities, and objectively my performance would have been worse. In general, I have found this strategy to make me an exceptional student.

If you don't benefit from an answer key, you're probably lazy and undisciplined. Alternatively, you have too much time on your hands, opting to rigorously confirm that each and every answer is correct.

In short, by not providing an answer key, you are denying the disciplined student the opportunity to efficiently learn.

> In short, by not providing an answer key, you are denying the disciplined student the opportunity to efficiently learn.

I agree with you 100%. But let me add this: in most cases, if you're studying with a book that doesn't have an answer key, you can supplement that text with exercises taken from somewhere else. For example, lots of course websites around the 'net post previous years exams / homework with answers. There are also books like Schaum's 3,000 Solved Problems in Calculus[1], The Humongous Book of Calculus Problems[2], 3,000 Solved Problems in Linear Algebra[3], etc.

Also, with books that are used as textbooks, and that provide an answer key but only to instructors... if you aren't averse to violating copyright and using certain pirate websites, those "instructor only" answer keys can often be found.

[1]: https://www.amazon.com/Schaums-Solved-Problems-Calculus-Outl...

[2]: https://www.amazon.com/Humongous-Book-Calculus-Problems-Book...

[3]: https://www.amazon.com/000-Solved-Problems-Linear-Algebra/dp...

He has extensive posts on his reasons, but the book is also used for a course, so only letting other professors have the answers (to allow reuse of exercises) is another reason. The Art of Problem Solving olympiad books don't have answers either, with the authors claiming the same reasoning, and in their defense I did learn a lot figuring things out myself. Personally, I too like the gratification of solving something, then seeing the answers and finding a different, and almost always more elegant/clear, proof to compare to mine.

How useful do you think is studying general statistics from, for example, OpenStat vs. directly learning from ML-related courses like the one you mentioned?

If you scroll down this channel's videos, you'll find lectures (36-705) that cover Chapters 1-12 of All of Statistics, given by the book's author: https://www.youtube.com/channel/UCu8Pv6IJsbQdGzRlZ5OipUg

I like the specific ML material since it's usually sliced into a semester's worth of material you can finish in a reasonable time. Most of these courses assume a background such as the All of Statistics book, and OpenStat is fine for this too.

Fantastic selection of material here. Also used that YouTube playlist myself and it was incredibly valuable. +1 for archiving these resources.

Thank you so much, this is perfect!

For calculus I-III, look at Professor Leonard on youtube. He is the best. His channel can be found at https://www.youtube.com/channel/UCoHhuummRZaIVX7bD4t2czg

His videos look great. How do you get in practice problems following his class?

I think he uses Calculus by Soo T. Tan, although you can probably match his videos to most calculus books, as they mostly follow the same order.

> 2-vector (x1,x2) can represent a location or a displacement in 2-D...

Isn’t this fundamentally faulty? Same notation describing a point and displacement. From this, we may conclude that, a point and a displacement are the same thing because they are described by the same notation. Shouldn’t mathematics be free of such contextual interpretation?

There's no issue with the notation; I think you've misunderstood the mathematical idea. Consider a more familiar algebraic object, a real number, x. This can model a length, an area, volume, time, time interval, temperature, weight, speed, physical constant, geometric ratio, fractional dimension, etc...

In mathematics, we abstract by forgetting about what the things are, and retain information about how they behave, and about what abstract properties they satisfy. The insight is that 2d locations and 2d displacements have the same abstract properties, which are modeled by a certain algebraic object: 2-vectors.

Thanks for the explanation. Makes sense. Something else I noticed: vector notation does not specify a coordinate system. V = (1, 2) is just an array of two numbers. The Cartesian-coordinate interpretation is a choice we make. Correct?

Yes, the keyword here is 'basis'. You represent a vector by giving two pieces of data, (1) an ordered list of coordinates, and (2) a basis. The vector is then a linear combination of the basis elements, and the coordinates tell you how to form that linear combination.

For example, let's use the standard Cartesian basis consisting of unit vectors e1, e2, e3 (which point north, east, and up, informally speaking). If our vector v is given by the coordinates (3,4,8) (with respect to the standard basis), then this means that v = 3 * e1 + 4 * e2 + 8 * e3.

If the coordinates were given with respect to a different set of basis vectors, then you would take the linear combination using those vectors instead. Note how similar a basis is to the base system we use for representing numbers. In base 10, the 'digits' of 348 mean that 348 = 3 * 100 + 4 * 10 + 8 * 1. In a different base, say base 9, the same digits 3, 4, 8 would instead mean 3 * 81 + 4 * 9 + 8 * 1 = 287.
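As a concrete sketch of the idea (using NumPy for illustration; the numbers are the ones from the comment above), the same coordinate list names different vectors depending on the basis you pair it with:

```python
import numpy as np

# Coordinates (3, 4, 8) with respect to the standard basis e1, e2, e3.
coords = np.array([3.0, 4.0, 8.0])
basis = np.eye(3)  # columns are e1, e2, e3

# The vector is the linear combination of the basis columns, weighted by coords.
v = basis @ coords
print(v)  # [3. 4. 8.]

# The same coordinates with respect to a different basis give a different vector.
other_basis = np.array([[1.0, 1.0, 0.0],
                        [0.0, 1.0, 1.0],
                        [0.0, 0.0, 1.0]])  # columns are the new basis vectors
w = other_basis @ coords
print(w)  # [ 7. 12.  8.]
```

The matrix-vector product is exactly the "linear combination of basis elements" described above: each coordinate scales one basis column.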

> You represent a vector by giving two pieces of data, (1) an ordered list of coordinates, and (2) a basis.

Ok, I understand. But as used in computer languages, a vector can simply be 2 numbers; no coordinates or basis are implied. That's what I meant.

Aha, yes. Computer languages borrowed the word 'vector', but they have basically nothing to do with the mathematical structure from linear algebra. It's best to keep them completely separate in your mind.

So in math, when we say "vector", a coordinate system is a given, as you explained?

If coordinates are given, then they will be given with respect to a basis. However, it's entirely possible to do things more abstractly, without introducing coordinates and bases to begin with, for example:


My advice would be not to get stuck on these “philosophical” questions, if your goal is to actually learn math, and instead just press on and keep learning and solving real problems. Eventually the fog will dissolve by itself, and these kinds of questions will seem to you either naive, devoid of any real substance, or just uninteresting compared to everything else you have learned.

No. This advice does not apply to me. I don't want to learn mathematics; I'm more interested in learning the parts of mathematics that interest me at the moment. And I think philosophy comes before mathematics. Or mathematics is the philosophy of quantities. Both philosophies are based on definitions.

A book on applied linear algebra with a focus on regression and no mention anywhere of the singular value decomposition??

This book has a lot of very interesting applications and seems to cover information not normally found in first books on Linear Algebra (e.g., makes use of calculus, Taylor series, etc) and the authors are EEs, not mathematicians. It doesn't, however, cover several topics normally covered in the first year of linear algebra (e.g., vector spaces, subspaces, nullspace, eigenvalues, singular values; see pp 461-462). As with most engineering books, there are no solutions provided.

An excellent supplement to other Linear Algebra textbooks. Given its focus on applications, will hold the interest of engineers and other technical folk but may not be loved by mathematicians who may prefer a more rigorous approach.

He does cover SVD in EE263, and all of the lectures for that course are freely available online.

Applied linear algebra is such a great idea. Linear algebra is relatively easy to understand and is used everywhere. But the material is so damn boring, since it's a lot of arithmetic. Even the homework problems are boring, since they have no specific purpose.

Typical LA courses in math departments have a bizarre focus on being able to do Gaussian elimination by hand and the like. It's not particularly useful or even mathematically interesting. LA courses would be so much more useful if they stuck to theory and did the computational applications on computers.

I found knowing the mechanics of Gaussian elimination to be really helpful when learning about algorithms like SVD or using Householder transformations. Knowing how the matrices became triangularized gave me something to hang the new information on.
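For anyone who wants those mechanics concretely, here is a minimal, deliberately naive forward-elimination sketch (no pivoting, so not numerically robust; a teaching illustration only, with `forward_eliminate` being my own helper name):

```python
import numpy as np

def forward_eliminate(A):
    """Reduce A to upper-triangular form by row operations.
    No pivoting -- a teaching sketch, not a robust solver."""
    U = A.astype(float).copy()
    n = U.shape[0]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]      # multiplier for this row
            U[i, k:] -= m * U[k, k:]   # zero out the entry below the pivot
    return U

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
U = forward_eliminate(A)
print(U)  # upper-triangular: everything below the diagonal is zero
```

Seeing each multiplier kill one subdiagonal entry is exactly the mental model that carries over to Householder triangularization, where a whole column below the pivot is zeroed at once.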

Just finished my intro linear algebra class yesterday.

The class was a bit more abstract in nature, so some of the chapters in this look like they could be nice application oriented follow ups to it!

How much overlap is there between this and Stanford's EE263?

Can someone in the know comment on the differences between what is covered?

Great Resource! But a good primer to Linear Algebra would be Gilbert Strangs course at MIT OCW.

I just flipped through some pages and saw that it is a great book. I wish my school had offered this.

I went through the slides. Super fun material! I've seen all the methods long ago, and much more deeply than in the slides, and have published on some of the most advanced material, but it was still fun because of the many examples and really good graphs.

From their other books, clearly they are real experts. The slides, then, are a careful path where minimal theory gives a LOT of nice applications. The theory they give is nearly always so simple that, in just a few lines, they are able to give essentially the proofs.

E.g., I never saw any mention of convexity, and these two guys are right at the top of the experts on the theory and applications of convexity, so it's clear they tried hard to get lots of applications from minimal theory.

They did next to nothing on numerical stability -- some mention might have been good.

There's a still easier derivation of the least squares normal equations, based on perpendicular projections -- they might have included that. That is, if you drop a golf ball to the floor, the shortest line from the ball to the floor is the one perpendicular to the floor. This fact generalizes.
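That perpendicularity picture is easy to check numerically (a sketch with made-up random data): the least-squares residual is orthogonal to every column of A, i.e. to the "floor" spanned by those columns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))   # tall system: 10 equations, 3 unknowns
b = rng.standard_normal(10)

# Normal equations: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x is perpendicular to every column of A,
# just as the golf ball's drop line is perpendicular to the floor.
residual = b - A @ x
print(A.T @ residual)  # numerically ~ the zero vector
```

The same x comes out of `np.linalg.lstsq`, which solves the problem by more numerically careful means.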

They have illustrated a nice, general lesson: you can do such applications with just finite dimensions and/or discreteness. You can do more theory with continuous instead of discrete values and infinite instead of finite dimensions, but even with that extra theory, often challenging, the computing commonly brings you back to discreteness and finiteness. Sooooo, just omit the more advanced theory and stay discrete and finite throughout -- that's one of the themes of the slides.

With this theme, the slides are able to do at least something interesting and potentially valuable from stacks of texts in pure and applied math, statistics, and more with just a few slides, simple math, nice graphs, and a few words. Nice.

E.g., they did a lot of applied statistics without mentioning probability theory! How'd they do that? They just stayed with the data and omitted describing the probabilistic context from which the data were samples or estimates. Cute. But, readers, be warned -- the probabilistic context should not be neglected; eventually you should learn that, too.

Another cute omission of theory: vector subspaces and, really, the axioms of a vector space. E.g., that "floor" I mentioned above is such a subspace. How'd they do that? They just stayed with the basic example vector spaces they had in mind and managed to avoid talking about subspaces.

At one point they touched on determinants for the 2 x 2 case, mentioned that the result is important (should be remembered, or some such), and noted that there is a more general approach so you don't have to remember it!!! Determinants have some value here and there -- e.g., they show some continuity results right away and have some nice connections with volumes -- but they are tricky to explain and CAN be omitted!!!

Uh, there is an easier proof of the Schwarz inequality, based on Bessel's inequality. Since they did enough with orthogonality to do Bessel's inequality, they could have used that approach to the Schwarz inequality -- I first saw it in P. Halmos.
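For reference, the argument alluded to can be sketched as follows (my reconstruction, not necessarily Halmos's exact presentation). Bessel's inequality says that for any orthonormal set $\{e_i\}$, $\sum_i |\langle x, e_i\rangle|^2 \le \|x\|^2$. Applying it to the single orthonormal vector $e = y/\|y\|$ (for $y \ne 0$):

```latex
\left|\left\langle x, \tfrac{y}{\|y\|} \right\rangle\right|^2 \le \|x\|^2
\;\;\Longrightarrow\;\;
|\langle x, y\rangle|^2 \le \|x\|^2\,\|y\|^2
\;\;\Longrightarrow\;\;
|\langle x, y\rangle| \le \|x\|\,\|y\|,
```

with the case $y = 0$ holding trivially.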

They didn't make clear the close connections among inner products, covariance, and correlation -- maybe some readers will see those connections from what is in the slides.
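One way to see that connection (an illustrative sketch with made-up data): after centering, covariance is a scaled inner product, and correlation is the cosine of the angle between the centered vectors.

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 2.0, 5.0, 9.0])

# Center the data. Sample covariance is the inner product of the centered
# vectors divided by n - 1; correlation is the cosine of the angle between them.
xc = x - x.mean()
yc = y - y.mean()

cov = xc @ yc / (len(x) - 1)
corr = (xc @ yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(np.isclose(cov, np.cov(x, y)[0, 1]))        # matches NumPy's covariance
print(np.isclose(corr, np.corrcoef(x, y)[0, 1]))  # matches NumPy's correlation
```

So the Cauchy-Schwarz inequality is exactly why correlation always lies in [-1, 1].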

They did the QR decomposition -- nice -- that is, for a square matrix A, we can write A = QR where Q is orthogonal and R is triangular. They used that to solve systems of linear equations but omitted Gaussian elimination and the associated approaches to numerical stability. For the Q, they emphasized the Gram-Schmidt process but neglected to mention that it's numerically unstable -- no wonder, since it commonly subtracts large numbers whose difference is small, the basic sin of numerical analysis.
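That instability is easy to demonstrate (a small sketch using a classic Läuchli-style example; `classical_gram_schmidt` is my own illustrative helper, not anything from the slides):

```python
import numpy as np

def classical_gram_schmidt(A):
    """Classical Gram-Schmidt: orthonormalize the columns of A."""
    n = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]  # subtract projections onto earlier q's
        Q[:, j] = v / np.linalg.norm(v)
    return Q

# An ill-conditioned matrix: the columns are nearly linearly dependent.
eps = 1e-8
A = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

Q_cgs = classical_gram_schmidt(A)
Q_hh, _ = np.linalg.qr(A)  # Householder-based, numerically stable

# How far is Q^T Q from the identity? (0 would mean perfect orthogonality.)
err_cgs = np.linalg.norm(Q_cgs.T @ Q_cgs - np.eye(3))
err_hh = np.linalg.norm(Q_hh.T @ Q_hh - np.eye(3))
print(err_cgs, err_hh)  # Gram-Schmidt loses orthogonality badly; QR does not
```

The cancellation happens exactly where the comment says: the projection subtraction removes almost all of each column, leaving a tiny difference of large numbers.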

Of course, the authors are EE profs, so it is interesting that another theme in the slides gets close to much of the work in what computer science calls machine learning. E.g., their few slides on using classification to recognize the handwritten digits 0-9 are really cute, especially the graph that overlays the sizes of the coefficients on the square of handwritten input data, so you can see which parts of the input are most relevant to the calculation. Cute.

Of course, there's much more to those fields that they omitted than included, but that's true also for even the best 5 star hotel luncheon buffet!!!

More fun stuff at


> determinants... CAN be omitted

Also see http://www.axler.net/DwD.html.

Looks like a nice paper!

The paper says how to go beyond what is in Boyd, et al., i.e., eigenvalues, eigenvectors, the spectral decomposition, etc. without determinants. Nice!

For that material I would have been tempted just to use the old approach of determinants and the roots of the characteristic polynomial, the Cayley-Hamilton theorem, etc.

Saved the paper! Thx.

And if you really really love determinants there's


(warning: not for undergraduates)

There is also the chapter in P. Halmos, Finite-Dimensional Vector Spaces, on multilinear algebra. At the time I read it, I took it as an abstract approach to determinants, and maybe a start on the exterior algebra of differential forms, but maybe there's a long-shot chance that that Halmos chapter is related to multi-dimensional determinants.

Can't read ALL the books on the shelves of the research libraries, or even all the recent ones, so you have to be selective -- to focus, or, as a startup entrepreneur, before spending hundreds of hours on such a book (hope the author got tenure), to ask "Why should I?".

I am sure Gelfand, Kapranov and Zelevinsky given their other math accomplishments all got tenure track positions when they emigrated. Will give Halmos another look.

That can't still be THE Gel'fand -- the one who, along with Kolmogorov, was a professor of E. Dynkin? Must be a great-grandson or some such.

He passed recently, but yes it’s that one. The book is from 1994 and the research is from just before USSR fell.

Proceed with caution..

Prof. Boyd is a great teacher! I highly recommend his course on linear dynamical systems [0] and convex optimization [1] too.

The former is beginner stuff, and while convex optimization is more advanced, both are very engaging and clearly explained, with lots of anecdotes and practical examples!

[0] https://see.stanford.edu/Course/EE263

[1] https://www.youtube.com/playlist?list=PL3940DD956CDF0622

Kind of inauspicious to see this kind of thing on page 5:

A (standard) unit vector is a vector with all elements equal to zero, except one element which is equal to one

Huh? I've never heard the term "standard vector" before, and a "unit vector" is a vector whose magnitude is one. There is no requirement that one element be equal to one.

Maybe that's why the download is free...

> A (standard) unit vector...

I know this as "an element of the standard basis," B = {e_1, e_2, ...}, where e_1 = (1,0,0,...), e_2 = (0,1,0,0,...). You could view it as inauspicious that the treatment doesn't begin with abstract vector spaces, but there is always Axler.

For what it's worth, I find it inauspicious that after taking three (pure-math oriented) Linear Algebra courses I never saw least squares nor the SVD. I'm looking forward to taking a look at Prof Boyd's book.

Point being, their definition is just plain wrong. If that's how the authors describe a unit vector, I don't think this is the book you want to use to learn about SVD.

Well, the way I see it, when you teach applied science you often want to sacrifice some rigor so that your students could focus on what was intended to be learned in the first place.

But there is the requirement that the other elements be equal to zero. It's right there in the sentence you quoted.

It's right there in the sentence you quoted.

True enough, thanks -- fixed.

Still a completely bogus definition.

The definition (calling those unit vectors which are members of the standard basis “standard unit vectors”) sounds perfectly reasonable to me.

You haven’t given any good reason why you think it’s “completely bogus” or “plain wrong”.

What is your personal definition of the term "unit vector?" Do you suppose there's a reason why no other textbook or Web site defines it the way these authors do?

Are there only N unit N-vectors, as the book says, or are there an infinite number of them?

My definition of unit vectors is the same as yours. However, the book does not say there are only N unit N-vectors. It says there are only N STANDARD unit N-vectors.

In math, “Adjective X” usually means something more specific than “X”. “Prime numbers” are a subset of “numbers”, and so on. Just like in this case, “standard unit vectors” are a subset of “unit vectors”.
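A tiny illustration of that subset relationship (NumPy, my own example): every standard unit vector has norm 1, but so do infinitely many other vectors.

```python
import numpy as np

# The standard unit vectors in R^3 are the columns of the identity matrix.
E = np.eye(3)

# Each one is a unit vector: its norm is 1.
print(np.linalg.norm(E, axis=0))  # [1. 1. 1.]

# But not every unit vector is standard -- there are infinitely many,
# e.g. any vector divided by its own norm:
u = np.array([3.0, 4.0, 0.0]) / 5.0
print(np.linalg.norm(u))  # 1.0
```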

So why don't they tell readers what unit vectors actually are, in the general case? It's a rather important elementary concept, isn't it?

Why bother describing a specific case using ambiguous language (the parentheses) while omitting the one property that actually makes a unit vector a unit vector?

Bad writing in a math textbook is a weird thing to defend.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact