Here is a nice "cheat sheet" that introduces many math concepts needed for ML: https://ml-cheatsheet.readthedocs.io/en/latest/

> As soft prerequisites, we assume basic comfortability with linear algebra/matrix calc [...]

That's a bit of an understatement. I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.

Book plug: I wrote the "No Bullshit Guide to Linear Algebra," which is a compact little brick that reviews high school math (for anyone who is "rusty" on the basics), covers all the standard LA topics, and also introduces dozens of applications. Check the extended preview here https://minireference.com/static/excerpts/noBSguide2LA_previ... and the Amazon reviews https://www.amazon.com/dp/0992001021/noBSLA#customerReviews

 > I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.

To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and of the recent breakthroughs with deep neural networks (tremendously exciting, but a modest success in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.

Further, for a lot of people doing "data science" (and not using neural networks out the wazoo), I think they can abstract away several linear-algebra-based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".
 > To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

And of course, you're not going to get very far with probability theory and stochastic processes unless you have a mature understanding of analysis and measure theory :)

This comment exchange neatly demonstrates the intrinsic problem. Most of these articles start off much like this one does: by assuming "basic comfortability with linear algebra." That sounds straightforward, but most software engineers don't have it. They haven't needed it, so they haven't retained it even if they learned it in college. It takes a good student a semester in a classroom to achieve that "comfortability," and for most it doesn't come until a second course or after revisiting the material.

If you don't already have it, you can't just use StackExchange to fill in the blanks. The random-walk method of learning math doesn't really pan out for advanced material, because it all builds on prior definitions. Then people like you make a comment to point out (correctly) that probability theory is just as important for all the machine learning that isn't just numerical optimization. But unless you want to restrict yourself to basic statistics and discrete probability, you're going to have a bad time working on probability without analysis. And analysis is going to be a pain without calculus, and so on and so forth.

There are certain things you need to spend a lot of time learning. Engineering and mathematics are both like that. But I think many of these articles do a disservice by implying that you can cut down on the learning time for the math if you have engineering experience. That's really not the case. If you're working in machine learning and you need to know linear algebra (i.e. you can't just let the underlying library handle that for you), you can't just pick and choose what you need.
You need to have a robust understanding of the material. There isn't a royal road.

I think it's really great that people like the author (who is presumably also the submitter) want to write these kinds of introductions. But at the same time, the author is a research assistant in the Stanford AI Lab. I think it's fair to say he may not have a firm awareness of how far most software engineers are from the prerequisites he outlined. And by extension, I don't think most people know what "comfortability with linear algebra" means if they don't already have it. It's very hard to enumerate your unknown unknowns in this territory.
 I get what you are saying, but is the right way to learn math to follow a "connected path"? I've heard "The Art of Problem Solving" series works through math in the correct order, but I'm not sure how far I would get on that alone. Right now I'm trying to gain intuition in linear algebra via OCW with Strang, but I would like to truly understand it. Is the only way to do a second bachelor's in math?
 You don't need to do a second bachelor's -- you really need four or so courses. If you have the patience and dedication, you can sit down with the textbooks and work through them on your own.
 This.

There's always more you might want to learn, but when people talk about these basics, it's really just about being super focused in four or so classes, not a whole Ivy League undergrad curriculum in math. Probability & stats, multivariable calculus, and linear algebra will take you a long way.
 Cool. I will look into those, but I was asking out of general interest in math. I actually have no interest in machine learning. I'm bored of chasing money. Interested in 3D computer graphics and math for math's sake.
 > They haven't needed it, so they haven't retained it even if they learned it in college.True for me. I knew all of these from my course work when I graduated with my CS degree in 1996. I haven't used them at all in my career, and so I'd be starting basically from scratch re-learning them.
 Can you recommend books and online courses to hammer these concepts down? I used PCA and k-means for my master's thesis but didn't really understand how they work under the covers.
 As is mentioned in this thread, Linear Algebra Done Right is a solid textbook for learning linear algebra. I might start there =).
 > to achieve that "comfortability"

"comfort" is a perfectly cromulent word for this.
 I was quoting the article; but thank you, I didn't know that. Good to know.
 Ah, my mistake. I must have edited it right out when I read the thing and took the quotes for 'I know I'm making up a word but can't think of anything better right this second'.
 > To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

For intuition, particularly if you care about vision applications, I think one field of math which is severely underrated by the community is group theory. Trying to understand methods which largely proceed by divining structure, without first trying to understand symmetry, has to be a challenge.

I'm biased; my training was as a mineralogist and crystallographer! But the serious point here is that much of the value of math is as a source of intuition and useful metaphor. Facility with notation is pretty secondary.
 Can you talk about the use of group theory for computer vision or crystallography a bit? I'm familiar with the math, but I'm not familiar with group theory's applications in those areas. That sounds pretty interesting. Is it primarily group theory, or does it venture into abstract algebra more generally?
 For crystallography, the use of group theory in part originates in X-ray crystallography [1], where the goal is to take 2D projections of a repeating 3D structure (a crystal) and use those, along with other rules that you know, to re-infer what the 3D structure is.

Repeating structures have symmetries, so seeing the symmetries in your diffraction pattern informs you of the possible symmetries (and hence possible arrangements) in your crystal. Group theory is the study of symmetry.

By the way, this is also how the structure of DNA was inferred [2], although not from a crystal.
 > use that along with other rules that you know to re-infer what the 3D structure is

Great answer, thank you :-) Saved me a bunch of typing to explain it less well than you just did.

It's worth adding, for this crowd, that another way of thinking about the "other rules" you allude to is as a system of constraints; you can then set this up as an optimization problem (find the set of atomic positions minimizing reconstruction error under the set of symmetry constraints implied by the space group) -- which means that solving crystal structures and machine learning are functionally isomorphic problems.
 I thought the work on the structure of DNA used Fourier analysis more than group theory.I know harmonic analysis in general combines the two, but I'm sure Crick and Watson could have done their work without knowing the definition of a group.
 And by Crick and Watson you mean Crick, Watson, Franklin, and Wilkins, right? It's fairly clear all four deserve at least partial authorship by modern standards. James Watson was a piece of work.

Crick was certainly familiar with the crystallographic space groups; he was a student of Lawrence Bragg (https://en.wikipedia.org/wiki/Lawrence_Bragg), who is the youngest-ever Nobel laureate in physics -- winning it with his father for more or less inventing X-ray crystallography. It's mostly 19th-century mathematics, after all.
 For ML, you need both -- probability to justify the setup of the problem, and linear algebra and calculus to optimize for a solution.

A simple example is linear regression: find w such that the squared l2 norm of (Xw - y) is minimized.

Linear algebra will help with generalizing to n data points, and calculus will help with taking the gradient and setting it equal to 0.

Probability will help with understanding why the squared l2 norm is an appropriate cost function: we assumed y = Xw + z, where z is Gaussian, and tried to maximize the likelihood of seeing y given X.

I'm sure there are more examples of this duality, since linear regression is one of the more basic topics in ML.
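To make the duality concrete, here's a small numpy sketch of that exact linear regression setup (all the names and data here are made up for illustration -- the comment above doesn't specify any code): setting the gradient of ||Xw - y||^2 to zero gives the normal equations X^T X w = X^T y, and solving them recovers the Gaussian maximum-likelihood estimate of w.

```python
import numpy as np

# Toy data: n points, d features, generated as y = X w_true + Gaussian noise,
# matching the probabilistic model y = Xw + z described above.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

# Calculus + linear algebra: the gradient of ||Xw - y||^2 is 2 X^T (Xw - y);
# setting it to zero gives the normal equations  X^T X w = X^T y.
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Sanity check: the gradient should vanish (up to floating point) at w_hat,
# and w_hat should be close to w_true since the noise is small.
grad = 2 * X.T @ (X @ w_hat - y)
print(w_hat)
print(np.abs(grad).max())
```

The probabilistic side is what justifies the squared l2 norm in the first place: with Gaussian z, maximizing the likelihood of y is the same as minimizing the sum of squared residuals.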
 Awesome. I had a really difficult time with math in HS, and never pursued it at all in college, so even though I'm a programmer my math skills are barely at a high school level.I'd love to get into ML but the math keeps me at bay.
 We should find or start a Slack / Discord where we go through a math textbook and conquer our fear of mathematics together.
 Oh man, this needs to exist.
 Seconded/Thirded.

Hell, I have a CS degree, and my Maths knowledge is horrific. I didn't take Maths at A Level, so I stopped learning any Maths at 16. The course had Maths in it, but it's surprisingly easy to brute-force a solution, and that was a good ten years ago now.

My goal for years has been to learn enough Maths to be able to read Introduction to Algorithms and TAOCP without aid, and more recently to be able to better understand ML, but the more I try, the more I realise it's a multi-year investment.
 Fourthed(?)

What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?

Why is the dot product equivalent to ||a||*||b|| cos C? I really have no idea; I just know the formulas.
 > What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?

That's a really good question!

Essentially, when some function (synonym for transformation) is linear, this tells you that it has "linear structure," which turns out to be a very useful property to know about that function. You can combine the two facts you mentioned above to obtain the equation

    f(a*x + b*y) = a*f(x) + b*f(y)

for which the interpretation is that the linear combination of inputs a*x + b*y is transformed into the SAME linear combination of outputs a*f(x) + b*f(y).

Suppose now that the input space of f can be characterized by some finite set of "directions," e.g. the x- and y-directions in case f is a 2D transformation, or perhaps the x-, y-, and z-directions if f is a 3D transformation. If f is a 3D transformation, then using the linearity of f, it is possible to completely understand what f does by "probing" it with three "test inputs," one along each direction. Just input x, y, and z, and record the three outputs f(x), f(y), and f(z). Since you know f is linear, this probing with three vectors is enough to determine the output of f for any other input ax + by + cz -- the output will be af(x) + bf(y) + cf(z).

See the same explanations as above but in more detail here: https://minireference.com/static/excerpts/noBSguide2LA_previ...

So why is this important? Well, this "probing with a few input directions" turns out to be really useful. Basically, if f is non-linear, it's super complicated: there is no simple way to describe its outputs for different inputs. But if f is linear, then the "probing procedure" works.
Furthermore, since both the inputs and the outputs have the form of a linear combination (a constant times something + another constant times another thing + a third constant times a third thing), you can arrange these "things" into an array called a matrix, and define an operation called "matrix multiplication" which computes that linear combination of the outputs when you give it the constants as an array (vector) of inputs.

In summary, linear algebra is a bunch of machinery for expressing various transformations in terms of vectors and matrices, in order to help with modelling various real-world phenomena ranging from computer graphics, biology, chemistry, graphs, and crypto, to, yes, even ML ;)
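The probing procedure described above can be sketched in a few lines of numpy (a toy sketch -- the rotation function here is just one arbitrary example of a linear map, not anything specific from the thread):

```python
import numpy as np

def f(v):
    # An example linear map: rotation by 90 degrees in the xy-plane.
    # Any function satisfying f(a*x + b*y) = a*f(x) + b*f(y) would work.
    x, y, z = v
    return np.array([-y, x, z])

# "Probe" f with the three coordinate directions (the rows of the identity)...
e1, e2, e3 = np.eye(3)
M = np.column_stack([f(e1), f(e2), f(e3)])  # the matrix representation of f

# ...and now M reproduces f on ANY input, by linearity:
v = np.array([2.0, -1.0, 0.5])
assert np.allclose(M @ v, f(v))
```

Three probes fully determine f: matrix multiplication M @ v is exactly the "same linear combination of outputs" that the linearity equation promises.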
 holy crap that's an amazing idea
 Let’s see where it goes
 "Check this shit out:" hahaha, I will buy this :)
 I also recommend Linear Algebra Done Right 3rd Ed
