Learning Math for Machine Learning (ycombinator.com)
695 points by vincentschen 15 days ago | 117 comments

Here is a nice "cheat sheet" that introduces many math concepts needed for ML: https://ml-cheatsheet.readthedocs.io/en/latest/

> As soft prerequisites, we assume basic comfortability with linear algebra/matrix calc [...]

That's a bit of an understatement. I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.

Book plug: I wrote the "No Bullshit Guide to Linear Algebra" which is a compact little brick that reviews high school math (for anyone who is "rusty" on the basics), covers all the standard LA topics, and also introduces dozens of applications. Check the extended preview here https://minireference.com/static/excerpts/noBSguide2LA_previ... and the amazon reviews https://www.amazon.com/dp/0992001021/noBSLA#customerReviews

> I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.

To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.

Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".

> To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

And of course, you're not going to get very far with probability theory and stochastic processes unless you have a mature understanding of analysis and measure theory :)

This comment exchange neatly demonstrates the intrinsic problem. Most of these articles start off much like this one does: by assuming "basic comfortability with linear algebra." That sounds straightforward, but most software engineers don't have it. They haven't needed it, so they haven't retained it even if they learned it in college. It takes a good student a semester in a classroom to achieve that "comfortability", and for most it doesn't come until a second course or after revisiting the material.

If you don't already have it, you can't just use StackExchange to fill in the blanks. The random walk method to learning math doesn't really pan out for advanced material because it all builds on prior definitions. Then people like you make a comment to point out (correctly) that probability theory is just as important for all the machine learning that isn't just numerical optimization. But unless you want to restrict yourself to basic statistics and discrete probability, you're going to have a bad time working on probability without analysis. And analysis is going to be a pain without calculus, and so on and so forth.

There are certain things you need to spend a lot of time learning. Engineering and mathematics are both like that. But I think many of these articles do a disservice by implying that you can cut down on the learning time for the math if you have engineering experience. That's really not the case. If you're working in machine learning and you need to know linear algebra (i.e. you can't just let the underlying library handle that for you), you can't just pick and choose what you need. You need to have a robust understanding of the material. There isn't a royal road.

I think it's really great people like the author (who is presumably also the submitter) want to write these kinds of introductions. But at the same time, the author is a research assistant in the Stanford AI Lab. I think it's fair to say he may not have a firm awareness of how far most software engineers are from the prerequisites he outlined. And by extension, I don't think most people know what "comfortability with linear algebra" means if they don't already have it. It's very hard to enumerate your unknown unknowns in this territory.

I get what you are saying, but is the right way to learn math a "connected path"? I've heard "The Art of Problem Solving" series works through math in the correct way, but I'm not sure how far I would get on that alone. Right now I'm trying to gain intuition in linear algebra via OCW with Strang, but I would like to truly understand. Is the only way just to do a second bachelor's in math?

You don't need to do a second bachelors - you really need four or so courses. If you have the patience and dedication you can sit down with the textbooks and work through them on your own.


There's always more you might want to learn, but when people talk about these basics, it's really just being super focused in 4 or so classes, not a whole ivy league undergrad curriculum in math.

probability & stats, multivariable calculus, and linear algebra will take you a long way.

Cool. I will look into those, but I was asking as a general interest in math question. I actually have no interest in machine learning. I'm bored of chasing money. Interested in 3D computer graphics and math for math's sake.

> They haven't needed it, so they haven't retained it even if they learned it in college.

True for me. I knew all of these from my course work when I graduated with my CS degree in 1996. I haven't used them at all in my career, and so I'd be starting basically from scratch re-learning them.

Can you recommend books and online courses to hammer these concepts down? I used PCA and k-means for my masters thesis but didn’t really know how well they work under the covers.

As is mentioned in this thread, Linear Algebra Done Right is a solid textbook for learning linear algebra. I might start there =).

> to achieve that "comfortability"

"comfort" is a perfectly cromulent word for this.

I was quoting the article; but thank you, I didn't know that. Good to know.

Ah, my mistake. I must have edited it right out when I read the thing and took the quotes for 'I know I'm making up a word but can't think of anything better right this second'.

> To play devil's advocate, probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.

For intuition, particularly if you care about vision applications, I think one field of math which is severely underrated by the community is group theory. Trying to understand methods which largely proceed by divining structure without first trying to understand symmetry has to be a challenge.

I'm biased; my training was as a mineralogist and crystallographer! But the serious point here is that much of the value of math is as a source of intuition and useful metaphor. Facility with notation is pretty secondary.

Can you talk about the use of group theory for computer vision or crystallography a bit? I'm familiar with the math but I'm not familiar with group theory's applications in those areas. That sounds pretty interesting. Is it primarily group theory, or does it also venture into abstract algebra more generally?

For crystallography, the use of group theory in part originates in X-ray crystallography [1], where the goal is to take 2D projections of a repeating 3D structure (crystal), and use that along with other rules that you know to re-infer what the 3D structure is.

Repeating structures have symmetries, so seeing the symmetries in your diffraction pattern informs you of the possible symmetries (and hence possible arrangements) in your crystal. Group theory is the study of symmetry.
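
To make "group theory is the study of symmetry" a bit more concrete, here is a minimal sketch using a toy 2D square rather than a real crystal. It checks that the four quarter-turn rotations each map the square's corners onto themselves, and that composing or undoing rotations never leaves that set of four. The names `CORNERS`, `rotate90`, and `rotation` are made up for this illustration and do not come from any crystallography library.

```python
# Toy illustration (not real crystallography): the rotational symmetries
# of a square form a group under composition.

CORNERS = {(1, 1), (-1, 1), (-1, -1), (1, -1)}  # square centred at the origin


def rotate90(point):
    """Rotate a 2D integer point by 90 degrees counter-clockwise."""
    x, y = point
    return (-y, x)


def rotation(k):
    """Return the function that applies k quarter turns."""
    def apply(point):
        for _ in range(k % 4):
            point = rotate90(point)
        return point
    return apply


rotations = {k: rotation(k) for k in range(4)}

# Each rotation is a symmetry: it maps the set of corners onto itself.
is_symmetry = all({r(p) for p in CORNERS} == CORNERS for r in rotations.values())

# Closure: quarter turns j then k behave like quarter turn (j + k) mod 4.
is_closed = all(
    rotations[j](rotations[k](p)) == rotations[(j + k) % 4](p)
    for j in range(4) for k in range(4) for p in CORNERS
)

# Inverses: k quarter turns are undone by (4 - k) quarter turns.
has_inverses = all(
    rotations[(4 - k) % 4](rotations[k](p)) == p
    for k in range(4) for p in CORNERS
)

print(is_symmetry, is_closed, has_inverses)
```

A real space group also includes translations, reflections, and screw axes, so this only sketches the idea of "a set of symmetry operations closed under composition".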

By the way, this is also how the structure of DNA was inferred [2], although not from a crystal.

[1] https://en.wikipedia.org/wiki/X-ray_crystallography#Crystal_...

[2] https://www.dnalc.org/view/15014-Franklin-s-X-ray-diffractio...

> use that along with other rules that you know to re-infer what the 3D structure is

Great answer, thank you :-) Saved me a bunch of typing to explain it less well than you just did.

It's worth adding, for this crowd, that another way of thinking about the "other rules" you allude to is as a system of constraints; you can then set this up as an optimization problem (find the set of atomic positions minimizing reconstruction error under the set of symmetry constraints implied by the space group). That means that solving crystal structures and machine learning are functionally isomorphic problems.

I thought the work on the structure of DNA used Fourier analysis more than group theory.

I know harmonic analysis in general combines the two, but I'm sure Crick and Watson could have done their work without knowing the definition of a group.

And by Crick and Watson you mean Crick, Watson, Franklin and Wilkins, right? It's fairly clear all four deserve at least partial authorship by modern standards. James Watson was a piece of work.


Crick was certainly familiar with the crystallographic space groups; he was the student of Lawrence Bragg (https://en.wikipedia.org/wiki/Lawrence_Bragg), who is the youngest ever Nobel laureate in physics, having won it with his father for more or less inventing X-ray crystallography. It's mostly 19th-century mathematics, after all.

For ML, you need both—probability to justify the setup of the problem, and linear algebra and calculus to optimize for a solution.

A simple example is with linear regression: find w such that the squared l2 norm of (Xw - y) is minimized.

Linear algebra will help with generalizing to n data points; and calculus will help with taking the gradient and setting equal to 0.

Probability will help with understanding why the squared l2 norm is an appropriate cost function; we assumed y = Xw + z, where z is Gaussian, and tried to maximize the likelihood of seeing y given X.

I’m sure there’s more examples of this duality since linear regression is one of the more basic topics in ML.
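
As a rough sketch of the linear regression example above, the snippet below fits y ~ w0 + w1 * x by taking the gradient of the squared l2 norm of (Xw - y), setting it to 0, and solving the resulting normal equations (X^T X) w = X^T y. The data points and the helper `squared_error` are invented for illustration. Under the Gaussian-noise assumption the negative log-likelihood equals this squared error up to constants, so the same w also maximizes the likelihood.

```python
# Least squares for y ~ w0 + w1 * x via the normal equations (X^T X) w = X^T y.
# Plain Python, two parameters only, so the 2x2 system is solved by hand.

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # made-up data lying exactly on y = 1 + 2x

n = len(xs)
# Entries of X^T X and X^T y when each row of X is [1, x_i].
s_x = sum(xs)
s_xx = sum(x * x for x in xs)
s_y = sum(ys)
s_xy = sum(x * y for x, y in zip(xs, ys))

# Solve [[n, s_x], [s_x, s_xx]] @ [w0, w1] = [s_y, s_xy] with Cramer's rule.
det = n * s_xx - s_x * s_x
w0 = (s_y * s_xx - s_x * s_xy) / det
w1 = (n * s_xy - s_x * s_y) / det


def squared_error(a, b):
    """Squared l2 norm of (Xw - y) for w = [a, b]."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))


print(w0, w1, squared_error(w0, w1))
```

With n data points and more features the same equations hold, but you would hand the solve to a linear algebra library rather than write out Cramer's rule.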

Awesome. I had a really difficult time with math in HS, and never pursued it at all in college, so even though I'm a programmer my math skills are barely at a high school level.

I'd love to get into ML but the math keeps me at bay.

We should find or start a Slack / Discord where we go through a math textbook and conquer our fear of mathematics together.

Oh man, this needs to exist.


Hell, I have a CS degree, and my Maths knowledge is horrific. I didn't take Maths at A Level, so I stopped learning any Maths at 16. The course had Maths, but it's surprisingly easy to brute force a solution, and that was a good ten years ago now.

My goal for years has been to learn enough Maths to be able to read Introduction to Algorithms and TAOCP without aid, and recently to be able to better understand ML, but the more I try the more I realise that it's a multi-year investment.


What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?

Why is the dot product equivalent to ||a||*||b|| cos C? I really have no idea, I just know the formulas.
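
One way to see the dot product identity in 2D, sketched under the assumption that each vector is written in polar form: if a = ||a|| (cos α, sin α) and b = ||b|| (cos β, sin β), then a . b = ||a|| ||b|| (cos α cos β + sin α sin β) = ||a|| ||b|| cos(α - β), and α - β is the angle C between them. The snippet below only checks this numerically for one arbitrary pair of vectors; the helper names are made up.

```python
import math

# Numerically compare the two sides of a . b = ||a|| * ||b|| * cos(C) in 2D.
a = (3.0, 4.0)
b = (2.0, 1.0)  # arbitrary example vectors


def dot(u, v):
    """Component-wise formula for the dot product."""
    return u[0] * v[0] + u[1] * v[1]


def norm(u):
    """Length of a vector."""
    return math.sqrt(dot(u, u))


# Angle of each vector measured from the x-axis, and the angle C between them.
alpha = math.atan2(a[1], a[0])
beta = math.atan2(b[1], b[0])
angle_between = alpha - beta

lhs = dot(a, b)
rhs = norm(a) * norm(b) * math.cos(angle_between)
print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding.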

> What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?

That's a really good question!

Essentially, when some function (synonym for transformation) is linear, what this tells you is that it has "linear structure," which turns out to be a very useful property to know about that function.

You can combine the two facts you mentioned above to obtain the equation

   f(a*x + b*y) = a*f(x) + b*f(y)
for which the interpretation is that the linear combination of inputs ax + by is transformed into the SAME linear combination of outputs af(x) + bf(y).

Suppose now that the input space of f can be characterized by some finite set of "directions" e.g. the x- and y-directions in case f is a 2D transformation, or perhaps the x-, y-, and z-directions if f is a 3D transformation. If f is a 3D transformation, using the linear property of f, it is possible to completely understand what f does by "probing" it with three "test inputs," one along each direction. Just input x, y, and z, and record the three outputs f(x), f(y), and f(z). Since you know f is linear, this probing with three vectors is enough to determine the output of f for any other input ax + by + cz --- the output will be af(x) + bf(y) + cf(z).
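
The probing idea can be sketched in a few lines of Python. The map `f` below is an arbitrary linear function picked only for illustration, and `scale` and `add` are hypothetical helpers; the point is that the three recorded outputs are enough to predict f on any other input.

```python
# Probe a 3D linear map with the three basis directions, then predict its
# output on any other input from those three recorded outputs alone.

def f(v):
    """An arbitrary linear map R^3 -> R^3, chosen only for illustration."""
    x, y, z = v
    return (2 * x + z, x - y, 3 * y + 4 * z)


def scale(c, v):
    """Multiply every component of v by the constant c."""
    return tuple(c * vi for vi in v)


def add(*vectors):
    """Component-wise sum of any number of vectors."""
    return tuple(sum(components) for components in zip(*vectors))


# The three "test inputs", one along each direction.
ex, ey, ez = (1, 0, 0), (0, 1, 0), (0, 0, 1)
f_x, f_y, f_z = f(ex), f(ey), f(ez)

# Any input a*ex + b*ey + c*ez ...
a, b, c = 5, -2, 7
direct = f((a, b, c))
# ... must map to a*f(ex) + b*f(ey) + c*f(ez).
predicted = add(scale(a, f_x), scale(b, f_y), scale(c, f_z))

print(direct, predicted)
```

Both tuples come out as (17, 7, 22), even though `predicted` never calls `f` on the new input.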

See the same explanations as above but in more detail here: https://minireference.com/static/excerpts/noBSguide2LA_previ...

So why is this important? Well this "probing with a few input directions" turns out to be really useful. Basically, if f is non-linear, it can be very complicated and there may be no simple way to describe what its outputs are for different inputs, but if it is linear then the "probing procedure" works. Furthermore, since both the inputs and outputs have the form of a linear combination (a constant times something + another constant times another thing + a third constant times a third thing), you can arrange these "things" into an array called a matrix and define an operation called "matrix multiplication" which performs the constant-times-something operation on the outputs when you give the constants as an array (vector) of inputs.
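
Here is a small sketch of that last step, assuming the probed outputs are stored as the columns of a matrix. The helpers `matvec` and `probe` and the two example maps are invented for illustration. Multiplying the matrix by the vector of constants reproduces the linear map, while the same recipe fails for a non-linear one.

```python
# Store the probed outputs as the columns of a matrix; matrix-vector
# multiplication then reproduces the map. A non-linear map has no such table.

def matvec(columns, coeffs):
    """Multiply a matrix (given as a list of columns) by a vector of constants."""
    rows = len(columns[0])
    return tuple(
        sum(c * col[i] for c, col in zip(coeffs, columns)) for i in range(rows)
    )


def probe(func, dim):
    """Outputs of func on the standard basis directions, used as matrix columns."""
    basis = [tuple(1 if i == j else 0 for i in range(dim)) for j in range(dim)]
    return [func(e) for e in basis]


def linear(v):
    x, y = v
    return (x + 2 * y, 3 * x - y)


def nonlinear(v):
    x, y = v
    return (x * y, x * x)


v = (2, 5)
linear_ok = matvec(probe(linear, 2), v) == linear(v)
nonlinear_ok = matvec(probe(nonlinear, 2), v) == nonlinear(v)
print(linear_ok, nonlinear_ok)
```

The first comparison holds and the second does not: two probes pin down `linear` completely, but say almost nothing about `nonlinear`.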

In summary, linear algebra is a bunch of machinery for expressing various transformations in terms of vectors and matrices in order to help with modelling various real-world phenomena ranging from computer graphics, biology, chemistry, graphs, crypto, etc. Even ML ;)

holy crap that's an amazing idea

Let’s see where it goes


"Check this shit out:" hahaha, I will buy this :)

I also recommend Linear Algebra Done Right 3rd Ed

I think a lot of people need to start from the basics because they don't have a good foundation in math. The core problem is schools will push you along if you can somehow produce the correct answer for 70% of the problems on a test. Combine this with intense pressure not to fail and you will very likely end up in higher level math courses with many gaping holes in your foundational knowledge. You thus end up relying on tricks and memorization rather than useful understanding. Here is a TED talk where Sal Khan of Khan Academy talks about this: https://www.youtube.com/watch?v=-MTRxRO5SRA

After struggling to understand advanced math in a lot of different contexts I decided to go through the entire K-12 set of exercises on Khan Academy. I blazed through the truly elementary stuff like counting and addition in a few hours, but I was surprised at how quickly my progress started slowing down. I found I could not solve problems involving negative numbers with 100% accuracy. Like (5 + (-6) - 4). I would get them right probably 90% of the time but the thing is Khan Academy doesn't grant you the mastery tag unless you get them right 100% of the time. I found most of my problems were due to sloppy mental models. Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..." It was around the 8th grade level that I found myself having to actually work hard. (What does Pi even mean?) And I've been through advanced Calculus courses at the university level.

> Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..."

In case you (or others reading this) still struggle to formalize division, a very nice way to conceptualize it is as the inverse of multiplication. This neatly sidesteps the problem of trying to figure out a clean analogue for what it means to multiply a fraction of something by another fraction of something, since the intuitive group-adding idea of multiplication sort of breaks down with ratios.

Addition is a straightforward operation, but subtraction is trickier. For all real x there exists an additive inverse -x satisfying x + (-x) = 0. So to subtract 3 from 4 we instead take the sum 4 + (-3) = 1.

Likewise to multiply 3 by 4 we add four groups of 3: 3 + 3 + 3 + 3 = 12. We accomplish division by using a multiplicative inverse: for all real x there exists a 1/x such that x(1/x) = 1.

So (3/4) / (5/6) is equal to (3 * 1/4) / (5 * 1/6). In other words, take the multiplicative inverse of 4 and 6 and multiply them by 3 and 5 respectively. Then multiply the first product by the inverse of the second product.

This is the axiomatic basis of division as "repeated subtraction": subtraction is the sum of a number and another number's additive inverse, and multiplication is repeated addition. Then division is the product of a number and another number's multiplicative inverse. From this perspective you need not even understand division computationally if all you'll ever deal with are fractions and not decimals.
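
The "inverse" view above can be checked with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the variable names are just for illustration.

```python
from fractions import Fraction

# Subtraction as adding an additive inverse.
same_difference = (4 - 3) == (4 + (-3))
negatives = 5 + (-6) - 4

# Division as multiplying by a multiplicative inverse.
a = Fraction(3, 4)
b = Fraction(5, 6)
inverse_b = 1 / b            # the number that satisfies b * inverse_b == 1
quotient = a * inverse_b     # (3/4) / (5/6) computed without dividing by b

print(same_difference, negatives, inverse_b, quotient, quotient == a / b)
```

Here `inverse_b` is 6/5, which is where "flip the second fraction over" comes from, and the quotient is 9/10.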

So this is really interesting, math is all about relationships, and you've got a really solid understanding of how different operations are related.

Thank you very much for this. I never realized that a) I had no idea how this most fundamental level of math works and b) that it all fits so neatly together.

I've heard of similar stories with adult illiteracy or almost-illiteracy.

I applaud your counter-Dunning-Krugerish inquisitiveness about your own skills. I hope some of that rubs off on me.

I had a similar Khan Academy experience. What caught my attention was how much more relevant everything was because I had a wealth of work and life experiences that made the concepts much more relevant and applicable than they might have been when I was in HS.

I'm kinda curious why so many people think that Linear Algebra Done Right is an introductory book for beginners who have math anxiety. Don't get me wrong, the book is great and I enjoyed working through it. It was a magical experience when I saw how simple it was to prove some seemingly hard theorems by just linking the right definitions and theorems. That said, the book does require a certain level of math maturity, as it achieves its elegance by staying at a certain level of abstraction and its style is quite formal, so much so that a person who can use this book as their first linear algebra textbook shouldn't have math anxiety at all.

Speaking as one of the people who recommended it in this thread: I don't think math anxiety is the right focus for which textbook to choose. More precisely, I don't think you should try to solve that problem by getting a different linear algebra textbook. To put it bluntly, someone with math anxiety probably just doesn't have the mathematical maturity for linear algebra yet. In that case they'd be doing themselves a disservice by attempting the material using some sort of "more accessible" book; instead, they should focus on resolving that anxiety through developing a solid foundation in the prerequisite material.

Linear Algebra is typically the first course in which students have to transition from predominantly rote computation to proof-based theory. Axler's Linear Algebra Done Right is very often the textbook used for that course because it (mostly [1]) lives up to its name. This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra. The floor for understanding this subject doesn't get much lower than Axler (and in my opinion, it doesn't get much better at the undergraduate level either).

It's unfortunate that so many people want to skip to math they're not ready for, because there's no shame in building up to it. A lot of frustration can be eliminated by figuring out what you're actually prepared for and starting from there. If that means reviewing high school algebra then so be it; better to review "easy" material than to bounce around a dozen resources for advanced material you're not ready for.


[1] See Noam Elkies' commentary on where it could improve: http://www.math.harvard.edu/~elkies/M55a.10/index.html

Tools from linear algebra can be accessible and useful to many people who don’t want to (or are not yet prepared to) prove nontrivial theorems. Indeed, a book like Axler’s should probably be used in a second semester-long linear algebra course for typical undergraduates wanting to study abstract mathematics; a gentler, more concrete introduction would probably be better for students without previous exposure to linear algebra or hard mathematical thinking. For engineers or others who want to use linear algebra in practical contexts, something like Boyd & Vandenberghe’s new book might be better for a first (or even second) course than Axler’s book, https://web.stanford.edu/~boyd/vmls/

Elkies’s post is in the context of a course for very well prepared and motivated first-year undergraduate pure math students who are racing through the undergraduate curriculum because most of them intend to take graduate-level courses starting in their second year.

Those two audiences are very far apart.

> Those two audiences are very far apart.

Yes, that's precisely why I said, "This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra."

How do you propose to teach linear algebra beyond basic matrix operations and Gaussian elimination if you're not teaching any theory? You can take some disparate tools from linear algebra (just like you can with analysis to make calculus), but presenting the mechanical tools of linear algebra and the theory of linear algebra as alternatives is a false dichotomy. Axler's textbook is a very nice compromise that provides students an understanding of why things are the way they are while still teaching them how to work through the numerical motions of things. You need not go so far as reading Finite Dimensional Vector Spaces if you want to avoid theory, but you need enough of it to put the mechanical operations in some kind of context.

Personally I think that the undergraduate mathematics curriculum does a poor job of exposing people to examples and concrete situations before introducing new abstractions.

Students are often entirely unfamiliar with the context (problems, structures, goals, ...) for the new abstractions that are rained down on them, and end up treating their proofs as little exercises in symbol twiddling / pattern matching, without much understanding of what they are doing.

The undergraduate curriculum is put in this position because there is a lot of material to get through in not much time, and students are generally unprepared coming in. Ideally students would have a lot of exposure to basic material and lots of concrete examples starting in middle school or before, but that’s not where we are.

I think we're in agreement on that point. In my experience most people's difficulty with higher mathematics comes from the tendency of elementary and high schools to push students along through grades without ensuring they've really mastered the material. Unfortunately most students come to hate math because they're introduced to ever more abstract and complex material when they haven't achieved a solid foundation to build upon. I don't see this artifact of our education system going away any time soon.

Personally I used David Lay's Linear Algebra and Its Applications as my first linear algebra book. It's more formal than Gilbert Strang's famous textbook, but less formal than Linear Algebra Done Right. Its emphasis on geometric intuition really struck home, particularly the discussion on coordinate systems, change of basis, and quadratic forms.

For me, it really was the first time math had clicked. I had a non-proof based linear algebra course before going through the book, but it made very little sense to me. After doing LADR, I understood the subject intimately, lost my math anxiety, and performed better in every class I took afterwards than I would have otherwise.

As a teenager I thought I was bad at math and even went to get a film studies/communications bachelors (this week I defended my masters thesis in computational mathematical physics; this after an undergraduate degree in econ with lots of math, of course).

The thing is, I couldn't write the damn matrices well lined up and made mistakes when doing calculations. This was really a (de)formative experience. In college Linear Algebra for econ was 40% Gaussian elimination, 40% eigenvalues and 20% linear programming. I mean, I still can't do Gaussian elimination by hand right.

I started crawling out of it when I started seeing (in self-study) a book on linear algebra that takes the linear transform/vector space-first approach.

This is excellent. Thank you for taking the time to write it.

I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.

The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.

Perhaps the issue is motivation? Many wonder, "why do I need to learn this hard stuff?" If so, the approach taken by Rachel Thomas and Jeremy Howard at fast.ai seems to be a good one: build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.

> I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.

The biggest turn off about math is the way people are taught math.

Most people are taught math as if it's an infinite set of cold formulas to memorize and regurgitate. Most students in my statistics class didn't know where and when to use the formulas taught in real life; they only knew enough to pass the tests. Students who obtain As in Algebra 2 hardly know where the quadratic formula comes from (and what possibly useful algebraic manipulation could you do if you can't even rederive the quadratic formula?). It's not just math, I've been in a chemistry class where the TA was getting a masters in chemistry and yet she taught everyone in my class a formula so wrong that, taken literally, it meant that every time a photon hits an atom, an electron will be ejected with the same energy and speed as the photon. This is obviously wrong but when I pointed it out, everyone thought I was wrong because "that's not what it says in the professor's notes" (later, the professor corrected their notes). In my physics class, the people who struggled the most are the ones who tried the least to truly grasp where the formulas come from. I don't blame them, it's the way most schools teach.

> build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.

I totally agree.

Source: My experience with tutoring people struggling with math for the past eight years. I used to like math then I got to college where 95% of people don't understand the math they're doing and thus can't be creative with it; this includes the professors who teach math as the rote memorization of formulas. Yeah, call me arrogant, but I have found it to be true in my experience. I strongly believe the inability to rederive or truly grasp where things come from destroys the ability to be creative and leads to a lack of true understanding. But everyone believes they understood the material because they got an A on the exam. I'll stop ranting on this now.

This is exactly my experience! I breezed through math all the way until I got to calculus because I was excellent at rote memorization. Calculus made me realize that I didn't really understand most of what I had learned for the past several years.

Years later, I'm trying to relearn math, but I'm taking the exact opposite approach. No calculator, no rote memorization, just reading about the concepts and thinking about what they mean until I can do the manipulations in my head. When I do practice problems, I don't care so much about the specific numbers, but about my ability to understand what's happening to each part of an equation, what the graph looks like, etc.

The impression I get is that many people want to be system designers stringing together pieces to create systems to solve problems (part of the motivation might be that it is easier to extract economic value from such integrated solutions, rather than better functioning pieces).

The problem is that in an immature field that's still evolving, the components are not yet well-understood or well-designed, so available abstractions are all leaky. However, modern software engineering is mostly built on the ability to abstract away enormous complexity behind libraries, so that a developer who is plumbing/composing them together can ignore a lot of details [1]. People with that background now expect similarly effective abstractions for machine learning, but the truth is that machine learning is simply NOT at that level of maturity, and might take decades to get there. It is the price you pay for the thrill of working in a nascent field doing something genuinely uncharted.

"Math in machine learning" is a bit of a red herring. We hear the same complaints about putting in effort to grok ideas in functional programming, thinking about hardware/physics details, understanding the effects of software on human systems [2], etc. Fundamentally, I think a lot of people have not developed the skill to fluidly move between different levels of abstraction, and a variety of approximately correct models. And to be fair, it seems like most of software engineering is basically blind to this, so one can't shift all the blame on individuals.

[1] Why the MIT CS curriculum moved away from Scheme towards Python -- https://www.wisdomandwonder.com/link/2110/why-mit-switched-f...

[2] Building software through REPL-it-till-it-works leads to implicitly ignoring important factors (such as ethics) -- https://news.ycombinator.com/item?id=16431008

Yes, I see your point.

Deep learning, in particular, is a trade today. If we want to be generous we can call it an "experimental science"... but my perception is that only a minority of papers in the field actually deserve that moniker.

(Speaking as a deep learning practitioner with expertise in a narrow domain.)

> I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.

I can tell you at least part of it, from my subjective perspective. I tend to "think" in a very verbal fashion and I instinctively try to sub-vocalize everything I read. So when I see math, as soon as I see a symbol that I can't "say" to myself (eg, a greek letter that I don't recognize, or any other unfamiliar notation) my brain just tries to short-circuit whatever is going on, and my eyes want to glaze over and jump to the stuff that is familiar.

OTOH, with written prose, I might see a word I don't recognize, but I can usually work out how to pronounce it (at least approximately) and I can often infer the meaning (at least approximately) from context. So I can read prose even when bits of it are unfamiliar.

There's also the issue that math is so linear in terms of dependencies, and it's - in my experience - very "use it or lose it" in terms of how quickly you forget bits of it if you aren't using it on a day-in / day-out basis.

My way of dealing with this is to treat symbols as proxies for the verbal concept, rather than just some letters. As an example, when I see "E = mc^2" I read (energy-of-object) = (mass-of-object) * (speed-of-light)^2 and not "Eee equals Em Cee Square". Another great idea I use a lot when writing/reading is David Mermin's 2nd rule (verbalize the damn equation!) [1].

It's sad that many mathematical resources do not make a careful effort of helping someone reason verbally. I guess this is partly due to the fact that most people who are skilled in the subject and write about it prefer equational reasoning (for lack of a better word) to verbal reasoning! In my experience as a physics instructor for non-STEM majors, this might be one of the biggest impediments for otherwise intelligent people trying to learn math/physics.

[1] What's wrong with these equations? by David Mermin -- http://home.sandiego.edu/~severn/p480w/mathprose.pdf

Very good point about reading/verbalizing symbols and notation. Once you know what they mean, they are super useful for expressing complex concepts precisely and compactly, but when you're getting started they look like an alien language...

This is why I'm using Anki to memorize the Greek alphabet, and to keep basic algebraic (h.s. algebra that is) stuff in mind. It might seem like a small thing, but remembering the various rules for factoring, working with fractions, dealing with exponents / roots, etc. is not easy when you don't do math all the time.

> The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.

I don’t find it ironic, because I wouldn’t expect engineers to make good mathematicians implicitly (nor vice versa). There is some similarity between math and programming, but there is also a colossal amount of dissimilarity that makes them different things entirely.

For example, notation and terminology in mathematics is not actually rigorous. It’s highly context dependent and frequently overloaded (take the definition of “normal”, the notation of a vector versus a closure, or the notation of a sequence versus a collection of sets). As another example, consider that beyond the first few courses of undergraduate math you’re wading into a sea of abstraction which you can only reason about. There is no compiler flag to ensure your proof is correct in the general case, and you don’t have good, automatic feedback on whether or not the math works. In this sense, the entirety of mathematical knowledge is actually very much not like a formal computer language.

Beyond that, the ceiling of complexity for theoretical computer science or applied mathematics is far higher than programming. It’s not so much motivation (though that can be an issue too), it’s that learning the mathematics for certain things simply takes a vast amount of time. Meanwhile a professional programmer has to become good at things that mathematicians and scientists don’t have to care about, like version control or the idiosyncrasies of a specific language.

They're really orthogonal disciplines, for much the same reason that engineering isn't like computer science. There is a world of difference between proving the computational complexity of an algorithm and implementing an algorithm matching that complexity in the real world.

> Meanwhile a professional programmer has to become good at things that mathematicians and scientists don’t have to care about, like version control or the idiosyncrasies of a specific language.

Really depends on what kind of mathematician or scientist you are to be honest though. How good is someone's data analysis of an experiment if they can't reproduce it? Or if they've got 6 different versions of an application with 100k lines of code in a single file, each labelled "code_working(1).f90", "code_not_working.f90", etc... These are real problems with what people actually do in science; software development skills are poor and people do things badly.

There're organisations like Software Carpentry globally and the Software Sustainability Institute in the UK which exist to try and promote some thought about developing software as researchers, and making the software sustainable in the long term rather than letting it die every time a PhD student leaves.

That's essentially my point. Programming and mathematics are so different from each other that, without special effort, a professional in one domain shouldn't be expected to be meaningfully better than average in the other.

This applies in both directions: most mathematicians and scientists have such poor version control and development hygiene because mathematics doesn't imbue them with any special insight about how to be an engineer.

True. I cannot disagree with any of this.

That said, the math we're talking about (that is, the math necessary for understanding, say, the sequence of transformations that make up a convnet) lies far below the ceiling of complexity you mention.

I'm not sure symbol reuse and overloading are as much of an issue. I've run into people who are quite proficient with Perl and routinely use complicated regular expressions who say they didn't like math growing up.

> The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day.

`user_id` says what it is; something like `β` does not. It's more like reading minified JavaScript than literate programming. Math notation is frequently horribly overloaded and needlessly terse.
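To make the contrast concrete, here is a small sketch of the same computation written twice. The function names, the "price" framing, and the numbers are made up for illustration; nothing here comes from a real codebase.

```python
def p(x, w, b):
    # Terse, "math-style" names: correct, but the reader has to know the convention.
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_price(feature_values, feature_weights, baseline_price):
    # Same computation, with names that say what each quantity stands for.
    weighted_sum = sum(
        weight * value for weight, value in zip(feature_weights, feature_values)
    )
    return weighted_sum + baseline_price


terse = p([1, 2], [3, 4], 5)
descriptive = predict_price([1, 2], [3, 4], 5)
print(terse, descriptive)
```

Both calls compute the same number; only the second tells you what the inputs are supposed to be without outside context.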

Math education from the undergraduate level on is fairly horrible and not communicated well. Just go read the typical calculus textbook and realize that they reference a lot of stuff that no pre-calculus student would typically know, such as proof by induction, lemmas and so on. The textbooks are written to the professors, not the actual students.

Various non-intuitive concepts are handwaved, the foundations skipped over and students then start struggling because they don't understand the foundation of what they are trying to learn. Reading from the textbook is fairly useless and it ends up being used as a problem set source.

I argued to a few math professors that teaching things like calculus from textbooks that reference concepts not actually taught until 5 classes later is a bad idea.

In return I got a shrug of indifference telling me that's just the status quo and the status quo is OK.

Thank god khan academy exists now.

I attended a not-superb high school in rural Missouri, and we studied proof by induction in 11th grade, before calculus in 12th. (Only over the naturals, but that's enough to get the flavor...) Lemmas came in 10th-grade geometry, although frankly I may not understand what you mean because that's not really a full "concept", just kind of an arbitrary detail. That was the early 90s, though, so perhaps standards have slipped.

Of course, colleges should cater to a range of preparatory educations, but the textbooks you're talking about are pitched at the correct level for somebody.

My recollection is vague, since it was quite a while ago, but there were other things that weren't introduced in my high school curriculum that I remember my calculus textbook containing. I'm glad your school taught you proof by induction and other such things, though. I also remember talking to my classmates about how nearly indecipherable our calculus textbook was.

First semester of college, I took multivariate. That was kind of a mess, because instead of a textbook we got a "bound" compilation of our professor's notes that he was trying to turn into a textbook. It has been some time, but I kept lots of math books from college, and I specifically threw that thing out. My recollection is that it was pretty much worthless.

Well, I think the difference between developer symbols and math symbols is that developer symbols are a lot more google-able. I can google a line of code, but it's hard to google a crap ton of greek letters and summations. Even if I did manage to parse it into a google search somehow, I probably wouldn't get any meaningful results.

Also, for me personally, it's just such a drag to learn all the notation. After the fact, I've always thought, "Wow, that's all this means?" but while I'm learning, I feel helpless. It doesn't feel like I have any way to google it. My professors never actually want to sit down and explain it to me. All the pages of math equations always look so intimidating. It's just such a drag.

I don’t enjoy math and I simply don’t have the intuition for it. Every time I attempt to do math in my head my brain groans and says “It’s the 21st century, jackass. Use a calculator.”

I do, however, have a talent for language.

The reason I am a good developer is because I can communicate with different machines through different programming languages in the same way I can communicate with different people through different human languages.

I have tried as of late to learn math in an attempt to contextualize it as language - the language of the universe, really - but it is far more of an uphill climb for me than JavaScript or Chinese.

I don't think we "don't like math", but in my case, I just need an accelerated version of the "math" that I need without the deep-dive.

Here's a crazy idea that machine learning might one day help with software engineers understanding algorithms and data structures.

You write some code to traverse a list or something and do some naive sorting, or maybe your everyday way of doing some operations on your lists is inefficient. I want some cool machine learning where I can submit my code and it does analysis.

I think Microsoft is working on that. https://techcrunch.com/2018/05/07/microsofts-new-intellicode...

Let's take it a step further. Explain to the programmer why what they're doing is wrong.

I would pay big bucks for a "machine intelligence" IDE

My bullet list, which might be too ambitious and theory-focused, but this is what I used from my physics background.

Learn some:

Calc up to 3 (you can skip some of the divergence and curl stuff)

Linear algebra (no need for Jordan change of basis)

Real analysis

Intermediate probability theory (MLE, MAP, conjugate priors minus the measure theory stuff)

A little bit of differential geometry (at least geodesics. This is for dimension reduction)

Discrete math (know counting and sums really well)

Learn a little bit of Physics (at least know Lagrangians and Hamiltonians)

A little bit of complex analysis (to know contour integration and fourier/laplace transforms)

Some differential equations (up to Frobenius and wave equations)

Some graph theory (my weak spot, but I have used the matrix representations a few times)

After all that, read some Kevin Murphy and Peter Norvig.

Congrats, now you can read most machine learning papers. The above will also give you the toolkit to learn things as they come up like Robbins-Monro.

OP's article is much better if you are trying to be a ML developer/practitioner. Like I said, this list might be too theory focused, but it lets me read lots of applied math papers that aren't ML focused.

I'm interested to know where you encountered contour integrals in machine learning?

ya lol and Hamiltonians. sometimes people just reel off all the math they've heard of to sound impressive. next we'll have people talking about de rham cohomology because of TDA (or something like that)

Hamiltonian mechanics, along with many other seemingly out of place 'advanced' maths, show up in modern Bayesian statistics pretty frequently. Hamiltonian Monte Carlo/Riemannian Manifold Monte Carlo are pretty cutting edge (although they are implemented in popular libraries like MC-Stan and Pymc3) and both require fairly advanced physics to really understand.

Additionally, we're seeing the introduction of even more sophisticated stochastic samplers (stochastic gradient hamiltonian monte-carlo, etc) that require even more esoteric branches of math and physics to really grok. I have a strong math background but frequently find myself struggling with a lack of knowledge in statistical mechanics when trying to read papers in these areas.

So yeah - there's plenty of bullshit and exaggeration. But there's also some wicked cool stuff happening which requires very sophisticated (and specialized) knowledge to understand.
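For anyone wondering where the physics actually enters, here is a minimal sketch of plain Hamiltonian Monte Carlo on a 1-D standard normal target. It is a toy under stated assumptions (unit mass, fixed step size and trajectory length, a hand-written gradient), not how Stan or PyMC3 implement it; the function name and defaults are made up for illustration.

```python
import math
import random


def hmc_standard_normal(n_samples, step_size=0.2, n_leapfrog=10, seed=0):
    """Toy HMC sampler for a 1-D standard normal target.

    Potential energy U(q) = q^2 / 2 (negative log density up to a constant),
    kinetic energy K(p) = p^2 / 2, so the Hamiltonian is H(q, p) = U(q) + K(p).
    """
    rng = random.Random(seed)

    def grad_u(q):
        return q  # dU/dq for the standard normal

    q = 0.0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # resample the momentum each iteration
        q_new, p_new = q, p
        # Leapfrog integration of Hamilton's equations.
        p_new -= 0.5 * step_size * grad_u(q_new)
        for i in range(n_leapfrog):
            q_new += step_size * p_new
            if i != n_leapfrog - 1:
                p_new -= step_size * grad_u(q_new)
        p_new -= 0.5 * step_size * grad_u(q_new)
        # Metropolis correction for the integrator's energy error.
        h_old = 0.5 * q * q + 0.5 * p * p
        h_new = 0.5 * q_new * q_new + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


samples = hmc_standard_normal(5000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

The sample mean and variance should land near 0 and 1. The "Hamiltonian" part is the leapfrog loop: it simulates a frictionless particle so that proposals travel far while nearly conserving energy, which keeps the acceptance rate high.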

I agree. My original question came from curiosity, not incredulity :)

:) Personally I've never used any serious complex analysis in my job (I'm very grateful too, because I always struggled a bit with it). The closest thing I've seen, which I did run into recently, is the use of complex numbers to compute very accurate finite differences. It's one of those delightful tricks that is both elegant and useful: https://blogs.mathworks.com/cleve/2013/10/14/complex-step-di...

I've been working in golang, which fortunately has built-in complex128 types, so it's proved very helpful in a project!
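A minimal sketch of the complex-step trick, in Python rather than Go. It assumes the function is real-analytic near the point and is written with operations that accept complex arguments; the test function below is just an illustration, not one taken from the linked post.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) as Im(f(x + i*h)) / h.

    No subtraction of nearly equal numbers is involved, so h can be tiny.
    """
    return f(complex(x, h)).imag / h


def f(z):
    # Illustrative test function with a known derivative.
    return cmath.exp(z) * cmath.sin(z)


x0 = 1.0
exact = math.exp(x0) * (math.sin(x0) + math.cos(x0))
approx = complex_step_derivative(f, x0)

# For comparison: a forward difference, which suffers from cancellation error.
h_fd = 1e-8
forward = (f(x0 + h_fd).real - f(x0).real) / h_fd

complex_step_error = abs(approx - exact)
forward_error = abs(forward - exact)
print(complex_step_error, forward_error)
```

The complex-step error should be at the level of floating-point rounding, while the forward difference is stuck trading truncation error against cancellation error.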

oh cool, I really like Cleve Moler's Matlab posts.

I took a complex analysis class and did OK, but I get the feeling that EEs are the ones who really benefit from it (at least in the applied world). They seem to have some very rich analyses of linear dynamical systems using frequency domain methods.


I believe the above was used in sk-learn or PyMC3 at some point.

Like I said, the list was a bit too theory focused and not just for ML. Hope that clears things up.

Also, maybe a bit out of place, but it makes my day happier when I assume good intentions out of random tidbits posted online.

I have only seen it used to evaluate some integrals in a few papers via Residue Theorem, so maybe I should have said Residue theorem instead of contour integration. I am sure there were other methods that the authors could have used, but I was sure glad to know of contour integration then. I'd say some complex analysis still deserves to be on the list to have a base understanding of Fourier transforms. Of course, you can arrive there without complex analysis.
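To give a flavor of what that looks like, here is the standard textbook example (not one of the integrals from the papers I mentioned). Close the contour with a large semicircle in the upper half-plane, where the only pole of the integrand is at z = i:

```latex
\int_{-\infty}^{\infty} \frac{dx}{1+x^{2}}
  = 2\pi i \,\operatorname{Res}_{z=i} \frac{1}{(z-i)(z+i)}
  = 2\pi i \cdot \frac{1}{2i}
  = \pi
```

The contribution from the arc vanishes as its radius R grows, because the integrand decays like 1/R^2 while the arc length only grows like R.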

Can you give any recommendations for a little bit of differential geometry?

I think the standard reference is probably Spivak's 'Calculus on Manifolds' but this never really did it for me.

If you have a background in physics then some combination of Nakahara's 'Geometry, Topology and Physics' and Baez and Muniain's 'Gauge Fields, Knots and Gravity' might be good (I haven't included relativity textbooks, as I assume that if you have a background in GR then you have enough differential geometry).

An unusual recommendation that I think is really nice is 'Stochastic Models, Information Theory and Lie Groups' by Chirikjian. It covers a few other topics mentioned in this thread and is really nice. It's _extremely_ concrete and spells out a lot of calculations in great detail. Plus, the connection to engineering applications is much more obvious.

Chirikjian's book looks really cool! Its website says that in volume 1 "The author reviews stochastic processes and basic differential geometry in an accessible way for applied mathematicians, scientists, and engineers." And I can't tell if that means 'brief review because this is a prereq to the book' or if this is a good first take on it. Do you know which it is?

This is a little difficult for me since I learned it mostly in the sense of general relativity (which is why I said some differential geometry). For that course, I mostly used the instructor's lecture notes. However, the books for the course were:

Hartle, James B., Gravity: An Introduction to Einstein's General Relativity

Schutz, Bernard, A First Course in General Relativity

The first few chapters would be all you need, but they don't include the nice things I learned from the lecture notes like how to derive the gradient, divergence, and curl in any curvilinear coordinate system by using the Christoffel symbols.

Sorry that I can't be of much more help.

Sounds like EE.

For those who don't know, please check out 3blue1brown videos on youtube for a better understanding of concepts like Linear algebra and other things required for machine learning. Thank me later.

I love 3blue1brown! Will add to resources. :)


This is something we're striving hard to do at the startup I'm involved with (end-to-end resources for learning machine learning, with just high school math background assumed).

In our Data Scientist Track (https://www.dataquest.io/path/data-scientist?), I specifically focused on teaching K-nearest neighbors first b/c it has minimal math but you can still teach ML concepts like cross-validation, and then I wrote Linear Algebra and Calculus courses before diving into Linear Regression.
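As a rough idea of how little math this needs, here is a sketch of K-nearest neighbors with k-fold cross-validation in plain Python. It is not taken from the course; the function names and the toy data are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds):
    """Average accuracy over n_folds, holding each fold out once for validation."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    accuracies = []
    for i, held_out in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / n_folds


# Toy data: two well-separated clusters in the plane.
data = [((x, y), "a") for x in range(3) for y in range(3)]
data += [((x + 10, y + 10), "b") for x in range(3) for y in range(3)]
accuracy = cross_val_accuracy(data, k=3, n_folds=3)
print(accuracy)
```

The only math involved is squared Euclidean distance, yet the held-out-fold loop is the same idea you use later to evaluate any model.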


I recommend the No Bullshit books for anyone with no real math background past trig to get their feet wet, and/or anyone who hasn't done any serious math study for years.


Thx! Had I known you'll post this, I wouldn't have self-promoted so shamelessly :) I'll add some direct links to PDF previews:

MATH & PHYS book: https://minireference.com/static/excerpts/noBSguide_v5_previ...

LA book: https://minireference.com/static/excerpts/noBSguide2LA_previ... + free tutorial: https://minireference.com/static/tutorials/linear_algebra_in...

Hah I beat you by 2 minutes. Thanks for the great books!

Thanks for posting this!! Was actually searching for this the other day here on HN and found a link to the https://github.com/mml-book/mml-book.github.io. haven't checked it out yet but the links in the OP look solid.

Looks interesting! Have you gone through it yourself? And how does it compare to other resources?

Like I commented above, I haven't had a chance to go through the https://github.com/mml-book/mml-book.github.io book yet. But now that I have read your article in full, I think diving headlong into ML first and then backfilling the Math/Stat/Prob holes is the best approach to learn ML engineering. It's like how the SICP authors mused about modern software development being "programming by poking at it using APIs" instead of learning to program just for the heck of it.

Hi Vincent, you may want to point your "Best Practices for ML Engineering" link to the non-PDF version here: https://developers.google.com/machine-learning/guides/rules-...

Thanks for the heads up!

The LAFF: Linear Algebra class just started again for the "fall semester" https://courses.edx.org/courses/UTAustinX/UT.5.02x/1T2015/co...

Maybe one of these days I'll complete it :)

I really like 3Blue1Brown for a wide range of math topics. He's just a great teacher.


Frankly, I find the UTAustin linear algebra class less than ideal, but it's free and there are lots of classmates and material, so...

Re: PCA vs. tSNE. I don't know much about tSNE, but if it is a "manifold learning method" as the sklearn docs say, you could try something like LTSA instead:

e.g. http://www.aaai.org/ocs/index.php/aaai/aaai11/paper/download...

Then, it's not difficult to understand what a manifold is, but it took me a number of attempts to get it, and then I only did when studying them formally with Spivak 1963. Now the concept of manifold seems patently obvious to me and not really needing much formalization, but...

Thanks for the reference... will give it a read.

There's also UMAP, which is new but looks promising.

Just wondering if someone had a similar experience: I absolutely loved Math in school, zipped through the classes, always one of the best.

Then things changed at university (studying computer science) and I completely lost interest. Not sure why (bad teacher, going from being best in class to being average, the math at uni different from school).

Now, much later, I regret not having followed through and miss the beauty of Math. I'm re-discovering it and wondering how I could use more of it in my work.

>A student’s mindset, as opposed to innate ability, is the primary predictor of one’s ability to learn math (as shown by recent studies).

The article seems good overall, but I only skimmed the rest after seeing a citation of a 5-year-old Atlantic article describing disputed and at minimum highly exaggerated findings presented as 'shown in recent studies'.

That may be a bad reference, but there are lots of studies about this. See the book Mathematical Mindsets by Jo Boaler for many references.

Nice article. Would you recommend this MOOC? https://www.coursera.org/specializations/mathematics-machine... It doesn't focus on probability or statistics though. If not, is there any other MOOC you would suggest?

I really want a shallow-dive into machine learning and I know I need linear-algebra as a foundation. I would love an interactive course in linear algebra where we could input matrices and see some visual stuff with animations.

Check out 3blue1brown on youtube https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x... Everything this guy does is gold!

This is also really good for connecting LA concepts with visuals http://immersivemath.com/ila/index.html

Does anyone have suggestions on learning resources for matrix calculus? I'm trying to come up to speed with the topic and could use pointers to worked examples, video lectures, etc.

[The Matrix Calculus You Need For Deep Learning](https://arxiv.org/pdf/1802.01528.pdf)

Anyone have a suggestion for a good online course in linear algebra?

Yes, UIUC offers very good online math courses: https://netmath.illinois.edu/college/math-415. There is also a more pure/abstract version of that course available.

If you don’t care about accreditation and are patient, sit down with Axler’s Linear Algebra Done Right and Hoffman & Kunze’s Linear Algebra, in that order.

I would caution you against trying to learn linear algebra using a “take what you need” approach. A random walk approach to learning the material is faster than an accumulation approach, but it’s more brittle and prone to confusion. A lot of things which appear to be irrelevant or unnecessary for machine learning (computation or research) can be imperative for understanding or implementing much more complex concepts later on.

Thank you! I am interested in the knowledge rather than the credits, so I appreciate the book recommendations :)

The video lectures of Prof. Gilbert Strang’s linear algebra class at MIT are very good: http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-...

He's an amazing teacher and conveys a lot of intuition + makes even complicated ideas look straightforward.

Awesome resource, thank you!

I like "Coding the Matrix" by Philip Klein of Brown delivered via Coursera. It's a deep content intro to linear algebra (and more), with a focus on applications in computer science. The course is accompanied by a textbook written by Klein, which makes the course material better organized and more in-depth than slides and videos alone would allow.


Thank you, this looks great :)

I have been waiting for something like this for months. This is inconceivably valuable to me. Thank you so much!

In the author's example, the function max(0, x) they subsequently differentiate isn't differentiable.

It is within the context of distributions or generalized functions (https://en.wikipedia.org/wiki/Distribution_(mathematics)) but people are often loose on the terminology and tend to just use the term "functions". It's a wonderful topic, with a lot of interesting applications in differential equations and physics.

I just found a quick explanation by Terence Tao about why people are generally loose in this case, meaning that some properties transition nicely from smooth (here, differentiable) to the rough categories by passing to the limit and density arguments: http://www.math.ucla.edu/~tao/preprints/distribution.pdf

Of course there are exceptions.

It’s differentiable everywhere except at x=0. At x=0 it actually has a subdifferential—think of it as the set of slopes of lines that are tangent at that point.
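A small sketch of that idea in code, assuming the usual convex-analysis definition: g is a subgradient of f at x0 if f(y) >= f(x0) + g * (y - x0) for all y. The function names are made up for illustration.

```python
def relu(x):
    return max(0.0, x)


def relu_subdifferential(x):
    """Return the subdifferential of max(0, x) at x as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)  # differentiable here, slope 1
    if x < 0:
        return (0.0, 0.0)  # differentiable here, slope 0
    return (0.0, 1.0)      # kink at 0: every slope in [0, 1] is a subgradient


def is_subgradient(g, x0, test_points):
    """Check f(y) >= f(x0) + g * (y - x0) on a finite set of test points."""
    return all(relu(y) >= relu(x0) + g * (y - x0) - 1e-12 for y in test_points)


points = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
print(relu_subdifferential(0.0))
print(is_subgradient(0.5, 0.0, points), is_subgradient(1.5, 0.0, points))
```

In practice, deep learning frameworks just pick one value from that interval at x=0 and move on, since a single point rarely matters for gradient-based training.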

Might look at the thread

Foundations Machine Learning (bloomberg.github.io)



There, machine learning (ML) is basically a lot of empirical curve fitting. The context is usually with a lot of data, thousands of variables, millions or billions of data points, observations, pairs of values of thousands of independent variables and the value of the corresponding dependent variable. The work is all a larger, more data, version of: You have a high school style X-Y coordinate system and some points plotted there. So, you want to find values for coefficients a and b so the line

y = ax + b

fits the points as well as possible. But, you can do variations, try to fit, say,

log(y) = a sin(x) + b

Or replace log or sin with any functions you want and try again.

The logic, rational support, is essentially as follows: So, take, say, 1000 x-y pairs. Partition these into 500 training data and 500 test data. Find the best fit you can, using whatever fits, to the training data. Then take the equation and see how well it fits the test data. If the fit of the test data is also good, then that is your model.
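The procedure above can be sketched in a few lines of plain Python. The data here is synthetic (y = 2x + 1 plus Gaussian noise, an assumption made only so the example is self-contained), and the helper names are made up.

```python
import random


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_squared_error(points, a, b):
    return sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)


rng = random.Random(0)
data = []
for _ in range(1000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))
rng.shuffle(data)
train, test = data[:500], data[500:]  # 500 training pairs, 500 test pairs

a, b = fit_line(train)
train_mse = mean_squared_error(train, a, b)
test_mse = mean_squared_error(test, a, b)
print(a, b, train_mse, test_mse)
```

If the test error is close to the training error, the fit generalizes to the held-out half. The log(y) = a sin(x) + b variation is still linear in a and b, so the same `fit_line` works on the transformed pairs (sin(x), log(y)), provided every y is positive.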

Now you want to apply the model in practice, apply the model to data it did not see in the given 1000 points. So for the application, you will be given a value of x, plug it into the equation, and get the corresponding value of y. That's what you want -- maybe the value of y gives you Y|N for ad targeting, Y|N cancer, what MSFT will be selling for next month, what the revenue will be for next year, etc.

The rational, logical justification here is an assumption (which should have some justification from somewhere) that the x you are given and the y you want for that value of x are sufficiently like the x-y values you had in the original 1000 points.

Okay. Empirical curve fitting to a lot of data to make a predictive model, that is found with training data, tested with test data, and applied where the given data in the application is like the data used in the fitting.

The OP mentions that some people believe that to make progress toward real machine intelligence, we need more math than what I outlined.

My guess is that to make that intended progress, for all but some tiny niche cases, we first need some much more powerful and quite different ideas, techniques, etc. than in the curve fitting ML I outlined.

Yes, there is a chance that with lots of data from working brains and lots of such empirical fitting we will be able to find some fits that will uncover some of the workings of the brain crucial for real intelligence. Uh, that's a definite maybe!

But there is a lot more to what can be done to build predictive models than such curve fitting, empirical or otherwise. I outlined some such in the thread that I referenced above.

So, for the question in the OP, what math? Well, if you want to pursue directions other than the empirical curve fitting in the Bloomberg course I referenced above, my experience is -- quite a lot. For the education, start with a good undergraduate major in pure math. So, cover the usual topics, calculus, abstract algebra, linear algebra, differential equations, advanced calculus, probability, statistics. Then continue with more in algebra, analysis, and geometry.
