> As soft prerequisites, we assume basic comfortability with linear algebra/matrix calc [...]
That's a bit of an understatement. I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.
Book plug: I wrote the "No Bullshit Guide to Linear Algebra" which is a compact little brick that reviews high school math (for anyone who is "rusty" on the basics), covers all the standard LA topics, and also introduces dozens of applications. Check the extended preview here https://minireference.com/static/excerpts/noBSguide2LA_previ... and the amazon reviews https://www.amazon.com/dp/0992001021/noBSLA#customerReviews
To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.
The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.
Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".
And of course, you're not going to get very far with probability theory and stochastic processes unless you have a mature understanding of analysis and measure theory :)
This comment exchange neatly demonstrates the intrinsic problem. Most of these articles start off much like this one does: by assuming "basic comfortability with linear algebra." That sounds straightforward, but most software engineers don't have it. They haven't needed it, so they haven't retained it even if they learned it in college. It takes a good student a semester in a classroom to achieve that "comfortability", and for most it doesn't come until a second course or after revisiting the material.
If you don't already have it, you can't just use StackExchange to fill in the blanks. The random walk method to learning math doesn't really pan out for advanced material because it all builds on prior definitions. Then people like you make a comment to point out (correctly) that probability theory is just as important for all the machine learning that isn't just numerical optimization. But unless you want to restrict yourself to basic statistics and discrete probability, you're going to have a bad time working on probability without analysis. And analysis is going to be a pain without calculus, and so on and so forth.
There are certain things you need to spend a lot of time learning. Engineering and mathematics are both like that. But I think many of these articles do a disservice by implying that you can cut down on the learning time for the math if you have engineering experience. That's really not the case. If you're working in machine learning and you need to know linear algebra (i.e. you can't just let the underlying library handle that for you), you can't just pick and choose what you need. You need to have a robust understanding of the material. There isn't a royal road.
I think it's really great people like the author (who is presumably also the submitter) want to write these kinds of introductions. But at the same time, the author is a research assistant in the Stanford AI Lab. I think it's fair to say he may not have a firm awareness of how far most software engineers are from the prerequisites he outlined. And by extension, I don't think most people know what "comfortability with linear algebra" means if they don't already have it. It's very hard to enumerate your unknown unknowns in this territory.
There's always more you might want to learn, but when people talk about these basics, it's really just being super focused in 4 or so classes, not a whole ivy league undergrad curriculum in math.
probability & stats, multivariable calculus, and linear algebra will take you a long way.
True for me. I knew all of these from my course work when I graduated with my CS degree in 1996. I haven't used them at all in my career, and so I'd be starting basically from scratch re-learning them.
"comfort" is a perfectly cromulent word for this.
For intuition, particularly if you care about vision applications, I think one field of math which is severely underrated by the community is group theory. Trying to understand methods which largely proceed by divining structure without first trying to understand symmetry has to be a challenge.
I'm biased; my training was as a mineralogist and crystallographer! But the serious point here is that much of the value of math is as a source of intuition and useful metaphor. Facility with notation is pretty secondary.
Repeating structures have symmetries, so seeing the symmetries in your diffraction pattern inform you of the possible symmetries (and hence possible arrangements) in your crystal. Group theory is the study of symmetry.
By the way, this is also how the structure of DNA was inferred, although not from a crystal.
Great answer, thank you :-) Saved me a bunch of typing to explain it less well than you just did.
It's worth adding, for this crowd, that another way of thinking about the "other rules" you allude to is as a system of constraints; you can then set this up as an optimization problem (find the set of atomic positions minimizing reconstruction error under the set of symmetry constraints implied by the space group), so that means that solving crystal structures and machine learning are functionally isomorphic problems.
I know harmonic analysis in general combines the two, but I'm sure Crick and Watson could have done their work without knowing the definition of a group.
Crick was absolutely certainly familiar with the crystallographic space groups; he was the student of Lawrence Bragg (https://en.wikipedia.org/wiki/Lawrence_Bragg), who is the youngest ever Nobel laureate in physics – winning it with his father for more or less inventing X-ray crystallography. It's mostly 19th-century mathematics, after all.
A simple example is with linear regression: find w such that the squared l2 norm of (Xw - y) is minimized.
Linear algebra will help with generalizing to n data points; and calculus will help with taking the gradient and setting equal to 0.
Probability will help with understanding why the squared l2 norm is an appropriate cost function; we assumed y = Xw + z, where z is Gaussian, and tried to maximize the likelihood of seeing y given x.
I’m sure there’s more examples of this duality since linear regression is one of the more basic topics in ML.
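To make the duality concrete, here is a short sketch of both derivations. It assumes X has full column rank (so that X^T X is invertible), that there are n data points, and that the noise z has independent N(0, sigma^2) entries; n and sigma are symbols introduced here for illustration and do not appear in the comment above.

```latex
% Linear algebra + calculus view: minimize the squared l2 norm.
\begin{align*}
J(w) &= \lVert Xw - y \rVert_2^2 = (Xw - y)^\top (Xw - y), \\
\nabla_w J(w) &= 2 X^\top (Xw - y) = 0
  \;\Longrightarrow\; X^\top X w = X^\top y
  \;\Longrightarrow\; \hat{w} = (X^\top X)^{-1} X^\top y .
\end{align*}

% Probability view: y = Xw + z with z ~ N(0, sigma^2 I_n).
\begin{align*}
\log p(y \mid X, w)
  &= -\frac{n}{2} \log\!\left(2\pi\sigma^2\right)
     - \frac{1}{2\sigma^2} \lVert Xw - y \rVert_2^2 .
\end{align*}
```

Only the second term of the log-likelihood depends on w, and it enters with a negative sign, so maximizing the likelihood over w is the same problem as minimizing the squared l2 norm of (Xw - y).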
I'd love to get into ML but the math keeps me at bay.
Hell, I have a CS degree, and my Maths knowledge is horrific. I didn't take Maths at A Level, so I stopped learning any Maths at 16. The course had Maths, but it's surprisingly easy to brute force a solution, and that was a good ten years ago now.
My goal for years has been to learn enough Maths to be able to read Introduction to Algorithms and TAOCP without aid, and recently to be able to better understand ML, but the more I try the more I realise that it's a multi-year investment.
What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?
Why is the dot product equivalent to ||a||*||b|| cos C? I really have no idea, I just know the formulas.
That's a really good question!
Essentially, when some function (synonym for transformation) is linear, what this tells you is that it has "linear structure," which turns out to be a very useful property to know about that function.
You can combine the two facts you mentioned above to obtain the equation
f(a*x + b*y) = a*f(x) + b*f(y)
Suppose now that the input space of f can be characterized by some finite set of "directions" e.g. the x- and y-directions in case f is a 2D transformation, or perhaps the x-, y-, and z-directions if f is a 3D transformation. If f is a 3D transformation, using the linear property of f, it is possible to completely understand what f does by "probing" it with three "test inputs," one along each direction. Just input x, y, and z, and record the three outputs f(x), f(y), and f(z). Since you know f is linear, this probing with three vectors is enough to determine the output of f for any other input ax + by + cz: the output will be af(x) + bf(y) + cf(z).
See the same explanations as above but in more detail here: https://minireference.com/static/excerpts/noBSguide2LA_previ...
So why is this important? Well this "probing with a few input directions" turns out to be really useful. Basically, if f is non-linear, it can be super complicated and there may be no simple way to describe what its outputs are for different inputs, but if it is linear then the "probing procedure" works. Furthermore, since both the inputs and outputs have the form of a linear combination (a constant times something + another constant times another thing + a third constant times a third thing), you can arrange these "things" into an array called a matrix and define an operation called "matrix multiplication" which performs the constant-times-something operation of outputs, when you give the constants as an array (vector) of inputs.
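The probing procedure can be sketched in a few lines of plain Python. The map `f` below is a hypothetical linear transformation picked only for illustration, and `probe` and `apply_columns` are names made up for this sketch; the point is that three probes recover a matrix that reproduces f on any input.

```python
def f(v):
    """A hypothetical linear map on 3D vectors (chosen only for illustration)."""
    x, y, z = v
    return (2 * x + y, y - 3 * z, x + 4 * z)


def probe(func, dim):
    """Feed each standard basis vector to func; the outputs are the matrix columns."""
    columns = []
    for i in range(dim):
        e = tuple(1 if j == i else 0 for j in range(dim))
        columns.append(func(e))
    return columns


def apply_columns(columns, v):
    """Matrix-vector product: a sum of (input coefficient) * (matrix column)."""
    out_dim = len(columns[0])
    return tuple(
        sum(coef * col[k] for coef, col in zip(v, columns))
        for k in range(out_dim)
    )


columns = probe(f, 3)              # f(x), f(y), f(z) recorded as columns
v = (5, -1, 2)                     # the input 5x - 1y + 2z
print(f(v))                        # (9, -7, 13)
print(apply_columns(columns, v))   # (9, -7, 13), rebuilt from the three probes
```

If `f` were not linear, the second print would generally disagree with the first, which is exactly why the probing trick is specific to linear maps.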
In summary, linear algebra is a bunch of machinery for expressing various transformations in terms of vectors and matrices in order to help with modelling various real-world phenomena ranging from computer graphics, biology, chemistry, graphs, crypto, etc. Even ML ;)
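As for the dot-product question asked above, one standard argument (added here as a sketch, not part of the original reply) goes through the law of cosines. It assumes a and b are nonzero vectors and C is the angle between them, as in the question.

```latex
\begin{align*}
\lVert a - b \rVert^2 &= (a - b)\cdot(a - b)
  = \lVert a \rVert^2 + \lVert b \rVert^2 - 2\, a \cdot b
  && \text{(expand using linearity of the dot product)} \\
\lVert a - b \rVert^2 &= \lVert a \rVert^2 + \lVert b \rVert^2
  - 2 \lVert a \rVert \lVert b \rVert \cos C
  && \text{(law of cosines, triangle with sides } a,\ b,\ a - b\text{)} \\
\Longrightarrow\quad a \cdot b &= \lVert a \rVert \lVert b \rVert \cos C .
\end{align*}
```

Both lines compute the same squared length, so the two right-hand sides must agree, and cancelling the common terms leaves the formula.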
After struggling to understand advanced math in a lot of different contexts I decided to go through the entire K-12 set of exercises on Khan Academy. I blazed through the truly elementary stuff like counting and addition in a few hours, but I was surprised at how quickly my progress started slowing down. I found I could not solve problems involving negative numbers with 100% accuracy. Like (5 + (-6) - 4). I would get them right probably 90% of the time but the thing is Khan Academy doesn't grant you the mastery tag unless you get them right 100% of the time. I found most of my problems were due to sloppy mental models. Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..." It was around the 8th grade level that I found myself having to actually work hard. (What does Pi even mean?) And I've been through advanced Calculus courses at the university level.
In case you (or others reading this) still struggle to formalize division, a very nice way to conceptualize it is as the inverse of multiplication. This neatly sidesteps the problem of trying to figure out a clean analogue for what it means to multiply a fraction of something by another fraction of something, since the intuitive group-adding idea of multiplication sort of breaks down with ratios.
Addition is a straightforward operation, but subtraction is trickier. For all real x there exists an additive inverse -x satisfying x + (-x) = 0. So to subtract 3 from 4 we instead take the sum 4 + (-3) = 1.
Likewise to multiply 3 by 4 we add four groups of 3: 3 + 3 + 3 + 3 = 12. We accomplish division by using a multiplicative inverse: for all real x there exists a 1/x such that x(1/x) = 1.
So (3/4) / (5/6) is equal to (3 * 1/4) / (5 * 1/6). In other words, take the multiplicative inverse of 4 and 6 and multiply them by 3 and 5 respectively. Then multiply the first product by the inverse of the second product.
This is the axiomatic basis of division as "repeated subtraction": subtraction is the sum of a number and another number's additive inverse, and multiplication is repeated addition. Then division is the product of a number and another number's multiplicative inverse. From this perspective you need not even understand division computationally if all you'll ever deal with are fractions and not decimals.
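Here is a minimal sketch of that idea using Python's standard `fractions` module. The helper names `multiplicative_inverse` and `divide` are made up for this example; the only claim is that dividing by 5/6 and multiplying by its inverse 6/5 give the same answer.

```python
from fractions import Fraction


def multiplicative_inverse(x):
    """Return 1/x, the number that multiplies with x to give 1 (x must be nonzero)."""
    return Fraction(x.denominator, x.numerator)


def divide(a, b):
    """Division defined as multiplication by the multiplicative inverse."""
    return a * multiplicative_inverse(b)


a = Fraction(3, 4)
b = Fraction(5, 6)
q = divide(a, b)

print(multiplicative_inverse(b))   # 6/5, the "flipped" second fraction
print(q)                           # 9/10
print(q * b)                       # 3/4, multiplying back undoes the division
print(q == a / b)                  # True, agrees with the built-in division
```

This is also where "flip the second fraction over" comes from: flipping is just how you write down the multiplicative inverse of a fraction.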
I applaud your counter-Dunning-Krugerish inquisitiveness about your own skills. I hope some of that rubs off on me.
I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.
The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.
Perhaps the issue is motivation? Many wonder, "why do I need to learn this hard stuff?" If so, the approach taken by Rachel Thomas and Jeremy Howard at fast.ai seems to be a good one: build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
The biggest turn off about math is the way people are taught math.
Most people are taught math as if it's an infinite set of cold formulas to memorize and regurgitate. Most students in my statistics class didn't know where and when to use the formulas taught in real life; they only knew enough to pass the tests. Students who obtain As in Algebra 2 hardly know where the quadratic formula comes from (and what possibly useful algebraic manipulation could you do if you can't even rederive the quadratic formula?). It's not just math. I've been in a chemistry class where the TA was getting a master's in chemistry and yet she taught everyone in my class a formula so wrong that, taken literally, it meant that every time a photon hits an atom, an electron will be ejected with the same energy and speed as the photon. This is obviously wrong but when I pointed it out, everyone thought I was wrong because "that's not what it says in the professor's notes" (later, the professor corrected their notes). In my physics class, the people who struggled the most were the ones who tried the least to truly grasp where the formulas come from. I don't blame them; it's the way most schools teach.
> build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
I totally agree.
This comes from my experience tutoring people struggling with math for the past eight years. I used to like math, then I got to college where 95% of people don't understand the math they're doing and thus can't be creative with it; this includes the professors who teach math as the rote memorization of formulas. Yeah, call me arrogant, but I have found it to be true in my experience. I strongly believe the inability to rederive or truly grasp where things come from destroys the ability to be creative and leads to a lack of true understanding. But everyone believes they understood the material because they got an A on the exam. I'll stop ranting on this now.
Years later, I'm trying to relearn math, but I'm taking the exact opposite approach. No calculator, no rote memorization, just reading about the concepts and thinking about what they mean until I can do the manipulations in my head. When I do practice problems, I don't care so much about the specific numbers, but about my ability to understand what's happening to each part of an equation, what the graph looks like, etc.
The problem is that in an immature field that's still evolving, the components are not yet well-understood or well-designed, so available abstractions are all leaky. However, modern software engineering is mostly built on the ability to abstract away enormous complexity behind libraries, so that a developer who is plumbing/composing them together can ignore a lot of details. People with that background now expect similarly effective abstractions for machine learning, but the truth is that machine learning is simply NOT at that level of maturity, and might take decades to get there. It is the price you pay for the thrill of working in a nascent field doing something genuinely uncharted.
"Math in machine learning" is a bit of a red herring. We hear the same complaints about putting in effort to grok ideas in functional programming, thinking about hardware/physics details, understanding the effects of software on human systems , etc. Fundamentally, I think a lot of people have not developed the skill to fluidly move between different levels of abstraction, and a variety of approximately correct models. And to be fair, it seems like most of software engineering is basically blind to this, so one can't shift all the blame on individuals.
 Why the MIT CS curriculum moved away from Scheme towards Python -- https://www.wisdomandwonder.com/link/2110/why-mit-switched-f...
 Building software through REPL-it-till-it-works leads to implicitly ignoring important factors (such as ethics) -- https://news.ycombinator.com/item?id=16431008
Deep learning, in particular, is a trade today. If we want to be generous we can call it an "experimental science"... but my perception is that only a minority of papers in the field actually deserve that moniker.
(Speaking as a deep learning practitioner with expertise in a narrow domain.)
I can tell you at least part of it, from my subjective perspective. I tend to "think" in a very verbal fashion and I instinctively try to sub-vocalize everything I read. So when I see math, as soon as I see a symbol that I can't "say" to myself (eg, a greek letter that I don't recognize, or any other unfamiliar notation) my brain just tries to short-circuit whatever is going on, and my eyes want to glaze over and jump to the stuff that is familiar.
OTOH, with written prose, I might see a word I don't recognize, but I can usually work out how to pronounce it (at least approximately) and I can often infer the meaning (at least approximately) from context. So I can read prose even when bits of it are unfamiliar.
There's also the issue that math is so linear in terms of dependencies, and it's - in my experience - very "use it or lose it" in terms of how quickly you forget bits of it if you aren't using it on a day-in, day-out basis.
It's sad that many mathematical resources do not make a careful effort of helping someone reason verbally. I guess this is partly due to the fact that most people who are skilled in the subject and write about it prefer equational reasoning (for lack of a better word) to verbal reasoning! In my experience as a physics instructor for non-STEM majors, this might be one of the biggest impediments for otherwise intelligent people trying to learn math/physics.
 What's wrong with these equations? by David Mermin -- http://home.sandiego.edu/~severn/p480w/mathprose.pdf
I don’t find it ironic, because I wouldn’t expect engineers to make good mathematicians implicitly (nor vice versa). There is some similarity between math and programming, but there is also a colossal amount of dissimilarity that makes them different things entirely.
For example, notation and terminology in mathematics is not actually rigorous. It’s highly context dependent and frequently overloaded (take the definition of “normal”, the notation of a vector versus a closure, or the notation of a sequence versus a collection of sets). As another example, consider that beyond the first few courses of undergraduate math you’re wading into a sea of abstraction which you can only reason about. There is no compiler flag to ensure your proof is correct in the general case, and you don’t have good, automatic feedback on whether or not the math works. In this sense, the entirety of mathematical knowledge is actually very much not like a formal computer language.
Beyond that, the ceiling of complexity for theoretical computer science or applied mathematics is far higher than programming. It’s not so much motivation (though that can be an issue too), it’s that learning the mathematics for certain things simply takes a vast amount of time. Meanwhile a professional programmer has to become good at things that mathematicians and scientists don’t have to care about, like version control or the idiosyncrasies of a specific language.
They're really orthogonal disciplines, for much the same reason that engineering isn't like computer science. There is a world of difference between proving the computational complexity of an algorithm and implementing an algorithm matching that complexity in the real world.
Really depends on what kind of mathematician or scientist you are to be honest though. How good is someone's data analysis of an experiment if they can't reproduce it? Or if they've got 6 different versions of an application with 100k lines of code in a single file, each labelled "code_working(1).f90", "code_not_working.f90", etc... These are real problems with what people actually do in science; software development skills are poor and people do things badly.
There're organisations like Software Carpentry globally and the Software Sustainability Institute in the UK which exist to try and promote some thought about developing software as researchers, and making the software sustainable in the long term rather than letting it die every time a PhD student leaves.
This applies in both directions: most mathematicians and scientists have such poor version control and development hygiene because mathematics doesn't imbue them with any special insight about how to be an engineer.
That said, the math we're talking about (that is, the math necessary for understanding, say, the sequence of transformations that make up a convnet) lies far below the ceiling of complexity you mention.
I'm not sure symbol reuse and overloading are as much of an issue. I've run into people who are quite proficient with Perl and routinely use complicated regular expressions who say they didn't like math growing up.
Various non-intuitive concepts are handwaved, the foundations skipped over and students then start struggling because they don't understand the foundation of what they are trying to learn. Reading from the textbook is fairly useless and it ends up being used as a problem set source.
I argued to a few math professors that teaching things like calculus from textbooks that reference concepts not actually taught until 5 classes later is a bad idea.
In return I got a shrug of indifference telling me that's just the status quo and the status quo is OK.
Thank god khan academy exists now.
Of course, colleges should cater to a range of preparatory educations, but the textbooks you're talking about are pitched at the correct level for somebody.
Also, for me personally, it's just such a drag to learn all the notation. After the fact, I've always thought, "Wow, that's all this means?" but while I'm learning, I feel helpless. It doesn't feel like I have any way to google it. My professors never actually want to sit down and explain it to me. All the pages of math equations always look so intimidating. It's just such a drag.
I do, however, have a talent for language.
The reason I am a good developer is because I can communicate with different machines through different programming languages in the same way I can communicate with different people through different human languages.
Here's a crazy idea: machine learning might one day help software engineers understand algorithms and data structures.
You write some code to traverse a list or something and do some naive sorting, or maybe your everyday way of doing some operations on your lists is inefficient. I want some cool machine learning where I can submit my code and it does analysis.
I think Microsoft is working on that. https://techcrunch.com/2018/05/07/microsofts-new-intellicode...
Let's take it a step further. Explain to the programmer why what they're doing is wrong.
I would pay big bucks for a "machine intelligence" IDE
Linear Algebra is typically the first course in which students have to transition from predominantly rote computation to proof-based theory. Axler's Linear Algebra Done Right is very often the textbook used for that course because it (mostly) lives up to its name. This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra. The floor for understanding this subject doesn't get much lower than Axler (and in my opinion, it doesn't get much better at the undergraduate level either).
It's unfortunate that so many people want to skip to math they're not ready for, because there's no shame in building up to it. A lot of frustration can be eliminated by figuring out what you're actually prepared for and starting from there. If that means reviewing high school algebra then so be it; better to review "easy" material than to bounce around a dozen resources for advanced material you're not ready for.
1. See Noam Elkies' commentary on where it could improve: http://www.math.harvard.edu/~elkies/M55a.10/index.html
Elkies’s post is in the context of a course for very well prepared and motivated first-year undergraduate pure math students who are racing through the undergraduate curriculum because most of them intend to take graduate-level courses starting in their second year.
Those two audiences are very far apart.
Yes, that's precisely why I said, "This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra."
How do you propose to teach linear algebra beyond basic matrix operations and Gaussian elimination if you're not teaching any theory? You can take some disparate tools from linear algebra (just like you can with analysis to make calculus), but the presentation of learning the mechanical tools of linear algebra versus the theory of linear algebra is a false dichotomy. Axler's textbook is a very nice compromise that provides students an understanding of why things are the way they are while still teaching them how to work through the numerical motions of things. You need not go so far as reading Finite Dimensional Vector Spaces if you want to avoid theory, but you need enough of it to put the mechanical operations in some kind of context.
Students are often entirely unfamiliar with the context (problems, structures, goals, ...) for the new abstractions that are rained down on them, and end up treating their proofs as little exercises in symbol twiddling / pattern matching, without much understanding of what they are doing.
The undergraduate curriculum is put in this position because there is a lot of material to get through in not much time, and students are generally unprepared coming in. Ideally students would have a lot of exposure to basic material and lots of concrete examples starting in middle school or before, but that’s not where we are.
The thing is, I couldn't write the damn matrices well lined up and made mistakes when doing calculations. This was really a (de)formative experience. In college Linear Algebra for econ was 40% Gaussian elimination, 40% eigenvalues and 20% linear programming. I mean, I still can't do Gaussian elimination by hand right.
I started crawling out of it when I started seeing (in self-study) a book on linear algebra that takes the linear transform/vector space-first approach.
Calc up to 3 (you can skip some of the divergence and curl stuff)
Linear algebra (no need for Jordan change of basis)
Intermediate probability theory (MAE, MAP, conjugate priors minus the measure theory stuff)
A little bit of differential geometry (at least geodesics. This is for dimension reduction)
Discrete math (know counting and sums really well)
Learn a little bit of Physics (at least know Lagrangians and Hamiltonians)
A little bit of complex analysis (to know contour integration and fourier/laplace transforms)
Some differential equations (up to Frobenius and wave equations)
Some graph theory (my weak spot, but I have used the matrix representations a few times)
After all that, read some Kevin Murphy and Peter Norvig.
Congrats, now you can read most machine learning papers. The above will also give you the toolkit to learn things as they come up like Robbins-Monro.
OP's article is much better if you are trying to be a ML developer/practitioner. Like I said, this list might be too theory focused, but it lets me read lots of applied math papers that aren't ML focused.
Additionally, we're seeing the introduction of even more sophisticated stochastic samplers (stochastic gradient Hamiltonian Monte Carlo, etc) that require even more esoteric branches of math and physics to really grok. I have a strong math background but frequently find myself struggling with a lack of knowledge in statistical mechanics when trying to read papers in these areas.
So yeah - there's plenty of bullshit and exaggeration. But there's also some wicked cool stuff happening which requires very sophisticated (and specialized) knowledge to understand.
I've been working in golang, which fortunately has built-in complex128 types, so it's proved very helpful in a project!
I took a complex analysis class and did OK, but I get the feeling that EEs are the ones who really benefit from it (at least in the applied world). They seem to have some very rich analyses of linear dynamical systems using frequency domain methods.
I believe the above was used in sk-learn or PyMC3 at some point.
Like I said, the list was a bit too theory focused and not just for ML. Hope that clears things up.
Also, maybe a bit out of place, but it makes my day happier when I assume good intentions out of random tidbits posted online.
If you have a background in physics then some combination of Nakahara's 'Geometry, Topology and Physics' and Baez and Muniain's 'Gauge Fields, Knots and Gravity' might be good (I haven't included relativity textbooks as I assume that if you have a background in GR then you have enough differential geometry).
An unusual recommendation that I think is really nice is 'Stochastic Models, Information Theory and Lie Groups' by Chirikjian. It covers a few other topics mentioned in this thread and is really nice. It's _extremely_ concrete and spells out a lot of calculations in great detail. Plus, the connection to engineering applications is much more obvious.
Hartle, James B., Gravity: An Introduction to Einstein's General
Schutz, Bernard, A First Course in General Relativity
The first few chapters would be all you need, but they don't include the nice things I learned from the lecture notes like how to derive the gradient, divergence, and curl in any curvilinear coordinate system by using the Christoffel symbols.
Sorry that I can't be of much more help.
In our Data Scientist Track (https://www.dataquest.io/path/data-scientist?), I specifically focused on teaching K-nearest neighbors first b/c it has minimal math but you can still teach ML concepts like cross-validation, and then I wrote Linear Algebra and Calculus courses before diving into Linear Regression.
MATH & PHYS book: https://minireference.com/static/excerpts/noBSguide_v5_previ...
LA book: https://minireference.com/static/excerpts/noBSguide2LA_previ... + free tutorial: https://minireference.com/static/tutorials/linear_algebra_in...
Maybe one of these days I'll complete it :)
I really like 3Blue1Brown for a wide range of math topics. He's just a great teacher.
Frankly, I find the UTAustin linear algebra class less than ideal, but it's free and comes with lots of classmates and material, so...
Then, it's not difficult to understand what a manifold is, but it took me a number of attempts to get it, and then I only did when studying them formally with Spivak 1963. Now the concept of manifold seems patently obvious to me and not really needing much formalization, but...
Then things changed at university (studying computer science) and I completely lost interest. Not sure why (bad teacher, going from being best in class to being average, the math at uni being different from school).
Now, much later, I regret not having followed through and miss the beauty of Math. I'm re-discovering it and wondering how I could use more of it in my work.
The article seems good overall, but I only skimmed the rest after seeing a citation of a 5-year-old Atlantic article describing disputed and at minimum highly exaggerated findings presented as 'shown in recent studies'.
This is also really good for connecting LA concepts with visuals http://immersivemath.com/ila/index.html
If you don’t care about accreditation and are patient, sit down with Axler’s Linear Algebra Done Right and Hoffman & Kunze’s Linear Algebra, in that order.
I would caution you against trying to learn linear algebra using a “take what you need” approach. A random walk approach to learning the material is faster than an accumulation approach, but it’s more brittle and prone to confusion. A lot of things which appear to be irrelevant or unnecessary for machine learning (computation or research) can be imperative for understanding or implementing much more complex concepts later on.
He's an amazing teacher and conveys a lot of intuition + makes even complicated ideas look straightforward.
I just found a quick explanation by Terence Tao about why people are generally loose in this case, meaning that some properties transition nicely from the smooth (here, differentiable) to the rough categories by passing to the limit and density arguments: http://www.math.ucla.edu/~tao/preprints/distribution.pdf
Of course there are exceptions.
Foundations of Machine Learning (bloomberg.github.io)
There, machine learning (ML) is basically a lot of empirical curve fitting. The context is usually a lot of data: thousands of variables and millions or billions of data points, i.e., observations, each pairing values of thousands of independent variables with the value of the corresponding dependent variable. The work is all a larger, more-data version of this: You have a high school style X-Y coordinate system and some points plotted there. So, you want to find values for coefficients a and b so the line
y = ax + b
fits the points as well as possible. But, you can do variations, try to fit, say,
log(y) = a sin(x) + b
Or replace log or sin with any functions you want and try again.
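To make that concrete, here is a minimal sketch of both fits in plain Python. It assumes "fits as well as possible" means ordinary least squares, which the comment does not actually specify; the helper names `fit_line` and `fit_transformed` and the sample points are made up for illustration.

```python
import math

def fit_line(xs, ys):
    """Closed-form least-squares a, b for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def fit_transformed(xs, ys, fx, fy):
    """Fit fy(y) = a*fx(x) + b by transforming the data, then reusing fit_line."""
    return fit_line([fx(x) for x in xs], [fy(y) for y in ys])

# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)

# Made-up points generated from log(y) = 0.5*sin(x) + 1.
xs2 = [0.0, 0.5, 1.0, 1.5, 2.0]
ys2 = [math.exp(0.5 * math.sin(x) + 1.0) for x in xs2]
a2, b2 = fit_transformed(xs2, ys2, math.sin, math.log)
```

Swapping `math.sin` or `math.log` for other functions gives the "try again" variations; the fit stays linear in a and b, so the same closed form applies each time.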
The logic, rational support, is essentially as follows: So, take, say, 1000 x-y pairs. Partition these into 500 training data and 500 test data. Find the best fit you can, using whatever fits, to the training data. Then take the equation and see how well it fits the test data. If the fit of the test data is also good, then that is your model.
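A rough sketch of that train/test procedure, again in plain Python. The 1000 points are synthetic (a noisy line invented for this example), the split is a random 500/500, and mean squared error stands in for "how well it fits", which is an assumption; the comment does not name a metric.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares a, b for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on the given points."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus unit Gaussian noise.
rng = random.Random(0)
points = []
for _ in range(1000):
    x = rng.uniform(0.0, 10.0)
    y = 2.0 * x + 1.0 + rng.gauss(0.0, 1.0)
    points.append((x, y))

# Partition into 500 training and 500 test points.
rng.shuffle(points)
train, test = points[:500], points[500:]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

# Fit on the training data only, then score on both halves.
a, b = fit_line(train_x, train_y)
train_mse = mse(a, b, train_x, train_y)
test_mse = mse(a, b, test_x, test_y)
print(f"a={a:.3f} b={b:.3f} train MSE={train_mse:.3f} test MSE={test_mse:.3f}")
```

If the test error is close to the training error, the fit passes the check described above; a test error much larger than the training error is the sign that the fit does not generalize.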
Now you want to apply the model in practice, that is, apply it to data it did not see in the given 1000 points. So for the application, you will be given a value of x, plug it into the equation, and get the corresponding value of y. That's what you want -- maybe the value of y gives you Y|N for ad targeting, Y|N cancer, what MSFT will be selling for next month, what the revenue will be for next year, etc.
The rational, logical justification here is an assumption (which should have some justification from somewhere) that the x you are given and the y you want for that value of x are sufficiently like the x-y values you had in the original 1000 points.
Okay. Empirical curve fitting to a lot of data to make a predictive model, that is found with training data, tested with test data, and applied where the given data in the application is like the data used in the fitting.
The OP mentions that some people believe that to make progress to real machine intelligence, we need more math than what I outlined.
My guess is that to make that intended progress, for all but some tiny niche cases, we first need some much more powerful and quite different ideas, techniques, etc. than in the curve fitting ML I outlined.
Yes, there is a chance that with lots of data from working brains and lots of such empirical fitting we will be able to find some fits that will uncover some of the workings of the brain crucial for real intelligence. Uh, that's a definite maybe!
But there is a lot more to what can be done to build predictive models than such curve fitting, empirical or otherwise. I outlined some such in the thread that I referenced above.
So, for the question in the OP, what math? Well, if you want to pursue directions other than the empirical curve fitting in the Bloomberg course I referenced above, my experience is -- quite a lot. For the education, start with a good undergraduate major in pure math. So, cover the usual topics: calculus, abstract algebra, linear algebra, differential equations, advanced calculus, probability, statistics. Then continue with more in algebra, analysis, and geometry.