> As soft prerequisites, we assume basic comfortability with linear algebra/matrix calc [...]
That's a bit of an understatement. I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.
Book plug: I wrote the "No Bullshit Guide to Linear Algebra" which is a compact little brick that reviews high school math (for anyone who is "rusty" on the basics), covers all the standard LA topics, and also introduces dozens of applications. Check the extended preview here https://minireference.com/static/excerpts/noBSguide2LA_previ... and the amazon reviews https://www.amazon.com/dp/0992001021/noBSLA#customerReviews
To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.
The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.
Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".
And of course, you're not going to get very far with probability theory and stochastic processes unless you have a mature understanding of analysis and measure theory :)
This comment exchange neatly demonstrates the intrinsic problem. Most of these articles start off much like this one does: by assuming "basic comfortability with linear algebra." That sounds straightforward, but most software engineers don't have it. They haven't needed it, so they haven't retained it even if they learned it in college. It takes a good student a semester in a classroom to achieve that "comfortability", and for most it doesn't come until a second course or after revisiting the material.
If you don't already have it, you can't just use StackExchange to fill in the blanks. The random walk method to learning math doesn't really pan out for advanced material because it all builds on prior definitions. Then people like you make a comment to point out (correctly) that probability theory is just as important for all the machine learning that isn't just numerical optimization. But unless you want to restrict yourself to basic statistics and discrete probability, you're going to have a bad time working on probability without analysis. And analysis is going to be a pain without calculus, and so on and so forth.
There are certain things you need to spend a lot of time learning. Engineering and mathematics are both like that. But I think many of these articles do a disservice by implying that you can cut down on the learning time for the math if you have engineering experience. That's really not the case. If you're working in machine learning and you need to know linear algebra (i.e. you can't just let the underlying library handle that for you), you can't just pick and choose what you need. You need to have a robust understanding of the material. There isn't a royal road.
I think it's really great people like the author (who is presumably also the submitter) want to write these kinds of introductions. But at the same time, the author is a research assistant in the Stanford AI Lab. I think it's fair to say he may not have a firm awareness of how far most software engineers are from the prerequisites he outlined. And by extension, I don't think most people know what "comfortability with linear algebra" means if they don't already have it. It's very hard to enumerate your unknown unknowns in this territory.
There's always more you might want to learn, but when people talk about these basics, it's really just being super focused in 4 or so classes, not a whole ivy league undergrad curriculum in math.
probability & stats, multivariable calculus, and linear algebra will take you a long way.
True for me. I knew all of these from my course work when I graduated with my CS degree in 1996. I haven't used them at all in my career, and so I'd be starting basically from scratch re-learning them.
"comfort" is a perfectly cromulent word for this.
For intuition, particularly if you care about vision applications, I think one field of math which is severely underrated by the community is group theory. Trying to understand methods which largely proceed by divining structure without first trying to understand symmetry has to be a challenge.
I'm biased; my training was as a mineralogist and crystallographer! But the serious point here is that much of the value of math is as a source of intuition and useful metaphor. Facility with notation is pretty secondary.
Repeating structures have symmetries, so seeing the symmetries in your diffraction pattern inform you of the possible symmetries (and hence possible arrangements) in your crystal. Group theory is the study of symmetry.
By the way, this is also how the structure of DNA was inferred, although not from a crystal.
Great answer, thank you :-) Saved me a bunch of typing to explain it less well than you just did.
It's worth adding, for this crowd, that another way of thinking about the "other rules" you allude to is as a system of constraints; you can then set this up as an optimization problem (find the set of atomic positions minimizing reconstruction error under the set of symmetry constraints implied by the space group). That means solving crystal structures and machine learning are functionally isomorphic problems.
I know harmonic analysis in general combines the two, but I'm sure Crick and Watson could have done their work without knowing the definition of a group.
Crick was absolutely certainly familiar with the crystallographic space groups; he was the student of Lawrence Bragg (https://en.wikipedia.org/wiki/Lawrence_Bragg), who is the youngest ever Nobel laureate in physics – winning it with his father for more or less inventing X-ray crystallography. It's mostly 19th-century mathematics, after all.
A simple example is with linear regression: find w such that the squared l2 norm of (Xw - y) is minimized.
Linear algebra will help with generalizing to n data points; and calculus will help with taking the gradient and setting equal to 0.
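To make the "take the gradient and set it equal to 0" step concrete, here is a minimal NumPy sketch. The data, the sizes n and d, and the names w_true, w_normal and w_lstsq are made up for illustration; the only thing taken from the comment above is the objective, the squared l2 norm of (Xw - y).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 50 points, d = 3 features, known weights plus small noise.
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=n)

# The gradient of ||Xw - y||^2 with respect to w is 2 X^T (Xw - y).
# Setting it to zero gives the normal equations X^T X w = X^T y.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes the same objective without forming X^T X explicitly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sanity check: the gradient should vanish (up to rounding) at the solution.
grad_at_solution = 2 * X.T @ (X @ w_normal - y)
```

Both routes should agree on a well-conditioned problem like this one; in practice lstsq is usually the safer call, since forming X^T X can amplify conditioning problems.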
Probability will help with understanding why the squared l2 norm is an appropriate cost function; we assumed y = Xw + z, where z is Gaussian, and tried to maximize the likelihood of seeing y given x.
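Spelled out, the argument goes roughly like this. One assumption is added here that the comment doesn't state: the noise entries are i.i.d. with zero mean and a common variance \sigma^2, i.e. z \sim \mathcal{N}(0, \sigma^2 I), with n the number of data points.

```latex
\begin{align*}
p(y \mid X, w)
  &= (2\pi\sigma^2)^{-n/2}
     \exp\!\left( -\frac{\lVert Xw - y \rVert_2^2}{2\sigma^2} \right) \\
\log p(y \mid X, w)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\,\lVert Xw - y \rVert_2^2 \\
\arg\max_w \, \log p(y \mid X, w)
  &= \arg\min_w \, \lVert Xw - y \rVert_2^2
\end{align*}
```

The first term of the log-likelihood doesn't depend on w and the factor 1/(2\sigma^2) is positive, so maximizing the likelihood is the same as minimizing the squared l2 norm.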
I’m sure there are more examples of this duality since linear regression is one of the more basic topics in ML.
I'd love to get into ML but the math keeps me at bay.
Hell, I have a CS degree, and my Maths knowledge is horrific. I didn't take Maths at A Level, so I stopped learning any Maths at 16. The course had Maths, but it's surprisingly easy to brute force a solution, and that was a good ten years ago now.
My goal for years has been to learn enough Maths to be able to read Introduction to Algorithms and TAOCP without aid, and recently to be able to better understand ML, but the more I try the more I realise that it's a multi-year investment.
What's a linear transformation? I get that it's f(x + y) = f(x) + f(y) and f(cx) = c * f(x)... but what does that really mean?
Why is the dot product equivalent to ||a||*||b|| cos C? I really have no idea, I just know the formulas.
That's a really good question!
Essentially, when some function (synonym for transformation) is linear, what this tells you is that it has "linear structure," which turns out to be a very useful property to know about that function.
You can combine the two facts you mentioned above to obtain the equation
f(a*x + b*y) = a*f(x) + b*f(y)
Suppose now that the input space of f can be characterized by some finite set of "directions" e.g. the x- and y-directions in case f is a 2D transformation, or perhaps the x-, y-, and z-directions if f is a 3D transformation. If f is a 3D transformation, using the linear property of f, it is possible to completely understand what f does by "probing" it with three "test inputs," one along each direction. Just input x, y, and z, and record the three outputs f(x), f(y), and f(z). Since you know f is linear, this probing with three vectors is enough to determine the output of f for any other input ax + by + cz: the output will be af(x) + bf(y) + cf(z).
See the same explanations as above but in more detail here: https://minireference.com/static/excerpts/noBSguide2LA_previ...
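Here is a small sketch of the probing idea in NumPy. The function f below is a made-up linear map, deliberately written without any matrix, and the coefficients a, b, c are arbitrary; none of this comes from the book, it's just one way to try the idea out.

```python
import numpy as np

def f(v):
    """A hypothetical linear map from R^3 to R^3, written without any matrix."""
    x, y, z = v
    return np.array([2 * x + z, y - x, 3 * z])

# Probe f once along each of the three directions.
ex, ey, ez = np.eye(3)
fx, fy, fz = f(ex), f(ey), f(ez)

# Predict the output for an arbitrary input a*x + b*y + c*z from the probes alone.
a, b, c = 1.5, -2.0, 0.25
predicted = a * fx + b * fy + c * fz

# Compare with actually evaluating f on that input.
actual = f(np.array([a, b, c]))
```

The two results match because f satisfies f(a*x + b*y) = a*f(x) + b*f(y); three evaluations were enough to pin down f everywhere.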
So why is this important? Well, this "probing with a few input directions" turns out to be really useful. Basically, if f is non-linear, it can be super complicated and there may be no simple way to describe what its outputs are for different inputs, but if it is linear then the "probing procedure" works. Furthermore, since both the inputs and outputs have the form of a linear combination (a constant times something + another constant times another thing + a third constant times a third thing), you can arrange these "things" into an array called a matrix and define an operation called "matrix multiplication" which performs the constant-times-something operation on the outputs when you give it the constants as an array (vector) of inputs.
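As a rough illustration of that last step, the sketch below stacks the probe outputs as the columns of a matrix and checks that matrix multiplication reproduces f. The helper probe_matrix and both example functions are hypothetical names invented for this example; the second function is non-linear, to show the procedure breaking down.

```python
import numpy as np

def probe_matrix(f, dim):
    """Stack the outputs f(e_1), ..., f(e_dim) as the columns of a matrix."""
    return np.column_stack([f(e) for e in np.eye(dim)])

def linear_f(v):
    # Hypothetical linear map: rotate by 90 degrees, then double the length.
    x, y = v
    return np.array([-2 * y, 2 * x])

def nonlinear_f(v):
    # Hypothetical non-linear map: square each coordinate.
    return v ** 2

v = np.array([3.0, -1.0])

# For the linear map, the matrix built from the probes reproduces f exactly.
A = probe_matrix(linear_f, 2)
linear_ok = np.allclose(A @ v, linear_f(v))

# For the non-linear map, the same procedure gives the wrong answer.
B = probe_matrix(nonlinear_f, 2)
nonlinear_ok = np.allclose(B @ v, nonlinear_f(v))
```

Here linear_ok comes out True and nonlinear_ok comes out False: the product A @ v is exactly "constants times probe outputs, summed", which only describes f when f is linear.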
In summary, linear algebra is a bunch of machinery for expressing various transformations in terms of vectors and matrices in order to help with modelling various real-world phenomena ranging from computer graphics, biology, chemistry, graphs, crypto, etc. Even ML ;)