This is not a very good list. Several items are between "misleading" and "wrong":
1. "Pipe" should be called "bar" (IMO)
2. "Vector norm" is a much deeper idea than just "2-norm" and if you limit yourself to the 2-norm, you will run into problems
3. "Set membership" is not called "epsilon"!
4. I wouldn't describe functions as operating on "pools", but that's just me. ("Pools" seems to imply vector arguments, or at least set arguments, which is not how functions are usually thought of as operating, though they are defined that way.)
5. The description of R^2 as "2-D array" is wrong.
6. I've never seen that notation for elementwise multiplication, though that doesn't mean much.
7. I've rarely seen that notation for dot products. Usually the center dot is much more common.
8. Hat is... nonstandard. It has many, many meanings.
Good points. I know you know, but it's also worth mentioning that most math symbols don't have a definite meaning. Meaning is defined in the context in which a symbol is presented, in whatever way the author feels makes the most sense. Math is a natural language, not a formal one.
That's a little complicated as math is intertwined with science. But at the core I only see them as a difference in subject matter. Math is a philosophical study of space, logical structure, quantity, etc.
Proofs are just tools for articulating arguments about these subjects. Deduction just tends to work better in math, so the methods are more refined and agreed upon.
In "proper" mathematics, the symbols are defined precisely in their context. This does not mean that the symbols' meanings are universal. I think this is what GP meant.
The lack of universality of meanings of symbols is a feature, not a bug, of math notation.
The set membership symbol is called a lunate epsilon. As I half remember, someone used the Greek version of the first letter of the Latin word "est," meaning "is."
It is faulty to think of the math notation and code as equivalent. The math states something much more general, and often much deeper, than some mundane for-loop. I’d have titled this “Python Recipes to Calculate Values of Simple Vector Math Expressions”.
Oftentimes the symbols may not even be concrete numbers. Often the “real” number may not even be representable on a computer. Often a symbol does represent a concrete number, but one defined indirectly by some set of rules (e.g., x is the smallest eigenvalue of M, or y is the greatest value less than the supremum of f) that may require a whole library of code to calculate.
The mathematical notation is “meant” to be flexible and manipulated, not to be interpreted (necessarily) as a rigid computational process. It should also be noted that while some notation comes from convention, a lot of it is improvised and contextual!
I claim that even in the math of machine learning, many of the symbols presented do not mechanically translate as such. Other commenters have discussed differing notational conventions, but even to assume that x_i is indexing some real numbers in memory is often too much.
I’m not saying there isn’t any value showing somebody how an elementary summation corresponds to a for-loop. I am saying it’s a lot more than that, though.
> The math states something much more general, and often much deeper, than some mundane for-loop
Eh? Code can be just as generalized and "deep". The code may (probably) not be usefully executable, but it could describe any concept. At that point it's just "math" again, albeit with different format and symbols.
So I guess your point stands, but it's a very thin distinction of common usage.
Code can be studied, reasoned about, interpreted with some semantics, and often even turned into some manner of equations. Heck, Standard ML is even defined by a bunch of equations. But very rarely is code written for the same reason a mathematical expression is written. I agree code can be as you say, but most often isn't.
Code, at the end of the day, more often than not, is written to express some logic to be executed by a computer. At least 80%, if not more than 98%, of code is written to express imperative commands to a machine. I say this as a fan of abstraction, Lisp, Prolog, etc. I just don't buy that
s = 0
for x in a:
    s += x
return s
is interpreted with the same generality and depth as
s := \sum_{x\in a} x,
the latter of which is seen in and of itself more as an "object" which may or may not be used to represent a computation that the code above indicates.
Yes. Math notation is independent of time and space. Code isn't. Math is a pineapple upside down cake. Code expresses one (of many!) recipes to produce one.
The danger here is if you start equating math to code and you write an algorithm in, say, an imperative language on a uniprocessor, you can become blind to declarative/parallel solutions. The math is the Platonic ideal of the thing in question, and it doesn't change. Always keep the math in mind and don't get lost in the code.
Yeah, but those deep general properties don’t matter to someone who can’t even evaluate the expression.
I’ve encountered these math explanations through code before, and I think they’re a fantastic way to answer the question “what does this notation do?”
I am currently taking an algorithms class and often ad hoc notation is introduced and explained through a code analogy. I think it’s a great way to help someone understand unfamiliar math notation quickly.
I do wonder, are there people writing ML programs in Python without learning basic math notation?
It's hard to imagine someone learning important fundamental math concepts without using math notation, so anyone who benefits from this article should probably take a detour through a math book before continuing their Python ML, or else risk having beautiful code that implements nonsensical math.
>so anyone who benefits from this article should probably take a detour through a math book before continuing their Python ML, or else risk having beautiful code that implements nonsensical math.
That seems akin to suggesting someone should learn music theory before ever attempting to pluck a guitar string. I personally learn best by putting a subject to practice and figuring out the nuances by experimenting. No one expects beginners in any subject to be capable of producing quality work, whether programming or learning to cook.
That's a very bad analogy. People can become great musicians without ever learning music theory, since being a "musician" is such a wide category. You can be an awesome singer and not know what a diminished seventh is. I learned how to play Moonlight Sonata, 3rd Movement in high school [0] (not that that makes me good, just low-intermediate at best). I know practically nothing about music theory except how to read sheet music.
On the other hand, you can never become a great, or good, or even mediocre ML engineer without understanding the math. If anything, in a production business environment, if you are responsible for creating the model yourself, you'd just be doing more harm than good. Yes, you'd be able to use pre-baked frameworks to run data through an algorithm or model that you have no idea of what it's doing or why and get some numbers and graphs out, but you won't be doing actual science or modeling. You might know how to use Python to do something as simple as run a linear regression, which if you don't know the math, you'd just have a vague understanding of it "finding the best fit line", whatever that means. However, you wouldn't understand that (under the L^2 norm) it's minimizing sum of squared errors, the properties of that (BLUE [1]), whether or not you'd want to apply regularization [2] or not, statistical tests of whether or not a linear fit is even applicable, the importance of outliers due to their overweighting under the L^2 norm, etc.
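To make the regression point concrete, here is a minimal sketch (toy data, illustrative only) of what "finding the best fit line" actually minimizes, namely the sum of squared errors:

import numpy as np

# Toy data: y is roughly 2x + 1 plus noise (made-up values)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix with an intercept column
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the beta minimizing ||A @ beta - y||_2^2
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [intercept, slope]

Knowing why squared error is the right objective here, and when it isn't, is exactly the math in question.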
If I were responsible for a modeling or prediction project, I would never trust it to a software engineer that didn't understand the math of what's actually going on.
> so anyone who benefits from this article should probably take a detour through a math book before continuing their Python ML, or else risk having beautiful code that implements nonsensical math.
I couldn't agree with the GP more. That is 100% spot on.
>That's a very bad analogy. People can become great musicians without ever learning music theory, since being a "musician" is such a wide category
I disagree; you're equating being good at something with being a world-class expert. You can become a great Python developer without ever knowing how the language implements a dictionary. You can't expect to remove the black-box aspect from everything you use; that's just impossible.
We're not talking about being a great Python developer. We're talking about being a great applied ML practitioner. You simply cannot be good at that without having a firm grasp of the underlying math.
This was about how the content is learned, not what content is learned.
>so anyone who benefits from this article should probably take a detour through a math book before continuing their Python ML, or else risk having beautiful code that implements nonsensical math.
My response to that comment was you can learn both at the same time. I preferred to learn music theory while also learning how to play the guitar. I also preferred to learn mathematics subjects while applying them, such as machine learning.
I found that writing a vector math library around the same time I was learning the math to be immensely helpful.
Given that I look at Python everyday and only consult math books when I need to learn a new algorithm, seeing the basic symbols translated into my everyday language is pretty helpful.
Math = good.
Code = good.
Example: given arrays A,B,C,D,E compute ABCDE.
Most mathy people just multiply left-to-right, while a person with some CS grounding will use dynamic programming [1]. If you are working with Dask-sized arrays, this is important.
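To illustrate (a sketch; shapes are made up so that the ordering matters), numpy even exposes the dynamic-programming ordering directly via np.linalg.multi_dot:

import numpy as np

# Chain of compatible matrices; tall/thin alternation makes order matter
A = np.random.rand(1000, 10)
B = np.random.rand(10, 1000)
C = np.random.rand(1000, 10)
D = np.random.rand(10, 1000)
E = np.random.rand(1000, 10)

# Naive left-to-right evaluation: ((((A @ B) @ C) @ D) @ E)
naive = A @ B @ C @ D @ E

# multi_dot picks the parenthesization minimizing scalar multiplications,
# here doing the cheap (10, 1000) @ (1000, 10) products first
fast = np.linalg.multi_dot([A, B, C, D, E])

print(np.allclose(naive, fast))  # True, up to floating-point error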
I don't see why knowing the math notation is important at all. The computer doesn't understand the math notation, it only understands code. The math notation is esoteric and overrated, if you understand what the code does then there is no need to understand a notation that is entirely useless inside of the code.
The article explains \hat{} as meaning a vector of unit length but in my experience in ML it nearly always designates an estimate (including in the first expression in this article).
Lol yeah. You are probably best off thinking of x-hat as a completely different letter to x, where x-hat is useful because it looks like x, and so you can talk about an object x-hat that has some relation to the object x without losing track of which objects you are talking about.
x-dot and x-tilde are also in the same family and I'm sure there are lots of other symbols that people have put on top of x too.
It’s interesting where the “technical limitations” set in on computers. We can’t have symbolic math accurately represented because “how would you even do that with keyboard input, and how would you encode it?!” Yet we have poop emojis because that’s an absolute essential part of written communication these days.
I'm delighted. I've programmed since my teens but I never really enjoyed math and have mostly picked up what I need when I really need to. I think more in the ways of programming than in math, so this is a good way of helping me with the syntax.
People come here from many angles. Mine was largely free of math. I gather yours wasn't.
I upvoted this because I feel the same way, however I have a rebuttal.
There are plenty of articles on learning how to learn, but fewer on the process of learning what to learn (plenty of lists of “what you should know”, though). If you’re just past the very beginning of learning something new, being exposed to material a bit outside your competency can be valuable: if you can mostly grasp something, the parts you can’t grasp become pointers for expanding your knowledge, while a fully advanced treatise is likely gibberish.
I also come to HN for the more technical links but often read the comments even on basic articles like this as they may have interesting pointers.
Also: I’ve been writing systems code for 40 years: bring up an os on bare hardware, write small real-time kernels, build high performance distributed systems, work on compilers, etc. I could probably write a (crappy) algorithms or datastructures textbook coz yes, I use a lot of that stuff. But quickly write a secure, reactive web front end for a simple CRUD app that let you browse a catalog? I would struggle. So one person’s obvious is another’s “no clue”.
I see many comments criticizing the post for only implementing simple mathematics. This is true.
However, the approach of understanding math through code is still very helpful, I think. Personally, implementing things that are mathematically fuzzy to me provides immense clarity once I write them down in code.
For example, a simple monty hall simulator[1]. Or implementing matrix multiplication multiple ways to understand why each is equivalent[2], and why multiplying A(BC) can sometimes be faster than (AB)C[3].
I am not sure why this helps me. It may be because I was "raised" as a coder, and so that is how my brain works. But I also think that implementing something in code is very close to constructivist mathematics, in spirit. You cannot prove anything if you cannot construct (/implement) it.
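In that spirit, here is a minimal Monty Hall sketch (not the linked simulator [1], just an illustration of the approach):

import random

def monty_hall(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a door that is neither the contestant's pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(monty_hall(switch=False))  # ~0.33
print(monty_hall(switch=True))   # ~0.67

Once the simulation is written down, the 2/3 advantage of switching stops feeling like a paradox.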
"Structure and Interpretation of Classical Mechanics" by Gerald Jay Sussman and Jack Wisdom
> There has been a remarkable revival of interest in classical mechanics in recent years. We now know that there is much more to classical mechanics than previously suspected. The behavior of classical systems is surprisingly rich; derivation of the equations of motion, the focus of traditional presentations of mechanics, is just the beginning. Classical systems display a complicated array of phenomena such as nonlinear resonances, chaotic behavior, and transitions to chaos.
> Traditional treatments of mechanics concentrate most of their effort on the extremely small class of symbolically tractable dynamical systems. We concentrate on developing general methods for studying the behavior of systems, whether or not they have a symbolic solution. Typical systems exhibit behavior that is qualitatively different from the solvable systems and surprisingly complicated. We focus on the phenomena of motion, and we make extensive use of computer simulation to explore this motion.
They are basically making the computer do the work, with emphasis on unambiguous, computable notation.
Function domains and ranges can be specified and checked at compile time with type annotations, at runtime with type()/isinstance(), or with something like PyContracts or icontract for checking preconditions and postconditions.
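A minimal sketch of that layering (plain annotations plus a runtime check; the contract libraries add declarative pre/postconditions on top of this):

from collections.abc import Sequence

def norm(v: Sequence[float]) -> float:
    # The annotation documents the domain and range but is not enforced at runtime,
    # so we check the domain explicitly
    if not isinstance(v, Sequence):
        raise TypeError("expected a sequence of floats")
    # A postcondition a contract library could verify: result >= 0
    return sum(x * x for x in v) ** 0.5

assert norm([3.0, 4.0]) == 5.0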
Seeing `for i in range(len(lst)): lst[i]...` gives me hives. That's cool if you're wanting to be super explicit about how the indexing works, but in this page it goes on to say "or you can just write sum(lst)" without worrying about the indexing.
I would write the explanations like:
result = 1
x = [1, 2, 3, 4, 5]
for number in x:
    result = result * number
print(result)
which in my opinion is much closer to the way a mathematician would think about the process.
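As an aside, just as the article collapses the summation loop into sum(lst), Python 3.8+ collapses this product loop into the standard library:

import math

x = [1, 2, 3, 4, 5]
print(math.prod(x))  # 120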
Maths notation can be wonderfully concise and precise, so it's worth thinking about following it closely when programming. One of my favorite examples of this is the numpy `einsum` call [1]. It implements Einstein summation convention [2], thereby making working with the many dimensions of high-rank tensors feasible.
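For example (a minimal sketch), matrix multiplication C_ik = sum_j A_ij B_jk reads off almost verbatim from the index notation:

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# The repeated index j is summed over, exactly as in Einstein convention
C = np.einsum('ij,jk->ik', A, B)

print(np.allclose(C, A @ B))  # True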
Solid post. I think someone who's just starting out with ML or even basic mathematical notation would find this very helpful, especially someone who knows programming but struggles with mathematical jargon.
For arrays, the math uses ordinal convention (1-based) but the Python uses offset convention (0-based). That's OK, but the article should explicitly mention that to avoid confusion.
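A minimal sketch of the off-by-one in question: the math's sum over i = 1..n becomes a loop over 0..n-1, with x_i landing at x[i - 1]:

x = [10, 20, 30]
n = len(x)

s = 0
for i in range(1, n + 1):  # i runs over the math's 1..n
    s += x[i - 1]          # x_i is x[i - 1] in Python

assert s == sum(x)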
Shameless plug, but I worked for some time on "Thinking in Tensors, Writing in PyTorch" pretty much focusing on turning LaTeX math into executable code.
E.g. Gradient Descent: formula, code, and visualization:
I think it is confusing not to address that math notation often (including the one used here) uses 1-based indexing, while Python uses 0-based indexing.
Given that FORTRAN was designed as basically rote translations from discrete math notation, it is deeply ironic that we now need to explain things in the opposite direction.
1. "Pipe" should be called "bar" (IMO)
2. "Vector norm" is a much deeper idea than just "2-norm" and if you limit yourself to the 2-norm, you will run into problems
3. "Set membership" is not called "epsilon"!
4. I wouldn't describe functions as operating on "pools", but that's just me. ("Pools" seems to imply vector arguments, or at least set arguments, which is not how functions are usually thought of operating, though they are defined that way.)
5. The description of R^2 as "2-D array" is wrong.
6. I've never seen that notation for elementwise multiplication, though that doesn't mean much.
7. I've rarely seen that notation for dot products. Usually the center dot is much more common.
8. Hat is... nonstandard. It has many, many meanings.