
Confession as an AI researcher; seeking advice - subroutine
https://www.reddit.com/r/MachineLearning/comments/73n9pm/d_confession_as_an_ai_researcher_seeking_advice/
======
stochastic_monk
As a grad student, this person is in the perfect place to build up their math
background. Almost any school offers the following:

1. Convex Optimization -- not all problems are convex, but solutions to
nonconvex problems end up primarily using convex methods with slight
adaptations.

2. Stochastic Optimization -- ML is pretty much all stochastic optimization.
No surprise there.

3. Statistical/Theoretical Machine Learning -- courses built around
concentration bounds, PAC learnability, and the Valiant/Vapnik school of
thought. This gives you what you need to talk about generalization and
sample complexity.

4. Numerical Linear Algebra -- being smart about linear algebra is most of
efficient machine learning: knowing which kinds of factorizations help you
solve problems efficiently. Can you do a Gram-Schmidt orthogonalization (QR)?
A Cholesky decomposition? An LU factorization? When do these fail? When do
you benefit from sparse representations? (See the sketch after this list.)

5. Graphical Models -- Markov chains, Markov random fields, causal
relationships, HMMs, factor graphs, the forward-backward algorithm,
sum-product algorithms.
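
To make item 4 concrete, here is a minimal sketch, assuming NumPy/SciPy and a
made-up system: the same A x = b solved by Cholesky (needs a symmetric
positive definite A; roughly half the work) and by LU with partial pivoting
(the general-purpose route).

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

    rng = np.random.default_rng(0)
    M = rng.standard_normal((4, 4))
    A = M @ M.T + 4 * np.eye(4)  # symmetric positive definite by construction
    b = rng.standard_normal(4)

    # Cholesky: about half the flops of LU, but fails unless A is SPD.
    x_chol = cho_solve(cho_factor(A), b)

    # LU with partial pivoting: works for any non-singular square A.
    x_lu = lu_solve(lu_factor(A), b)

    print(np.allclose(x_chol, x_lu))  # True: same solution either way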

If you're in school, _take advantage of the fact that you're in school_.

Once you have a grasp on these things (and you'll have to catch up on real
analysis, matrix calculus, and a few other fields of math), you'll be able to
start reasoning about ways to improve existing methods or come up with your
own. I think a lot of it is just developing the mathematical maturity that
gives you a vocabulary for thinking about these things.

~~~
hackernewsacct
What courses (starting from pre-calculus) should one take to cover what you
listed above? I want to match your recommendations to actual course titles.
Please list book recommendations as well if you would. Thanks!

~~~
nextos
I guess baby Rudin (or Hubbard & Hubbard for something simpler) in the
analysis department; and Halmos (or Axler) in the linear algebra department.

This is, essentially, Math 55. All four books have been used at different
stages of that famous course.

~~~
jhanschoo
Halmos seems to cover the same material as Hoffman and Kunze, which is the
more “standard” and more often recommended book. Nevertheless, after these you
will still have to read up on multilinear algebra (tensors and
determinant-like functions) as well as the numerical side of linear algebra.

------
ivan_ah
Learning all the advanced math within a few years is a hopeless endeavour. It
would take decades of hard work, because there is so much out there and you
would need to know all of it to make progress (e.g., to pull a fancy-named
theorem out of nowhere to solve some practical problem).

I find a better approach is to focus on the few basic ideas you need
_specifically_ for your work and to dig deep there. Nobody can be expert-level
in everything, but you can be expert-level in your specific domain of
research.

Also, for ML, it's hard to overemphasize the importance of understanding
linear algebra really well. Here is an excerpt of a book I wrote on LA which
should get you started on learning this fascinating swiss-army-knife-like
subject:
[https://minireference.com/static/excerpts/noBSguide2LA_preview.pdf](https://minireference.com/static/excerpts/noBSguide2LA_preview.pdf)

~~~
vladislav
"Learning all the advanced math within a few years is a hopeless endeavour."

Not in my experience. It's possible to get the equivalent of a bachelors and
masters in math within two years (which is enough to overcome the issues
listed in the post), but it's all you'll be doing for that period of time.
Well worth it imo.

~~~
nolemurs
> Not in my experience. It's possible to get the equivalent of a bachelors and
> masters in math within two years

Seriously.

In fact, _most_ people who learn this math learn it in a period of 2-3 years.
Far from being impossible, learning this math in a few years is normal. It's
not even a full time job. Most people learn all this math while also doing
other classes and school stuff. Even a very dedicated math major probably only
spends 20-25 hours a week actually studying math. I'm not sure much more than
that is sustainable for most people anyway.

Now, I'll grant, this is going to be a lot harder to do without the structure
of well thought out syllabi and lectures, but it's certainly manageable.

~~~
gsylvie
If you're already a grad student you can usually take any undergrad course at
your institution for free. It will slow down your progress on your graduate
degree, but you might as well do it right if it's what you really want.

------
empath75
You don’t need to understand everything in every field and I doubt very many
people do.

When Einstein was working on general relativity, he had a lot of help from
friends and colleagues who pointed him towards the math he needed. He didn’t
learn differential geometry until he was already deep into general relativity.

Find a level of abstraction that you’re comfortable with and learn to be okay
with black boxes at the lower level, and only dig into those boxes when what’s
inside them actually matters.

~~~
joshvm
I think an important lesson for any grad student is to learn to read through
the bullshit in papers and try to understand what the authors actually did.

It helps a lot that in CS you can often see the code that the authors
published along with the paper. Just staring at formulae doesn't mean much,
because for all you know the author just hammed up the equations to get the
paper into a top conference. That's not to say that the equations are
excessive, or that the authors are being misleading, but there is definitely
an expectation in some fields that putting equations in makes your paper look
clever, even if they're broadly unnecessary.

It's also wildly different depending on the field. If you look at variational
methods in computer vision, images are [continuous] mappings from some domain
into the reals (I : Ω → R^3 for colour). Does that change the fact that an
image in memory is just a bunch of numbers in a grid? Not really, but it's
bloody confusing the first time you see it.
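
A two-line illustration of that gap (a hypothetical NumPy example, not from
any paper): the lofty I : Ω → R^3 versus the grid of numbers actually in
memory.

    import numpy as np

    img = np.zeros((480, 640, 3), dtype=np.float32)  # "the mapping I", in memory
    print(img[100, 200])  # I evaluated at one grid point: one point in R^3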

This doesn't help with understanding the maths, but at some point you have to
give up and say "this guy proved it, and someone else peer reviewed it, so I
can use it to solve my problem". It's perfectly OK to stand on other people's
work and still make creative contributions to your field; that's the point of
research.

~~~
mljoe
> I think an important lesson for any grad student is to learn to read
> through the bullshit in papers and try to understand what the authors
> actually did.

We actively work to make our writing hard to understand in this field. I do
it all the time myself. I don't really need a complex-looking equation to make
my point, but if I don't have one in there, a reviewer will think my writing
is not academic enough. So there you have it. Once you go in realizing this is
the case everywhere, academic papers become a lot easier to understand.

~~~
asadlionpk
I get your point, but I wish this weren't the case with most research. Like
the author, I am not a math guy, but I have been reading tons of ML papers
recently. I usually skip the formal-definition parts and get to the 'juicy'
implementation parts.

I wish there were an ELI5 section in each paper.

~~~
skiman10
What have been your favorite papers so far?

~~~
asadlionpk
That's hard to say, as I haven't read too many. The recent DeepMind papers
(the ones about imagination) were good. The papers themselves were pretty
standard, but they came with an explanatory blog post[1], and some videos
covered them too[2][3][4]. This supplementary content is what made them
accessible to me.

[1] [https://deepmind.com/blog/agents-imagine-and-plan/](https://deepmind.com/blog/agents-imagine-and-plan/)

[2] [https://www.youtube.com/watch?v=xp-YOPcjkFw](https://www.youtube.com/watch?v=xp-YOPcjkFw)

[3] [https://www.youtube.com/watch?v=agXIYMCICcc](https://www.youtube.com/watch?v=agXIYMCICcc)

[4] [https://www.youtube.com/watch?v=56GW1IlWgMg](https://www.youtube.com/watch?v=56GW1IlWgMg)

------
afpx
How does one expect to do ‘AI research’ without much of a background in math?
Machine learning is pretty much all math.

Researchers are generally expected to be experts in their field. The people
writing the papers on arxiv likely spent most of their lives learning about
machine learning and mathematics.

Unfortunately, there’s not an easy path to become an expert. One just has to
dig in and learn from the ground up.

Edit: The good news is that it's never been as easy to learn math as it is
now. When I was an undergrad in math, there were almost no resources available
for learning the intuitions behind the math. One just had to keep doing proofs
and exercises and hope that it would 'click' at some point. But sometimes that
wouldn't happen until many years later. Nowadays, one can watch YouTube videos
where experts describe the intuition behind the math. It's awesome.

~~~
joshvm
Absolutely you can do 'AI research' without a degree in maths. Sure, you need
a grounding in linear algebra, stats, probability, and calculus, but not much
more than a CS or physics degree will teach you. That stuff is indeed learned
easily, and it's not what the Reddit user is worried about.

That also ignores applications of machine learning, which is itself a massive
(and lucrative) field. But because it's a trendy field, I think there is an
obsession with people _needing_ to understand everything theoretical that
comes out, for fear of missing the boat.

Some of the really interesting papers that have come out over the last few
years -- for example, artistic style transfer[1] and Faster R-CNN[2] -- have
hardly any maths in them. You can count the equations on one hand in both of
those papers. No doubt the authors know their stuff, but how readable are
those papers compared to, say, a 100-page proof? Which did I learn more from?

They're a combination of two things: an intuitive network architecture and a
clever loss function. The first is a mix of intuition and programming; the
second involves a little maths, but nothing outrageous.

[1] [https://arxiv.org/abs/1508.06576](https://arxiv.org/abs/1508.06576)

[2] [https://arxiv.org/abs/1506.01497](https://arxiv.org/abs/1506.01497)
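
As one illustration of how small the 'clever loss function' part can be, here
is a hedged sketch of the Gram-matrix style loss from [1], reduced to plain
NumPy (the real method sums this over several layers with weights):

    import numpy as np

    def gram(features):
        """features: (channels, positions) activations from one conv layer."""
        return features @ features.T  # channel-by-channel correlations

    def style_loss(f_gen, f_style):
        # Squared difference of Gram matrices, with the paper's normalization.
        c, n = f_gen.shape
        return np.sum((gram(f_gen) - gram(f_style)) ** 2) / (4.0 * c**2 * n**2)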

~~~
afpx
You're right -- I see now that when he says 'AI research' he really means
doing applied ML. And that can certainly be done without knowing everything.

If the goal is to make a lot of money doing applied ML, then become a
consultant and aim to know 10% more than the customers. If the goal is to
create models that are relatively effective, then read tutorials, play with
data, experiment, and iterate. But, if the goal is to create very effective
models and be able to actually explain why they work (which I think is what
many companies want), then one has to understand the math.

That is, showing some interesting relationships, trends, predictions, or
inferences on a data analysis portal or consumer web site is one thing. But,
using ML to dispense medication, regulate a medical device, drive a power
plant, or identify criminal suspects - those may require different skills.

(BTW I don't mean to disparage the middle group, as that's largely what I do.
But, luckily I have people in the latter group who can validate what I'm
doing.)

------
pierre_d528
"“I understood nothing, but it was really fascinating,” he said. So Scholze
worked backward, figuring out what he needed to learn to make sense of the
proof. “To this day, that’s to a large extent how I learn,” he said. “I never
really learned the basic things like linear algebra, actually — I only
assimilated it through learning some other stuff.”"

[https://www.quantamagazine.org/peter-scholze-and-the-future-of-arithmetic-geometry-20160628](https://www.quantamagazine.org/peter-scholze-and-the-future-of-arithmetic-geometry-20160628)

~~~
tzs
I've long wanted a series of interactive math ebooks that work that way. Each
would take one interesting theorem, such as the prime number theorem, and work
backward.

When you start the book, it would present the theorem and its proof at the
level used in a research journal. For each step of the proof, you would have
two options for getting more detail.

The first option would stay at the same level, but be less terse. E.g., if the
proof said something like "A implies B", asking for more detail might change
that to "A implies B by the Soandso theorem". Asking for more detail there
might elaborate on how the Soandso theorem applies to A.

The second expansion option gives you the background to understand what is
going on. In the above example, doing this kind of expansion on the Soandso
theorem would explain that theorem and how to prove it.

Both types of expansion can be applied to the results of either type of
expansion. In particular, you can use the second type to go all the way down
to high school mathematics.

If you started with just high school math, and used one of these books, you
would get the basics...but only those parts of the basics you need to
understand the starting theorem.

Pick a different starting theorem, and you get a different subset of the
basics. It should be possible to pick a set of theorems to treat this way that
together end up covering most of the basics.

That might be a more engaging way to teach mathematics, because you are always
working directly toward some interesting theorem.
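
A minimal sketch of the data structure this describes, with hypothetical
names: each proof step carries the two kinds of expansion, and both kinds
recurse.

    from dataclasses import dataclass, field

    @dataclass
    class ProofStep:
        text: str                               # the step as a journal states it
        elaboration: "ProofStep | None" = None  # option 1: same level, less terse
        background: list["ProofStep"] = field(default_factory=list)  # option 2

    step = ProofStep(
        "A implies B",
        elaboration=ProofStep(
            "A implies B by the Soandso theorem",
            background=[ProofStep("Statement and proof of the Soandso theorem")],
        ),
    )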

~~~
Blazespinnaker
Yes. You, and absolutely everyone else in the world who loves math but didn't
have time to get a PhD and isn't elitist, want this.

Sadly, monetizing this is tricky. It probably has to be an open-source effort,
and it needs some visionary like Wales or Khan, but those are very, very rare.

------
indescions_2017
I think this is a fairly common first-year grad student emotional response.
And quite frankly, it is the job of your mentor and department to ensure you
receive sufficient training for an academic or research career.

Modern AI is evolving rapidly, but there is a foundation upon which everyone
draws. The Sutton and Barto book is one such foundational text.

Find a collaborator in the math department to work with. And participate daily
in the Stack Exchange forums for math and stats, such as Cross Validated.

I can also recommend CASI by Efron and Hastie -- a deep historical treatment
of where we are today in probabilistic inference.

[https://web.stanford.edu/%7Ehastie/CASI/](https://web.stanford.edu/%7Ehastie/CASI/)

------
vermontdevil
The explanation by TillWinter about systematic task assessment is quite
impressive:

[https://www.reddit.com/r/MachineLearning/comments/73n9pm/com...](https://www.reddit.com/r/MachineLearning/comments/73n9pm/comment/dnrsmh9)

------
jokoon
I've always had the same feeling. I'm not bad at math in general (though I'm
not well versed in high-level maths either), but as a developer, trying to
jump into the ML field seems really impossible. You'd think you could teach
yourself ML algorithms, but you ALWAYS end up reading math notation instead of
pseudocode.

To be honest, the 3blue1brown videos seem really wonderful at explaining what
is going on without going too deep, whereas ML lectures seem to be trying to
prove everything, and always teach using math notation.

I guess this is happening because most of ML comes straight out of research,
since it's all so new, so it's being taught mostly by the people who can grok
the math -- mathematicians -- not by programmers. This really shows how much
math should stay math and not leak into fields where practice matters more.
Programming languages and pseudocode exist for a reason. Computers don't talk
math.

So as the years go by, ML will be taught more as a practical subject than a
theoretical one, and things will get better. I think it's just a matter of how
it's taught, because reading code will always make more sense than reading
high-level math. Videos and oral explanations also help a lot.

~~~
DLTarasi
You might appreciate the Deep Learning for Coders course from fast.ai. It's
basically ML as a practice subject rather than a theory one, as you suggested.

I felt similar to you when I first started learning ML, but their code-first
approach really helped it click for me on an intuitive level. Then you can go
back and dig into the maths behind it.

------
ssanders82
Currently getting my master's in AI. I'll be honest: I can understand the
concepts when they are presented to me, but the mathematical proofs are beyond
me. I've learned to be OK with that. There's just not enough time in a
two-year program to teach myself the underlying intricacies of everything I
encounter.

------
punnerud
Sounds like there's a need for an “Imposter's Handbook” for mathematics, just
like the one for computer science:
[https://bigmachine.io/products/the-imposters-handbook](https://bigmachine.io/products/the-imposters-handbook)

------
mi_lk
A little off-topic, but I hope this serves as another reminder that, despite
their huge success/hype these days, deep learning and machine learning are
just more tools for solving problems, and you can't go very far if you treat
them like a magic framework (i.e. import tensorflow as tf) and fail to
understand the underlying principles.

------
imh
When I left school, I found myself in a similar boat, and decided to set a
goal of getting myself the knowledge equivalent of an undergrad degree in
math. I already had a physics degree under my belt, so it wasn't as long of a
path as it might be for others, but over a bunch of years of self study it
paid off. When it comes to your career, a few years is nothing. The strong
foundation pays regular dividends because learning things that build on it
comes so much more quickly.

It's a huge, slow, painful investment, but totally doable, and with tremendous
ROI if you want to work with stats/ML/optimization/really any numerical
computing for a living.

The reason I recommend this route is that most of the more advanced math books
you will encounter assume this stuff as the reader's common knowledge. With
that foundation, the majority of the literature is already tailored to you!

------
Blazespinnaker
A technique I have found works well is to read _a lot_ of papers and their
citations, but not to dive deep. Each paper usually provides some
easy-to-grasp insight (far too little per paper, but that is elitism for you)
that you can use to build a good picture of the field. Reread papers to grasp
more insight. Once you have a good overall picture, find some area or problem
that really interests you, whose math you like, and that isn't covered well,
and bone up on the math techniques. Do your research and present.

Showing up at thesis defenses is good too. You learn a lot from the
back-and-forth with advisors.

The key is to understand at a high level why the different math techniques are
being used, without actually understanding all the details. This won't be
sufficient for your own work, but at least you'll have a good idea of how your
part fits into the scheme of things.

------
cosmic_ape
That person approaches the issue as a personal problem. However, there are
likely many other students around him with the same problem; lately it is a
problem of the whole field.

One solution would be simply to arrange a local seminar and understand a
couple of papers _in full detail_. It would help to invite a couple of
mathematically aware students from mathematics, physics, or the part of the CS
faculty where they prove things. They would be able to explain and answer
questions immediately, which is far more effective than reading whole books or
taking courses. Those can be consulted for details later.

If the papers for the seminar are deep learning papers, part of the outcome of
the seminar is likely to be an understanding that the authors of these papers
do not necessarily understand the mathematics themselves.

------
Blazespinnaker
This all brings up a good point. Why not create a publicly accessible 'math
tree' that people can use to learn about any kind of math? If there is a
symbol or step they don't understand, they should be able to follow it all the
way down to basic counting.

------
vladislav
Based on your post, I highly recommend taking a year or two off to focus on
math only. You can get the equivalent of a bachelors and masters in pure math
in just two years (if that's all you're doing with your time), and it would be
enough to fix all the issues you're experiencing. Just take the pure math
courses instead of computational math, as abstract and difficult as possible;
it will generalize much better :).

I got into math for exactly the same reasons while doing research in computer
vision as an undergrad, and taking the requisite time off to learn advanced
math (actually going overboard on it) has been an incredible boon to my AI
career.

------
agentofoblivion
I think the real issue is that the OP has a mistaken expectation that they
should understand everything. For instance, the group that wrote the
Wasserstein GAN paper are surely people who think night and day about distance
metrics. And they might be totally lost reading a paper about some
energy-based method that relies on concepts from physics.

The point is that researchers have their little niche and try to make
contributions in areas adjacent to it. It's unrealistic to think everyone
publishing papers understands all the other papers, particularly in such a
cross-disciplinary field as ML. There's also a big gap between a researcher
deep into their career and a student fresh out of a masters program.

It's also hard to transition from someone who's used to reading and
understanding textbooks to someone who's often reading really technical
research and understanding very little of it at first. You just have to push
through and have confidence that you'll eventually learn enough to make a
contribution. That's what it means to "become an expert" -- you start off not
being an expert and then beat your head against the wall for a few years until
you bootstrap your way out of it. And if you want to do it in a reasonable
amount of time, you should probably choose something you already have some of
the fundamentals for.

------
nonbel
From one of the comments:

>"Professional heavy math people are those who said in the 60s that the
perceptron's limitations proved all AI was impossible. And in the 90s that one
hidden layer was all you needed, deep learning was useless."

Can anyone provide citations for this? I was aware of the latter but not the
former. You can find people still repeating the one-hidden-layer claim on
Stack Exchange from just a few years ago.

~~~
mannykannot
I wonder if the author is referring to the oft-repeated claim that Minsky and
Papert's proof that a perceptron cannot learn the XOR function had a chilling
effect on research into neural networks generally, even though Minsky and
Papert themselves had shown that multi-layer networks were capable of doing so
[1][2].

I realize that even this alleged misunderstanding is not the same as a claim
that AI is impossible. The closest attempt at a mathematical proof of the
impossibility of AI that I am aware of is the Lucas-Penrose argument from
Gödel's first incompleteness theorem [3].

[1]
[https://en.wikipedia.org/wiki/Perceptrons_(book)](https://en.wikipedia.org/wiki/Perceptrons_\(book\))

[2] Minsky M. L. and Papert S. A. 1969. Perceptrons. Cambridge, MA: MIT Press.

[3] [http://www.iep.utm.edu/lp-argue/](http://www.iep.utm.edu/lp-argue/)

~~~
nonbel
Thanks, the first ref looks like it may be the one.

------
internetman55
Do you really not need familiarity with the relevant math to be admitted to AI
doctoral programs? I wouldn't have thought that was the case.

------
akhilcacharya
How does one get into an ML PhD like this? I was under the impression it was
impossible if you’re not a math double major.

~~~
jefft255
I am a math double major doing a research master's in ML/CV right now, and I
know plenty of pure CS majors who are doing just fine. The math that 95% of ML
scientists use is not that hard to grasp. Sure, when they encounter functional
analysis they start to cry inside a little, but that doesn't happen very
often.

~~~
akhilcacharya
Interesting. I’m interested in a related MS/PhD, have a similar math
background to the OP's, and assumed I was disqualified.

------
rdlecler1
As Alan Kay noted, the right point of view can add 80 IQ points. I was in a
quantitatively heavy field and always felt outclassed by those with strong
physics and maths backgrounds. Nevertheless, I published two papers in Nature
journals and overturned about 10 years of high-profile research, not because I
was smarter, but because I spent more time trying to find the right
perspective, and when I found anomalies, instead of brushing over them,
confident in my own intelligence, I drilled down until I found the root of the
problem -- something that everyone else had overlooked. You don’t need to be a
classical genius to make a contribution, but you probably do have to be
tenacious.

------
mathgenius
One thing that hasn't been mentioned: learning mathematics by talking to
another human can be 10-100 times faster than getting it from books. Another
thing: mathematics is huge and seems to accommodate all personality types.
Pick something that turns you on and grow outwards from there. The folks on
reddit seem to be obsessed with Rudin, and that's good stuff, but there are so
many other roads to follow.

And I'm so impressed by how much better the comments are here than on reddit!
Good job HN, you rock.

------
arkh
> the “utility density” of reading those 1000-page textbooks is very low. A
> lot of pages are not relevant, but I don’t have an efficient way to sift
> them out. I understand that some knowledge might be useful some day, but the
> reward is too sparse to justify my attention budget. The vicious cycle kicks
> in again.

That is their main problem. All those 'useless' pages are exactly what becomes
useful later.

And we find the same kind of attitude everywhere in tech: why read a full RFC
when you can assume shit and get by with a two-paragraph tutorial?

------
graycat
I'll try to reply to the frustrations of the author of the OP.

I'll give some high level views and also outline the math topics mentioned,
high school through much of a Ph.D. in parts of applied math.

I respond in three parts:

Part I

I have a good pure/applied math Ph.D. and work in applied math and computing;
while I call my work applied math and not artificial intelligence (AI),
machine learning (ML), or computer science, it appears from the OP that there
is significant overlap between my background and work and what the OP is
concerned about.

The Reddit post by the guy in Germany was terrific, although easy to parody
as a big feature of old German _culture_! :-) That post is maybe a bit
over-organized.

I've posted too often that the best future for computer science is
pure/applied math, e.g., that someone seriously interested in the future of
computing should, as an undergraduate, just major in pure math with some
applied math and essentially f'get about anything specifically about computer
science.

Or, for the _essential_ computer science: write some code in some common
procedural programming language for some simple exercises; check out from the
library D. Knuth's _The Art of Computer Programming, Volume 3, Sorting and
Searching_; learn about big-O notation, the heap data structure, and heap
sort; as an exercise, program, say, a priority queue based on the heap data
structure (see the sketch just below); learn about the Gleason bound and how
heap sort achieves it, and so is in an important sense the fastest possible
sort algorithm; as a side exercise look at AVL trees; and call computer
science done for an undergraduate! This is partly a joke, but not entirely.
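
A minimal sketch of that priority-queue exercise, leaning on Python's standard
heapq module rather than a hand-rolled heap (writing the sift operations
yourself is the real exercise):

    import heapq

    class PriorityQueue:
        """A priority queue on top of a binary heap."""

        def __init__(self):
            self._heap = []

        def push(self, priority, item):
            heapq.heappush(self._heap, (priority, item))  # O(log n) sift-up

        def pop(self):
            return heapq.heappop(self._heap)[1]           # O(log n) sift-down

    pq = PriorityQueue()
    for p, job in [(3, "low"), (1, "urgent"), (2, "normal")]:
        pq.push(p, job)
    print(pq.pop())  # 'urgent'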

Well, it appears that the OP has started to discover some of why I've said
such things about math.

This role for math is just a special case of the old, standard situation
that, in nearly all fields, the best work _mathematizes_ the field, as in,
e.g., mathematical physics. Indeed, there is the old joke that a good
treatment of the math needed for theoretical physics is so much about just the
math that it can do the physics in the footnotes.

It is a standard situation that nearly everyone in the STEM fields is
convinced that they need to know more math. As I read papers by computer
science professors, I tend to agree!

Here I'll try to help the person with their lament in the OP:

(1) Start at about 50,000 feet up and begin to identify in what fields, and
broadly on what problems, you want to work. Remember: one of the keys to
success is good, early work on problem selection.

(2) If you want to work in AI, I suggest you try to regard the current
headline topics in AI/ML as nearly irrelevant. Sure, for political reasons,
you might have to keep up, and maybe there are some current, hot applications,
but I don't see that work -- first-cut, ballpark, basically empirical curve
fitting to huge quantities of data -- as much of a start on AI.

E.g., at one time I was hired to work on an AI project, specifically expert
systems. My first reaction was that expert systems -- rule-based programming,
working memory and the RETE algorithm, rule-firing conflict resolution -- were
junk and, in particular, nothing like a good start on AI, programming style,
or anything else. After 20 years my opinion has changed: expert systems were
worse than junk as a style of programming, for much of anything, and in
particular for anything like AI.

Maybe I got some revenge: we were trying to use expert systems for the
monitoring and management of large server farms and networks. For the
monitoring -- essentially for _health and wellness_ -- they were using
essentially just thresholds, set by hand, on single variables, one at a time.
My view was: wrap that data and processing in all the rules possible, and
still the results won't be very good.

Sure, for monitoring there are two ways to be wrong: (A) a false alarm and
(B) a missed detection. So, right, we're forced into statistical hypothesis
testing with (i) a _null hypothesis_ that the system is healthy, (ii) false
alarms as Type I errors, and (iii) missed detections as Type II errors. I
quickly decided that we needed statistical hypothesis tests, with the null
hypothesis that the system is healthy, that are both multidimensional (treat
several variables jointly) and distribution-free (make no assumptions about
probability distributions), and found that apparently there were none in the
literature. So, I invented a large class of such tests that, really, totally
knocked the socks off expert systems for much of monitoring. And what I did
was just some applied math and applied probability -- some group theory, some
probability theory based on measure theory, some measure-preserving
transformations as in ergodic theory, etc.
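
Those specific tests aren't spelled out here, but a toy sketch of the
distribution-free calibration idea is easy (the numbers and the deliberately
simple score below are made up; this is not the actual invented class of
tests): set the alarm threshold as an empirical quantile of a joint score over
known-healthy data, so the false-alarm rate is fixed in advance with no
distributional assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    healthy = rng.standard_normal((10_000, 5))  # historical healthy readings

    def score(x, baseline):
        # Any joint score over several variables works; distance from the
        # healthy mean is the simplest multidimensional choice.
        return np.linalg.norm(x - baseline.mean(axis=0), axis=-1)

    alpha = 0.01  # target false-alarm (Type I error) rate
    threshold = np.quantile(score(healthy, healthy), 1 - alpha)

    sick_reading = rng.standard_normal(5) + 3.0
    print(score(sick_reading, healthy) > threshold)  # True: alarm fires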

So, yes, I could have said that my monitoring took in _training_ data, did
some _machine learning_, was better at its monitoring than humans and so was
some _artificial intelligence_, was _computer science_, etc., but I didn't: I
just called my work applied math. E.g., I got real hypothesis tests where the
false-alarm rate was adjustable and known in advance, plus some useful
best-possible results on detection rate. So, as I wrote in the paper: take
away the mathematical assumptions, derivations, theorems, and proofs, forget
about false-alarm rate and detection rate, just do the specified data
manipulations, and call the work AI.

So, such real problems and applied work are one approach to computer science,
AI, ML, etc. Then, sure, for work like what I did for detection, you really
need the PhD coursework in pure and applied math and some ability to do
publishable theorems and proofs in math -- for both, net, you need a
pure/applied math PhD.

So, back to AI, mostly setting aside the current headlines: for AI, I'd
suggest that from 50,000 feet up you start by watching

[https://www.youtube.com/watch?v=vc8MddDFRw4](https://www.youtube.com/watch?v=vc8MddDFRw4)

with a mommy kitty cat (a domestic short-hair tabby) and six or so of her
kittens, maybe only two days old, and watch how they learn. Their learning is
astoundingly fast. E.g., for how they learn to use their hind legs, you can
see fairly directly some of just how they do it. In just that one video clip,
covering in real time likely just a week or so, those kittens go from nearly
helpless balls of fur to young kitty cats. It's easy to guess that in two
months they will be safely 40 feet up a tree catching something or other,
effortlessly performing gymnastic feats that would shame Olympic athletes,
etc. Astounding.

Okay, for AI from 50,000 feet up, start by trying to guess how those kittens
learn. Maybe there will be some math in that, maybe not. Then, if you can
implement your guess in software, if the software appears from good tests in
practice to learn some things well, and if your guess is fairly general, then
maybe you have some progress on AI. Here, f'get about my view and confirm with
some AI profs who would have to review your work, hire you for a research
slot, give you a research grant, etc.

So, for some research, you could (i) do some math as I did for that
monitoring, or (ii) try to do some real AI, e.g., starting by watching those
kittens learn, where maybe, eventually, there will be some math.

For both of those research directions, notice that you need (i) some overview
of the real problem, (ii) some intuitive insights, and (iii) some good, new
ideas. Maybe the ideas will be mathematical or use math, and maybe not. E.g.,
you might go a long way on what those kittens are doing before using much
math.

(3) But for the math mentioned by the OP, I'll try to give an outline:

(i) Start with the real numbers, e.g., as likely learned well enough by the
9th grade. Then, sure, learn about the complex numbers. So, net, have a high
school major in math, i.e., everything short of calculus.

(ii) Learn college calculus well. At least in part, you can do well alone:
due to some circumstances beyond my control, for freshman calculus I just got
a good book, studied, worked the exercises, and learned. Then, at a college
with a good math department, I started on sophomore calculus. You can do such
stuff yourself.

(iii) It would be good to take a course in abstract algebra, especially one
where nearly all the exercises are proofs. So, learn about sets, functions,
groups, rings, fields, more about the real and complex numbers, maybe some
basic number theory, the greatest common divisor and least common multiple
algorithms, and something about vector spaces. The course might touch on
cryptography and coding theory.

Really, the more important, maybe the main, value of the course is just
learning how to write proofs. The math there is nearly all so childishly
simple that it's easy to learn to write very precise proofs -- crucial stuff
if you later want to publish theorems and proofs.

Blunt fact of life, politically incorrect observation: without such training
in writing theorems and proofs -- and, really, just in math notation and how
to do math derivations -- it's tough ever to learn how. So, you can find
chaired professors of computer science in top departments at top US research
universities who nevertheless fumble terribly with just standard math notation
and, especially, with how to write theorems and proofs.

~~~
graycat
Part II

Super-simple view: in math, there are sets with elements. That's the logical
foundation of essentially all of current pure/applied math. The details are in
Zermelo-Fraenkel axiomatic set theory assuming the axiom of choice. E.g., from
sets one can construct a set that looks like the real numbers we knew about in
the 9th grade. Soon one defines ordered pairs and, then, functions. After
that, a huge fraction of everything is functions. The proofs -- as actually
written, but without the crucial intuitive ideas that permitted finding the
proof -- are all essentially just symbol substitution, as in basic logic and
Whitehead and Russell. Warning: any math you write for publication should be
easily translated back to just sets and symbol substitution; that and nothing
else is the criterion. If you also want the proof to be readable by humans,
there is more: e.g., in the proof you might mention, one by one, each theorem
assumption and where it gets used.

(iv) Learn linear algebra. Really, the subject grew out of Gauss elimination
for solving systems of linear equations -- with some additional attention to
numerical stability (e.g., partial pivoting, double-precision inner-product
accumulation, and iterative improvement) -- and that's nearly always still the
way to do it. It's fun to program a good Gauss elimination routine, e.g., just
in C.
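
Here is a sketch of that routine -- in Python rather than C, and leaving out
the double-precision accumulation and iterative improvement mentioned above --
just to show where the partial pivoting goes:

    import numpy as np

    def gauss_solve(A, b):
        """Solve A x = b by Gauss elimination with partial pivoting."""
        A, b, n = A.astype(float).copy(), b.astype(float).copy(), len(b)
        for k in range(n - 1):
            p = k + np.argmax(np.abs(A[k:, k]))  # pivot: largest |entry| below
            A[[k, p]] = A[[p, k]]                # swap rows k and p
            b[[k, p]] = b[[p, k]]
            for i in range(k + 1, n):
                m = A[i, k] / A[k, k]
                A[i, k:] -= m * A[k, k:]
                b[i] -= m * b[k]
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):           # back substitution
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    print(gauss_solve(np.array([[2., 1.], [1., 3.]]), np.array([3., 5.])))
    # [0.8 1.4]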

There you see clearly that any such system of equations has no solution,
exactly one, or infinitely many. Later you will discover that the set of all
the solutions is an affine subspace, that is, a vector space translated by
some single vector; that is, a plane that need not pass through the origin.
And you will discover that the left side with the unknowns is a linear
function -- big-time stuff.

So, that start was a small thing for a huge future.

Next up, we consider n-tuples of real numbers (or complex -- here and always
in linear algebra). Then we see how to make the n-tuples a vector space. They
are the most important of the vector spaces.

But you should also see the abstract definition of a vector space (one with
no mention of n-tuples), because (a) the n-tuples are the most important
example, (b) even when working with just n-tuples you often need the more
abstract definition (especially for subspaces, which are often the real
interest, e.g., hyperplanes in curve fitting and multivariate statistics), and
(c) there are other important vector spaces that are not just n-tuples (in
signal processing, sets of random variables, wave functions in quantum
mechanics, solutions to some differential equations, and much more).

So, learn about linear independence and bases (essentially coordinate
systems).

Learn about inner products, distance, angles, and orthogonality. See
generalizations of cosines, e.g., the Schwarz inequality, and the Pythagorean
theorem.

Learn about eigenvalues and eigenvectors -- those eigenvectors are often the
most important vectors in applications, e.g., your favorite coordinate axes.

Then, for the crown jewel, learn about the polar decomposition and, thus,
singular values, principal components (e.g., data compression), the core of
the normal equations in statistics, etc.
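
A short sketch of those singular values in action on made-up data, assuming
NumPy: the SVD gives the principal components, and keeping only the leading
one is the data-compression use mentioned above.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
    X -= X.mean(axis=0)                   # center the data

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    print(s)      # singular values, largest first
    print(Vt[0])  # first principal direction: the "favorite coordinate axis"

    X1 = s[0] * np.outer(U[:, 0], Vt[0])  # best rank-1 approximation of X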

There is a remark in G. Simmons that the two pillars of mathematical analysis
are linearity and continuity. _Superposition_ in physics is essentially
linearity. In applied math, linearity is the main tool, the key to the land of
milk and honey. Well, a good course in linear algebra is a good start on
linearity. In particular, those linear equations solved with Gauss elimination
are _linear_ as in the linear transformations of, right, linear algebra.

Then, sure, that version of linearity takes one through much of all of applied
math, e.g., Fourier analysis, the fast Fourier transform, X-ray diffraction,
Banach space, Hilbert space, the classic Dunford and Schwartz, _Linear
Operators_ , etc.

So, a Banach space is just a vector space where the scalars are the real or
complex numbers, there is a _norm_, that is, a definition of distance, and the
space is _complete_ in that norm. _Complete_ is what the rational numbers are
not but the real numbers are. Or, _complete_ means that a sequence that
appears to converge -- that is, converges in the weaker sense of Cauchy --
actually has something to converge to and does converge. E.g., in the
rationals, the successive decimal approximations of pi have nothing to
converge to, but in the reals they do.

In theorem proving, it's nice to have completeness. But, sure, computing knows
next to nothing about completeness, because we compute essentially only with
rational numbers. So you can do a lot of work without completeness. Indeed, in
applications, often we are just approximating, and the rationals can get as
close as we please to pi!

Banach spaces are not trivial or useless: E.g., based on the Hahn-Banach
theorem, there is the grand applied math dessert buffet

David G. Luenberger, _Optimization by Vector Space Methods_ , John Wiley and
Sons.

A Hilbert space is a Banach space where the norm comes from an inner product.

E.g., the set of all real-valued random variables X such that E[X^2] is finite
forms a Hilbert space. That this space of random variables is complete is
nearly mind-blowing; actually, the proof is short.

(v) There remains _Baby Rudin, Principles of Mathematical Analysis_.

There you see calculus done with essentially full care, as theorems and
proofs. So, again, you get lessons in how to write theorems and proofs on the
way to becoming a good mathematician.

The main content of the book is just showing that a real-valued continuous
function defined on a closed interval of the reals has a Riemann integral.
The key is that that interval is _compact_. So, learn about compactness, which
is of quite general usefulness.

Then, with compactness and continuity, you have uniform continuity. Now the
doors to grandeur start to open: that the Riemann integral works becomes a
short proof. And, later in the book, you get the _three epsilon_ proof that
the uniform limit of continuous functions is a continuous function (that was a
question on one of my Ph.D. qualifying exams -- thanks to baby Rudin, I got
it!).

Compactness is so powerful that it is nearly the same as just finiteness --
there's a famous, old paper on that.

Well, for a positive integer n and the set of real numbers R, a set in R^n is
compact if and only if it is closed and bounded. Now we are cleaning up the
Riemann integral and a lot of associated stuff.

At the back, Rudin gives a nice, short definition of a set of real numbers
that has _measure zero_ (without really getting deep into measure theory) and
then shows that the Riemann integral exists if and only if the function is
continuous everywhere except on a set of measure zero. Nice. Now an exercise
is to find a function that is differentiable but whose derivative is not
Riemann integrable. You might look in Gelbaum and Olmsted, _Counterexamples
in Analysis_.

Also the later editions of baby Rudin cover the exterior algebra of
differential forms. That material is of interest in differential geometry,
some applications, and general relativity.

That's an overview of what baby Rudin is about.

Go through that book carefully and you will come out with (a) some good
knowledge of the "principles" of the analysis part of pure math and (b) much
better skills at doing math derivations, definitions, theorems, and proofs. If
you want to write and publish new proofs for, say, AI, baby Rudin is one of
your best mentors -- maybe your fairy godmother?

For being an applied mathematician -- sure, you already guessed, and you were
correct, that essentially always in practice the Riemann integral exists; so
why sweat the details? Okay, then just use baby Rudin to learn about
compactness, continuity, uniform continuity, measure zero, and more about how
to write proofs. And, really, focusing just on compactness, continuity, etc.,
you can pull that off in a few nice weekends, maybe just one. Then look at
that result near the back, that the uniform limit of continuous functions is
continuous, to see how to do such work.

Sure, Rudin discusses compactness, etc. on metric spaces. Well, easily enough,
the set of real numbers R is also a metric space! And for positive integer n,
so is R^n.

So, why say metric space instead of, say, just R^n? Well, first, the theory
is _cleaner_ because a metric space has so many fewer assumptions than R^n, so
you can see more clearly just which assumptions make the results true. Second,
maybe some fine day all you will have is just a metric space, and then you can
still use the results -- though don't hold your breath waiting for a
significant application, either pure or applied, with a non-trivial metric
space that isn't also much more! Or: proving the stuff in a metric space is no
more difficult, is more general, and maybe -- actually -- is more useful.

Or maybe math had the results in R^n first and then invented the metric space
just to have a structure with just enough assumptions to make the results
true! That is, the metric space was invented to have the fewest assumptions
needed for proving the results; what came first were the results, and the
metric space definition came later. Maybe!

~~~
graycat
Part III

(vi) Continuing on, there is the subject of measure theory. That was from H.
Lebesgue, a student of E. Borel, right, in France near 1900.

They were correct: they improved on the Riemann integral. Don't worry:
whenever the Riemann integral exists, the integral from Lebesgue's measure
theory gives the same numerical answer.

So, why a new (way of defining the) integral? Two biggie reasons:

(a) For a lot of theorem proving about integrals, e.g., differentiation under
the integral sign, clean treatment of what physics needed from the Dirac delta
function (right, there can be no such function, but measure theory has a good
answer), definition of an integral of several variables, interchange of order
of integration in iterated integrals, tying off some old, loose ends in
Fourier theory, the deep connections between integration and linear operators,
and more, Lebesgue's work is crucial and terrific stuff.

(b) Lebesgue's integral is much more general than the Riemann integral, and
that generality is crucial, especially as the foundation for probability
theory.

Now, for what Lebesgue did:

First, he developed _measure_ theory. That's essentially just a grown-up
theory of area like the one you have known about since grade school. E.g.,
given a set -- in the real numbers, in R^n, something more complicated, or
something fully abstract -- the _measure_ of that set is essentially just its
area. With the generalization, sure, some sets can have measure infinity,
negative measure, complex-valued measure, etc. But on the real line, with the
usual measure, _Lebesgue_ measure, the measure of an interval is just its
length, and you already know about that.

But Lebesgue measure is darned general: it's tricky to show that there even
is a set of reals that is not Lebesgue measurable, and the usual proof uses
the axiom of choice.

So, for measure theory, there is a _measure space_ with three things: a
_space_, just some non-empty set, say M; a collection of _measurable_ sets,
subsets of M, called, say, S; and a measure m. The collection S of measurable
sets satisfies the simple, essentially obvious axioms we would want for area
and, thus, is a _sigma algebra_ of sets. Then, for each set A in S, its
_measure_ is the real (or complex) number m(A). We ask that m have the
properties we want of a good theory of area. It's just a grown-up version of
area. The theory is not trivial; it was tricky to get all the details just
right so that area works the way we want it to.

E.g., for the space, take the set of real numbers R. For the measurable sets,
take the intervals and then all the other sets needed to have a sigma algebra
(a short proof shows that this definition is well defined). Then the Lebesgue
measure of an interval is just its length, and the measures of all the other
measurable sets are what they then have to be (you need some theorems and
proofs here).

So, for probability theory, a _probability_ space is just a measure space
(each point in that space is one experimental _trial_); a _probability_, P,
is just a positive measure with maximum value 1; an _event_ A is just one of
the measurable sets; and the _probability_ of A is just its measure P(A). What
we want for probability is already so close to a theory of area that, really,
we have little choice but to follow what Lebesgue did. That's what A.
Kolmogorov observed in his 1933 monograph.

Second, with the foundation of measure theory, Lebesgue defined the Lebesgue
integral.

So, what is being integrated is (usually) a function taking real or complex
values, and the domain of the function is a measure space.

Then, for the integral -- say, in the case of a real-valued function -- we
partition on the Y axis, that is, in the range of the function instead of in
its domain. So we don't have to do Riemann-like partitions of the domain of
the function, and thus the domain can be much more general.

As the first step, we integrate only functions that are >= 0, and we do that
by approximating, right, again with essentially rectangles, only from below
the function, not both above and below as for Riemann. Here the domain of the
function can be the whole space, e.g., the whole real line. We don't care
about either continuity or compactness.

For a function that is both positive and negative, we multiply the negative
part by -1 and integrate the two parts separately. If at least one of the two
results is not infinity, then we subtract, and that's the integral.

A _random variable_ is just such a function, and its _expectation_ is just its
integral.
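
A numerical cartoon of that idea, assuming nothing beyond NumPy (the function
and grid here are made up): partition the range into levels and sum (level
spacing) times (measure of the set where the function exceeds the level),
rather than partitioning the domain as Riemann does.

    import numpy as np

    f = lambda x: x ** 2               # a function >= 0 on [0, 1]
    x = np.linspace(0, 1, 100_001)     # sample points standing in for the domain
    values = f(x)

    levels = np.linspace(0, values.max(), 2_000)
    dt = levels[1] - levels[0]
    # measure{x : f(x) > t}, approximated by the fraction of sample points
    layers = np.array([(values > t).mean() for t in levels])
    print(layers.sum() * dt)           # ~0.333, agreeing with Riemann's 1/3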

Summary

Don't feel like the Lone Ranger; not everyone knows this stuff. E.g., from
all I've been able to see of quantum mechanics, the wave functions are
differentiable. Then I'm told that the wave functions, wondrously, are also
continuous. From baby Rudin, of course they are continuous: every
differentiable function is continuous! Then I'm told that the wave functions
form a Hilbert space. Well, I can see that they can be points in a Hilbert
space, but they can't form a Hilbert space by themselves, because the
continuous functions won't be complete.

In elementary probability, it is common, e.g., for finding the expectation of
a Gaussian random variable, to integrate over the whole real line. Tilt: baby
Rudin defines the Riemann integral only over compact sets. So you can use an
improper integral -- but then people want to differentiate under the integral
sign. Tilt again: the needed theorems are not so easy to find. So, really,
this common business in probability, physics, etc. of integrating over the
whole real line, or over all of R^n, is using the Lebesgue theory, where there
are clean theorems for such things.

That's your overview and your work for the weekend. You are now permitted two
beers and half a pizza, but only if you have someone you like a lot for the
rest of the pizza and some more beer. You can substitute Chianti for the beer.
But no more math; maybe a movie, but not math!

------
tabtab
Perhaps you can focus on improving the usability of AI toolsets for a wider
market rather than on finding the Next Big Magic Equation. Example:
[https://github.com/RowColz/AI](https://github.com/RowColz/AI). An AI expert
may do the initial setup, but factor tables allow more "typical" office
workers to tune and prune the results.

------
archagon
I like the comment that likens this sort of deeply linked knowledge to a DAG.
In my own (limited) experience, once I’ve mentally found the DAG in which
every node references either some other node or baseline knowledge, the
learning task almost immediately switches from daunting to routine. Just work
on understanding each node in the dependency chain until you get to the one
you seek!
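
That strategy is literally a topological sort of the knowledge DAG. A tiny
sketch, with made-up topic names, assuming Python 3.9+ for graphlib:

    from graphlib import TopologicalSorter

    prereqs = {                         # node -> the nodes it depends on
        "probability": {"measure theory"},
        "measure theory": {"real analysis"},
        "real analysis": {"calculus"},
        "calculus": set(),              # baseline knowledge
    }
    print(list(TopologicalSorter(prereqs).static_order()))
    # ['calculus', 'real analysis', 'measure theory', 'probability']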

------
ohazi
From TillWinter's response:

> Also: doing the master is to understand that you don't know anything, and
> doing your doctorate is to learn the others know nothing as well.

I had always heard variations on the first part -- that going to a good school
was supposed to humble you by showing you how much you don't actually know.

I had never heard the second part. That's great.

------
togelius
I am an AI researcher and faculty member at a large and famous university. I
probably know less math than that Reddit poster. Math is important if you are
specifically interested in the math of AI. If you are interested in inventing
algorithms and solutions, you mostly don't need the math.

------
hackernewsacct
Starting from pre-calculus what areas of mathematics (with book
recommendations) should one study rigorously to have the foundations to pursue
a PhD in Machine Learning?

------
amai
Remember:

"Complex models are rarely useful (unless for those writing their
dissertations)." (V.I.Arnold)

------
amelius
Perhaps the author should write a ML tool to help sift through all the
material ;)

------
tzahola
I think he should just go on with his research and not bother with
understanding every obscure reference in papers. One can grasp the core ideas
surprisingly well even when skipping over proofs and formulas.

And when the time comes to write his own papers, he should remember to
intentionally make them harder to read for outsiders. E.g., instead of writing
“I calculated the total error by summing the per-neuron errors”, one should
write “the loss function utilized an integral over the output lattice using a
discretized method by Newton et al.”, or some other bullshit.

~~~
gfodor
As an amateur who has jumped in and out of learning basic ML over the years,
I've found it interesting to watch the web of terminology expand to the point
where your post is no longer satire. Writing a dictionary, or annotating
papers to decipher ML-speak into basic-math speak, would be a pretty
worthwhile endeavor for someone (I see glimmers of this in the work being done
by the folks at fast.ai), and it would generally not remove much real
information.

------
miloshadzic
The most-upvoted reply is excellent.

~~~
aashu_dwivedi
You should link to the reply directly, because the most-upvoted reply might
change.

~~~
gbuk2013
[https://www.reddit.com/r/MachineLearning/comments/73n9pm/d_c...](https://www.reddit.com/r/MachineLearning/comments/73n9pm/d_confession_as_an_ai_researcher_seeking_advice/)

------
ryanx435
The depth and illegibility of the field he is describing make me believe that
we are much further away from general-purpose AI than I previously thought.
