Not sure about these books as a self-study curriculum — their unifying theme seems to be that they require a reasonable level of mathematical maturity going in. But, they absolutely comprise an excellent “greatest hits” list of math books in the most influential subdisciplines. You’re guaranteed to learn a tonne if you study any one of these books.
I don't buy the narrative that the article is promoting.
I think the machine learning community had largely gotten over overfitophobia by 2019, and people were routinely using overparametrized models capable of interpolating their training data while still generalizing well.
The Belkin et al. paper wasn't heresy. The authors were making a technical point - that certain theories of generalization are incompatible with this interpolation phenomenon.
The lottery ticket hypothesis paper's demonstration of the ubiquity of "winning tickets" - sparse parameter configurations that generalize - is striking, but these "winning tickets" aren't the solutions found by stochastic gradient descent (SGD) algorithms in practice. In the interpolating regime, the minima found by SGD are simple in a different sense perhaps more closely related to generalization. In the case of logistic regression, they are maximum margin classifiers; see https://arxiv.org/pdf/1710.10345.
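A toy sketch of that max-margin claim (my own illustration, not the paper's experiments): on linearly separable data, gradient descent on the logistic loss drives the weight *direction* toward the hard-SVM separator even as the weight norm grows without bound. The dataset and learning rate here are made up; for this symmetric data the max-margin direction can be worked out by hand.

```python
import numpy as np

# Linearly separable toy data, symmetric about the origin (no bias needed).
X = np.array([[1.0, 0.5], [2.0, -1.0], [-1.0, -0.5], [-2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
lr = 0.1
for _ in range(200_000):
    margins = y * (X @ w)
    # Gradient of sum_i log(1 + exp(-margins_i))
    grad = -(y / (1.0 + np.exp(margins))) @ X
    w -= lr * grad

# For this dataset the hard-margin solution is determined by the closest
# pair +-(1, 0.5), so the max-margin direction is (2, 1)/sqrt(5).
w_dir = w / np.linalg.norm(w)
svm_dir = np.array([2.0, 1.0]) / np.sqrt(5.0)
print(w_dir, float(w_dir @ svm_dir))
```

The norm of `w` keeps growing (the logistic loss on separable data has no finite minimizer), but the normalized direction aligns ever more closely with the max-margin separator, which is the sense in which the minima found are "simple".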
The article points out some cool papers, but the narrative of plucky researchers bucking orthodoxy in 2019 doesn't track for me.
Yeah this article gets a whole bunch of history wrong.
Back in the 2000s, the reason nobody was pursuing neural nets was simply compute power: you couldn't iterate fast enough to make smaller neural networks work.
People had been doing genetic algorithms and particle swarm optimization (PSO) for quite some time. Everyone knew that high dimensionality was the solution to overfitting - the more directions you can use to climb out of valleys, the better the system performed.
To the left of the "detailed spaceship" I think I see a distortion pattern reminiscent of a cloaked Klingon bird of prey moving to the right. Or I'm just hallucinating patterns in nebular noise.
Two schools of thought here. One posits that models need to have a strict "symbolic" representation of the world explicitly built in by their designers before they will be able to approach human levels of ability, adaptability and reliability. The other thinks that models approaching human levels of ability, adaptability, and reliability will constitute evidence for the emergence of strict "symbolic" representations.
TLDR: Browser vendors made Shadow DOM for themselves.
Browser implementors use Shadow DOM extensively under the hood for built-in HTML elements with internal structure, like range inputs and audio and video controls. These elements absolutely need to work everywhere and be consistent, so extreme encapsulation and a fixed API for styling them are an absolute must.
The Shadow DOM API is the browsers exposing to developers a foundational piece of their own functionality.
If you’re thinking about whether Shadow DOM is appropriate for your use case, consider how and why the vendors use it: when an element’s API needs to be totally locked down to guarantee it works in contexts they have no control over. Conversely, if your potential use case is scoped to a single project, the encapsulation imposed (necessarily!) by Shadow DOM is probably overkill.
Web components are a decent way to make reusable UI, but if they don’t have strong encapsulation needs, you might avoid Shadow DOM.
The argument of (1) doesn't really have anything to do with humans or anthropomorphising. We're not even discussing AGI; we're just talking about the property of "thinking".
If somebody claims "computers can't do X, hence they can't think", a valid counterargument is "humans can't do X either, but they can think."
It's not important for the rebuttal that we used humans, just that there exist entities that don't have property X but are able to think. This shows X is not required for our definition of "thinking".
Certainly many cultures and religions believe in some flavor of intelligent design, but you could argue that if the natural world (or what we generally regard as "the natural world") is created by the same entity or entities that created humans, that doesn't make humans artificial. Ignoring the metaphysical (souls and such), I'm struggling to think of a culture that believes the origin of humans isn't shared by the world.
In this case, I was thinking of unusual beliefs like aliens creating humans or humans appearing abruptly from an external source such as through panspermia.
Could you give more details about what precisely you mean by interpolation and generalization? The commonplace use of “generalization” in the machine learning textbooks I’ve been studying is model performance (whatever metric is deemed relevant) on new data from the training distribution. In particular, it’s meaningful when you’re modeling p(y|x) and not the generative distribution p(x,y).
It's important to be aware that ML textbooks are conditionalising every term on ML being the domain of study, and that ML, along with all of computer science, is extremely unconcerned with whether the words it borrows retain their meaning.
Generalisation in the popular sense (science, stats, philosophy of science, popsci) is about reliability and validity: validity = does the model track the target properties of the system we expect it to; reliability = does it continue to do so in environments where those properties are present but irrelevant permutations are made.
Interpolation is "curve fitting", which is almost all of ML/AI. The goal of curve fitting is to replace a general model with a summary of the measurement data. This is useful when you have no way of obtaining a model of the data generating process.
What people in ML assume is that there is some true distribution of measurements, and "generalisation" means interpolating the data so that you capture the measurement distribution.
I think it's highly likely there's a profound conceptual mistake in assuming measurements themselves have a true distribution, so even the sense of generalisation meaning "have we interpolated correctly" is, in most cases, meaningless.
Part of the problem is that ML textbooks frame all ML problems with the same set of assumptions (e.g., that there exists an f: X->Y, and that X has a "true distribution" Dx, so that finding f* implies learning Dx). For many datasets, these assumptions are false. Compare running a linear regression on photos of the sky, through stars, to get star signs, vs. running it on V=IR electric circuit data to get `R`.
In the former case, there is no f_star_sign to find and no "true distribution" of star sign measurements, so any model of star signs cannot be a model even of measurements of star signs. ML textbooks do not treat "data" as having these kinds of constraints, or relationships to reality, which breeds pseudoscientific and credulous misunderstandings of issues (such as, indeed, the Othello paper).
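The V=IR half of that contrast can be made concrete. A minimal sketch with made-up numbers (the resistance and noise level are assumed for the example, not real measurements): least squares on circuit data recovers a genuine parameter of the system because the data-generating process really does have the form V = I*R.

```python
import numpy as np

# Simulated circuit measurements (illustrative, assumed values).
rng = np.random.default_rng(0)
R_true = 47.0                              # ohms (assumed for the example)
I = rng.uniform(0.01, 0.1, 50)             # amps
V = R_true * I + rng.normal(0, 0.05, 50)   # volts, with measurement noise

# Least squares through the origin: R_hat = sum(I*V) / sum(I*I).
R_hat = float((I @ V) / (I @ I))
print(R_hat)  # close to 47.0
```

The same arithmetic run on sky-photo pixels and star signs would also happily return a coefficient; the difference is that here the coefficient estimates a property of the world, and there it estimates nothing.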
I’m reading a winking, ironic acknowledgement from the authors that the mathematical definition of individual utility may not map perfectly onto the psychology of a patron of the arts.