
Machine learning leads mathematicians to unsolvable problem - magoghm
https://www.nature.com/articles/d41586-019-00083-3
======
p1esk
This work was published in “Nature Machine Intelligence”, the journal
boycotted by pretty much everyone who matters in ML.

Reflects poorly on the authors, regardless of the actual merits of their
finding.

~~~
joe_the_user
As far as I can tell, the article isn't really of any interest either.

Formulating a proposition that is independent of the standard axioms is
simple. Formulating it in the language of machine learning is an exercise. The
main thing is that the authors didn't provide any motivation for why this
should matter to the overall enterprise of machine learning, because there
isn't any. It's just a novelty.

~~~
calf
I don't understand your reasoning: you could also "formulate" CH in the
"language" of Turing machines, but that is clearly disanalogous to what the
article is saying.

------
chaboud
One doesn't need to go back to Gödel to make this a "huh" moment. Instead, go
back to 2006 to make it a "duh" moment.

Aggregability is NP-Hard:
[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.6237&rep=rep1&type=pdf](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.6237&rep=rep1&type=pdf)

That is, even for linear systems, determining whether or not macrovariables
(e.g. complete eigenvector sets, complete embeddings) exist for a given space
is an NP-Hard problem. With linear systems serving as, effectively, a lower
bound for ML problems (because, otherwise, why are you even ML'ing the
thing?), whether or not a problem is "learnable" is, unsurprisingly, probably
NP-Hard. Aggregability and learnability look staggeringly similar to me.

Demonstrating otherwise would be a _huge_ result. Writing a paper confirming
Kreinovich and Shpak in a slightly different domain is basically a "Water is
Wet" paper.

Having not yet read the paper, I'm unaware of any new ground here.

~~~
Cacti
NP-hard and undecidability are completely different things. They’re barely on
the same planet.

~~~
miclill
I would argue that for practitioners it is the same planet. But I agree that
they are not the same thing.

~~~
shoo
disagree, some small-size NP-hard problem instances can be solved in practice.
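
For instance (a minimal Python sketch with made-up numbers): subset-sum is
NP-hard in general, but brute force dispatches a 10-element instance in
milliseconds:

    from itertools import combinations

    # Subset-sum is NP-hard in general, but trying all 2**10 subsets
    # of a 10-element instance takes milliseconds.
    nums = [267, 493, 869, 961, 1000, 1153, 1246, 1598, 1766, 1922]
    target = 3883

    def subset_sum(nums, target):
        """Try every subset, smallest first; exponential in len(nums)."""
        for r in range(len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) == target:
                    return combo
        return None

    print(subset_sum(nums, target))  # (961, 1000, 1922)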

or, given enough effort / luck / redefining the problem, sometimes extra
structure can be found in the particular problem instances that you want to
solve, meaning they actually belong to an easier problem class that can be
handled in practice -- e.g. you've got extra information or constraints that
you're not actually using, so you don't need to solve the NP-hard problem in
general, just the specific instances you've got.

i'm willing to change my position if someone can argue it is similar for
undecidable problems.

~~~
pfortuny
Well: the halting problem is undecidable. That does not prevent us from
trying to prove (and succeeding, many many times) that a particular algorithm
does or does not stop.

I guess the parent meant something like this.

Of course the concepts are totally dissimilar, but the practical consequences
are less far apart.
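
As a concrete sketch (a toy Python example of my own): halting is undecidable
in general, yet a decreasing-measure argument proves this particular loop
terminates:

    def gcd(a: int, b: int) -> int:
        """Euclid's algorithm. For non-negative inputs the measure b
        strictly decreases each iteration (a % b < b when b > 0), so
        this particular loop provably halts; no halting oracle needed."""
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(1071, 462))  # 21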

------
madhadron
The theorem is proved only for probability distributions supported on a
finite number of points in [0,1], together with the sigma-algebra of all
subsets of [0,1]. This is an utterly bizarre choice and makes the whole thing
basically uninteresting.

One of the first things you see in a graduate course on real analysis that
covers measure theory is the construction of a non-measurable set using the
axiom of choice. Then you say, "that's useless," and restrict the subsets you
work with to those that don't depend on this problem, or you go work in
intuitionistic, finitist, ultrafinitist, or some other logic, which gives
basically the same effect. Then you build all of probability theory and
machine learning in this actually useful world.

The non-measurable set construction that everyone uses (the Vitali set) is to
partition [0,1] into equivalence classes of the form (x + Q) ∩ [0,1], where Q
is the rationals. Each class is countable, so there are uncountably many of
these classes, and you then use the axiom of choice to select one element of
each; the resulting set of representatives cannot be measurable.
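
In symbols (the standard Vitali argument, not the paper's construction):

    % Partition [0,1] by x ~ y iff x - y is rational; each class
    % (x + \mathbb{Q}) \cap [0,1] is countable, so there are uncountably
    % many classes. Let V contain one point per class (axiom of choice).
    % The translates V + q for distinct rational q are pairwise disjoint,
    % so if V were measurable, translation invariance would give
    [0,1] \subseteq \bigcup_{q \in \mathbb{Q} \cap [-1,1]} (V + q) \subseteq [-1,2]
    \quad\Longrightarrow\quad
    1 \le \sum_{q \in \mathbb{Q} \cap [-1,1]} \mu(V + q) = \sum_{q} \mu(V) \le 3,

which is impossible: the sum is 0 if mu(V) = 0 and infinite if mu(V) > 0.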

For their probability distributions with finite support, consider the sets of
points that form the supports of the distributions. I can form a sequence of
such finite sets that converges towards that non-measurable set, and a
corresponding sequence of probability distributions. The authors completely
ignore whether their set of distributions is complete (that is, contains all
limits of such sequences; hint: it isn't), but they're taking suprema, so
they are effectively working in the closure anyway, where the limits are
included.

So basically they have found an obscure way of showing the first result
taught in measure theory and dressed it up in the clothes of machine
learning, even though no machine learning work, past or future, will ever be
conducted on this mathematical structure, for exactly this reason.

~~~
dltnhic
This is not adding to the conversation, but your comment describes exactly
the impediment that kept me from moving past the second page of the paper.
The problem is, I never got to study the foundations of measure theory, and
presumed I could be wrong. Do you have any self-study recommendations, papers
or concise books?

~~~
madhadron
I really don't have a good recommendation. :( I learned it from a person, not
a book, and the textbook he was assigning exercises from (Folland) is
essentially useless for self study to my mind.

------
sampo
Earlier discussion:
[https://news.ycombinator.com/item?id=18858724](https://news.ycombinator.com/item?id=18858724)
(Same research, but different news coverage)

The actual research, which I find more understandable than the news coverage:
[https://www.nature.com/articles/s42256-018-0002-3](https://www.nature.com/articles/s42256-018-0002-3)

------
sn41
Isn't all this press by the Nature publishing group just about hyping up one
of its newer journals, which the larger research community is boycotting?

By the way, Paul Cohen, who proved the independence of the Continuum
Hypothesis, says that Godel did not express much interest in independence
results (I think Godel believed in Platonism):

[https://www.youtube.com/watch?v=VBFLWk7k1Zo](https://www.youtube.com/watch?v=VBFLWk7k1Zo)

An article by Godel, taking a philosophical viewpoint about set theory and
independence, is "The modern development of the foundations of mathematics in
the light of philosophy":

[http://www.marxists.org/reference/subject/philosophy/works/a...](http://www.marxists.org/reference/subject/philosophy/works/at/godel.htm)

Godel believed that for well-defined areas of mathematics, we can add more
_intuitive_ axioms to make the theory decidable. The appropriate quote from
the article: "I would like to point out that this intuitive grasping of ever
newer axioms that are logically independent from the earlier ones, which is
necessary for the solvability of all problems even within a very limited
domain, agrees in principle with the Kantian conception of mathematics."

~~~
cwzwarich
> By the way, Paul Cohen, who proved the independence of the Continuum
> Hypothesis, says that Godel did not express much interest in independence
> results (I think Godel believed in Platonism):

Gödel proved the other direction of the independence of CH (and the Axiom of
Choice), and spent some years after that trying to prove independence itself.
He had actually succeeded in proving the independence of AC by 1943, and
later said in an interview with Hao Wang that his methods could probably have
been extended to prove the independence of CH by 1950, but he stopped working
on it because he developed a distaste for the subject. In retrospect, Gödel
wished that he had continued the work so that set theory would have
progressed faster.

The article The Origins of Forcing by G. H. Moore in the proceedings of the
1986 Logic Colloquium is a good discussion of the relevant history.

> Godel believed that for well-defined areas of mathematics, we can add more
> _intuitive_ axioms to make the theory decidable. The appropriate quote from
> the article: "I would like to point out that this intuitive grasping of ever
> newer axioms that are logically independent from the earlier ones, which is
> necessary for the solvability of all problems even within a very limited
> domain, agrees in principle with the Kantian conception of mathematics."

This line of research hasn't gone very well in practice since Gödel's time. It
has produced a lot of interesting set theory, but has not really impacted
mathematics at large.

~~~
sn41
Thank you for the reference on forcing. Will read it carefully.

> This line of research hasn't gone very well in practice since Gödel's time.
> It has produced a lot of interesting set theory, but has not really impacted
> mathematics at large.

What about Hugh Woodin's work? I agree that it has not had impact yet, but do
you think it might lead somewhere along this suggested line?

------
placebo
_" In the latest paper, Yehudayoff and his collaborators define learnability
as the ability to make predictions about a large data set by sampling a small
number of data points. The link with Cantor’s problem is that there are
infinitely many ways of choosing the smaller set, but the size of that
infinity is unknown."_

Reading this got me wondering about more philosophical aspects, such as the
relation of this conclusion to the process of human learning and prediction
(which obviously works), and whether it's somehow tied to the problem of
induction. Could be nonsense, but I'll be putting some more thought into the
meaning of this.

~~~
pas
The problem with these papers is that if you want to do practical ML, you can
simply fix upper bounds on the various numbers that arise during the analysis
of the problem, and thus make it mathematically finite, and trivial. Not
easy, of course, but this frees you from the infinite regress that's the
problem of induction.

Sure, you can then start to ask what the good upper bounds are: how many
epochs to use for training, how to quantify training-set quality and how good
that must be to get a good enough model, etc. But all of these can then be
bounded by estimation.
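
To make this concrete (a toy sketch of my own, not from the paper): once the
hypothesis class is bounded to a finite set, "learning" reduces to a finite,
trivially terminating search for the lowest empirical error:

    import random

    # Toy example: fix a finite grid of 101 threshold hypotheses up front;
    # the learnability question then becomes finite and trivially decidable.
    random.seed(0)
    data = [(x, x >= 0.37) for x in (random.random() for _ in range(1000))]

    thresholds = [i / 100 for i in range(101)]  # the fixed upper bound

    def empirical_error(t):
        return sum((x >= t) != label for x, label in data) / len(data)

    best = min(thresholds, key=empirical_error)
    print(best, empirical_error(best))  # ~0.37, error ~0.0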

------
curiousStarDust
Not completely on the subject, but I had to read the first sentence 5 times
before I was able to understand its meaning. Then I realized it was too long,
so I modified it to the following version. Is it better or worse?

"A team of researchers has stumbled on a question that is mathematically
unanswerable. It is linked to logical paradoxes, that were discovered by
Austrian mathematician Kurt Gödel in the 1930s and it can’t be solved using
standard mathematics."

~~~
elliotec
Yeah, it’s a bit better. Not sure how much, maybe more for non native
speakers?

~~~
curiousStarDust
Thanks! English is my 3rd language, so I do struggle with reading sometimes.

------
galaxyLogic
If I understood the article correctly (and I think I didn't), it says
learnability of a given data-set is like the halting problem: there is no
general algorithm for deciding whether an arbitrary data-set is learnable. Am
I close?

~~~
hatsuseno
This was my takeaway from the article as well. Even though I have no academic
knowledge of the subject, I can't say I was surprised to learn this. To me it
seems obvious in hindsight, though that might be Dunning-Kruger.

------
fooker
Can someone explain why/if this is a big deal? It seems to me that they just
rehashed undecidability and logical incompleteness.

------
thyrsus
I don't understand how a result concerning infinities can apply to a finite
number of data points represented by finite (bounded - probably by 2^32)
integers processed in finite time.

~~~
perfmode
The reals are infinite, and float/double represent reals.

So, while I think your question is interesting and good, I don’t think integer
size is a strong counterexample in support of a different conclusion.

~~~
repiret
Float/double can store a subset of the rationals, plus some other values
which aren’t found in the reals.

~~~
lern_too_spel
And those other values are finite in number (NaNs, infinities, signed zeros).
I don't know why you're downvoted; what you said is absolutely correct.

It's an absurd thing for the GP to say integers aren't infinite (apparently
talking about fixed-width integer types) and then turn around and say
float/double are infinite (talking about single-precision and
double-precision floating point types).
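
To see the finiteness concretely (a small Python sketch; math.nextafter needs
Python 3.9+):

    import math
    import struct

    # A double is one of at most 2**64 bit patterns, so the set of
    # float/double values is finite, unlike the reals, and has gaps.
    bits = struct.unpack("<Q", struct.pack("<d", 1.0))[0]
    next_up = struct.unpack("<d", struct.pack("<Q", bits + 1))[0]

    assert next_up == math.nextafter(1.0, 2.0)  # next representable double
    print(next_up - 1.0)  # 2**-52: no double lies between 1.0 and this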

------
dabs
Question for someone with a more theoretical background: the paper shows that
the EMX learnability of some class of functions with respect to some set of
probability distributions is undecidable. Does EMX learnability encompass all
notions of learnability (or is it equivalent to other notions)? Conversely,
are there, or could there be, notions of learnability different from EMX that
are not undecidable? Maybe I missed this in the paper, but clarification
would be appreciated.

------
rahuldottech
I mean, c'mon we all saw this coming...

------
min2bro
Where is the mathematics in ML today? Most of it is happening under the hood,
and the algorithms are just black boxes.

~~~
YjSe2GMQ
There's a very neat and often forgotten piece of math in learning, VC
dimension (unrelated to venture capital):

[https://en.m.wikipedia.org/wiki/Vapnik–Chervonenkis_dimensio...](https://en.m.wikipedia.org/wiki/Vapnik–Chervonenkis_dimension)

Also, the reason why black-box methods are such a big deal now is precisely
because controlled/engineered methods turned out to be inferior (obvious
example: image recognition; look no further than the story of the dropout
method and Alex Krizhevsky).

Edit: s/basically forgotten/often forgotten/ in the first sentence.
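
To give a flavour of the definition (an illustrative Python sketch, not tied
to the article): 1-D threshold classifiers shatter any single point but no
pair of distinct points, so their VC dimension is 1:

    from itertools import product

    # Hypothesis class: thresholds h_t(x) = (x >= t) on the real line.
    def shatters(points, thresholds):
        """True if some threshold realises every labelling of the points."""
        for labelling in product([False, True], repeat=len(points)):
            if not any(all((x >= t) == y for x, y in zip(points, labelling))
                       for t in thresholds):
                return False
        return True

    ts = [i / 10 for i in range(-10, 21)]  # grid of candidate thresholds
    print(shatters([0.5], ts))       # True: one point is shattered
    print(shatters([0.3, 0.7], ts))  # False: no t labels 0.3 True, 0.7 False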

~~~
selimthegrim
It's not very forgotten if it's in the Learning from Data course, to say the
least.

------
JayRoni
Professor Peter O'Hearn (UCL Computer Science) says the latest research
findings on 'unsolvable' mathematical problems are "of a rare kind", and will
probably be important for the theory of machine learning.

