
Unprovability comes to machine learning - joker3
https://www.nature.com/articles/d41586-019-00012-4
======
nafizh
It's funny to note that the first author signed a boycott [0] against
publishing research in Nature's new journal covering AI topics (Nature
Machine Intelligence), yet is one of the first to publish there.

[0]
[https://openaccess.engineering.oregonstate.edu/signatures](https://openaccess.engineering.oregonstate.edu/signatures)

~~~
kristianov
Step 1: Sign the Nature boycott.

Step 2: Get the boycott popular. Fewer people submit to Nature.

Step 3: Submit to Nature. Accepted due to reduced competition.

~~~
hyperbovine
This is a News and Views piece, published in the journal Nature, about a paper
in Nat. Mach. Int. The distinction is important.

~~~
Bartweiss
That is an important distinction, but in this case the problem still applies.

The boycott doesn't apply to Nature, and arguably doesn't apply to non-
publication writing like News and Views. Rather the boycott is of Nature
Machine Intelligence, and the first author of the underlying paper in NMI is a
signatory on the boycott list.

------
jerf
It sounds like we can indeed determine its practical effect, which is zero. No
real-world machine learning algorithm can use the real numbers. We cannot
obtain them to feed in as input, our real-world machines could not process
them if we could, and if we got real-number-based answers, we could not use
them. Whether or not the universe itself is based on real numbers, we do not
have access to them either way. Our real-world machines only have access to
the computable numbers (and only a subset of them, at that).

I'm only speaking to the practical impact.

It's possible someone will find some way to work this down into something
practical, but I'd be fairly surprised. Real machine learners get finite data,
for which the continuum hypothesis is irrelevant.

(I assume the original authors fully understand this.)
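
The claim that machines only ever touch a thin slice of the computable numbers
can be made concrete: every IEEE-754 double is an exact dyadic rational, so no
genuine real number ever enters a learning pipeline. A minimal Python
illustration (my own, not from the paper):

```python
# Every IEEE-754 double is an exact rational p / 2**k, so any "real-valued"
# input a real-world learner receives is actually a dyadic rational.
x = 0.1  # stored as the nearest double, not the real number 1/10
p, q = x.as_integer_ratio()
print(p, q)          # 3602879701896397 36028797018963968  (q = 2**55)
print(p / q == 0.1)  # True: the stored value is exactly this rational
```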

~~~
jackpirate
That's like saying that the undecidability of the halting problem has no
practical effect because real-world computers are actually finite state
machines and not Turing machines. This argument is true in only the most
superficial way.

I haven't read the paper and so don't know how applicable it is, I just don't
like your argument.

~~~
ubershmekel
The undecidability of the halting problem has no practical effect.

~~~
pure-awesome
That statement is too strong.

The practical effect is that you should not try to build an algorithm to
detect infinite loops in general (though you wouldn't need undecidability for
this, per se. E.g., NP-completeness of the halting problem would probably have
been sufficient to kill attempts at a general algorithm dead in the water).

Of course, it is true in specific cases that you can decide whether a given
program halts, and, indeed in almost all practical instances if you know how
to solve a given problem then you should be able to construct an algorithm to
solve it which provably halts. (Not including, of course, things that depend
on environmental inputs, something that computes whether a given number
satisfies Goldbach's conjecture, etc.)
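
The Goldbach aside can be made concrete: below is a program whose halting is
equivalent to the falsity of Goldbach's conjecture, so proving it runs forever
would settle the conjecture. A small sketch (the bounded check at the end
exists only so the snippet terminates):

```python
def is_sum_of_two_primes(n):
    """True if even n >= 4 is a sum of two primes (brute force)."""
    def is_prime(k):
        return k >= 2 and all(k % d for d in range(2, int(k ** 0.5) + 1))
    return any(is_prime(a) and is_prime(n - a) for a in range(2, n // 2 + 1))

# The unbounded search below halts if and only if Goldbach's conjecture
# is false -- so deciding whether it halts settles the conjecture:
#   n = 4
#   while is_sum_of_two_primes(n):
#       n += 2
print(all(is_sum_of_two_primes(n) for n in range(4, 1000, 2)))  # True
```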

There is, of course, enormous theoretical value in the undecidability of the
halting problem, and in Gödel's incompleteness theorems for general axiomatic
systems. And this theoretical value guides future theoretical research which is
likely to lead to further practical value in future - which indicates a
practical effect in the long-term. Also, of course, it indicates that an
attempt to build a general theorem-prover is a useless endeavor (though
approximate / specialized theorem provers are, of course, possible and indeed
exist).

In particular, Gödel's undecidability has quite real implications for an
Artificial General Intelligence which would have to reason about mathematics
in general, and about its own thought processes in particular. Under classical
mathematical reasoning, such an AI would be unable to "have confidence" in its
own reasoning process. However, if the assumptions are relaxed from statements
with 0/1 truth values to statements with probabilities, and time-independence
of truth is relaxed, it is possible to define an agent which is able to assign
(almost) consistent (actually "coherent") probabilities to what its own
beliefs are at time t. See
[https://arxiv.org/abs/1609.03543](https://arxiv.org/abs/1609.03543) for more
details. This is probably the strongest such statement that can be made in
this class of arguments.

~~~
red75prime
To be more concise: the undecidability of the halting problem has no practical
effect, barring the usual considerations about fundamental research.

------
sampo
The actual paper:
[https://www.nature.com/articles/s42256-018-0002-3](https://www.nature.com/articles/s42256-018-0002-3)

(I find the actual research paper easier to understand than the news
coverage.)

------
roywiggins
I much prefer Yedidia & Aaronson's paper that explicitly builds a Turing
machine whose halting behavior is independent of ZFC:

[https://www.scottaaronson.com/busybeaver.pdf](https://www.scottaaronson.com/busybeaver.pdf)

~~~
jeremysalwen
Isn't this an effective argument that ZFC's axioms are "not enough"?

~~~
roywiggins
I'm pretty confident that no set of axioms could make every Turing machine's
halting problem decidable. From the paper:

"consider a Turing machine Z that enumerates, one after the other, each of the
provable statements in ZFC. To describe how such a machine might be
constructed, Z could iterate over the axioms and inference rules of ZFC,
applying each in every possible way to each conclusion or pair of conclusions
that had been reached so far. We might ask Z to halt if it ever reaches a
contradiction; in other words, Z will halt if and only if it finds a proof of
0 = 1. Because this machine will enumerate every provable statement in ZFC, it
will run forever if and only if ZFC is consistent. It follows that Z is a
Turing machine for which the question of its behavior (whether or not it halts
when run indefinitely) is equivalent to the consistency of ZFC. Therefore,
just as ZFC cannot prove its own consistency (assuming ZFC is consistent), ZFC
also cannot prove that Z will run forever."

This argument works for every such axiomatic system: there will be a Turing
machine whose halting behavior cannot be proved within it. If there weren't,
you'd be able to prove that the consistency-checking machine runs forever,
thereby proving that the system is consistent, which is impossible per Gödel's
second incompleteness theorem.
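
The control flow of the machine Z from the quoted passage can be sketched in a
few lines. `enumerate_theorems` is a hypothetical stand-in for a real
breadth-first proof enumerator, which is the hard part being abstracted away:

```python
def Z(enumerate_theorems):
    """Halts (returns True) iff the enumeration derives a contradiction.

    With a genuine ZFC proof enumerator this loop never exits when ZFC
    is consistent, so deciding whether Z halts decides Con(ZFC).
    `enumerate_theorems` is a hypothetical stand-in, not a real library.
    """
    for theorem in enumerate_theorems():
        if theorem == "0 = 1":  # a contradiction has been derived
            return True         # halt: the system is inconsistent
    return False  # only reachable for toy, finite enumerations

# Toy demos of the control flow (real enumerations are infinite):
print(Z(lambda: iter(["1 = 1", "0 = 1"])))  # True
print(Z(lambda: iter(["1 = 1", "2 = 2"])))  # False
```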

~~~
IngoBlechschmid
Indeed, precisely that is true.

The shocking conclusion of Gödel's incompleteness theorem is that the problem
that our logical foundations are incomplete is _inherently unfixable_ ; any
extended system of axioms will have new statements which are true but which it
cannot prove.

(That said, there are certain sub-areas of mathematics where we can prove that
every true result in one of these sub-areas is actually provable from a
certain system of axioms adapted to the given sub-area. But there is no way
around Gödel's incompleteness phenomenon if one wants to encompass all of
mathematics (and even arithmetic with natural numbers suffices to yield the
incompleteness phenomenon).)

------
goldenkey
Though I haven't analyzed the particulars of this article or the paper, I can
say: we already know that compression is incomplete. That is, we cannot know
the actual Kolmogorov complexity [1] of a piece of data, due to the halting
problem. It shouldn't be surprising, then, that programs of a certain schema,
neural networks, suffer from similar issues.

One could suppose that neural networks that are less code-parameterized and
more data-parameterized (weights) would be less prone to having divergences.
Well, it's already established that the more data-like NNs aren't
Turing-complete, and aren't powerful enough to solve the kind of problems that
we really want to solve (AI). We have to turn to Hopfield nets, Boltzmann
machines, and RNNs for that.

The learning/training process for these nets is pretty much encumbered by
their capabilities. That is, exploring a field of numbers is one thing.
Exploring a field of code? Code<->Data is the one function in the entire
universe that is the most non-linear. It is the one function that cannot be
described concisely by mathematics. It is, as Wolfram terms it,
"computationally irreducible." The closer a NN reflects an actual space of
Turing-complete functions, the farther it is from actually being trainable.
Alas... we will figure out middle grounds, as we already have.

[1]
[https://en.wikipedia.org/wiki/Kolmogorov_complexity](https://en.wikipedia.org/wiki/Kolmogorov_complexity)
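
The uncomputability cuts one way only: any compressor gives a computable
*upper* bound on Kolmogorov complexity, but no algorithm can certify how far
that bound is from the true value. A quick sketch using zlib as the stand-in
compressor:

```python
import zlib

def k_upper_bound(data: bytes) -> int:
    """A computable upper bound on Kolmogorov complexity: the length of
    one particular compressed description. True K(data) is at most this
    (up to an additive constant) but is itself uncomputable."""
    return len(zlib.compress(data, 9))

structured = b"ab" * 500  # highly regular: compresses well
print(k_upper_bound(structured) < len(structured))  # True
```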

~~~
selestify
What do you mean by divergences?

------
hexhex
The author puts Gödel's incompleteness and the continuum hypothesis on the
same level, which is misleading. The continuum hypothesis is unprovable in our
current mathematical foundation, ZFC, but there are extensions of ZFC that
make the continuum hypothesis either true or false.

Incompleteness is a property of every sufficiently complex formal system and
thus poses a general constraint in logic.

That a particular learning problem is not provable in ZFC is not that
surprising. Connections between learning and incompleteness are far more
interesting (and there is a lot of pseudo-research going on, "proving" that
humans are not simulatable by computers, etc.).

------
yters
How would this affect the practical nature of the problem? Also, why is this
novel? All machine learning algorithms can be implemented on a Turing machine,
so they all are subject to the halting problem.

~~~
amelius
> All machine learning algorithms can be implemented on a Turing machine, so
> they all are subject to the halting problem.

I think the argument should be the other way around.

------
hitsthings
I bought into this article until "the field of machine learning, although
seemingly distant from mathematical logic" and then I sold it all.

------
jeromebaek
I'm curious about the applications of this to machine learning in public
policy. In papers like Big Data's Disparate Impact
([https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899))
we see misguided and underbudgeted policymakers using very sketchy and very
opaque ML algorithms to, for example, decide who stays in prison and who does
not.

If we can prove that the "decision problem" of who stays in prison and who
does not is _undecidable_, regardless of the specific implementation of the
ML algorithm, this could make a case for stopping such overreaches of ML.

------
alhazen
As the authors say in their paper, "[a] closer look reveals that the source of
the problem is in defining learnability as the existence of a learning
function rather than the existence of a learning algorithm". Algorithmic
decision making under uncertainty is NOT the same thing as finding learning
functions over infinite domains. Sometimes such a function does not exist. Why
should it? Although the authors seem pretty much aware of this fundamental
shortcoming in their analysis, they nevertheless go ahead and publish a whole
paper about learning functions! This has a name: intellectual fraud.

------
jcoffland
> Because EMX is a new model in machine learning, we do not yet know its
> usefulness for developing real-world algorithms. So these results might not
> turn out to have practical importance.

I don't think that's the correct conclusion. More likely, all learning models
have unprovable problems, so the choice of model is not important with respect
to the existence of unprovable problems.

Really, this is not a surprising result. All mathematical systems have
unprovable problems. ML is a mathematical system and is no exception.

------
gromit1987
Can someone explain the meaning of the sentence: ``All distributions in this
text are finitely supported over the σ-algebra of all subsets of X'' ? On the
one hand, it is impossible to define a probability measure on the σ-algebra of
all subsets of [0,1]. On the other hand, if a distribution is finitely
supported, then there is no point in using a σ-algebra different than what is
generated by its support, right?
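
On the second point: yes, for a finitely supported measure the full power set
is harmless, since the measure of any subset is just the sum of the finitely
many point masses it contains -- which is presumably why the authors can state
it casually. A toy sketch (my own notation, not the paper's):

```python
# A finitely supported distribution on X = [0, 1]: a finite dict of point
# masses. It is defined on *every* subset of X, because the measure of a
# set is just the sum of the masses it contains -- which is why the full
# power-set sigma-algebra causes no trouble here.
mu = {0.1: 0.5, 0.25: 0.25, 0.9: 0.25}  # support = {0.1, 0.25, 0.9}

def measure(subset_indicator):
    """mu(A) for an arbitrary A given by its indicator function."""
    return sum(p for x, p in mu.items() if subset_indicator(x))

print(measure(lambda x: True))     # 1.0  (the whole space)
print(measure(lambda x: x < 0.5))  # 0.75 (works even for 'wild' sets A)
```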

------
DiegoDiazEspin
The conclusions emphasize not a lack of practical use for machine learning
algorithms, but the way learnability is defined and treated when using
mathematical models to describe it. Some of these mathematical models could
result in undecidable problems if the wrong model is chosen.

------
sgt101
The idea that machine learning is a subset of mathematics is very dangerous.
In the real world the data is never prepped and clean, and the system is
complex and interacts with humans and society at many levels.

~~~
yonkshi
...could you elaborate? Are you saying ML is dangerous because it's based on
mathematics, and math is too clean to handle real-world data?

~~~
syn0byte
I think it's just that calling it "maths" tends to give a false sense of
certainty where none is warranted.

Example: All the really clever math you use to make an encryption algorithm is
100% correct. Then all the really clever math you use to show that it would
take the heat death of the universe to crack your clever encryption is 100%
correct. Then the user uses 'password' as the key; how does your crypto stand
up to a brute force? Is that your algorithm's fault? Did your difficulty proof
lie to you?

I know key length is a well-understood issue. But as an example of how
perfectly "valid" real-world data can torpedo an otherwise algorithmically
sound complex system, it's as good as any.
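
The gap being described is quantifiable: the difficulty proof reasons over the
full key space, while the attacker only searches the keys humans actually
pick. A back-of-the-envelope sketch (all numbers are illustrative assumptions,
not measurements):

```python
# The security proof reasons over the full key space; the attacker only
# has to search the keys humans actually choose.
full_keyspace   = 2 ** 128  # what the difficulty proof assumes
common_words    = 10 ** 6   # rough size of a password dictionary (assumption)
guesses_per_sec = 10 ** 9   # a single modest cracking rig (assumption)

print(full_keyspace / guesses_per_sec / (3600 * 24 * 365))  # ~1.08e22 years
print(common_words / guesses_per_sec)                       # 0.001 seconds
```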

~~~
Rainymood
Machine learning is literally mathematics, or more specifically, applied
statistics. However, human stupidity can never be ruled out of the equation.
Not calling something mathematics while it simply is mathematics is
obfuscating the issue.

------
dooglius
> Because EMX is a new model in machine learning, we do not yet know its
> usefulness for developing real-world algorithms. So these results might not
> turn out to have practical importance. But we do now know that we should be
> careful when introducing new models of learning. Moreover, we might need to
> look again at the many subtleties that can come up, even in established
> learning models.

In other words, "Machine Learning" here is a buzzword, without which the paper
would probably get less attention.

~~~
logancg
How do you come to the conclusion that ML is a buzzword here? It's natural to
publicize interesting new research that grounds a field -- as statistical
learning theory does for machine learning. Historically, Ben-David and co-
authors have produced foundational work on the implications of SLT in ML.

~~~
dooglius
The problem is that the model was invented in this paper itself, and doesn't
actually seem to relate to any actual ML algorithm. Admittedly, I didn't read
the paper itself, so this may be an overreach on the article's part.

~~~
zwaps
I don't think that can be true. If it were, it would go like this:

Hey guys, here's an algorithm. Oh, by the way, we cannot prove that this
algorithm does anything useful, and neither can you.

_gets published in Nature_

Talk about bang for the buck.

------
Frenum
For me, the interesting conclusion is that there are things that humans are
mathematically incapable of knowing.

~~~
Rainymood
>For me, the interesting conclusion is that there are things that humans are
mathematically incapable of knowing.

Define

>mathematically

>incapable

>knowing

These hastily drawn pop-sciency conclusions are part of the reason why ML/AI
has garnered so much media attention these days ...

~~~
Frenum
I probably should have used the word "implication" instead of "conclusion". My
thought process was that there are certain input functions that the learning
structures within the human neural network can't discern.

