Unprovability comes to machine learning (nature.com)
218 points by joker3 69 days ago | 124 comments



It's funny to note that the first author signed a boycott [0] against publishing research in Nature's new AI journal (Nature Machine Intelligence), yet is one of the first to publish there.

[0] https://openaccess.engineering.oregonstate.edu/signatures


Step 1: Sign the Nature boycott.

Step 2: Get the boycott popular. Fewer people submit to Nature.

Step 3: Submit to Nature. Get accepted thanks to the reduced competition.


This is a News and Views piece, published in the journal Nature, about a paper in Nat. Mach. Intell. The distinction is important.


That is an important distinction, but in this case the problem still applies.

The boycott doesn't apply to Nature, and arguably doesn't apply to non-publication writing like News and Views. Rather the boycott is of Nature Machine Intelligence, and the first author of the underlying paper in NMI is a signatory on the boycott list.


We are talking about the paper this news piece is talking about.


Step 4: ???

Step 5: Profit!!


Step 4: Increased h-index

Step 5: You get tenure and more grants


Similar: In the recent baby gene editing controversy around He Jiankui it became apparent that he had just co-authored a paper proposing a set of ethical rules for human gene editing, which he promptly broke himself.

https://www.wired.com/story/he-jiankui-crispr-babies-bucked-...


Is this some next level of publishing scandal? First there was a reproducibility problem, now there's a fake-and-jab? What's next?


Yeah, even funnier though is anyone can submit signatures to this with no verification. I just signed it for you (search for your username). Who knows which of these people actually 'signed' or not. Computer scientists!


The article is published in Nature, not in Nature Machine Intelligence.

The names may sound similar, but there’s a big, big difference.


https://www.nature.com/articles/s42256-018-0002-3

Published in Nature Machine Intelligence. Publicized through the regular Nature site.


It's published in Nature Machine Intelligence:

https://www.nature.com/articles/s42256-018-0002-3

But at least this article seems to be available as Open Access. I am not sure if this is temporary or not.


Do as I say, not as I do.


Most researchers are not independent and don't have much choice where they publish. Also, prestige plays a huge role in academia. Getting into Nature is a big deal to them. That's one reason I really dislike academia. They are often playing games that are very far from genuine progress. It's all ladder climbing and groveling at their masters' feet. Someone please disrupt academia.


You perfectly described all the non academic jobs I've ever had.


I have worked both in academia and in industry. Academia is heaven compared to some of the serpentine performance evaluations in industry. I have even seen people on the same team undermining each other's work because the top two in the group would earn a bonus.

The real merit of academia is that people can be alone in a corner doing their work, if they choose to. Industry enforces socialisation, conformity to norms, and schmoozing.

Also, the track record of the academic world vis-a-vis disruption is impressive. The oldest western universities are around 900 years old.

That said, the article seems to be a fluff piece about a good but not groundbreaking paper. (Nature MI is not Nature, not by a long shot. It does not have any prestige, yet.)


> The real merit of academia is that people can be alone in a corner doing their work, if they choose to. Industry enforces socialisation, conformity to norms, and schmoozing.

That's a bit dubious. If you don't network in academia I doubt you can get far at all, while you can very well just comfortably do your job as a technical employee even if you are not really into networking (if you don't have further ambitions). Networking is absolutely essential to would-be professors or anybody who wants long-term stable employment in academia, and the competition is much fiercer.


It sounds like we can too determine its practical effect, which is zero. No real-world machine learning algorithm can use the real numbers. We can not obtain them to feed them as input, our real-world machines can not process them if we could, and if we got real-number-based answers, we could not use them. Whether the universe itself is or is not based on real numbers, we do not have access to them either way. Our real-world machines only have access to the computable numbers (and only a subset of them, at that).

I'm only speaking to the practical impact.

It's possible someone will find some way to work this down into something practical, but I'd be fairly surprised. Real machine learners get finite data, for which the continuum hypothesis is irrelevant.

(I assume the original authors fully understand this.)


The press release says "So these results might not turn out to have practical importance."

As you state, that's a much weaker statement than the truth: "So these results almost certainly have no practical importance", and as expected, the stronger statement is nowhere in the original paper.

The press release, while being careful not to make false statements, commits a pragmatic sin of letting the reader think there is something going on (by saying "maybe" where it should say "almost certainly not", and by even publishing in the first place a pop-science summary of a highly technical result that has no meaning to a popular audience), and so necessarily the (abstract mathematical, unphysical) importance of the result is completely destroyed in translation.


That's like saying that the undecidability of the halting problem has no practical effect because real world computers are actually finite state machines and not Turing machines. This argument is true in only the most superficial way.

I haven't read the paper and so don't know how applicable it is, I just don't like your argument.


C++ grammar is undecidable [0]. The solution? Certainly not throwing your hands up and giving up on writing C++ compilers. Compiler writers just limit template instantiation depth, making the compiler reject some correct programs.

[0]: http://blog.reverberate.org/2013/08/parsing-c-is-literally-u...


The undecidability of the halting problem has no practical effect.


That statement is too strong.

The practical effect is that you should not try to build an algorithm to detect infinite loops in general (though you wouldn't need undecidability for this, per se. E.g., NP-completeness of the halting problem would probably have been sufficient to kill attempts at a general algorithm dead in the water).

Of course, it is true in specific cases that you can decide whether a given program halts, and, indeed in almost all practical instances if you know how to solve a given problem then you should be able to construct an algorithm to solve it which provably halts. (Not including, of course, things that depend on environmental inputs, something that computes whether a given number satisfies Goldbach's conjecture, etc.)

There is, of course, enormous theoretical value in the undecidability of the halting problem, and in Gödel's incompleteness results for general axiomatic systems. And this theoretical value guides future theoretical research, which is likely to lead to further practical value in the future - which indicates a practical effect in the long term. Also, of course, it indicates that an attempt to build a fully general theorem prover is a useless endeavor (though approximate / specialized theorem provers are, of course, possible and indeed exist).

In particular, Gödel's incompleteness has quite real implications for an Artificial General Intelligence, which would have to reason about mathematics in general, and about its own thought processes in particular. Under classical mathematical reasoning, such an AI would be unable to "have confidence" in its own reasoning process. However, if the assumptions are relaxed from statements with 0/1 truth values to statements with probabilities, and time-independence of truth is relaxed, it is possible to define an agent which is able to assign (almost) consistent (actually "coherent") probabilities to what its own beliefs are at time t. See https://arxiv.org/abs/1609.03543 for more details. This is probably the strongest such statement that can be made in this class of arguments.


To be more concise: the undecidability of the halting problem has no practical effect, barring the usual considerations about fundamental research.


Isn't it one of the reasons limiting the power of static analysis, forcing us to use dynamic analysis to get certain information about the correctness of our programs?


If the halting problem were merely NP-complete, it would still have limited the practical possibilities of static analysis.


Eh, modern SMT solvers are very practical tools, as are solvers for Mixed Integer Linear Programs. NP-completeness doesn't mean that you can't solve most instances you care about in reasonable time.
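
To make that concrete, here is a minimal sketch using Z3's Python bindings (the z3-solver package; the constraints are made up purely for illustration). Constraint satisfaction over the integers is NP-hard in general, yet small instances like this are dispatched instantly:

    from z3 import Ints, Solver, sat

    x, y, z = Ints('x y z')
    s = Solver()
    # a toy system of integer constraints; deciding such systems is hard in
    # general, but typical small instances are solved in milliseconds
    s.add(x + 2 * y + 3 * z == 20, x > y, y > z, z > 0)
    if s.check() == sat:
        print(s.model())   # e.g. [z = 1, y = 2, x = 13]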


Yes. And the halting problem doesn't preclude static program analysis even if it works with only a subset of all possible programs.


It looks like we're in perfect agreement then.


>We can not obtain them to feed them as input, our real-world machines can not process them if we could...

I'm not sure about this. We often make calculations with computer representations of real numbers. We can calculate quantities based on algebraic or transcendental numbers with arbitrary precision (optimizations, navigation), or we can use them as a basis (image processing) and represent rational numbers in their terms. Real numbers are all over computer applications.


Arbitrary precision is irrelevant. By it we mean arbitrary finite precision, which leaves us not only with computable numbers, but rationals. Representing rationals isn't affected by the paper's findings, because they have the same cardinality as the naturals. Arbitrary precision models reals, it doesn't actually use them.


>We can not obtain them to feed them as input, our real-world machines can not process them if we could

Suppose we are only working with rational multiples of Pi. Then we can ONLY have finite calculations of irrational numbers and it's impossible to have a finite calculation of a rational number (except for multiplication by 0).

>Representing rationals isn't affected by the paper's findings, because they have the same cardinality as the naturals.

The Natural numbers are still infinite which presents its own inherent difficulty. It's not as though countability lends itself better to programming.


First and foremost:

> The Natural numbers are still infinite which presents its own inherent difficulty. It's not as though countability lends itself better to programming

Actually countability does lend itself to programming, essentially by definition. You cannot write a function to enumerate all the real numbers precisely because they're uncountable, even if you had an infinite amount of time. You can trivially do this with a "merely" countably infinite set, like the naturals or rationals.
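
As a minimal sketch of what countability buys you (the generator below is just for illustration), the positive rationals can be enumerated by a single program, repeats and all:

    from fractions import Fraction

    def rationals():
        # enumerate the positive rationals along the diagonals p + q = n;
        # values repeat (1/1, 2/2, ...) but every rational appears eventually
        n = 2
        while True:
            for p in range(1, n):
                yield Fraction(p, n - p)
            n += 1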

> Suppose we are only working with rational multiples of Pi. Then we can ONLY have finite calculations of irrational numbers and it's impossible to have a finite calculation of a rational number (except for multiplication by 0).

It sounds like you're suggesting we can store an irrational number in a finite representation by generating it with a finite function (e.g. multiply a rational number of times). But that doesn't work. You can't represent a number N in a finite number of steps if you can't represent all factors of N in a finite number of steps.

This fact follows from the field axioms, for the same reason that all products of irrationals and nonzero rationals are themselves irrational. Simply put, if you can't do it for Pi, you can't do it for a rational multiple of Pi.


>You cannot write a function to enumerate all the real numbers precisely because they're uncountable, even if you had an infinite amount of time

Even with an infinite amount of time your enumeration function will not generate all of the natural numbers, even though the natural numbers are countable.

You're right that we can't write a function to determine an immediate successor but we can write successor functions.

>It sounds like you're suggesting we can store an irrational number in a finite representation by generating it with a finite function (e.g. multiply a rational number of times). But that doesn't work. You can't represent a number N in a finite number of steps if you can't represent all factors of N in a finite number of steps.

We are getting a few concepts muddled. Storing an irrational number as a finite representation is trivially easy. For example: "pi". Alternatively I could write an algorithm to generate the decimal expansion of the square root of 2. Where it's difficult is when we talk about numbers that aren't describable in a finite number of terms, i.e. Uncomputable. There is a certain unknowable quality to these numbers and I would suggest that this isn't inherent from the cardinality of the real numbers and that we run into similar problems when trying to compute some natural numbers, like random numbers.


> Even with infinite amount of time your enumeration function will not generate all of the natural numbers even though the natural number are countable.

Huh? You can trivially write a function to generate all natural numbers:

    def naturals():
        # yields 1, 2, 3, ...; every natural number is produced after finitely many steps
        n = 1
        while True:
            yield n
            n += 1
How do you propose doing that for the reals?

> Where it's difficult is when we talk about numbers that aren't describable in a finite number of terms, i.e. Uncomputable.

This is not what uncomputable means. Uncomputable means we cannot calculate a number to any precision, not that the number has an infinite decimal expansion.

Pi is not uncomputable. You can’t store its entire decimal expansion in finite space but it is certainly computable - we compute it all the time. However, almost all real numbers are strictly uncomputable. Among many other proofs, this is evident from the simple fact that the set of all possible Turing machines is countable. Since the reals are uncountable, it follows that almost all reals must not be computable by any Turing machine.
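
To make "Pi is computable" concrete, here is a minimal sketch (Machin's formula with plain integer arithmetic; the function names are mine) that prints pi to any requested number of digits:

    def arctan_inv(x, unity):
        # arctan(1/x) scaled by `unity`, via its Taylor series in integer arithmetic
        term = unity // x
        total = term
        n, sign = 1, -1
        while term:
            term //= x * x
            n += 2
            total += sign * (term // n)
            sign = -sign
        return total

    def pi_digits(digits):
        # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239); extra guard digits
        unity = 10 ** (digits + 10)
        pi = 16 * arctan_inv(5, unity) - 4 * arctan_inv(239, unity)
        return pi // 10 ** 10   # pi * 10**digits as an integer

    print(pi_digits(50))   # 3141592653589793238462643383279502884197...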

Likewise, there are many potential calculations involving real numbers for which a computer will output a strictly incorrect - not just imprecise - result because its real solution is uncomputable. This does not happen in the naturals or rationals.


> Uncomputable means we cannot calculate a number to any precision

Did you mean “for any precision, we cannot compute to that precision” or “we cannot compute to every precision”?


> Where it's difficult is when we talk about numbers that aren't describable in a finite number of terms, i.e. Uncomputable.

I might be nitpicking here, but uncomputable != indescribable. For example, Chaitin’s constant [1] is uncomputable but perfectly describable (and in fact limit-computable [2]). The set of arithmetically-definable [3] numbers is even larger than that of limit-computable numbers. Computable and limit-computable numbers lie at levels Δ^0_1 and Δ^0_2 of this hierarchy, respectively.

[1] https://en.wikipedia.org/wiki/Chaitin%27s_constant

[2] https://en.wikipedia.org/wiki/Computation_in_the_limit

[3] https://en.wikipedia.org/wiki/Arithmetical_set


All of the rationals are computable. Almost all of the irrationals aren't computable at all... so there's a big difference there.


Yeah, and I suspect that has a lot to do with the underlying mathematical principles too. But the paper instead reduces the issue to the Continuum Hypothesis. The problem might not even have anything to do with countability but instead be fundamentally about the differences between infinite cardinals.


> Suppose we are only working with rational multiples of Pi.

How would you start? You can't put a rational multiple of pi exactly into a computer. You can't get a rational multiple of pi exactly out of a computer. You can calculate rational numbers on the computer and scale them by a factor of pi outside the computer, if you really want, but that doesn't mean the computer is working with irrationals.

> It's not as though countability lends itself better to programming.

Huh? Yes it does; countable sets are still infinite but they're better-behaved than bigger sets.


>You can't put a rational multiple of pi exactly into a computer.

Of course we can: "1/2*pi"

>You can't get a rational multiple of pi exactly out of a computer.

Using the same limitations you seem to have in mind, we can't get one third exactly either.

>...countable sets are still infinite but they're better-behaved than bigger sets.

Yes, and that is surprising because we do have numerous clever applications that skirt around most of the intractability of exotic real numbers. That is why I think this paper is a big deal, especially because it points to the cardinality of the real numbers instead of, say, their inclusion of the uncomputable numbers.


> Using the same limitations you seem to have in mind, we can't get one third exactly either.

We can work with exact rationals; we can do arithmetic on them and normalize them. In particular, we can compute whether two rationals are equal (in finite time), which we can't do with reals.
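
A minimal sketch with Python's built-in fractions module: rationals are stored and manipulated exactly, and equality is decided in finite time, neither of which is available for arbitrary reals:

    from fractions import Fraction

    a = Fraction(1, 3) + Fraction(1, 6)   # exact arithmetic: 1/2
    b = Fraction(2, 4)                    # automatically normalized to 1/2
    print(a == b)                         # True, decided exactly in finite time
    print(0.1 + 0.2 == 0.3)               # False: floats only approximate the reals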


No, we often make calculations with a finite, countable subset of real numbers. We can only approximate the real numbers with finite precision.


>we often make calculations with a finite, countable subset of real numbers.

We also often make calculations with finite subsets of rational numbers. We don't work with all of the rational numbers.

> We can only approximate the real numbers with finite precision.

The same can be said for rational numbers. It really matters that we can be arbitrarily precise when we talk about .333333... (one third), 1.41421... (square root of 2) or 3.14159... (pi) because we can be as exact as our calculations demand.


Two points:

First, yes the rationals are already countable. But the reals are uncountable. You cannot write a function to generate all real numbers - not even if you allowed the function to run for an infinite amount of time.

Second, most of the reals are individually uncomputable, which literally means no, we cannot calculate them to arbitrary precision (or any exact precision).

Likewise we cannot just redefine the set of real numbers to be restricted to the set of computable reals. Analysis rapidly breaks down when you do so and it follows that the real field is no longer closed (by definition, and therefore no longer subject to the field axioms).


>Likewise we cannot just redefine the set of real numbers to be restricted to the set of computable reals. Analysis rapidly breaks down when you do so and it follows that the real field is no longer closed (by definition, and therefore no longer subject to the field axioms).

I was with you until you wrote that Analysis would break down. We can easily do analysis with dense subsets of the reals and also extensions of the reals, like in Non-Standard Analysis.


> Likewise we cannot just redefine the set of real numbers to be restricted to the set of computable reals. Analysis rapidly breaks down when you do so and it follows that the real field is no longer closed (by definition, and therefore no longer subject to the field axioms).

Whilst this is true, I feel it should be noted that there exist constructive and computable versions of analysis which work under assumptions like "all the reals are computable".

Of course, the big difference is that we've mostly agreed on a single canonical version of the classical reals and of classical analysis, whilst there are numerous versions of different constructive analyses depending on what exactly you mean by "constructive".

Additionally many theorems of classical analysis "split up" into multiple independent versions which would be equivalent classically, but only some of which are true constructively.

Worse than that, often there is not a single definition of the "reals" any more. For example, two definitions of the reals (Dedekind Cuts and Cauchy Sequences) may result in entirely different objects constructively speaking. And, in fact, it may be that neither one is actually a good way of capturing the concept of a "real" "number" in that particular universe, so you need yet another definition.

As terrible as this might make it sound, it is worth the effort, in my opinion. In trying to model the real world, classical mathematics throws away just a little bit too much information. This messiness is not just the result of wanting to work under some weird axioms, but an actual statement about the messiness of reality. Trying to sweep these issues under the rug is not a sustainable strategy if you eventually want to solve real-world problems.

Using classical mathematics is fine as a first-pass. It's a good way of investigating the problem under some simplifying assumptions and getting some good and useful results. But after that is done, it's not always enough to just declare the problem "solved" (or "unsolvable"). You need to consider what the practical version of the results will be, especially if you want to write a program to use them.

I am glad that the world is moving towards more computable and constructive versions of mathematics. It is still early days, and there are still way too many different takes on trying to solve the same mathematical issues, but soon enough we'll converge on a select few "generally acceptable" versions of constructive mathematics (in particular, something based on some form of constructive Type Theory) and we'll be able to reform our mathematical and scientific knowledge on a more stable base.


Dedekind cuts and Cauchy sequences don't result in different reals. They're different methods of proof, but the resulting sets are isomorphic to one another, so they're structurally the same even if a real from one looks a bit different from a real in the other.

All the Dedekind cuts and Cauchy sequences do is tell us, in essence: "yes, the axiomatized reals you're using are correct and fine because there are constructive proofs of their existence." Moreover, the point of those proofs (and analysis more generally) is to demonstrate mathematical continuity and uncountability. No matter how you slice this problem, if you construct analysis you will lose computability along the way.


I think there's perhaps a misunderstanding.

You are correct that under classical assumptions of logic ("normal" mathematics), the Dedekind Cuts and Cauchy Sequences indeed result in the construction of isomorphic objects.

However, under constructive assumptions of the logical axioms of mathematics, such as intuitionism (not accepting the law of excluded middle, i.e., not assuming that the statement "P or NOT P" is necessarily true for all P), the construction of the Dedekind Cuts and the Cauchy Sequences may result in different objects, i.e., it may no longer be provable that they are equivalent.


We can effectively calculate all of the rationals to arbitrary precision, but we can't do that with all of the reals. Most of the reals have no effective procedure to calculate their decimal expansion.


Right, but we're getting away from illustrating the point of the paper, which isn't about number computability (interestingly) but about cardinality of sets. The intractability of this machine learning problem could have more to do with an absence of a successor function in the Real numbers than about the cardinality of uncomputable numbers. Or it could be a consequence of properties of differing infinite sets in general, like the difference between the Reals and the power set of the Reals. What I'm trying to get at is that we're not just talking about infinite decimal expansions.


It's extremely counterintuitive what would happen if a computer system (a Turing machine) were operating symbolically on real numbers, not floating point numbers, which have vastly different properties (for example, addition is not associative there although it is commutative). We can refer to constructive logic proofs in the real number theory. There's a ton of surprising facts there, to pick two:

  1. For x and y you cannot prove that x<=y or x>=y
  2. Every function is continuous
Some hand-waving to see why 1. is a problem: given two numbers, x and y, as a lazily-computed infinite decimal expansion, the program that loops in parallel over the expansions and seeks the first difference (to determine which number is greater) will never terminate if x and y have identical expansions.
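
Here is a minimal sketch of that hand-waving (representing a "real" as a lazy digit generator is my own simplification): comparison terminates as soon as the expansions differ, but loops forever when the two numbers happen to be equal:

    def third():
        # lazy decimal expansion of 1/3: digits 3, 3, 3, ...
        while True:
            yield 3

    def compare(xs, ys):
        # scan for the first differing digit; if the two expansions are
        # identical (the numbers are equal) this loop never terminates
        for a, b in zip(xs, ys):
            if a != b:
                return -1 if a < b else 1

    print(compare(third(), iter([3, 3, 4])))   # -1, found after three digits
    # compare(third(), third()) would never return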

See this submission from two months ago: https://news.ycombinator.com/item?id=18411935


Maybe I'm just arguing semantics here but I would say that while calculations are carried out using floating points the numbers themselves are represented symbolically. For example, if you came up with an algorithm to calculate euclidean distance then you would eventually employ a square root algorithm instead of an approximation of a square root stored as a floating point number. The algorithm "sqrt(2)" is a symbolic representation of the square root of 2.


Yeah, I meant that there's another layer to the symbolic representation of "sqrt(2)" than just the tree of operations, so that you can ask sensible questions about the value of the number. For instance a generator of decimal representation, or a function like "given N return a fraction q/p that's within 1/2^N of the value".
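
For instance, a minimal sketch of that second kind of representation (the function name and the bisection approach are just mine for illustration): sqrt(2) as a function that, given N, returns a fraction within 1/2^N of the true value:

    from fractions import Fraction

    def sqrt2(n):
        # return a rational within 1/2**n of sqrt(2), by bisecting [1, 2]
        lo, hi = Fraction(1), Fraction(2)
        while hi - lo > Fraction(1, 2 ** n):
            mid = (lo + hi) / 2
            if mid * mid < 2:
                lo = mid
            else:
                hi = mid
        return lo

    print(float(sqrt2(20)))   # ~1.4142135...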

And floats are really a whole other beast than fractions or naturals or real numbers. As I wrote earlier - the "+" operator isn't associative. Python snippet:

    def before():
        x = 0.0
        x += 1.0
        for i in range(1000):
            x += 1e-17
        return x

    def after():
        x = 0.0
        for i in range(1000):
            x += 1e-17
        x += 1.0
        return x


    print(before() == 1.0)
    print(after() == 1.0)

Will print:

   True
   False


Rewrite your functions to be generator functions. They don't even need to be based on the desired level of precision to always be decidable.


Not sure what your point is. I can just suggest to see the PDF: https://news.ycombinator.com/item?id=18411935


Isn't it also possible for two numbers to have unequal expansions but still be equal? Take .3... * 3 vs 1. The first expands to .9... while the second expands to 1.0... but they are also equal.


Right, but note that we can use numbers from countable sets and find different symbolic representations for them (3.0 vs 3). That problem is not inherent from the uncountability of the real numbers.


Yup, it is possible. That was just a hand wavy example to show where the issue is. Symbolic real numbers need not be represented as small programs/objects generating a sequence of digits. But all of those implementations/representations will fail to compare some numbers.

Edit: for instance, you could imagine a representation (think object of a virtual class in c++) that gives you a fraction and a 1/2^n bound for error, where you can query the number for whatever n you see fit. Still, in some cases it's undecidable to determine which number is greater.


There are no real numbers in nature, as the Heisenberg uncertainty principle means that all measurements have an error bound--i.e. are intervals.

Our mathematical tools for quantum mechanics are wave functions, with presumed real-valued quantities. Problem is, we can never feed them precise enough inputs for that to matter, since all of our measurements are interval-valued quantities.


>No real-world machine learning algorithm can use the real numbers.

That depends entirely on what one means by "real numbers".

See previous hacker news discussion of a different paper on real numbers, here:

https://news.ycombinator.com/item?id=17291752


To the many people who attempted to correct me about what portion of the reals we can use: the computable numbers are exactly that portion (or, as I said, the subset of it our machines are large enough to represent). Look up "computable number". Pi, and all rational multiples of pi, are in that set, for instance.

It has many interesting properties. Among those properties is that it contains every number you could ever conclusively identify, and that it is only countably infinite, meaning it is much smaller than the reals. It is useful for many things (and a good thing it is, since it is the most sophisticated set of numbers we actually get to compute with), but it is no substitute for the reals.


It seems to me that if your argument is valid then the situation is even worse for machine learning.

That is to say, if machine learning cannot use the real numbers then it cannot solve any problem which depends on any properties of the reals, such as producing a machine-learned function to compute the square root of two.

Now, it may be true (in fact I think it is likely to be true) that the cardinality of the reals is a rather different kind of property than denseness, enough so that machine learning can approximate the denseness but not the cardinality, but the reason for this doesn't seem to be simply "machine learning doesn't use the real numbers".


That is not surprising at all. The set of real numbers that can be produced by a computer is countable. That means that almost all real numbers are uncomputable. (Square roots of natural numbers are computable though.)


You can certainly use real numbers. You just have to pick the finite subset of them that you need for a given problem and an appropriate encoding for them.


That's not what they mean by "use real numbers." If you're using a finite subset of the reals, you're not using the reals. We can only approximate real numbers in a finite representation.


That's all anybody has ever done. You may as well be claiming that the reals don't exist because no mathematician has conceptualized one non-symbolically and non-approximately.

When you do calculus on the reals, you only ever approximate or encode a finite (really, countable) subset of them.


The actual paper: https://www.nature.com/articles/s42256-018-0002-3

(I find the actual research paper easier to understand than the news coverage.)


I much prefer Yedidia & Aaronson's paper that explicitly builds a Turing machine whose halting behavior is independent of ZFC:

https://www.scottaaronson.com/busybeaver.pdf


Isn't this an effective argument that ZFC's axioms are "not enough"?


I'm pretty confident that no set of axioms could make every Turing machine's halting problem decidable. From the paper:

"consider a Turing machine Z that enumerates, one after the other, each of the provable statements in ZFC. To describe how such a machine might be constructed, Z could iterate over the axioms and inference rules of ZFC, applying each in every possible way to each conclusion or pair of conclusions that had been reached so far. We might ask Z to halt if it ever reaches a contradiction; in other words, Z will halt if and only if it finds a proof of 0 = 1. Because this machine will enumerate every provable statement in ZFC, it will run forever if and only if ZFC is consistent. It follows that Z is a Turing machine for which the question of its behavior (whether or not it halts when run indefinitely) is equivalent to the consistency of ZFC. Therefore, just as ZFC cannot prove its own consistency (assuming ZFC is consistent), ZFC also cannot prove that Z will run forever."

This argument works for every axiomatic system: there will be a Turing machine whose halting problem cannot be proved in it. If there wasn't, you'd be able to prove that the consistency-checking machine would halt, thereby proving that the system is consistent, which is impossible per Godel incompleteness.


Indeed, precisely that is true.

The shocking conclusion of Gödel's incompleteness theorem is that the problem that our logical foundations are incomplete is inherently unfixable; any extended system of axioms will have new statements which are true but which it cannot prove.

(That said, there are certain sub-areas of mathematics where we can prove that every true result in one of these sub-areas is actually provable from a certain system of axioms adapted to the given sub-area. But there is no way around Gödel's incompleteness phenomenon if one wants to encompass all of mathematics (and even arithmetic with natural numbers suffices to yield the incompleteness phenomenon).)


Though I haven't analyzed the particulars of this article or the paper -- I can say: we already know that compression is incomplete. That is: we cannot know the actual Kolmogorov complexity of a piece of data, due to the halting problem. It shouldn't be surprising, then, that programs of a certain schema, neural networks, suffer from similar issues. One could suppose that neural networks that are less code-parameterized, and more data-parameterized (weights), would be less prone to having divergences. Well, it's already established that the more data-like NNs aren't Turing complete, and aren't powerful enough to solve the kind of problems that we really want to solve (AI). We have to turn to Hopfield Nets, Boltzmann Machines, and RNNs for that. The learning/training process for these nets is pretty much encumbered by their capabilities. That is, exploring a field of numbers is one thing. Exploring a field of code? Code<->Data is the one function in the entire universe that is the most non-linear. It is the one function that cannot be described concisely by mathematics. It is, as Wolfram terms it, "computationally irreducible." The closer a NN reflects an actual space of Turing-complete functions, the farther it is from actually being trainable. Alas... we will figure out middle grounds, as we have already.

[1] https://en.wikipedia.org/wiki/Kolmogorov_complexity


What do you mean by divergences?


The author puts Gödel's incompleteness and the continuum hypothesis on the same level, which is misleading. The continuum hypothesis is unprovable in our current mathematical foundation ZFC, but there are extensions of ZFC that make the continuum hypothesis either true or false.

Incompleteness is a property of every sufficiently complex formal system and thus poses a general constraint in logic.

That a particular learning problem is not provable in ZFC is not that surprising. Connections between learning and Incompleteness are way more interesting (and there is a lot of pseudo-research going on, "proving" that humans are not simulatable by computer etc.)


How would this affect the practical nature of the problem? Also, why is this novel? All machine learning algorithms can be implemented on a Turing machine, so they all are subject to the halting problem.


As other commenters have pointed out, this is backwards. The unsolvability of the halting problem says that there is no one algorithm that can tell you whether an arbitrary Turing machine will halt. But for special types of Turing machines, it is often possible to say whether they will halt. So, if you define a special type of Turing machine, and it turns out that the halting problem is unsolvable even for that restricted type, that is noteworthy.


It's novel because no one had bothered to prove the result before.

It's not an earth-shattering paper, it's a technical curiosity.

The OP is a light article about a light paper, to promote a new lightweight journal that launched this week.

https://www.nature.com/natmachintell/


> All machine learning algorithms can be implemented on a Turing machine, so they all are subject to the halting problem.

I think the argument should be the other way around.


What do you mean by "subject to the halting problem"? There are plenty of Turing machines that can be proven to halt.

The halting problem doesn't say "given any Turing machine, no algorithm can determine whether it halts". It says "no algorithm can determine whether any given Turing machine halts".


I suspect the Halting Problem was already an obvious theoretical limitation to machine learning. Just imagine a network trained to classify programs according to whether or not they halt. Then construct a program that runs forever if the network says the program halts, and halts if the network says the program does not halt. No training strategy would ever produce a network that always gives the correct answer in this case.
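
A minimal sketch of that diagonal trick, where everything is hypothetical: `halts` stands for the trained classifier (program source in, halt/no-halt prediction out), and the quining step of feeding the program its own source is glossed over:

    def adversary(my_source, halts):
        # `halts` is the (hypothetical) trained classifier: source code -> bool
        if halts(my_source):
            while True:      # predicted to halt, so loop forever
                pass
        return               # predicted to run forever, so halt immediately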

In the Generative Adversarial Network (GAN) framework, this would be the One Weird Trick the generator could pull that would always foil the discriminator.

This paper is interesting because it connects learnability to a different kind of undecidability than the Halting Problem: the Continuum Hypothesis. In some ways it's more troubling than the Halting Problem, because at least we can usually figure out whether some actual program halts, using some cleverness, but no one can begin to say whether the Continuum Hypothesis is true or false.

The Continuum Hypothesis is more like P!=NP, in that most mathematicians suspect the CH is true (that is, there's no cardinality in between the integers and the reals). That's good news, because it implies the EMX learning model will work:

> Their results imply that the finite subsets of the unit interval have monotone-compression schemes, and therefore are learnable in EMX, if and only if the continuum hypothesis is true, which is known to be unprovable.

If someone comes along and casts doubt on the Continuum Hypothesis through some mystical but persuasive line of reasoning (news flash: not gonna happen), then there might be some EMX functions that can't be learned, but no one is going to lose sleep over that. There are lots of more immediate reasons why machine learning fails in practice, like not having good training data.


I'm confused. I'm pretty sure that the issue isn't "no one can begin to say whether the Continuum Hypothesis is true or false," but rather that we know it to be independent of the widely-used axiomatic set theories (at least ZFC). Since "truth" is defined in terms of the chosen axioms, and we have proven that the axioms of ZFC can neither prove nor disprove the continuum hypothesis, that means that we already know that you could arbitrarily choose to use "ZFC with CH" or "ZFC with !CH" and either will be consistent (or as consistent as plain old ZFC).

> The Continuum Hypothesis is more like P!=NP, in that most mathematicians suspect the CH is true (that is, there's no cardinality in between the integers and the reals).

I don't think this is correct. It's very well-accepted that CH is independent of ZFC. Now, you might mean that most mathematicians suspect that adding CH to ZFC is more intuitive/natural/useful than adding !CH to ZFC, but that's a quite different statement. I don't know much about this, but it seems plausible that both extensions of ZFC could be useful for different things, in the same way that using the parallel postulate is useful for some things (Euclidean geometry) and using alternative parallel postulates is useful for other things (non-Euclidean geometries).

The P=NP problem, AFAIK, isn't yet proven to be independent of ZFC, although I have heard some seemingly educated people suggest their suspicions that it may be.

And now you have me wondering if you win the Millennium Prize if you prove that P=NP is independent of ZFC. Surely you would.


Indeed, unlike the axiom of choice, which is routinely used by most working mathematicians, the continuum hypothesis is not. (In fact, most working mathematicians can probably not even state this hypothesis precisely, as they're concerned with different areas of mathematics and haven't received the necessary training in logic or set theory; and among set theorists, consequences of CH are explored just as often as consequences of !CH are. (CH does simplify several things, such as arithmetic with infinite cardinal numbers, but the point still stands.))

Yes, the P=NP problem is not yet proven to be independent of ZFC. It's certainly a possibility, and to be honest a quite intriguing and fun one, but we shouldn't make the mistake of thinking that any hard open problem is independent of ZFC. (I know that you didn't state that, I'm just writing this for the record.) Sometimes hard open problems are simply hard open problems.


I agree on your last point, but I will say that P=NP feels like a particularly good candidate for being independent of ZFC, not only because it seems to be very difficult, but because it feels so adjacent to things like the halting problem and the incompleteness theorems. :)


> Now, you might mean that most mathematicians suspect that adding CH to ZFC is more intuitive/natural/useful than adding !CH to ZFC, but that's a quite different statement.

I guess I'm a constructivist about this.

You can't just wave your hands and take !CH as an axiom. Remember what that means: you reject that there is no set with cardinality between the integers and the reals, so you're implying there is such a set. I realize not all mathematicians share this point of view, but if you can't somehow construct the intermediate set you're claiming exists, or show that it can be constructed, then you don't really have any grounds for taking !CH to be true. Without a positive reason for believing such a set exists, I have a hard time pretending that !CH could possibly be a "useful" (your word) axiom to add to ZFC.

By contrast, imagining that CH might be true is easy. You don't have to construct any never-before-seen mathematical objects to make your point. You just presume, without fear of being contradicted by ZFC, that the integers are countably infinite, and the reals are uncountably infinite but only sort of "minimally" so (there are higher orders of uncountable infinity, but none lower). You may never be able to prove CH (certainly not with ZFC, and almost certainly not with any other logical system), but at least the model makes sense: there's no cardinality in between that of the integers and the reals.

Which is not to say that CH is particularly useful! In fact, maybe the most interesting thing about the paper is the idea that the CH could be meaningfully connected to anything in an applied form of math like machine learning. Does that weird connection between CH and ML have any practical consequences for ML in general, or the EMX learning algorithm in particular? Probably not, especially since assuming CH is true just implies EMX should work, so it's fine to keep using it to solve learning problems.

In other words, if you wanted to convince an ML practitioner that their learning algorithms might be unreliable because the Continuum Hypothesis might be false, I think they would be within their rights to request you show them an actual set with cardinality between the integers and the reals. When you couldn't, they would then be within their rights to ignore your objection.


Forgive me for not knowing much about constructivism. You mention having “a positive reason to believe such a set exists,” but what more reason could there be other than the fact that the axioms you’re using imply that such a set exists? Do constructivists believe that a prime with a googol base-10 digits exists in ZF, even though no one has constructed one?


Wouldn't a constructivist require an explicit bijection between R and aleph_1 before accepting CH?


Most mathematicians don't care at all about CH, and among those who do, not all believe that ZFC+CH is more intuitive than ZFC+not CH. (Really, it makes no sense to believe it is true or not unless you believe in some platonic universe of sets; but I do maths, not philosophy.) For example, Gödel himself thought that the continuum could be aleph_2, Woodin's Omega logic argues the same (though this is somewhat controversial), and the proper forcing axiom also implies that the continuum is aleph_2.


Please see my reply to @baddox.

You say you do math and not philosophy, but the whole CH independence proof kinda moves this discussion out of the realm of (ZFC-based) math, doesn't it? If you don't want to consider philosophy here, there's not much more to say.


Plenty of math is done outside of ZFC, a lot of modern set theory for example, but that doesn't make it philosophy.

Even Grothendieck's EGA has a couple of arguments that cannot be carried out in ZFC in the generality he presents them in, but I wouldn't call EGA a philosophy treatise nor an algebraic geometer a philosopher


This is about the possibility of proving certain things about the model, not about properties of any implementation. Properties include for example the relation between the amount and quality of data available for training and the accuracy of the model. If a model is proven to learn too slowly it costs more to train it. And some models might never approximate a target function.


It's the old Gödel trick of translating it into a different form, i.e. what Gödel originally did to prove his own theorem was to translate his theorem into symbols and then negate it.

In their example they are expressing it in set theory and then operating on it...

I probably failed at explaining this on a simple level... sorry


Theorems are strings of symbols and can thus be encoded as numbers, and statements about numbers become statements about theorems. That's why the theorem applies to systems at least as strong as arithmetic.


> All machine learning algorithms can be implemented on a Turing machine, so they all are subject to the halting problem.

The assumption that all learning algorithms can be implemented on a Turing machine cannot hold true, since there are mathematical problems that cannot be (take for example https://en.m.wikipedia.org/wiki/Hilbert%27s_tenth_problem). Unless you categorise machine learning problems as problems that can be implemented with a Turing machine, making the statement a tautology.

Edit: thinking about my comment, maybe I am the one being tautological. If there were a learning algorithm that could solve Hilbert's problem, it would transcend mathematics.


"X can be implemented as a Turing machine" means the same thing as "there exists an algorithm to do X".

Turing machines are just a formalization of the concept of algorithms/computer programs.

So problems that can't be implemented as a Turing machine are not "learning algorithms", or any kind of algorithm.


I think that simplifies things a bit too far.

A Turing Machine represents the most complete computational toolkit we happen to know of--not the most complete computational toolkit possible. If we discovered a new computational trick tomorrow--one that a Turing Machine can't perform (the ability to solve the halting problem for Turing Machines, for instance), then we would have discovered another--higher--class of computers, and it would be a reasonable linguistic move to refer to their programs as "algorithms" as well.


All usual models of computation (such as Turing machines, register machines, the lambda calculus, ...) agree on which functions from the naturals to the naturals they deem computable, and the Church–Turing thesis states that these are exactly those functions which can actually be computed by machines in the real world.

That said, the models of computation bifurcate when studying the question which higher-order functions (functions which input functions as arguments) are computable. Such a function can be easily computable according to one model and not computable at all according to another. I'm writing this because far too often the unqualified statement "all the usual models of computation are equivalent" is made. This is emphatically only true for first-order computation, not for higher-order computation.


It is probably not possible to discover a new computational trick, by the Church-Turing thesis.

The Church-Turing thesis can't be proven, since it is a statement about what is possible in the physical universe, not a statement about mathematics. But the intuitive arguments in favor of it (i.e., that Turing machines capture exactly what we mean by "computation") are, for me, extremely convincing.


The Church-Turing thesis has more to do with what a human can achieve than what is physically possible. I think your position appears in section 2 here: https://plato.stanford.edu/entries/church-turing/

...but whether or not it's Turing's, it's still a valid position to take. It just strikes me as a trope of human history to identify the limits with our limits. Kind of like how we once thought we were the center of the universe.

Yeah, it seems unlikely that we'll ever watch a neural network perform an action that would yield an unprovable result, but given that we can't know either way I think it's more fun to think that it's at least possible.


That depends on what you mean by "Church-Turing thesis":

* http://www.cse.uconn.edu/~dqg/papers/strong-cct.pdf The Interactive Nature of Computing: Refuting the Strong Church-Turing Thesis (Published version here: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.455...)

* https://www.semanticscholar.org/paper/Computability-Beyond-C...

See also a previous hacker news discussion of the Church-Turing thesis here:

https://news.ycombinator.com/item?id=18253994

albeit I take issue with the supposed "proof" by counter example via Compass & Straightedge construction given in the video there, due to results like these two:

* https://link.springer.com/article/10.1007/s10472-018-9603-0 Kellison, Ariel; Bickford, Mark; Constable, Robert L. - Implementing Euclid’s straightedge and compass constructions in type theory [September 2018] (Code here: http://www.nuprl.org/LibrarySnapshots/Published/Version2/Mat...)

* https://geocoq.github.io/GeoCoq/ (Numerous papers by Michael Beeson et al., just check the page, I won't link them all here)

(Note that the authors of the two above works take very different yet similar approaches to the whole matter, it really pays off to compare both to each other.)

However, there seem to exist at least two different spanners in the works:

* https://link.springer.com/article/10.1007/s10472-018-9610-1 Makowsky, Johann A. - Can one design a geometry engine? On the (un)decidability of certain affine Euclidean geometries

* https://link.springer.com/chapter/10.1007/978-3-662-57669-4_... Makowsky, Johann A. - The Undecidability of Orthogonal and Origami Geometries

(If the first of the two above papers makes you wonder "well, who DIDN'T overlook Ziegler's theorem?"... well: https://scholar.google.com/scholar?cites=2279217158168053275)

Oh and in case anyone here still believes Hilbert's old wives tale about /his/ Axiom of Choice, there's plenty of AC to go around in constructive mathematics:

https://vrahli.github.io/articles/bar-induction-lics-long.pd...

But if that doesn't convince you yet, you might go through the 5 stages of accepting it anyway, as outlined by Andrej Bauer here:

* https://mathoverflow.net/questions/25363/au-revoir-law-of-ex...

* Here: https://www.youtube.com/watch?v=21qPOReu4FI

* And here: https://www.ams.org/journals/bull/2017-54-03/S0273-0979-2016...

And here's one last paper to almost conclude what turned into a bit of a larger-than-I-expected Constructivism propaganda piece comment:

http://t-news.cn/Floc2018/FLoC2018-pages/preprint_cX1w.pdf "On Expanding Standard Notions of Constructivity"

And to now conclude this, a semi-joke: Wouldn't the fact that one can't prove the Church-Turing thesis, but could falsify it by counterexample, put propagation of belief in the meme of the Church-Turing thesis into either the complexity class of RE, or co-RE? ;)


Scott Aaronson's paper "NP-complete Problems and Physical Reality" makes some persuasive arguments about what sorts of computations are probably at all plausible to implement in reality.

https://www.scottaaronson.com/papers/npcomplete.pdf


I bought into this article until "the field of machine learning, although seemingly distant from mathematical logic" and then I sold it all.


I'm curious about the applications of this to machine learning in public policy. In papers like Big Data's Disparate Impact (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899) we see misguided and underbudgeted policymakers using very sketchy and very opaque ML algorithms to, for example, decide who stays in prison and who does not.

If we can prove that the "decision problem" of who stays in prison and who does not is undecidable, invariant of the specific implementations of the ML algorithm, this could make a case for stopping such overreaches of ML.


As the authors say in their paper, "[a] closer look reveals that the source of the problem is in defining learnability as the existence of a learning function rather than the existence of a learning algorithm". Algorithmic decision making under uncertainty is NOT the same thing as finding learning functions over infinite domains. Sometimes such a function does not exist. Why should it? Although the authors seem pretty much aware of this fundamental shortcoming in their analysis, they nevertheless go ahead and publish a whole paper about learning functions! This has a name: intellectual fraud.


> Because EMX is a new model in machine learning, we do not yet know its usefulness for developing real-world algorithms. So these results might not turn out to have practical importance.

I don't think that's the correct conclusion. More likely, all learning models have unprovable problems so the choice of model is not important wrt the existence of unprovable problems.

Really, this is not a surprising result. All mathematical systems have unprovable problems. ML is a mathematical system and is no exception.


Can someone explain the meaning of the sentence: ``All distributions in this text are finitely supported over the σ-algebra of all subsets of X'' ? On the one hand, it is impossible to define a probability measure on the σ-algebra of all subsets of [0,1]. On the other hand, if a distribution is finitely supported, then there is no point in using a σ-algebra different than what is generated by its support, right?


The conclusions put the emphasis not on practical uses for machine learning algorithms, but on the way learnability is defined and treated when mathematical models are used to describe it. Some of these mathematical models could lead to undecidable problems if the wrong model is chosen.


The idea that machine learning is a subset of mathematics is very dangerous. In the real world the data is never prepped and clean, and the system is complex and interacts with humans and society at many levels.


I think it's perfectly fair to say machine learning is a subset of mathematics. Everything can be broken down into independent equations, not to mention computers are deterministic.

I do get your point: the outcome of these equations is not always as expected, due to the real-world data not being prepped and cleaned.

HOWEVER, I would argue this has nothing to do with ML at all, and only to do with the input data.

If I have a complex equation: y=x+1

And if I put in x=1 I expect y=2. If I put in x=11 and expect y=2, then the equation "produced unexpected results". This is not the fault of the equation, it's the fault of the input.


Could you elaborate? Are you saying ML is dangerous because it's based on mathematics, and math is too clean to handle real-world data?


I think it's just that calling it "maths" tends to give a false sense of certainty where none is warranted.

Example: All the really clever math you use to make an encryption algorithm is 100% correct. Then all the really clever math you use to show that it would take the heat death of the universe to crack your clever encryption is 100% correct. The user uses 'password' as the key; how does your crypto stand up to a brute force? Is that your algorithm's fault? Did your difficulty proof lie to you?

I know key length is well understood. But in terms of showing how algorithmically "valid" real-world data can otherwise torpedo entire complex systems, it's as good an example as any.


Machine learning is literally mathematics, or more specifically, applied statistics. However, human stupidity can never be ruled out of the equation. Not calling something mathematics while it simply is mathematics is obfuscating the issue.


I think OP means that idealizing ML experiments using contrived data can distort the picture. Real world, ecologically valid results can only be discovered as they emerge when the algorithms are deployed in production. ML algorithms sometimes cook up solutions that can surprise or even disturb their creators.

I'm not sure I exactly agree with this premise. If you read about the principles of chaos engineering, (https://principlesofchaos.org/) it's possible to simulate real world events in testing. And if there's a rigorous mathematical backbone to ML as there clearly is, some determinations about its limitations should be universal for all cases, even if the emergent results in production are unpredictable and could range over intractably many possible outcomes.


I'd elaborate by saying that focusing ML on mathematics risks creating, at great cost, technology that isn't useful. I think that software engineering followed a similar arc: the mathematical appeal and authority of formal methods was very great, and in the absence of wide and deep experience of the reality of system development there was a huge focus on trying to use formal methods as a fundamental part of system development, with very marginal pay-off in terms of current practice. I think that the impact of this can be seen in the way that software development is essentially an artisanal practice now. AI and ML risk the same sort of diversionary excursion, where mathematical fundamentals are the focus of the field and the real world is demoted/externalised. We do not understand the transformations between data and a deployed application; they are semantic, organisational, system dependent and very human. We can't characterise the type of mistakes that classifiers will make, or why those mistakes are, or aren't significant in an application. We can't engineer a machine learning system in the sense that we can't evaluate or certify that it'll work reliably or consistently.

Right now ML and AI are like airships in the 1920s: if and when something goes wrong and lots of people die (or are blinded), the community isn't even in a position to properly investigate what's happened. Before we get to focusing on the equivalent of hydrodynamics we need to move to an organisation and practice of engineering discipline - that's what the aircraft people did, and that's why the windows in jetliners aren't square, and that's why you can fly off on holiday.

If AI and ML don't do this and instead everyone spends their hours and days doing maths that isn't absolutely at the core of the real issues of application then watch as confidence and trust evaporates and be ready to wait 20 years to see any value arise.

But - maths that achieves results like those in compiler design and optimisation, I'll buy that for $1!


"We can't characterise the type of mistakes that classifiers will make, or why those mistakes are, or aren't significant in an application. We can't engineer a machine learning system in the sense that we can't evaluate or certify that it'll work reliably or consistently."

These statements are false. It sounds like you're extrapolating what you've read in a few blog posts and assuming that's how the entire industry operates. You don't read headlines about people digging through the data and error logs on a daily basis because it's not headline-worthy, but that doesn't mean it's not being done.


It is a part of math. Safety and other qualities of it can be analysed logically... or not, as the article points out.


> Because EMX is a new model in machine learning, we do not yet know its usefulness for developing real-world algorithms. So these results might not turn out to have practical importance. But we do now know that we should be careful when introducing new models of learning. Moreover, we might need to look again at the many subtleties that can come up, even in established learning models.

In other words, "Machine Learning" here is a buzzword, without which the paper would probably get less attention.


How do you come to the conclusion that ML is a buzzword here? It's natural to publicize interesting new research that grounds a field -- as statistical learning theory does for machine learning. Historically, Ben-David and co-authors have produced foundational work on the implications of SLT in ML.


The problem is that the model was invented in this paper itself, and doesn't actually seem to relate to any actual ML algorithm. Admittedly, I didn't read the paper itself, so this may be an overreach on the article's part.


I don't think that can be true. If it were, it would go like this:

Hey guys, here's an algorithm. Oh, by the way, we can not prove that this algorithm does something useful, and neither can you.

Gets published in Nature.

Talk about bang for the buck


For me, the interesting conclusion is that there are things that humans are mathematically incapable of knowing.


>For me, the interesting conclusion is that there are things that humans are mathematically incapable of knowing.

Define

>mathematically

>incapable

>knowing

These hastily drawn pop-sciency conclusions are part of the problem why ML/AI has garnered so much media attention these days ...


I probably should have used the word "implication" instead of "conclusion". My thought process was that there are certain input functions that the learning structures within the human neural network can't discern.



