Hacker News new | more | comments | ask | show | jobs | submit login
Machine learning leads mathematicians to unsolvable problem (nature.com)
197 points by magoghm 20 days ago | hide | past | web | favorite | 92 comments

This work was published in “Nature Machine Intelligence“, the journal boycotted by pretty much everyone who matters in ML.

Reflects poorly on the authors, regardless of the actual merits of their finding.

Had to lookup what the boycott was about - free access and fees: http://www.sciencemag.org/news/2018/05/why-are-ai-researcher...

As far as I can tell, the article isn't really of any interest either.

Formulating a proposition that is independent of standard axioms is simple. Formulating it in the language of machine learn is an exercise. The main thing is the authors didn't provide any motivation for this to matter to the overall enterprise of machine learning, because there isn't motivation for this. It's just a novelty.

I don't understand your reasoning. Because, you could also "formulate" CH in the "language" of Turing machines. But that is clearly disanalogous to what the article is saying.

As someone who isn't in the know, I found this[1] Forbes article about the journal. Forbes portrays the opposition as opposing the fact that it's a closed access journal, but are there other points of contention that weren't reported in the article?

[1] https://www.forbes.com/sites/samshead/2018/04/30/tech-giant-...

That's enough. ML research is posted on arxiv and presented in open conferences. You can paywall your slide deck but not your paper. If pay-to-play journals get even an inch, they'll try to take everything.

What does “open conferences” mean?

Last time I checked, fees for participating in ML related conferences are very high (compared to registration fees for conferences in academic fields with less hype). This makes them less open because cost is a financial barrier to participation.

The proceedings are freely available online and most sessions are freely casted. Attending the conference in person has surely some cost!

Also, if you live outside USA/Europe, let's go to a important conference may be a few thousand of dollars more expensive that sending an email to the editor with the draft of the paper.

[Hi from Argentina!]

what's the reason it's boycotted? appeal to authority isn't very informative for those not keeping up with this stuff

Subscription fee requirements for access, mostly.

and fees for publishing.

i.e. Why should people have to pay to view the results of what is often publicly funded research? Even more so why should researchers have to pay to publish work that the publishers profit off of but the researchers don't.

Because being a peer reviewer shouldn’t be done for free? Do you like being paid to work? Because Nature has established themselves as a premier journal over the the past 150 years and are known for their quality?

Or maybe I’m just taking crazy pills..

There's a strange thing, there seems to be a universal rule of Open Access discussions:

There will always be someone who defends publishers based on not knowing anything about how scientific publishing works.

My favourite article about open access, about the Journal of Machine Learning Research, and really the ultimate smackdown, all done very politely: https://blogs.harvard.edu/pamphlet/2012/03/06/an-efficient-j...

Peer reviewers are not ever paid, making this whole arrangement somewhat bizarre and unsavory to reviewers and authors alike.

Peer review is not paid work. Mostly it is done as an overhead that is understood as part of being an active researcher.

I would even venture to say that most researchers will be against the idea of paid reviews, just like Amazon found out that paid product reviews are a very bad idea.

That's the problem: publishers don't pay reviewers but charge for their work.

>Or maybe I’m just taking crazy pills..

You are. Where did you ever get the idea that nature pays for reviews?

Haha, reviewers getting paid. That's a good one. This journal charges the authors for Pete's sake.

Have any alternative journals in that area popped up that are both open-access and without submission fees?

The premiere journal in Machine Learning, the Journal of Machine Learning Research (JMLR), has existed in this format since 2001. It is explicitly mentioned in the petition for the boycott: https://openaccess.engineering.oregonstate.edu/

Which would also explain why Nature wrote this publicity piece on it. I can only hope that ML researchers show the rest of academia the way out of publishers' hands.

One doesn't need to go back to Gödel for this to make it a "huh" moment. Instead, go back to 2006 to make it a "duh" moment.

Aggregability is NP-Hard: https://www.google.com/url?sa=t&source=web&rct=j&url=http://...

That is, even for linear systems, determining whether or not macrovariables (e.g. complete eigenvector sets, complete embeddings) exist for a given space is an NP-Hard problem. With linear systems serving as, effectively, a lower bound for ML problems (because, otherwise, why are you even ML'ing the thing?), whether or not a problem is "learnable" is, unsurprisingly, probably NP-Hard. Aggregability and learnability look staggeringly similar to me.

Demonstrating otherwise would be a huge result. Writing a paper confirming Kreinovich and Shpak in a slightly different domain is basically a "Water is Wet" paper.

Having not yet read the paper, I'm unaware of any new ground here.

NP-hard and undecidability are completely different things. They’re barely on the same planet.

I would argue that for practitioners it is the same planet. But I agree that they are not the same thing.

That strongly depends on the practitioner and the problem. We solve (or approximate) NP hard problems all the time. SAT solvers are really good, so are ILP solvers and a host of other tools that solve a ton of instances of NP-complete problems that arise in practice. If you're into software verification, people solve undecidable problems there every day.

Complexity results are always worst-case results. In practice you can often solve all instances you actually care about.

disagree, some small size NP-hard problems can be solved in practise.

or, given enough effort / luck / redefining the problem, sometimes extra structure can be found for the particular problem instances that you want to solve, meaning that they actually belong to an easier problem class that can be solved in practise -- e.g. you've got extra information or constraints that you're not actually using, so you don't need to solve the NP-hard problem in general, just solve the specific problem you've got.

i'll willing to change my position if someone can argue it is similar for undecidable problems.

Well: the halting problem is undecidable. That does not prevent us to try and prove (many many times) that an algorithm does or does not stop.

I guess the parent meant something like this.

Of course the concepts are totally dissimilar but the practical consequences are less apart.

Read the paper. I have (now).

They’re discussing “learnability” as related to VC dimensionality and compressibility.

It’s absolutely related to aggregability.

I mean, haven't the authors shown that the problem is "worse" than NP-hard? They have shown it to be undecidable?

Aren't some undecidable problems "better" with regards to practical needs because we can achieve arbitrarily precise results more cheaply?

How is worse than NP-hard named in the field? NP-impossible?

Disclaimer: I'm really noob, not asking it sarcastically.

NP-hard problems can be solved.

The Continuum Hypothesis [0] (which the authors are saying the learning problem is isomorphic [1] to) is not provable from the axioms of set theory. You can add "The Continuum Hypothesis is true" OR "The Continuum Hypothesis is false" to the axioms, and still have consistent mathematics.

[0] Continuum hypothesis is that the set of Integers (0, -1, 1, -2, 2, ...) is infinite but smaller than the set of Reals (Integers + Rationals + Irrationals), and there are no infinite sets with size smaller than the Reals, but larger than the Integers.

[1] not sure of correct term here - the point is that they have shown the problems are the same.

Actually, some undecidable problems are NP hard, like the halting problem. That is, a halting oracle gives a polynomial time algorithm for any NP problem.

I'm not sure what you mean with "can be solved", but I would spell it out a bit different:

There are NP-hard problems that are undecidable, that means, there is no algorithm that can decide the question for every input. However, in some instances we are able to solve these problems (even quite easy). For example we know that an algorithm like "while TRUE DO (nothing) END" will never terminate, even though the halting problem is undecidable.

However, if a NP-hard problem is also in NP, than it can be solvend. But it will take exponential time in the worst case. That, too, does not mean that in some instance we are able to solve them in reasonable time.

This is wrong. NP-hard problems can be solved, they just appear to hard to solve efficiently. NP-hard problems are always decidable, like everything in polynomial hierarchy (P, NP, co-NP, and higher classes that have oracle access to lower classes).

Undecidable problems e.g. the Halting problem cannot generally be solved using an algorithm (so it has to be solved on a case-by-case basis and requires "creativity").

Sorry, but you are flat out wrong. For example, the halting problem is NP-hard and undecidable [1]

I think you might confuse NP-hard with NP-complete. There are problems that are NP-hard, not in NP and unsolvable. If a problem is NP-hard _and_ in NP, then they can always be solved.

[1] https://en.wikipedia.org/wiki/NP-hardness

There are an infinite number of complexity classes that are (probably) harder than NP. Popular ones include PSPACE and EXPTIME. You might want to google for "polynomial hierachy".

Undecidable means there is no logical way to have an answer. Turing’s halting problem is one such.

There's got to be more to it than that. The halting problem itself isn't some on-off switch, it can be studied and broken down into different problem classes, some of which are indeed decidable.

It's just impossible to write a general-purpose algorithm to solve it in all cases.

“It's just impossible to write a general-purpose algorithm to solve it in all cases.” That is the definition of an undecidable problem. Independence, as in Euclid’s fifth postulate, is not what we typically think of as undecidability, though can be referred to as such.

That is incorrect. The halting problem can be answered -- every Turing Machine either halts or does not halt on a given input. Undecidable only means that we cannot have a single algorithm that outputs the correct answer in every case.

This is the first paragraph from the Wikipedia article on undecidable problems: “In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an example: there is no algorithm that correctly determines whether arbitrary programs eventually halt when run.” https://en.m.wikipedia.org/wiki/Undecidable_problem

PSPACE-complete is an example. EXPSPACE is another.

An NP-hard problem is still possible to be solved. Impractical but possible.

An undecideable solution cannot be proven true.

Yes but there are very fee practical instances of undecidable problems (by instance I mean a specific concrete here-are-the-numbers problem, not a general statement about “equations”).

Are there any at all? If you have exact numbers you can decide in finite time. Always, if I'm not mistaken.

So, unless there is an infinity encoded in the exact problem somehow, you can always solve it by simple brute force enumeration.

Look at the busy beaver, it's completely possible theoretically to simply enumerate all the possible states of a tape for a given N, and play the corresponding Turing machines. Even if calculating this is physically impossible, and proven to be impossible to generalize for any N. (That is the BB functions are uncomputable, but of course not exact values.)

Gödel’s sentence comes to mind. Yes, it is not “practical” but it is undecidable.

Godel's sentence is not an exact problem. It's a big number representing a claim (a theorem) about Godel numbers and the sentence itself.

There's nothing to calculate there. It's a proof by construction for Godel's incompleteness theorem.

A non-computable problem might still be practically "better" because its solution might be approximated precisely and more cheaply, such as the approximation of a real number.

If you didn't read the paper, at least read the title: "unsolvable" does not mean NP hard

I did read the paper (now). Their question is of “learnability”, which relates to aggregability/compressibility and dimensionality.

This stuff is all in the same ballpark.

Having read their paper, I do appreciate the formulation and approach. I just don’t find the result “surprising” in the slightest.

The theorem is proved for only for probability distributions with support at a finite number of points on [0,1], and the sigma-algebra of all subsets of [0,1]. This is an utterly bizarre choice and makes the whole thing basically uninteresting.

One of the first things you see in a graduate course on real analysis that covers measure theory is the construction of an unmeasurable set using the axiom of choice (equivalent to the continuum hypothesis). Then you say, "that's useless," and restrict the subsets you work with to those that don't depend on this problem, or you go work with intuitionist, finitist, ultrafinitist, or some other logic, which gives basically the same effect. Then you build all of probability theory and machine learning in this actually useful world.

The unmeasurable set construction that everyone uses is to consider all subsets of [0,1] of the form {x + r : x in [0,1], r is rational}. Since there are uncountably many irrational numbers, you end up with an uncountable number of these subsets, and then use the axiom of choice to select an element of each.

For their probability distributions with finite support, consider the sets of points that form the support of the distributions. I can form a sequence of such sets that heads towards that unmeasurable set, and a corresponding sequence of probability distributions. They completely ignore whether their set of distributions is complete (that is, contains all limits of sequences—hint, it isn't), but they're taking suprema, so they effectively are working in the closure anyway, where the limits are included.

So basically they have found an obscure way of showing the first result taught in measure theory and dressed it up in the clothes of machine learning, even though no machine learning work ever conducted or that will ever be conducted with work on this mathematical structure because of this exact reason.

This is not adding to the conversation. Your comment was my impediment in moving past the second page of the paper. Problem is, I never got to study foundations of measure theory, and presumed I could be wrong. Do you have any self-study recommendations, paper or concise books?

I really don't have a good recommendation. :( I learned it from a person, not a book, and the textbook he was assigning exercises from (Folland) is essentially useless for self study to my mind.

Earlier discussion: https://news.ycombinator.com/item?id=18858724 (Same research, but different news coverage)

The actual research, which I find more understandable than the news coverage: https://www.nature.com/articles/s42256-018-0002-3

Isn't all this press by the Nature publishing group just about hyping up one of its newer magazines that the larger research community is boycotting?

By the way, Paul Cohen, who proved the independence of the Continuum Hypothesis, says that Godel did not express much interest in independence results (I think Godel believed in Platonism):


An article by Godel, taking a philosophical viewpoint about set theory and independence, is "The modern development of the foundations of mathematics in the light of philosophy":


Godel believed that for well-defined areas of mathematics, we can add more _intuitive_ axioms to make the theory decidable: The appropriate quote from the article: "I would like to point out that this intuitive grasping of ever newer axioms that are logically independent from the earlier ones, which is necessary for the solvability of all problems even within a very limited domain, agrees in principle with the Kantian conception of mathematics. "

> By the way, Paul Cohen, who proved the independence of the Continuum Hypothesis, says that Godel did not express much interest in independence results (I think Godel believed in Platonism):

Gödel proved the other direction of the independence of CH (and the Axiom of Choice), and had been trying to prove independence some years after that. He had actually succeeded in proving the independence of AC by 1943, and later said in an interview with Hao Wang that his methods could have probably been extended to prove the independence of CH by 1950, but he stopped working on it because he developed a distaste for the subject. In retrospect, Gödel wished that he had continued his work so that set theory progressed faster.

The article The Origins of Forcing by G. H. Moore in the proceedings of the 1986 Logic Colloquium is a good discussion of the relevant history.

> Godel believed that for well-defined areas of mathematics, we can add more _intuitive_ axioms to make the theory decidable: The appropriate quote from the article: "I would like to point out that this intuitive grasping of ever newer axioms that are logically independent from the earlier ones, which is necessary for the solvability of all problems even within a very limited domain, agrees in principle with the Kantian conception of mathematics. "

This line of research hasn't gone very well in practice since Gödel's time. It has produced a lot of interesting set theory, but has not really impacted mathematics at large.

Thank you for the reference on forcing. Will read it carefully.

> This line of research hasn't gone very well in practice since Gödel's time. It has produced a lot of interesting set theory, but has not really impacted mathematics at large.

What about Hugh Woodin's work? I agree that it has not had impact yet, but do you think it might lead along this suggested line?

"In the latest paper, Yehudayoff and his collaborators define learnability as the ability to make predictions about a large data set by sampling a small number of data points. The link with Cantor’s problem is that there are infinitely many ways of choosing the smaller set, but the size of that infinity is unknown."

Reading this got me wondering about more philosophical aspects, such as the relation of this conclusion to to the process of human learning and prediction (which obviously works) and whether it's somehow tied in some way to the problem of induction. Could be nonsense, but I'll be putting some more thought into the meaning of this

The problem with these papers is, that if you want to do practical ML, you can simply fix some upper bounds on various numbers that arise during the analysis of the problem, and thus make it mathematically finite, and trivial. Not easy, of course, but this frees you from the infinite regress that's the problem of induction.

Sure, you can the start to ask what are the good upper bounds. How many epochs to use for training, start to quantify the training set quality, and then ask how good that must be to get a good enough model. Etc. But all of these then can be limited by estimation.

Don't read too in this result. This is about perfect classifiers, if it is enough to train a perfect classifier using only a finite amount of samples of the set.

People learn, but sometimes learn with mistakes. For example:

* Is this mushroom edible?

It's much easier to make a good enough classifier than a perfect classifier, if you allow a broad definition of good enough.

Not completely on the subject, but I had to read the first sentence 5 times, before I was able to understand its meaning. Then I realized it was too long, so modified it to following version. Is it better or worse?

"A team of researchers has stumbled on a question that is mathematically unanswerable. It is linked to logical paradoxes, that were discovered by Austrian mathematician Kurt Gödel in the 1930s and it can’t be solved using standard mathematics."

To further improve your last sentence, drop the comma after "paradoxes" and add one after "1930s".

Yeah, it’s a bit better. Not sure how much, maybe more for non native speakers?

Thanks! English is my 3rd language, so I do struggle with reading sometimes.

If I understood the article correctly (and I think I didn't) it says learnability of a given data-set is like the halting problem, there is no general algorithm for deciding whether an arbitrary data-set is learnable. Am I close?

This was my takeaway from the article as well. Even though I have no academic knowledge on the subject, can't say I was surprised to learn this. To me it seems obvious in hindsight, though might be Dunning-Kruger.

Can someone explain why/if this is a big deal? It seems to me that they just rehashed undecidability and logical incompleteness.

I don't understand how a result concerning infinities can apply to a finite number of data points represented by finite (bounded - probably by 2^32) integers processed in finite time.

i think you're alluding to an interesting question: in what cases does this "undecidability of learnability" result apply?

the paper seems to talk about binary classifiers `h : X --> {0, 1}`, where each classifier h belongs to some family `F` of functions subseteq `{h : X --> {0, 1}}`, and where observed samples are drawn from some unknown distribution P over X.

so, it probably depends upon the choice of `X` and `F`, but i dont have enough of a handle on the theory to understand this.

The reals are infinite. Float/double

So, while I think your question is interesting and good, I don’t think integer size is a strong counterexample in support of a different conclusion.

Float/doubles are finite. They only approximately model the real. And that's key. Because the fact that approximation works for a problem means it must be reasonably smooth, which means everything will be more well behaved and rules out the worst pathologies.

float/double can store a subset of the rationals plus some other values which aren’t found in the reals

And those other values are finite (NaNs, infinities, unequal zeros). I don't know why you're downvoted. What you said is absolutely correct.

It's an absurd thing for GP to say integers aren't infinite (apparently talking about fixed width integer types) and then turn around and say float/double are infinite (talking about single precision and double precision floating point types).

Can you elaborate on this? Specifically, why isn’t integer size a good counterexample?

Question for someone with a more theoretical background: The paper shows that the EMX learnability of some class of functions with respect to some set of probability distributions is undecidable. Does EMX learnability encompass all notions of learnability (or is it equivalent to other notions)? Conversely are there or could there be notions of learnability different from EMX that are not undecidable? Maybe I missed this in the paper but clarification would be appreciated.

I mean, c'mon we all saw this coming...

Where is mathematics in ML today? Most of that are happening under the hood and all algorithms are just a black box.

It's all around. The math portion just doesn't get as much hype.

For example, Goodfellow and Bengio's book on Deep Learning talks a lot about connections to Bayesian Inference, Classical Statistics, Information Theory, and a bit about Topology.

Christopher Oolah's blog gives plenty of great explanations of mathematical topics.

Personally, I use Math in my work all the time. Thinking in terms of Information Theory has let me quantify and compare algorithms that seemed difficult to evaluate at first. And several times, I've seen a business make the wrong decision due to lack of Math/Stats background.

I don't mean to be a Math snob – you can absolutely do a lot of valuable work treating algorithms as black boxes. But the Math is there, and it is being used.

There's a very neat and often forgotten piece of math in learning, VC dimension (unrelated to venture capital):


Also, the reason why black box methods are such a big deal now is precisely because controlled/engineered methods turned out to be inferior (obvious example: image recognition; look no further than into the story of the dropout method and Alex Krizhevsky).

Edit: s/basically forgotten/often forgotten/ in the first sentence.

It's not very forgotten if it's in the Learning from Data course, to say the least

There are some good combinatorics notes by Chernikov[0, pdf file] talking about VC theory if anyone is interested in learning more

[0] http://www.math.ucla.edu/~chernikov/teaching/Combinatorics28...

The ML field today is all about results. Get the high scores and figure out the math later. Not that there is anything wrong with this, we still at the stage where we're banging rocks together wondering what works and what doesn't, and the theories will come later.

So what is the meaning of "ML experts" if all they do is trial and loss experiments! Is Math PhD just used for hiring signal rather than actual requirements to do ML projects?

It’s an exciting time because the field is getting to the point where ‘complexity is unbounded’

I.e. in many materials or chemistry fields where you start operating with over 100 variables, you begin to develop a dark arts of understanding because what is being attempted is beyond the ability of computers to model.

You can do chemistry without a PhD, but your ability to systematically try to address the complexity may be hindered without the training. Likewise with ‘ML experts’ (I hope they get a cool word some day to describe their profession).

the example you gave is unfortunately a counterexample!

If we have a function f(x1, x2, ...) of 100 variables and you wonder about the gradient, theres multiple ways of calculating it.

Theres symbolic differentiation, but due to the chain rule, the number of terms grows rapidly and the expression can not be stored.

Then theres the finite difference method, whereby you calculate for each of 100 variables x_i:

f(x1, x2, ..., (x_i +epsilon), ..., x100) - f(x1, ..., x100)

the term on the right is the same constant so in total you need 100+1 forward function evaluations. And theres the issue of precision for small differences (mantissa).

One of the main reasons machine learning took off is because of the mathematical realization (Automatic/Algorithmic Differentiation) on how a 1 forward and 1 backward pass is more mathematically rigorous (calculates gradient vs finite differences) and much more efficient.

With the blackboxing of the algorithms, many endusers of the ML libraries end up using ML when they don't know the functional form of a map, but will refuse to apply automatic differentiation of a known complex function with large number of parameters. In contrast those endusers that made sure to understand Automatic Differentiation as a tool orthogonal to arbitrary function approximation (i.e. everyone who realizes the math part of ML is very important) will be able to apply AD (or any other tricks learnt through a mathematical perspective) in situatins where there is no need for arbitrary function approximation...

EDIT: woops I thought you were arguing for blackboxing, against mathematical interpretation upvoted

May be Ml is an empirical subject more than a theoretical subject. It is more biology than physics. More astronomy ... even is the subject is created does not meant it follows rules.

After all if intelligence comes out artificially, I hope it does not have rule.

Great point ML takes much of its inspiration from simulating brains. Therefore studying such artificial brains is much like studying biology and thus to a large degree empirical I would assume.

Yes, there is no need to have a PhD to do ML, either applied or research. Not to say there isn't good research using advanced mathematical/statistical methods, but most of it is not. It's entirely a signaling game.

Professor Peter O'Hearn (UCL Computer Science) says latest research findings on 'unsolvable' mathematical problems is “of a rare kind”, and will probably be important for the theory of machine learning.

Applications are open for YC Summer 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact