
> That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness

As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back. Gödel proved there are an infinite number of true arithmetic statements unprovable within any (consistent, sufficiently powerful) formal system. But our "gold standard" formal system, ZFC, has about as many axioms as we have fingers — why is finding more axioms not the absolute highest priority of the field?

We struggle to prove facts about Turing machines with only six states, and it's not obvious to me that ZFC is even capable of resolving all questions about the behavior of six-state Turing machines (well, specifically just ZF, since the axiom of choice has no bearing on these questions).

Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part). If we can't predict the behavior of the majority of tiny, deterministic systems with ZFC, what does that say about our ability to understand and predict real-world data, particularly considering that this data likely has an underlying algorithmic structure vastly more complex than that of a six-state Turing machine?
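
To make the "you can actually build and run these" point concrete, here's a minimal sketch in Python of a Turing machine stepper whose tape grows on demand. The transition table is the tiny 2-state, 2-symbol busy beaver (just to keep the example short), not an actual 6-state machine, and the step budget stands in for "observe its behavior over time":

    # Minimal Turing machine stepper; the tape grows on demand, the step budget
    # stands in for finite observation time.
    from collections import defaultdict

    def run_tm(delta, start='A', halt='H', max_steps=10_000):
        """delta maps (state, symbol) -> (write, move, next_state); move is -1 or +1."""
        tape = defaultdict(int)              # blank symbol is 0
        head, state = 0, start
        for step in range(max_steps):
            if state == halt:
                return step, tape            # halted within the budget
            write, move, state = delta[(state, tape[head])]
            tape[head] = write
            head += move
        return None, tape                    # no verdict: ran out of observation time

    # The 2-state, 2-symbol busy beaver: halts after 6 steps leaving four 1s.
    bb2 = {
        ('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
        ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'H'),
    }
    steps, tape = run_tm(bb2)
    print(steps, sum(tape.values()))         # 6 4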

More formally, my complaint with the culture of mathematics is:

1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)

2) K(ZFC) is likely no more than a few bytes. I think the best current upper bound is the description length of a Turing machine with a few hundred states, but I suspect the true value of K(ZFC) is far lower than that

3) Thus K(data) - K(data | ZFC) ≤ "a few bytes" (a sketch of this step follows below)
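
(To spell out step 3, up to logarithmic terms: by symmetry of information, K(data, ZFC) = K(ZFC) + K(data | ZFC) + O(log), so I(data : ZFC) = K(data) + K(ZFC) - K(data, ZFC) = K(data) - K(data | ZFC) + O(log), and combining that with 1) gives K(data) - K(data | ZFC) ≤ K(ZFC) + O(log).)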

Consider the massive amounts of data that we collect to train LLMs. The totality of modern mathematics can provide no more than a few bytes of insight into the "essence" of this data (i.e., the maximally compressed version of the data). Which directly translates to limited predictability of the data via Solomonoff induction. And that's in principle — this doesn't even consider the amount of time involved. If we want to do better, we need more axioms, full stop.

One might counter, "well sure, but mathematicians don't necessarily care about real-world problems". OK, then apply the same argument to the set of all arithmetic truths, or to the set of unprovable statements in the language of a formal system (that are true within some model). That's some interesting data. Surely ZFC can discover most "deep" mathematical truths? Not very likely. The deeper truths tend to occur at higher levels of the arithmetic hierarchy, and the higher in the hierarchy, the more interesting the statement. These are tiny statements, too: ∀x ∃y ∀z [...]. We're already in trouble, because ZFC can only decide a small fraction of the Π_2 statements that can fit on a napkin, and that fraction drops off very quickly at higher levels. Again, we need more axioms.



> Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part)

The infinite tape part isn't some minor detail, it's the source of all the difficulty. A "finite-tape Turing machine" is just a DFA.


> is just a DFA

Oh is that all? If resource bounded Kolmogorov complexity is that simple, we should have solved P vs NP by now!

I debated adding a bunch of disclaimers to that parenthetical about when the infinite tape starts to matter, but thought, nah, surely that won't become the point of contention in the larger discussion here haha


It’s an LBA, a Linear Bounded Automaton.


No, an LBA in general doesn't have a finite tape. It still has an infinite tape to accommodate arbitrary-length inputs; it's just that the tape cannot grow beyond the length of its input (or a constant multiple of it, which is equivalent by the usual linear tape-compression trick).


Arbitrarily long and infinite are very different things.


No, they're not.

The point is that a finite tape isn't enough for an LBA.


No, they really are. Arbitrarily long means "pick a number, any number." Infinite means, well, infinite. And infinity is always bigger than "pick a number, any number."

I suggest that you read Michael Sipser's Introduction to the Theory of Computation. It's absolutely lovely and an easy read, considering the subject matter. It will help you to understand cardinality (and many other things) in a pragmatic computing science context.


It may be surprising to you that I've actually read a good portion of that text as well as other books on the subject. Or that I actually do understand what a cardinality is in a "pragmatic computing science context" (whatever that's supposed to mean).

> Infinite means, well infinite

There's a technical definition of "infinite cardinality" which is a bit more rigorous than "well infinite". If |S| > n for every natural number n, then S has infinite cardinality basically by definition.

Why don't you try implementing an LBA simulator in your favourite programming language with a finite-sized array and tell me how it goes? Anyway, the original point was that a Turing machine with a finite tape is an FSM. This is true (or, more precisely, it's equivalent to an FSM) because you can encode all possible configurations of the tape with a finite set of states and thus don't need any extra memory.
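
To spell out the counting argument behind that last sentence, here is a small sketch (the state set, alphabet, and tape length are made up for illustration): for a fixed tape of length n over alphabet Γ with control states Q, there are only |Q| · n · |Γ|^n possible configurations, and that finite set of configurations is exactly the state set of an equivalent FSM.

    # Enumerate every configuration (control state, head position, tape contents)
    # of a machine with a fixed, finite tape. The count is |Q| * n * |alphabet|^n,
    # which is finite -- so the machine is equivalent to a finite automaton whose
    # states are these configurations.
    from itertools import product

    def configurations(states, alphabet, tape_len):
        return product(states, range(tape_len), product(alphabet, repeat=tape_len))

    states, alphabet, tape_len = ('A', 'B', 'H'), (0, 1), 4
    print(sum(1 for _ in configurations(states, alphabet, tape_len)))   # 3 * 4 * 2**4 = 192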


> > That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness

> As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back.

Holding the field back from what? If the goal of the practitioners of the field is to seek mathematical beauty, then well, that is what they will focus on.

Besides that, I don't really follow your argument about Gödel & information theory & the claim that adding more axioms is the key to moving math forwards. In the vast majority of cases, the difficulty in finding a proof of a statement is not that the statement isn't provable under a given formal system; it's that we simply can't find the proof. But maybe I misunderstand you?


> Holding the field back from what? If the goal of the practitioners of the field is to seek mathematical beauty, then well, that is what they will focus on.

Holding the field back from answering questions about the behavior of simple deterministic systems that arguably have an outsized influence on science, machine learning, and mathematics itself. I don't disagree that "beauty" is certainly subjective, but the mathematics research that gets funded by universities and research labs is only tangentially related to such an aesthetic notion.

> In the vast majority of cases, the difficulty in finding a proof of a statement is not that the statement isn't provable under a given formal system, it's that we simply can't find it

Finding a proof is an inherently algorithmic process. Presumably, the human brain uses heuristics (intuition) that have thus far managed to beat simple algorithmic approaches, but it's hard to believe that's going to last much longer. As an extreme example, simply knowing more values of the Busy Beaver function allows one to find proofs more easily. Take Fermat's Last Theorem: encode the search for a counterexample into a program (a k-state Turing machine) that enumerates all possible values of x, y, z, and n > 2, dovetailing so that every tuple is eventually checked. This program halts if it finds x, y, z, n > 2 with x^n + y^n = z^n and never halts otherwise. If we know which k-state Turing machine achieves BB(k), then we just run the two programs in parallel, and if the Fermat program hasn't halted by the time the BB(k) machine has, then the conjecture is true (and necessarily false otherwise). Scott Aaronson has written some articles and blog posts about this.
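
A rough sketch of that construction (this Python obviously isn't a k-state Turing machine, and you'd never run it to completion; it's just to show the shape of the "conjecture as halting problem" reduction):

    from itertools import count

    def fermat_counterexample_search():
        """Halts (returns a counterexample) iff some x, y, z, n > 2 satisfy
        x^n + y^n = z^n, i.e. iff Fermat's Last Theorem is false; otherwise runs forever."""
        for s in count(6):                      # enumerate tuples by total size s = x + y + z + n
            for n in range(3, s - 2):
                for x in range(1, s - n - 1):
                    for y in range(1, s - n - x):
                        z = s - n - x - y       # so every (x, y, z, n) is checked exactly once
                        if x**n + y**n == z**n:
                            return (x, y, z, n)
        # unreachable if FLT is true: the search never halts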

Now the argument I'm making doesn't necessarily depend on trying to determine more values of the uncomputable Busy Beaver function. You don't have to pin down a function that outgrows every computable one in order to derive some benefit from greater knowledge of the halting sequence. If we presume that an interesting proof exists within ZFC, then more axioms (constituting a more powerful system) can lead to much shorter proofs; this is a result obtained from the work of Gödel, Feferman, Jean-Yves Girard, et al. So while you're right that a conjecture we are interested in might have a proof that "exists" within ZFC, whether the minimal such proof can be written down in a human lifespan is a different question. Considering there are many famous conjectures that have been open for centuries now, perhaps the next step isn't having more people stare harder at the problem but rather enriching our foundational formal systems with additional axioms.


I can’t tell if this is crazy or brilliant. Math has been working diligently for a long time to reduce the axioms. Most of the obvious Gödel sentences are contrived self-referential things, like a statement asserting that no number encodes a proof of it. The whole project is to derive all of the structure of mathematics, with its high information complexity, from basic axioms but also from complex definitions. I think the definitions (natural numbers as sets, integers as equivalence classes of pairs of natural numbers, etc.) pump up the information complexity from the axioms. Like the initial state of Life allowing arbitrary computation from the simple Life rules.

The idea that there might be more axioms that would let one deduce more about computational complexity classes or the like seems pretty unlikely.

The number of provable statements and unprovable statements is countably infinite and we aren’t lacking the ability to prove things due to obviously true missing axioms.


> 1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)

What are I and K? What does "data : ZFC" mean?


K(x) is the Kolmogorov complexity of x, i.e., the length of the shortest program that halts and outputs x, typically defined with respect to a universal Turing machine.

I(x : y) = K(x) + K(y) - K(x, y) is the mutual algorithmic information of two strings. It is the algorithmic equivalent to Shannon's mutual information. No algorithm f can increase the mutual algorithmic information between two strings, i.e. ∀f I(x : f(y)) ≤ I(x : y) + O(log(K(f))).
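
If it helps to see this operationally: K is uncomputable, but you can get a crude feel for I(x : y) by swapping in a real compressor (this is only a heuristic in the spirit of compression-based similarity measures, not the real quantity):

    # Crude proxy: approximate K with zlib-compressed length, and K(x, y) by
    # compressing the concatenation. Related strings share a lot of "information";
    # independent random strings share almost none.
    import os, zlib

    def C(b: bytes) -> int:
        return len(zlib.compress(b, 9))

    def approx_mutual_info(x: bytes, y: bytes) -> int:
        return C(x) + C(y) - C(x + y)

    a, b = os.urandom(10_000), os.urandom(10_000)
    print(approx_mutual_info(a, a))   # large: roughly C(a), a string "knows" itself
    print(approx_mutual_info(a, b))   # near zero: independent random strings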

I used "data" to denote any collection of interesting data at all. For example, it could be data obtained from the real world, the observed behavior of Turing machines at very long timescales, or a collection of mathematical proofs. By "ZFC" I just mean the description of any program that enumerates all proofs of ZFC.

The takeaway here is that ZFC is very limited in what it can say about real world data or even what it can say about almost all deterministic computations, including those with very short programs. Levin describes this much better than I could in his famous "Forbidden Information" paper: https://arxiv.org/pdf/cs/0203029.

Levin further argues that extending PA is unlikely... I tend to disagree on that point. How did we get the more powerful theory of ZFC, or any of these axioms, in the first place then? They had to come from somewhere. But I'm not even going to attempt to debate Levin's case on Hacker News haha


There are plenty of mathematicians - mostly set theorists - who are actively working on finding new axioms of mathematics to resolve questions which can't be resolved by ZFC. Projective Determinacy is probably the most important example of a new axiom of mathematics which goes far beyond what can be proved in ZFC, but which has become widely accepted by the experts. (See [1] for some discussion about the arguments in favor of projective determinacy, and [2] for a taste of Steel's position on the subject.)

I suggest reading Kanamori's book [3] if you want to learn more about this direction. (There are plenty of recent developments in the field which occurred after the publication of that book - for an example of cutting-edge research into new axioms, see the paper [4] mentioned in one of the answers to [5].)

If you are only interested in arithmetic consequences of the new axioms, and if you feel that consistency statements are not too interesting (even though they can be directly interpreted as statements about whether or not certain Turing machines halt), you should check out some of the research into Laver tables [6], [7], [8], [9]. Harvey Friedman has also put a lot of work into finding concrete connections between advanced set-theoretic axioms and more concrete arithmetic statements, for instance see [10].

[1] https://mathoverflow.net/questions/479079/why-believe-in-the...

[2] https://cs.nyu.edu/pipermail/fom/2000-January/003643.html

[3] "The Higher Infinite: Large Cardinals in Set Theory from their Beginnings" by Akihiro Kanamori

[4] "Large cardinals, structural reflection, and the HOD Conjecture" by Juan P. Aguilera, Joan Bagaria, Philipp Lücke, https://arxiv.org/abs/2411.11568

[5] https://mathoverflow.net/questions/449825/what-is-the-eviden...

[6] https://en.wikipedia.org/wiki/Laver_table

[7] "On the algebra of elementary embeddings of a rank into itself" by Richard Laver, https://doi.org/10.1006%2Faima.1995.1014

[8] "Critical points in an algebra of elementary embeddings" by Randall Dougherty, https://arxiv.org/abs/math/9205202

[9] "Braids and Self-Distributivity" by Patrick Dehornoy

[10] "Issues in the Foundations of Mathematics" by Harvey M. Friedman, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3137697



