Ask HN: Question for the Math and Engineering Community
3 points by Patternician 1 day ago | 2 comments
As LLMs become stronger, we’re seeing a new situation:

A human discovers a mathematical method or result, but the formal proof is generated (and even cross-verified) by multiple LLMs, even though the original author can't fully reproduce the proof themselves.

Should such AI-generated proofs be considered valid and publishable?

What standards should apply when the idea is human-created, but the proof is AI-derived?

Curious to hear opinions from mathematicians, engineers, researchers, and journal editors. This feels like an important shift in how we think about proofs and authorship.

> the formal proof is generated (and even cross-verified) by multiple LLMs

Can you point to examples of this? So far, what's been discussed and used is LLMs generating Lean (or possibly other languages), which is machine-checkable without the need for another LLM. This isn't an LLM peer-reviewing another LLM's work; it's people and deterministic (hopefully correct) machines checking LLM work.
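
To illustrate (a toy example of mine, not something an LLM actually produced): a Lean 4 proof is just text that the kernel checks mechanically, e.g.

    -- Trivial Lean 4 proof. Whoever (or whatever) wrote it, the
    -- kernel checks it deterministically; no second LLM involved.
    theorem my_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

(The theorem name is mine; Nat.add_comm is from the Lean standard library.)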

The only LLM-proof-checking-LLM stuff I've seen is from loons, like the people who think they've figured out life, the universe, and everything after a few Adderall-filled days with ChatGPT.


First off, I've not seen anything to suggest that LLMs are capable of generating novel proofs that are interesting to mathematicians, and personally I suspect they're not. I think AI could potentially get to that point, but I don't think these LLMs are the solution.

But to answer your questions, there's a lot to unpack there, so let me try:

> ...but the formal proof is generated (and even cross-verified) by multiple LLMs...

The idea of LLMs "verifying" a proof is worthless in mathematics. To explain why, note that there are basically two distinct types of proofs: normal proofs that are published or explained by people, and formally verifiable proofs. The former are written in a combination of prose, diagrams, and formulas, and are read by people. The latter are written in a kind of programming language and are checked by formal proof verifiers (other computer programs).
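
To make the distinction concrete, here's a toy statement of mine in both forms. Prose: "there is no largest natural number, because n + 1 always exceeds n." And the same thing as a formal Lean 4 proof (theorem name mine; Nat.lt_succ_self is a standard library lemma):

    -- Formal counterpart of the prose argument: exhibit n + 1 as a
    -- witness, and the checker verifies the inequality.
    theorem no_largest (n : Nat) : ∃ m : Nat, n < m :=
      ⟨n + 1, Nat.lt_succ_self n⟩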

These computer-checkable proofs are controversial because sometimes they literally cannot be checked by humans: the languages used to write them are expressive enough to elide enormous amounts of computation. The reason they're somewhat accepted anyway is that checking them is purely mechanical and requires no advanced reasoning or interpretation whatsoever, which is the perfect kind of task for a computer; that mechanical checkability has been inherent to rigorous proofs for as long as they've existed, going back to the ancient Greeks. Nevertheless, I think the majority of mathematicians, even those relatively enthusiastic about formal proof verification, would tell you that these computer-checkable-only proofs have a different and lower status than normal proofs that can be understood and checked by humans.
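
As a small sketch of that eliding (my example): in Lean 4, the decide tactic hands the entire computation to the kernel, so the source a human reads contains none of the actual work:

    -- The one-word proof elides the computation: the kernel itself
    -- evaluates 2 ^ 16 + 1, and nothing in the source is checkable
    -- "by hand".
    example : 2 ^ 16 + 1 = 65537 := by decide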

To really understand that, you should know that the basic rule in mathematics is that you shouldn't accept any mathematical statement as true until you've verified its proof. That's why if you take courses in higher mathematics in university or beyond, you prove literally everything starting from first principles, and you never build upon results you haven't proved yet.

Checking proofs by computer obviously violates that basic rule, but at least proof checkers are not black boxes, so you could check the implementation to convince yourself that it's doing the right thing. Thus, I would call the result a "second-order" proof of sorts: the proof itself isn't checkable by humans, but the implementation of the proof checker is (although we still have to trust the hardware running it, which is why computer verification of proofs will never enjoy the same status as human verification). And thankfully, we haven't (yet?) gotten to the point where mathematics courses need to build upon results that are only computer-checkable.
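
On the "check the checker" point, one concrete (if partial) audit you can do in Lean 4 is to ask which axioms a proof ultimately rests on (theorem name mine):

    theorem tiny : 1 + 1 = 2 := rfl

    -- Reports the axioms tiny depends on (here: none), so what's
    -- left to trust is the kernel implementation and the hardware.
    #print axioms tiny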

The point is that an LLM is neither of those things: it's not a human reading and understanding the proofs, and it's not a proof checker whose implementation can be checked by humans. So as a verifier it's worthless.

> Should such AI-generated proofs be considered valid and publishable?

> What standards should apply when the idea is human-created, but the proof is AI-derived?

Like I said at the beginning, I doubt LLMs are capable of generating the types of proofs that would be considered publishable. However, if they are, then they are, because that's how math works. A proof is either valid or it isn't, and that goes for both human verification and computer verification (where, of course, the verification is done by a proper proof system and not another LLM). Yeah, there's the question of authorship and credit, but that's pretty secondary. (It's already kind of a meme, TBH, how rarely mathematical results are properly credited to the people who discovered them, but that's a digression.)
