Hacker News
Formalizing the proof of PFR in Lean4 using Blueprint: a short tour (terrytao.wordpress.com)
99 points by georgehill 5 months ago | 37 comments



I find it almost scary how casual Terry is about all this. Comparing this to my own humble attempts at graduate math, I feel like a caveman struggling with a basic wheelbarrow who gets a visit from a guy in a helicopter. And then that guy stretches out his hand: 'Here, try it out, it's fun, useful, and not scary at all'.

It was always clear to me that formalization was the future of math, ever since I heard about the four-color theorem. What I did not anticipate, and what Terry seems to take in stride here, is that we'd get AI that can almost do the job of writing the math too, arriving around the same time formalization starts to gain traction. The fact that I get to learn about it from the most friendly and humble generational genius is just the cherry on top.


My experience with Copilot in Lean is that it's just as useful as in programming. It guesses all right and works well for boilerplate, structure, and syntax, but you need to take over by hand whenever you need to do something significant or clever.


Formalism and its “derivatives” died a long time ago already. Even Hilbert couldn’t do much about it.

That being said, constructive math is still a great idea.


Not sure what you mean by that - what I meant by formalization was really deducing every statement in a proof from basic axioms and rules. So basically what Prof. Tao did in this blog post (except that the hard rote labor has been abstracted away and handed off to a machine). This kind of formalization is hardly dead, I'd say; in fact, it's just starting to gain traction in mainstream mathematics, as Prof. Tao's recent work shows (if it's not too much work for a Fields medalist to formalize their proofs, then what excuse do mere mortal mathematicians have?). So what did you mean?

Edit: to be clear, the original four-color proof had nothing to do with this kind of formalization; it was just the debate around it that got me thinking. And tools like Lean4 aren't limited to constructive math, afaik.


> And tools like Lean4 aren't limited to constructive math, afaik.

In fact, Lean is strongly geared towards classical logic: most of its tactics assume and use classical logic, especially the excluded middle.
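
A minimal sketch of what that looks like in practice (assuming a recent Lean 4 with Mathlib; case-splitting on an arbitrary proposition is exactly the kind of step that needs classical logic):

    import Mathlib.Tactic

    -- the excluded middle is a theorem (derived from choice), not an assumption:
    example (p : Prop) : p ∨ ¬p := Classical.em p

    -- `by_cases` splits on an arbitrary Prop (no decidability instance
    -- required), which is only valid classically:
    example (p q : Prop) (h : ¬q → ¬p) : p → q := by
      intro hp
      by_cases hq : q
      · exact hq
      · exact absurd hp (h hq)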


> what excuse do mere mortal mathematicians have?

As an ardent proponent of formalizing proofs, I still have to disagree with this framing. Mortal mathematicians don't need an excuse not to formalize their stuff; what they need is a reason to do it. One can give lots of reasons, but mathematicians often don't know them or don't agree with them.

An important point is, for example, to understand that most formalizing effort is not primarily motivated by a desire to make sure there are no errors in the math.


>most formalizing effort is not primarily motivated by a desire to make sure there are no errors in the math.

Is this because the proof author is generally confident that the proof is essentially correct?


Here are people working on Lean's Mathlib discussing this: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4...


Mathematicians might not like the insinuation, but the best reason to formalize mathematics is indeed to check its correctness:

https://www.andrew.cmu.edu/user/avigad/meetings/fomm2020/sli...

tl;dr: contradictory results get published even in the likes of the Annals of Mathematics, and even top journals are reluctant to retract published proofs that have later been shown to be false (which in itself shouldn't be possible).

Like it or not, the reasons for formalizing mathematics abound, and they have a lot to do with the fact that humans are fallible, but mathematics shouldn't be. The incredibly tedious work of tracking down a proof's correctness to the last axiom is, while overwhelming for humans, exactly the kind of work that classical computers excel at.


Of course there are mistakes in papers. Avoiding them isn't the primary motivation for formalization, but not because mistakes don't happen; it's not the primary motivation because the occasional mistake in a math paper is OK.

If some result is wrong (as opposed to just details in a proof which could be fixed) and it turns out to be an important result, the mistake is found eventually. This fairly rare occurrence wastes some time for people who may have used the result and built upon it, but arguably a lot less time than it would take to formalize all papers with existing tools. This might change in the coming decades as formalization becomes easier.

The perspective that this is not the main motivation is also the position of people working on Lean's Mathlib. E.g. Kevin Buzzard has said this at https://leanprover.zulipchat.com/#narrow/stream/270676-lean4...


Well, it appears that Kevin Buzzard hasn't fully made up his mind on this point (from the presentation by him that I linked):

> If my work in pure mathematics is neither useful nor 100 percent guaranteed to be correct, it is surely a waste of time. So I have decided to stop attempting to generate new mathematics, and concentrate instead on carefully checking "known" mathematics on a computer

I also remember him saying, in an interview (it could have been for Quanta Magazine), something along the lines of his concern about how much of recent math is reliable being a major inspiration for his initial involvement, but I couldn't find it now, so I can't verify.


That's actually quite interesting. I hadn't read the entire presentation, to be honest. Maybe it's a case of communicating priorities differently to different stakeholders... After all, it's about what the primary motivation is.

Aside from that, the statement you're quoting is deeply puzzling to me. No work is ever "100% guaranteed to be correct", not even something formally verified by a computer, so that statement is either false, vacuous concerning correctness, or not meant literally... (Maybe 100% really means 99.99999% of the time, or something.)


It's not clear to me when the chat statement is from, but given that the presentation is from 2020, it would actually be funny if he went from 'this is going to get us 100% certainty' to 'well, we might never be fully sure, but at least it's a cool library'.

Jokes aside, I'm pretty sure he was aware of the limitations at all times, and I'm reading this statement in the context of his examples in the presentation. There's a nice story in a Quanta interview about how he was approached by Peter Scholze to help formalize part of a proof. Scholze had asked whether they'd gone through his recent work, and when Buzzard said yes, Scholze asked whether they'd carefully checked Theorem 9.1. Buzzard said no, they didn't have enough time left when they got to chapter 9. And Scholze said: see, that's the problem; is anyone in the world really going to check 9.1? And then there are the stories in the presentation, where entire published proofs are called into question, not because of any direct mistake by the authors, but because of a problem in a lemma they used from previous work. This kind of transitive dependency chain can go down through multiple papers, and there's really no guarantee that the later results are salvageable. So it seems to me that formalization would be a great practical tool to mitigate those problems.

And then yes, it's also super cool in its own right (imho), just the fact of collecting the knowledge. The related work might also prove useful for formal verification (of hardware/software systems), which has practical relevance, and Mathlib might also just happen to become a great enabler for developing math-focused AI. So there are lots of good reasons to do it, but correctness might still be the most direct payoff: Terry Tao has a different blog post about how he found a (fixable) error in a recent proof of his, thanks to Lean. If that happens to him on one of the first occasions where he uses the tool seriously, then we can only guess what's waiting to be fixed (or thrown out) in the whole body of published mathematics.


I would strongly assume Buzzard believes those two positions to be perfectly compatible and consistent. I'm sure it would be very insightful to hear him explain how they are. Hope someone asks him about it.

> formalization would be a great practical tool to mitigate those problems

Yes, I agree. V. Voevodsky's story is a great example of how things can go wrong, and I very much admire him for the conclusions he drew from it.

However, if some mathematician has the goal of advancing a certain field, I don't think you can argue convincingly in 2023 that they should formalize their papers because making extra sure there are no errors is worth the time investment in terms of advancing the field. Again, I expect to see this slowly change.


I'm not saying that starting in 2024, journals should only accept formalized proofs. I'm well aware that we're probably many thousands of person-years away from being able to formalize research involving modern mathematical objects like rough paths or perfectoid spaces. But I do think that we should be moving in that direction. We're halfway through formalizing the undergrad curriculum; once we're there, we could start including it in graduate courses, and at some point we'll have caught up. Many thousands of years of work divided by many thousands of active grad students maybe isn't prohibitively much. I don't see a reason why, at some point in the future, something like Lean couldn't replace something like LaTeX as the de facto standard language for communicating mathematics. We could build the papers directly from source with docstrings. I don't even think that it would replace peer review, since peer review considers at least three things (correctness, novelty, and relevance, roughly in that order), but it could make peer review substantially easier by automating much of the first part. And I think it might make the paper writing itself more efficient and enjoyable as well.


In the last comment below he writes

> The project has now completed its primary goals; the entire dependency graph is now green.


OK, I'll download Lean4 and read the tutorials. Everything else Prof. Tao has suggested that I've tried has worked well. I didn't realize that proof software had advanced so far. I once tried a project in undergrad to put a lot of ZFC into Prolog, but it was just too early for that to work.


Lean4 has been a pleasure to learn so far! Starting with the natural numbers game was great for me: https://adam.math.hhu.de/
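
For a taste, the early levels have you prove things like this by induction (plain Lean 4 here, no imports needed; this is my own rendering, the game has its own interface):

    -- addition on Nat recurses on the second argument,
    -- so 0 + n = n genuinely needs induction:
    example (n : Nat) : 0 + n = n := by
      induction n with
      | zero => rfl
      | succ k ih => rw [Nat.add_succ, ih]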


I don't follow: how is the proof text generated from Lean? Or is it not - are they doing the verification in Lean while the proof text is kept completely separate?

Looking at the project more, I see it's the latter. Turning Lean tactics into readable proof text would be hard, as the Lean phrase book he is keeping shows: https://docs.google.com/spreadsheets/d/1Gsn5al4hlpNc_xKoXdU6...
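
To give a flavor of the gap, here's a made-up entry in that spirit (my own illustration, not from the actual phrase book):

    -- what Lean sees:
    example (a b : Nat) (h : a = b) : a + 1 = b + 1 := by
      rw [h]

    -- what a paper would say:
    -- "Substituting a = b, the claim is immediate."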


There are declarative systems like Mizar and Isar that do take something a bit closer to natural-language proof text as input. Outputting actual text in natural language is just an exercise in pretty-printing (and of course the system must know and work with the natural-language name for every object and statement you're working with, otherwise the output gets clunky).
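
Lean itself also supports a more declarative register: have and calc name intermediate facts, so the script reads a bit closer to prose. A small sketch (assuming Mathlib for the real numbers and linarith):

    import Mathlib.Tactic

    example (a b : ℝ) (h : a ≤ b) : a ≤ b + 1 := by
      have hb : b ≤ b + 1 := by linarith
      calc a ≤ b := h
        _ ≤ b + 1 := hb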


The proof text is separate. Currently it's up to humans to ensure the proof text (written in LaTeX) properly matches the Lean proof.

Indeed, you can see in the article that the text of Theorem 7.2 omits, for brevity, details which are present in the non-pretty-printed Lean proof (for example, that H is non-empty, that K is a real number, etc.)
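
Purely as an illustration of the kind of bookkeeping involved (a hypothetical signature, not the actual statement from the PFR project): the prose can say "let H be a set and K a constant", while the Lean version has to record the nonemptiness of H and the type of K explicitly.

    import Mathlib

    -- hypothetical sketch, not the real Theorem 7.2:
    example {G : Type*} [AddCommGroup G] (H : Set G) (hH : H.Nonempty)
        (K : ℝ) (hK : 1 ≤ K) : H.Nonempty ∧ 1 ≤ K :=
      ⟨hH, hK⟩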


> I don't follow: how is the proof text generated from Lean? Or is it not - are they doing the verification in Lean while the proof text is kept completely separate?

My understanding was that the Blueprint tool that he linked is what's responsible for the human-readable math sections shown in the screenshots. I'm not saying it's easy, but it should definitely be doable, basically a spreadsheet like that one on steroids.
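
For reference, a blueprint source file is ordinary LaTeX with a few extra macros tying statements to Lean declarations; roughly like this (the names here are made up):

    % \lean{} links the statement to a Lean declaration, \uses{}
    % records dependencies for the graph, \leanok marks it as done.
    \begin{theorem}\label{thm:example}
      \lean{PFR.example_thm}\leanok
      Human-readable statement of the theorem.
    \end{theorem}
    \begin{proof}
      \uses{lem:helper}\leanok
      Human-readable proof sketch.
    \end{proof}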


There will probably be a time when most proofs are just done in Lean or some similar system and nobody bothers writing a "human readable" paper. Math will just become a kind of programming.


I'm taking a master's in mathematics, and for me the steps are at least three-fold: understanding what you are trying to prove, which is mostly thinking about it and maybe reading or talking to people; picking up pen and paper and proceeding with a sketch of the essential points; and typing the proof into LaTeX for human-readable homework :)

I suspect Lean will change the thinking a bit but not eliminate step one. Maybe step three becomes interactive with Lean4 etc. and ends in a PR with a green build, but the work does not seem likely to end there.

Put it this way: Terence Tao is unbelievably better at maths than I am; Terence Tao with Lean4 is also unbelievably better at maths than I am with Lean4. As for the value embedded in all the already-proven mathematics, having it in Mathlib is going to make it easy to use, but extending it the way Prof. Tao can is still pretty hard.

As a side note, I've extended my experimenting with GPT-4 from writing Go and Emacs Lisp to asking it questions about my classes. It does sound smart, but it will happily give fractional results for an application of Burnside's lemma, and say things like:

"An example of a simple group without an element of order 2 is any non-abelian simple group of odd order. By the Feit-Thompson theorem, every finite group of odd order is solvable, which means that non-abelian simple groups of odd order do not exist. However, this theorem does not rule out the existence of infinite simple groups of odd order."

Now, this answer is correct in describing how (via the Feit-Thompson theorem) to show that there are no non-abelian simple groups of odd order, but "infinite simple groups of odd order" is nonsensical: if the order is infinite, it is not odd.

One hopes future iterations will develop understanding in a way that this text lacks.


You don't really need understanding if you ask (or train, or fine-tune) the AI to produce a formalized answer and then write a loop that feeds the compiler output back until it checks. That doesn't guarantee it's the answer you want, but it is guaranteed to be correct, so that's something. You can also use this approach to produce more training examples: if you have time, you can manually check and annotate whether the result is what was originally asked for, and then hope that the training improves the accuracy of the initial answer (avoiding the loaded term 'understanding' here). This might substantially reduce the amount of work that needs to be done manually after the first step.
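
Roughly this kind of loop, sketched in Python (ask_model and lean_check are hypothetical stand-ins for an LLM API call and a Lean compiler run):

    def formalize(statement: str, max_rounds: int = 10):
        prompt = "Formalize and prove in Lean 4:\n" + statement
        attempt = ask_model(prompt)
        for _ in range(max_rounds):
            ok, errors = lean_check(attempt)  # run the Lean compiler
            if ok:
                return attempt  # it type-checks, so the proof is correct
            # feed the compiler output back and ask for a repair
            attempt = ask_model(prompt + "\n\nYour attempt:\n" + attempt
                                + "\nCompiler errors:\n" + errors)
        return None  # give up after max_rounds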

If I were doing a math degree right now, that's exactly the kind of stuff I'd want to do for my thesis; kind of jealous, tbh.


This suffers from the problem of sparse rewards. You need a better reward that gives value to intermediate results or special-case answers, since too few states are correct.


It's not about training the model. You just feed the error messages back together with an appropriate prompt and ask it to fix the proof. Iterate until it verifies successfully (or stop after you've reached ~1-2x the context window length).

Once you have produced examples of correct math that way, you can use those to fine-tune the model.


No, I don't think that is so. For one, you still have to decide what is worthwhile to prove. The subject matter of mathematics is very different from that of programming.


I had been under the impression that mechanised proof assistants were much too low level for actual mathematicians (like asking programmers to explicitly allocate functional units and issue slots?) so it's interesting news that Prof. Tao finds mucking about with them worthwhile.


This isn't about Terry Tao specifically, but proof assistants can help you manage a proof when it's gotten too big or complex to manage in your head without making mistakes.

There was this project to use Lean to prove a theorem of Peter Scholze's (Scholze is a well-respected professor and a famous mathematician). Apparently, the proof was so big and complex that he could barely keep it in his head, so after writing it, he posed it as a challenge to Kevin Buzzard and the Lean community, who formalized it.

https://www.quantamagazine.org/lean-computer-program-confirm...

Then, six months later, he was interviewed about his experience using Lean to check his theorem:

https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-...

And the experiment was finished in 2022. Announcement: https://leanprover-community.github.io/blog/posts/lte-final/


From what I understand, it’s more like requiring each programmer to write implementations of arrays, dictionaries, date/time handling code, float parsing and printing, etc.

Once the language designers had figured out the hardware was now capable enough to make libraries useful, it only took one programmer to do that tedious work. After that all other programmers could reuse that work.

This will be similar: once a large body of definitions and proofs has been built, mathematicians will be able to reuse higher-level constructs to build proofs for their theorems.

It will likely also be similar in that there will be multiple approaches, with one ending up victorious (or a few: one system might work best, or have the largest set of definitions, for one branch of mathematics, while another 'wins' in another).


Exactly. There are multiple ongoing projects to formalize existing math, and there seems to be some convergence on Lean4 at the moment, but the best analogy is probably programming languages, where there's a constant battle between the old guard with huge staying power and newer entrants with just too many good ideas to ignore. Except that here a lot of the attractiveness of a language will come from how much math has already been formalized in it, which gives a huge advantage to the early leaders. So there might be a bit more emphasis on 'superset languages' like Kotlin or TypeScript here, which can make use of the libraries of existing languages.


I think porting automation will eventually solve this problem. It's really easy to write "amazingly good test suites" for this stuff, so a computer can do the porting and you essentially "only need to read the method signatures" to make sure the port is correct (in which case it is completely correct, with all edge cases included). That's still some work, but much more manageable than, e.g., porting JS to Python...


Informal proofs will always be a thing, much like project specifications are always a thing. It helps to have a traditional proof outlining the major milestones of your formal proof before actually delving into the formalization.


I find his comment on Copilot enlightening. He says it's not always right, but he still finds it useful - kind of like that old saying in stats: "all models are wrong, but some models are useful". Language models are really helpful when you have enough experience to judge when they fail; I do feel sorry for novices who don't have that experience yet.


It arguably matters a lot less in this context: you don't need experience to see when Copilot has failed at Lean, because the compiler will tell you anyway. But I'm not saying that understanding it yourself won't matter. In fact, I wouldn't be surprised if the gap between him and more mortal mathematicians actually widens with the availability of these tools.


Lean has been a really exciting thing for me to watch (as an amateur observer); it's gonna go right up to the limits that Gödel found.



