>> Theorem proving and program fuzzing seem like good candidates for combining search with LLMs, since both have automated, fast, reliable evaluation functions.
The problem with that is that the search procedures and "evaluation functions" known to, e.g., the theorem-proving or planning communities are already at the limit of what is theoretically optimal. So what you need is not a new evaluation or search procedure but new maths, to know that there's a reason to try in the first place.
Take theorem proving, for instance (because that's my schtick). SLD-Resolution is a sound and complete automated theorem-proving procedure for deductive inference. It can be implemented with Depth-First Search for a space-efficient implementation (but one susceptible to looping on left recursion), or with Breadth-First Search plus memoization for a time-efficient implementation (but one with exponential space complexity). "Evaluation functions" are not applicable: Resolution itself is a kind of "evaluation function" for the truth, or rather the certainty of truth valuations, of sentences in formal logic; and, as I say, it's sound and complete, and semi-decidable for definite logic, and that's the best you can do short of violating Church-Turing. You could perhaps improve efficiency with some kind of heuristic search (people have tried that, for example, to get around the NP-hardness of subsumption, an important part of SLD-Resolution in practice), and that is where an "evaluation function" (i.e., more broadly, a heuristic cost function) comes in. But there are two problems with this: a) using heuristic search means sacrificing completeness, and b) there are already pretty solid methods for deriving heuristic functions in planning (from relaxations of a planning problem).
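To make the DFS/BFS tradeoff above concrete, here's a minimal propositional sketch of SLD-style goal-directed search over definite clauses. The program representation and function names are illustrative assumptions, not any real prover's API, and it omits unification entirely:

```python
# Propositional sketch of SLD-style goal resolution over definite clauses.
# A program maps each head atom to its alternative bodies (tuples of atoms).
from collections import deque

program = {
    "a": [("b", "c")],
    "b": [()],          # a fact: empty body
    "c": [("b",)],
}

def prove_dfs(goals, program, depth=0, limit=50):
    """Depth-first SLD search: space-efficient, but a left-recursive
    clause like p :- p would loop forever without the depth limit."""
    if depth > limit:
        return False
    if not goals:
        return True            # empty goal list: derivation succeeded
    first, rest = goals[0], goals[1:]
    for body in program.get(first, []):
        if prove_dfs(list(body) + rest, program, depth + 1, limit):
            return True
    return False

def prove_bfs(goal, program):
    """Breadth-first SLD search with memoization of goal lists already
    seen: terminates on left recursion, but stores the whole frontier
    (exponential space in the worst case)."""
    frontier = deque([(goal,)])
    seen = {(goal,)}
    while frontier:
        goals = frontier.popleft()
        if not goals:
            return True
        first, rest = goals[0], goals[1:]
        for body in program.get(first, []):
            child = tuple(body) + tuple(rest)
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return False
```

On a left-recursive program such as `{"p": [("p",)]}`, `prove_dfs` only stops because of the artificial depth limit, while `prove_bfs` terminates cleanly via the `seen` set, which is the space-for-termination tradeoff described above.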
The lesson is: soundness, completeness, efficiency; choose two. At best, a statistical machine learning approach like an LLM will choose a different two than the established techniques. Basically, we're at the point where only marginal gains, at the very limits of overall performance, can be achieved in search-based AI. And that's where we'll stay, at least until someone comes up with better maths.
I’m wondering how those proofs work and in which problems their conclusions are relevant.
Trying more promising branches first improves efficiency when you guess right, and it wouldn't sacrifice completeness as long as you eventually get to the less promising choices. But in something like a game engine, there is a deadline and you can't search the whole tree anyway. For tough problems it's always a heuristic, incomplete search, and we're not looking for perfect play anyway, just better play.
So for games, that trilemma is easily resolved. And who says you can’t improve heuristics with better guesses?
But in a game engine, it gets tricky because everything is a performance tradeoff. A smarter but slower evaluation of a position will reduce the size of the tree searched before the deadline, so it has to be enough of an improvement that it pays for itself. So it becomes a performance tuning problem, which breaks most abstractions. You need to do a lot of testing on realistic hardware to know if a tweak helped.
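The deadline-driven tradeoff above is usually handled with iterative deepening: search one ply deeper each pass until time runs out, falling back on a static evaluation at the cutoff. Here's a minimal sketch; the toy game tree, the `static_eval` table, and the function names are made-up placeholders, not any engine's real code:

```python
# Deadline-bounded iterative deepening over a toy game tree.
# Each completed depth yields a usable answer; the clock decides
# how deep we actually get before the deadline.
import time

children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
static_eval = {"root": 0, "a": 1, "b": 3, "a1": 5, "a2": -2, "b1": 0}

def negamax(node, depth, deadline):
    kids = children.get(node, [])
    if depth == 0 or not kids or time.monotonic() > deadline:
        return static_eval[node]   # evaluation function at the cutoff
    # Negamax: my best score is the best negated score of my opponent.
    return max(-negamax(k, depth - 1, deadline) for k in kids)

def search_with_deadline(root, budget_seconds):
    deadline = time.monotonic() + budget_seconds
    best, depth = static_eval[root], 1
    while time.monotonic() < deadline and depth <= 10:
        best = negamax(root, depth, deadline)
        depth += 1
    return best
```

A slower but smarter `static_eval` shrinks the depth reached before the deadline, which is exactly why the tradeoff has to be tuned empirically on realistic hardware. (Real engines also discard a depth iteration that the deadline truncated mid-search; this sketch keeps it for brevity.)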
And that’s where things stood before AlphaGo came along and was able to train slower but much better evaluation functions.
The reason for evaluation functions is that you can’t search the whole subtree to see if a position is won or lost, so you search part way and then see if it looks promising. Is there anything like that in theorem proving?