The article is basic mathematical research and is not intended to be implemented, but rather to guide further applied research.
The deeper motivation for this article is to allow future Artificial General Intelligences to prove facts about themselves; specifically, that the AGI, or a next-generation AI that it develops, has the same (safe) utility function that the designers originally specified.
A formal system cannot achieve this by simulating itself.
> A formal system cannot in general prove that it is reliable, i.e. that if it proves statement P then P is true. http://intelligence.org/files/lob-notes-IAFF.pdf
But with this article, the hope is that a formal system can show that the statement about itself is probably true.
I'm not entirely sure what you mean by this (I haven't read the article yet, and I'm already familiar with Loeb's theorem).
It's true that you cannot show that, say, second order arithmetic is consistent in the same system.
In fact, every soundness proof (for a sufficiently strong logic) will have to be carried out in a stronger system.
There is a standard way around this, which has existed for a long time.
You stratify the system by introducing universes.
E.g. in type theory, a universe is a type of (codes for) small types.
This allows you to state and show meta theorems "for all (small) types", by quantifying over a universe.
In the concrete example of Martin-Loef type theory (MLTT) you can show that MLTT with n+1 universes contains a model of MLTT with n universes.
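For concreteness, this is roughly what the universe tower looks like in Lean (my own sketch, not from the thread; Coq and Agda are analogous, and MLTT's Tarski-style universes of codes behave the same way):

```lean
universe u

-- Each universe `Type n` is a type whose elements are the "small" types,
-- and it itself lives in the next universe up.
#check (Type 0 : Type 1)
#check (Type 1 : Type 2)

-- Quantifying over a universe lets us state a theorem "for all (small) types":
theorem id_id (α : Type u) (a : α) : id (id a) = a := rfl
```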
On the other hand, adding more universes seems to be harmless as far as anyone knows.
Under the assumption that MLTT with a countably infinite number of universes is consistent, and provided you restrict your formal system to only use a bounded number of universes, it is still possible to show that it is "reliable".
It is "at least as reliable" as MLTT with countably many universes.
I will read the article later and update this post if there is something compelling in the paper.
At the moment I'm just confused what the problem is, and would really appreciate it if you could expand on this.
> This allows you to state and show meta theorems "for all (small) types", by quantifying over a universe.
As you say, this isn't quite self-trust. Another natural move is to relax the criterion of self-trust away from "it proves itself consistent" (impossible by incompleteness) towards "it assigns high probability that it has good beliefs". (Formalizations of this are in the paper.)
> At the moment I'm just confused what the problem is
In short, it would be nice to have a model of "good reasoning under deductive limitation", where "good reasoning" means something like "has accurate beliefs about all questions of interest" (for example, facts about the outputs of long-running computations), and where "deductive limitation" rules out the reasoning process "just wait for your theorem prover to decide the question".
Examples of long-running computations that are hard to compute exactly, but that we can sometimes still have reasonable beliefs about: optimal moves in chess, go, etc.; the accuracy of some ML system after a given training regimen; a weather-forecasting program that runs a gigantic series of simulations; and so on.
So by now I've actually read the (abridged) paper and skimmed the full paper. And the paper has nothing to do with consistency problems, since they only consider boolean propositional logic... Assigning consistent probabilities in the limit is indeed a much more interesting problem anyway.
Self-trust seems like a problem on paper, but the reality is that normal mathematics uses very few universes.
Concretely, the proof of the Feit-Thompson theorem in Coq, with all the related theories, uses 4 universes.
If you have a system certified in type theory, then you can still reason about it.
Possibly one universe higher, but that is not a problem (as far as anybody knows).
>Self-trust seems like a problem on paper, but the reality is that normal mathematics uses very few universes.
Self-trust (broadly construed) is interesting to me because it seems relevant to designing goal-based agents that are "stable", in the sense that they trust that future versions of themselves will have accurate beliefs (and therefore don't have an incentive to mess around with their systems for forming beliefs). If we try to formalize this intuition with "beliefs" as theorems proven by a formal system, we run into reflection problems; having your theorem prover assert that it will keep outputting only true statements feels awfully close to asserting its own soundness. So even if your agent can perform all the usual mathematical reasoning it needs, it still can't do all the useful reasoning about itself (it would need another large cardinal... and then another...).
The self-trust property in the paper says that it's possible to "learn from experience" that your future self is probably going to have pretty good beliefs. Specifically, a logical inductor P_n learns (roughly speaking) that "if P_f(n) thinks Phi is likely, then Phi is likely", where f(n) can be a fast-growing computable function. That is, on day n, P_n believes a sort of "probabilistic soundness" condition for its future self P_f(n). This is weaker than full soundness in at least two ways, but it is fully "reflective" in the sense that P believes this of itself.
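Glossing over some technicalities (the precise statement in the paper uses expectations and continuous indicator functions rather than a sharp conditional), the property looks roughly like

$$\mathbb{P}_n\!\left(\varphi \;\middle|\; \mathbb{P}_{f(n)}(\varphi) > p\right) \gtrsim p,$$

i.e. today's credence in phi, conditional on the future self assigning phi probability above p, is itself at least about p.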
The expressiveness of the tower of universes is limited to that of Peano arithmetic, correct? Otherwise, there would exist a universe k that contains theorems that could not be computably proven in universe k+1.
On a separate matter, are there any principled meta-universal rules for incrementing the universe index? That is, if one suspects that one's current universe lacks proving power, is it possible to generate the axioms for a more powerful universe?
Peano arithmetic, as in classical first-order logic with the Peano axioms, is a small subsystem of MLTT with natural numbers and no universes.
If you come from a set theory background, then universes are really akin to Grothendieck universes, or large cardinal axioms.
You start out with a powerful theory and then improve it by repeatedly adding statements of the form "and the theory so far is consistent", by giving an internal model of "the theory so far".
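In its simplest form (bare consistency statements rather than full internal models), the tower looks like

$$T_0 = T, \qquad T_{n+1} = T_n + \mathrm{Con}(T_n),$$

with universes playing the role of the successive internal models.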
There are type theories with universe variables, which essentially allow you to add an arbitrary (but finite) number of additional universes.
The rules for this are straightforward, even though a consistency proof is of course only possible relative to type theory or set theory with more universes...
In set theory, one typically adds an axiom scheme that states something like "There is a Grothendieck universe containing this set".
Iterating this gives you a similar tower of universes.
There are a lot more constructions that go beyond this and the funny thing is that as far as anyone knows they are all consistent.
For instance, type theories with induction-recursion allow you to generate internal universes closed under certain operations while staying in the same universe.
So one universe with induction-recursion allows you to show MLTT with an arbitrary finite number of universes consistent.
But can it show that it's actually running this correct algorithm to determine whether the statement about itself is probably true?
There is always an external context that runs the computation. A computation never exists in "abstract space"; it always happens on a physical level. Any computation is only good if it runs in the context that it was designed for.
What I mean is that the AI can never "know" if it's still following the "righteous goals", because a new environment may have grown around it that "transforms" its inputs and outputs so that, to the core machine, it seems like nothing has changed, but actually its real-world effects have become very different. And at this point you could actually also say that this part of the environment has grown onto the AI machine and it now effectively has new goals that are incompatible with the ones designed into the core part of the machine.
My point is, there is no isolated system, reasoning by itself in an isolated fashion. Subject and environment are not clearly divided.
> at this point you could actually also say that this part of the environment has grown onto the AI machine
No, you cannot. If the environment is part of the definition of the machine (as you put it, a computation never exists in "abstract space"), then if the environment changed you'd have a new machine and all bets are off. You had that right at first.
Of course, that definition is unwieldy, and we define interfaces as abstract and general as possible, to separate concerns. With involved machinery, that separation is not that clear-cut, as there are always two sides to the coin, to put it trivially.
This is a good point. I think it'd be pretty interesting and useful to get a better grasp on when/whether/how a system can reason about itself as embedded in the world in a sane way.
> A formal system cannot in general prove that it is reliable, i.e. that if it proves statement P then P is true
This implies second-order logic. It is not clear to me whether the loss of consistency is warranted. Type theories try to avoid inconsistency. However high the order of the processing logic is, its only reason to exist is to output first-order theorems, the ones we can prove decidable.
> But with this article, the hope is that a formal system can show that the statement about itself is probably true.
Probability is not good enough. The Bayesian Conspiracy sure is strong, but I'd prefer to stay with first-order logic and finite state machines that are provably correct.
> > The short answer is that this theorem illustrates the basic kind of self-reference involved when an algorithm considers its own output as part of the universe
Isn't that what differential equations are for? I'm tired of the liar's paradox. Intuitively, I've settled on the presumption that paradoxes always rest on wrong assumptions.
I'm no mathematician, but I reject the notion that I am inherently unable to be certain. That's why algorithms are, by my preferred definition, bound to be deterministic. I'd like to be able to tie this in with the Chomsky Hierarchy, albeit I am not that advanced. I'm impressed by quines, and I guess the halting problem implies that quines cannot always be predicted. Heuristics help there, and that's what stochastics is all about.
Let me be clear. The paradox "this sentence is a lie" is neither a sentence nor a lie. That's a question of definition. Of course I'm in no position to say a similar thing about Goedel's incompleteness theorem, and I even referred to its result in higher-order logics, but I still doubt the relevance, as many seem to be ignorant of his earlier completeness theorem.
The higher-order logic and self-reference is related to recursively enumerable grammars, low in the Chomsky Hierarchy. Though, if ordered by magnitude, I'd call it higher. I hope, if the goal is natural language, we don't need to aim that high. If you want a computer that computes computers, though, go for it.
> A formal system cannot achieve this by simulating itself.
The type of self-similarity used in quines or the liar's paradox seems to play an important role in what we perceive as intelligent. Although, I stipulate, the intelligence involved is the ability to tell the difference between a misleading liar's paradox and constructively provable quines. Of course, a machine that doesn't need verification from the supervising developer would be akin to a perpetuum mobile.
It is easy to supervise the AGI's output via the less intelligent machines that have to carry it out. The focus is on optimizing the processes, not on the inability to assert safety guidelines.
Edit: mixed up the order of the Chomsky Hierarchy.
No, it does not. The second incompleteness theorem is provable in first-order Peano Arithmetic.
> Of course I'm in no position to say a similar thing about Goedel's incompleteness theorem, and I even referred to its result in higher-order logics, but I still doubt the relevance, as many seem to be ignorant of his earlier completeness theorem.
I have no idea what you're trying to say, but I assure you that people who do this kind of research are aware of the completeness theorem.
The inability to be certain comes from the fact that we can always ask arbitrarily complex questions about the subject matter. To prevent the halting problem, you must prevent complex questions.
Because there will always be statements that are true which the original system is oblivious to, and thus the corresponding questions will be outside its scope (incompleteness theorem).
Come to think of it, intelligence by a different definition, almost a homonym, is concerned with information. That fits with my demand for zeroth order logical values.
If you're interested in that line of thought and would like perspective on the way perhaps Ramanujan might have perceived it, perhaps you may enjoy this transcript of a great lecture from an Indian holy man, mathematician and poet (Swami Rama Thirtha) in SF in 1903: http://www.ramatirtha.org/vol1/inspiration.htm
>It has kept me thinking what "intuition" really is, how it develops and happens in the brain, and what would be needed to build AI that has intuition.
Douglas R. Hofstadter has written multiple books on the subject. He most directly addresses it in "Surfaces and Essences" co-written by Emmanuel Sander.
It is enjoyable reading and very thorough. Pending revolutionary new insights I might even regard it as conclusive.
So does this mean that the era of automated theorem proving and AI-driven mathematics is finally upon us? I'm no expert, but this seems pretty groundbreaking.
My understanding from the article (while I would like to read the paper, I don't have time at the moment) is that it assigns probabilities that a conjecture is correct and improves these estimates over time. As such, something will only really be proven true when the probability hits 1. The summary says that this will occur in the limit, but that might take as long as proving things the traditional way and mathematicians don't like things that are probably true but not proven.
That said, I can think of a number of uses for such an algorithm. If you load it full of conjectures in your field that are known to be true, it might help you hone in on which problems are worth exploring by providing a guess at how likely it is that you can prove a statement you are pondering.
It's a pretty cool idea but nothing groundbreaking. There is plenty of AI-driven mathematics already, e.g. the Mizar theorem prover. Pollock used to advocate the idea of continuously trying to prove theorems that might be useful later. It's an underrated idea, in my opinion, and his system of defeasible reasoning based on graph theory is still one of the best.
What's new in this paper is the probabilistic component, trying to guess the outcome of complicated proofs. That's a neat idea, but nothing revolutionary. It may give rise to nice shortcuts for better efficiency.
The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use.
> The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use
I'm working in exactly this area, and it's very nice to see it mentioned occasionally as a useful direction!
There's a bunch of nice work being done on this problem; I'm mostly familiar with (roughly chronologically) IsaScheme, IsaCoSy, QuickSpec, Hipspec and Hipster. These take in a bunch of function definitions and output equations about them; they work by enumerating (type-correct) terms and using random testing (QuickCheck) to quickly separate unequal terms from each other, then they apply automated theorem provers to the remainder.
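For a flavour of the enumerate-and-test step, here's a toy sketch of my own (not QuickSpec/HipSpec themselves; the tiny signature is made up): enumerate small terms over a couple of list functions, fingerprint them on random inputs, and report terms that agree on every test as candidate equations to hand to a prover.

```python
import itertools, random

# Toy signature: a couple of list functions and two list variables.
FUNS = {                              # name -> (arity, implementation)
    "rev": (1, lambda xs: list(reversed(xs))),
    "app": (2, lambda xs, ys: xs + ys),
}
VARS = ["xs", "ys"]

def terms(depth):
    """All terms up to the given nesting depth, as nested tuples."""
    if depth == 0:
        return [(v,) for v in VARS]
    smaller = terms(depth - 1)
    out = list(smaller)
    for name, (arity, _) in FUNS.items():
        for args in itertools.product(smaller, repeat=arity):
            out.append((name, *args))
    return out

def evaluate(term, env):
    if len(term) == 1:
        return env[term[0]]
    _, impl = FUNS[term[0]]
    return impl(*(evaluate(arg, env) for arg in term[1:]))

def show(term):
    if len(term) == 1:
        return term[0]
    return f"{term[0]}({', '.join(show(arg) for arg in term[1:])})"

def conjectures(depth=2, tests=50):
    # Random test environments play the role of QuickCheck here.
    envs = [{v: [random.randrange(5) for _ in range(random.randrange(4))]
             for v in VARS} for _ in range(tests)]
    buckets = {}
    for t in dict.fromkeys(terms(depth)):             # dedupe, keep order
        fingerprint = tuple(tuple(evaluate(t, e)) for e in envs)
        buckets.setdefault(fingerprint, []).append(t)
    for ts in buckets.values():                       # same fingerprint =>
        for a, b in zip(ts, ts[1:]):                  # candidate equation
            print(show(a), "=?=", show(b))            # hand these to a prover

if __name__ == "__main__":
    conjectures()   # e.g. prints: xs =?= rev(rev(xs))
```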
There are also more first-order, less computationally-focused systems for generating theorems out there, like HR and Graffiti.
> AI-driven mathematics already, e.g. the Mizar theorem prover
Can you tell us more about how Mizar is AI-driven? I have never worked with it, but my understanding was that it was a fairly normal proof assistant. That is, proofs are written by humans, and some smallish boring intermediate steps are done using a regular first-order prover. Like with Coq or Isabelle.
Does Mizar do something else as well? Does it use AI to make conjectures?
Not really, I know people who work on Mizar but I'm not myself involved. Anyway it's also subjective what you call AI-driven and what not.
I can comment on another thing, though. Even very simple concepts like well-foundedness conditions go beyond first-order logic and these provers are based on pretty expressive higher-order type systems. AFAIK, they can prove fairly substantial theorems.
I didn't say that the proof assistants were first order! I'm well aware that that's not the case. But AFAIK all the automation they use is first order. Higher order proof steps like "do this by induction on n" are always introduced by hand (at least in Coq and Isabelle), even if the rest of the proof is automatic.
I don't understand what you mean by "all the automation they do is first order". They are based on higher-order proof theories, e.g. higher-order tableaux, and they implement higher-order unification. Otherwise they would be first-order provers and thus much more limited.
But yes, most common higher-order provers are semi-automatic; you need to give them a hint about which proof strategy to use. That's mainly because they are used that way, not because of any limitation in principle. You won't find many mathematicians who are interested in having a theorem prover spit out some (alleged) theorem by itself and then letting the mathematician check whether it's useful.
The only fully automatic higher-order theorem prover that I know of is ETPS, it will select proof strategies by itself if you don't indicate them. But it's also one of the oldest and slowest and mainly just used for teaching logic.
> I don't understand what you mean by "all the automation they do is first order".
I meant that the tactics of Coq that do reasoning for you, and the internal/external provers of Isabelle, are first order. I was probably partly wrong: you are right that some of them do use higher-order unification. But when in Coq I use "auto" or "omega" or whatever tactic to solve a goal, no higher-order tableaux are in use as far as I know. I have to massage the goal until I get it into a form that is palatable to the first-order automatic provers. Similarly, when I write an Isabelle proof like "from A have B by X; from this have C by Y; hence D by Z", the proof methods X, Y, Z are first order, often off-the-shelf SMT provers. Alternatively, there are also some built-in methods that use simple equational reasoning with higher-order unification, yes.
Let me know if I'm wrong about the details of this! Anyway, none of this means that you cannot prove complex higher-order stuff in these systems. You just can't do it automatically.
And, coming back to the start of this subthread, I don't think Mizar is really different in this regard.
Keep in mind that theorem proving was one of the very first applications of AI (e.g. Newell and Simon's "Logic Theorist" in the 1950s), so I would take any excitement with as much salt as any other AI claim ;)
There are a few hurdles to overcome before computer/AI-assisted mathematics really 'takes off', for example:
Almost all mathematics is aimed at a human reader; arguments are written in prose, and formula markup only exists to guide the appearance when rendered, i.e. LaTeX; just like HTML, it's technically all marked up and machine readable, but the semantic information we can extract is very low.
Whilst OCR, etc. will keep progressing, I think the real solution is to have people (or their tools) place semantics first and rendering second, e.g. with formats like OpenMath; to do this, we need to provide compelling reasons, e.g. automated assistance, inclusion in repositories, automated citations for those who use your results, etc.
Another problem is that there are many incompatible systems; if some result is formalised in a different system to the one you're using, your best option is to either switch system or attempt to re-prove it yourself. There are ongoing efforts to provide a more abstract overlay, so that results from one system can be re-used in another (providing their logics are somehow compatible), e.g. https://kwarc.info/projects
Another is how low-level automated reasoning currently is; even something which looks like a pretty clear instruction, like a step which says "by induction", involves such a huge search space that existing algorithms blow up. Working mathematicians, quite rightly, get fed up of the tedium of spelling out each individual step in such excruciating detail. It's just like with software, but imagine that you've spent your career working with a super fast Prolog system with a well-organised standard library built up over a thousand years, and you're then asked to program machine code by flipping switches on a slow machine with no existing software ;)
It's not clear if there are any practical applications to automated theorem proving (and that's not the goal of the work). This is primarily motivated by very "theoretical" decision theory.
The universal semimeasure is "computably approximable from below", aka (lower) semicomputable, meaning that you can computably list out all the rational numbers below a given value assigned by the measure. The probabilities assigned by a logical inductor are computably approximable reals, which (confusingly) is weaker than lower semicomputable; it just means that you can compute a sequence of rationals that converges to the real, with a possibly uncomputable convergence rate.
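Spelled out (these are the standard recursion-theory definitions, nothing specific to the paper):

$$x \text{ is lower semicomputable} \iff \text{some computable rational sequence } q_0 \le q_1 \le \dots \text{ has } \sup_k q_k = x,$$
$$x \text{ is computably approximable} \iff \text{some computable rational sequence } (q_k) \text{ has } \lim_{k\to\infty} q_k = x,$$

with no monotonicity and no computable rate of convergence required in the second case.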
I'm not an author but this property with respect to a universal 'semimeasure' holds in the limit as time goes to infinity, as do most stated properties of logical inductors. The fact that they're computable is academic.
This is right, but perhaps misleading; most of the properties are "asymptotic", meaning that they may take an extremely long time to hold, but they hold at finite times. For example, "provability induction" says that if you have a (polytime computable) sequence of sentences phi_n, all of which happen to be provable (possibly with fast-growing proof lengths), then P_n(phi_n) limits to 1. This means that on day n the logical inductor P is confident of phi_n, even though phi_n may take much longer than n steps to prove.
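In symbols, for any efficiently computable sequence $(\varphi_n)$ of provable sentences,

$$\lim_{n \to \infty} \mathbb{P}_n(\varphi_n) = 1,$$

even when the shortest proof of $\varphi_n$ is far longer than $n$.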
The key is that the generator of such sequences has limited resources; once the inductor has learned the (implicit or explicit) program generating these sentences (perhaps by simulation, as in Solomonoff induction), it can apply that reasoning to the individual sentences; e.g. if it believes it's verified the correctness of the generator, that is enough justification to believe the sentences being produced.
This is in the vein of "prediction using ensembles of experts" methods such as SI, with a twist that the experts are traders, not forecasters; they don't have to have opinions on everything the logical inductor has to predict, the traders just have to point out particular ways that the logical inductor is being silly (and then the logical inductor corrects those problems).
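As a cartoon of the "prices no trader can exploit" idea (my own toy, not the paper's construction, which uses a more careful fixed-point and budgeting scheme): each trader maps the current price of a sentence to a desired trade, and the market maker nudges the price until net demand vanishes.

```python
def clearing_price(traders, price=0.5, step=0.01, iters=100_000, eps=1e-9):
    """Nudge the price of one sentence until the traders' net demand is ~0."""
    for _ in range(iters):
        demand = sum(t(price) for t in traders)   # +: wants to buy, -: wants to sell
        if abs(demand) < eps:
            break
        price = min(1.0, max(0.0, price + step * demand))
    return price

# Two hypothetical traders with opinions about a single sentence:
bull = lambda p: max(0.0, 0.8 - p)   # buys while the price is below 0.8
bear = lambda p: min(0.0, 0.6 - p)   # sells while the price is above 0.6

print(clearing_price([bull, bear]))  # ~0.7: neither trader can profit further
```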
Yeah. It's still surprising to me, though; P_n can predict extremely long-running computations, even ones with a much longer runtime than P_n, at least as well as any quickly computable "pattern". (The algorithm in the paper uses a (roughly) double-exponential-time algorithm to predict arbitrarily long-running programs, in a way that can't be improved upon by any polytime computable method.)
At least public practical research. But I've long wondered what their bank account would look like if they had a side venture into exploiting narrow AI for commercial use rather than rely on donations.
MIRI is entirely Blue Team - they work to create theoretical [0] safeguards on AI agents yet to be developed. I've long envisioned a counterpart Red Team that does nothing but build AIs that attempt to subvert these safety features, since the rest of the world of non-Friendly-AI [1] research is only uncoordinated para-red behavior.
[0] In the sense of "valid under these known precepts", not "speculative".
I find that perception fairly surprising, as for a very long time it felt like we did more red team than blue team. I do acknowledge that this has been changing recently, but only significantly in the context of building on the results in this paper.
Would you please direct me to an example of MIRI's Red Team efforts that isn't the recent "Malevolent AI" paper [0]? Adherence to the belief that UFAI is a threshold-grade existential risk seems to compel a "define first, delay implementation" strategy, lest any step forward be the irrevocable wrong one.
This makes me think the only thing we disagree on is the meaning of the words "red team" and "blue team" :)
When I say it feels like we spend a lot of time red teaming, that means I think we spend somewhere between 30 and 60% of research time trying to break things and see how they fail.
This is fully compatible with not immediately implementing things - it's much less expensive to break something /before/ you build it.
It is refreshing when only the maps, and not the objects, are under serious contention. I suspect I still might prefer walking a shade closer to the line dividing definitely intra and potentially extra boxed agents, but you are the ones actually in the arena - do keep up the interesting work.
My concept of practical is stuff like: implementing things, doing experiments with prototypes, producing stuff that is imperfect but does something and can be improved upon.
Theoretical stuff is like: proving theorems, conceptualizing the task at hand, philosophical inquiry into the nature of agents/intelligence/reasoning/goals/human values
I'm not trying to argue which is more important, but surely MIRI focuses more on the theoretical.
Didn't the CIA have some experiment where they pooled a bunch of folks who were really good at guessing world events? Their construction sounds awfully similar to that experiment.
This paper is a theoretical contribution, not a practical one, similar to Solomonoff Induction. Solomonoff Induction's contribution is essentially "there exists a formal mathematical process that inductive reasoning on evidence corresponds to" which, among other things, pretty much solves epistemological puzzles like the https://en.wikipedia.org/wiki/Raven_paradox from philosophy.
Yeah, the actual performance of Solomonoff Induction is uncomputable, but to me the useful point is that "induction can be done mathematically", and then what we do heuristically in our brains can be thought of as a low-fidelity analog of that. If I'm understanding the page correctly, this is the same idea but for statements based on proofs and logical theorems. Which seems to expand the scope somewhat.
(I'm really excited about this, actually, just as a person who enjoys learning about this stuff from Wikipedia. I feel like I've vaguely thought about how Solomonoff induction would work on statements that are derived from each other (or when combined with type-checking, since type-checking is closely related to theorem-proving), but had no idea how to even ask a precise question much less make anything of it.)
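For anyone who wants something concrete to poke at, here is a crude, resource-bounded caricature of the idea (entirely my own toy; real Solomonoff induction mixes over all programs and is uncomputable). The "programs" are just order-k rule tables ("the next bit is a fixed function of the previous k bits"), weighted by description length and filtered by consistency with the data.

```python
from itertools import product

def idx(context):                      # binary context -> table index
    i = 0
    for b in context:
        i = 2 * i + b
    return i

def hypotheses(max_k=3):
    """Every order-k rule table, with a rough 2^-(description length) prior."""
    for k in range(max_k + 1):
        for table in product([0, 1], repeat=2 ** k):
            yield k, table, 2.0 ** -(k + 2 ** k)

def consistent(k, table, bits):
    return all(table[idx(bits[i - k:i])] == bits[i] for i in range(k, len(bits)))

def predict(bits, max_k=3):
    """Posterior-weighted probability that the next bit is 1."""
    w1 = wtot = 0.0
    for k, table, w in hypotheses(max_k):
        if len(bits) >= k and consistent(k, table, bits):
            wtot += w
            w1 += w * table[idx(bits[-k:] if k else [])]
    return w1 / wtot if wtot else 0.5

print(predict([1, 0, 1, 0, 1, 0, 1, 0]))   # ~1.0: every surviving short rule says "1 next"
```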
That's interesting; I had never heard of Solomonoff Induction until I read your comment. But how important, philosophically, is the existence of such a system, given that the brain already exists as the "low fidelity analogue"?
The limits on what computers can do are related to logical no-go theorems such as Gödel's which are about proofs -- examples of certain knowledge. But once you have accepted the fallibility or "low fidelity" of human reasoning, then all those no-go theorems are no longer relevant in the first place.
Solomonoff induction is useful because without it there's no model of induction with infinite computational resources. "Logical induction" is not useful in the same way because without it we already have such a model: simply prove/disprove the propositions.
And what if the propositions we want to reason about are self-referencing or say something about the system itself in a Gödelian way? You need probabilistic reasoning for that to actually work.
Also, "simply prove/disprove the propositions" requires infinite computational resources (we don't know how long the proofs will be or if there are any). Logical induction does not.
I think the purpose of this is to predict proofs ahead of time. Which could be useful in the management of people. As they say in the intro, given a group of specialists, this algorithm can guess, on a daily basis, which of the things they are working on are more likely to be true, and which are more likely to be false. Before the proof is finished.
(Primality is also a logical (analytic) truth and we are satisfied with probabilistic proofs - of course only because the risk is known and controllable.)
That's not quite correct: most prime-proving algorithms are non-deterministic in their computation, but they don't necessarily produce false positives (like Miller-Rabin does); algorithms that only produce false negatives also exist (e.g. ECPP).
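To make the one-sided error concrete, here is a textbook Miller-Rabin sketch (my own, for illustration): a prime is never declared composite, but a composite can slip through as "probably prime" (the false positives mentioned above) with probability at most 4^-rounds.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin. 'False' is definitive; 'True' means probably prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:                  # write n - 1 = d * 2^s with d odd
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # witness found: definitely composite
    return True                        # no witness in `rounds` tries: probably prime
```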
Could you elaborate on ECPP false negatives? A properly working ECPP should never give false negatives, e.g. return "prime" for a composite. Since ECPP can give a certificate unlike AKS or APR-CL, the caller could do a relatively quick check of the math.
Some proof methods:
- BPSW. Deterministic, completely correct for all 64-bit values. Purely a compositeness test above, though no counterexamples known. This matches the false-positive idea -- above 64-bit it returns one of "definitely composite" or "probably prime."
- BLS 1975 methods. Relies on partial factoring N-1 and/or N+1 so unless the input is a special form, only practical to ~100 digits. No false results if the partial factoring can be done, and even gives a certificate of primality.
- APR-CL. Deterministic. No false results. Fast and practical up to ~5000 digits (one can debate where the impractical size line is). No certificate.
- AKS. Deterministic. No certificate. No false results. Very slow, so not generally used.
- ECPP. Non-deterministic (randomness is used internally), but no false results. Generates a certificate. Primo is practical up to ~30k digits (depends on your hardware and patience, but 10k digits on modern computers is quite practical). Open source implementations aren't as efficient, but still 1k+ digits is very reasonable. It is possible an implementation might be unable to proceed for various reasons and could return "gave up - no primality decision made" in addition to the choices "definitely composite" or "definitely prime (certificate included)". That's really a limitation of the implementation or the caller's patience.