Seems to me that at some point we want theorem provers to be self-hosting, as it were.
That of course won't fix problems with bugs in hardware, and hardware that behaves like analog devices (rowhammer, cold boot, data recovery, stray gamma rays...)
However, prover[p-1] can prove prover[p], modulo the bugs in prover[p-1]. Then prover[p-2] can prove prover[p-1], modulo the bugs in prover[p-2]. If each prover[i-1] is simpler than prover[i], and prover[0] is so simple that any 5-year-old can prove it correct, then you win.
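A rough sketch of the shape of that chain (the names and the check() interface are made up for illustration, not any real prover's API):

    # Hypothetical sketch: prover[i-1] checks a proof that prover[i] is a sound
    # checker, so trust ultimately rests on prover[0], which is assumed small
    # enough to audit by hand.
    def chain_is_trusted(provers, soundness_proofs):
        for i in range(1, len(provers)):
            # prover[i-1] checks the soundness proof for prover[i],
            # modulo any bugs in prover[i-1] itself
            if not provers[i - 1].check(soundness_proofs[i]):
                return False
        return True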
Something seems wrong about this. Could you not describe the combination of all provers as a formal system? Would this system then not run afoul of Goedel?
In practice, it is still worthwhile to have N redundant independently developed provers to prove each other, and to benchmark against each other. This will help uncover subtle bugs that result from ordinary programming mistakes.
This will, however, not protect against deeper issues that are common to all N provers, e.g. potential biases or mistaken assumptions in the formal verification community.
At the end of the day, it is an engineering decision: how much of the available resources are you willing to throw at the practical problem of making your implementation approach the theoretical limits asymptotically? The fact that those limits are indeed a peg or two below perfection is an independent issue.
"In practice, it is still worthwhile to have N redundant independently developed provers to prove each other, and to benchmark against each other. This will help uncover subtle bugs that result from ordinary programming mistakes."
I strongly advise that approach to deal with both mistakes and subversion: mutually suspicious parties in various countries using different logics, compilers, hardware, etc., against a common, executable spec that's basically state machines or functions. That way they get the same output from the same input.
So, when I saw Milawa was verified first-order, I immediately looked for other verifications of first-order provers (found 1). Then the idea would be diverse implementations of executable parts of that stack with diverse reviews of logic combinations they did. Exhaustive testing done by each party. Even do implementations of key parts of TCB in other logics to check results. The prover with its code is trustworthy when everyone got the same results on same logics, code, shared tests, and individual tests. Plus have paper copies of the resulting text files with hashes stashed away somewhere in case a nation-state wants to throw money at countering diversity via hacking host systems. ;)
Then we rinse and repeat for more complex provers. The fun never stops in this game.
The end result you get out is something like "if the prover n accepts the claim P, then prover 1 also accepts the claim P". In logical terms you can think of it as a relative consistency proof.
In practice, even much simpler theorems would already raise our confidence a lot. E.g., the inference rules of Coq can be stated in a couple of pages of text, but the kernel checker that you need to trust is 14k lines of OCaml. A proof that the checker actually correctly implements the inference rules would already catch the majority of the bugs that happen in practice. Because of Gödel we then can't hope for a proof in Coq that there is no way to use those inference rules to derive False, but that's less of a concern.
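As a toy illustration of how small that trusted core can be in principle (this is not Coq's kernel, just a made-up modus ponens checker):

    # A deliberately tiny "kernel": formulas are tuples, ("->", A, B) is A -> B.
    # Each proof step must be an axiom/hypothesis or follow by modus ponens from
    # earlier steps. The point is that this is the only part you have to trust.
    def check(axioms, proof, goal):
        derived = []
        for step in proof:
            if step in axioms:
                derived.append(step)
            elif any(("->", a, step) in derived for a in derived):
                derived.append(step)          # modus ponens: from A and A -> B, get B
            else:
                return False                  # unjustified step: reject the proof
        return goal in derived

    p, q = "p", "q"
    print(check({p, ("->", p, q)}, [p, ("->", p, q), q], q))   # True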
> Gödel proved that we will never get such a prover
Not one that can handle Peano arithmetic—but there are plenty of interesting systems weaker than Peano arithmetic. Presburger arithmetic (https://en.wikipedia.org/wiki/Presburger_arithmetic) is the one that springs to mind; I doubt that it's directly useful as a theorem prover, but it does show that interesting mathematics can be done in complete theories.
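For a concrete feel, here's a toy check of a Presburger sentence, assuming the Z3 SMT solver's Python bindings are installed (Z3 does the deciding here; this only shows that such sentences can be settled mechanically):

    # "Every integer is even or odd" -- addition and multiplication by constants
    # only, so this is a Presburger sentence and is decidable.
    from z3 import Int, ForAll, Exists, Or, Not, Solver, unsat

    x, y = Int('x'), Int('y')
    claim = ForAll([x], Exists([y], Or(x == 2 * y, x == 2 * y + 1)))

    s = Solver()
    s.add(Not(claim))              # valid iff the negation is unsatisfiable
    print("valid" if s.check() == unsat else "not valid")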
> Right, but the system we're interested in is normal computers, which is more powerful than arithmetic.
On the other hand, a theorem prover does not have to be able to prove everything about a system to be able to prove some useful things. One might also say "the system mathematicians are interested in is all of mathematics, which is more powerful than arithmetic", or, for that matter, "the system programmers are interested in is all of theoretical computation, which is (at least) Turing complete"—but sometimes it's useful intentionally to restrict our power. (I think of https://news.ycombinator.com/item?id=10567408 , for example.)
(If we're being very picky, then, as pfortuny points out (https://news.ycombinator.com/item?id=12762192), a computer, having only a finite address space, isn't even as powerful as ordinary arithmetic; but it's fair to guess that this isn't a useful kind of pickiness.)
This is actually a common misconception perpetuated by zealous Scientific American reporters (and cognitive scientists :) . Many complete, consistent self-justifying axiom systems exist today. Formal verification schemes employing metamathematical logic also exist and can be both complete and consistent. Even MORE systems have been created that can verify their own semantic tableaux exclusively (and by extension are consistent & decidable). Even still, there are systems like Presburger arithmetic, which can be proven decidable (although they cannot prove themselves).
On page 12 of the English translation (and possibly the original German), Godel addresses certain criteria for distinguishing a sufficiently powerful system (as well as recursively axiomatizable, etc.).
I have personally written such a software verification scheme, and even have an inferior interaction mode for SMT solvers that can help you verify such schemes (it's in my profile & github).
There are also great books like the extremely cheesily-named graduate textbook: "Your Guide to Automated Reasoning", that examine this issue from every perspective imaginable.
Honestly this objection always seemed very handwavey to me, especially given how much abuse the incompleteness theorems endure during extension beyond statements about the Natural numbers. Like, what exactly happens if you try to write a bootstrapped theorem prover? Does a deity descend from the heavens to unplug your keyboard? I'd be very interested in reading a paper or blog post documenting a practical attempt here.
I don't mean to be rude, but this objection only seems handwave-y if you don't understand Gödel's proofs (common), or you have some refutation of them (obviously uncommon, but definitely interesting).
This is a good not-too-technical but not-condescending explanation https://www.amazon.com/G%C3%B6dels-Proof-Ernest-Nagel/dp/081... (it does get into the technical details; it just doesn't go line by line as you would if you wanted to recreate the proofs).
As long as you have the expressive power of Peano arithmetic (which all programming languages ought to, and definitely all proof systems ought to), you can find a Gödel sentence in the system and thus prove that it can't prove, of itself, "x is consistent".
It's worth looking into, if only for the enjoyment.
It isn't at all obvious that "a theorem prover proving itself correct" and "an axiomatic system proving its own consistency" are the same thing, beyond both being in the category of things which refer to themselves in the domain of mathematical proofs. Software correctness isn't the same as consistency. A program is correct if it refines its specification, and an axiomatic system is consistent if it does not lead to contradiction. Before doing a deep dive into Gödel we need to firmly establish we're actually looking at the right problem.
Even if Gödel is shown to apply in the strict generalized theoretical sense, it might be irrelevant to practice. Theory and practice often differ enough for practice to be very useful; we easily prove halting for most programs we care about, compute solutions to average-case NP-Complete problems without a noticeable rise in CPU temp, and (as the other poster at this level said) write compression algorithms which work well on all relevant inputs. This is poor analogous reasoning, of course, but I include it because demonstration of universal practical pitfall would be very interesting in much the same way specific manifestations of availability loss in fault-tolerant consensus algorithms are interesting.
Really though, I am tired of the above exact exchange playing out in every thread I've seen here on formal verification. Someone raises the obvious concern of bugs in the theorem prover. Someone else raises the bootstrapping solution, cribbed from compilers. Someone else raises the Gödel objection; and there the conversation dies. Dig deeper.
Good write-up. I hear you about the patterns of debate that crop up in these threads. My favorites are the ones where people show up with specific logics, techs, or R&D suggestions that others and I learn from. Such an approach might lead to people solving hard problems over time.
My favorite counter to the Godel thing, especially about termination, was sklogic showing how far from reality these concerns were by illustrating that a while loop counting down toward zero with a "stop if zero" condition will guarantee termination of any algorithm working piece-by-piece in the inner loop, so long as the hardware functions. No magic math at all is required to defeat the termination requirement in the real world. I added that watchdog timers, or even the shelf life of modern parts, can do the same.
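Something like this, just to sketch the trick (the step-function interface is made up):

    # The "count down to zero" trick: do the real work piece-by-piece inside a
    # loop with explicit fuel. Termination no longer needs any clever math, only
    # the counter hitting zero (assuming the hardware keeps functioning).
    def run_bounded(step, state, fuel=1_000_000):
        while fuel > 0:
            state, done = step(state)
            if done:
                return state
            fuel -= 1
        raise TimeoutError("fuel exhausted; report failure instead of looping forever")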
I did something similar when I was concerned about infinite loops or DOS attacks from malicious input: just box it up in something I know will work, with something checking up on it. Another was algorithm theory telling me QuickSort performed better, but to choose HeapSort if the worst case is a problem. A smarter programmer told me to just time QuickSort's average case, run it with a watchdog, kill it on any instance that takes too long, and use HeapSort for that instance. Those are the kind of tips I appreciate about these terrible problems the theoretical side brings me. Hell, if only theory people would find, generalize, and expand on all the effective cheats in engineering instead of idealistic or abstract stuff. :)
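In code, that tip looks roughly like this (budget and pivot choice are arbitrary; a sketch, not a tuned implementation):

    import heapq, time

    def heapsort(xs):
        h = list(xs)
        heapq.heapify(h)
        return [heapq.heappop(h) for _ in range(len(h))]

    def quicksort(xs, deadline):
        if time.monotonic() > deadline:          # the watchdog check
            raise TimeoutError
        if len(xs) <= 1:
            return list(xs)
        pivot = xs[len(xs) // 2]
        return (quicksort([v for v in xs if v < pivot], deadline)
                + [v for v in xs if v == pivot]
                + quicksort([v for v in xs if v > pivot], deadline))

    def sort(xs, budget_s=0.5):
        try:
            return quicksort(xs, time.monotonic() + budget_s)   # fast on average
        except TimeoutError:
            return heapsort(xs)   # O(n log n) worst case for the rare bad instance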
But doesn't the constant Gödel reference push people, exactly, to work toward effective cheats? Imagine if the people putting in effort to make real systems instead spent their time trying to create the self-verifying prover.
No, it distracts people by making them think some legend in math showed verification was provably a waste of time. The number of people who show up to keep saying it on reputable forums like this one increases belief that it's true via the repetition effect. So, it's a net negative versus mentioning something relevant to practical verification.
Plus, hardly anyone is making a self-verifying prover. Someone just asked about that. I tried what I hope is the proper response of simply linking to a self-verifying prover that succeeded in that goal up to first-order logic. So, that's already done, with us just needing similar ones for checkers of HOL, Coq, ACL2, etc. Contrary to your implication, the Milawa techniques are based on and support some of the most practical verification work ever done. So, one of the few times someone totally ignored Godel in what non-experts would consider Godel's domain led to one of the best results in formal methods.
We don't need Godel at all in these discussions. We just need to explain what formal methods are, their successes, their limitations, how to properly use them, and ideas for new projects.
I understand the frustration, but the Gödel objection will always be raised as long as someone asks for a prover that can prove itself correct (read "proves false statements or fails to prove true statements," which just is consistency). To put a finer point on it, once you get how "an axiomatic system [can't prove] its own consistency," it actually is rather obvious that "a theorem prover [can't prove] itself correct." How would it? Without axioms or classical logic? What's it doing for the things that aren't it itself?
Notice though, Gödel's incompleteness didn't end formal logic (all the better for Gödel). I know we're in a public forum, but I would note that your "But what about the practical applications!?" is misplaced (though not poor analogous reasoning--I think you're hitting the nail on the head, honestly). Formal verification is not futile because of Gödel's proof. However, a self-verifying formal system is--which is a useful insight, especially for those interested in formal verification.
As an aside, I don't think you need to convince anyone that's actually recreated Gödel's proofs to approve of continued work on formally verified code. We would get excited about that stuff even if it weren't practical.
You do know that Shannon proved that general purpose compression is impossible, right? Why bother if that 75% compression ratio is provably impossible?
The incompleteness theorem effectively states that some statements will require an infinite amount of time for this bootstrapping approach to work. The only deity involved is the finiteness of the natural world.
You don't need to prove that Peano arithmetic is consistent! You just need to prove that the theorem prover doesn't contain any bugs, i.e. that it correctly implements Peano arithmetic. That's a much lower bar, and theorem provers have verified themselves in the past.
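As a miniature of that distinction, here's a made-up addition function spot-checked against the two defining Peano equations for +. It's "does the code implement the rules", not "are the rules consistent":

    # x + 0 = x and x + S(y) = S(x + y), checked on a finite sample. A real
    # prover would prove this for all inputs; the point is only what is claimed.
    def add(x, y):
        return x if y == 0 else add(x, y - 1) + 1

    assert all(add(x, 0) == x for x in range(50))
    assert all(add(x, y + 1) == add(x, y) + 1 for x in range(20) for y in range(20))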
Mmmhhhh.... Arithmetic in computers is not infinite, so one might have a consistency theorem for "computers with arithmetic modulo 2^3000", for example.
Your exponent is a bit too small, though; it should be 2^(number of bits of RAM), like 2^34359738368 for a computer with 4 GB RAM (plus a little bit for CPU registers).
Assuming that the computer programs under study aren't allowed to interact with a disk, of course. :-)
Just port Milawa to run on multiple architectures with separate CPUs or implementations from multiple vendors. Include the verified VAMP processor on two FPGAs. Check Milawa itself, then check the software in FOL until HOL is done. I have a paper somewhere on a HOL-to-FOL converter, too. :)
Note: I've seen a bunch of things implemented in Prolog. I figure it could be modified to execute the program and produce an execution trace of its logic that could be run through Milawa. Not specialist enough to say. A prior certified compiler for a Pascal-like language was specified in Z and shown equivalent to Prolog functions. So, there's potential here.
It's impossible to guarantee that your theorem prover is bug free -- for example, if it has a bug that causes it to always return "correct", then it proves its own correctness, even though it is clearly wrong.
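The degenerate case is easy to write down:

    # A "prover" that accepts everything will happily accept a proof of its own
    # correctness, which is why self-verification alone tells you nothing about
    # the checker you started with.
    def check(proof, claim):
        return True

    print(check(proof="...", claim="check() is correct"))   # True, and worthless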
This is not to say that self-hosting isn't useful, since most bugs are not likely to be of this form.
It's always been there. The messiness & limitations of analog were actually the main reasons for digital being invented in the first place. Then the digital stuff had its own issues, but they were way easier to surmount with formal methods, or just computationally, like heuristic placement of parts. Each problem that was found could be dealt with using known algorithms or verification techniques. There have been many correct-by-construction methods for building stuff. So, we just apply those without the weird, over-complicated additions that happened largely for marketing reasons. The result is a highly correct reference implementation, there for verification or for checking the optimized ones.
Now, the analog and RF portions are The Devil. These will be a (censored) to get correct with automated means. They already are with manual methods. There's been some progress in synthesizing and verifying them, but it's still mostly ad-hoc, manual work. Best bet is that the analog portions, as much as one can manage, happen with simple components on as old a process node as you can use. Fewer design rules, less weird physics, more reliability due to maturity of processes, and more synthesis results. Either the whole thing is done on old nodes, or we pull a Tekmos: old (e.g. 350nm) for the analog with new (e.g. 90nm) for the digital part, with careful integration.
EDIT: To address other commenter, put it behind EMSEC shielding to address that. The analog components can also be designed for fault-tolerance against things like cosmic rays or voting configuration used to correct errors.
I think the solution is to formally verify your hardware with a prover that is aware of electromagnetic forces. Software has to assume correctness of hardware (rad-hard applications have to assume ridiculous hardware faults are possible, but that's to prevent solar flares from corrupting firmware, not Rowhammer).
Thought experiment; you are some sort of intrusion prevention system. You've detected that your hardware is untrustworthy. Now what?
Think of it as double-checking your code. If you had a bug per 100 LOC, you can reduce that to a bug per 1,000-10,000 LOC. Which is achievable with other methods too, without the burden of formal verification.
Are the other methods for reducing errors to one bug per 10000 LOCs measured to be less burdensome? I've read with interest about how Jet Propulsion Laboratory produces code with low defect rates for spacecraft (without formal verification), but it appears to be a pretty drastic departure from ordinary commercial practice too.
All three have some upfront time in specification, with Praxis doing refinements, too. However, they all knock out tons of debugging and integration problems since errors don't slip in at all. Altran/Praxis charges a 50% premium on top of a normal contract, so the extra labor is probably less than 50% given they're profitable. Cleanroom anecdotes showed the reduced debugging and maintenance issues meant it actually barely cost anything extra in time or money. A few outliers had it saving time.
There's also a number of people and companies that are encoding business logic into specs that they implement with logic programming languages. This eliminates mismatch between business requirements and code with quite a bit of productivity and safety advantages if it works in one's use-case. I haven't evaluated their effectiveness but here's two:
Which is great given they're precise statements about what you want to do that are easier for humans and computers to analyze. They don't have the language-level problems either. Tools can then turn those into code or do much of the work from there.
If your spec is wrong, your code will be wrong anyway. More often, people use the code as the spec, which is IMHO (for this kind of programming) really bad.
Technically, 3) No bugs in your proof checker. A bug-infested theorem prover can still be useful.
As long as I'm well-actuallying, 2) The proven properties establish my own informal goals. If a hacker's goals are also satisfied, well, maybe they want to DDoS the internet but it's Not My Problem.
Anecdote – Erlang designer Armstrong does contract work for formal verification with high-assurance applications. A widget-maker for a large European car manufacturer wanted to ensure their widget was fully compliant with all of the specifications as issued by the overseeing administrative authority (ISO, ANSI, whatever). He and his team of engineers sat down and took 1k +/- .2k pages of written specifications, translated them into invariant properties, and let 'er rip. Of the ~foo bugs they found, ~40% were internal inconsistencies within the specification itself. [An average car of model year 2016 has somewhere between 500-800 MCUs within it. Scary prospect…] So, let's say LDAP's RFC has 3 'bugs' in it (a proposition p in section one, then a proposition ~p in section 3). Microsoft wants to implement Active Directory to have LDAP compliance to work with Kerberos, Apple devices, etc. By definition, this implementation will be buggy. These bugs are of a class which has no fix!
... if you want to read some formal proofs from the software and mathematics worlds, see https://www.isa-afp.org/
Haha, that was my comment! Glad you liked it. Never thought it'd show up on HN. I normally don't comment on Quanta, but there were so many half-informed remarks propagating misinformation that I felt compelled to make not one, but two, comments in that thread (and formal verification isn't even my domain of expertise!)
The practice of formal methods has progressed by acknowledging and accepting those limitations. It now attempts to prove useful properties of your code, rather than stamping it "correct" or "incorrect."
FTA:
“We’re not claiming we’re going to prove an entire system is correct, 100 percent reliable in every bit, down to the circuit level,” Wing said. “That’s ridiculous to make those claims. We are much more clear about what we can and cannot do.”
Yeah, they really need to knock that off. "Formal verification creates hard-to-hack code" is still a decent headline. Or "Researchers using formal verification stopped hackers in their tracks." That's entirely accurate, plus it can be followed by specifics to understand the context in which it was true and the contexts in which it might be true elsewhere.
Hacker-proof is just going to hurt branding of formal methods the second a hacker takes out a high-assurance system. Whereas, the formal methods and high-assurance literature written by anyone with sense acknowledges plenty of assumptions that might be targeted plus just human error. Even the people they quote in the article don't say it's hacker-proof!
It's true but that's a specification problem. The hardware should be in the model since it's in the TCB. The earliest tools used for high-assurance security, like Gypsy Verification Environment, had information-flow analysis for detecting covert channels. Today, we have information flow control, distributed version of it, and hardware versions of non-interference that go to the gates. We're actually past being able to handle that with current tooling if ultra-high-performance is a non-goal.
It's the analog and RF side channels that are going to screw them. DOD et al did decades of cat-and-mouse games in that area with their TEMPEST activities. Open research is just getting started. Just waiting for The Flood. :)
See: Microsoft Outlook deciding that executable attachments were a good idea. The feature worked according to spec (I assume), but wow did that ever open the door to a whole lot of hacking...
> "Beware of bugs in the above code; I have only proved it correct, not tried it." - Donald Knuth
"Beware of people slinging around the famous Knuth quote as if it invalidated progress in program proving." - Me
(Not a reference to you personally, but I think that this quote is too often used to disparage all work on program provers, rather than as just—as I take it to be—pointing out that, no matter how good your prover is, you should still check your code.)
That and the Godel one above. For the first point, they could link to a presentation on high-assurance systems showing one needs a combination of specs, proofs, testing, and human review for max assurance. For the second, the alternative is showing how the prover problem is greatly reduced by using simple proof checkers and/or logics in the process that are easy to verify by hand.
This "quote" is probably bogus. When I was at lunch with Knuth once (brag, brag), someone asked him where he said that. Knuth said he couldn't recall saying it and he doubted that he ever said it, because he would not attempt to prove code without also testing it. Then he said it might be a paraphrase (not a quote) from an analysis he made of an assembly language program where he did not have hardware with that instruction set available to him.
Side-channel attacks, attacks that depend on physical characteristics of the system, are a thing [0] and formal correctness doesn't really prevent them.
Also there's crazy stuff like Row Hammer [1] which depends on escaping the abstraction that memory bytes are independent entities.
While these are maybe crazy or extreme attacks, they are still practical in some cases. Any abstraction can eventually be hacked simply because an abstraction is a simplification of a more complex system which can be manipulated.
Reminds me of the joke of the junior functional programmer convinced that their code is side effects free and who then proceeds to crash the garbage collector because memory allocation is not considered a side effect...
The military, EE's, and even RAM suppliers knew about those issues. There are products in high-security (esp Defense) that try to counter as many as they can afford to. The problem was the knowledge was ignored or intentionally not applied. The first just keeps happening in general in so-called INFOSEC. The second is a result of market incentives where the suppliers' goal is to hit specific performance, energy, area, cost, etc targets to maximize market share and profit. On cost, they also want to minimize that in general for the executives and shareholders to pocket more income. This really affects RAM vendors as people involved in QA have said on various forums that they're basically just pushed to get shit out the door as quickly as possible since competition is intense. That they do this with ever-increasing complexity in designs and physics in shrinking nodes means problems can only go up.
So, it's not a new discovery in terms of hardware correctness or even an accident. These problems are intentionally there to increase profits. Compare that to the design of things like 1802, NonStop, AAMP7G, rad-hard ASIC's... all that trade off marketable criteria at higher prices to get higher reliability. They exist but almost nobody buys them. That's why such practices aren't and won't be status quo.
EDIT to add NonStop link to show what earlier systems accomplished with careful hardware and software design. Similar, even more clever, techniques were used inside ASIC's as well.
The point that I was trying to make is that they have proven correctness within a specific abstraction, but the article incorrectly says the code is "hacker-proof". Hacking in most cases happens because we can break through abstractions.
"The point that I was trying to make is that they have proven correctness within a specific abstraction, but the article incorrectly say the code is "hacker-proof"."
The title is terrible as several of us argued here. The article itself at least describes the process accurately with many quotes talking about how tricky it is or that you should test it anyway. Just a terrible title...
No new breakthroughs in formal proofs or deriving programs, just incremental progress. The cost of formal methods stays high. The high cost means that it does not work well with iterative programming, or changes in specs.
So far, only some military and safety-critical applications and parts of Intel's silicon have been worth the cost.
I took my first course in formal methods using the book Programming: The Derivation of Algorithms by A. Kaldewaij. It would be perfectly reasonable to implement something similar to Kaldewaij's guarded programming language today over some other programming language and use it to derive formally proven programs if someone really values the correctness. It would be also cool and fun but not for everyone.
"No new breakthroughs in formal proofs or deriving programs, just incremental progress. The cost of formal methods stays high. The high cost means that it does not work well with iterative programming, or changes in specs."
That's a load of crap. SPARK lets you practically throw together embedded programs with code-level proofs. CompCert was a huge leap over the non-optimizing, toy compilers that came before it. CRYPTOL specified algorithms at high level with C generation. LANGSEC is automating secure parsers. Another did protocols. The seL4 project included AutoCorres that can pull HOL specs right out of C projects. Verisoft gave us a full stack with processor (VAMP), C-like language, compiler, OS, and apps. Myreen and Davis have verified ASM, LISP, ML, extraction, provers, and hardware. Rockwell-Collins has a flow from Simulink models to specs and SPARK code for their mathematically-secure AAMP7G. Recent work verifies non-interference and other properties down to the gates.
There's been a ton of work knocking out whole categories of problems with no effort, followed by other work that lets us do full verifications with a fraction of the prior effort. You should read up on successes like those before claiming nothing has been going on. Heck, let's skip to one of the best, where non-experts verified an ext2 filesystem in a fraction of the labor it took to do seL4:
What you describe is software tool chains that can take formally verified (or partially verified) code and spit out lower-level code that is also formally verified. There is no theoretical breakthrough there, just lots of new tools.
I think you should describe what counts as a theoretical breakthrough to you. The rest of us seem to think that writing some flowcharts or nearly English code in 10 minutes with a verified application coming out is a mix of great advances in theory and tooling over the years. Sure didn't do that in the COBOL days.
Another example is the K framework, where they just use rewrite rules in a decidable logic with some supported tooling. That specific theory, their logical strategy, let them do a whole semantics for C, undefined behavior included, in just a few thousand lines of specs. In the form of KCC, it can also be used as a compiler.
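Not K itself, but the general flavour of rewriting-based semantics looks something like this (a made-up toy expression language, nowhere near a C semantics):

    # Small-step semantics as rewrite rules: keep rewriting the configuration
    # until no rule applies. K does this over full language definitions; this is
    # only a toy to show the shape of the idea.
    def step(term):
        if isinstance(term, tuple):
            op, a, b = term
            if isinstance(a, tuple):
                return (op, step(a), b)          # rewrite inside the left operand
            if isinstance(b, tuple):
                return (op, a, step(b))          # rewrite inside the right operand
            return a + b if op == "add" else a * b
        return term

    t = ("add", ("mul", 2, 3), 4)
    while isinstance(t, tuple):
        t = step(t)
    print(t)   # 10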
The oldest one, from a co-inventor of software engineering, let you specify functional, structural, and resource requirements in a logical notation with the tool synthesizing about everything else for you. That was the 001 Tool Suite. The thing is, I can only imagine tools getting much better than that, because any development will require human brains turning problem descriptions, constraints, etc. into precise statements. Which is what specs are. That we're already at spec-to-code synthesis and proof for some of these tools seems like something that can't be hand-waved away as "no theoretical breakthrough there, just lots of new tools."
>I think you should describe what counts as a theoretical breakthrough to you
It would be some new idea in proof theory that would produce a significant reduction of the required manual work in practical applications, making it easier for humans to work out the proofs and for proof assistants to help them.
Linear logic, affine logic, etc. can provide incremental advantages, not breakthroughs.
I hope that one of these days (years?) we might start seeing formally verified versions of code for dealing with standards that don't change rapidly and that have proved to be security problems in the past. I'm thinking of things like font rasterizers, TIFF readers, full PDF readers, and JavaScript engines intended for browsers.
The problem with many of these formats is that they're Turing-complete languages with the input driving what they do. The input is also malicious. So, the solution so far is to isolate and mediate such things, with verification going into the mechanisms for that.
An example of one for exactly your types of programs is Microsoft's Xax:
My preferred approach is the old-school method of highly-assured microkernels that just separate everything else from one another. Here's a browser OS based on that principle:
The kind of verification they undergo varies considerably. The amount of errata in modern CPUs shows both that they're not fully verified for correctness and that the verification they do get works wonders given the number of components & interactions. Even if fully verified, the verification would be that the Turing-complete hardware properly implemented a model of the Turing-complete hardware.
That kind of buys us nothing in terms of whether we'll know if programs are malicious or contain them, eh? And look at the giant, losing game that followed between the mice and the well-fed cats. It's why safety-critical systems often used deterministic state machines running on simple MCUs or PLCs. :)
In the examples you mention (document readers), the bugs become security problems because programs have too many privileges. The easiest way to remove security problems is not to remove bugs, but to remove privileges. Privilege revocation, privilege separation, and W^X work wonders.
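For example, even without touching the parser's code, you can confine it so a malicious input buys the attacker very little. A Linux-only sketch with a hypothetical parser binary and input file (real privilege separation with seccomp, pledge, or separate users goes further than these resource caps):

    import resource, subprocess

    def cap_resources():
        # runs in the child just before exec: cap CPU seconds and address space
        # so a malicious input can at worst exhaust its own small sandbox
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
        resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024, 256 * 1024 * 1024))

    result = subprocess.run(["./tiff_parser", "untrusted.tiff"],   # hypothetical names
                            preexec_fn=cap_resources, capture_output=True, timeout=10)
    print(result.returncode)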
Protocols like HTTPS and SSL, cryptography, and operating systems would be my first choices.
Curious if there is an opposite process: having some program, ask some Prolog-like system whether the given program has any vulnerabilities, and how to exploit them?
You're basically describing static analysis, but with Prolog and a weaponize option. I haven't looked up bug-to-attack research in ages, but last I did, it was generating exploits from patches. That one was pretty evil if one is aware of why delaying patches often makes sense for mission-critical apps: they need testing, as they can do damage themselves. So, instantly turning even a chunk of patches into exploits can give an attacker an advantage in that window between release and application.
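A toy, Datalog-flavoured version of the idea (all the facts are made up; a real tool would extract them from the program under analysis):

    # Facts: data-flow edges, taint sources, and dangerous sinks. The "query" is
    # just transitive reachability from an untrusted source to a sink.
    flows = {("read_input", "parse"), ("parse", "build_query"), ("build_query", "exec_sql")}
    sources, sinks = {"read_input"}, {"exec_sql"}

    def tainted(node, seen=frozenset()):
        if node in sources:
            return True
        return any(tainted(a, seen | {node})
                   for a, b in flows if b == node and a not in seen)

    print([s for s in sinks if tainted(s)])   # ['exec_sql'] -> candidate vulnerability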
1) Your model of the world corresponds to the world in which the program runs - for example, hardware is usually assumed to be bug-free.
2) The properties you've proven the program to hold are sufficient to stop the hacker from accomplishing their goals.
3) No bugs exist in your theorem prover.