Hacker News new | past | comments | ask | show | jobs | submit login
How Did Software Get So Reliable Without Proof? (1996) [pdf] (gwern.net)
111 points by tosh 38 days ago | hide | past | favorite | 123 comments

It didn't - but most errors are simply tolerated because imperfect automation still has absurd economy of scale, very few applications are on a regulated field or have a well-defined quality standard to meet, and unreliable software tends to at least fail consistently, so it's still a win to diagnose and fix processes compared to humans making creative mistakes.

The other major factor is fault tolerance. Software has bugs, and hardware breaks, but that's accepted. Programmers in industry have invested a lot of time in routing around the problems. That buys time for human operators to get paged and fix things.

For every major outage that makes the news, there are thousands of smaller problems that users never notice because of CDNs, failover, graceful-ish service degradation, autoscaling and/or overprovisioning, and fallbacks. Services can be reliable even when the software and hardware that backs them is not.

In other words, "reliability" is a property of the whole system, including the human operators, not just the code. The code can actually be pretty bad, yet the entire system can still be reliable in terms of being available and functional when the users want to use it.

It is also replacing humans most of the time as well, who are prone to making all sorts of errors. Worse, humans can make the same error twice even after being corrected.

Software can also make the same mistake, thousands of times per second or more :)

"same" is the operative word. Easier to correct the same mistake every time than a different mistake on unpredictable executions.

Those types of errors are usually pretty easy for humans to find and fix.

The phrase "imperfect automation still has absurd economy of scale" is more insightful than many full-length books.

> because imperfect automation still has absurd economy of scale

That's what FULL AFK botting in games taught me, kinda random % values, so in my experience it was

Automating 80% of stuff is easy, can be done in 1h

90% is starting getting tricky

95% requires you to spend days of testing and writting various scripts

100% is additionally limited by BOT API and would require you to write your own stuff and talk to bot via e.g file system and try to handle things like client/bot crashes, networking maybe, damn...

But yea, having to do something once a day for 10min instead of playing 10h is great enough

Just to clarify: those are private servers where everybody's botting.

> Just to clarify: those are private servers where everybody's botting.

I don't think that you need that clarification. The greed of many game developers doesn't have any borders. Artificial delays that force you to spend money to work around them, ridiculous drop rates etc - I, personally, would never judge badly someone who bots, if its doesn't affect other players.

Cheating (such as aimbots, wallhacks, radars, nethacks etc) is totally different story though

Which games, out of curiosity? Sounds like a whole new way to enjoy games.


Some random thoughts because I'm kinda busy

Game is great and insanely addictive(seriously, I can talk about this more if youre interested), but community on private servers is insanely toxic and uneducated (I don't really like saying this like that, but seriously that's something you tend to do not experience in other games) due to... well, that's interesting topic.

This game has relatively high entry level because it's like 23 years old game which has A LOT of content, even more if you do not use spoilers for quests/missions.

If you don't play on private servers, then you have to buy "premium account" to be able to play the game to its full potential which costs like 10 euro for a month

Dying in this game is very punishing, but it used to be worse like decade ago because nowadays you can buy ingame_currency relatively cheap. Dying decade ago without "protections" could waste your like 2-3 hours of exp and some items, add to that fact that you can be killed on PvP servers by anyone "just because" :P

This game is also interested from programming perspective -

Private servers are an initiative by programmers to recreate this game engine and host their own server. Nowadays it's mature.

One of the most popular private server engines: https://github.com/otland/forgottenserver

It's really impressive project.

When it comes to botting FULL afk scripts

You have to write code which goes to the exp place, don't get stuck somewhere, exp there for a hours (using spells, drink health/mana potions, take loot from corpses, run on the more or less complex spawns (map), go back to the city if you don't have potions, sells loot, take cash to the bank, and repeat

Many things can go wrong during that cicle - like other played blocked your "road" to the exp for 1min, or you character just died due to PK(players), lag and maaaaaaaany other.

Your script has to consider a lot of those different conditions and trade offs (sometimes you don't want to kill monsters that are between your city and your spawn because it would take 10min, so you just want to run through them, but you can get "trapped" (sorrounded) by them, so you have to consider this)

It's not easy to do this good enough that you can start bot on VPS and it will running for a day, week or something similar.

Correct software has surprisingly little business value.

Well, actually it did.

Some of the reasons are:

1) Most applications use very reliable databases as a store. Back in the day, programmers used to write their own one-off BTREE libraries, which stopped with msql/mysql, jet, dbm and sqlite.

(A holdover from that era is when interviewers asked you to roll your own sort algorithm, which you should almost never do for production apps.)

2) Some software is fuzzed.

3) Some software has managed memory.

4) Some software has tests.

5) Some software is formally verified

Formal verification is huge these days I would like to think.

Huge? I have worked in this industry for a while now and never heard anyone outside of HN even mention it. It’s decidedly not huge for any definition of huge (outside of academia maybe?)

The "big production" example that kept being referred to in university was a (Belgian?) metro system. I think it was well suited to formal verification because it had a finite number of valid states, and the cost of failure was sufficient to justify the investment.

EDIT: I now think it was French, source https://www.prover.com/portfolio-items/ratp-paris/

It's used to some extent in the aviation industry for flight-critical software. So maybe huge in some contexts.

From the conclusion:

> Formal methods researchers who are really keen on rigorous checking and proof should identify and concentrate on the most critical areas of a large software system, for example, synchronisation and mutual exclusion protocols, dynamic resource allocation, and reconfiguration strategies for recovery from partial system failure. It is known that these are areas where obscure time-dependent errors, deadlocks and livelocks (thrashing) can lurk untestable for many years, and then trigger a failure costing many millions. It is possible that proof methods and model checking are now sufficiently advanced that a good formal methodologist could occasionally detect such obscure latent errors before they occur in practice. Publication of such an achievement would be a major milestone in the acceptance of formal methods in solving the most critical problems of software reliability.

I feel like this presages the way TLA+ (in particular) has had a relative surge among practitioners.

Because reliable and proof are not the same concepts and one is not needed for the other. Proof is logical certainty for all time ( in the past, present and future ). Reliability is not.

Reliability doesn't require mathematical proof - this goes for airplanes to your kitchen faucet to software. Software became more and more reliable through testing, errors, fixes/patches, etc. Also, many times good enough is good enough. We don't need the perfect for most situations.

Also, it's a subjective call too. For some software is an unreliable mess and we need to throw everything out and rebuild it from scratch. Search for haskell, rust, etc threads. The neverending arguments against side-effects, mutation, pointers, etc. For most, they are good enough and get the job done.

The control systems for airplanes often have a significant number of mathematical proofs.

Formal methods aren't as widespread in safety critical industries as you might hope or believe. Model-based testing is still considered cutting edge, never mind formal verification.

Formal verification is only as good as your model, anyway. Problem comes from outside your model? Formal verification won't save you.

My father-in-law works on control systems for airplanes and I get the impression that it's based much more on testing in flight simulators than on any amount of proofs.

The level he works at might be higher level though. It sounds like control systems for airplanes are fairly modular.

Donald Knuth quote: Beware of bugs in the above code; I have only proved it correct, not tried it.

If it is possible to use proof that is lovely; but it isn't necessary. Things have been engineered to very high levels of reliability with only the most tangential reliance on proofs.

Software is highly unusual in that proofs can even be deployed. Most highly safe processes can't even use proofs. How do you prove that a dam won't fail? It is impossible.

Sure because the alternative is death

Me shit posting is not a life or death situation so a bug here or there is fine

The bolts don't, and yet they don't (usually) fail.

Every single bolt on a plane is tracked from the steel foundry to retirement with serial numbers and testing and certification and everything.

What do you mean? They're designed and tested according to calculations done by mathematically proven methods.

But that isn't formal verification via mathematical proof. And this is the crux of the flaw with the paper. There are lots of ways to reliability. Mathematical proof is just one.

Proofs are only valid as long as the requirements don't change and assumptions are still valid. Two things that any first year CS student could tell you are absurd to even imagine.

You seem to be dismissing formal methods, but they've been used successfully in various domains.

Late to the party, sorry for the delayed answer. That's a very important observation you are making there. I think another one to complement it is that reliability does not make any assumptions about underlying systems performing as advertised, including the hardware. Many (all?) formal verification methods do make that assumption.

A proof is only performed under certain assumptions and is far from a logical certainty for all time.

Proof is how the authors of this paper discovered that OpenJDK's java.utils.Collection.sort() was broken


I'm not sure if that's a victory or a loss for formal proofs. Sure, it found the bug, but for a language that popular, it wasn't a very meaningful bug, and sorting algorithms is one of the easier things to analyze.

For algorithms without provided proof, we can only estimate their correctness based on how battle-tested they are. If an algorithm has been run many times and it has been correct every one of those times, then we feel somewhat confident that we got it right. Over time, we'll find weird edge cases where it doesn't work, and fix them.

Few algorithms are as battle-tested as the sorting algorithm from the Java standard library. I can't even estimate how many calls are made to this function every day. The fact that we can still find bugs in it, no matter how obscure, suggests to me that we simply can't write correct nontrivial software without proof.

The victory isn't in showing that this specific algorithm is incorrect in some fairly obscure way. It's in the fact that even software that's this battle-tested is still not fully reliable.

> even software that's this battle-tested is still not fully reliable

But does it matter?

I also don't understand formal methods well, but wouldn't they also be subject to bugs? It seems like all we can do is make software less unreliable.

> But does it matter?

That depends on the application. It probably doesn't most of the time. My point is not that it is always economical to use formal methods, just that I suspect their cost is overestimated compared to the benefit (because most programmers are simply unfamiliar with FM).

> I also don't understand formal methods well, but wouldn't they also be subject to bugs?

Yes, they are. There are multiple classes of errors FM are vulnerable to.

1. In FM you prove your code correct with respect to a specification. But you can make errors in the specification too. The arguments for FM are: a) the spec is much smaller than the program, so the opportunity for error is lower, and spec errors are more obvious; b) often if your spec is wrong, you'll find out when you try to prove your algorithm; and c) combining testing with FM is very likely to uncover whatever errors are left.

2. If your proof of correctness is not mechanically verified, then you may still have trivial implementation errors like mixing up variables and the like.

3. If it is mechanically verified, then theoretically there's still opportunity for errors. The verifier might contain a bug, the compiler might contain a bug, the hardware you run the algorithm on might contain a bug, cosmic radiation may flip a bit during the verification/compilation process, invalidating the correctness, etc. I can soapbox about the implications of this but I will only do that when prompted to :)

So indeed, there is no method that gives you zero probability of failure. But you can get the probability orders of magnitude lower, which may be worth it depending on what you're working on.

The first sentence of the article begins "Twenty years ago", and yet the article itself is undated.

This appears to be entirely normal for scientific papers.

Can anyone tell me how this foolishness came to be considered acceptable among scientists, and why it persists?

Because traditionally, articles were published in journals, and journals had the date written on the cover, so it was superfluous to add it on every page.

That seems rather sloppy to start with: you could easily end up a year or two out (the date that matters isn't the date of publication, it's the date that was in the author's head when they wrote something like "Twenty years ago").

In any case nowadays I think there's no excuse for this practice continuing. It's far too common for a paper to get sent around, or put on the author's personal website, with no associated date at all.

Yes. It's become a real problem. Sometimes you have to look at the references, which do have dates, to get a rough idea of when the paper was published. Or at least what date it could not have been published before.

Some journals have the date of submission and the date of acceptance in the paper header for this reason. Unfortunately, it is not a common practice.

> you could easily end up a year or two out (the date that matters isn't the date of publication, it's the date that was in the author's head when they wrote something like "Twenty years ago")

Doesn't matter. If the author was thinking of a specific year, they'd say the year. If they say "20 years ago", everything that was "20 years ago" this year is also "20 years ago" a couple years from now.

It's common for papers to include words like "recently", or indeed "has not yet", which are rather more time-sensitive and still don't specify an exact year.

Presumambly the PDF doesn't include the date because it is camera-ready copy for a journal (in this case the filename includes the year of publication).

In general, I'd say people are sloppy about this because publishing is like a super-public form of notarization, which has the effect of placing the name of the article into public indices and databases, most importantly the list of contents of that journal. It's not that people don't care about the date—it's that they have no difficulty finding out what it was.

I observe that the same is true for the paper's authorship information, but nobody seems to have any difficulty placing that prominently near the top of the document itself.


1996 is in the URL.

This is less a science thing and more an Internet culture thing, IMO.

Documentation norms online have changed with variablized URLs full of computed data

This was published in 1996, where the journal was dated. https://link.springer.com/chapter/10.1007/3-540-60973-3_77

It depends on how you come across the paper. Peer-reviewed scientific journals (as well as some preprint channels that aren't peer reviewed) are fully explicit about the date of publication. However, if those same papers are published on another website, they might lack metadata such as date and publication venue. This is a good reason to link to the actual publication source instead of to a PDF copy of the paper (unless you have good reason not to, such as a paywall).

PS: in this case: it turns out that the paper was published in the International Symposium of Formal Methods Europe, 1996: https://link.springer.com/chapter/10.1007/3-540-60973-3_77

> This is a good reason to link to the actual publication source instead of to a PDF copy of the paper (unless you have good reason not to, such as a paywall).

It would still be best to include the date inside the artifact itself instead of relying on external and contextual metadata like you are describing. As long as we're at it, documents should also ideally include their DOI.

1996 predates most of the web as we know it today. It was a different time with different standards.

I agree. My comment wasn't about shaming the old standards but about establishing new ones.

I presume it's a slow feedback loop. It's always fine immediately after you do it, and I don't think brains learn very well from small mistakes we made years ago.

First: Software got so abundant without proof. How much of an effort is a formal proof of a program? I suspect that it's at least as much effort as the total effort of writing the program, and maybe several times that. If I'm right, then formally proving all programs would result in more than twice as much effort, which would mean less than half as much software. So, think of all the programs that you interact with in a day. Now think of a world where half of those programs simply do not exist. Is that better than this world of buggy programs? Or worse?

Second: Programs got so reliable by killing the bugs that made them most unreliable. That is, if a program has two bugs, one of which 10% of people hit, and one of which 0.0001% of people hit, the 10% bug gets fixed. The program is still buggy, but it's buggy in a way that most people never encounter.

And third: For (many kinds of) formal proof, you need a formal specification. Much of software is somewhat discovered. (What's the right user interaction flow for this, anyway? Let's try some ideas out and see.) If you write the formal specification before you have really discovered what the software should be, then you can have formally-proven software that does less-than-optimal things.

And then you can have bugs in the specification. (Hello, MCAS!) Formally proving that you correctly implemented an incorrect spec does not make the software more correct.

> First: Software got so abundant without proof. How much of an effort is a formal proof of a program? I suspect that it's at least as much effort as the total effort of writing the program, and maybe several times that. If I'm right, then formally proving all programs would result in more than twice as much effort, which would mean less than half as much software. So, think of all the programs that you interact with in a day. Now think of a world where half of those programs simply do not exist. Is that better than this world of buggy programs? Or worse?

I don't think it'd require twice the effort unless you insisted on a full and complete formal proof of the entire program. Which most contemporary advocates of formal methods do not want (and this paper basically concurs with). Rather, see TLA+ where the advocation is for people to model critical parts, or the concurrency model, or the synchronization model of their distributed system. You can elide so many details that it's nowhere near the same level of work as actually writing the full program, but can inform your design so that you spend less time reworking later.

OTOH, I've seen lots of programs where the testing comprises at least as much effort, if not some small integer multiple, as the actual coding portion. But no one is advocating for removing tests (though some people strongly despise, if not outright hate, certain test approaches).


OTOH, the less you prove correct, the more bugs you leave, and you get people asking "How did software get so reliable with so little proof?"

On the third hand (I think that's the number I'm up to now), types are a kind of proof, and I really do want those. So I think there's some kind of curve, with no proof giving you lots of bugs (which are expensive to fix, and bad for your product's PR), and full proof taking forever and costing a ton, and somewhere in between being the sweet spot.

And where is that sweet spot? I would say at least type systems, but less than full formal proof. Somewhere in between, which leaves a lot of room and therefore is not much help. Worse, where exactly the sweet spot is probably depends on several things, which means we can't give any simple answer.

Uncle Bob makes some interesting points about software testing and scientific methodology. His main point is that science is mostly not about proving things correct but about attempting to find ways to falsify an hypothesis by doing experiments. Failing to do falsify means the hypothesis is more likely to be true at some level. It could still be wrong or imperfect and theories are often refined or replaced by new theories. Science is about trying to find ways to prove things wrong and occasionally failing to do that in interesting ways. The more you fail at that, the better a theory becomes. If you succeed (by failing), you refine your theory and run more experiments.

Software testing is exactly the same. The more tests you have, the more evidence you build up that the theory that the software is incorrect in some way may be wrong. If you find out about a new way it could still be wrong, you write a test to prove that wrong. And then if you are right about that, you fix the software.

Uncle Bob is insufferable and thinks that anything less than 100% testing and your code doesn't work.

> The more tests you have, the more evidence you build up that the theory that the software is incorrect in some way may be wrong.

Or you could actually use the software. You can have 100% unit tests passing and a completely broken integration.

He's not exactly wrong in that analogy but I think that in practice such utopia does not exist.

I'm not advocating TDD either. But he has a point that testing (manual or automated), or even just using the software is a form of experimentation and thus a perfectly valid and scientific way of assessing whether software works or not.

One thing I would think helps software is Linus's law: "Given enough eyeballs, all bugs are shallow".


As far as I know, this is not something available to other forms of engineering.

"Given enough projects, all eyeballs are elsewhere."

Yes. Outside of the top 20 or so open source projects, few people are watching. Often only one.

I don't think that's really true. I have open source projects that nobody else actually uses, but I still get messages from people reading and looking at the code. There's often only one person maintaining open source software, but I'd bet there's a lot of eyes on a lot of projects.

But most of them have no critical impact on essential infrastructures. Eyeballs are always their for big projects.

OpenSSL is a good example of something that was essential everywhere that had few eyeballs on it for a number of years.

Thanks for that.

Unfortunately, we've found out in the subsequent 20+years since that was said, that most projects, even major ones, don't have enough eyeballs...

There was a point a few years ago that GTK+ dev was complaining that only he was left to work on the project. A single person for a whole GUI lib..

And GTK+ is the basis of so many FOSS it's not even funny...


The main advantage of open source for me isn't that all bugs are found. It's that when I encounter a bug I can check the code, report it, fix it in my fork and get it fixed in the main repo eventually.

With closed source you have to make do with workarounds.

I don't think that's true, or at the very least it's misleading. Qualified eyeballs are often expensive, and moreover there are consistent intellectual biases of human beings that interfere with catching all bugs.

Though the 'law' is associated with reading source code, I think it can also apply to people using the code, and reporting errors (sometimes automatically).

They may not be able to fix it, but just noticing there is a problem is a big first step.

Sure, I'm just saying in reality it's not as ideal as it would seem. You're not going to get thousands of new qualified people reading every codebase every year, things will inevitably slip through the cracks, and at a guess open-source lines of code probably have fewer people reading them net than all currently maintained proprietary software, just because there is so much vastly more open source software these days.

This would be more in the context of FOSS, so the economics are certainly different. I am not sure qualified eyeballs are expensive in this context.

Look at OpenSSL and Heartbleed. Nobody found it and it ended up being a catastrophe. It has since seen major industry investment. The problems have been mostly alleviated.

Anyone worth their salt that spent longer than 10 minutes lokoing at openssl's source code knew that it was riddled with problems.

Regardless of that none of the many billion dollar companies that relied on it invested anything into improving it.

Definitely true for my domain, mechanical engineering. Enough reviews and bugs will be found and corrected.

In most other fields, there are far fewer eyeballs, because far less work is opensource.

If most/all CAD designs were opensource, you can imagine there would be projects underway to automatically "find all screws in the world that are too short to hold the design load". With enough projects of that nature, standards across the entire field will improve.

Reminds me of the old jokes that start with "if cars were like computers".


Self-driving cars and IoT turned that joke into prophecy.

Cars must necessarily follow the laws of physics. Software does not have such limitations, which creates hundreds times more ways to shoot yourself in a foot.

"No one reads anything. Those who do read, they do not understand. Those who do understand, they immediately forget."

"Linus's law" is refered to as "Linus's fallacy" now.

And the ones that don't forget eventually get discouraged by having to go through people like him.

Formal proofs are useless because most systems in the real world are far too complex to be formally verified. The formal proofs for most systems would be so long and complex that they would be almost guaranteed to suffer from human error; the proofs are much more likely to have errors than the underlying code itself.

It is a tool invented by bureaucrats for the sole purpose of job creation.

If society keeps becoming more bureaucratic, soon enough, someone will invent the concept of unit testing for formal proofs... There is no limit to how ridiculous this can get. The Fed's mandate is job creation, and job creation is just what it'll do! Any job! Mostly useless jobs!

Most real world systems would be practically impossible to even write a formal specification for, much less actually prove that the system follows it correctly.

I don't know for others but my software became reliable because my users work a few meters from me and have no problem coming to my desk and question my dev skills after finding a bug :)

I think the reason is natural selection.

Since the beginning of computer software (and computer hardware) the solutions that were not reliable enough for practical purposes were discarded.

I've seen a paper provide a natural selection model of bugs: If a bug occurs frequently enough, someone will bother to fix it.

Hence, popular software has less bugs and becomes more popular. Using popular software in a different way will trigger unfound bugs.

Security bugs were special, as nobody triggers them by default. There is no selection pressure to make them disappear.

I think of bugs as signal.

Bugs are energy of noise on continuous spectrum where frequency describes types of bugs.

When you write software you generate initial signal which will depend on what you are doing and your ability (awareness of different types of bugs, etc.)

Then during further stages of testing these bugs tend to be filtered out. Some types of bugs are attenuated very efficiently. For example a bug that would cause your application to not compile would have very little chance of being shipped...

Some bugs are only detected (as signal) only in some circumstances. For example, users use a detector that has different characteristics from the detector used by developer.

You don't generally want to invest in attenuating all frequencies. What you do is you ignore frequencies that don't matter (that just pass and cause no harm) and focus on harmful frequencies.

In the end, the best way to reduce the output noise of the system is usually to reduce the input noise (ie. signal produced by developer).

In my opinion its just not practical for most programmers to understand a book like "The science of programming" and consciously apply proofs each time they write code. Most programs can be delivered with bugs, These can be discovered and corrected over time. So having strong QA teams as an alternative to proofs should be the reason of having and abundance of reliable software.

Which is part of the discussion in the paper.

A relevant paper to discuss alongside this one: https://www.cs.umd.edu/~gasarch/BLOGPAPERS/social.pdf

Social Processes and Proofs of Theorems and Programs

Even rigorous mathematics requires social processes to become 'reliable', ie truthful.

I am usually programming in Haskell (the little programming I do besides research) and I try to do property based testing a lot. And of course, you have to come up with the properties to test for when generating test cases.

With a proof doubly so, as you need a specification that you can prove your implementation against. In my time in industry (YMMV) I think I've never seen a formal enough specification that would be useful in that sense.

I think we're in a specification crisis where behavior is unspecified and documentation diverges from that as well. APIs break regularily and services are down daily. It's a miracle the whole house of cards has not come down yet.

Natural Selection.

If you account for survivorship bias, I would bet that software is not very reliable.

Agree. Plus plenty of labor has gone into customer facing bugs that autos, hw, consumer electronics etc prob wouldn't stand for. And we build on hw which is formally verified.

> And we build on hw which is formally verified.

Ehm, what?

Intel (and presumably others) use formal methods to verify their hardware designs, or at least large portions of them. This started in earnest after the FDIV bug. Here's a presentation on the topic (dated now, but my understanding is the use of formal methods has only increased since then).


Software ... the part that software developers write ... is created to run on a strong foundation of CPU, GPU, NICs, RAM, PCI buses, SSD and so on (which the EE, ChemE, Chem, math, physics people make). That makes our stuff better than if it was software all the way down. HW is formally verified.

The proof is in the pudding, software program either works or it doesn't. Also it's never a one time thing like writing a math formula, software is constantly improved until it's not.

Uncountable man hours going into squashing bugs, Googling, trial and error, getting it to work, and not touching it.

user find bug, dev fix bug. happy path! other path: dragons be here

The tree or tributary structure of software suggests that deeper, more fundamental, more general bugs will be revealed first and frequently. And logically will be fixed at the root, rather than hacking at the myriad leaves... right...?

Are there any research projects trying to make "proof" programs / languages?

Dependently typed languages may fit what you're asking for. See Idris and the book Type-Driven Development with Idris for an example of encoding critical information into the type system.

Beyond that, see SPARK where you're limited to a provable subset of the Ada language (the subset has been extended with each version) so you can make assertions and it will attempt to prove (as part of compilation) that the assertions hold. As in, you could have a function that should always return a non-negative integer, but if it does something like (not fluent, so may be wrong but the principle is correct):

  function Foo(A : Integer; B :integer) return Integer with
    Post => Foo'Result > 0;

  function Foo(A : Integer; B : Integer) return Integer is
    return A - B;
  end Foo;
This should fail when the proof checker is run on this program.

Combine Spark with Z as a specification language and you can get to "correct by construction". The only problem seems to be a reluctance to actually formulate a spec.

Yep. I've tried to use Z Notation in the past but haven't had much luck (doesn't help that I had no collaborators so was learning on my own). I've often, also, worked on projects that had decent (but massive and all prose) specs. They would've benefited by formalizing at least portions of them.

I'm not sure I'd try to use Z Notation at this point, given that there are other systems out there that seem to fill the same niche but with better tooling. Event-B and TLA+ are the two I've explored the most.

Yes, this is an extremely active area of research with lots of connections in the broader programming languages and formal methods communities. For the last decade or so, POPL has had a co-located workshop on proof, the programs of which would be a great place to see the current state of the art. Here's last year's: https://popl20.sigplan.org/home/CPP-2020#event-overview

Margaret Hamilton, of Apollo fame, designed and still advocates something called the Universal Systems Language: https://en.wikipedia.org/wiki/Universal_Systems_Language


> The seL4® Microkernel. Security is no excuse for bad performance. The benchmark for performance. The world's most highly assured OS kernel. Open source & community-supported under the seL4 Foundation.

Dafny attempts to do something like this for imperative-style languages, using pre- and post-conditions:


I've noticed several articles about Ada and Rust here on HN recently. Summarizing one comment I saw: Ada has a subset that is provable and certified as such, and Rust is close but not provable nor certified.

Obviously because it just works.

This didn't age very well. Especially the part celebrating "over-engineering" including the explicit duplication of code, instead of re-use, since that is "risky".

I do however like the strong emphasis on (manual) testing.

That section actually criticizes duplication, it does not endorse it:

> So it seems safer to take an entirely fresh copy of the existing code, and modify that instead. Over a period of years there arise a whole family of such near-clones, extending over several generations. Each of them is a quick and efficient solution to an immediate problem; but over time they create additional problems of maintenance of the large volumes of code. For example, if a change is made in one version of the clone, it is quite difficult even to decide whether it should be propagated to the other versions, so it usually isn't. The expense arises when the same error or deficiency has to be detected and corrected again in the other versions.

Note his use of the phrase "it seems safer", he doesn't say it is safer and at the end explains what the actual risk/expense is with the copy/paste approach.

And at the end he discusses the issues with over-engineering in summary:

> The limitation of over-engineering as a safety technique is that the extra weight and volume may begin to contribute to the very problem that it was intended to solve. No-one knows how much of the volume of code of a large system is due to over-engineering, or how much this costs in terms of reliability. In general safety engineering, it is not unknown for catastrophes to be caused by the very measures that are introduced to avoid them.

Code reuse can result in failures. Sometimes those failures are big and loud. The Ariane 5 failure also happened in 1996, which is quite funny. :)



In my view, type systems have replaced formal methods as the main way of enforcing guarantees around program behavior. Type systems are conceptually simpler, and although they aren't nearly as powerful as formal verification, they're good enough for many common purposes.

Unit testing.

tl;dr: The testing of code is done by intelligent adversaries.

That's a poor TL;DR for this paper. It contains a lot more than that.

because humans are halting oracles

lol, tell that functional programming proponents.

They will tell you that current software is basically crap and they can't understand how the majority tolerates this state of affairs.

The title seems to suggest that software or computer programs are provable. They are not.

Programs or algorithms only guarantee that they can go from one state to another state.

Somehow computer programs have been conflated with relability and proof possibly because of close adjacency to mathematics.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact