
How Did Software Get So Reliable Without Proof? (1996) [pdf] - tosh
https://www.gwern.net/docs/math/1996-hoare.pdf
======
hcarvalhoalves
It didn't - but most errors are simply tolerated because imperfect automation
still has an absurd economy of scale, very few applications are in a regulated
field or have a well-defined quality standard to meet, and unreliable software
at least tends to fail consistently, so it's still a win to diagnose and fix
processes compared to humans making creative mistakes.

~~~
tester34
> because imperfect automation still has absurd economy of scale

That's what FULL AFK botting in games taught me. The percentages are kinda
rough, but in my experience it was:

Automating 80% of the stuff is easy, can be done in an hour

90% starts getting tricky

95% requires you to spend days testing and writing various scripts

100% is additionally limited by the bot API and would require you to write
your own stuff, talk to the bot via e.g. the file system, and try to handle
things like client/bot crashes, maybe networking, damn...

But yeah, having to do something once a day for 10 minutes instead of playing
10 hours is great enough.

Just to clarify: those are private servers where everybody's botting.

~~~
runawaybottle
Which games, out of curiosity? Sounds like a whole new way to enjoy games.

~~~
tester34
Tibia

Some random thoughts, because I'm kinda busy:

The game is great and insanely addictive (seriously, I can talk about this
more if you're interested), but the community on private servers is insanely
toxic and uneducated (I don't really like putting it like that, but seriously,
that's something you tend not to experience in other games) due to... well,
that's an interesting topic.

This game has a relatively high barrier to entry because it's a roughly
23-year-old game with A LOT of content, even more if you don't use spoilers
for quests/missions.

If you don't play on private servers, you have to buy a "premium account" to
play the game to its full potential, which costs about 10 euro a month.

Dying in this game is very punishing, but it used to be worse a decade ago,
because nowadays you can buy in-game currency relatively cheaply. Dying a
decade ago without "protections" could waste 2-3 hours of exp and some items;
add to that the fact that on PvP servers you can be killed by anyone "just
because" :P

This game is also interesting from a programming perspective.

Private servers are an initiative by programmers to recreate the game engine
and host their own servers. Nowadays the effort is mature.

One of the most popular private server engines:
[https://github.com/otland/forgottenserver](https://github.com/otland/forgottenserver)

It's a really impressive project.

When it comes to botting, FULL AFK scripts:

You have to write code which walks to the exp spot without getting stuck
anywhere, exps there for hours (casting spells, drinking health/mana potions,
looting corpses, running around more or less complex spawns (maps)), goes back
to the city when you run out of potions, sells the loot, takes the cash to the
bank, and repeats.

Many things can go wrong during that cycle - like another player blocking your
"road" to the exp spot for a minute, or your character dying to PKs (players),
lag, and maaaaaaaany other things.

Your script has to consider a lot of those different conditions and trade-offs
(sometimes you don't want to kill the monsters between your city and your
spawn because it would take 10 minutes, so you just want to run through them,
but you can get "trapped" (surrounded) by them, so you have to handle this).

It's not easy to do this well enough that you can start the bot on a VPS and
it will keep running for a day, a week, or longer.
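The cycle described above is essentially a loop over a small state machine with failure recovery at every step. A minimal Python sketch (every name is a hypothetical stand-in for a real bot API, and the 5% failure rate is invented for illustration):

```python
import random

def run_cycle(rng):
    """One hunting cycle; returns ("completed", None) or ("failed", step)."""
    steps = ["walk_to_spawn", "hunt_for_hours", "walk_to_city",
             "sell_loot", "bank_cash"]
    for step in steps:
        # Any step can fail: blocked road, PK, lag, client crash, ...
        if rng.random() < 0.05:
            return ("failed", step)
    return ("completed", None)

def run_bot(cycles, rng):
    """Keep re-entering the cycle; a failure means recovering to a safe
    state (relog, walk back) and starting the cycle over."""
    completed = failures = 0
    for _ in range(cycles):
        status, _step = run_cycle(rng)
        if status == "completed":
            completed += 1
        else:
            failures += 1
    return completed, failures

completed, failures = run_bot(100, random.Random(0))
```

The last few percent of reliability live entirely in the `failed` branch: each distinct failure mode needs its own detection and its own route back to a safe state.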

------
Jtsummers
From the conclusion:

> Formal methods researchers who are really keen on rigorous checking and
> proof should identify and concentrate on the most critical areas of a large
> software system, for example, synchronisation and mutual exclusion
> protocols, dynamic resource allocation, and reconfiguration strategies for
> recovery from partial system failure. It is known that these are areas where
> obscure time-dependent errors, deadlocks and livelocks (thrashing) can lurk
> untestable for many years, and then trigger a failure costing many millions.
> It is possible that proof methods and model checking are now sufficiently
> advanced that a good formal methodologist could occasionally detect such
> obscure latent errors before they occur in practice. Publication of such an
> achievement would be a major milestone in the acceptance of formal methods
> in solving the most critical problems of software reliability.

I feel like this presages the way TLA+ (in particular) has had a relative
surge among practitioners.
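As a toy illustration of why this niche suits model checking: the classic two-lock deadlock needs an unlucky schedule to show up in testing, but exhaustively enumerating interleavings, which is the essence of what a checker like TLC does for TLA+ models, finds it immediately. A minimal Python sketch (the state encoding and thread programs are invented for illustration):

```python
def find_deadlock(threads):
    """Explore every interleaving; return a stuck state, or None."""
    names = sorted(threads)

    def successors(state):
        pcs, held = state            # held: sorted tuple of (lock, owner)
        owners = dict(held)
        for i, name in enumerate(names):
            prog = threads[name]
            if pcs[i] == len(prog):
                continue             # this thread is finished
            op, lock = prog[pcs[i]]
            if op == "acquire" and lock in owners:
                continue             # blocked: lock held by another thread
            new_owners = dict(owners)
            if op == "acquire":
                new_owners[lock] = name
            else:                    # release
                del new_owners[lock]
            new_pcs = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            yield (new_pcs, tuple(sorted(new_owners.items())))

    start = (tuple(0 for _ in names), ())
    seen, stack = set(), [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        succs = list(successors(state))
        pcs, _ = state
        unfinished = any(pcs[i] < len(threads[n]) for i, n in enumerate(names))
        if not succs and unfinished:
            return state             # no thread can move: deadlock
        stack.extend(succs)
    return None

# Opposite lock order: a deadlock is reachable under some schedule.
BAD = {
    "T1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "T2": [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
}
# Consistent lock order: no interleaving deadlocks.
GOOD = {
    "T1": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    "T2": [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
}
```

The same search shows the consistently ordered version deadlock-free over all interleavings, which no finite amount of ordinary testing can establish.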

------
disown
Because reliability and proof are not the same concept, and one is not needed
for the other. Proof is logical certainty for all time (past, present, and
future). Reliability is not.

Reliability doesn't require mathematical proof - this goes for everything from
airplanes to your kitchen faucet to software. Software became more and more
reliable through testing, errors, fixes/patches, etc. Also, many times good
enough is good enough. We don't need perfection for most situations.

It's a subjective call too. For some, software is an unreliable mess and we
need to throw everything out and rebuild it from scratch. Search for the
Haskell, Rust, etc. threads. The never-ending arguments against side effects,
mutation, pointers, etc. For most, they are good enough and get the job done.

~~~
chas
The control systems for airplanes often have a significant number of
mathematical proofs.

~~~
thereisnospork
The bolts don't, and yet they don't (usually) fail.

~~~
feteru
What do you mean? They're designed and tested according to calculations done
by mathematically proven methods.

~~~
compiler-guy
But that isn't formal verification via mathematical proof. And this is the
crux of the flaw in the paper. There are lots of paths to reliability.
Mathematical proof is just one.

------
thefifthsetpin
Proof is how the authors of this paper discovered that OpenJDK's
java.utils.Collection.sort() was broken

[http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf](http://envisage-project.eu/wp-content/uploads/2015/02/sorting.pdf)

~~~
dehrmann
I'm not sure if that's a victory or a loss for formal proofs. Sure, it found
the bug, but for a language that popular, it wasn't a very meaningful bug, and
sorting algorithms are among the easier things to analyze.

~~~
ookdatnog
For algorithms without a provided proof, we can only estimate their
correctness by how battle-tested they are. If an algorithm has been run many
times and has been correct every one of those times, then we feel somewhat
confident that we got it right. Over time, we'll find weird edge cases where
it doesn't work, and fix them.

Few algorithms are as battle-tested as the sorting algorithm from the Java
standard library. I can't even estimate how many calls are made to this
function every day. The fact that we can still find bugs in it, no matter how
obscure, suggests to me that we simply can't write correct nontrivial software
without proof.

The victory isn't in showing that this specific algorithm is incorrect in some
fairly obscure way. It's in the fact that even software that's this battle-
tested is still not fully reliable.

~~~
dehrmann
> even software that's this battle-tested is still not fully reliable

But does it matter?

I also don't understand formal methods well, but wouldn't they also be subject
to bugs? It seems like all we can do is make software less unreliable.

~~~
ookdatnog
> But does it matter?

That depends on the application. It probably doesn't most of the time. My
point is not that it is always economical to use formal methods, just that I
suspect their cost is overestimated compared to the benefit (because most
programmers are simply unfamiliar with FM).

> I also don't understand formal methods well, but wouldn't they also be
> subject to bugs?

Yes, they are. There are multiple classes of errors FM are vulnerable to.

1\. In FM you prove your code correct with respect to a specification. But you
can make errors in the specification too. The arguments for FM are: a) the
spec is much smaller than the program, so the opportunity for error is lower,
and spec errors are more obvious; b) often if your spec is wrong, you'll find
out when you try to prove your algorithm; and c) combining testing with FM is
very likely to uncover whatever errors are left.

2\. If your proof of correctness is not mechanically verified, then you may
still have trivial implementation errors like mixing up variables and the
like.

3\. If it is mechanically verified, then theoretically there's still
opportunity for errors. The verifier might contain a bug, the compiler might
contain a bug, the hardware you run the algorithm on might contain a bug,
cosmic radiation may flip a bit during the verification/compilation process,
invalidating the correctness, etc. I can soapbox about the implications of
this but I will only do that when prompted to :)

So indeed, there is no method that gives you zero probability of failure. But
you can get the probability orders of magnitude lower, which may be worth it
depending on what you're working on.
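Point 1 (errors in the specification itself) has a well-known concrete shape: a sort spec that requires sortedness but forgets to require that the output be a permutation of the input. A tiny Python sketch (illustrative names, not any real tool's API):

```python
def is_sorted(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))

def weak_spec(inp, out):
    # Buggy spec: demands only sortedness.
    return is_sorted(out)

def full_spec(inp, out):
    # Intended spec: sorted AND the same multiset of elements.
    return is_sorted(out) and sorted(inp) == sorted(out)

def bogus_sort(xs):
    # Satisfies the weak spec perfectly while losing every element.
    return []

inp = [3, 1, 2]
weak_ok = weak_spec(inp, bogus_sort(inp))   # True: spec error goes unnoticed
full_ok = full_spec(inp, bogus_sort(inp))   # False: intended spec catches it
```

Attempting the proof, or combining it with tests as in point (c), is exactly what tends to expose the weak spec: the trivial `bogus_sort` "proves" far too easily.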

------
mjw1007
The first sentence of the article begins "Twenty years ago", and yet the
article itself is undated.

This appears to be entirely normal for scientific papers.

Can anyone tell me how this foolishness came to be considered acceptable among
scientists, and why it persists?

~~~
londons_explore
Because traditionally, articles were published in journals, and journals had
the date written on the cover, so it was superfluous to add it on every page.

~~~
mjw1007
That seems rather sloppy to start with: you could easily end up a year or two
out (the date that matters isn't the date of publication, it's the date that
was in the author's head when they wrote something like "Twenty years ago").

In any case nowadays I think there's no excuse for this practice continuing.
It's far too common for a paper to get sent around, or put on the author's
personal website, with no associated date at all.

~~~
thaumasiotes
> you could easily end up a year or two out (the date that matters isn't the
> date of publication, it's the date that was in the author's head when they
> wrote something like "Twenty years ago")

Doesn't matter. If the author was thinking of a specific year, they'd say the
year. If they say "20 years ago", everything that was "20 years ago" this year
is also "20 years ago" a couple years from now.

~~~
mjw1007
It's common for papers to include words like "recently", or indeed "has not
yet", which are rather more time-sensitive and still don't specify an exact
year.

------
AnimalMuppet
First: Software got so _abundant_ without proof. How much effort is a formal
proof of a program? I suspect that it's at least as much effort as the total
effort of writing the program, and maybe several times that. If I'm right,
then formally proving all programs would require more than twice as much
effort, which would mean less than half as much software. So, think of all the
programs that you interact with in a day. Now think of a world where half of
those programs simply do not exist. Is that _better_ than this world of buggy
programs? Or _worse_?

Second: Programs got so reliable by killing the bugs that made them most
unreliable. That is, if a program has two bugs, one of which 10% of people
hit, and one of which 0.0001% of people hit, the 10% bug gets fixed. The
program is still buggy, but it's buggy in a way that most people never
encounter.

And third: For (many kinds of) formal proof, you need a formal specification.
Much of software is somewhat discovered. (What's the right user interaction
flow for this, anyway? Let's try some ideas out and see.) If you write the
formal specification before you have really discovered what the software
should be, then you can have formally-proven software that does less-than-
optimal things.

And then you can have bugs in the specification. (Hello, MCAS!) Formally
proving that you correctly implemented an incorrect spec does not make the
software more correct.

~~~
Jtsummers
> First: Software got so abundant without proof. How much of an effort is a
> formal proof of a program? I suspect that it's at least as much effort as
> the total effort of writing the program, and maybe several times that. If
> I'm right, then formally proving all programs would result in more than
> twice as much effort, which would mean less than half as much software. So,
> think of all the programs that you interact with in a day. Now think of a
> world where half of those programs simply do not exist. Is that better than
> this world of buggy programs? Or worse?

I don't think it'd require twice the effort _unless_ you insisted on a full
and complete formal proof of the entire program. Which most contemporary
advocates of formal methods _do not want_ (and this paper basically concurs).
Rather, see TLA+, where the advice is to model the critical parts, or the
concurrency model, or the synchronization model of a distributed system. You
can elide so many details that it's nowhere near the same level of work as
actually writing the full program, but it can inform your design so that you
spend less time reworking later.

OTOH, I've seen lots of programs where the testing comprises at least as much
effort, if not some small integer multiple, as the actual coding portion. But
no one is advocating for removing tests (though some people strongly despise,
if not outright hate, certain test approaches).

~~~
AnimalMuppet
Fair.

OTOH, the less you prove correct, the more bugs you leave, and you get people
asking "How did software get so reliable with so little proof?"

On the third hand (I think that's the number I'm up to now), types are a kind
of proof, and I really do want those. So I think there's some kind of curve,
with no proof giving you lots of bugs (which are expensive to fix, and bad for
your product's PR), and full proof taking forever and costing a ton, and
somewhere in between being the sweet spot.

And where is that sweet spot? I would say at least a type system, but less
than full formal proof. Somewhere in between, which leaves a lot of room and
therefore isn't much help. Worse, exactly where the sweet spot lies probably
depends on several things, which means we can't give any simple answer.
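The "types are a kind of proof" point can be made concrete: even a very lightweight type, checked before the program runs, rules out a whole class of bugs. A small Python sketch using `typing.NewType` (the meters/seconds mix-up is a hypothetical example, caught by a static checker like mypy rather than at runtime):

```python
from typing import NewType

# Meters and Seconds share a runtime representation (float),
# but a static checker treats them as distinct types.
Meters = NewType("Meters", float)
Seconds = NewType("Seconds", float)

def speed(distance: Meters, elapsed: Seconds) -> float:
    return distance / elapsed

d = Meters(100.0)
t = Seconds(9.58)

ok = speed(d, t)   # well-typed
# speed(t, d)      # arguments swapped: mypy rejects this line at
#                  # check time, though it would run without error
```

The "proof" here is tiny (arguments are never swapped), but it costs almost nothing and holds for every call site, which is exactly the trade the sweet-spot argument is about.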

------
jillesvangurp
Uncle Bob makes some interesting points about software testing and scientific
methodology. His main point is that science is mostly not about proving things
correct but about attempting to falsify a hypothesis by doing experiments.
Failing to falsify means the hypothesis is more likely to be true at some
level. It could still be wrong or imperfect, and theories are often refined or
replaced by new theories. Science is about trying to find ways to prove things
wrong and occasionally failing to do so in interesting ways. The more you fail
at that, the better a theory becomes. If you succeed (by failing), you refine
your theory and run more experiments.

Software testing is exactly the same. The more tests you have, the more
evidence you build up against the theory that the software is incorrect in
some way. If you find a new way it could still be wrong, you write a test to
try to prove it wrong. And then, if you are right about that, you fix the
software.

~~~
raverbashing
Uncle Bob is insufferable and thinks that with anything less than 100% testing
your code doesn't work.

> The more tests you have, the more evidence you build up that the theory that
> the software is incorrect in some way may be wrong.

Or you could actually use the software. You can have 100% unit tests passing
and a completely broken integration.

He's not exactly wrong in that analogy, but I think that in practice such a
utopia does not exist.

~~~
jillesvangurp
I'm not advocating TDD either. But he has a point that testing (manual or
automated), or even just using the software is a form of experimentation and
thus a perfectly valid and scientific way of assessing whether software works
or not.

------
ErikAugust
One thing I would think helps software is Linus's law: "Given enough eyeballs,
all bugs are shallow".

[https://en.wikipedia.org/wiki/Linus%27s_law](https://en.wikipedia.org/wiki/Linus%27s_law)

As far as I know, this is not something available to other forms of
engineering.

~~~
TheOtherHobbes
"Given enough projects, all eyeballs are elsewhere."

~~~
Animats
Yes. Outside of the top 20 or so open source projects, few people are
watching. Often only one.

~~~
_AzMoo
I don't think that's really true. I have open source projects that nobody else
actually uses, but I still get messages from people reading and looking at the
code. There's often only one person maintaining an open source project, but
I'd bet there are a lot of eyes on a lot of projects.

------
jondubois
Formal proofs are useless because most systems in the real world are far too
complex to be formally verified. The formal proofs for most systems would be
so long and complex that they would be almost guaranteed to suffer from human
error; the proofs are much more likely to have errors than the underlying code
itself.

It is a tool invented by bureaucrats for the sole purpose of job creation.

If society keeps becoming more bureaucratic, soon enough, someone will invent
the concept of unit testing for formal proofs... There is no limit to how
ridiculous this can get. The Fed's mandate is job creation, and job creation
is just what it'll do! Any job! Mostly useless jobs!

~~~
TheCoelacanth
Most real world systems would be practically impossible to even write a formal
specification for, much less actually prove that the system follows it
correctly.

------
jmnicolas
I don't know for others but my software became reliable because my users work
a few meters from me and have no problem coming to my desk and question my dev
skills after finding a bug :)

------
lmilcin
I think the reason is natural selection.

Since the beginning of computer software (and computer hardware) the solutions
that were not reliable enough for practical purposes were discarded.

~~~
hyperman1
I've seen a paper propose a natural-selection model of bugs: if a bug occurs
frequently enough, someone will bother to fix it.

Hence, popular software has fewer bugs and becomes more popular. Using popular
software in a different way will trigger unfound bugs.

Security bugs are special, as nobody triggers them by default. There is no
selection pressure to make them disappear.

~~~
lmilcin
I think of bugs as signal.

Bugs are the energy of noise on a continuous spectrum, where frequency
describes the type of bug.

When you write software you generate the initial signal, which depends on what
you are doing and your ability (awareness of different types of bugs, etc.).

Then, during further stages of testing, these bugs tend to be filtered out.
Some types of bugs are attenuated very efficiently. For example, a bug that
would cause your application to not compile has very little chance of being
shipped...

Some bugs are only detected (as signal) in certain circumstances. For example,
users use a detector that has different characteristics from the detector used
by the developer.

You don't generally want to invest in attenuating all frequencies. What you do
is ignore the frequencies that don't matter (that just pass through and cause
no harm) and focus on the harmful ones.

In the end, the best way to reduce the output noise of the system is usually
to reduce the input noise (i.e. the signal produced by the developer).
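The attenuation idea can be put in toy numbers: each stage of the pipeline passes each class of bug with some probability, and the product is what ships. A small Python sketch (every probability here is invented, purely to illustrate the metaphor):

```python
# Pass-through probability of each bug class at each "filter" stage.
stages = {
    "compile":    {"wont_compile": 0.0, "logic": 1.0, "race": 1.0},
    "unit_tests": {"wont_compile": 0.0, "logic": 0.3, "race": 0.9},
    "qa":         {"wont_compile": 0.0, "logic": 0.5, "race": 0.8},
}

def shipped_fraction(bug_class):
    """Fraction of bugs of this class that survive every filter."""
    frac = 1.0
    for stage in stages.values():
        frac *= stage[bug_class]
    return frac

# "Attenuated very efficiently": nothing that won't compile ever ships.
# Race conditions pass almost untouched - there's little filter for them.
```

The same arithmetic makes the last point concrete: halving the input noise (bugs written in the first place) cuts the shipped fraction of every class at once, which no single downstream filter does.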

------
ransith
In my opinion it's just not practical for most programmers to understand a
book like "The Science of Programming" and consciously apply proofs each time
they write code. Most programs can be delivered with bugs; these can be
discovered and corrected over time. So strong QA teams, as an alternative to
proofs, should be the reason we have an abundance of reliable software.

~~~
Jtsummers
Which is part of the discussion in the paper.

------
atdixon
A relevant paper to discuss alongside this one:
[https://www.cs.umd.edu/~gasarch/BLOGPAPERS/social.pdf](https://www.cs.umd.edu/~gasarch/BLOGPAPERS/social.pdf)

Social Processes and Proofs of Theorems and Programs

Even rigorous mathematics requires social processes to become 'reliable',
i.e. truthful.

------
maweki
I usually program in Haskell (the little programming I do besides research)
and I try to do a lot of property-based testing. And of course, you have to
come up with the properties to test for when generating test cases.
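The same idea works outside Haskell; here's a hand-rolled sketch of a QuickCheck-style property check in Python (stdlib `random` instead of a real property-testing library, with a deliberately buggy `my_reverse` to falsify):

```python
import random

def my_reverse(xs):
    # Deliberately buggy: drops an element on lists longer than 3.
    return xs[::-1][1:] if len(xs) > 3 else xs[::-1]

def check_property(prop, make_case, trials=200, seed=42):
    """QuickCheck-style driver: try random cases, return a counterexample
    if the property is falsified, else None (confidence, not proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = make_case(rng)
        if not prop(case):
            return case
    return None

def involution(xs):
    # Property: reversing twice should give back the input.
    return my_reverse(my_reverse(xs)) == xs

def random_list(rng):
    return [rng.randint(0, 9) for _ in range(rng.randint(0, 6))]

counterexample = check_property(involution, random_list)
```

Note that the hard part is still stating `involution` in the first place: the driver is trivial, the property is the specification.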

With a proof doubly so, as you need a specification that you can prove your
implementation against. In my time in industry (YMMV), I don't think I ever
saw a specification formal enough to be useful in that sense.

I think we're in a specification crisis, where behavior is unspecified and the
documentation diverges from it as well. APIs break regularly and services are
down daily. It's a miracle the whole house of cards hasn't come down yet.

------
MR4D
Natural Selection.

If you account for survivorship bias, I would bet that software is not very
reliable.

~~~
doonesbury
Agree. Plus plenty of labor has gone into customer-facing bugs that autos,
hardware, consumer electronics, etc. probably wouldn't stand for. And we build
on hardware which is formally verified.

~~~
viluon
> And we build on hw which is formally verified.

Ehm, what?

~~~
Jtsummers
Intel (and presumably others) use formal methods to verify their hardware
designs, or at least large portions of them. This started in earnest after the
FDIV bug. Here's a presentation on the topic (dated now, but my understanding
is the use of formal methods has only increased since then).

[https://www.cl.cam.ac.uk/~jrh13/slides/nijmegen-21jun02/slides.pdf](https://www.cl.cam.ac.uk/~jrh13/slides/nijmegen-21jun02/slides.pdf)

------
suyash
The proof is in the pudding: a software program either works or it doesn't.
Also, it's never a one-time thing like writing a math formula; software is
constantly improved until it isn't.

------
bamboozled
Uncountable man hours going into squashing bugs, Googling, trial and error,
getting it to work, and not touching it.

------
hyperpallium2
user find bug, dev fix bug. happy path! other path: dragons be here

The tree or tributary structure of software suggests that deeper, more
fundamental, more general bugs will be revealed first and frequently. And
logically will be fixed at the root, rather than hacking at the myriad
leaves... right...?

------
snarkypixel
Are there any research projects trying to make "proof" programs / languages?

~~~
Jtsummers
Dependently typed languages may fit what you're asking for. See Idris and the
book _Type-Driven Development with Idris_ for an example of encoding critical
information into the type system.

Beyond that, see SPARK, where you're limited to a provable subset of the Ada
language (a subset that has been extended with each version) so you can make
assertions and it will attempt to prove (as part of compilation) that the
assertions hold. For instance, you could have a function that should always
return a positive integer, but whose body does something like (not fluent, so
the syntax may be off, but the principle is correct):

    function Foo(A : Integer; B : Integer) return Integer with
      Post => Foo'Result > 0;

    function Foo(A : Integer; B : Integer) return Integer is
    begin
      return A - B;
    end Foo;

This should fail when the proof checker is run on this program.

~~~
stan_rogers
Combine SPARK with Z as a specification language and you can get to "correct
by construction". The only problem seems to be a reluctance to actually
formulate a spec.

~~~
Jtsummers
Yep. I've tried to use Z Notation in the past but haven't had much luck (it
didn't help that I had no collaborators, so I was learning on my own). I've
also often worked on projects that had decent (but massive and all-prose)
specs. They would've benefited from formalizing at least portions of them.

I'm not sure I'd try to use Z Notation at this point, given that there are
other systems out there that seem to fill the same niche but with better
tooling. Event-B and TLA+ are the two I've explored the most.

------
angel_j
Obviously because it just works.

------
tpmx
This didn't age very well. Especially the part celebrating "over-engineering",
including the explicit duplication of code instead of re-use, since re-use is
"risky".

I do however like the strong emphasis on (manual) testing.

~~~
Jtsummers
That section actually criticizes duplication, it does not endorse it:

> So it seems safer to take an entirely fresh copy of the existing code, and
> modify that instead. Over a period of years there arise a whole family of
> such near-clones, extending over several generations. Each of them is a
> quick and efficient solution to an immediate problem; but over time they
> create additional problems of maintenance of the large volumes of code. For
> example, if a change is made in one version of the clone, it is quite
> difficult even to decide whether it should be propagated to the other
> versions, so it usually isn't. The expense arises when the same error or
> deficiency has to be detected and corrected again in the other versions.

Note his use of the phrase "it seems safer" - he doesn't say it _is_ safer,
and at the end he explains the actual risk/expense of the copy/paste approach.

And at the end he discusses the issues with over-engineering in summary:

> The limitation of over-engineering as a safety technique is that the extra
> weight and volume may begin to contribute to the very problem that it was
> intended to solve. No-one knows how much of the volume of code of a large
> system is due to over-engineering, or how much this costs in terms of
> reliability. In general safety engineering, it is not unknown for
> catastrophes to be caused by the very measures that are introduced to avoid
> them.

------
jasonhansel
In my view, type systems have replaced formal methods as the main way of
enforcing guarantees around program behavior. Type systems are conceptually
simpler, and although they aren't nearly as powerful as formal verification,
they're good enough for many common purposes.

------
tus88
Unit testing.

------
ouid
tl;dr: The testing of code is done by intelligent adversaries.

~~~
Jtsummers
That's a poor TL;DR for this paper. It contains a lot more than that.

------
yters
because humans are halting oracles

------
k__
lol, tell that to functional programming proponents.

They will tell you that current software is basically crap and they can't
understand how the majority tolerates this state of affairs.

------
senthil_rajasek
The title seems to suggest that software or computer programs are provable.
They are not.

Programs or algorithms only guarantee that they can go from one state to
another state.

Somehow computer programs have been conflated with reliability and proof,
possibly because of their close adjacency to mathematics.

