How did software get so reliable without proof? (1996) [pdf] (gwern.net)
83 points by elcapitan 6 months ago | 33 comments

Thanks for this! Hoare (partially) answers a question I asked in the comments on "The Little Typer" (a good treatment of programming with dependent types).

That approach to dependent types brings our code closer to being verifiable. Unfortunately, as a number of commenters testified, it also makes it (currently) impossible to scale software to complex algorithms, dependence on loosely coupled libraries, and so forth.

Since Hoare wrote 22 years ago, I'm now wondering if there's a fundamentally intractable problem with improving formal validation of complex software. Dependent types seem like a natural way of embedding the programmer's assumptions and reasoning into software, but clearly current technology in that area still falls very far short of helping with software at its current scale.

Meanwhile our practical means of building complex software have improved considerably and we can develop systems that are reliable "enough", even at enormously greater scale than Hoare anticipated. The gap between viable software complexity and formal methods of describing software semantics has grown much wider since Hoare, rather than shrinking.

I wonder if we could step back from this style of formal methods, and look for an alternative approach. We need ways to capture the actual reasoning and methods that work for programmers building complex systems -- rather than trying to pretend they are mathematical objects.

For me, notation is everything. A computer program is a model of a problem, and there is more than one representation of that problem: the static code and the running code. In the naive way of writing programs (which we all went through when first starting out), you write the code and then step through it (either on paper or with a debugger), watching how it runs.

There are many techniques available for helping you reason about your code. However, it's important to understand that software code is a notation for modelling the problem. Often formal methods introduce another notation that proves that the encoding of the problem in the first notation is "correct" according to some definition of "correct". The more we move into modelling the correctness in the second notation, the more we move the burden of reasoning about the issue to the "code" in the second notation.

It's often very useful to model the problem more than once. For example, I'm quite a big fan of unit tests: "Given the code in the system, I expect X to be true/false in this situation". I'm just as likely to make mistakes in my unit tests as I am in the production code. However, it's the comparison of the two that allows you to reason effectively about the code. They are two sides of the same coin (intentionally coupled -- which, as an aside, is why you should avoid trying to decouple your tests from your production code!!!).
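The two-models idea can be made concrete with a tiny hypothetical example in Python: the production function and the test each encode the same expectation independently, and a bug in either one shows up as a disagreement between them (all names here are invented for illustration).

```python
# Production code: one model of the problem.
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage."""
    return price * (1 - percent / 100)

# Test code: a second, intentionally coupled model of the same problem.
# "Given the code in the system, I expect X to be true in this situation."
def test_apply_discount():
    assert abs(apply_discount(200.0, 10.0) - 180.0) < 1e-9
    assert abs(apply_discount(99.0, 0.0) - 99.0) < 1e-9  # zero discount is identity

test_apply_discount()
```

Neither model is privileged: if the test fails, the mistake may be in either side, and it's the disagreement that prompts the reasoning.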

Even though it is useful to model twice, I'm not really sold on the idea of doing it in general (and I have a very specific idea of how to write unit tests that's heavily coloured by this point of view). Everything that you add to the code should make reasoning about something easier and more convenient than not doing it. Otherwise you are better off with just writing the production code. You will have less complexity to deal with overall.

In my opinion, a good example of a tool that helps incredibly well is a strongly typed language with a type inference system. For a small penalty of type notation, the system can tell me, without a doubt, when I've made a mistake with the types of my variables. Not only that, but the notation itself allows me to reason about why I've gotten myself into a problem. It can even help me reason about what a function does without having to look at the implementation. This is incredibly helpful.
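As a hedged illustration (using Python's optional type annotations rather than full inference, since that is the closest widely available analogue; the function and names are invented): the signature alone lets you reason about what the function does without reading its body, and a static checker such as mypy would reject a misuse before the code ever runs.

```python
from typing import Optional

def find_user_id(name: str, directory: dict[str, int]) -> Optional[int]:
    """The signature alone says a lot: given a name and a name->id
    mapping, you get back an id or None -- no need to read the body."""
    return directory.get(name)

directory = {"ada": 1, "grace": 2}
uid = find_user_id("ada", directory)
assert uid == 1

# A static checker (e.g. mypy) would flag the next line before it ever
# runs, because the argument types don't line up:
# find_user_id(42, directory)
```

The `Optional[int]` return type also forces the caller to think about the missing-name case, which is exactly the kind of reasoning aid the comment describes.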

I think as we go forward, we need to be looking for this kind of tool. Dependent types seem like a perfect example of this. However, there is always going to be a small price to pay. Writing code in a statically typed language is harder than writing code in a dynamically typed language -- precisely because the dynamically typed language allows you to write code without having to think up front about the consequences. Of course, you may leave bugs in your code because of it, but the initial cognitive load is smaller. Similarly, it's much better to write code where you can reason about the run time behaviour without it running, but that is much harder than writing some code and walking through it, playing "computer".

We need to balance these kinds of factors because a tool that is not used is a useless tool. I'd be perfectly happy to work in a pure functional, statically typed language because it is much easier to reason about my code. Many of my colleagues simply are not willing (or possibly even able) to pay the upfront cost of working in that kind of ecosystem. Hence, I'm not able to work in it either.

This was a great comment.

The fact that new notations/representations (be that tests, simulations, formal methods, etc.) must justify their additional cognitive overhead is a great point.

Aside from unit and integration tests, I think that substantial new notation/infrastructure is most justifiable when the second notation/setting allows you to model parts of the problem that can't be modeled in the original notation.

Embedded software is a good example. Although there are some tools that allow you to model the physical system alongside the control software, typically the code is written in C and so there's no way of expressing very important aspects of the problem in the implementation language. So a second notation can help. For example, simulators and HiL setups for testing or things like https://github.com/LS-Lab/KeYmaeraX-release for formal methods.

Security is another good example. Typically code does not include an explicit threat model or explicit security properties, and these aspects of the problem are often scattered across the code and difficult to reason about. So being able to capture those things directly in a second notation can add a lot. For example, automatic vulnerability scanners for testing or https://tamarin-prover.github.io/ for formal methods.

Distributed/fault-tolerant systems are yet another example where it's very hard to state the thing you want to state in the implementation language. Netflix's chaos monkey comes to mind.

Agree about type inference being a great tool along the lines we need. Notably it extracts information from the code and lifts it to the type level, needing only occasional guidance when it can't figure enough out by itself.

We don't seem to have the same level of help yet with dependent types. A detailed comment in a Reddit thread: <https://www.reddit.com/r/haskell/comments/3zc81v/tradeoffs_o... indicates that we still can't use them on large-scale systems. This was some time ago but AFAIK we still have all the problems listed.

I wonder if there's good work on inference of minimal dependent types for APIs etc. I am not very familiar with current work so would be very interested in getting references. It seems that deriving canonical minimal types would resolve a lot of the listed problems.

We've gotten to where we are using slow incremental improvements. Better tools, better compilers, vastly better linters, unit testing, CI, etc. Now and then we manage to take some academic research and turn it into practicable stuff, e.g. the current generation of linters like FindBugs, Coverity, etc.

And that is pretty much as it has always been, a slow drip of practical improvements fed by theoretical research.

Was 1996 really that different from today? Because "reliable" is the last word I would use to describe software today.

It surely is different. We at least managed to contain some of the issues in normal operation. Back in the day, crashing the whole OS in BSOD/panic style was more common, drivers were less stable, and things written in not-so-safe languages were more popular.

So ultimately it depends on what you mean by reliable - by itself, functionality might not be so much better, but sometimes you don't mind as much as you would have back then.

The internet wasn't as big and software frequently shipped as a boxed product on a disk, cartridge, or CD-ROM. You had one shot to get it right without major bugs because fixing them required another run of physical media.

Even in the late 90s, source code revision control was a dark art: either you used extremely expensive proprietary systems or else you used CVS which was very low level and completely lacked integration with any bug tracking system. Merging in a big feature in CVS was a nerve-racking experience.

There was more Pascal and more assembly language.

Some believe that Macintosh suffered a lot of instability from the switch to C from Pascal.

I don't have any source code to point to however, only anecdote.

As a small data point I'd say that Leslie Lamport claims that Amazon has used TLA+ on a good number of big projects; many people think of formal proving as an activity that is separately layered on top of programming, as opposed to a proving-language. Also, when people use Rust, are they using proving techniques?

Not yet; we’re still working on the semantics of the language, and proofs for it; you need that before you can prove things about the code in the language. Or at least, the stuff we’re most interested in.

The EU is funding some of this work.

For people interested in looking deeper, this work is called RustBelt. Useful web pages are the project's homepage (https://plv.mpi-sws.org/rustbelt/) and the research paper describing the project (https://dl.acm.org/citation.cfm?doid=3177123.3158154&preflay...)

If you are interested in a programming language suitable for system-level programming that uses proving techniques, maybe ATS would be of interest.



1996 being the pre-Windows 98 era

Hmm. The paper argues that a predicted software crisis caused by unreliability didn't materialise, and software became tremendously more reliable, despite industry essentially ignoring decades of research into formal methods and proof systems.

With this I agree. I'd also contribute another couple of factors that came into play perhaps too late for a paper written in 1996: successfully bringing garbage collection and managed memory languages to widespread usage, starting with Java and JavaScript and the growth of the web that allowed much more software to be written in server side scripting languages like Perl and Python. And the deployment of crash reporters and online update tools, which massively tightened the feedback loop between crashes occurring in the wild and fixes being deployed.

Intuitively at least, I feel these two things made a huge difference to the reliability and scalability of software. It's typical for large Java or JavaScript programs to now incorporate so many libraries that dependency version management is a significant pain point bottlenecking the growth in software complexity. In the 1990s the closest you got to this was DLL hell, and that was more of a Windows deployment issue - the software worked fine on the developer's box, it just didn't work when deployed to OS installs that had different bits on them. Now we face issues where dependency graphs can pull in two or even three different incompatible versions of the same library on the developer's machine. This is a testament to the new scales we are exploring in software complexity.

However, the paper then goes on to make some possibly very dubious arguments, I suppose because the author is a famous academic and maybe doesn't want to contemplate the obvious conclusion. He says:

> The conclusion of the enquiry will be that in spite of appearances, modern software engineering practice owes a great deal to the theoretical concepts and ideals of early research in the subject; and that techniques of formalisation and proof have played an essential role in validating and progressing the research.

> However, technology transfer is extremely slow in software, as it should be in any serious branch of engineering. Because of the backlog of research results not yet used, there is an immediate and continuing role for education, both of newcomers to the profession and of experienced practitioners. The final recommendation is that we must aim our future theoretical research on goals which are as far ahead of the current state of the art as the current state of industrial practice lags behind the research we did in the past. Twenty years perhaps?

This is a very problematic set of conclusions which speaks to more general concerns growing in my mind in recent times about the value of much academic research.

Because of course, it's now more than 20 years since C.A.R. Hoare wrote this, and yet formal methods are no more used now than they were in 1996. It's not that there's a backlog of research that isn't used. It's that academia has gone down this rabbit hole for decades and it has never been widely used. 40 years and hardly any success! Why would anything be significantly different given another 20?

The paper does an admirable job of trying to congratulate everyone, whilst putting a brave face on lack of adoption - saying that a 20 year lag time between research and adoption is perfectly reasonable and a sign of maturity in the software industry. With another 20 years of hindsight, we can perhaps see now that industrial practice isn't lagging behind academic research, when it comes to formal methods and proofs. It's written it off and ignored it for good. The closest industrial practice gets to this is we use slightly stronger type systems now, but nothing a programmer teleported from 1996 wouldn't recognise.

I've been doing software for at least 20 years now, and occasionally heard about formal methods (a class I narrowly dodged while getting my BS in Computer Engineering); it seems like formal methods can approach proving you've followed the specifications. However, I've rarely actually had specifications, much less specifications that were complete or even relatively static.

I think there's some value in formal methods for baseline things that could have formal specifications, things like schedulers, compilers, optimizers (maybe just enough to show that the optimizer didn't make anything worse), filesystems (although fairly complex), etc. But applications are way too wishy-washy most of the time to even have a spec, let alone something that would hold still long enough to verify.

Formal methods have seen adoption in hardware where the cost of bugs is much higher. The fact is it’s cheaper to patch buggy software than it is to apply formal methods. Quality has remained above a level that would impede progress, and basically it just isn’t that big of an issue.

On a smaller scale it’s been used in code checkers to find uses of uninitialized variables and “can be null” errors. Just in a limited form.

I used to believe that was true, but recently I read that even in the hardware space formal methods have limited application - like software, they're usually used only on very specific sub-parts like MESI protocols. But it's not the case that a modern CPU is extensively applying formal methods.

W.R.T. things like static type systems, I agree. But progress there has been fairly minimal in the widely adopted languages. Mostly, in fact, the type systems did not get stronger, but the focus was on making them easier to use through more type inference.

Full top down proofs are rare but it’s more common at the module level.

> With another 20 years of hindsight, we can perhaps see now that industrial practice isn't lagging behind academic research, when it comes to formal methods and proofs. It's written it off and ignored it for good.

I think that academics ignored the cost of proofs. They said "This is the right way. You should do it this way." But industry said "This is really expensive. If we try to use it for everything, you're only going to get 1% as much software, if that." And that wasn't realistic for industry. There were customer needs (and money to be made meeting them). For the real world, formal methods are too expensive to be worthwhile (in all but a very few situations).

Yes, in a sense. But the reason they're expensive is they don't work very well and prove only trivial things.

Try using any imperative language with integrated proof predicates like Dafny, Whiley, etc. You'll immediately notice that they struggle to understand even properties of the code that look extremely obvious; for instance, they require lots of loop annotations. The examples being proven correct all end up being so trivial that you could just as well prove them correct by simply reading and thinking about them.
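To make the annotation burden concrete, here is a sketch in Python standing in for Dafny-style code (Dafny itself uses `invariant` clauses; the run-time asserts below mimic what a verifier would demand be written out and proved). Even for a loop as trivial as summing a list, the invariant has to be stated explicitly.

```python
def sum_list(xs: list[int]) -> int:
    """Sum a list, with the loop invariant a Dafny-style verifier would
    require spelled out explicitly (here merely checked dynamically)."""
    total = 0
    i = 0
    while i < len(xs):
        # Loop invariant: total equals the sum of the first i elements.
        assert total == sum(xs[:i])
        total += xs[i]
        i += 1
    # Postcondition: total is the sum of the whole list.
    assert total == sum(xs)
    return total

assert sum_list([1, 2, 3, 4]) == 10
```

The point the comment makes is visible even here: the annotations restate something a human reader finds obvious at a glance, and for realistic loops the invariants get much harder to state than the code itself.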

In the functional space things are hardly better. Formal methods in FP has evolved into dependent types research. But dependent types are more or less dependent on functional languages - the flagship is Idris, which is implemented in Haskell and closely modelled on it. But Haskell and other pure FP languages have been around for decades. Despite being extensively promoted to students at universities around the world, these languages have not seen wide adoption, so it stands to reason dependent types can't either. And again, the sorts of things being proven tend to be fairly trivial.

The way industry is going is pretty clearly small improvements to existing type systems: in particular, implementing value types in managed languages so that the cost of using strong types is lower where otherwise raw/primitive types would be used. And there's a huge focus on tuning and improving generics, flow typing, type inference and other usability improvements.

In other words, industry practitioners don't believe lack of proving power is our problem. They believe we aren't using the power of the type (proof) systems we already have, because the usability is too low.

The popularity of JavaScript as a server side language would seem to suggest they are right.

Despite being extensively promoted to students at universities around the world

Are they? I attended a pretty good university in my EU country, and the only functional language teaching was a couple of weeks of OCaml. We had more time of Prolog than that, but 90% of the course was Java and C.

Isn't pure FP used very heavily in finance?

Not as far as I'm aware. A few firms do compete in the market for job candidates by adopting unusual languages that correlate well with passionate / mathematically inclined developers, but it's not really about the tech itself. Jane Street are famous for using OCaml but that's about it? Most finance is Java, even the HFT stuff these days.

At least one aspect would be that stricter guarantees don't really matter, except when they do. If ease of use, learning curve, compile time, efficiency, etc. were equivalent to a non-strict language, then presumably industry would naturally make the leap (ignoring the inherent inertia of established norms and actual code). It would be dumb not to, if you could get fewer bugs for free.

But they're obviously not equivalent, and most businesses only care about correctness up to a certain limit (at which point it's superseded by other factors; diminishing returns and all that), so formal guarantees will naturally only appear in common use up to that limit.

To truly claim the futility of formal systems, you should be looking at the guys who prioritize it over basically anything else: if, say, even space companies couldn't give a shit about advancements in formal systems, then it's clearly divorced itself from practicality. If NASA prefers JavaScript over whatever formal system, then clearly the academics have gone down a dark road.

But at the same time, is it the academic's job to build practical systems, to advance industry? Isn't that the role of the industrial researcher? As I understood it, academia's ideal role is to simply advance human understanding in interesting ways: whether it has immediate practical consequence should not be in their domain, i.e. research ML decades before it has any real hope of being usable. It's industry itself that takes the role of poring over the academics' work and figuring out whether it can be used for anything, and whether it's cost-effective.

And if industry specifically wants academia to look into things of immediate interest, they shouldn’t be waiting on universities for it; they should be hiring an academic to specifically research it.

Resource allocation is a zero sum game. If taxes are being levied to pay for a vast intellectual class living in ivory towers, that's less funding available to pay for industrial research.

Academics love to claim that practical application should not be any concern of theirs. But why should this be so? After all, they eat practical food and live in practical houses. Someone has to pay for it all. And the idea, espoused in this essay, that eventually, on a long enough timescale, their research will all be adopted and it's just that industry is super slow to recognise the awesomeness of their research and productize it - well, this claim seems to be getting weaker all the time. We now have many areas of computer science that have spent decades going down very expensive rabbit holes with no serious adoption and none on the horizon.

Not just formal methods: pure FP languages like Haskell remain rarely adopted despite huge hype and massive academic investment. A few days ago there was a post asking what happened to the semantic web, with many commenters positing that it became an academic money pit that absorbed huge quantities of grant money and delivered nothing whatsoever. I'm sure there are many other examples lurking if you scratch the surface.

If academic funding was cut significantly and returned to companies in the form of lower corporation taxes, perhaps connected to research incentives, would it be so bad? There'd probably be less research done overall but if lots of current research goes nowhere and is never adopted that'd be a net win.

What do you make of corporations subsidizing seemingly pure academic research, then? Take the aforementioned Haskell, for example - you speak of "massive academic investment", but Simon Peyton Jones is still a Microsoft [Research] employee, and Simon Marlow is a Facebook employee (and former MSR as well)! And yes, you don't see Microsoft use Haskell in its products - but the ongoing funding clearly indicates that there's substantial perceived value in that investment. Given the amount of time this has been going on, surely that's more than can be explained away by "hype"?

I think the corporate world is still learning how to do effective research. Too many companies think that doing research means replicating a tiny university within their own walls.

I'm not saying industry always gets it more right than academia, but I think even the orgs like Microsoft Research that are basically just clones of academia, are still on a slightly better path. For instance MSR has done a lot of cutting edge research into operating system design. Still, the money was in the end wasted - nothing of Midori ever surfaced in real products.

The gold standard for me is Google. They're one of the few firms that managed to completely fuse research and industrialisation into a totally cohesive whole. Whilst they technically speaking have a research department, the research they're known for is largely not done there and Google has managed in the past to routinely create new innovations from all over the company.

It's worth realising that a lot of industrial and corporate research isn't labelled as such. It's just "innovative product development".

Example: If we look at JetBrains, they developed a new programming language from scratch (Kotlin) that has several innovative and cutting-edge features. Do we think of JetBrains as a sponsor of research, or of Kotlin as the product of research? Probably not, yet it solves many problems that were previously unresolved, like how to make a new language with seamless interop with Java, and how to improve the usability of features like generics and nullability checking. The work involved in creating it explored new frontiers that academia has ignored, simply because usability improvements and practical topics like interop with existing code are not considered 'smart' enough to be worth the title of research.

You see this pattern crop up repeatedly. Academia, left to its own devices, researches ever more exotic extensions to Haskell, knowing full well most programmers don't use Haskell. The corporate funded academic research done by JKU Linz and Oracle has focused on how to make existing languages like Ruby and Python super fast to execute, super debuggable, cheap to develop runtimes for, how to solve the language interop problem once and for all and so on. They've produced something real - Graal - which looks on track to actually let Ruby call into C++ call into Java call into JavaScript call back into Ruby again, in a seamless and fully optimised manner.

Now that is cutting edge but incredibly practical research that could have industry-shaking impact within the next 10 years. It isn't primarily funded by the NSF or EU though. It's funded by a corporation.

It's just lame to play off practical against theoretical research and academic vs corporate research. Each of them has its advantages and disadvantages.

> academia’s ideal role is to simply advance human understanding in interesting ways

Yes and no. This is an engineering science. In a pure science, the world is out there to explore and find out about, constraining what you do. An engineering science is very much about useful things we can build and how we can get better at doing that. Buildings that stay up and house people/things, planes that can fly, ships that float, that sort of thing.

Without a world to find out about and without being constrained by stuff to build, things get a bit arbitrary. At that point, it's really more pure art than anything else. Math is sort of the exception where we allow that kind of freedom. Sabine Hossenfelder's Lost in Math is a reminder of the dangers of applying that approach to other disciplines.

Anyway, I think it is actually wrong to pin academia's irrelevance to this fundamental divide. The problems in industry aren't too easy / boring to be of interest to academics. They are too hard to be solvable by anyone with the current insane level of publication requirements.

Michael Stonebraker talked about this in detail:


It's a great talk.

I think the devil is in the details of this one.

The shop coder may have never heard of a formal method and thinks it is something that must be "slow" or "too complicated", but they inevitably come crashing into tools and techniques that were informed by formal methods. In the 80's it was hard to get your hands on a C compiler. In the 90's, everyone still thought that OOP was going to change everything. By the 00's we finally saw widespread adoption of garbage collected languages. And more recently we've had better package management and efforts to rework lower level coding.

We've never left behind cargo-culting, given the growth in the number of programmers, but the strides are there, always hovering a few steps behind the code that is considered the "production code". The production code is seen as far too fast moving to be considered for "formal methods", yet it still ends up using "design patterns", echoing Hoare's notes about defensive coding techniques. We don't "prove" our C++ code is good, but we run a "static analysis" tool or a "fuzzer" on it.

Perfect, the enemy of good enough.

Sure, but if you look carefully most static analysis engines are developed by industry not academia because they're collections of heuristics that tend to work in practice rather than lots of clever-as-possible sounding theory.

But Hoare talks specifically about formal methods, which is widely understood to mean proving code meets a spec using techniques like constraint solvers. Dependent types may meet the definition too. And none of these techniques are used widely or have ever been used widely.
