You will never be able to come up with all possible test cases to cover all possible scenarios. That's the idea. This applies to all practical programs. There is not a single piece of software in the world without a missed edge case.
> You will never be able to come up with all possible test cases to cover all possible scenarios.
Technically it's possible, and it's called "exhaustive testing". You can do it for fairly small input spaces, e.g. you can test trig functions on all 32-bit inputs; that's only about 4 billion values.
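To make that concrete, here's a minimal sketch of what an exhaustive 32-bit test looks like (`fast_abs` is a made-up example function under test, not anything from the thread; std's `f32::abs` serves as the oracle):

```rust
// Sketch: exhaustively test a function over its entire 32-bit input space.
// `fast_abs` is a hypothetical function under test. Comparing bit patterns
// instead of values sidesteps the NaN != NaN problem.
fn fast_abs(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7fff_ffff) // clear the sign bit
}

fn main() {
    for bits in 0..=u32::MAX {
        let x = f32::from_bits(bits);
        assert_eq!(
            fast_abs(x).to_bits(),
            x.abs().to_bits(),
            "mismatch at bit pattern {bits:#010x}"
        );
    }
    println!("all 2^32 inputs verified");
}
```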
Non-trivial programs have way more state than that. Consider a vector of f32: each element adds 32 bits, so a vector of just three floats already has 2^96 possible states, and a full-length vector (2^64 items on a 64-bit platform) has (2^32)^(2^64) = 2^(2^69) states.
Not sure what machine could completely test a problem with a 32-bit input space in any "practical" or "reasonable" time. Once state is added in, it quickly explodes beyond a simple 32-bit input space. I have yet to see any literature showing that exhaustive testing is even marginally feasible. I know in my own work we don't even bother: we come up with the test cases we can and leave the rest to the software gods.
4 billion values is the same order of magnitude as your CPU frequency, and the work is trivially partitionable (and thus parallelisable). How fast exactly depends on the complexity of the function under test (FUT), but assuming the entire thing runs in native code, it's on the order of 1-10 minutes to go through all the values, call both the FUT and the oracle, and compare the results.
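As a sketch of the partitioning point, assuming the rayon crate is available as a dependency: the input range splits across cores with no coordination needed.

```rust
// Sketch of the same exhaustive check, parallelised with rayon (assumed
// dependency). Each worker takes a disjoint chunk of the 2^32 inputs.
use rayon::prelude::*;

fn main() {
    (0u64..1 << 32).into_par_iter().for_each(|i| {
        let x = f32::from_bits(i as u32);
        // FUT: sign-bit-clearing abs; oracle: the standard library.
        let got = f32::from_bits(x.to_bits() & 0x7fff_ffff);
        assert_eq!(got.to_bits(), x.abs().to_bits());
    });
}
```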
> Once states are added in, it quickly explodes beyond a simple 32-bit input space.
Then you're not testing a problem with a 32-bit input space anymore. My comment specifically points out that it's doable for small input spaces and gives 32 bits as an anchor at the upper end. If the program under test is fast enough and you're fine with hours-long tests, you could get up to the low 40s and still test every single value, but since every bit doubles the number of states (and thus the time spent testing), it grows unreasonable extremely quickly.
> I know in my own work, we don't even bother.
Of course not, the average program is billions of orders of magnitude beyond 32 bits of input state. The current HN homepage is 42.5KB, which is ~350,000 bits of input, i.e. 2^350000 possible states. If your program takes an input file or some such, you start stacking exponentials just to keep the size of the input space writable.
Sure, you can’t in the general abstract case, but my exact point is that you can do this in practice, for programs which we actually write.
Perhaps you could show an example of a useful program we can’t do that for?
In my experience, it’s not due to either of those theoretical considerations that we don’t see it done — we just don’t see it done in practice for cost reasons.
When I was at grad school most of my research was about devising highly impractical (but kinda cute) algorithms for idealized machines with concurrent processes. These algorithms were usually about twenty lines long, but the correctness proof would span a dozen pages (and sometimes several hundred numbered equations) - at one point a reviewer got so disgusted that they called it "Proof by intimidation".
So, yeah, imagine doing that for any program with more than five hundred lines of code.
> Sure, you can’t in the general abstract case, but my exact point is that you can do this in practice, for programs which we actually write.
No, you cannot do that for programs which we actually write. Even a trivial program taking an array of single-precision floats can't be exhaustively tested: there are north of 2^100 input states (assuming a 64-bit platform, where arrays can't be bigger than 2^64 elements).
If a program on a 32-bit Windows system takes a file as input, that's on the order of 2^(10^15) possible input states…
We can test every single case for very simple, trivial programs, e.g. we can exhaustively test a function with 32 bits of input (a 32-bit integer or a single-precision float: https://randomascii.wordpress.com/2014/01/27/theres-only-fou...) in a minute or so if the function is not too complex. It might be feasible to add a few more bits, but assuming the same trivial function, at 40 bits you've jumped to ~4 hours, and at 43 bits you need more than a day.
At 64 bits you need 8000 years: 2^64 values at 2^32 values per minute is 2^32 minutes, or roughly eight millennia.
Alternatively: let's say we want to exhaustively test 32-bit division. That's 64 bits of input at 26 cycles per division (we're being optimistic) on, say, a 4GHz machine: 2^64 inputs × 26 cycles / (4×10^9 cycles per second) ≈ 1.2×10^11 seconds, or about 3800 years.
And that's just the division itself, mind, we still need to account for some incrementing, jumping, etc…
And that's just to exhaustively test the division of two single-precision floats.
A computer is a state machine with 2^n potential states (where n is the number of bits of storage/memory). As soon as n reaches any non-trivial size (and n necessarily includes queued processor instructions, in addition to data!), the number of possible states rapidly exceeds the number of atoms in the known universe. There are simply too many permutations to test 100% of all cases for any program more complicated than "$x=2+2; assert($x==4);".
And that's just one constraint that makes the mythical "100% test coverage" practically infeasible. Another is: what tests the tests?
Well yeah, in practice we don't have Turing machines with infinite tapes, only finite memory, so the number of states is bounded.
The number of possible inputs and states for even a moderately complex program is so astronomically high that it far surpasses the number of atoms the Earth consists of, so, yeah, "cost reasons" is one way to put it. We also don't transmute sheetrock into gold, for "energy reasons".
Do there exist inputs (either string or repetition count) for which the following function does not panic on allocation issues (as documented) but instead either 1. raises a SIGABRT or similar, or 2. produces a new string that is not the correct multiple of the original string slice's size?
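For reference, `str::repeat` behaves roughly like the following paraphrase (this is not the actual standard-library source, which builds the buffer with doubling copies, but the documented panic-on-overflow behaviour is the same):

```rust
// Rough paraphrase of `str::repeat`'s documented behaviour; not the
// real std implementation. Panics if the new capacity overflows `usize`.
fn repeat(s: &str, n: usize) -> String {
    let len = s.len().checked_mul(n).expect("capacity overflow");
    let mut out = String::with_capacity(len);
    for _ in 0..n {
        out.push_str(s);
    }
    out
}
```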
It's not possible, I contend, to answer this question with tests. The cardinality of input strings and repetition counts is bounded but very, very high. You can certainly find _examples_ where `str::repeat` functions as documented, but demonstrating that it always will is a different thing.
GP is asking for cases where testing is demonstrably insufficient, not cases where you personally aren't convinced. If `str::repeat` is actually buggy for some inputs, in spite of testing, that would suffice as a counterexample.
Ah, sure. What I'm trying to get at is discovering those inputs is the real trick. Comments elsewhere in this thread about exhaustive testing do a better job of getting at the same idea.
The previous poster is correct. Software cannot be proven to be bug-free. This was considered axiomatic when I worked at IBM as a mainframe programmer on software that ran ATMs, Social Security checks, etc.
You should definitely read up on the halting problem.
> The previous poster is correct. Software cannot be proven to be bug-free.
Yes, it can; that's what the whole field of formally verified software is about. Of course, even with formally verified software there can be failures caused by external factors such as faults in hardware, etc. But a piece of software itself absolutely can be proved correct (i.e. bug-free). It just takes special approaches, such as programming languages amenable to formal proofs (e.g. Idris).
Software is usually written in bug-inducing ways and using languages that hinder formal verification not because provably bug-free software is impossible but because it is typically too costly and difficult to write.
The thing that's always bothered me about this argument is the translation between proof and code+underlying system. Bugs can still be introduced there. Certainly significantly fewer bugs in the final result, but I don't think I'd be confident in calling it proven 100% bug-free.
In languages with termination analysis and dependent types, the source code _is_ the proof, and it's verified by the compiler. Basically, when such a program compiles, it's proved to be correct according to its specification.
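A toy illustration in Lean 4 (standing in for Idris; the function name is hypothetical): the return type carries the specification, so the program only compiles if the embedded proof goes through.

```lean
-- The subtype in the return position is the spec: the result must have
-- length `xs.length + ys.length`. `by simp` discharges the proof; if it
-- couldn't, the definition would be rejected at compile time.
def appendWithLength (xs ys : List α) :
    { zs : List α // zs.length = xs.length + ys.length } :=
  ⟨xs ++ ys, by simp⟩
```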
Of course this still leaves space for bugs outside the program itself which still may influence it, such as bugs in the operating system or firmware or the like. Which may be prevented by having a formally verified OS too :)
But even in such a case there might still be hardware errors (electrical noise, for example), which is why spacecraft and critical industrial machinery double or triple their whole architectures. The formally verified software then runs on, e.g., three instances simultaneously, and these instances cross-check each other's results all the time. When a hardware fault occurs (a randomly flipped bit in memory, for example), they find out their results don't match and re-calculate.
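The cross-check itself is just majority voting; a minimal sketch (illustrative only, not from any real flight software):

```rust
// 2-of-3 majority vote across redundant computations. If at least two
// instances agree, their answer wins; if all three differ, the fault
// is unrecoverable and the computation must be redone.
fn vote<T: PartialEq>(a: T, b: T, c: T) -> Option<T> {
    if a == b || a == c {
        Some(a)
    } else if b == c {
        Some(b)
    } else {
        None // all three disagree: re-calculate
    }
}
```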
Software can absolutely be proved bug-free: by being proved. Software cannot be proved bug-free by tests (and even that assertion is not completely true; you can prove software correct through exhaustive testing if it's very, very simple).
> You can't come up with enough tests to prove that software is bug-free.
Sure you can: you can prove software correct through exhaustive testing. As noted previously, it only works for small input spaces and simple programs (functions, really), but it does work: you can prove that a boolean xor is correct by enumerating its 4 possible inputs and checking that all of them produce the expected output.
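The xor case fits in a few lines (a sketch, using `!=` on booleans as the oracle):

```rust
// Exhaustive "proof by testing": xor has only 2^2 = 4 possible inputs,
// so checking all of them genuinely proves the function correct.
#[test]
fn xor_is_correct() {
    for a in [false, true] {
        for b in [false, true] {
            // Oracle: xor is true exactly when the inputs differ.
            assert_eq!(a ^ b, a != b);
        }
    }
}
```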
I did read it. I stick by my assertion. In practice, software cannot be proven to be bug free. This assertion has been borne out by numerous companies with nearly unlimited budgets who continue to find severe bugs in their software, despite extensive testing. While it's interesting academically that a single function could be tested and shown to be bug-free, it's not relevant in practice. It's the interaction of multiple functions and subsystems that leads to many bugs -- and inputs in edge cases lead to even more than that.
> I stick by my assertion. In practice, software cannot be proven to be bug free.
You can't "stick by your assertion" then provide something which has almost no relation to your original assertion, that's called "moving the goalposts".
This moving of goalposts is even more asinine when you're moving them exactly where the comment which you originally decried had put them.
Reminder: here is what you originally felt you needed to note was wrong:
> Software can not be proved bug-free by tests (and even that assertion is not completely true, you can prove software through exhaustive testing if it's very very simple).
> Sure, you can’t in the general abstract case, but my exact point is that you can do this in practice, for programs which we actually write.
No, it can't be done with testing even for practical everyday programs, precisely because you never know how much of your program crosses into the "abstract general case", or which parts of it are more general than you think. Very often this is exactly the cause of bugs: some part of the program behaves in a more general / less restricted way than the programmer thought.
It can be done with formal verification, but that's a whole other story.
SQLite recently had a vulnerability[0], despite being considered one of the most thoroughly tested projects out there. The reason tests didn't catch it is combinatorial explosion[1]: because complex software has so many possible inputs and states, it is not feasible for tests to cover every possible edge case.
The halting problem generalizes to, roughly speaking, the fact that you cannot in general know what a program does (you might be able to tell for a specific program, but I can always give you a program where that fails).
What this means in this context, for practical programs, ones you are "likely to write as an SDE", is that it is in general impossible to know in advance whether your program has bugs or not.
So, the notion that a program can be proven bug free "if you write enough tests" is kinda funny for programs of even moderate complexity, ones that you are likely to write as an SDE.
Aside: This "how does it matter in the real world" attitude reeks of anti-intellectualism and annoys me to no end. It does, you just don't know how.
The problem with your reasoning is much like the uncountability of the real numbers: in practice I only ever deal with the countable subset of computable numbers, and I get by just fine doing so, so many results about "real numbers" don't apply to my life.
The mere existence of programs you can’t determine the halting status of doesn’t necessarily imply anything about the much, much smaller subset of programs we’d want to write for practical reasons — eg, managing my bank account.
It’s also very strange to quote things like “if you write enough tests”, when in fact I didn’t say anything like that.
Rather than address how a theoretical result connects to the real world, you assume without any thought that the kind of problematic cases which exist in theory actually relate to what we do in practice.
Show me the actual connection, or admit the halting problem isn’t really a concern in practice: where in the course of my life as an SDE does it actually cause problems, because in my experience, I work in the subset of programs you can reason about.
Many of the posters here make the same fundamental mistake: the existence of programs we can’t reason about doesn’t mean that there are no programs we can reason about — and in practice, we encounter the ones we can’t extremely rarely.
> Show me the actual connection, or admit the halting problem isn’t really a concern in practice: where in the course of my life as an SDE does it actually cause problems,
But were it not for the halting problem, we could automatically prove programs to be correct! So it affects you all the time: Because of the pesky halting problem, making sure that your program does what it should is really really hard instead of being plain automatic.
I think the point you're not getting is that the halting problem is not just "you cannot prove that a program halts"; it extends (this is Rice's theorem) to "you cannot, in general, prove that a program does what it should do, period". And this is the reason we need exhaustive tests on one side and elaborate verification methods on the other, both still never giving 100% confidence.
> because in my experience, I work in the subset of programs you can reason about.
You can reason about them, but you cannot guarantee that they are bug-free.
You ask me for a program "where the halting problem matters", and I say "pretty much any program you can find". On the contrary even, I ask you to provide me with a non-trivial program (i.e. one which is not just an academic exercise) that has been proven bug-free, through testing or other methods.
> were it not for the halting problem, we could automatically prove programs to be correct!
But we actually can do this for a huge subset of programs, as long as we're okay with false negatives, i.e. programs which are correct but that we can't prove using our system.
Precisely what I’m trying to call out is your mistake here: the existence of programs (in theory) which we can’t analyze (eg because of the halting problem) doesn’t imply anything at all about the programs we’re likely to encounter in practice. Those programs live in the subset of programs for which automatic reasoning does work.
The correct interpretation of "it's not possible to prove all programs" is "there exist some programs we can't prove correctness of", not "there are no programs we can prove the correctness of".
Show me that the programs we find useful overlap with the ones we can’t automatically prove — because in my experience, things like the halting problem are an abstract concern, and the programs we want to verify live solidly in the subset of programs we can automatically reason about.
For any program we’re likely to encounter in practice, we’re in the subset of programs we can automatically prove correct, if we spent the effort to do so. (And I have done so before.)
I mean I don't even know what you are asking for when you say "show me a program where the halting problem matters"... Can I not just pick any program you (or anyone else) have not proven to be bug free?
If you want a direct application of the halting problem itself, as in "does it terminate"... I don't know if my web browser finishes rendering all websites (even without JavaScript), so I guess I pick that?
Could you give me an example of a practical program (none of that "hello world" or "multiply two numbers"-grade stuff) that has entirely been proven correct, through tests or formal methods?
Let's start with tests: if you don't write a test for every single possible case, you won't know whether there's an edge case in your program that you didn't catch. Unfortunately, the number of cases for any program that isn't entirely trivial (and thus mostly useless) grows so enormously quickly that you simply cannot write an exhaustive set of tests.
So instead, you have to classify your inputs and test "representative" inputs from each class. However, reasoning about whether a classification is perfect, i.e. whether each input you use in your tests behaves the same (for a reasonable definition of "the same") as every other input in its class, requires proving non-trivial properties about the program. And that, the halting theorem tells us, is impossible in general as well.
In simpler words: even relatively simple programs have more test cases than you have lifetime to write them in, and theory also tells us that we can't, in general, know how to reduce the test cases to only the interesting ones.
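This gap between exhaustive testing and hand-picked representatives is what property-based testing tries to split: state the property, then sample the input space at random. A sketch using the proptest crate (assumed as a dependency), reusing the `str::repeat` example from upthread:

```rust
use proptest::prelude::*;

proptest! {
    // Property: the result of `repeat` is exactly len * n bytes long.
    // proptest samples inputs rather than enumerating them, so it can
    // find counterexamples but can never prove their absence.
    #[test]
    fn repeat_has_expected_length(s in ".{0,64}", n in 0usize..1000) {
        let out = s.repeat(n);
        prop_assert_eq!(out.len(), s.len() * n);
    }
}
```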