I wonder if they have looked into property-based testing in a language which has solid libraries for it. There are the obligatory Haskell examples, but also more widespread languages such as Python, which has the phenomenal Hypothesis library: https://hypothesis.readthedocs.io/
Hypothesis lets you build up exactly these kinds of what-if scenarios, and its primary purpose is to give you the sorts of “simulation space” primitives they are talking about. “I want to test this code with the first argument as odd integers divisible by 7 and the second argument as random days of the week”, and Hypothesis will generate dozens to hundreds of scenarios (how exhaustively it runs is configurable) and run them through the code, which largely offloads the kind of mental simulation debugging the author is talking about.
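As a rough sketch of what that looks like (the function under test, schedule_task, is invented here; the strategies themselves are standard Hypothesis APIs):

    from hypothesis import given, strategies as st

    DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

    @given(
        st.integers().map(lambda k: 7 * (2 * k + 1)),  # odd integers divisible by 7
        st.sampled_from(DAYS),                         # random days of the week
    )
    def test_schedule_task_handles_all_inputs(n, day):
        result = schedule_task(n, day)  # schedule_task is a stand-in for your code under test
        assert result is not None       # stand-in for a real property/invariant

Run it under pytest and Hypothesis will also shrink any failing input down to a minimal counterexample.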
I’ve driven complex installable Django apps to better than 100% test coverage this way: I had already hit 100% test coverage, but property-based testing discovered so many more unhandled behaviour branches that it really elevated the overall code quality to another level. I try to use it for all my Python testing whenever I can. The basics aren’t hard to learn, and once you have them it really isn’t hard to use for everyday testing.
In my experience code quality falls out from making mental simulations easier to produce. Abstraction boundaries should be drawn where mental processing starts feeling burdensome.
When running abstract mental simulations, do you assume that I/O operations succeed or fail for knowable reasons? It seems likely. If the code were expressed to match your mental simulation, for example by not performing I/O within the function, but by merely receiving I/O messages from a foreign actor, it seems like you're not far away from offloading the full mental simulation to something like quickcheck and writing down explicit invariants!
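To make that concrete with a minimal sketch (the handle function and its message shape are invented for illustration): once the function only folds already-received messages into state and never performs I/O itself, a property-based tool can drive it directly:

    from hypothesis import given, strategies as st

    def handle(state, message):
        # Pure message handler: no I/O, just (state, message) -> (new_state, replies).
        if message["kind"] == "ping":
            return state, [{"kind": "pong"}]
        new_state = dict(state)
        new_state["last_payload"] = message["payload"]
        return new_state, []

    messages = st.fixed_dictionaries(
        {"kind": st.sampled_from(["ping", "data"]), "payload": st.integers()}
    )

    @given(st.lists(messages))
    def test_handle_is_total(msgs):
        state = {}
        for msg in msgs:
            state, replies = handle(state, msg)
            assert isinstance(replies, list)  # stand-in for a real invariant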
This is why I say my ADHD has made me a better programmer. I can't mentally simulate more than the most basic state interactions. I have to write code that my rainy-Monday-morning, pre-meds, pre-coffee brain can reason about. And I see that as a win/win. It means someone foreign to the codebase has to put less thought into reasoning about it. My brain handles big picture thinking super well, but I can keep about 1.5 pieces of state in my working memory.
I love my IDE. I love interfaces, type systems, and static checks. ADHD pretty much forced me into a very functional programming style, because mutation is so much more to think about.
Conversely, some of the most inscrutable and buggy code I've written was when my meds were a tad high and I could easily reason about really complex interactions.
There are multiple ways we can frame the solution to this. Other developers (all of whom are much more capable than I am) have attempted to address this problem by designing new languages or new development environments (Eve, LightTable, Smalltalk, etc.).
It seems to me right off the bat that the nature of the visualisation required is very much an application dependent thing, making it hard to design a general-purpose solution for any but the most trivial of use cases. Secondly, moving to a new programming language (or even a new development environment) is not something that most of us have the freedom to do in our daily work, and overcoming that inertia is a real challenge.
The vision is, however, compelling enough for us to persist. Besides, giving up is never something that I like to countenance. Perhaps there's an easier and quicker way?
Dynamic languages like Python (and anything else with an 'eval') certainly offer the ability to reload program logic on the fly, and it's not terribly difficult to monitor the file-system and trigger re-loading functionality when it changes. Couple this with a data flow framework that guarantees determinism (e.g. a Kahn Process Network), and you've got the beginnings of something that can run and re-run chunks of program logic and compare the outputs. Couple this with some nice building blocks for program-state introspection and visualisation ... and I think you get a significant distance of the way towards the vision of interactive, dynamic, graphically rich (and distributed!) programming.
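A bare-bones sketch of that reload-and-rerun loop, using only the standard library (the pipeline module and its transform() function are placeholders for whatever chunk of program logic you want to re-run):

    import importlib
    import os
    import time

    def watch_and_rerun(module_name, sample_input, interval=0.5):
        # Reload the module whenever its source changes, re-run its logic,
        # and report when the output differs from the previous run.
        mod = importlib.import_module(module_name)  # e.g. a hypothetical "pipeline" module
        last_mtime = None
        last_output = None
        while True:
            mtime = os.path.getmtime(mod.__file__)
            if mtime != last_mtime:
                mod = importlib.reload(mod)
                output = mod.transform(sample_input)  # transform() is the placeholder entry point
                if last_output is not None and output != last_output:
                    print(f"output changed: {last_output!r} -> {output!r}")
                last_output, last_mtime = output, mtime
            time.sleep(interval)

Polling the file's mtime keeps this dependency-free; a real setup would likely use a proper file-watching library and restrict reloads to deterministic, side-effect-free chunks of logic, which is where the KPN-style determinism guarantee comes in.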
I.e. rather than looking for a panacea or a magic bullet, we can just try to use the tools that we have today to get as close as we can to the vision that so many of us share.
> the nature of the visualisation required is very much an application dependent thing
I agree. gtoolkit is a recent example that explores building custom visualizations very easily. It's a compelling environment.
> Secondly, moving to a new programming language (or even a new development environment) is not something that most of us have the freedom to do in our daily work, and overcoming that inertia is a real challenge.
Couldn't agree more. But over time, things could change. Python was a small language a couple of decades ago.
> rather than looking for a panacea or a magic bullet, we can just try to use the tools that we have today to get as close as we can to the vision that so many of us share
Yes, this is one way to move forward. The main issue I've seen is that supporting 100% of Python becomes too hard, since the idea is being retrofitted and the language/environment wasn't designed with this from the start (consider getattr/setattr and IO). So a large subset of Python (or another dynamic language) within a custom environment could certainly work as a starting point.
There are already some practical (Swift/IDEs) as well as theoretical (Bret Victor) approaches that simulate a specific function.
What's often missing is the visualization or simulation of how code blocks, classes, modules, libraries fit together. Modern frameworks and programming languages hide a lot of detail behind layers which compound this problem.
That would certainly be a great assist that current IDEs are missing.
I've been working on something along these lines [1].
See the "fork" statement which is close to the "what if" branching logic referenced in OP's article. I've been trying to revisit some of these concepts for chatbots or game AI. Reach out to me if you want to discuss via voice/video, since this topic is what my PhD was on.
TypeScript's type system, with tagged unions, `never` type and type inference being used in an app with good Language Server support (e.g. VSCode) is probably the closest I've seen to this.
The biggest limitation still in place is that it can't reason very well about value ranges (e.g. numbers between 20 and 30) without you enumerating every single possibility (i.e. 20 | 21 | 22 | 23 | ... | 30) which isn't really feasible or desirable for larger ranges.
We're probably very close to this level of generality, though, with the recent string template types being a major step forward in expressiveness.
If you can manually enumerate it, and TypeScript reasons about it correctly, then it seems to me it would be trivial for the compiler to generate the enumeration itself? It sounds like the job is mostly already done, at least for a slow-but-correct version; they just didn’t finish it up.
>There is the family of Lisps, Smalltalks and friends, where the system is live and you can set variables to values whenever you wish. The problem there tends to be isolation of speculation. I don't want my what-ifs to turn into has-becomes. "What if X is 33?"... OK done and now the computer shall never forget. I want all speculation to fly in a transient layer above the existing stable codebase.
Is this a real issue? Don’t you just need 2 copies of the code base, and terminate your 2nd copy to make it “transient”?
There's an argument to be made against even the theoretical possibility of building such a system: I think the halting problem alone is proof that we cannot implement a system that does this.
But I think there are two arguments that might be worth more from a practical perspective: first, there are certainly programs with sections that could be optimized if you knew of some restriction on inputs. However, you could do that with tools available today (turn a variable parameter to a function into a constant, then compile and decompile the program). But by extension there must be programs whose representation is already the simplest form of expressing the computation they do (e.g. the hailstone sequence).
Second, there are real world influences that reduce a program from theoretical computation to the practice of it (I/O considerations are just one example).
The language that could be fully analyzed would have to strip some of that complexity and in the process become sub-Turing-complete.
Since my team at work maintains something like a data flow language, I always find a different question that interests me: what if we selectively gave up Turing completeness? What if we embraced pure functional languages for more of our work? What could that buy us in terms of maintainability/quality of software?
> There's an argument to be made against even the theoretical possibility of building such a system: I think the halting problem alone is proof that we cannot implement a system that does this.
I think the halting problem only tells you that you cannot, in general, say what the program does without expending an amount of computational effort equivalent to running that program. A better counterpoint is that you can't know ahead of time what values will be returned by IO calls.
But then, none of that really matters, because aiming for perfection is the wrong approach. You can make a tool like the one described in the article if you accept that it'll need to be advised sometimes. That is, at any point where the tool can't compute something - a value of a variable taken from the network, a function to call through a pointer indirection, etc. - it can ask you for a default value or an additional assumption.
You'd also bound it using explicit heuristics, so it doesn't e.g. get stuck in long loops. Simulation of a marked bit of code takes more than 10ms? "Sorry, nope, can't process that." Then you could add options to annotate the offending loop, telling the simulator to finish it after at most 20 iterations. Or something like this. This tool would be a dialogue with code, not a batch job.
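As a toy version of that budget heuristic (the names here are made up, and a real tool would be far smarter about it), you can already cut off a runaway simulation with a step counter via sys.settrace:

    import sys

    class SimulationBudgetExceeded(Exception):
        pass

    def simulate(fn, *args, max_steps=10_000):
        # Run fn under a line-event budget; give up instead of hanging on long loops.
        steps = 0
        def tracer(frame, event, arg):
            nonlocal steps
            if event == "line":
                steps += 1
                if steps > max_steps:
                    raise SimulationBudgetExceeded(f"gave up after {max_steps} steps")
            return tracer
        sys.settrace(tracer)
        try:
            return fn(*args)
        finally:
            sys.settrace(None)

    def spin():
        while True:
            pass

    try:
        simulate(spin, max_steps=1_000)  # an endless loop gets cut off, not simulated forever
    except SimulationBudgetExceeded as exc:
        print(exc)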
We know it's doable, because all such a tool would do is automate the thing we do when looking at code - running pieces of it in our head. In a way, it's just a different interface for doing the same thing you might be doing with unit tests - except it would be transient, and free to exercise any internal code in your program, not just poke at public module interfaces.
> We know it's doable, because all such a tool would do is automate the thing we do when looking at code - running pieces of it in our head.
Yes exactly. Except it would be much faster, more precise, and with more coverage than what we get with mental execution. It still won't give you total coverage, but it would be able to show where it doesn't have coverage.
> In a way, it's just a different interface for doing the same thing you might be doing with unit tests - except it would be transient, and free to exercise any internal code in your program, not just poke at public module interfaces.
Yes. One additional aspect would be the ability to run with partial information. E.g. if a function needs three arguments, you need all three to run a test. However, in abstract interpretation you could specify just one argument and still trace the execution to some degree.
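A crude way to picture that (the price function and the Unknown placeholder are purely illustrative, not a real abstract interpreter): operations on an unspecified argument just stay unknown, while the rest of the data flow is still traced with the one value you did supply:

    class Unknown:
        # Placeholder for an argument we haven't specified; arithmetic on it stays unknown.
        def __add__(self, other):
            return Unknown()
        __radd__ = __sub__ = __rsub__ = __mul__ = __rmul__ = __add__
        def __repr__(self):
            return "<unknown>"

    def price(amount, rate, fee):
        subtotal = amount * rate
        return subtotal + fee

    # Only `rate` is pinned down; the call still runs and shows how values propagate.
    print(price(Unknown(), 0.2, Unknown()))  # -> <unknown>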
> I think the halting problem only tells you that you cannot, in general, say what the program does without expending an amount of computational effort equivalent to running that program. A better counterpoint is that you can't know ahead of time what values will be returned by IO calls.
One corollary of the non-computability of the halting problem is that the problem "does this line ever run" is also non-computable.
The halting problem only means that the problem "does this line ever run" is not computable for every case, since it's possible to construct undecidable cases. However, not all cases are undecidable and for most lines of most programs simple static analysis can indeed determine that certain lines will definitely run and certain lines won't.
As long as you accept that the solution won't give a yes/no answer but yes/no/maybe, an automated system definitely can say something about what the program does; for example, by actually attempting to run the involved fragments for a limited time.
This is https://observablehq.com. The way you can partially modify code while retaining program state is amazing, plus it works with a production debugger (Chrome DevTools).
Something we need is languages with conditioned types that allow us to set a variable to match a condition without assigning a specific value, like this:
let x = a number that is < 7
That way we won't need to test with lots of random numbers.
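For what it's worth, the closest approximation in today's Python is to record the condition in the type and enforce it at runtime (the LessThan7 alias below is just an illustration; no mainstream checker verifies the condition statically, which is exactly the gap being described):

    from typing import Annotated

    LessThan7 = Annotated[int, "value < 7"]  # the condition is documentation, not a static guarantee

    def half(x: LessThan7) -> float:
        assert x < 7, "x must be < 7"  # runtime check stands in for what a conditioned type would prove
        return x / 2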
There's nothing wrong with having a loose cached copy of a program in your head. The real problem is getting distracted and having to rebuild that cache! This is why open plan offices don't really work for coding. You need as few distractions as possible, and have to be able to dial in and have laser focus and have everything else tuned out. One thing I can't forgo is music however, as it makes all your coding seem epic, and activates creative parts of the brain.
It feels like the author is describing a healthy type system and language purity.
In most programming languages, the pain points they describe are real - how do we know that a program will give back exactly what we put in?
The answer is to provide it with an appropriately restrictive type such that any output is necessarily of the right shape. Types allow us to do this exact kind of "offloading", so that we simply trust the compiler to know when we are plumbing things together correctly.
Language purity is really helpful with trusting that a piece of code does what it says it does, and nothing else. If our language is impure then the scope of any given function is essentially unbounded. With purity, it can only perform computations based on the things that are locally available to it, and that severely bounds the amount of leeway a programmer has in its implementation - and hence the number of ways things can go wrong.
In short, an effective, ergonomic, statically-typed programming language can solve most or all of the author's woes. I hope they find one that fits their needs.
It sounds to me like the author wants to see how their code behaves on certain inputs instead of predicting its correctness mentally. Current static type systems don't solve this. There are proof assistants, but limited to academia.
For a concrete (and possibly oversimplified) counterexample, try to use your favorite static type system to verify that a function returns the n-th prime.
There is a difference between the shape of a thing and the specific instance of the thing. To explore the operation of code at an edge case you need to create an instance of the thing that represents that edge case.